Commit 497f2066 authored by Murukesh Mohanan

Split up the updates

parent 0846c784
@@ -186,127 +186,14 @@ service).
<!-- section -->
## Updates
Since the time this post was originally published, I have updated the unit files above twice. Details on the updates can
be found in these follow-up posts:
1. [Network Namespaces and systemd Private Mounts]({% post_url 2023-08-26-netns-systemd %})
2. [Naming Network Namespaces in systemd Private Mounts]({% post_url 2024-03-25-netns-systemd-ii %})
## Update (August 2023): systemd 254 and `PrivateMounts`
An earlier version of this post used bind mounts (`mount --bind /proc/self/ns/net /var/run/netns/vpn`) instead of
symbolic links (`ln -sf /proc/self/ns/net /var/run/netns/vpn`) to name the private namespace. The `mount` method created
two complications:
1. The `/var/run/netns/vpn` had to exist (the old unit ran `ip netns add vpn` to create yet another namespace).
2. Mounts are problematic with `PrivateMounts=yes`.
The latter only became a problem with [systemd's v254 release][systemd-v254]:
> * PrivateNetwork=yes and NetworkNamespacePath= now imply
> PrivateMounts=yes unless PrivateMounts=no is explicitly specified.
Since I run Arch Linux, I get the latest versions of systemd, and one day this setup just started failing, because:
> File system namespaces are set up individually for each process forked off by the service manager. Mounts established
> in the namespace of the process created by `ExecStartPre=` will hence be cleaned up automatically as soon as that
> process exits and will not be available to subsequent processes forked off for `ExecStart=` (and similar applies to
> the various other commands configured for units).
>
> &mdash; _[`man 5 systemd.exec`][systemd.exec]_, `PrivateMounts=`
This meant that while I had a `vpn` netns created, the commands that I assumed were running in the `vpn` netns were
actually running in the unit's private namespace. So, when the VPN interface started up, there was nothing useful in the
netns: no `veth` devices, no route to the internet.
I solved this problem by:
1. Using symbolic links to name the namespace, and
2. Using `JoinsNamespaceOf` in the other units instead of referring to the namespace by name.
This actually improved and simplified the setup, in addition to not having to create the `vpn` netns unnecessarily:
1. `JoinsNamespaceOf` directly expresses the relationship between the VPN service and the netns service.
2. Any other services that should be in the namespace of the VPN service can now use
`JoinsNamespaceOf=openvpn-client@<whatever>`, instead of having to join some arbitrary namespace.
This creates a nice tree of namespace relationships. The `vpn` name for the netns now exists only for convenience, for
when (if) I need to run `ip netns exec` to do something in it.
(Note that each unit that uses `JoinsNamespaceOf` must also have `PrivateNetwork` enabled.)
<!-- section -->
## Update (March 2024): `PrivateMounts` and naming them
In my previous update, I used a symbolic link to set up the name for the new namespace, like so:
```
ln -sf /proc/self/ns/net /var/run/netns/vpn
```
Months later, this resulted in a face-meet-palm moment as I realized that since `/proc/self` changes for every process,
this link was meaningless. It didn't prevent the rest of the setup from working fine, of course, as none of that uses
the `vpn` name to refer to the netns. However, if I wanted to run a command inside, then I had a problem. I embarked
upon yet another journey to see how I could name it if `/proc/self` wasn't an option. A couple of ways came to mind:
```
/bin/sh -c 'ln -sf /proc/$$/ns/net /var/run/netns/vpn'
/bin/sh -c 'ip netns attach vpn $$'
```
The first option didn't work. Since the process in question died immediately, the link would become invalid. The second
option, in which `ip netns attach` does [`mount` shenanigans], seemed like it should work. Apparently, it remounts
`/var/run/netns` as a bind-mount to itself, making it shared in the process, so that mounts in it are propagated to
child namespaces. Then it mounts the netns in a subdirectory there, so it can be accessed independently of any process.
However, once systemd starts our services in private namespaces, it is too late - even using [the `+` prefix][table 2]
to elevate our commands beyond these namespaces doesn't seem to work, and the mounts aren't propagated correctly.
So this should be something that's done before our services start. I first tried using a separate `netns-default`
service just for naming the original netns. Once the initial setup was done, one would think that `ip netns attach`
should then start working even in restricted services if run with elevated privileges. _Quelle surprise_, `ip` tries to
do the `mount` shenanigans all over again and fails:
```
ip[471]: mount --make-shared /var/run/netns failed: Operation not permitted
```
Then I fell back to using this in the VPN service override:
```
ExecStartPost=/usr/bin/ln -sf /proc/${MAINPID}/ns/net /var/run/netns/vpn
```
This way, the netns will be accessible at least as long as the VPN process stays alive. After thinking a bit more, I
decided to go for a _third_ service:
{% highlight shell linenos %}
# /etc/systemd/system/netns-vpn-post.service
[Unit]
Description=VPN network namespace (post)
ConditionPathExists=!/var/run/netns/vpn
After=<vpn>.service
[Install]
WantedBy=<vpn>.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Hat-tip to A.B. here: https://serverfault.com/a/1097323/229499
ExecStartPre=:/bin/bash -c 'declare $(systemctl show --property MainPID <vpn>.service); ip netns attach vpn $MainPID'
{% endhighlight %}
This service doesn't have to deal with private namespaces, and just sets up the name using `ip netns attach`. Note the
`:` at the start of the command so that systemd leaves `$` alone. Now everything looks nice:
```
% ip netns list
default
vpn (id: 0)
```
In the end, though, I went with just disabling `PrivateMounts` for the netns setup service, for which it doesn't really
matter, and running the `ip netns attach` commands there. Having `PrivateMounts` for the services run in that service
might be fine, but for this one, it really was more trouble than it was worth.
[tb-linux]: https://www.tunnelbear.com/blog/linux_support/ "TunnelBear Befriends Penguins with Limited Linux Support"
[`veth`]: https://man7.org/linux/man-pages/man4/veth.4.html
[systemd-v254]: https://github.com/systemd/systemd/releases/tag/v254
[systemd.exec]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateMounts=
[`mount` shenanigans]: https://7bits.nl/journal/posts/what-does-ip-netns-add-actually-do/ "What does ip netns add actually do? - Peter van Dijk"
[table 2]: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#id-1.9.8 "man systemd.service — Table 2. Special executable prefixes"
---
layout: post
title: 'Private Mounts in systemd and netns'
tags: [tech, linux]
description: Fixing breakage due to systemd updates
---
My [previous post][prev] originally used bind mounts (`mount --bind /proc/self/ns/net /var/run/netns/vpn`) instead of
symbolic links (`ln -sf /proc/self/ns/net /var/run/netns/vpn`) to name the private namespace. The `mount` method created
two complications:
1. The path `/var/run/netns/vpn` had to exist (the old unit ran `ip netns add vpn` to create yet another namespace).
2. Mounts are problematic with `PrivateMounts=yes`.
The latter only became a problem with [systemd's v254 release][systemd-v254]:
> * PrivateNetwork=yes and NetworkNamespacePath= now imply
> PrivateMounts=yes unless PrivateMounts=no is explicitly specified.
Since I run Arch Linux, I get the latest versions of systemd, and one day this setup just started failing, because:
> File system namespaces are set up individually for each process forked off by the service manager. Mounts established
> in the namespace of the process created by `ExecStartPre=` will hence be cleaned up automatically as soon as that
> process exits and will not be available to subsequent processes forked off for `ExecStart=` (and similar applies to
> the various other commands configured for units).
>
> &mdash; _[`man 5 systemd.exec`][systemd.exec]_, `PrivateMounts=`
This meant that while I had a `vpn` netns created, the commands that I assumed were running in the `vpn` netns were
actually running in the unit's private namespace. So, when the VPN interface started up, there was nothing useful in the
netns: no `veth` devices, no route to the internet.
I solved this problem by:
1. Using symbolic links to name the namespace, and
2. Using `JoinsNamespaceOf` in the other units instead of referring to the namespace by name.
This actually improved and simplified the setup, in addition to not having to create the `vpn` netns unnecessarily:
1. `JoinsNamespaceOf` directly expresses the relationship between the VPN service and the netns service.
2. Any other services that should be in the namespace of the VPN service can now use
`JoinsNamespaceOf=openvpn-client@<whatever>`, instead of having to join some arbitrary namespace.
This creates a nice tree of namespace relationships. The `vpn` name for the netns now exists only for convenience, for
when (if) I need to run `ip netns exec` to do something in it.
(Note that each unit that uses `JoinsNamespaceOf` must also have `PrivateNetwork` enabled.)
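As a rough sketch of what joining looks like from another unit, the snippet below writes a drop-in for a hypothetical
`some-app.service`. The unit name, the `netns.conf` file name and the scratch directory are assumptions for
illustration; a real drop-in would live under `/etc/systemd/system/some-app.service.d/`. The `<whatever>` instance name
is left as the placeholder used above.

```shell
# Hypothetical unit that should share the VPN's network namespace.
unit=some-app.service

# Scratch location so the sketch needs no privileges.
dropin_dir="$(mktemp -d)/$unit.d"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/netns.conf" <<'EOF'
[Unit]
# Share the network namespace of the VPN unit.
JoinsNamespaceOf=openvpn-client@<whatever>.service
[Service]
# Without this, JoinsNamespaceOf= has no effect on this unit.
PrivateNetwork=yes
EOF

cat "$dropin_dir/netns.conf"
```

After copying such a file into place, `systemctl daemon-reload` followed by a restart of the unit would be needed for it
to take effect.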
[systemd-v254]: https://github.com/systemd/systemd/releases/tag/v254
[systemd.exec]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateMounts=
[prev]: {% post_url 2020-12-03-poking-pi-ii %} "VPNs and Network Namespaces"
---
layout: post
title: 'Naming a netns with systemd Private Mounts'
tags: [tech, linux]
description: Wrestling with private mounts to name a netns
---
In [my previous update][prev], I used a symbolic link to set up the name for the new namespace, like so:
```
ln -sf /proc/self/ns/net /var/run/netns/vpn
```
Months later, this resulted in a face-meet-palm moment as I realized that since `/proc/self` changes for every process,
this link was meaningless. It didn't prevent the rest of the setup from working fine, of course, as none of that uses
the `vpn` name to refer to the netns. However, if I wanted to run a command inside, then I had a problem. I embarked
upon yet another journey to see how I could name it if `/proc/self` wasn't an option. A couple of ways came to mind:
```
/bin/sh -c 'ln -sf /proc/$$/ns/net /var/run/netns/vpn'
/bin/sh -c 'ip netns attach vpn $$'
```
The first option didn't work. Since the process in question died immediately, the link would become invalid. The second
option, in which `ip netns attach` does [`mount` shenanigans], seemed like it should work. Apparently, it remounts
`/var/run/netns` as a bind-mount to itself, making it shared in the process, so that mounts in it are propagated to
child namespaces. Then it mounts the netns in a subdirectory there, so it can be accessed independently of any process.
However, once systemd starts our services in private namespaces, it is too late - even using [the `+` prefix][table 2]
to elevate our commands beyond these namespaces doesn't seem to work, and the mounts aren't propagated correctly.
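Both symlink failures can be reproduced outside systemd. This is a minimal sketch, assuming a Linux system with `/proc`
mounted; it uses a scratch directory in place of `/var/run/netns`, so it needs no privileges:

```shell
# /proc/self resolves to the PID of whichever process looks it up,
# so two separate readlink processes see two different targets.
first=$(readlink /proc/self)
second=$(readlink /proc/self)

# Scratch directory standing in for /var/run/netns.
scratch=$(mktemp -d)

# A short-lived shell links to its own netns entry and then exits.
sh -c 'ln -sf "/proc/$$/ns/net" "$1/vpn"' sh "$scratch"

# The link itself survives, but /proc/<pid> vanished with the process.
if [ -L "$scratch/vpn" ] && [ ! -e "$scratch/vpn" ]; then
    link_state=dangling
else
    link_state=valid
fi

echo "/proc/self: $first, then $second; link is $link_state"
rm -rf "$scratch"
```

The two PIDs differ on every run, and the link is reported as dangling because nothing keeps the namespace entry
reachable once the process that created the link is gone.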
So this should be something that's done before our services start. I first tried using a separate `netns-default`
service just for naming the original netns. Once the initial setup was done, one would think that `ip netns attach`
should then start working even in restricted services if run with elevated privileges. _Quelle surprise_, `ip` tries to
do the `mount` shenanigans all over again and fails:
```
ip[471]: mount --make-shared /var/run/netns failed: Operation not permitted
```
Then I fell back to using this in the VPN service override:
```
ExecStartPost=/usr/bin/ln -sf /proc/${MAINPID}/ns/net /var/run/netns/vpn
```
This way, the netns will be accessible at least as long as the VPN process stays alive. After thinking a bit more, I
decided to go for a _third_ service:
{% highlight shell linenos %}
# /etc/systemd/system/netns-vpn-post.service
[Unit]
Description=VPN network namespace (post)
ConditionPathExists=!/var/run/netns/vpn
After=<vpn>.service
[Install]
WantedBy=<vpn>.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Hat-tip to A.B. here: https://serverfault.com/a/1097323/229499
ExecStartPre=:/bin/bash -c 'declare $(systemctl show --property MainPID <vpn>.service); ip netns attach vpn $MainPID'
{% endhighlight %}
This service doesn't have to deal with private namespaces, and just sets up the name using `ip netns attach`. Note the
`:` at the start of the command so that systemd leaves `$` alone. Now everything looks nice:
```
% ip netns list
default
vpn (id: 0)
```
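The `declare $(systemctl show ...)` step in the unit above relies on `systemctl show --property` printing a single
`KEY=VALUE` line, which bash's `declare` then turns into a variable. A small sketch with a made-up PID standing in for
the real `systemctl` output (so it runs without the VPN unit present):

```shell
# Stand-in for `systemctl show --property MainPID <vpn>.service`;
# 1234 is a made-up PID used only for illustration.
show_output='MainPID=1234'

# bash's `declare` accepts NAME=VALUE arguments, so the line becomes a variable.
pid=$(bash -c 'declare "$1"; printf "%s\n" "$MainPID"' bash "$show_output")

echo "$pid"
```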
In the end, though, I went with just disabling `PrivateMounts` for the netns setup service, for which it doesn't really
matter, and running the `ip netns attach` commands there. Having `PrivateMounts` for the services run in that service
might be fine, but for this one, it really was more trouble than it was worth.
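For illustration only, a drop-in along these lines might express that final choice. This is a sketch, not the actual
unit: the `netns-vpn.service` name, the `override.conf` file name and the use of `ExecStartPre=` are assumptions, and
the file is written to a scratch directory rather than `/etc/systemd/system/`.

```shell
# Hypothetical name for the netns setup service.
unit=netns-vpn.service

# Scratch location so the sketch needs no privileges.
dropin_dir="$(mktemp -d)/$unit.d"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
PrivateNetwork=yes
# systemd 254+ implies PrivateMounts=yes here unless it is switched off explicitly.
PrivateMounts=no
# The ":" prefix stops systemd from expanding "$$", so the shell sees it.
ExecStartPre=:/bin/sh -c 'ip netns attach vpn $$'
EOF

cat "$dropin_dir/override.conf"
```

With `PrivateMounts=no`, the bind mount that `ip netns attach` creates lands in the host mount namespace, so the `vpn`
name should stay usable after the command exits.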
[`mount` shenanigans]: https://7bits.nl/journal/posts/what-does-ip-netns-add-actually-do/ "What does ip netns add actually do? - Peter van Dijk"
[table 2]: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#id-1.9.8 "man systemd.service — Table 2. Special executable prefixes"
[prev]: {% post_url 2023-08-26-netns-systemd %} "Private Mounts in systemd and netns"