<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>my tech blog &#187; systemd</title>
	<atom:link href="http://billauer.se/blog/category/systemd/feed/" rel="self" type="application/rss+xml" />
	<link>https://billauer.se/blog</link>
	<description>Anything I found worthy to write down.</description>
	<lastBuildDate>Thu, 12 Mar 2026 11:36:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Un-ignore /usr/lib/systemd/ in .gitignore with git repo on root filesystem</title>
		<link>https://billauer.se/blog/2025/12/git-unignore-subdirectory/</link>
		<comments>https://billauer.se/blog/2025/12/git-unignore-subdirectory/#comments</comments>
		<pubDate>Tue, 23 Dec 2025 14:27:35 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7183</guid>
		<description><![CDATA[Actually, this is about un-ignoring any subdirectory that is a grandchild of an ignored directory. Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer&#8217;s configuration, the vast majority of directories are ignored. One of them is /lib, however /lib/systemd/ should not be ignored, [...]]]></description>
			<content:encoded><![CDATA[<p>Actually, this is about un-ignoring any subdirectory that is a grandchild of an ignored directory.</p>
<p>Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer&#8217;s configuration, the vast majority of directories are ignored. One of them is /lib. However, /lib/systemd/ should not be ignored, as it contains files that are crucial to the system&#8217;s configuration.</p>
<p>On other distributions, the relevant part in .gitignore usually goes:</p>
<pre><span class="yadayada">[ ... ]</span>
bin/
boot/
dev/
home/
<span class="punch">lib/*
!lib/systemd/
</span>lib64/
lib32/
libx32/
lost+found/
media/
mnt/
opt/
proc/
root/
run/
sbin/
<span class="yadayada">[ ... ]</span></pre>
<p>So lib/ itself isn&#8217;t ignored as a directory, but all of its content, including subdirectories, is. That allows lib/systemd/ to be un-ignored on the following row. This is why lib/ isn&#8217;t ignore-listed like the other directories: git never descends into a directory that is ignored as a whole, so nothing inside it can be un-ignored.</p>
<p>But on Linux Mint 22.2, /lib is a symbolic link to /usr/lib. And since git treats a symbolic link just like a file, /lib/systemd/ is effectively /usr/lib/systemd/. Ignoring /lib as a directory has no effect, and un-ignoring /lib/systemd has no effect either, because to git, this directory doesn&#8217;t even exist.</p>
<p>So go</p>
<pre>$ <strong>man gitignore</strong></pre>
<p>and try to figure out what to do. It&#8217;s quite difficult actually, but it boils down to this:</p>
<pre>usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/</pre>
<p>It&#8217;s a bit tangled, but it&#8217;s the same laddering as before, one level deeper: usr/ itself isn&#8217;t ignored, but all of its content is; /usr/lib is then un-ignored, all of its files are ignored, and finally /usr/lib/systemd is un-ignored.</p>
<p>The only good part about this solution is that it works.</p>
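<p>A quick way to sanity-check rules like these is git&#8217;s check-ignore command. With the -v flag, it prints the .gitignore line that decided the path&#8217;s fate, and the exit status tells whether the path is ignored at all. Assuming the rules above sit in the repository&#8217;s top-level .gitignore (the queried paths are just examples):</p>
<pre>$ <strong>git check-ignore -v usr/local</strong>
.gitignore:1:usr/*	usr/local
$ <strong>git check-ignore -q usr/lib/systemd/foo.service ; echo $?</strong>
1</pre>
<p>So usr/local is caught by the first rule, and the systemd path is not ignored, which is exactly the goal.</p>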
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2025/12/git-unignore-subdirectory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Measuring how much RAM a Linux service eats</title>
		<link>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/</link>
		<comments>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/#comments</comments>
		<pubDate>Fri, 13 Dec 2024 17:20:02 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7142</guid>
		<description><![CDATA[Introduction Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems. The method is to run the service and expose it to a scenario [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.</p>
<p>The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.</p>
<p>This can be done with &#8220;top&#8221; and similar programs, but these show the current use. I needed the maximal RAM use. Besides, a service may spread out its RAM consumption across several processes. It&#8217;s the cumulative consumption that is interesting.</p>
<p>The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It&#8217;s also possible to create a dedicated cgroup and run a program within that one, as shown in <a rel="noopener" href="https://billauer.se/blog/2016/05/linux-cgroups-swap-quartus/" target="_blank">another post of mine</a>.</p>
<p>This method is somewhat crude, because this memory consumption includes disk cache as well. In other words, this method shows how much RAM is consumed when there&#8217;s plenty of memory, and hence when there&#8217;s no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service&#8217;s RAM consumption is limited in the systemd unit file), it&#8217;s more than possible that everything will work just fine. It might run somewhat slower due to disk access that was previously substituted by the cache.</p>
<p>So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.</p>
<h3>Where the information is</h3>
<p>All said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.</p>
<p>There are in principle two versions of the interface with cgroup&#8217;s memory management: First, the one I won&#8217;t use, which is <a rel="noopener" href="https://www.kernel.org/doc/Documentation/cgroup-v2.txt" target="_blank">cgroup-v2</a> (or maybe <a rel="noopener" href="https://docs.kernel.org/admin-guide/cgroup-v2.html" target="_blank">this doc</a> for v2 is better?). The sysfs files for this interface for a service named &#8220;theservice&#8221; reside in /sys/fs/cgroup/unified/system.slice/theservice.service.</p>
<p>I shall be working with the <a rel="noopener" href="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt" target="_blank">memory control of cgroup-v1</a>. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.</p>
<p>If /sys/fs/cgroup/memory/ doesn&#8217;t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn&#8217;t exist under /sys/fs/cgroup/memory/, it&#8217;s most likely because systemd&#8217;s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true in the service&#8217;s systemd unit (or maybe any unit?).</p>
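<p>For reference, here&#8217;s a minimal sketch of doing this with a drop-in file, so the packaged unit file stays untouched (the file name under the .d directory is arbitrary). A &#8220;systemctl daemon-reload&#8221; plus a restart of the service make it take effect:</p>
<pre># /etc/systemd/system/theservice.service.d/accounting.conf
[Service]
MemoryAccounting=true</pre>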
<p>Speaking of which, it might be a good idea to set MemoryMax in the service&#8217;s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.</p>
<p>And there&#8217;s always the alternative of creating a separate cgroup and running the service in that group. I&#8217;ll refer to <a rel="noopener" href="https://billauer.se/blog/2016/05/linux-cgroups-swap-quartus/" target="_blank">my own blog post</a> again.</p>
<h3>Getting the info</h3>
<p>All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).</p>
<p><strong>The maximal memory used</strong>: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes disk cache, so the number is higher than what appears in &#8220;top&#8221;.</p>
<p><strong>The memory currently used</strong>: memory.usage_in_bytes.</p>
<p>For more detailed info about memory use: memory.stat. For example:</p>
<pre>$ <strong>cat memory.stat </strong>
<span class="punch">cache 1138688</span>
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0</pre>
<p>Note the &#8220;cache&#8221; part at the beginning. It&#8217;s no coincidence that it&#8217;s first. That&#8217;s the most important part: How much can be reclaimed just by flushing the cache.</p>
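<p>Putting the pieces together, the figure that is often of real interest is the usage with the cache deducted. A small shell sketch (the function name is my own; it assumes the cgroup-v1 files described above):</p>
<pre># Print a memory cgroup's current usage minus its page cache, in bytes.
# Usage: cgroup_ram_no_cache /sys/fs/cgroup/memory/system.slice/theservice.service
cgroup_ram_no_cache() {
  local dir="$1"
  local usage cache
  usage=$(cat "$dir/memory.usage_in_bytes")
  cache=$(awk '$1 == "cache" { print $2 }' "$dir/memory.stat")
  echo $(( usage - cache ))
}</pre>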
<p>On a 6.1.0 kernel, I&#8217;ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn&#8217;t writable, however: its permissions forbade it, and forcing a write failed as well. So it wasn&#8217;t possible to reset the max level.</p>
<h3>Setting memory limits</h3>
<p>It&#8217;s possible to set memory limits in systemd&#8217;s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)</p>
<pre># echo 40M &gt; memory.limit_in_bytes</pre>
<p>To disable the limit, pick an unreasonably high number, e.g.</p>
<pre># echo 100G &gt; memory.limit_in_bytes</pre>
<p>Note that restarting the systemd service has no effect on these parameters (unless a memory limit is required in the unit file). The cgroup directory remains intact.</p>
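<p>For what it&#8217;s worth, systemd can set the same limit itself, instead of writing to the cgroup file directly. With cgroup-v1 the relevant property is MemoryLimit (MemoryMax is its cgroup-v2 counterpart), and the --runtime flag keeps the change from surviving into the persistent configuration:</p>
<pre># systemctl set-property --runtime theservice.service MemoryLimit=40M</pre>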
<h3>Resetting between tests</h3>
<p>To reset the maximal value that has been recorded for RAM use (as root)</p>
<pre># echo 0 &gt; memory.max_usage_in_bytes</pre>
<p>But to really start fresh, all disk cache needs to be cleared as well. The sledge-hammer way is going</p>
<pre># echo 1 &gt; /proc/sys/vm/drop_caches</pre>
<p>This frees the page caches system-wide, so everything running on the computer will need to re-read things again from the disk. There&#8217;s a slight and temporary global impact on the performance. On a GUI desktop, it gets a bit slow for a while.</p>
<p>A message like this will appear in the kernel log in response:</p>
<pre>bash (43262): drop_caches: 1</pre>
<p>This is perfectly fine, and indicates no error.</p>
<p>Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.</p>
<p>Two things that have <strong>no effect</strong>:</p>
<ul>
<li>Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is in a shortage of RAM overall. Otherwise, it does nothing.</li>
<li>Restarting the service with systemd. It wouldn&#8217;t make any sense to flush a disk cache when restarting a service.</li>
</ul>
<p>It&#8217;s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Migrating an OpenVZ container to KVM</title>
		<link>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/</link>
		<comments>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/#comments</comments>
		<pubDate>Fri, 12 Jul 2024 15:11:13 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7097</guid>
		<description><![CDATA[Introduction My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are phased out, and it&#8217;s time to move on to a KVM. This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are phased out, and it&#8217;s time to move on to a KVM.</p>
<p>This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine works flawlessly for a long period of time, I&#8217;m very reluctant to change anything. Don&#8217;t touch a stable system. It just happened to have an uptime of 426 days, and the last time this server caused me trouble was way before that.</p>
<p>So the question is if it&#8217;s possible to convert a container into a KVM machine, just by copying the filesystem. After all, what&#8217;s the difference if /sbin/init (systemd) is kicked off as a plain process inside a container or if the kernel does the same thing?</p>
<p>The answer is yes-ish, this manipulation is possible, but it requires some adjustments.</p>
<p>These are my notes and action items from finding my way to getting it done. Everything below is very specific to my own slightly bizarre case, and at times I ended up carrying out tasks in a different order than listed here. But this can be useful for understanding what&#8217;s ahead.</p>
<p>By the way, the wisest thing I did throughout this process was to rehearse it on a KVM machine that I built on my own local computer. This virtual machine functioned as a mockup of the server to be installed. Not only did it make the trial and error much easier, but it also allowed me to test all kinds of things after the real server was up and running, without messing with the real machine.</p>
<h3>Faking Ubuntu 24.04 LTS</h3>
<p>To make things even more interesting, I also wanted to push the next time I&#8217;ll be required to mess with the virtual machine as far as possible into the future. Put differently, I wanted to hide the fact that the machine runs on ancient software. There should not be a request to upgrade in the foreseeable future just because the old system isn&#8217;t compatible with some future version of KVM.</p>
<p>So to the KVM hypervisor, my machine should feel like an Ubuntu 24.04, which was the latest server distribution offered at the time I did this trick. Which brings the question: What does the hypervisor see?</p>
<p>The KVM guest interfaces with its hypervisor in three ways:</p>
<ul>
<li>With GRUB, which accesses the virtual disk.</li>
<li>Through the kernel, which interacts with the virtual hardware.</li>
<li>Through the guest&#8217;s DHCP client, which fetches the IP address, default gateway and DNS from the hypervisor&#8217;s dnsmasq.</li>
</ul>
<p>Or so I hope. Maybe there&#8217;s some aspect I&#8217;m not aware of. It&#8217;s not like I&#8217;m such an expert in virtualization.</p>
<p>So the idea was that both GRUB and the kernel should be the same as in Ubuntu 24.04. This way, any KVM setting that works with this distribution will work with my machine. The Naphthalene smell from the user-space software underneath will not reach the hypervisor.</p>
<p>This presumption can turn out to be wrong, and the third item in the list above demonstrates that: The guest machine gets its IP address from the hypervisor through a DHCP request issued by systemd-networkd, which is part of systemd version 215. So the bluff is exposed. Will there be some kind of incompatibility between the old systemd&#8217;s DHCP client and some future hypervisor&#8217;s response?</p>
<p>Regarding this specific issue, I doubt there will be a problem, as DHCP is such a simple and well-established protocol. And even if that functionality broke, the IP address is fixed anyhow, so the virtual NIC can be configured statically.</p>
<p>But who knows, maybe there is some kind of interaction with systemd that I&#8217;m not aware of? Future will tell.</p>
<p>So it boils down to faking GRUB and using a recent kernel.</p>
<h3>Solving the GRUB problem</h3>
<p>Debian 8 comes with GRUB version 0.97. Could we call that GRUB 1? I can already imagine the answer to my support ticket saying &#8220;please upgrade your system, as our KVM hypervisor doesn&#8217;t support old versions of GRUB&#8221;.</p>
<p>So I need a new one.</p>
<p>Unfortunately, the common way to install GRUB is with a couple of hocus-pocus tools that do the work well in the usual scenario.</p>
<p>As it turns out, there are two parts that need to be installed: The first part consists of the GRUB binary on the boot partition (GRUB partition or EFI, pick your choice), plus several files (modules and other) in /boot/grub/. The second part is a script file, grub.cfg, which is a textual file that can be edited manually.</p>
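<p>To get a feel for how little is actually required, here is a sketch of a hand-written grub.cfg, with hypothetical partition and file names (note that GRUB counts partitions from 1, so (hd0,msdos3) is the third partition of the first disk). A real file typically wraps this in a menuentry, but grub.cfg is just a script, and this boots directly:</p>
<pre>set root=(hd0,msdos3)
linux /boot/vmlinuz-6.8.12-myserver root=/dev/vda3 ro
initrd /boot/initrd.img-6.8.12-myserver
boot</pre>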
<p>To make a long story short, I installed the distribution on a virtual machine with the same layout, and made a copy of the grub.cfg file that was created. I then edited this file directly to fit into the new machine. As for installing GRUB binary, I did this from a Live ISO Ubuntu 24.04, so it&#8217;s genuine and legit.</p>
<p>For the full and explained story, I&#8217;ve written <a rel="noopener" href="https://billauer.se/blog/2024/07/installing-grub-rescue/" target="_blank">a separate post</a>.</p>
<h3>Fitting a decent kernel</h3>
<p>One way or another, a kernel and its modules must be added to the filesystem in order to convert it from a container to a KVM machine. This is the essential difference: With a container, one kernel runs all containers and gives each of them the illusion that it&#8217;s the only one. With KVM, the boot starts from the very beginning.</p>
<p>If there was something I <strong>didn&#8217;t</strong> worry about, it was the concept of running an ancient distribution with a very recent kernel. I have a lot of experience with compiling the hot-hot-latest-out kernel and running it on steam-engine distributions, and very rarely have I seen any issue with that. The Linux kernel is backward compatible in a remarkable way.</p>
<p>My original idea was to grab the kernel image and the modules from a running installation of Ubuntu 24.04. However, the module format of this distro is incompatible with old Debian 8 (ZST compression seems to have been the crux), and as a result, no modules were loaded.</p>
<p>So I took config-6.8.0-36-generic from Ubuntu 24.04 and used it as the starting point for the .config file used for compiling the vanilla stable kernel with version v6.8.12.</p>
<p>And then there were a few modifications to .config:</p>
<ul>
<li>&#8220;make oldconfig&#8221; asked a few questions and made some minor modifications, nothing apparently related.</li>
<li>Dropped kernel module compression (CONFIG_MODULE_COMPRESS_ZSTD off) and set kernel&#8217;s own compression to gzip. This was probably the reason the distribution&#8217;s modules didn&#8217;t load.</li>
<li>Some crypto stuff was disabled: CONFIG_INTEGRITY_PLATFORM_KEYRING, CONFIG_SYSTEM_BLACKLIST_KEYRING and CONFIG_INTEGRITY_MACHINE_KEYRING were dropped, same with CONFIG_LOAD_UEFI_KEYS and most important, CONFIG_SYSTEM_REVOCATION_KEYS was set to &#8220;&#8221;. Its previous value, &#8220;debian/canonical-revoked-certs.pem&#8221; made the compilation fail.</li>
<li>Dropped CONFIG_DRM_I915, which caused some weird compilation error.</li>
<li>After making a test run with the kernel, I also dropped CONFIG_UBSAN with everything that comes with it. UBSAN spat a lot of warning messages on mainstream drivers, and it&#8217;s really annoying. It&#8217;s still unclear to me why these warnings don&#8217;t appear on the distribution kernel. Maybe because of a difference between compiler versions (the warnings stem from checks inserted by gcc).</li>
</ul>
<p>The compilation took 32 minutes on a machine with 12 cores (6 hyperthreaded). By far the longest and most difficult kernel compilation I can remember in a long time.</p>
<p>Based upon <a rel="noopener" href="https://billauer.se/blog/2015/10/linux-kernel-compilation-jots/" target="_blank">my own post</a>, I created the Debian packages for the whole thing, using the bindeb-pkg make target.</p>
<p>That took an additional 20 minutes, running on all cores. I used two of these packages in the installation of the KVM machine, as shown in the cookbook below.</p>
<h3>Methodology</h3>
<p>So the deal with my web host was like this: They started a KVM machine (with a different IP address, of course). I prepared this KVM machine, and when that was ready, I sent a support ticket asking for swapping the IP addresses. This way, the KVM machine became the new server, and the old container machine went to the junkyard.</p>
<p>As this machine involved a mail server and web sites with user content (comments to my blog, for example), I decided to stop the active server, copy &#8220;all data&#8221;, and restart the server only after the IP swap. In other words, the net result should be as if the same server had been shut down for an hour, and then restarted. No discontinuities.</p>
<p>As it turned out, everything that is related to the web server and email, including all the logs, is in /var/ and /home/. I could therefore copy all files from the old server to the new one for the sake of setting it up, and verify that everything was smooth as a first stage.</p>
<p>Then I shut down the services and copied /var/ and /home/. And then came the IP swap.</p>
<p>These simple commands are handy for checking which files have changed during the past week. The first finds the directories, and the second the plain files.</p>
<pre># find / -xdev -ctime -7 -type d | sort
# find / -xdev -ctime -7 -type f | sort</pre>
<p>The purpose of the -xdev flag is to remain on one filesystem. Otherwise, a lot of files from /proc and such are printed out. If your system has several relevant filesystems, be sure to run the commands on each of them, not just on &#8220;/&#8221; as in this example.</p>
<p>The next few sections below are the cookbook I wrote for myself in order to get it done without messing around (and hence mess up).</p>
<p>In hindsight, I can say that except for dealing with GRUB and the kernel, most of the hassle had to do with the NIC: Its name changed from venet0 to eth0, and it got its address through DHCP relatively late in the boot process. And that required some adaptations.</p>
<h3>Preparing the virtual machine</h3>
<ul>
<li>Start the installation of Ubuntu 24.04 LTS server edition (or whatever is available, it doesn&#8217;t matter much). Possibly stop the installation as soon as files are being copied: The only purpose of this step is to partition the disk neatly, so that /dev/vda1 is a small partition for GRUB, and /dev/vda3 is the root filesystem (/dev/vda2 is a swap partition).</li>
<li>Start the KVM machine with a rescue image (preferably graphical, or with sshd running). I went for the Ubuntu 24.04 LTS server Live ISO (the best choice provided by my web host). See notes below on using Ubuntu&#8217;s server ISO as a rescue image.</li>
<li>Wipe the existing root filesystem, if such has been installed. I considered this necessary at the time, because the default inode size may be 256, and GRUB version 1 won&#8217;t play ball with that. But later on I decided on GRUB 2. Anyhow, I forced it to be 128 bytes, despite the warning that 128-byte inodes cannot handle dates beyond 2038 and are deprecated:
<pre># mkfs.ext4 -I 128 /dev/vda3</pre>
</li>
<li>And since I was at it, no automatic fsck check. Ever. It&#8217;s really annoying when you want to kick off the server quickly.
<pre># tune2fs -c 0 -i 0 /dev/vda3</pre>
</li>
<li>Mount new system as /mnt/new:
<pre># mkdir /mnt/new
# mount /dev/vda3 /mnt/new</pre>
</li>
<li>Copy the filesystem. On the OpenVZ machine:
<pre># tar --one-file-system -cz / | nc -q 0 185.250.251.160 1234 &gt; /dev/null</pre>
<p>and the other side goes (run this before the command above):</p>
<pre># nc -l 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
<p>This took about 30 minutes. The purpose of the &#8220;-q 0&#8221; flag and those /dev/null redirections is merely to make nc quit when the tar finishes.<br />
Or, doing the same from a backup tarball:</p>
<pre>$ cat myserver-all-24.07.08-08.22.tar.gz | nc -q 0 -l 1234 &gt; /dev/null</pre>
<p>and the other side goes</p>
<pre># nc 10.1.1.3 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
</li>
<li>Remove old /lib/modules and boot directory:
<pre># rm -rf /mnt/new/lib/modules/ /mnt/new/boot/</pre>
</li>
<li>Create /boot/grub and copy the grub.cfg file that I&#8217;ve prepared in advance to there. <a rel="noopener" href="https://billauer.se/blog/2024/07/installing-grub-rescue/" target="_blank">This separate post</a> explains the logic behind doing it this way.</li>
<li>Install GRUB on the boot partition (this also adds a lot of files to /boot/grub/):
<pre># grub-install --root-directory=/mnt/new /dev/vda</pre>
</li>
<li>In order to work inside the chroot, some bind and tmpfs mounts are necessary:
<pre># mount -o bind /dev /mnt/new/dev
# mount -o bind /sys /mnt/new/sys
# mount -t proc /proc /mnt/new/proc
# mount -t tmpfs tmpfs /mnt/new/tmp
# mount -t tmpfs tmpfs /mnt/new/run</pre>
</li>
<li>Copy the two .deb files that contain the Linux kernel files to somewhere in /mnt/new/</li>
<li>Chroot into the new fs:
<pre># chroot /mnt/new/</pre>
</li>
<li>Check that /dev, /sys, /proc, /run and /tmp are as expected (mounted correctly).</li>
<li>Disable and stop these services: bind9, sendmail, cron.</li>
<li>This wins the prize for the oddest fix: Probably in relation to the OpenVZ container, the LSB modules_dep service is active, and it deletes all module files in /lib/modules on reboot. So make sure to never see it again. Just disabling it wasn&#8217;t good enough.
<pre># systemctl mask modules_dep.service</pre>
</li>
<li>Install the Linux kernel and its modules into /boot and /lib/modules:
<pre># dpkg -i linux-image-6.8.12-myserver_6.8.12-myserver-2_amd64.deb</pre>
</li>
<li>Also install the headers for compilation (why not?)
<pre># dpkg -i linux-headers-6.8.12-myserver_6.8.12-myserver-2_amd64.deb</pre>
</li>
<li>Add /etc/systemd/network/20-eth0.network
<pre>[Match]
Name=eth0

[Network]
DHCP=yes</pre>
<p>The NIC was a given in a container, but now it has to be raised explicitly, with the IP address possibly obtained from the hypervisor via DHCP, as I&#8217;ve done here.</p></li>
<li>Add the two following lines to /etc/sysctl.conf, in order to turn off IPv6:
<pre>net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1</pre>
</li>
<li>Adjust the firewall rules, so that they don&#8217;t depend on the server having a specific IP address (because a temporary IP address will be used).</li>
<li>Add support for lspci (better to do it now, in case something goes wrong after booting):
<pre># apt install pciutils</pre>
</li>
<li>Ban the evbug module, which is intended to generate debug messages on input devices. Unfortunately, it sometimes floods the kernel log when the mouse goes over the virtual machine&#8217;s console window. So ditch it by adding /etc/modprobe.d/evbug-blacklist.conf, having this single line:
<pre>blacklist evbug</pre>
</li>
<li>Edit /etc/fstab. Remove everything, and leave only this row:
<pre>/dev/vda3 / ext4 defaults 0 1</pre>
</li>
<li>Remove persistent udev rules, if such exist, in /etc/udev/rules.d. Oddly enough, there was nothing in this directory, neither in the existing OpenVZ server nor in a regular Ubuntu 24.04 server installation.</li>
<li>Boot up the system from disk, and perform post-boot fixes as mentioned below.</li>
</ul>
<h3>Post-boot fixes</h3>
<ul>
<li>Verify that /tmp is indeed mounted as a tmpfs.</li>
<li>Disable (actually, mask) the automount service, which is useless and fails. This makes systemd&#8217;s status degraded, which is practically harmless, but confusing.
<pre># systemctl mask proc-sys-fs-binfmt_misc.automount</pre>
</li>
<li>Install the dbus service:
<pre># apt install dbus</pre>
<p>Not only is it the right thing to do on a Linux system, but it also silences this warning:</p>
<pre>Cannot add dependency job for unit dbus.socket, ignoring: Unit dbus.socket failed to load: No such file or directory.</pre>
</li>
<li>Enable login prompt on the default visible console (tty1) so that a prompt appears after all the boot messages:
<pre># systemctl enable getty@tty1.service</pre>
<p>The other ttys got a login prompt when using Ctrl-Alt-Fn, but not the visible console. So this fixed it. Otherwise, one can be misled into thinking that the boot process is stuck.</p></li>
<li>Optionally: Disable <a rel="noopener" href="https://wiki.openvz.org/Debian_template_creation" target="_blank">vzfifo service</a> and remove /.vzfifo.</li>
</ul>
<h3>Just before the IP address swap</h3>
<ul>
<li>Reboot the OpenVZ server to make sure that it wakes up OK.</li>
<li>Change the OpenVZ server&#8217;s firewall, so that it works with a different IP address. Otherwise, it becomes unreachable after the IP swap.</li>
<li>Boot the target KVM machine <span class="punch">in rescue mode</span>. No need to set up the ssh server as all will be done through VNC.</li>
<li>On the KVM machine, mount new system as /mnt/new:
<pre># mkdir /mnt/new
# mount /dev/vda3 /mnt/new</pre>
</li>
<li>On the OpenVZ server, check for recently changed directories and files:
<pre># find / -xdev -ctime -7 -type d | sort &gt; recently-changed-dirs.txt
# find / -xdev -ctime -7 -type f | sort &gt; recently-changed-files.txt</pre>
</li>
<li>Verify that the changes are only in the places that are going to be updated. If not, consider if and how to update these other files.</li>
<li>Verify that the mail queue is empty, or let sendmail empty it if possible. Not a good idea to have something firing off as soon as sendmail resumes:
<pre># mailq</pre>
</li>
<li>Disable all services except sshd on the OpenVZ server:
<pre># systemctl disable cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>Run &#8220;mailq&#8221; again to verify that the mail queue is empty (unless there was a reason to leave a message there in the previous check).</li>
<li>Reboot OpenVZ server and verify that none of these is running. This is the point at which this machine is dismissed as a server, and the downtime clock begins ticking.</li>
<li>Verify that this server doesn&#8217;t listen to any ports except ssh, as an indication that all services are down:
<pre># netstat -n -a | less</pre>
</li>
<li>Repeat the check of recently changed files.</li>
<li>On the <strong>KVM machine</strong>, remove /var and /home:
<pre># rm -rf /mnt/new/var /mnt/new/home</pre>
</li>
<li>Copy these parts. On the KVM machine, <span class="punch">using the VNC console</span>, go
<pre># nc -l 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
<p>and on myserver:</p>
<pre># tar --one-file-system -cz /var /home | nc -q 0 185.250.251.160 1234 &gt; /dev/null</pre>
<p>Took 28 minutes.</li>
<li>Check that /mnt/new/tmp and /mnt/new/run are empty, and remove anything found there. Given the way the filesystem was copied from the original machine, nothing should be there, and it would be weird if there was. But any leftover files would just be confusing, since /tmp and /run are tmpfs on the running machine, so files underneath them would be invisible anyhow.</li>
<li>Reboot the KVM machine <strong>with a reboot command</strong>. It will stop anyhow for removing the CDROM.</li>
<li>Remove the KVM&#8217;s CDROM and continue the reboot normally.</li>
<li>Login to the KVM machine with <span class="punch">ssh</span>.</li>
<li>Check that all is OK: systemctl status as well as journalctl. Note that apache, mysql and dovecot should be running now.</li>
<li>Power down both virtual machines.</li>
<li>Request an IP address swap. Let them do whatever they want with the <span class="punch">IPv6</span> addresses, as they are ignored anyhow.</li>
</ul>
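<p>As a sanity check on the nc/tar copy step above, one aggregate checksum can be computed over each tree and compared between the two machines. This is my own precaution, not part of the original procedure; tree_sum is a hypothetical helper name, and it compares file contents only (not ownership, permissions or timestamps):</p>

```shell
# Compute one aggregate checksum for an entire directory tree, so that
# e.g. /var on the OpenVZ server can be compared with /mnt/new/var on
# the KVM machine after the transfer. Contents only, not metadata.
tree_sum() {
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 md5sum | md5sum | cut -d' ' -f1 )
}

# On the OpenVZ server:  tree_sum /var
# On the KVM machine:    tree_sum /mnt/new/var
```

<p>If the two sums differ, running the find/md5sum pipeline without the final md5sum on both sides and diffing the outputs points at the offending files.</p>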
<h3>After IP address swap</h3>
<ul>
<li>Start the KVM server normally, and login normally <span class="punch">through ssh</span>.</li>
<li>Try to browse into the web sites: The web server should already be working properly (the DNS is still down, but there&#8217;s a backup DNS).</li>
<li>Check journalctl and systemctl status.</li>
<li>Resume the original firewall rules and <span class="punch">verify that the firewall works properly</span>:
<pre># systemctl restart netfilter-persistent
# iptables -vn -L</pre>
</li>
<li>Start all services, and check status and journalctl again:
<pre># systemctl start cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>If all is fine, enable these services:
<pre># systemctl enable cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>Reboot (with reboot command), and check that all is fine.</li>
<li>In particular, send DNS queries directly to the server with dig, and also send an email to a foreign address (e.g. gmail). My web host blocked outgoing connections to port 25 on the new server, for example.</li>
<li>Delete ifcfg-venet0 and ifcfg-venet0:0 in /etc/sysconfig/network-scripts/, as they relate to the venet0 interface that exists only in the container machine. It&#8217;s just misleading to have it there.</li>
<li>Compare /etc/rc* and /etc/systemd with the situation before the transition in the git repo, to verify that everything is like it should be.</li>
</ul>
<ul>
<li>Check the server with nmap (run this from another machine):
<pre>$ nmap -v -A <span style="color: #888888;"><em>server</em></span>
$ sudo nmap -v -sU <span style="color: #888888;"><em>server</em></span></pre>
</li>
</ul>
<h3>And then the DNS didn&#8217;t work</h3>
<p>I knew very well why I left plenty of time free for after the IP swap. Something will always go wrong after a maneuver like this, and this time was no different. And for some odd reason, it was the bind9 DNS that played two different kinds of pranks.</p>
<p>I noted immediately that the server didn&#8217;t answer to DNS queries. As it turned out, there were two apparently independent reasons for it.</p>
<p>The first was that when I re-enabled the bind9 service (after disabling it for the sake of moving), systemctl went for the SYSV scripts instead of its own. So I got:</p>
<pre># <strong>systemctl enable bind9</strong>
Synchronizing state for bind9.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d bind9 defaults
insserv: warning: current start runlevel(s) (empty) of script `bind9' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `bind9' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d bind9 enable</pre>
<p>This could have been harmless and gone unnoticed, had I not added a &#8220;-4&#8221; flag to bind9&#8217;s command line, without which it wouldn&#8217;t work. Because the SYSV scripts were used, my change in /etc/systemd/system/bind9.service wasn&#8217;t in effect.</p>
<p>Solution: Delete all files related to bind9 in /etc/init.d/ and /etc/rc*.d/. Quite aggressive, but did the job.</p>
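<p>For reference, that aggressive deletion goes along these lines. This is a sketch of my own, not the exact commands from the post; it assumes Debian&#8217;s usual S/K two-digit rc-link naming, and ROOT is overridable so the pattern can be rehearsed on a scratch copy of the directories first:</p>

```shell
# Remove bind9's SysV leftovers so that systemctl stops falling back to
# update-rc.d. ROOT defaults to the live /etc; point it at a scratch
# copy to rehearse. Assumes the usual S/K two-digit rc-link naming.
ROOT="${ROOT:-/etc}"
ls -l "$ROOT"/init.d/bind9 "$ROOT"/rc[0-6S].d/[SK][0-9][0-9]bind9 2>/dev/null || true  # review first
rm -f "$ROOT"/init.d/bind9 "$ROOT"/rc[0-6S].d/[SK][0-9][0-9]bind9
```

<p>A &#8220;systemctl daemon-reload&#8221; afterwards doesn&#8217;t hurt.</p>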
<p>Having that fixed, it still didn&#8217;t work. The problem now was that eth0 was configured through DHCP after bind9 had already started. As a result, the DNS didn&#8217;t listen on eth0.</p>
<p>I slapped myself for thinking about adding a &#8220;sleep&#8221; command before launching bind9, and went for the right way to do this. Namely:</p>
<pre>$ <strong>cat /etc/systemd/system/bind9.service</strong>
[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network-online.target <span class="punch">systemd-networkd-wait-online.service</span>
Wants=network-online.target <span class="punch">systemd-networkd-wait-online.service</span>

[Service]
ExecStart=/usr/sbin/named -4 -f -u bind
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop

[Install]
WantedBy=multi-user.target</pre>
<p>The systemd-networkd-wait-online.service is not there by coincidence. Without it, bind9 was launched before eth0 had received an address. With this, systemd consistently waited for the DHCP to finish, and then launched bind9. As it turned out, this also delayed the start of apache2 and sendmail.</p>
<p>If anything, network-online.target is most likely redundant.</p>
<p>And with this fix, the crucial row appeared in the log:</p>
<pre>named[379]: listening on IPv4 interface eth0, 193.29.56.92#53</pre>
<p>Another solution could have been to assign an address to eth0 statically. For some odd reason, I prefer to let DHCP do this, even though the firewall will block all traffic anyhow if the IP address changes.</p>
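<p>Incidentally, the same ordering fix can be expressed as a drop-in instead of replacing the whole unit file, which leaves the packaged unit untouched. A sketch (the file name under the .d directory is an arbitrary choice of mine):</p>
<pre># /etc/systemd/system/bind9.service.d/wait-online.conf
[Unit]
After=network-online.target systemd-networkd-wait-online.service
Wants=network-online.target systemd-networkd-wait-online.service</pre>
<p>followed by a &#8220;systemctl daemon-reload&#8221;.</p>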
<h3>Using Live Ubuntu as rescue mode</h3>
<p>Set Ubuntu 24.04 server amd64 as the CDROM image.</p>
<p>After the machine has booted, send a Ctrl-Alt-F2 to switch to the second console. Don&#8217;t go on with the installation wizard, as it will of course wipe the server.</p>
<p>In order to establish an ssh connection:</p>
<ul>
<li>Choose a password for the default user (ubuntu-server).
<pre>$ passwd</pre>
<p>If you insist on a weak password, remember that you can do that only as root.</p></li>
<li>Use ssh to log in:
<pre>$ ssh ubuntu-server@185.250.251.160</pre>
</li>
</ul>
<p>Root login is forbidden (by default), so don&#8217;t even try.</p>
<p>Note that even though sshd apparently listens only to IPv6 ports, it&#8217;s actually accepting IPv4 connections by virtue of IPv4-mapped IPv6 addresses:</p>
<pre># <strong>lsof -n -P -i tcp 2&gt;/dev/null</strong>
COMMAND    PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd      1            root  143u  <span class="punch">IPv6</span>   5323      0t0  <span class="punch">TCP *:22</span> (LISTEN)
systemd-r  911 systemd-resolve   15u  IPv4   1766      0t0  TCP 127.0.0.53:53 (LISTEN)
systemd-r  911 systemd-resolve   17u  IPv4   1768      0t0  TCP 127.0.0.54:53 (LISTEN)
<span class="punch">sshd</span>      1687            root    3u  <span class="punch">IPv6</span>   5323      0t0  <span class="punch">TCP *:22</span> (LISTEN)
sshd      1847            root    4u  <span class="punch">IPv6</span>  11147      0t0  <span class="punch">TCP 185.250.251.160:22-&gt;85.64.140.6:57208</span> (ESTABLISHED)
sshd      1902   ubuntu-server    4u  IPv6  11147      0t0  TCP 185.250.251.160:22-&gt;85.64.140.6:57208 (ESTABLISHED)</pre>
<p>So don&#8217;t get confused by e.g. netstat and other similar utilities.</p>
<h3>To NTP or not?</h3>
<p>I wasn&#8217;t sure if I should run an NTP client inside a KVM virtual machine. So these are the notes I took.</p>
<ul>
<li><a rel="noopener" href="https://opensource.com/article/17/6/timekeeping-linux-vms" target="_blank">This</a> is a nice tutorial to start with.</li>
<li>It&#8217;s probably a good idea to run an NTP client inside the guest. It <a rel="noopener" href="https://sanjuroe.dev/sync-kvm-guest-using-ptp" target="_blank">would have been better to utilize the PTP protocol</a> and get the host&#8217;s clock directly, but that&#8217;s really overkill. The drawback with these daemons is that if the guest goes down and comes back up, it starts with the old time and then jumps.</li>
<li>It&#8217;s also <a rel="noopener" href="https://doc.opensuse.org/documentation/leap/archive/42.1/virtualization/html/book.virt/sec.kvm.managing.clock.html" target="_blank">a good idea</a> to use kvm_clock in addition to NTP. This kernel feature uses the pvclock protocol to <a rel="noopener" href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sect-virtualization-tips_and_tricks-libvirt_managed_timers#sect-timer-element" target="_blank">let guest virtual machines read the host physical machine’s wall clock time</a> as well as its TSC. See <a rel="noopener" href="https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/" target="_blank">this post for a nice tutorial</a> about kvm_clock.</li>
<li>In order to know which clock source the kernel uses, <a rel="noopener" href="https://access.redhat.com/solutions/18627" target="_blank">look in /sys/devices/system/clocksource/clocksource0/current_clocksource</a>. Quite expectedly, it was kvm-clock (available sources were kvm-clock, tsc and acpi_pm).</li>
<li>It so turned out that systemd-timesyncd started running without my intervention when moving from a container to KVM.</li>
</ul>
<p>On a working KVM machine, timesyncd tells about its presence in the log:</p>
<pre>Jul 11 20:52:52 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.003s/+0ppm
Jul 11 21:27:00 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.001s/+0ppm
Jul 11 22:01:08 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.002s/0.007s/0.001s/+0ppm
Jul 11 22:35:17 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.001s/+0ppm
Jul 11 23:09:25 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.007s/0.007s/0.003s/+0ppm
Jul 11 23:43:33 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.003s/0.007s/0.005s/+0ppm (ignored)
Jul 12 00:17:41 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.006s/0.007s/0.005s/-1ppm
Jul 12 00:51:50 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.005s/+0ppm
Jul 12 01:25:58 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:00:06 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:34:14 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.005s/+0ppm
Jul 12 03:08:23 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.005s/+0ppm
Jul 12 03:42:31 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.004s/+0ppm
Jul 12 04:17:11 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.003s/+0ppm</pre>
<p>So a resync takes place every 2048 seconds (34 minutes and 8 seconds), like clockwork. As is apparent from the values, there&#8217;s no dispute about the time between Debian&#8217;s NTP server and the web host&#8217;s hypervisor.</p>
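<p>To follow these offsets over time (say, for plotting), the delta field can be pulled out of such log lines. A small sketch of mine, relying only on the line format shown above; note that lines marked &#8220;(ignored)&#8221; carry an extra trailing token and would need filtering first:</p>

```shell
# Extract the "delta" value (second slash-separated field of the last
# token) from a systemd-timesyncd status line of the format shown above.
delta_of() {
  printf '%s\n' "$1" | awk '{print $NF}' | cut -d/ -f2
}

delta_of 'Jul 11 20:52:52 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.003s/+0ppm'
# prints: +0.001s
```

<p>Feeding it from &#8220;journalctl -u systemd-timesyncd&#8221; (skipping the &#8220;(ignored)&#8221; lines) gives a column of offsets.</p>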
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>systemd dependencies and processing /etc/fstab</title>
		<link>https://billauer.se/blog/2023/12/systemd-dependencies-fstab/</link>
		<comments>https://billauer.se/blog/2023/12/systemd-dependencies-fstab/#comments</comments>
		<pubDate>Wed, 06 Dec 2023 10:05:42 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6979</guid>
		<description><![CDATA[The problem I wanted to solve On an embedded arm system running LUbuntu 16.04 LTS (systemd 229), I wanted to add a mount point to the /etc/fstab file, which originally was like this: # UNCONFIGURED FSTAB FOR BASE SYSTEM Empty, except for this message. Is that a bad omen or what? This is the time [...]]]></description>
			<content:encoded><![CDATA[<h3>The problem I wanted to solve</h3>
<p>On an embedded arm system running LUbuntu 16.04 LTS (systemd 229), I wanted to add a mount point to the /etc/fstab file, which originally was like this:</p>
<pre># UNCONFIGURED FSTAB FOR BASE SYSTEM</pre>
<p>Empty, except for this message. Is that a bad omen or what? This is the time to mention that the system was set up with debootstrap.</p>
<p>So I added this row to /etc/fstab:</p>
<pre>/dev/mmcblk0p1 /mnt/tfcard vfat defaults 0 0</pre>
<p>The result: Boot failed, and systemd asked me for the root password for the sake of getting a rescue shell (the overall systemd status was &#8220;maintenance&#8221;, cutely enough).</p>
<p>Journalctl was kind enough to offer some not-so-helpful hints:</p>
<pre>systemd[1]: dev-mmcblk0p1.device: Job dev-mmcblk0p1.device/start timed out.
systemd[1]: Timed out waiting for device dev-mmcblk0p1.device.
systemd[1]: Dependency failed for /mnt/tfcard.
systemd[1]: Dependency failed for Local File Systems.
systemd[1]: <span class="punch">local-fs.target: Job local-fs.target/start failed with result 'dependency'.</span>
systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
systemd[1]: <span class="punch">mnt-tfcard.mount: Job mnt-tfcard.mount/start failed with result 'dependency'.</span>
systemd[1]: dev-mmcblk0p1.device: Job dev-mmcblk0p1.device/start failed with result 'timeout'.</pre>
<p>Changing the &#8220;defaults&#8221; option to &#8220;nofail&#8221; got the system up and running, but without the required mount. That said, running &#8220;mount -a&#8221; did successfully mount the requested partition.</p>
<p>Spoiler: I didn&#8217;t really find a way to solve this, but I offer a fairly digestible workaround at the end of this post.</p>
<p>This is nevertheless an opportunity to discuss systemd dependencies and how systemd relates with /etc/fstab.</p>
<h3>The main takeaway: Use &#8220;nofail&#8221;</h3>
<p>This is only slightly related to the main topic, but nevertheless the most important conclusion: Open your /etc/fstab, and add &#8220;nofail&#8221; as an option to all mount points, except those that really are crucial for booting the system. That includes &#8220;/home&#8221; if it&#8217;s on a separate file system, as well as those huge file systems for storing a lot of large files. The rationale is simple: If any of these file systems are corrupt during boot, you really don&#8217;t want to be stuck on that basic textual screen (possibly with very small fonts), not having an idea what happened, and fsck asking you if you want to fix this and that. It&#8217;s much easier to figure out the problem when the failing mount points simply don&#8217;t mount, and take it from there. Not to mention that it&#8217;s much less stressful when the system is up and running. Even if not fully so.</p>
<p>Citing &#8220;man systemd.mount&#8221;: &#8220;With nofail, this mount will be only wanted, not required, by local-fs.target or remote-fs.target. This means that the boot will continue even if this mount point is not mounted successfully&#8221;.</p>
<p>To do this properly, make a tarball of /run/systemd/generator. Keep an eye on what&#8217;s in local-fs.target.requires/ vs. local-fs.target.wants/ in this directory. Only the absolutely necessary mounts should be in local-fs.target.requires/. In my system there&#8217;s only one symlink for &#8220;-.mount&#8221;, which is the root filesystem. That&#8217;s it.</p>
<p>Regardless, it&#8217;s a good idea that the root filesystem has the errors=remount-ro option, so if it&#8217;s mountable albeit with errors, the system still goes up.</p>
<p>This brief discussion will be clearer after reading through this post. Also see &#8220;nofail and dependencies&#8221; below for an elaboration on how systemd treats the &#8220;nofail&#8221; option.</p>
<h3>/etc/fstab and systemd</h3>
<p>At an early stage of the boot process, systemd-fstab-generator reads through /etc/fstab, and generates *.mount systemd units in /run/systemd/generator/. Inside this directory, it also generates and populates a local-fs.target.requires/ directory as well as local-fs.target.wants/, in order to reflect the necessity of these units for the boot process (more about dependencies below).</p>
<p>So for example, in response to this row in /etc/fstab,</p>
<pre>/dev/mmcblk0p1 /mnt/tfcard vfat nofail 0 0</pre>
<p>the automatically generated file can be found as /run/systemd/generator/mnt-tfcard.mount as follows:</p>
<pre># Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Mount]
What=/dev/mmcblk0p1
Where=/mnt/tfcard
Type=vfat
Options=nofail</pre>
<p>Note that the file name of a mount unit must match the mount point (as it does in this case). This is only relevant when writing mount units manually, of course.</p>
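<p>For the record, the translation from mount point to unit name can be queried with &#8220;systemd-escape -p --suffix=mount /mnt/tfcard&#8221;. The gist of the rule, as a sketch of mine (simplified: the real escaping also covers characters other than the dash):</p>

```shell
# Simplified sketch of systemd's path-to-unit-name escaping: strip the
# leading "/", escape literal dashes as \x2d, then turn the remaining
# "/" separators into dashes. (systemd-escape does the real thing.)
path_to_unit() {
  printf '%s' "$1" | sed -e 's|^/||' -e 's|-|\\x2d|g' -e 's|/|-|g'
}

echo "$(path_to_unit /mnt/tfcard).mount"
# prints: mnt-tfcard.mount
echo "$(path_to_unit /dev/disk/by-uuid).device"
# prints: dev-disk-by\x2duuid.device
```

<p>This is also where the escaped unit names quoted later in this post (e.g. dev-disk-by\x2duuid-&#8230;.device) come from.</p>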
<p>Which brings me to my first attempt to solve the problem: Namely, to copy the automatically generated file into /etc/systemd/system/, and enable it as if I wrote it myself. And then remove the row in /etc/fstab.</p>
<p>More precisely, I created this file as /etc/systemd/system/mnt-tfcard.mount:</p>
<pre>[Unit]
Description=Mount /dev/mmcblk0p1 as /mnt/tfcard

[Mount]
What=/dev/mmcblk0p1
Where=/mnt/tfcard
Type=vfat
Options=nofail

[Install]
WantedBy=multi-user.target</pre>
<p>Note that the WantedBy is necessary for enabling the unit. multi-user.target is a bit inaccurate for this purpose, but it failed for a different reason anyhow, so who cares. It should have been &#8220;WantedBy=local-fs.target&#8221;, I believe.</p>
<p>Anyhow, I then went:</p>
<pre># <strong>systemctl daemon-reload</strong>
# <strong>systemctl enable mnt-tfcard.mount</strong>
Created symlink from /etc/systemd/system/multi-user.target.wants/mnt-tfcard.mount to /etc/systemd/system/mnt-tfcard.mount.</pre>
<p>And then ran</p>
<pre># <strong>systemctl start mnt-tfcard.mount</strong></pre>
<p>but that just hung for a few seconds and then failed with an error message. There was no problem booting (thanks to &#8220;nofail&#8221;), but the mount wasn&#8217;t performed.</p>
<p>To investigate further, I <a rel="noopener" href="https://billauer.se/blog/2019/05/dbus-dump-systemd-debugging/" target="_blank">attached strace</a> to the systemd process (PID 1) while running the &#8220;systemctl start&#8221; command, and there was no fork. In other words, already at that point I had good reason to suspect that the problem wasn&#8217;t a failed mount attempt, but that systemd didn&#8217;t go as far as trying. Which isn&#8217;t so surprising, given that the original complaint was a dependency problem.</p>
<p>Also, the command</p>
<pre># <strong>systemctl start dev-mmcblk0p1.device</strong></pre>
<p>didn&#8217;t finish. Checking with &#8220;systemctl&#8221;, it just said:</p>
<pre>UNIT                 LOAD   ACTIVE     SUB       JOB   DESCRIPTION
dev-mmcblk0p1.device loaded inactive   dead      start dev-mmcblk0p1.device

<span class="yadayada">[ ... ]</span></pre>
<p>So what have we just learned? That it&#8217;s possible to create .mount units instead of populating /etc/fstab. And that the result is exactly the same. Why that would be useful, I don&#8217;t know.</p>
<p>See &#8220;man systemd-fstab-generator&#8221; about how the content of fstab is translated automatically to systemd units. Also see &#8220;man systemd-fsck@.service&#8221; regarding the service that runs fsck, and refer to &#8220;man systemd.mount&#8221; regarding mount units.</p>
<h3>A quick recap on systemd dependencies</h3>
<p>It&#8217;s important to make a distinction between dependencies that express requirements and dependencies that express the order of launching units (&#8220;temporal dependencies&#8221;).</p>
<p>I&#8217;ll start with the first sort: Those that express that if one unit is activated, other units need to be activated too (&#8220;pulled in&#8221; as it&#8217;s often referred to). Note that the requirement kind of dependencies <strong>don&#8217;t</strong> say anything about one unit waiting for another to complete its activation or anything of that sort. The whole point is to start a lot of units in parallel.</p>
<p>So first, we have Requires=. When it appears in a unit file, it means that the listed unit must be started if the current unit is started, and that if that listed unit fails, the current unit will fail as well.</p>
<p>In the opposite direction, there&#8217;s RequiredBy=. It&#8217;s exactly like putting a Requires= directive in the unit that is listed.</p>
<p>Then we have another famous couple, Wants= and its opposite, WantedBy=. These are the merciful counterparts of Requires= and RequiredBy=. The difference: If the listed unit fails, the other unit may activate successfully nevertheless. The failure is marked by &#8220;systemctl status&#8221; declaring the system as &#8220;degraded&#8221;.</p>
<p>So the fact that basically every service unit ends with &#8220;WantedBy=multi-user.target&#8221; means that the unit file says &#8220;if multi-user.target is a requested target (i.e. practically always), you need to activate me&#8221;, but at the same time it says &#8220;if I fail, don&#8217;t fail the entire boot process&#8221;. In other words, had RequiredBy= been used all over the place instead, it would have worked just the same, as long as all units started successfully. But if one of them didn&#8217;t, one would get a textual rescue shell instead of an almost fully functional (&#8220;degraded&#8221;) system.</p>
<p>There are a whole lot of other directives for defining dependencies. In particular, there are also Requisite=, ConsistsOf=, BindsTo=, PartOf=, Conflicts=, plus tons of possible directives for fine-tuning the behavior of the unit, including conditions for starting and whatnot. See &#8220;man systemd.unit&#8221;. There&#8217;s also a nice summary table on that man page.</p>
<p>It&#8217;s important to note that even if unit X requires unit Y by virtue of the directives mentioned above, systemd may (and often will) activate them at the same time. That&#8217;s true for both the &#8220;require&#8221; and &#8220;want&#8221; kinds of directives. In order to control the order of activation, we have After= and Before=. These two also ensure that the stopping of units occurs in the reverse order.</p>
<p>After= and Before= only define <strong>when</strong> the units are started, and are often used in conjunction with the directives that define requirements. But by themselves, After= and Before= don&#8217;t define a relation of requirement between units.</p>
<p>It&#8217;s therefore common to use Requires= and After= on the same listed unit, meaning that the listed unit must complete successfully before activating the unit for which these directives are given. Same with RequiredBy= and Before=, meaning that this unit must complete before the listed unit can start.</p>
<p>Once again, there are more commands, and more to say on those I just mentioned. See &#8220;man systemd.unit&#8221; for detailed info on this. Really, do. It&#8217;s a good man page. There&#8217;s a lot of &#8220;but if&#8221; to be aware of.</p>
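<p>To make the Requires=/After= pairing concrete, here&#8217;s a contrived service unit (all names in it are hypothetical, for illustration only): the database must have finished starting before the web application is launched, and if the database fails to start, so does the web application:</p>
<pre>[Unit]
Description=Hypothetical web application
Requires=mydb.service
After=mydb.service

[Service]
ExecStart=/usr/local/bin/mywebapp

[Install]
WantedBy=multi-user.target</pre>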
<h3>&#8220;nofail&#8221; and dependencies</h3>
<p>One somewhat scary side-effect of adding &#8220;nofail&#8221; is that the line saying &#8220;Before=local-fs.target&#8221; doesn&#8217;t appear in the *.mount unit files that are automatically generated. Does this mean that services might be started before these mounts have taken place?</p>
<p>I can&#8217;t say I&#8217;m 100% sure about this, but it would be really poor design if it were that way, as it would have added a very unexpected side-effect to &#8220;nofail&#8221;. Looking at the system log of a boot with almost all mounts marked &#8220;nofail&#8221;, it&#8217;s evident that they were mounted <strong>after</strong> &#8220;Reached target Local File Systems&#8221; was announced, but before &#8220;Reached target System Initialization&#8221;. Looking at a system log from before these &#8220;nofail&#8221; options were added, all mounts were made before &#8220;Reached target Local File Systems&#8221;.</p>
<p>So what happens here? It&#8217;s worth mentioning that local-fs.target only says After=local-fs-pre.target as far as temporal dependencies are concerned. That can explain why &#8220;Reached target Local File Systems&#8221; is announced before all mounts have completed.</p>
<p>But then, local-fs.target is listed in sysinit.target&#8217;s After= as well as Wants= directives. I haven&#8217;t found any clear-cut definition of whether &#8220;After=&#8221; waits until all &#8220;wanted&#8221; units have fully started (or officially failed). The term that &#8220;man systemd.unit&#8221; uses in relation to waiting is &#8220;finished started up&#8221;, but what does that mean exactly? This manpage also says &#8220;Most importantly, for service units start-up is considered completed for the purpose of Before=/After= when all its configured start-up commands have been invoked and they either failed or reported start-up success&#8221;. But what about &#8220;want&#8221; relations?</p>
<p>So I guess systemd interprets the After= directive on local-fs.target in sysinit.target as &#8220;wait for local-fs.target&#8217;s temporal dependencies as well, even if local-fs.target itself doesn&#8217;t wait for them&#8221;.</p>
<h3>Querying dependencies</h3>
<p>So how can we know which unit depends on which? In order to get the system&#8217;s tree of dependencies, go</p>
<pre>$ systemctl list-dependencies</pre>
<p>This recursively shows the dependency tree that is created by Requires=, RequiredBy=, Wants=, WantedBy= and similar relations. Usually, only target units are followed recursively. In order to display the entire tree, add the --all flag.</p>
<p>The units that are listed are those that are required (or &#8220;wanted&#8221;) in order to reach the target that is being queried (&#8220;default.target&#8221; if no target specified).</p>
<p>As can be seen from the output, the filesystem mounts are made in order to reach the local-fs.target target. To get the tree of only this target:</p>
<pre>$ systemctl list-dependencies local-fs.target</pre>
<p>The output of this command lists the units that are activated for the sake of reaching local-fs.target. &#8220;-.mount&#8221; is the root mount, by the way.</p>
<p>It&#8217;s also possible to obtain the reverse dependencies with the --reverse flag. In other words, in order to see which targets rely on local-fs.target, go</p>
<pre>$ systemctl list-dependencies --reverse local-fs.target</pre>
<p>list-dependencies doesn&#8217;t make a distinction between &#8220;required&#8221; or &#8220;wanted&#8221;, and neither does it show the nuances of the various other possibilities for creating dependencies. For this, use &#8220;systemctl show&#8221;, e.g.</p>
<pre>$ systemctl show local-fs.target</pre>
<p>This lists all directives, explicit and implicit, that have been made on this unit. A lot of information to go through, but that&#8217;s the full picture.</p>
<p>If the --after flag is added, the tree of units with an After= directive (and other implicit temporal dependencies, e.g. mirrored Before= directives) is printed out. Recall that this only tells us something about the order of execution, and nothing about which unit requires which. For example:</p>
<pre>$ systemctl list-dependencies --after local-fs.target</pre>
<p>Likewise, there&#8217;s --before, which does the opposite.</p>
<p>Note that the names of these flags are confusing: --after tells us about the units that are started <strong>before</strong> local-fs.target, and --before tells us about the units started <strong>after</strong> local-fs.target. The rationale: The spirit of list-dependencies (without --reverse) is that it tells us what is needed to reach the target. Hence following the After= directives tells us what had to come before. And vice versa.</p>
<h3>Red herring #1</h3>
<p>After this long general discussion about dependencies, let&#8217;s go back to the original problem: The mnt-tfcard.mount unit failed with status &#8220;dependency&#8221; and dev-mmcblk0p1.device had the status inactive / dead.</p>
<p>Could this be a dependency thing?</p>
<pre># <strong>systemctl list-dependencies dev-mmcblk0p1.device</strong>
dev-mmcblk0p1.device
<span style="color: #e03e2d;">●</span> └─mnt-tfcard.mount
# <strong>systemctl list-dependencies mnt-tfcard.mount</strong>
mnt-tfcard.mount
<span style="color: #e03e2d;">●</span> ├─dev-mmcblk0p1.device
<span style="color: #169179;">●</span> └─system.slice</pre>
<p>Say what? dev-mmcblk0p1.device depends on mnt-tfcard.mount? And even more intriguing, mnt-tfcard.mount depends on dev-mmcblk0p1.device, so it&#8217;s a circular dependency! This must be the problem! (not)</p>
<p>This was the result regardless of whether I used /etc/fstab or my own mount unit file instead. I also tried adding DefaultDependencies=no on mnt-tfcard.mount&#8217;s unit file, but that made no difference.</p>
<p>Neither did this dependency go away when the last parameter in /etc/fstab was changed from 0 to 2, indicating that fsck should be run on the block device prior to mounting. What did change was mnt-tfcard.mount&#8217;s dependencies, but that wasn&#8217;t the problem:</p>
<pre># <strong>systemctl list-dependencies mnt-tfcard.mount</strong>
mnt-tfcard.mount
<span style="color: #e03e2d;">●</span> ├─dev-mmcblk0p1.device
<span style="color: #169179;">●</span> ├─system.slice
<span style="color: #e03e2d;">●</span> └─systemd-fsck@dev-mmcblk0p1.service
# <strong>systemctl list-dependencies systemd-fsck@dev-mmcblk0p1.service</strong>
systemd-fsck@dev-mmcblk0p1.service
<span style="color: #e03e2d;">●</span> ├─dev-mmcblk0p1.device
<span style="color: #169179;">●</span> ├─system-systemd\x2dfsck.slice
<span style="color: #169179;">●</span> └─systemd-fsckd.socket</pre>
<p>But the thing is that the cyclic dependency isn&#8217;t an error. These are the same queries on the /boot mount (/dev/sda1, ext4) of another computer:</p>
<pre>$ <strong>systemctl list-dependencies boot.mount</strong>
boot.mount
<span style="color: #169179;">●</span> ├─-.mount
<span style="color: #169179;">●</span> ├─dev-disk-by\x2duuid-063f1689\x2d3729\x2d425e\x2d9319\x2dc815ccd8ecaf.device
<span style="color: #169179;">●</span> ├─system.slice
<span style="color: #169179;">●</span> └─systemd-fsck@dev-disk-by\x2duuid-063f1689\x2d3729\x2d425e\x2d9319\x2dc815ccd8ecaf.service</pre>
<p>And the back dependency:</p>
<pre>$ <strong>systemctl list-dependencies 'dev-disk-by\x2duuid-063f1689\x2d3729\x2d425e\x2d9319\x2dc815ccd8ecaf.device'</strong>
dev-disk-by\x2duuid-063f1689\x2d3729\x2d425e\x2d9319\x2dc815ccd8ecaf.device
<span style="color: #169179;">●</span> └─boot.mount</pre>
<p>The mount unit depends on the device unit and vice versa. Same thing, but this time it works.</p>
<p>Why does the device unit depend on the mount unit, one may ask. Not clear.</p>
<h3>Red herring #2</h3>
<p>At some point, I thought that the reason for the problem was that the filesystem&#8217;s &#8220;dirty bit&#8221; was set:</p>
<pre># <strong>fsck /dev/mmcblk0p1 </strong>
fsck from util-linux 2.27.1
fsck.fat 3.0.28 (2015-05-16)
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
1) Remove dirty bit
2) No action</pre>
<p>By the way, using &#8220;fsck.vfat&#8221; instead of just &#8220;fsck&#8221; behaved exactly the same. The former is what the systemd-fsck service uses, according to its man page.</p>
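<p>The flag in question is a single byte in the filesystem&#8217;s boot sector, at the offset 0x25 that fsck.fat mentions (this holds for FAT12/16; FAT32 keeps its flag elsewhere). Just to illustrate the idea, this is how such a byte can be peeked at with od, using a fabricated stand-in file rather than the real partition:</p>
<pre># Fabricate a 64-byte stand-in for a boot sector, with the flag byte set
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1 count=64 2>/dev/null
printf '\001' | dd of="$img" bs=1 seek=$((0x25)) conv=notrunc 2>/dev/null
# Read back a single byte at offset 0x25, as hex
dirty=$(od -An -tx1 -j $((0x25)) -N 1 "$img" | tr -d ' ')
echo "$dirty"
rm -f "$img"</pre>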
<p>But this wasn&#8217;t the problem. Even after removing the dirty bit, and with fsck reporting success, dev-mmcblk0p1.device would not start.</p>
<h3>The really stinking fish</h3>
<p>What is really fishy on this system is this line in the output of &#8220;systemctl --all&#8221;:</p>
<pre>dev-mmcblk0p2.device      loaded <span class="punch">activating tentative</span> /dev/mmcblk0p2</pre>
<p>/dev/mmcblk0p2 is the partition that is mounted as root! Its status should be &#8220;loaded active plugged&#8221;, which is what all the other device units show. This one is the only exception.</p>
<p>Lennart Poettering (i.e. the horse&#8217;s mouth) <a rel="noopener" href="https://github.com/systemd/systemd/issues/5781" target="_blank">mentions</a> that &#8220;If the device stays around in &#8220;tentative&#8221; state, then this indicates that a device appears in /proc/self/mountinfo with some name, and systemd can&#8217;t find a matching device in /sys for it, probably because for some reason it has a different name&#8221;.</p>
<p>And indeed, this is the relevant row in /proc/self/mountinfo:</p>
<pre>16 0 179:2 / / rw,relatime shared:1 - ext4 <span class="punch">/dev/root</span> rw,data=ordered</pre>
<p>So the root partition appears as /dev/root. This device file doesn&#8217;t even exist. The real one is /dev/mmcblk0p2, and it&#8217;s mentioned in the kernel&#8217;s command line. On this platform, Linux boots without any initrd image. The kernel mounts root by itself, and takes it from there.</p>
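<p>For reference, the device name sits right after the &#8220;-&#8221; separator and the filesystem type in each mountinfo row. A quick sketch of pulling it out with sed, shown here on the row above (hardcoded for the sake of the example):</p>
<pre>line='16 0 179:2 / / rw,relatime shared:1 - ext4 /dev/root rw,data=ordered'
# The fields after the " - " separator are fstype, source and mount options;
# pick the second of these three.
src=$(echo "$line" | sed 's/.* - [^ ]* \([^ ]*\) .*/\1/')
echo "$src"</pre>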
<p>As Lennart <a rel="noopener" href="https://systemd-devel.freedesktop.narkive.com/4zVGwj7X/dev-root-device-is-not-active-results-in-an-umount-spree" target="_blank">points out in a different place</a>, /dev/root is a special creature that is made up in relation to mounting a root filesystem by the kernel.</p>
<p>At this point, I realized that I was up against a quirk that has probably been fixed silently during the six years since the relevant distribution was released. In other words, there was no point in wasting time looking for the root cause. So this calls for&#8230;</p>
<h3>The workaround</h3>
<p>With systemd around, writing a service is a piece of cake. So I wrote a trivial service that runs mount when it&#8217;s started and umount when it&#8217;s stopped. Not a masterpiece of software engineering, but it gets the job done without polluting the system too much.</p>
<p>The service file, /etc/systemd/system/mount-card.service is as follows:</p>
<pre>[Unit]
Description=TF card automatic mount

[Service]
ExecStart=/bin/mount /dev/mmcblk0p1 /mnt/tfcard/
ExecStop=/bin/umount /mnt/tfcard/
Type=simple
RemainAfterExit=yes

[Install]
WantedBy=local-fs.target</pre>
<p>Activating the service:</p>
<pre># <strong>systemctl daemon-reload</strong>
# <strong>systemctl enable mount-card</strong>
Created symlink from /etc/systemd/system/local-fs.target.wants/mount-card.service to /etc/systemd/system/mount-card.service.</pre>
<p>And reboot.</p>
<p>The purpose of &#8220;RemainAfterExit=yes&#8221; is to make systemd consider the service active even after the command exits. Without this line, the ExecStop command runs immediately after the ExecStart command, so the device is unmounted right after it has been mounted. Setting Type to oneshot doesn&#8217;t solve this issue, by the way. The only difference between &#8220;oneshot&#8221; and &#8220;simple&#8221; is that oneshot delays the execution of other units until its command has exited.</p>
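<p>For the record, this is what the [Service] section&#8217;s oneshot variant would look like; only the Type line differs, and RemainAfterExit is needed just the same:</p>
<pre>[Service]
ExecStart=/bin/mount /dev/mmcblk0p1 /mnt/tfcard/
ExecStop=/bin/umount /mnt/tfcard/
Type=oneshot
RemainAfterExit=yes</pre>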
<p>There is no need to add an explicit dependency on the existence of the root filesystem, because all services have an implicit Requires= and After= dependency on sysinit.target, which in turn depends on local-fs.target. This also ensures that the service is stopped before the attempt to remount root read-only during shutdown.</p>
<p>As for choosing local-fs.target for WantedBy, it&#8217;s somewhat questionable, because no service can run until sysinit.target has been reached, and the latter depends on local-fs.target, as just mentioned. More precisely (citing &#8220;man systemd.service&#8221;), &#8220;service units will have dependencies of type Requires= and After= on sysinit.target, a dependency of type After= on basic.target&#8221;. So it is guaranteed anyhow that the service starts after basic.target has been reached and stops before that target is shut down.</p>
<p>That said, setting WantedBy this way expresses the purpose of this service more accurately. Either way, the result is that the unit is marked for activation on every boot.</p>
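<p>Just to illustrate, if the implicit default dependencies that the man page cites were written out explicitly, they would amount to something like this in the unit file (there&#8217;s no reason to actually add these lines):</p>
<pre>[Unit]
Requires=sysinit.target
After=sysinit.target basic.target</pre>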
<p>I don&#8217;t know if this is the place to write &#8220;all&#8217;s well that ends well&#8221;. I usually try to avoid workarounds, but in this case it was really not worth the effort to insist on a clean solution. I&#8217;m not sure one is actually possible.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2023/12/systemd-dependencies-fstab/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>Linux Mint + Cinnamon: Volume buttons suddenly stop working</title>
		<link>https://billauer.se/blog/2023/05/cinnamon-volume-buttons-dbus/</link>
		<comments>https://billauer.se/blog/2023/05/cinnamon-volume-buttons-dbus/#comments</comments>
		<pubDate>Wed, 31 May 2023 09:58:16 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[cinnamon]]></category>
		<category><![CDATA[Linux sound]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6875</guid>
		<description><![CDATA[The nuisance After trying to fix another problem with Pulseaudio, and messing a bit with its configuration files, the option to control the volume with the keyboard buttons suddenly stopped working. The cursor would blink briefly when pressing the button, but nothing else happened. I tried all the fairly normal things, including [...]]]></description>
			<content:encoded><![CDATA[<h3>The nuisance</h3>
<p>After trying to fix another problem with Pulseaudio, and messing a bit with its configuration files, the option to control the volume with the keyboard buttons suddenly stopped working. The cursor would blink briefly when a button was pressed, but nothing else happened.</p>
<p>I tried all the fairly normal things, including restarting the Pulseaudio daemon, restarting Cinnamon (with <a title="When mplayer plays a black window (or: Cinnamon leaking GPU memory)" href="https://billauer.se/blog/2019/04/cinnamon-gpu-memory-leak-restart/" target="_blank">Alt-F2 + &#8220;r&#8221;</a>), restarting the session by switching to the same user, disconnecting and reconnecting the (USB) keyboard, and a whole lot of other things.</p>
<p>It took me quite some time to fix this, so here&#8217;s what worked for me this time, along with some debug info.</p>
<p>The machine ran Linux Mint 19 with Cinnamon.</p>
<p>At an earlier stage, even the sound applet was gone from the desktop panel. To fix that, I right-clicked the panel, chose Troubleshoot &gt; Looking Glass, and selected the Extensions tab. An applet called &#8220;Sound&#8221; is listed there, so I right-clicked it and chose &#8220;Reload Code&#8221;. That got the applet back to the panel, and it worked flawlessly. But I was still stuck with the volume button problem.</p>
<h3>The fix</h3>
<p>What I actually did:</p>
<ul>
<li>Kill the process named csd-keyboard (probably irrelevant)</li>
<li>Kill the process named csd-media-keys</li>
<li>Restart pulseaudio with &#8220;pulseaudio -k&#8221;</li>
</ul>
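<p>In command form (pkill exits with a nonzero status when no process matches, hence the || true, so the sequence runs through either way):</p>
<pre># Kill the csd helpers by name; they are restarted automatically
pkill csd-keyboard || true
pkill csd-media-keys || true
# Ask the per-user Pulseaudio daemon to terminate
pulseaudio -k || true</pre>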
<p>This was the output of journalctl as a result of these:</p>
<pre>May 31 12:17:03 cinnamon-session[2115]: WARNING: t+1132491.55749s: Application 'cinnamon-settings-daemon-keyboard.desktop' killed by signal 15
May 31 12:17:17 cinnamon-session[2115]: WARNING: t+1132505.78238s: Application 'cinnamon-settings-daemon-media-keys.desktop' killed by signal 15
May 31 12:17:17 rtkit-daemon[2284]: Successfully made thread 4304 of process 4304 (n/a) owned by '1010' high priority at nice level -11.
May 31 12:17:17 rtkit-daemon[2284]: Supervising 8 threads of 4 processes of 1 users.
<span class="punch">May 31 12:17:17 pulseaudio[4304]: [pulseaudio] pid.c: Daemon already running.
</span>May 31 12:18:04 bluetoothd[12009]: Endpoint unregistered: sender=:1.120210 path=/MediaEndpoint/A2DPSource
May 31 12:18:04 bluetoothd[12009]: Endpoint unregistered: sender=:1.120210 path=/MediaEndpoint/A2DPSink
May 31 12:18:04 rtkit-daemon[2284]: Successfully made thread 4345 of process 4345 (n/a) owned by '1010' high priority at nice level -11.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 4 threads of 4 processes of 1 users.
May 31 12:18:04 pulseaudio[4345]: [pulseaudio] sink.c: Default and alternate sample rates are the same.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 3 threads of 3 processes of 1 users.
May 31 12:18:04 rtkit-daemon[2284]: Successfully made thread 4347 of process 4345 (n/a) owned by '1010' RT at priority 5.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 4 threads of 3 processes of 1 users.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 4 threads of 3 processes of 1 users.
May 31 12:18:04 rtkit-daemon[2284]: Successfully made thread 4348 of process 4345 (n/a) owned by '1010' RT at priority 5.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 5 threads of 3 processes of 1 users.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 5 threads of 3 processes of 1 users.
May 31 12:18:04 rtkit-daemon[2284]: Successfully made thread 4349 of process 4345 (n/a) owned by '1010' RT at priority 5.
May 31 12:18:04 rtkit-daemon[2284]: Supervising 6 threads of 3 processes of 1 users.
May 31 12:18:05 rtkit-daemon[2284]: Supervising 6 threads of 3 processes of 1 users.
May 31 12:18:05 rtkit-daemon[2284]: Successfully made thread 4350 of process 4345 (n/a) owned by '1010' RT at priority 5.
May 31 12:18:05 rtkit-daemon[2284]: Supervising 7 threads of 3 processes of 1 users.
May 31 12:18:05 bluetoothd[12009]: Endpoint registered: sender=:1.120482 path=/MediaEndpoint/A2DPSource
May 31 12:18:05 bluetoothd[12009]: Endpoint registered: sender=:1.120482 path=/MediaEndpoint/A2DPSink
May 31 12:18:05 pulseaudio[4345]: [pulseaudio] backend-ofono.c: Failed to register as a handsfree audio agent with ofono: org.freedesktop.DBus.Error.ServiceUnknown: The name org.ofono was not provided by any .service files
May 31 12:18:05 rtkit-daemon[2284]: Successfully made thread 4352 of process 4352 (n/a) owned by '1010' high priority at nice level -11.
May 31 12:18:05 rtkit-daemon[2284]: Supervising 8 threads of 4 processes of 1 users.
May 31 12:18:05 pulseaudio[4352]: [pulseaudio] pid.c: Daemon already running.
May 31 12:18:58 gnome-keyring-daemon[2204]: asked to register item /org/freedesktop/secrets/collection/login/9, but it's already registered</pre>
<p>I suppose that killing csd-media-keys was the part that really did the trick, because that caused a request to start Pulseaudio.</p>
<h3>Why it worked (presumably)</h3>
<p>The cinnamon-settings-daemon is the component responsible for passing the keyboard request on to Pulseaudio. Apparently, messing with Pulseaudio&#8217;s configuration caused one of cinnamon-settings-daemon&#8217;s components, csd-media-keys, to misbehave as well. Killing it made cinnamon-settings-daemon restart it automatically, from fresh.</p>
<p>It&#8217;s fine to kill the csd-* processes. Nothing dramatic happens; the killed process is simply restarted. But I haven&#8217;t tried killing cinnamon-settings-daemon itself. I believe that would restart Cinnamon completely, something I wanted to avoid.</p>
<h3>Dbus activity</h3>
<p>I&#8217;ve already <a title="systemd / DBus debugging starter pack" href="https://billauer.se/blog/2019/05/dbus-dump-systemd-debugging/" target="_blank">written a post</a> mentioning how to monitor the Dbus. So here&#8217;s a different take on the same topic.</p>
<p>The difference is quite evident when running dbus-monitor (not as root) as follows, in order to monitor the activity on the current session&#8217;s bus:</p>
<pre>$ dbus-monitor</pre>
<p>This is the output resulting from pressing the volume-up button, when it works correctly (i.e. after fixing the problem):</p>
<pre>method call time=1685525392.173838 sender=:<span class="punch">1.6531</span> -&gt; destination=:<span class="punch">1.6616</span> serial=1431 path=/org/cinnamon/SettingsDaemon/KeybindingHandler; interface=org.cinnamon.SettingsDaemon.KeybindingHandler; member=HandleKeybinding
   uint32 2
method call time=1685525392.174967 sender=:1.6616 -&gt; destination=org.freedesktop.DBus serial=566 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',interface='ca.desrt.dconf.Writer',path='/ca/desrt/dconf/Writer/user',arg0path='/org/cinnamon/desktop/sound/'"
method return time=1685525392.175012 sender=org.freedesktop.DBus -&gt; destination=:1.6616 serial=382 reply_serial=566
method call time=1685525392.175026 sender=:1.6616 -&gt; destination=:1.6531 serial=567 path=/org/Cinnamon; interface=org.Cinnamon; member=ShowOSD
   array [
      dict entry(
         string "icon"
         variant             string "audio-volume-medium-symbolic"
      )
      dict entry(
         string "level"
         variant             int32 34
      )
   ]
method call time=1685525392.175170 sender=:1.6616 -&gt; destination=org.freedesktop.DBus serial=568 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=RemoveMatch
   string "type='signal',interface='ca.desrt.dconf.Writer',path='/ca/desrt/dconf/Writer/user',arg0path='/org/cinnamon/desktop/sound/'"
method return time=1685525392.175189 sender=org.freedesktop.DBus -&gt; destination=:1.6616 serial=383 reply_serial=568
method call time=1685525392.175198 sender=:1.6616 -&gt; destination=org.freedesktop.DBus serial=569 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',interface='ca.desrt.dconf.Writer',path='/ca/desrt/dconf/Writer/user',arg0path='/org/cinnamon/desktop/sound/'"
method return time=1685525392.175216 sender=org.freedesktop.DBus -&gt; destination=:1.6616 serial=384 reply_serial=569
method call time=1685525392.176330 sender=:<span class="punch">1.6618</span> -&gt; destination=org.freedesktop.DBus serial=26 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=RequestName
   string "org.freedesktop.ReserveDevice1.Audio2"
   uint32 5
signal time=1685525392.176383 sender=org.freedesktop.DBus -&gt; destination=(null destination) serial=50 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string "org.freedesktop.ReserveDevice1.Audio2"
   string ""
   string ":1.6618"
signal time=1685525392.176411 sender=org.freedesktop.DBus -&gt; destination=:1.6618 serial=51 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
   string "org.freedesktop.ReserveDevice1.Audio2"
method return time=1685525392.176427 sender=org.freedesktop.DBus -&gt; destination=:1.6618 serial=52 reply_serial=26
   uint32 1
method return time=1685525392.184853 sender=:1.6616 -&gt; destination=:1.6531 serial=570 reply_serial=1431
method call time=1685525392.184915 sender=:1.6616 -&gt; destination=org.freedesktop.DBus serial=571 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=RemoveMatch
   string "type='signal',interface='ca.desrt.dconf.Writer',path='/ca/desrt/dconf/Writer/user',arg0path='/org/cinnamon/desktop/sound/'"
method return time=1685525392.184937 sender=org.freedesktop.DBus -&gt; destination=:1.6616 serial=385 reply_serial=571
method return time=1685525392.190757 sender=:1.6531 -&gt; destination=:1.6616 serial=1432 reply_serial=567</pre>
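<p>By the way, dbus-monitor accepts match rules as arguments, which helps keeping the noise down (for example, interface='org.cinnamon.SettingsDaemon.KeybindingHandler'). Alternatively, the interesting parts can be fished out of a saved capture; as a sketch, extracting the member= token from the first row above:</p>
<pre>line="method call time=1685525392.173838 sender=:1.6531 -> destination=:1.6616 serial=1431 path=/org/cinnamon/SettingsDaemon/KeybindingHandler; interface=org.cinnamon.SettingsDaemon.KeybindingHandler; member=HandleKeybinding"
# Pull out the member= token (everything up to the next semicolon or end of line)
member=$(echo "$line" | grep -o 'member=[^;]*')
echo "$member"</pre>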
<p>Unfortunately, I didn&#8217;t save the output from before the fix, but it consisted of exactly two rows. As far as I recall, those were the first and last rows of this dump (i.e. the row with KeybindingHandler&#8217;s request, and the method return that closed it).</p>
<p>The bus addresses can be listed with the following command:</p>
<pre>$ busctl <span style="color: #ff0000;"><strong>--user</strong></span></pre>
<p>In this specific session the output was:</p>
<pre>NAME                                                       PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.0                                                      2103 systemd         eli              :1.0          user@1010.service         -          -
:1.1014                                                  45251 pxgsettings     eli              :1.1014       session-c1.scope          c1         -
:1.1015                                                  45277 scp-dbus-servic eli              :1.1015       user@1010.service         -          -
:1.1016                                                  45277 scp-dbus-servic eli              :1.1016       user@1010.service         -          -
:1.12                                                     2213 csd-a11y-keyboa eli              :1.12         session-c1.scope          c1         -
:1.1234                                                  53331 nemo            eli              :1.1234       session-c1.scope          c1         -
:1.13                                                     2221 csd-a11y-settin eli              :1.13         session-c1.scope          c1         -
:1.131                                                   16564 klauncher       eli              :1.131        session-c1.scope          c1         -
:1.16                                                     2232 csd-cursor      eli              :1.16         session-c1.scope          c1         -
:1.17                                                     2225 csd-color       eli              :1.17         session-c1.scope          c1         -
:1.18                                                     2238 csd-clipboard   eli              :1.18         session-c1.scope          c1         -
:1.19                                                     2242 csd-orientation eli              :1.19         session-c1.scope          c1         -
:1.20                                                     2233 csd-wacom       eli              :1.20         session-c1.scope          c1         -
:1.21                                                     2245 csd-xsettings   eli              :1.21         session-c1.scope          c1         -
:1.22                                                     2248 csd-mouse       eli              :1.22         session-c1.scope          c1         -
:1.24                                                     2256 csd-print-notif eli              :1.24         session-c1.scope          c1         -
:1.26                                                     2258 csd-background  eli              :1.26         session-c1.scope          c1         -
:1.27                                                     2260 gvfsd           eli              :1.27         user@1010.service         -          -
:1.28                                                     2267 csd-housekeepin eli              :1.28         session-c1.scope          c1         -
:1.29                                                     2274 csd-screensaver eli              :1.29         session-c1.scope          c1         -
:1.30                                                     2270 csd-power       eli              :1.30         session-c1.scope          c1         -
:1.31                                                     2299 dconf-service   eli              :1.31         user@1010.service         -          -
:1.32                                                     2285 csd-automount   eli              :1.32         session-c1.scope          c1         -
:1.329                                                   22590 xreader         eli              :1.329        session-c1.scope          c1         -
:1.33                                                     2306 gvfsd-fuse      eli              :1.33         user@1010.service         -          -
:1.330                                                   22596 xreaderd        eli              :1.330        user@1010.service         -          -
:1.332                                                   22606 WebKitNetworkPr eli              :1.332        session-c1.scope          c1         -
:1.35                                                     2313 csd-printer     eli              :1.35         session-c1.scope          c1         -
:1.37                                                     2349 gvfs-udisks2-vo eli              :1.37         user@1010.service         -          -
:1.38                                                     2384 gvfs-gphoto2-vo eli              :1.38         user@1010.service         -          -
:1.39                                                     2277 csd-xrandr      eli              :1.39         session-c1.scope          c1         -
:1.40                                                     2410 gvfs-goa-volume eli              :1.40         user@1010.service         -          -
:1.41                                                     2414 goa-daemon      eli              :1.41         user@1010.service         -          -
:1.42                                                     2427 goa-identity-se eli              :1.42         user@1010.service         -          -
:1.43                                                     2432 gvfs-mtp-volume eli              :1.43         user@1010.service         -          -
:1.44                                                     2436 gvfs-afc-volume eli              :1.44         user@1010.service         -          -
:1.448                                                   31116 gvfsd-http      eli              :1.448        user@1010.service         -          -
:1.457                                                   31367 gvfsd-network   eli              :1.457        user@1010.service         -          -
:1.46                                                     2472 polkit-gnome-au eli              :1.46         session-c1.scope          c1         -
:1.463                                                   31401 gvfsd-dnssd     eli              :1.463        user@1010.service         -          -
:1.47                                                     2481 cinnamon-killer eli              :1.47         session-c1.scope          c1         -
:1.48                                                     2478 nemo-desktop    eli              :1.48         session-c1.scope          c1         -
:1.49                                                     2480 nm-applet       eli              :1.49         session-c1.scope          c1         -
:1.5                                                      2115 cinnamon-sessio eli              :1.5          session-c1.scope          c1         -
:1.50                                                     2474 blueberry-obex- eli              :1.50         session-c1.scope          c1         -
:1.51                                                     2474 blueberry-obex- eli              :1.51         session-c1.scope          c1         -
:1.52                                                     2499 obexd           eli              :1.52         user@1010.service         -          -
:1.54                                                     2506 gvfsd-trash     eli              :1.54         user@1010.service         -          -
:1.55                                                     2512 gvfsd-metadata  eli              :1.55         user@1010.service         -          -
:1.56                                                     2535 python2         eli              :1.56         session-c1.scope          c1         -
:1.5653                                                  62813 chrome          eli              :1.5653       session-c1.scope          c1         -
:1.5654                                                  62813 chrome          eli              :1.5654       session-c1.scope          c1         -
:1.5655                                                  62813 chrome          eli              :1.5655       session-c1.scope          c1         -
:1.5656                                                  62813 chrome          eli              :1.5656       session-c1.scope          c1         -
:1.5661                                                  62813 chrome          eli              :1.5661       session-c1.scope          c1         -
:1.57                                                     2534 cinnamon-screen eli              :1.57         session-c1.scope          c1         -
:1.58                                                     2574 gnome-terminal  eli              :1.58         session-c1.scope          c1         -
:1.5834                                                  52878 xreader         eli              :1.5834       session-c1.scope          c1         -
:1.5836                                                  52891 WebKitNetworkPr eli              :1.5836       session-c1.scope          c1         -
:1.5842                                                  54509 xreader         eli              :1.5842       session-c1.scope          c1         -
:1.5843                                                  54522 WebKitNetworkPr eli              :1.5843       session-c1.scope          c1         -
:1.59                                                     2574 gnome-terminal  eli              :1.59         session-c1.scope          c1         -
:1.6                                                      2115 cinnamon-sessio eli              :1.6          session-c1.scope          c1         -
:1.60                                                     2579 gconfd-2        eli              :1.60         user@1010.service         -          -
:1.61                                                     2574 gnome-terminal  eli              :1.61         session-c1.scope          c1         -
:1.6105                                                  47745 xreader         eli              :1.6105       session-c1.scope          c1         -
:1.6107                                                  47757 WebKitNetworkPr eli              :1.6107       session-c1.scope          c1         -
:1.6477                                                  31538 thunderbird-bin eli              :1.6477       session-c1.scope          c1         -
:1.6478                                                  31538 thunderbird-bin eli              :1.6478       session-c1.scope          c1         -
<span class="punch">:1.6531                                                   2451 cinnamon        eli              :1.6531       session-c1.scope          c1         -
</span>:1.6562                                                  69100 csd-sound       eli              :1.6562       session-c1.scope          c1         -
:1.6592                                                   1641 firefox-bin     eli              :1.6592       session-c1.scope          c1         -
:1.6593                                                   1641 firefox-bin     eli              :1.6593       session-c1.scope          c1         -
:1.6594                                                   1700 Web Content     eli              :1.6594       session-c1.scope          c1         -
:1.6595                                                   1760 WebExtensions   eli              :1.6595       session-c1.scope          c1         -
:1.6596                                                   1824 Web Content     eli              :1.6596       session-c1.scope          c1         -
:1.6597                                                   1876 Web Content     eli              :1.6597       session-c1.scope          c1         -
:1.6598                                                   1975 cinnamon-settin eli              :1.6598       session-c1.scope          c1         -
:1.6599                                                   1975 cinnamon-settin eli              :1.6599       session-c1.scope          c1         -
:1.6608                                                   2640 Web Content     eli              :1.6608       session-c1.scope          c1         -
:1.6609                                                   3399 Web Content     eli              :1.6609       session-c1.scope          c1         -
:1.6615                                                   4286 csd-keyboard    eli              :1.6615       session-c1.scope          c1         -
<span class="punch">:1.6616                                                   4296 csd-media-keys  eli              :1.6616       session-c1.scope          c1         -
</span>:1.6617                                                   4296 csd-media-keys  eli              :1.6617       session-c1.scope          c1         -
<span class="punch">:1.6618                                                   4345 pulseaudio      eli              :1.6618       session-c1.scope          c1         -
</span>:1.6621                                                   4692 xed             eli              :1.6621       session-c1.scope          c1         -
:1.6625                                                   5225 busctl          eli              :1.6625       session-c1.scope          c1         -
:1.67                                                     2860 applet.py       eli              :1.67         session-c1.scope          c1         -
:1.7                                                      2189 at-spi-bus-laun eli              :1.7          user@1010.service         -          -
:1.73                                                     3537 gnome-system-mo eli              :1.73         session-c1.scope          c1         -
:1.8                                                      2196 at-spi2-registr eli              :1.8          user@1010.service         -          -
:1.9                                                      2204 gnome-keyring-d eli              :1.9          session-c1.scope          c1         -
:1.92                                                     8884 xdg-desktop-por eli              :1.92         user@1010.service         -          -
:1.93                                                     8888 xdg-document-po eli              :1.93         user@1010.service         -          -
:1.94                                                     8891 xdg-permission- eli              :1.94         user@1010.service         -          -
:1.95                                                     8902 pxgsettings     eli              :1.95         user@1010.service         -          -
:1.96                                                     8906 xdg-desktop-por eli              :1.96         user@1010.service         -          -
ca.desrt.dconf                                            2299 dconf-service   eli              :1.31         user@1010.service         -          -
ca.desrt.dconf-editor                                        - -               -                (activatable) -                         -
org.Cinnamon                                              2451 cinnamon        eli              :1.6531       session-c1.scope          c1         -
org.Cinnamon.HotplugSniffer                                  - -               -                (activatable) -                         -
org.Cinnamon.LookingGlass                                 2451 cinnamon        eli              :1.6531       session-c1.scope          c1         -
org.Cinnamon.Melange                                         - -               -                (activatable) -                         -
org.Cinnamon.Slideshow                                       - -               -                (activatable) -                         -
org.Nemo                                                 53331 nemo            eli              :1.1234       session-c1.scope          c1         -
org.NemoDesktop                                           2478 nemo-desktop    eli              :1.48         session-c1.scope          c1         -
org.PulseAudio1                                           4345 pulseaudio      eli              :1.6618       session-c1.scope          c1         -
org.a11y.Bus                                              2189 at-spi-bus-laun eli              :1.7          user@1010.service         -          -
org.bluez.obex                                            2499 obexd           eli              :1.52         user@1010.service         -          -
org.cinnamon.ScreenSaver                                  2534 cinnamon-screen eli              :1.57         session-c1.scope          c1         -
org.cinnamon.SettingsDaemon.KeybindingHandler             4296 csd-media-keys  eli              :1.6616       session-c1.scope          c1         -</pre>
<p>So it&#8217;s quite clear that the players here were cinnamon&#8217;s main socket, csd-media-keys and pulseaudio.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2023/05/cinnamon-volume-buttons-dbus/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>systemd: Shut down computer at a certain uptime</title>
		<link>https://billauer.se/blog/2021/01/systemd-money-saver/</link>
		<comments>https://billauer.se/blog/2021/01/systemd-money-saver/#comments</comments>
		<pubDate>Sun, 31 Jan 2021 17:32:17 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6226</guid>
		<description><![CDATA[Motivation Paid-per-time cloud services. I don&#8217;t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, make sure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown [...]]]></description>
			<content:encoded><![CDATA[<h3>Motivation</h3>
<p>Paid-per-time cloud services. I don&#8217;t want to forget one of those running, just to get a fat bill at the end of the month. And if the intended use is short sessions anyhow, it makes sense to ensure that the machine shuts down by itself after a given amount of time. Just make sure that a shutdown initiated by the machine itself actually cuts the costs. Any sane cloud provider does that, except for, possibly, the cost of storing the VM&#8217;s disk image.</p>
<p>So this is the cloud computing parallel to &#8220;did I lock the door?&#8221;.</p>
<p>The examples here are based upon systemd 241 on Debian GNU/Linux 10.</p>
<h3>The main service</h3>
<p>There is more than one way to do this. I went for two services: One that calls /sbin/shutdown with a five minute delay (so I get a chance to cancel it), and a second one, which is a timer for the uptime limit.</p>
<p>So the main service is this file as /etc/systemd/system/uptime-limiter.service:</p>
<pre>[Unit]
Description=Limit uptime service

[Service]
ExecStart=/sbin/shutdown -h +5 "System is taken down by uptime-limiter.service"
Type=simple

[Install]
WantedBy=multi-user.target</pre>
<p>The naïve approach is to just enable the service and expect it to work. Well, it does work when started manually, but when this service starts as part of the system bringup, the shutdown request is registered but later ignored. Most likely because systemd somehow cancels pending shutdown requests when it reaches the ultimate target.</p>
<p>I should mention that adding After=multi-user.target in the unit file didn&#8217;t help. Maybe some other target. Don&#8217;t know.</p>
<h3>The timer service</h3>
<p>So the way to ensure that the shutdown command is respected is to trigger it off with a timer service.</p>
<p>The timer service as /etc/systemd/system/uptime-limiter.timer, in this case allows for 6 hours of uptime (plus the extra 5 minutes given by the main service):</p>
<pre>[Unit]
Description=Timer for Limit uptime service

[Timer]
OnBootSec=6h
AccuracySec=1s

[Install]
WantedBy=timers.target</pre>
<p>and enable it:</p>
<pre># <strong>systemctl enable uptime-limiter<span style="color: #ff0000;">.timer</span></strong>
Created symlink /etc/systemd/system/timers.target.wants/uptime-limiter.timer → /etc/systemd/system/uptime-limiter.timer.</pre>
<p>Note two things here: I enabled the timer, not the service itself, by adding the .timer suffix. And I didn&#8217;t start it; for that, there&#8217;s the --now flag.</p>
<p>So there are two steps: When the timer fires off, the call to /sbin/shutdown takes place, and that causes nagging wall messages to start once a minute, and eventually a shutdown. Mission complete.</p>
<h3>What timers are pending</h3>
<p>Ah, that&#8217;s surprisingly easy:</p>
<pre># <strong>systemctl list-timers</strong>
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sun 2021-01-31 17:38:28 UTC  14min left    n/a                          n/a          systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2021-01-31 20:50:22 UTC  3h 26min left Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily.timer              apt-daily.service
<span style="color: #ff0000;"><strong>Sun 2021-01-31 23:23:28 UTC  5h 59min left n/a                          n/a          uptime-limiter.timer         uptime-limiter.service
</strong></span>Sun 2021-01-31 23:23:34 UTC  5h 59min left Sun 2021-01-31 17:23:34 UTC  44s ago      google-oslogin-cache.timer   google-oslogin-cache.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago logrotate.timer              logrotate.service
Mon 2021-02-01 00:00:00 UTC  6h left       Sun 2021-01-31 12:36:41 UTC  4h 47min ago man-db.timer                 man-db.service
Mon 2021-02-01 06:49:19 UTC  13h left      Sun 2021-01-31 12:36:41 UTC  4h 47min ago apt-daily-upgrade.timer      apt-daily-upgrade.service</pre>
<p>Clean and simple. And this is probably why this method is better than a long delay on shutdown, which is less clear about what it&#8217;s about to do, as shown next.</p>
<p>Note that a timer service can be stopped, which parallels canceling a shutdown. Restarting it in order to push the time limit further won&#8217;t work in this case, though, because the timer is relative to OnBootSec, i.e. to the moment of boot and not to the timer&#8217;s start.</p>
<h3>Is there a shutdown pending?</h3>
<p>To check if a shutdown is about to happen:</p>
<pre>$ <strong>cat /run/systemd/shutdown/scheduled</strong>
USEC=<span style="color: #ff0000;"><strong>1612103418427661</strong></span>
WARN_WALL=1
MODE=poweroff
WALL_MESSAGE=System is taken down by uptime-limiter.service</pre>
<p>There are different reports on what happens when the shutdown is canceled. On my system, the file was deleted in response to &#8220;shutdown -c&#8221;, but not when the shutdown was canceled because the system had just booted up. There are <a href="https://unix.stackexchange.com/questions/229745/systemd-how-to-check-scheduled-time-of-a-delayed-shutdown" target="_blank">other suggested</a> ways too, but in the end, it appears that there&#8217;s no definite way to tell whether a system has a shutdown scheduled or not. At least not as of systemd 241.</p>
<p>That USEC line is the epoch time for when shutdown will take place. A Perl guy like me goes</p>
<pre>$ perl -e 'print scalar gmtime(<span style="color: #ff0000;"><strong>1612103418427661</strong></span>/1e6)'</pre>
<p>but that&#8217;s me.</p>
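<p>For the non-Perl people, GNU date can do the same conversion. A sketch, using the USEC value shown above (the exact output format depends on the locale):</p>
<pre>$ LC_ALL=C date -u -d @$((1612103418427661 / 1000000))
Sun Jan 31 14:30:18 UTC 2021</pre>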
<h3>What didn&#8217;t work</h3>
<p>So this shows what <strong>doesn&#8217;t</strong> work: Enabling the main service (as well as starting it right away with the --now flag):</p>
<pre># <strong>systemctl enable --now uptime-limiter</strong>
Created symlink /etc/systemd/system/multi-user.target.wants/uptime-limiter.service → /etc/systemd/system/uptime-limiter.service.

Broadcast message from root@instance-1 (Sun 2021-01-31 14:15:19 UTC):

System is taken down by uptime-limiter.service
The system is going down for poweroff at Sun 2021-01-31 14:25:19 UTC!</pre>
<p>So the broadcast message is out there right away. But this is misleading: It won&#8217;t work at all when the service is started automatically during system boot.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2021/01/systemd-money-saver/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Systemd services as cronjobs: No process runs away</title>
		<link>https://billauer.se/blog/2021/01/systemd-cron-cgroups/</link>
		<comments>https://billauer.se/blog/2021/01/systemd-cron-cgroups/#comments</comments>
		<pubDate>Mon, 18 Jan 2021 06:10:06 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6214</guid>
		<description><![CDATA[But why? Cronjobs typically consist of a single utility which we&#8217;re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there&#8217;s always a simple story, a single task to complete with a known beginning and end. If the task involves a shell script that calls a few utilities, that [...]]]></description>
			<content:encoded><![CDATA[<h3>But why?</h3>
<p>Cronjobs typically consist of a single utility which we&#8217;re pretty confident about. Even if it takes quite some time to complete (updatedb, for example), there&#8217;s always a simple story, a single task to complete with a known beginning and end.</p>
<p>If the task involves a shell script that calls a few utilities, that feeling of control fades. It&#8217;s therefore reassuring to know that everything can be cleaned up neatly by simply stopping a service. Systemd is good at that, since all processes that are involved in the service are kept in a separate cgroup. So when the service is stopped, all processes that were possibly generated eventually get a SIGKILL, typically 90 seconds after the request to stop the service, unless they terminated voluntarily in response to the initial SIGTERM.</p>
<p>Advantage number two is that systemd allows for a series of capabilities to limit what the cronjob is capable of doing, thanks to the cgroup arrangement. This doesn&#8217;t fall far short of the possibilities of container virtualization, with pretty simple assignments in the unit file. This includes making certain directories inaccessible or read-only, setting up temporary directories, disallowing external network connections, limiting the set of allowed syscalls, and of course limiting the amount of resources that the service consumes. They&#8217;re called Control Groups for a reason.</p>
<p>There&#8217;s also the RuntimeMaxSec parameter in the service unit file, which is the maximal wall clock time the service is allowed to run. The service is terminated and put in failure state if this time is exceeded. This is however supported from systemd version 229 onwards, so check with &#8220;systemctl --version&#8221;.</p>
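<p>For the sake of illustration, this is roughly what such restrictions can look like in the [Service] section. The directives are real systemd ones, but the script path and the values here are made up for this example, and some directives (e.g. MemoryMax) require fairly recent systemd versions:</p>
<pre>[Service]
ExecStart=/usr/local/bin/myjob.sh
User=eli
# Terminate (as failed) after 10 minutes of wall clock time
RuntimeMaxSec=600
# Filesystem restrictions
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=true
# No network access at all
PrivateNetwork=true
# Resource limits
MemoryMax=512M
CPUQuota=50%</pre>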
<p>My original idea was to use systemd timers to kick off the job, and let RuntimeMaxSec make sure it would get cleaned up if it ran too long (i.e. got stuck somehow). But because the server in question ran a rather old version of systemd, I went for a cron entry for starting the service and another one for stopping it, with a certain time difference between them. In hindsight, cron turned out to be neater for kicking off the jobs, because I had multiple variants of them at different times. So one single file enclosed them all.</p>
<p>The main practical difference is that if a service reaches RuntimeMaxSec, it&#8217;s terminated with a failed status. The cron solution stops the service without this. I guess there&#8217;s a systemctl way to achieve the failed status, if that&#8217;s really important.</p>
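<p>Just to make the idea concrete, the crontab pair could look like this (in /etc/crontab syntax; the times and the service name are made up for this example):</p>
<pre># Start the job at 04:00, and stop whatever is left of it an hour later
0 4 * * * root /bin/systemctl start myjob.service
0 5 * * * root /bin/systemctl stop myjob.service</pre>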
<p>As a side note, I have a <a href="https://billauer.se/blog/2020/06/firejail-cgroups/" target="_blank">separate post</a> on Firejail, which is yet another possibility to use cgroups for controlling what processes do.</p>
<h3>Timer basics</h3>
<p>The idea is simple: A service can be started as a result of a timer event. That&#8217;s all that timer units do.</p>
<p>Timer units are configured like any systemd units (man systemd.unit) but have a .timer suffix and a dedicated [Timer] section. By convention, the timer unit named foo.timer activates the service foo.service, unless specified differently with the Unit= assignment (useful for generating confusion).</p>
<p>Units that are already running when the timer event occurs are not restarted, but are left to keep running. Exactly like systemctl start would do.</p>
<p>For a cronjob-style timer, use OnCalendar= to specify the times. See man systemd.time for the format. Note that AccuracySec= should be set too, to control how much systemd may shift the exact time of execution, or systemd&#8217;s behavior might be confusing.</p>
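<p>For example, a hypothetical foo.timer that fires off every day at 04:00, allowing systemd a one-minute leeway:</p>
<pre>[Unit]
Description=Daily trigger for foo.service

[Timer]
OnCalendar=*-*-* 04:00:00
AccuracySec=1min

[Install]
WantedBy=timers.target</pre>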
<p>To see all active timers, go</p>
<pre>$ systemctl list-timers</pre>
<h3>The unit file</h3>
<p>As usual, the unit file (e.g. /etc/systemd/system/cronjob-test@.service) is short and concise:</p>
<pre>[Unit]
Description=Cronjob test service

[Service]
ExecStart=/home/eli/shellout/utils/shellout.pl "%I"
Type=simple
User=eli
WorkingDirectory=/home/eli/shellout/utils
KillMode=mixed
<span style="text-decoration: line-through;">NoNewPrivileges=true</span></pre>
<p>This is a simple service, meaning that systemd expects the process launched by ExecStart to run in the foreground.</p>
<p>Note however that the service unit&#8217;s file name has a &#8220;@&#8221; character and that %I is used to choose what to run, based upon the unescaped instance name (see man systemd.unit). This turns the unit file into a template, and allows choosing an arbitrary command (the shellout.pl script is explained below) with something like (really, this works)</p>
<pre># systemctl start cronjob-test@'echo "Hello, world"'</pre>
<p>This might seem dangerous, but recall that root privileges are required to start the service, and you get a plain-user process (possibly with no ability to escalate privileges) in return. Not the big jackpot.</p>
<p>For stopping the service, exactly the same service specifier string is required. But it&#8217;s also possible to stop all instances of a service with</p>
<pre># systemctl stop 'cronjob-test@*'</pre>
<p>How neat is that?</p>
<p>A few comments on this:</p>
<ul>
<li>The service should not be systemd-wise enabled (i.e. no &#8220;systemctl enable&#8221;) &#8212; that&#8217;s what you do to get it started on boot or following some kind of event. This is not the case, as the whole point is to start the service directly by a timer or crond.</li>
<li>Accordingly, the service unit file does <strong>not</strong> have an [Install] section.</li>
<li>A side effect of this is that the service may not appear in the list made by &#8220;systemctl&#8221; (without any arguments) unless it currently has processes running on its behalf (or possibly if it&#8217;s in the failed state). Simple logic: It&#8217;s not loaded unless it has a cgroup allocated, and the cgroup is removed along with the last process. But it may appear anyhow under some conditions.</li>
<li>ExecStart must have a full path (i.e. not relative) even if the WorkingDirectory is set. In particular, it can&#8217;t be ./something.</li>
<li>A &#8220;systemctl start&#8221; on a service that is marked as failed starts it anyhow (i.e. the fact that it&#8217;s marked failed doesn&#8217;t prevent that). Quite obvious, but I tested it to be sure.</li>
<li>Also, a &#8220;systemctl start&#8221; causes the execution of ExecStart if and only if there&#8217;s no cgroup for it, which is equivalent to not having a process running on its behalf.</li>
<li>KillMode is set to &#8220;mixed&#8221; which sends a SIGTERM only to the process that is launched directly when the service is stopped. The SIGKILL 90 seconds later, if any, is sent to all processes however. The default is to give all processes in the cgroup the SIGTERM when stopping.</li>
<li>NoNewPrivileges is a little paranoid thing: When no process has any reason to change its privileges or user IDs, block this possibility. This mitigates damage, should the job be successfully attacked in some way. But I ended up not using it, as running sendmail fails with it (sendmail has some setuid thing to allow access to the mail spooler).</li>
</ul>
<h3>Stopping</h3>
<p>There is no log entry for a service of simple type that terminates with a success status. Even though it&#8217;s stopped in the sense that it has no allocated cgroup and &#8220;systemctl start&#8221; behaves as if it was stopped, a successful termination is silent. Not sure if I like this, but that&#8217;s the way it is.</p>
<p>When the process doesn&#8217;t respond to SIGTERM:</p>
<pre>Jan 16 19:13:03 systemd[1]: <strong>Stopping</strong> Cronjob test service...
Jan 16 19:14:33 systemd[1]: cronjob-test.service stop-sigterm timed out. Killing.
Jan 16 19:14:33 systemd[1]: cronjob-test.service: main process exited, code=killed, status=9/KILL
Jan 16 19:14:33 systemd[1]: <strong>Stopped</strong> Cronjob test service.
Jan 16 19:14:33 systemd[1]: Unit cronjob-test.service entered failed state.</pre>
<p>So there&#8217;s always &#8220;Stopping&#8221; first and then &#8220;Stopped&#8221;. And if there are processes in the control group 90 seconds after &#8220;Stopping&#8221;, SIGKILL is sent, and the service gets a &#8220;failed&#8221; status. Not being able to quit properly is a failure.</p>
<p>A &#8220;systemctl stop&#8221; on a service that is already stopped is legit: The systemctl utility returns silently with a success status, and a &#8220;Stopped&#8221; message appears in the log without anything actually taking place. Neither does the service&#8217;s status change, so if it was considered failed before, so it remains. And if the target to stop was a group of instances (e.g. systemctl stop &#8216;cronjob-test@*&#8217;) and there were no instances to stop, there isn&#8217;t even a log message about it.</p>
<p>Same logic with &#8220;Starting&#8221; and &#8220;Started&#8221;: A superfluous &#8220;systemctl start&#8221; does nothing except for a &#8220;Started&#8221; log message, and the utility is silent, returning success.</p>
<h3>Capturing the output</h3>
<p>By default, the output (stdout and stderr) of the processes is logged in the journal. This is usually pretty convenient, however I wanted the good old cronjob behavior: An email is sent unless the job is completely silent and exits with a success status (actually, crond doesn&#8217;t care about the exit status, but I wanted this too).</p>
<p>This concept doesn&#8217;t fit systemd&#8217;s spirit: You don&#8217;t start sending mails each time a service has something to say. One could use OnFailure= for activating another service that calls home when the service gets into a failure status (which includes a non-success termination of the main process), but that mail won&#8217;t tell me the output. To achieve this, I wrote a Perl script. So there&#8217;s one extra process, but who cares, systemd kills &#8217;em all in the end anyhow.</p>
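<p>For the record, the OnFailure route amounts to a single extra line in the [Unit] section, pointing at a mail-sending unit that one would have to write as well (notify-email@.service is hypothetical here; %n expands to the failing unit&#8217;s name):</p>
<pre>[Unit]
Description=Cronjob test service
OnFailure=notify-email@%n.service</pre>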
<p>Here it comes (I called it shellout.pl):</p>
<pre>#!/usr/bin/perl

use strict;
use warnings;

# Parameters for sending mail to report errors
my $sender = 'eli';
my $recipient = 'eli';
my $sendmail = "/usr/sbin/sendmail -i -f$sender";

my $cmd = shift;
my $start = time();

my $output = '';

my $catcher = sub { finish("Received signal."); };

$SIG{HUP} = $catcher;
$SIG{TERM} = $catcher;
$SIG{INT} = $catcher;
$SIG{QUIT} = $catcher;

my $pid = open (my $fh, '-|');

finish("Failed to fork: $!")
  unless (defined $pid);

if (!$pid) { # Child process
  # Redirect stderr to stdout for child processes as well
  open (STDERR, "&gt;&amp;STDOUT");

  exec($cmd) or die("Failed to exec $cmd: $!\n");
}

# Parent
while (defined (my $l = &lt;$fh&gt;)) {
  $output .= $l;
}

close $fh
 or finish("Error: $! $?");

finish("Execution successful, but output was generated.")
 if (length $output);

exit 0; # Happy end
sub finish {
  my ($msg) = @_;

  my $elapsed = time() - $start;

  $msg .= "\n\nOutput generated:\n\n$output\n"
    if (length $output);

  open (my $fh, '|-', "$sendmail $recipient") or
    finish("Failed to run sendmail: $!");

  print $fh &lt;&lt;"END";
From: Shellout script &lt;$sender&gt;
Subject: systemd cron job issue
To: $recipient

The script with command \"$cmd\" ran $elapsed seconds.

$msg
END

  close $fh
    or die("Failed to send email: $! $?\n");

  $SIG{TERM} = sub { }; # Not sure this matters
  kill -15, $$; # Kill entire process group

  exit(1);
}</pre>
<p>First, let&#8217;s pay attention to</p>
<pre>open (STDERR, "&gt;&amp;STDOUT");</pre>
<p>which makes sure standard error is redirected to standard output. This is inherited by child processes, which is exactly the point.</p>
<p>The script catches the signals (SIGTERM in particular, which is systemd&#8217;s first hint that it&#8217;s time to pack and leave) and sends a SIGTERM to all other processes in turn. This is combined with KillMode being set to &#8220;mixed&#8221; in the service unit file, so that only shellout.pl gets the signal, and not the other processes.</p>
<p>The rationale is that if all processes get the signal at once, it may (theoretically?) turn out that the child process terminates before the script reacted to the signal it got itself, so it will fail to report that the reason for the termination was a signal, as opposed to the termination of the child. This could miss a situation where the child process got stuck and said nothing when being killed.</p>
<p>Note that the script kills all processes in the process group just before quitting due to a signal it got, or when the invoked process terminates and there was output. Before doing so, it sets the signal handler to a NOP, to avoid an endless loop, since the script&#8217;s process will get it as well (?). This NOP thing appears to be unnecessary, but better safe than sorry.</p>
<p>Also note that the while loop quits when there&#8217;s nothing more in &lt;$fh&gt;. This means that if the child process forks and then terminates, the while loop will continue, because unless the forked process closed its output file handles, it keeps the write side of the pipe open, so no EOF is seen. The first child process will remain as a zombie until the forked process is done. Only then will it be reaped by virtue of the close $fh. This machinery is not intended for fork() sorcery.</p>
<p>I took a different approach in <a href="https://billauer.se/blog/2020/10/perl-fork-ipc-kill-children/" target="_blank">another post of mine</a>, where the idea was to fork explicitly and modify the child&#8217;s attributes. <a href="https://billauer.se/blog/2013/03/fork-wait-and-the-return-values-in-perl-in-different-scenarios/" target="_blank">Another post</a> discusses timing out a child process in general.</p>
<h3>Summary</h3>
<p>Yes, cronjobs are much simpler. But in the long run, it&#8217;s a good idea to acquire the ability to run cronjobs as services for the sake of keeping the system clean from runaway processes.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2021/01/systemd-cron-cgroups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Root over NFS remains read only with Linux v5.7</title>
		<link>https://billauer.se/blog/2020/07/nfsroot-read-only-remount/</link>
		<comments>https://billauer.se/blog/2020/07/nfsroot-read-only-remount/#comments</comments>
		<pubDate>Sun, 26 Jul 2020 13:18:22 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6078</guid>
		<description><![CDATA[Upgrading the kernel should be quick and painless&#8230; After upgrading the kernel from v5.3 to 5.7, a lot of systemd services failed (Debian 8), in particular systemd-remount-fs: ● systemd-remount-fs.service - Remount Root and Kernel File Systems Loaded: loaded (/lib/systemd/system/systemd-remount-fs.service; static) Active: failed (Result: exit-code) since Sun 2020-07-26 15:28:15 IDT; 17min ago Docs: man:systemd-remount-fs.service(8) http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems Process: [...]]]></description>
			<content:encoded><![CDATA[<h3>Upgrading the kernel should be quick and painless&#8230;</h3>
<p>After upgrading the kernel from v5.3 to 5.7, a lot of systemd services failed (Debian 8), in particular systemd-remount-fs:</p>
<pre>● systemd-remount-fs.service - Remount Root and Kernel File Systems
   Loaded: loaded (/lib/systemd/system/systemd-remount-fs.service; static)
   Active: failed (Result: exit-code) since Sun 2020-07-26 15:28:15 IDT; 17min ago
     Docs: man:systemd-remount-fs.service(8)

http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

  Process: 223 ExecStart=/lib/systemd/systemd-remount-fs (code=exited, status=1/FAILURE)
 Main PID: 223 (code=exited, status=1/FAILURE)

Jul 26 15:28:15 systemd[1]: systemd-remount-fs.service: main process exited, code=exited, status=1/FAILURE
Jul 26 15:28:15 systemd[1]: Failed to start Remount Root and Kernel File Systems.
Jul 26 15:28:15 systemd[1]: Unit systemd-remount-fs.service entered failed state.</pre>
<p>and indeed, the root NFS remained read-only (checked with &#8220;mount&#8221; command), which explains why so many other services failed.</p>
<p>After an strace session, I managed to nail down the problem: The system call to mount(), which was supposed to do the remount, simply failed:</p>
<pre>mount("10.1.1.1:/path/to/debian-82", "/", 0x61a250, MS_REMOUNT, "addr=10.1.1.1") = -1 EINVAL (Invalid argument)</pre>
<p>On the other hand, any attempt to remount another read-only NFS mount, which had been mounted the regular way (i.e. after boot) went through clean, of course:</p>
<pre>mount("10.1.1.1:/path/to/debian-82", "/mnt/tmp", 0x61a230, MS_REMOUNT, "addr=10.1.1.1") = 0</pre>
<p>The only apparent difference between the two cases is the third argument, which is ignored for MS_REMOUNT according to the manpage.</p>
<p>The manpage also says something about the EINVAL return value:</p>
<blockquote><p>EINVAL   A remount operation (MS_REMOUNT) was attempted, but  source was not already mounted on target.</p></blockquote>
<p>A hint to the problem could be that the type of the mount, as listed in /proc/mounts, is &#8220;nfs&#8221; for the root mounted filesystem, but &#8220;nfs4&#8221; for the one in /mnt/tmp. The reason for this difference isn&#8217;t completely clear.</p>
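<p>The type can be fished out of /proc/mounts directly; on the affected system, this prints &#8220;nfs&#8221; for the root filesystem (in /proc/mounts, the second field is the mount point and the third is the filesystem type):</p>
<pre>$ awk '$2 == "/" { print $3 }' /proc/mounts</pre>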
<h3>The solution</h3>
<p>So it&#8217;s all about that little hint: If the nfsroot is selected to boot as version 4, then there&#8217;s no problem remounting it. Why it made a difference from one kernel version to another is beyond me. So the fix is to add nfsvers=4 to the nfsroot assignment. Something like</p>
<pre>root=/dev/nfs nfsroot=10.1.1.1:/path/to/debian-82,<span style="color: #ff0000;"><strong>nfsvers=4</strong></span></pre>
<p>For the record, I re-ran the remount command with strace again, and exactly the same system call was made, including that most-likely-ignored 0x61a250 argument, and it simply returned success (zero) instead of EINVAL.</p>
<p>As a side note, the rootfstype=nfs in the kernel command line is completely ignored. Write any junk instead of &#8220;nfs&#8221; and it makes no difference.</p>
<p>Another yak shaved successfully.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2020/07/nfsroot-read-only-remount/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When umount says target is busy, but no process can be blamed</title>
		<link>https://billauer.se/blog/2020/06/umount-restart-nfs/</link>
		<comments>https://billauer.se/blog/2020/06/umount-restart-nfs/#comments</comments>
		<pubDate>Sun, 28 Jun 2020 10:15:54 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6067</guid>
		<description><![CDATA[A short one: What to do if unmount is impossible with a # umount /path/to/mount umount: /path/to/mount: target is busy but grepping the output of lsof for the said path yields nothing. In other words, the mount is busy, but no process can be blamed for accessing it (even as a home directory). If this [...]]]></description>
			<content:encoded><![CDATA[<p>A short one: What to do if unmount is impossible with a</p>
<pre># umount /path/to/mount
umount: /path/to/mount: target is busy</pre>
<p>but grepping the output of lsof for the said path yields nothing. In other words, the mount is busy, but no process can be blamed for accessing it (even as a home directory).</p>
<p>If this happens, odds are that it&#8217;s an NFS mount, held by some remote machine. The access might have been over long ago, but the mount is still considered busy. So the solution for this case is simple: Restart the NFS daemon. On Linux Mint 19 (and probably a lot of others) it&#8217;s simply</p>
<pre># systemctl restart nfs-server</pre>
<p>and after this, umount is successful (hopefully&#8230;)</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2020/06/umount-restart-nfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>systemd: Reacting to USB NIC hotplugging (post-up scripting)</title>
		<link>https://billauer.se/blog/2019/11/systemd-udev-ifup-usb-nic/</link>
		<comments>https://billauer.se/blog/2019/11/systemd-udev-ifup-usb-nic/#comments</comments>
		<pubDate>Mon, 11 Nov 2019 17:17:57 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[systemd]]></category>
		<category><![CDATA[udev]]></category>
		<category><![CDATA[USB]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5920</guid>
		<description><![CDATA[The problem Using Linux Mint 19, I have a network device that needs DHCP address allocation connected to a USB network dongle. When I plug it in, the device appears, but the DHCP daemon ignored eth2 (the assigned network device name) and didn&#8217;t respond to its DHCP discovery packets. But restarting the DHCP server well [...]]]></description>
			<content:encoded><![CDATA[<h3>The problem</h3>
<p>Using Linux Mint 19, I have a network device that needs DHCP address allocation, connected through a USB network dongle. When I plugged it in, the device appeared, but the DHCP daemon ignored eth2 (the assigned network device name) and didn&#8217;t respond to its DHCP discovery packets. However, restarting the DHCP server well after plugging in the USB network card solved the issue.</p>
<p>I should mention that I use a vintage DHCP server for one reason or another (not necessarily a good one). There&#8217;s a good chance that a systemd-aware DHCP daemon will resynchronize itself following a network hotplug event. It&#8217;s evident that avahi-daemon, hostapd, systemd-timesyncd and vmnet-natd trigger some activity as a result of the new network device.</p>
<p>Most notable is systemd-timesyncd, which goes</p>
<pre>Nov 11 11:25:59 systemd-timesyncd[1101]: Network configuration changed, trying to establish connection.</pre>
<p>twice, once when the new device appears, and a second time when it is configured. See sample kernel log below.</p>
<p>It&#8217;s not clear to me how these daemons get their notification on the new network device. I could have dug deeper into this, but ended up with a rather ugly solution. I&#8217;m sure this can be done better, but I&#8217;ve wasted enough time on this &#8212; please comment below if you know how.</p>
<h3>Setting up a systemd service</h3>
<p>The very systemd way to run a script when a networking device appears is to add a service. Namely, add this file as /etc/systemd/system/eth2-up.service:</p>
<pre>[Unit]
Description=Restart dhcp when eth2 is up

[Service]
ExecStart=/bin/sleep 10 ; /bin/systemctl restart my-dhcpd
Type=oneshot

[Install]
WantedBy=sys-subsystem-net-devices-eth2.device</pre>
<p>And then activate the service:</p>
<pre># systemctl daemon-reload
# systemctl enable eth2-up</pre>
<p>The concept is simple: A one-shot service depends on the relevant device. When the device is up, whatever is on ExecStart is run, the DHCP server is restarted, end of story.</p>
<p>I promised ugly, didn&#8217;t I? Note the 10-second sleep before kicking off the daemon restart. It&#8217;s required because the service is launched when the networking device appears, not when it&#8217;s fully configured. So restarting the DHCP daemon right away misses the point (or simply put: it doesn&#8217;t work).</p>
<p>I guess the DHCP daemon will be restarted one extra time on boot because of this additional service. In that sense, the 10-second delay is possibly better than restarting it soon after, or while, it&#8217;s being started by systemd in general.</p>
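<p>For what it&#8217;s worth, systemd-networkd ships a helper that waits until an interface is actually configured, which could replace the fixed sleep. This is an untested sketch only, assuming that systemd-networkd manages eth2 (the log below suggests it does) and that the helper lives at /lib/systemd/systemd-networkd-wait-online on this distribution:</p>
<pre>[Unit]
Description=Restart dhcp when eth2 is up

[Service]
Type=oneshot
# Wait (up to 30 seconds) until systemd-networkd declares eth2
# configured, instead of sleeping for a fixed 10 seconds:
ExecStartPre=/lib/systemd/systemd-networkd-wait-online --interface=eth2 --timeout=30
ExecStart=/bin/systemctl restart my-dhcpd

[Install]
WantedBy=sys-subsystem-net-devices-eth2.device</pre>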
<p>So with the service activated, this is what the log looks like (the restarting of the DHCP server not included):</p>
<pre>Nov 11 11:25:54 kernel: usb 1-12: new high-speed USB device number 125 using xhci_hcd
Nov 11 11:25:54 kernel: usb 1-12: New USB device found, idVendor=0bda, idProduct=8153
Nov 11 11:25:54 kernel: usb 1-12: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Nov 11 11:25:54 kernel: usb 1-12: Product: USB 10/100/1000 LAN
Nov 11 11:25:54 kernel: usb 1-12: Manufacturer: Realtek
Nov 11 11:25:54 kernel: usb 1-12: SerialNumber: 001000001
Nov 11 11:25:55 kernel: usb 1-12: reset high-speed USB device number 125 using xhci_hcd
Nov 11 11:25:55 vmnet-natd[1845]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001002
Nov 11 11:25:55 vmnetBridge[1620]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001002
Nov 11 11:25:55 kernel: r8152 1-12:1.0 eth2: v1.09.9
Nov 11 11:25:55 mtp-probe[59372]: checking bus 1, device 125: "/sys/devices/pci0000:00/0000:00:14.0/usb1/1-12"
Nov 11 11:25:55 mtp-probe[59372]: bus: 1, device: 125 was not an MTP device
Nov 11 11:25:55 upowerd[2203]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-12
Nov 11 11:25:55 systemd-networkd[65515]: ppp0: Link is not managed by us
Nov 11 11:25:55 systemd-networkd[65515]: vmnet8: Link is not managed by us
Nov 11 11:25:55 systemd-networkd[65515]: vmnet1: Link is not managed by us
Nov 11 11:25:55 networkd-dispatcher[1140]: WARNING:Unknown index 848 seen, reloading interface list
Nov 11 11:25:55 systemd-networkd[65515]: lo: Link is not managed by us
Nov 11 11:25:55 systemd-networkd[65515]: eth2: IPv6 successfully enabled
<strong>Nov 11 11:25:55 systemd[1]: Starting Restart dhcp when eth2 is up...
</strong>Nov 11 11:25:55 kernel: IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
Nov 11 11:25:55 vmnet-natd[1845]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001043
Nov 11 11:25:55 vmnetBridge[1620]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001043
Nov 11 11:25:55 vmnetBridge[1620]: Adding interface eth2 index:848
Nov 11 11:25:55 vmnet-natd[1845]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001043
Nov 11 11:25:55 vmnetBridge[1620]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001043
Nov 11 11:25:55 systemd-timesyncd[1101]: Network configuration changed, trying to establish connection.
Nov 11 11:25:55 vmnetBridge[1620]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001003
Nov 11 11:25:55 vmnetBridge[1620]: Removing interface eth2 index:848
Nov 11 11:25:55 vmnet-natd[1845]: RTM_NEWLINK: name:eth2 index:848 flags:0x00001003
Nov 11 11:25:55 upowerd[2203]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-12/1-12:1.0
Nov 11 11:25:55 kernel: IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
Nov 11 11:25:55 systemd-timesyncd[1101]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).
Nov 11 11:25:55 kernel: userif-3: sent link down event.
Nov 11 11:25:55 kernel: userif-3: sent link up event.
Nov 11 11:25:57 vmnetBridge[1620]: RTM_NEWLINK: name:eth2 index:848 flags:0x00011043
Nov 11 11:25:57 vmnetBridge[1620]: Adding interface eth2 index:848
Nov 11 11:25:57 systemd-networkd[65515]: eth2: Gained carrier
Nov 11 11:25:57 systemd-timesyncd[1101]: Network configuration changed, trying to establish connection.
Nov 11 11:25:57 avahi-daemon[1115]: Joining mDNS multicast group on interface eth2.IPv4 with address 10.20.30.1.
Nov 11 11:25:57 avahi-daemon[1115]: New relevant interface eth2.IPv4 for mDNS.
Nov 11 11:25:57 avahi-daemon[1115]: Registering new address record for 10.20.30.1 on eth2.IPv4.
Nov 11 11:25:57 kernel: r8152 1-12:1.0 eth2: carrier on
Nov 11 11:25:57 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Nov 11 11:25:57 vmnet-natd[1845]: RTM_NEWLINK: name:eth2 index:848 flags:0x00011043
Nov 11 11:25:57 vmnet-natd[1845]: RTM_NEWADDR: index:848, addr:10.20.30.1
Nov 11 11:25:57 systemd-timesyncd[1101]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).
Nov 11 11:25:58 kernel: userif-3: sent link down event.
Nov 11 11:25:58 kernel: userif-3: sent link up event.
Nov 11 11:25:59 avahi-daemon[1115]: Joining mDNS multicast group on interface eth2.IPv6 with address fe80::2e0:4cff:fe68:71d.
Nov 11 11:25:59 avahi-daemon[1115]: New relevant interface eth2.IPv6 for mDNS.
Nov 11 11:25:59 systemd-networkd[65515]: eth2: Gained IPv6LL
Nov 11 11:25:59 avahi-daemon[1115]: Registering new address record for fe80::2e0:4cff:fe68:71d on eth2.*.
<strong>Nov 11 11:25:59 systemd-networkd[65515]: eth2: Configured
</strong>Nov 11 11:25:59 systemd-timesyncd[1101]: Network configuration changed, trying to establish connection.
Nov 11 11:25:59 systemd-timesyncd[1101]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).</pre>
<p>As emphasized in bold above, 4 seconds pass between the activation of the script and systemd-networkd&#8217;s declaration that the device is configured.</p>
<p>It would have been much nicer to kick off the script at the point where systemd-timesyncd detects the change for the second time. And it would have been much wiser had WantedBy=sys-subsystem-net-devices-eth2.device meant that the target is reached only when the device is actually configured. Once again, if someone has an idea, please comment below.</p>
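<p>Since the log shows networkd-dispatcher running, its hook directories might be a cleaner place for this. The following is an untested sketch, to be saved as an executable root-owned script under /etc/networkd-dispatcher/routable.d/ (the directory name and the IFACE environment variable follow networkd-dispatcher&#8217;s convention; my-dhcpd is the example service name used above):</p>
<pre>#!/bin/sh
# Run by networkd-dispatcher whenever an interface reaches the
# "routable" state; the interface's name is passed in $IFACE.
[ "$IFACE" = "eth2" ] || exit 0
exec /bin/systemctl restart my-dhcpd</pre>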
<h3>A udev rule instead</h3>
<p>The truth is that I started off with a udev rule first, ran into the problem with the DHCP server being restarted too early, and tried to solve it with systemd as shown above, hoping that it would work better. The bottom line is that it&#8217;s effectively the same. So here&#8217;s the udev rule, which I kept as /etc/udev/rules.d/99-network-dongle.rules:</p>
<pre>SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth2", RUN+="/bin/sh -c 'sleep 10; /bin/systemctl restart my-dhcpd'"</pre>
<p>(Note that udev&#8217;s RUN doesn&#8217;t spawn a shell by itself, so the two commands are wrapped with /bin/sh -c; a bare semicolon would just be passed to sleep as an argument.)</p>
<p>Note that I nail down the device by its name (eth2). It would have been nicer to do this based upon the USB device&#8217;s Vendor / Product IDs, but that failed for me: somehow, the rule didn&#8217;t match when using SUBSYSTEM==&#8221;usb&#8221;. How I ensure the repeatable eth2 name is explained in <a href="https://billauer.se/blog/2017/12/systemd-systemctl-services/" target="_blank">this post</a>.</p>
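<p>By the way, a plausible reason for the failed match: the network interface&#8217;s own subsystem is &#8220;net&#8221;, and the USB IDs live on a parent device, which calls for the plural SUBSYSTEMS / ATTRS keys that walk up the device chain. This is a guess of a rule, not something I&#8217;ve verified, with the Vendor / Product IDs taken from the kernel log above:</p>
<pre># Match the "net" child device, but take the IDs from the USB parent
# (plural SUBSYSTEMS/ATTRS match parent devices as well):
SUBSYSTEM=="net", ACTION=="add", SUBSYSTEMS=="usb", ATTRS{idVendor}=="0bda", ATTRS{idProduct}=="8153", RUN+="/bin/sh -c 'sleep 10; /bin/systemctl restart my-dhcpd'"</pre>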
<p>Also worth mentioning are these two commands, which helped in putting the udev rule together:</p>
<pre># udevadm test -a add /sys/class/net/eth2
# udevadm info -a /sys/class/net/eth2</pre>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2019/11/systemd-udev-ifup-usb-nic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
