<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>my tech blog &#187; Linux</title>
	<atom:link href="http://billauer.se/blog/category/linux/feed/" rel="self" type="application/rss+xml" />
	<link>https://billauer.se/blog</link>
	<description>Anything I found worthy to write down.</description>
	<lastBuildDate>Thu, 12 Mar 2026 11:36:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Enabling STARTTLS on sendmail with Let&#8217;s Encrypt certificate</title>
		<link>https://billauer.se/blog/2026/03/tls-sendmail-lets-encrypt/</link>
		<comments>https://billauer.se/blog/2026/03/tls-sendmail-lets-encrypt/#comments</comments>
		<pubDate>Wed, 04 Mar 2026 07:39:01 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[email]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7223</guid>
		<description><![CDATA[Introduction I&#8217;ll start with the crucial point: My interest in giving sendmail a Let&#8217;s Encrypt certificate (along with a secret key, of course) has nothing to do with security. The real reason is that some mail servers won&#8217;t deliver their mail to my server unless the link is encrypted. As of today (March 2026), I [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>I&#8217;ll start with the crucial point: My interest in giving sendmail a Let&#8217;s Encrypt certificate (along with a secret key, of course) has nothing to do with security. The real reason is that some mail servers won&#8217;t deliver their mail to my server unless the link is encrypted. As of today (March 2026), I know only of one such case, but that&#8217;s enough for me to understand that I must at least support an opportunistic TLS upgrade for arriving mails. In other words, my mail server must allow STARTTLS for servers who want to drop a mail at my server.</p>
<p>The problem with servers that won&#8217;t play ball without STARTTLS is that the mail is lost, sometimes without the sender being notified. I discovered this issue when one of those confirm-your-email messages didn&#8217;t arrive, and I sent the sender&#8217;s tech support a complaint. To which they responded with the reason: My server didn&#8217;t support a STARTTLS upgrade.</p>
<p>So the goal is to reduce the risk of mail loss, nothing else. And the main concern is that mail might not be delivered in cases where it would have been before adding STARTTLS. For example, where a cleartext connection would have been OK, but the sides attempted and failed to initiate a STARTTLS encrypted session, and the connection was then terminated altogether, with no mail delivered.</p>
<p>I&#8217;m running sendmail 8.14.4 on a Debian 8 machine (yes, it&#8217;s really ancient, but one doesn&#8217;t play around with a stable system).</p>
<p>As for sources of information, there are a lot of guides out there. For those preferring the horse&#8217;s mouth, there&#8217;s the README on configuration files at /usr/share/sendmail-cf/README. Look for <a rel="noopener" href="https://www.sendmail.org/~ca/email/doc8.12/cf/m4/starttls.html" target="_blank">the section about STARTTLS</a>. An example configuration can be found at /usr/share/sendmail/examples/tls/starttls.m4.</p>
<p>And since this topic revolves around certificates, maybe check out <a rel="noopener" href="https://billauer.se/blog/2021/04/certificate-ca-tutorial-primer/" target="_blank">another post of mine</a> which attempts to explain this topic. And my own little <a rel="noopener" href="https://billauer.se/blog/2019/03/email-server-setup/" target="_blank">tutorial on setting up sendmail</a>.</p>
<h3>Is encryption worth anything?</h3>
<p>If a mail server refuses to talk with anyone unless a <a rel="noopener" href="https://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank">TLS</a> link is established with a mutually verified certificate (both sides check each other), encryption indeed adds security. Otherwise, it&#8217;s quite pointless.</p>
<p>Let&#8217;s begin with accepting arriving emails: If a server agrees to cleartext connections by clients for dropping off mails, it&#8217;s up to the sender to decide the level of security. Even if the arriving connection is encrypted with TLS, that doesn&#8217;t mean that the connection is secure. Surely, the server is required to submit a certificate when setting up the TLS session, but did the client verify it? And if the verification failed, did it terminate the connection? If it didn&#8217;t, a simple man-in-the-middle attack is possible: an eavesdropper can come in the middle, feed the client with a cooked-up certificate, accept the email, and then relay this email to the real destination, this time with a proper TLS connection. This creates an illusion that the mail was transmitted securely.</p>
<p>As a receiver of this mail, you can&#8217;t be sure who&#8217;s on the other side without checking the client&#8217;s certificate. A lot of clients won&#8217;t supply a certificate, though (my own server included, more about this below).</p>
<p>As for sending emails, an eavesdropping server might pretend to be the destination (possibly by DNS poisoning of the MX record). In the simplest man-in-the-middle attack, the eavesdropper doesn&#8217;t allow STARTTLS, and the message is transmitted in cleartext. If someone bothers to look in the mail server&#8217;s logs, this can be detected. Alternatively, the eavesdropper might suggest STARTTLS and offer an invalid certificate. For example, a self-signed certificate might seem like an innocent mistake. If the sending server agrees to send the email nevertheless, the attack is successful (but with a &#8220;verify=FAIL&#8221; in the logs, allowing the attack to be spotted, assuming verifications are usually successful).</p>
<p>So to be really secure, all mail servers must insist on TLS and verify each other&#8217;s certificates, or else the mail doesn&#8217;t go through. At present, going down this path with a public mail server means a lot of undelivered mails (from legit sources). In particular, insisting that the sender of the email offers a valid certificate is not going to end well.</p>
<h3>The situation before I made any changes</h3>
<p>With the default configuration, sendmail has no certificates available for use. By no certificates, I mean no certificates to identify itself, but also no root certificates that allow verifying other servers.</p>
<p>This doesn&#8217;t prevent my server from connecting to other servers with TLS for outbound mails. In the mail log, the relevant entry looks like this:</p>
<pre>sm-mta: 6237AaWA022257: from=&lt;my@address.com&gt;, size=2503, class=0, nrcpts=1, msgid=&lt;f11f0@address.com&gt;, bodytype=8BITMIME, proto=ESMTP, daemon=IPv4-port-587, relay=localhost.localdomain [127.0.0.1]
sm-mta: 6237AaWA022257: Milter insert (1): header: DKIM-Signature: <span class="yadayada">[ ... ]</span>
sm-mta: <span class="punch">STARTTLS=client</span>, relay=mx.other.com., version=TLSv1/SSLv3, <span class="punch">verify=FAIL</span>, cipher=ECDHE-RSA-AES128-GCM-SHA256, bits=128/128
sm-mta: 6237AaWA022257: to=&lt;friend@other.com&gt;, ctladdr=&lt;my@address.com&gt; (1000/1000), delay=00:00:03, xdelay=00:00:03, mailer=esmtp, pri=122503, relay=mx.other.com. [128.112.34.45], dsn=2.0.0, stat=Sent (Ok: queued as )</pre>
<p>Note the &#8220;verify=FAIL&#8221; on the third row, discussed at length below. For sending email, the only drawback of not setting up anything encryption-related is that the remote server&#8217;s identity isn&#8217;t verified. Plus, my server didn&#8217;t send a certificate if it was asked to do so, but that&#8217;s quite usual.</p>
<p>So to the server receiving the email, everything is normal. My server, acting as a client, upgraded the connection to encrypted with STARTTLS, and went through with it. No problem at all.</p>
<h3>Should I install root certificates?</h3>
<p>In order to allow my server to prevent a man-in-the-middle attack for outbound email, I can install root certificates and make them available to sendmail by virtue of configuration parameters. I also need to configure sendmail to refuse anything other than a TLS session with a verified server. Otherwise, it&#8217;s pointless, see above.</p>
<p>At the very least, I will need to update the root certificates often enough, so that new root certificates that are generally accepted are recognized by sendmail, and that expired root certificates are replaced by new ones.</p>
<p>And even if I do everything right, some mails will probably not go through because the destination mail server isn&#8217;t configured correctly. Or doesn&#8217;t support STARTTLS at all, just as my own didn&#8217;t, before the changes I describe here.</p>
<p>So clearly, no root certificates on my server. I want all certificate verifications to fail, so that if I mistakenly enable some kind of enforcement, all deliveries will fail at once, and not one isolated case a couple of weeks after making the mistake.</p>
<h3>Now to the practical part</h3>
<p>In the existing installation, /etc/mail is where sendmail keeps its configuration file. So I created the /etc/mail/certs/ directory, and populated it with three files:</p>
<ul>
<li>my.pem: The certificate obtained from Let&#8217;s Encrypt.</li>
<li>my.key: The secret key for which this certificate is made (or to be really accurate, the certificate is made for the public key that pairs with this key). Readable by root only (for security and to make sendmail happy).</li>
<li>ca.pem: The intermediate certificate which completes the trust chain from my.pem to the root certificate (Let&#8217;s Encrypt&#8217;s entire chain consists of just three certificates, root included).</li>
</ul>
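<p>As a sketch of the file layout and permissions (the commands here are my own illustration, using a scratch directory and empty placeholder files so they can run anywhere; on the real server, the directory is /etc/mail/certs and the files come from Let&#8217;s Encrypt):</p>

```shell
# Prepare a certs directory the way sendmail likes it. The demo runs against a
# scratch directory with empty placeholder files standing in for the real ones.
CERTDIR="$(mktemp -d)/certs"
mkdir -p "$CERTDIR"
chmod 755 "$CERTDIR"
touch "$CERTDIR/my.pem" "$CERTDIR/ca.pem" "$CERTDIR/my.key"  # placeholders
# The certificates are public, and may be world-readable:
chmod 644 "$CERTDIR/my.pem" "$CERTDIR/ca.pem"
# The secret key must be readable by its owner (root) only, or sendmail complains:
chmod 600 "$CERTDIR/my.key"
ls -l "$CERTDIR"
```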
<p>I was a bit sloppy in the description for my.pem, because I use <a rel="noopener" href="https://gitlab.com/sinclair2/bacme" target="_blank">bacme</a> to obtain the certificate from Let&#8217;s Encrypt, and that script gives me a certificate file that contains both my.pem and ca.pem, concatenated. The script that separates these two into separate files is shown further below.</p>
<p>And then I added these rows to sendmail.mc:</p>
<pre>define(`confCACERT_PATH', `/etc/mail/certs')dnl
define(`confCACERT', `/etc/mail/certs/ca.pem')dnl
define(`confSERVER_CERT', `/etc/mail/certs/my.pem')dnl
define(`confSERVER_KEY', `/etc/mail/certs/my.key')dnl</pre>
<p>Note that with this setting, the server doesn&#8217;t supply any certificate when acting as a client (i.e. when submitting an outbound email), even if asked for it. To enable this, the confCLIENT_CERT and confCLIENT_KEY options need to be assigned. These options should not be used with Let&#8217;s Encrypt&#8217;s certificates, as discussed below.</p>
<p>Actually, my initial attempt was to add only the last two rows, defining confSERVER_CERT and confSERVER_KEY. As the certificate file I get from my renewal utility already contains both my own and the intermediate certificates, why not give sendmail only this combo file and forget about CAs? I mean, this is how I do it with Apache&#8217;s web server!</p>
<p>That idea failed royally in two different ways:</p>
<ul>
<li>If confCACERT_PATH and confCACERT aren&#8217;t defined, sendmail will start, but won&#8217;t activate the STARTTLS option. Actually, even if both are defined as above, and ca.pem is empty, no STARTTLS. Raising the loglevel with this definition allowed sendmail to complain specifically about this:
<pre>define(`confLOG_LEVEL', `14')dnl</pre>
</li>
<li>When I put an irrelevant certificate in ca.pem, sendmail activated STARTTLS, ignored ca.pem (which is fine, it doesn&#8217;t help) but presented only the first certificate in my.pem to the client connecting.</li>
</ul>
<p>To put it simply, sendmail wants my own certificate in my.pem and the CA&#8217;s certificate(s) in ca.pem, and kindly asks me not to fool around. And I have no problem with this, as the intermediate certificate can&#8217;t be used by my server to verify other servers&#8217; certificates. So adding it doesn&#8217;t work against my decision that all attempts to verify certificates by my server will fail.</p>
<p>But this arrangement requires writing a little script to separate the certificates, which is listed below. As far as I understand, those using the mainstream certbot don&#8217;t have this problem, as it generates separate files.</p>
<h3>Reloading the daemon</h3>
<p>It says everywhere that sendmail must be restarted after updating the certificates. Even though I have the impression that sendmail is designed to shut itself down properly and safely in response to a SIGTERM, and even more importantly, that it&#8217;s designed not to lose or duplicate any mails, I didn&#8217;t fancy the idea of sending the server the signal that is usually used when shutting down the entire computer.</p>
<p>Instead, it&#8217;s possible to send the server a SIGHUP, which makes the server reload its configuration files (except for sendmail.conf, I read somewhere?) and of course the certificates. It&#8217;s actually a quick shutdown and restart, so maybe the difference isn&#8217;t so great, but reloading is the way it was meant to be. And it&#8217;s easily done with this command:</p>
<pre># /etc/init.d/sendmail reload &gt; /dev/null</pre>
<p>Redirection to /dev/null silences output (suitable for cron jobs).</p>
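<p>For completeness, this is how such a cron job could look as a crontab entry (the schedule and the renewal script name are hypothetical, not from my actual setup):</p>

```
# Hypothetical /etc/crontab entries: renew the certificate monthly, and
# reload sendmail a while later so it picks up the new files.
20 4 1 * * root /usr/local/bin/renew-certs.sh > /dev/null
40 4 1 * * root /etc/init.d/sendmail reload > /dev/null
```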
<h3>It works!</h3>
<p>There are two ways to see that mails actually arrive with TLS. One is through the mail logs:</p>
<pre>sm-mta: <span class="punch">STARTTLS=server</span>, relay=mail-lj1-f180.google.com [209.85.208.180], version=TLSv1/SSLv3, <span class="punch">verify=FAIL</span>, cipher=ECDHE-RSA-AES128-GCM-SHA256, bits=128/128</pre>
<p>&#8220;STARTTLS=server&#8221; in the log indicates that a client has connected with STARTTLS for inbound mail. As a reminder from above, if it says &#8220;STARTTLS=client&#8221;, it&#8217;s my server that has connected to another one for outbound mail. And along with that, sendmail tells us how the verification of the other side went.</p>
<p>Even easier, the TLS session leaves its tracks in the relevant Received: header in the mail itself:</p>
<pre>Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com
 [209.85.208.180])	by mx.server.com (8.14.4/8.14.4/Debian-8+deb8u2) with
 ESMTP id 6245JWdJ019568	(<span class="punch">version=TLSv1/SSLv3</span>
 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 <span class="punch">verify=FAIL</span>)	for
 &lt;me@server.com&gt;; Wed, 4 Mar 2026 12:19:34 GMT</pre>
<p>No need to be alarmed about the verify=FAIL part. It just indicates that my server failed to verify the certificates that Gmail sent, which is quite natural, as it has no root certificates to go with. Which, as mentioned above, is intentional. See below for more about the verify=FAIL thing.</p>
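<p>For a quick overview of how verifications go over time, the verify= outcomes in the log can be tallied. This is my own sketch; the demo below runs on two canned log lines, but on the server, the input would be the mail log itself (e.g. /var/log/mail.log):</p>

```shell
# Tally verify= outcomes. The two printf lines are canned samples standing in
# for the real mail log.
counts="$(printf '%s\n' \
  'sm-mta: STARTTLS=server, relay=mail-lj1-f180.google.com, verify=FAIL' \
  'sm-mta: STARTTLS=client, relay=mx.other.com., verify=NO' |
  grep -o 'verify=[A-Z]*' | sort | uniq -c)"
echo "$counts"
```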
<h3>If something goes wrong&#8230;</h3>
<p>In order to obtain debug messages, increase the log level to 14 by adding this to sendmail.mc (and compile with make).</p>
<pre>define(`confLOG_LEVEL', `14')dnl</pre>
<p>This is the log output after a reload (with SIGHUP), ending with STARTTLS working fine:</p>
<pre>sm-mta: restarting /usr/sbin/sendmail-mta due to signal
sm-mta: error: safesasl(/etc/sasl2/Sendmail.conf) failed: No such file or directory
sm-mta: error: safesasl(/etc/sasl/Sendmail.conf) failed: No such file or directory
sm-mta: starting daemon (8.14.4): SMTP+queueing@00:10:00
sm-mta: STARTTLS: CRLFile missing
sm-mta: STARTTLS=server, Diffie-Hellman init, key=1024 bit (1)
sm-mta: <span class="punch">STARTTLS=server, init=1</span>
sm-mta started as: /usr/sbin/sendmail-mta -Am -L sm-mta -bd -q10m</pre>
<p>Besides the fact that there are no errors related to STARTTLS, it says init=1, which is the indication that it works.</p>
<p>The complaints about files missing in /etc/sasl2/ can be ignored, as I don&#8217;t use any kind of authentication (i.e. asking the client for credentials).</p>
<h3>Checking the server for real</h3>
<p>The easiest way to test the mail server is with <a rel="noopener" href="https://www.checktls.com/TestReceiver" target="_blank">CheckTLS</a>.</p>
<p>This tool connects to the server and attempts to start a STARTTLS session (but doesn&#8217;t send a mail). The tool shows the session with lots of detail, along with clarifications on the meaning of those details. So there&#8217;s no need to know all the technicalities to get an idea of how you&#8217;re doing. If CheckTLS says all is fine, it&#8217;s really fine. If it says otherwise, take the remarks seriously (but never mind if MTA-STS and DANE aren&#8217;t tested and marked yellow).</p>
<p>But even more importantly, CheckTLS gives the details of the certificates that the server provides. This is a good way to verify that the correct certificates are used, and that the certificate chain is valid.</p>
<p>Note however that this only checks how the server behaves when receiving emails. So if something is wrong with how the server sends email, for example it offers a problematic client certificate, that will go undetected.</p>
<p>For a more do-it-yourself approach, the first thing is to check that the server offers STARTTLS:</p>
<pre>$ <strong>nc localhost 25</strong>
220 mx.server.com ESMTP MTA; Wed, 4 Mar 2026 13:14:20 GMT
<strong>EHLO there.com</strong>
250-mx.server.com Hello localhost.localdomain [127.0.0.1], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
<span class="punch">250-STARTTLS</span>
250-DELIVERBY
250 HELP
^C</pre>
<p>It&#8217;s easier to do this on the local machine, because some ISP block connections to port 25 (to avoid spamming from their IP addresses). Note that it&#8217;s necessary to type an &#8220;EHLO&#8221; command to get the server saying something.</p>
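<p>This manual check is easy to script as well. The snippet below parses a canned EHLO response for the STARTTLS capability; against a live server, the response would be captured from port 25 instead (e.g. by piping the EHLO command through nc, as above). The pipe and variable names are my own illustration:</p>

```shell
# Look for the STARTTLS capability in an EHLO response. The response here is
# canned; on a live server it would be read from port 25.
response='250-mx.server.com Hello localhost.localdomain, pleased to meet you
250-8BITMIME
250-STARTTLS
250 HELP'
if printf '%s\n' "$response" | grep -q '^250[- ]STARTTLS'; then
  starttls=yes
else
  starttls=no
fi
echo "STARTTLS offered: $starttls"
```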
<p>All fine? Connect to the server (see &#8220;man s_client&#8221;):</p>
<pre>$ <strong>openssl s_client -connect localhost:25 -starttls smtp &lt; /dev/null</strong>
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 C = US, O = Let's Encrypt, CN = R12
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = mx.server.com
verify return:1
---
<span class="punch">Certificate chain</span>
<span class="punch"> 0 s:CN = mx.server.com</span>
<span class="punch">   i:C = US, O = Let's Encrypt, CN = R12</span>
<span class="punch"> 1 s:C = US, O = Let's Encrypt, CN = R12</span>
<span class="punch">   i:C = US, O = Internet Security Research Group, CN = ISRG Root</span> X1

<span class="yadayada">[ ... ]</span></pre>
<p>And then a lot of mumbo-jumbo follows. The certificates sent by the server are listed in the &#8220;Certificate chain&#8221; section. The part above it shows openssl&#8217;s attempts to validate the certification chain. In the case shown above, there was no root certificate available on the server, hence the &#8220;unable to get local issuer certificate&#8221; error.</p>
<p>The reason I focus on the &#8220;Certificate chain&#8221; section is that if a root certificate is present on the computer that runs openssl, it is listed in the section above it. It will then look like this:</p>
<pre>CONNECTED(00000005)
<span class="punch">depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1</span>
<span class="punch">verify return:1</span>
depth=1 C = US, O = Let's Encrypt, CN = R12
verify return:1
depth=0 CN = mx.server.com
verify return:1
---
Certificate chain
 0 s:CN = mx.server.com
   i:C = US, O = Let's Encrypt, CN = R12
 1 s:C = US, O = Let's Encrypt, CN = R12
   i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
---</pre>
<p>So openssl is happy now, but it lists the root certificate which it picked from its own repository. That is a bit confusing when checking the server.</p>
<p>Notes:</p>
<ul>
<li>The exact same test can be run on port 587, if it&#8217;s open for connections.</li>
<li>The reason for the &lt;/dev/null part is that openssl actually opens a netcat-like session, so /dev/null terminates the session right away.</li>
<li>The number in parentheses after CONNECTED is just the file descriptor number of the TCP socket. Nothing really interesting, even though it happens to be different when the certificate chain is validated and when it&#8217;s not.</li>
<li>Add the -showcerts flag to dump all certificates that the server sent, not just the first one. Certificates from the local machine that help validation but weren&#8217;t sent by the server are not dumped. For example:
<pre>openssl s_client -connect localhost:25 -starttls smtp -showcerts &lt; /dev/null</pre>
<p>As shown on <a rel="noopener" href="https://billauer.se/blog/2021/04/certificate-ca-tutorial-primer/" target="_blank">a different post of mine</a>, this allows examining them closer by copying <strong>each certificate to a separate file</strong>, and going</p>
<pre>openssl x509 -in thecertificate.crt -text</pre>
</li>
<li>If the verification of the certificate fails, openssl establishes the session regardless. The status of this verification is indicated on the output line saying &#8220;Verify return code: 0 (ok)&#8221; for a successful verification, and another number code and message otherwise (this part isn&#8217;t shown above). It&#8217;s also possible to add the -verify_return_error flag, which causes openssl to abort the session if the certificate chain verification fails, and indicate that with an error exit code.</li>
</ul>
<h3>verify=FAIL? Is that a problem?</h3>
<p>The short answer is, no (in my case).</p>
<p>For the longer answer, let&#8217;s start with a brief explanation on the establishment of the TLS session. Encrypted mail delivery, regardless of whether it&#8217;s initiated with a STARTTLS on port 25 or by connecting to port 587, is based upon the <a rel="noopener" href="https://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank">TLS protocol</a>. This is exactly the same protocol used by web browsers when they establish an https connection.</p>
<p>According to the TLS protocol, the server (as in client / server) is <strong>required</strong> to send its certificate (along with intermediate certificates) in order to prove that it&#8217;s indeed the server of the domain requested (domain as in example.com). The purpose is to avoid a man-in-the-middle attack.</p>
<p>The server <strong>may</strong> request the client to send its certificate (if it has any) in order to identify itself. This might seem odd in a web browser / server connection, but is supported by the commonly used browsers: It&#8217;s possible to supply them with a client certificate, which may be used on websites that request such.</p>
<p>This is why the practical step for making a mail server support STARTTLS is to supply it with a certificate. This certificate is part of the TLS handshake. The thing is that it&#8217;s verified by the client, not our server. So we don&#8217;t know if the verification was successful or not. The fact that the TLS connection was established doesn&#8217;t mean that the client was happy with the certificate it got. It might have continued setting up the encrypted link regardless. To compare with web browsers, they issue a scary warning when they fail to verify the web server&#8217;s certificate. But that&#8217;s how it is today. In the distant past, the common reaction to an unverified certificate was a slightly different icon in the browser&#8217;s address bar, nothing more.</p>
<p>Likewise, a mail client is likely to continue the TLS handshake if the verification of the server&#8217;s certificate fails. This is a reasonable choice in particular if the client initiated a STARTTLS session, but would have sent the email in cleartext if this option wasn&#8217;t available: An encrypted session with the possibility of a man-in-the-middle attack is better than sending the mail in cleartext, might be a way to think about it. And when looking at tutorials on the Internet from the early 2000&#8242;s on how to set up a mail server, it&#8217;s quite often suggested to use self-signed certificates, which can&#8217;t be validated no matter what. This was a reasonable idea before the Let&#8217;s Encrypt era, when a certificate was an expensive thing.</p>
<p>It&#8217;s somewhat amusing that the TLS protocol doesn&#8217;t include a mechanism for the client to say &#8220;listen, your certificate stinks, but I&#8217;ll continue anyhow&#8221;. It could have been useful for server maintainers, but I suppose crypto people didn&#8217;t even want to think about this possibility.</p>
<p>Now to the point: The &#8220;verify&#8221; part in the mail log (as well as in the related Received header row in arriving mails) indicates the result of verifying the other side&#8217;s certificate, if such was requested.</p>
<p>As for the meaning of this attribute, here&#8217;s a copy-paste from the relevant part in the README file mentioned above:</p>
<pre>${verify} holds the result of the verification of the presented cert.
	Possible values are:
	OK	 verification succeeded.
	NO	 no cert presented.
	NOT	 no cert requested.
	FAIL	 cert presented but could not be verified,
		 e.g., the cert of the signing CA is missing.
	NONE	 STARTTLS has not been performed.
	TEMP	 temporary error occurred.
	PROTOCOL protocol error occurred (SMTP level).
	SOFTWARE STARTTLS handshake failed.</pre>
<p>Recall from above that I deliberately didn&#8217;t give my mail server any root certificates, with the intention that all verifications of certificates will fail.</p>
<p>It&#8217;s important to distinguish between two cases:</p>
<ul>
<li>Outbound emails, my server acting as a client, STARTTLS=client in the log: In this case, the certificate is required by the TLS protocol. Had I provided the mail server with root certificates, anything but verify=OK would have indicated a problem, and could have been a reason to terminate the TLS session. As I&#8217;ve already mentioned, the reason I didn&#8217;t provide the root certificates is to ensure that my server won&#8217;t be this picky.</li>
<li>Inbound emails, my server acting as server, STARTTLS=server in the log: The certificate <strong>isn&#8217;t required</strong> to establish the TLS connection, but the server is allowed to ask for it.</li>
</ul>
<p>So there are two normal options:</p>
<ul>
<li>verify=FAIL, meaning that my side asked for a certificate (as a client, in order to establish a TLS session, or as a server, because it requested one), the other side submitted something in response, and the verification failed, which is normal when you don&#8217;t have a root certificate.</li>
<li>verify=NO, meaning that the other side didn&#8217;t submit any certificate.</li>
</ul>
<p>Had I supplied root certificates to my server, I should have seen verify=OK most of the time for STARTTLS=client, and possibly verify=NO for STARTTLS=server. Would this information help me? Would this help telling spam servers from legit ones? I doubt that.</p>
<p>It&#8217;s a bit questionable why my server asks for certificates it can&#8217;t verify in the STARTTLS=server case, but that&#8217;s the default setting, and I guess this is how normal servers behave. According to the README file, setting the confTLS_SRV_OPTIONS to &#8216;V&#8217; tells the server not to ask clients for certificates. Haven&#8217;t tried that personally, but I suppose one gets verify=<strong>NOT</strong> (as opposed to <strong>NO</strong>).</p>
<p>But what could be the point in asking the client for certificates? One possibility is that sendmail has an access database, which defines rules for who is allowed to talk with the server. It&#8217;s possible to add TLS-related rules that depend on a successful verification of the certificate, through the ${verify} macro, which contains the result of this verification (it should be &#8220;OK&#8221;). The rules can of course also depend on other attributes of the sender. This is more useful for relaying mails inside an organization. More about this in the README file.</p>
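<p>For illustration only (I haven&#8217;t used this myself), such rules are added to the access database with the TLS_Srv: and TLS_Clt: tags described in the README. Hypothetical entries, with example.com and example.net standing in for real domains, could look like this:</p>

```
# Hypothetical /etc/mail/access entries (syntax per the STARTTLS section of
# the sendmail README): require a verified certificate plus at least 112
# encryption bits from one client, and just 112-bit encryption towards a server.
TLS_Clt:example.com	VERIFY:112
TLS_Srv:example.net	ENCR:112
```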
<p>To summarize, both verify=FAIL and verify=NO indicate no problem in my case, because the server has no way to validate the client&#8217;s certificate, and this validation isn&#8217;t necessary anyhow in my setting, as the server isn&#8217;t configured to refuse unverified partners by default, and I surely didn&#8217;t change that. It might have been a bit more elegant to set the confTLS_SRV_OPTIONS option to &#8216;V&#8217; to spare clients the pointless task of sending a certificate that isn&#8217;t checked, but I stayed with the default configuration. I try to make as few changes as possible.</p>
<h3>Should my server provide a certificate as a client?</h3>
<p>Short answer, no, because Let&#8217;s Encrypt&#8217;s certificates <a rel="noopener" href="https://letsencrypt.org/2025/05/14/ending-tls-client-authentication" target="_blank">don&#8217;t support TLS client authentication</a> anymore.</p>
<p>Besides, this feature isn&#8217;t required, and opens the possibility that weird things might happen if the other side doesn&#8217;t manage to verify my certificate for some reason. One could argue that sending certificates improves my server&#8217;s position as a non-spammer, and that it makes my server behave a bit more like Google&#8217;s. However, I searched the web for indications that this would improve the spam score, and found none. So even if I had a certificate that allows client authentication, I wouldn&#8217;t use it. And the fact that Let&#8217;s Encrypt phased it out proves the point: I would most likely not have noticed the change, and my server would have sent an inadequate certificate, and then go figure why my mails aren&#8217;t delivered.</p>
<p>That said, it&#8217;s <a rel="noopener" href="https://knowledge.workspace.google.com/admin/gmail/advanced/send-email-over-a-secure-tls-connection" target="_blank">possible to configure Gmail</a> (for business?) to accept mails only from servers that have presented a valid certificate.</p>
<p>Let&#8217;s get a bit technical about this: This is taken from the output of &#8220;openssl x509 -in thecertificate.crt -text&#8221; of a Let&#8217;s Encrypt certificate, issued in March 2026:</p>
<pre>        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                <span class="punch">TLS Web Server Authentication</span>
            X509v3 Basic Constraints: critical
                CA:FALSE</pre>
<p>Compare with the same part on a certificate from Gmail, when acting as a server:</p>
<pre>        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                <span class="punch">TLS Web Server Authentication</span>
            X509v3 Basic Constraints: critical
                CA:FALSE</pre>
<p>So far, the same &#8220;Extended Key Usage&#8221;. But when Gmail&#8217;s server identifies as a client, the relevant part is this:</p>
<pre>        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                <span class="punch">TLS Web Server Authentication, TLS Web Client Authentication</span>
            X509v3 Basic Constraints: critical
                CA:FALSE</pre>
<p>So &#8220;TLS Web Client Authentication&#8221; is a thing, and it&#8217;s not wise to issue a certificate without this option when identifying as a client.</p>
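<p>The Extended Key Usage section can be extracted directly with openssl. The demo below generates a throwaway self-signed certificate with both usages, so it runs anywhere; for the real check, point openssl x509 at my.pem instead (the file names here are mine):</p>

```shell
# Generate a throwaway self-signed certificate carrying both EKUs, then print
# that section, just as one would for a real certificate file.
# The -addext flag requires openssl 1.1.1 or later.
TMP="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=mx.example.com" \
  -addext "extendedKeyUsage=serverAuth,clientAuth" \
  -keyout "$TMP/demo.key" -out "$TMP/demo.pem" 2>/dev/null
eku="$(openssl x509 -in "$TMP/demo.pem" -noout -ext extendedKeyUsage)"
echo "$eku"
```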
<p>Fun fact, I dug up the certificate for the same server from October 2023 from a backup. And indeed, client authentication was enabled:</p>
<pre>        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                <span class="punch">TLS Web Server Authentication, TLS Web Client Authentication</span>
            X509v3 Basic Constraints: critical
                CA:FALSE</pre>
<p>Not surprising, given the phase-out message from Let&#8217;s Encrypt from the link above. Otherwise, the old and new certificates are pretty much alike.</p>
<p>I&#8217;ve attached the printouts of both Google&#8217;s certificates below for reference. Anyhow, the main takeaways are:</p>
<ul>
<li>Even though it says &#8220;TLS Web Server Authentication&#8221;, it&#8217;s fine for mail servers. I would otherwise have thought that &#8220;Web Server&#8221; refers to https. So Let&#8217;s Encrypt&#8217;s certificates are really legit for a mail server.</li>
<li>When Gmail acts as a client, it indeed has the TLS Web Client Authentication option.</li>
<li>Don&#8217;t try using Let&#8217;s Encrypt&#8217;s certificates for client authentication.</li>
</ul>
<h3>The script splitting certificates</h3>
<p>I promised the script that splits the PEM certificate file obtained by bacme from Let&#8217;s Encrypt into my own certificate and the intermediate certificate. I do such things in Perl, so here it is:</p>
<pre><span class="hljs-comment">#!/usr/bin/perl</span>
<span class="hljs-keyword">use</span> warnings;
<span class="hljs-keyword">use</span> strict;

<span class="hljs-keyword">local</span> $/; <span class="hljs-comment"># Slurp mode</span>

<span class="hljs-keyword">my</span> $cert = &lt;&gt;;

<span class="hljs-keyword">my</span> @chunks = ($cert =~ <span class="hljs-regexp">/(-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----)/gs</span>);

<span class="hljs-keyword">my</span> $found = @chunks;

<span class="hljs-keyword">die</span>(<span class="hljs-string">"$0: Expected to find two certificates, found $found instead.\n"</span>)
  <span class="hljs-keyword">unless</span> ($found == <span class="hljs-number">2</span>);

writefile(<span class="hljs-string">"my.pem"</span>, <span class="hljs-string">"$chunks[0]\n"</span>);
writefile(<span class="hljs-string">"ca.pem"</span>, <span class="hljs-string">"$chunks[1]\n"</span>);

<span class="hljs-keyword">exit</span>(<span class="hljs-number">0</span>);

<span class="hljs-function"><span class="hljs-keyword">sub</span> <span class="hljs-title">writefile</span> </span>{
  <span class="hljs-keyword">my</span> ($fname, $data) = @_;

  <span class="hljs-keyword">open</span>(<span class="hljs-keyword">my</span> $out, <span class="hljs-string">"&gt;"</span>, $fname)
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">die</span> <span class="hljs-string">"Can't open \"$fname\" for write: $!\n"</span>;
  <span class="hljs-keyword">print</span> $out $data;
  <span class="hljs-keyword">close</span> $out;
}</pre>
<p>I run this as a regular user, not root, which is why I&#8217;m relatively sloppy, just writing out a couple of files in the current directory. The hardcoded filenames make this rather safe anyhow.</p>
<p>This script is given the combined PEM file through standard input (or with the file name as the first argument), and emits the two PEM files to the current directory. It&#8217;s deliberately unsophisticated, and deliberately very picky about the existence of exactly two certificates in the input, so I get notified if something in Let&#8217;s Encrypt&#8217;s settings suddenly changes.</p>
<p>For example, in the old Let&#8217;s Encrypt certificate from 2023 I mentioned above, there were three certificates. The third certificate affirmed ISRG Root X1 with the help of DST Root CA X3, the latter being considered the root certificate at the time. The former is nowadays an established root certificate by itself, hence a third certificate is unnecessary. But this can change, and if it does, I suppose the solution will be to concatenate everything but the first certificate into ca.pem. And if that happens, I want to be aware of the change and verify that the server gives the correct intermediate certificates.</p>
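<p>If it comes to that, the concatenate-the-rest fallback can be sketched in shell with awk. The sample input below is synthetic (only the BEGIN markers matter for the split), and unlike the Perl script above, there&#8217;s no sanity check on the certificate count:</p>

```shell
# Sample combined PEM with dummy payloads; a real file comes from bacme.
printf -- '-----BEGIN CERTIFICATE-----\nAAA\n-----END CERTIFICATE-----\n' > combined.pem
printf -- '-----BEGIN CERTIFICATE-----\nBBB\n-----END CERTIFICATE-----\n' >> combined.pem
printf -- '-----BEGIN CERTIFICATE-----\nCCC\n-----END CERTIFICATE-----\n' >> combined.pem
# First certificate goes to my.pem, everything else to ca.pem.
awk '/-----BEGIN CERTIFICATE-----/ { n++ }
     { print > (n == 1 ? "my.pem" : "ca.pem") }' combined.pem
```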
<h3>Summary</h3>
<p>After all is said and done, I could have just split the certificate from Let&#8217;s Encrypt into two as shown above, added the sendmail configuration options mentioned in tutorials everywhere, and everything would have been just fine. And had I used certbot, like everyone else, I would have had the ready-to-use certificate files directly.</p>
<p>As it turns out, there was no real need to delve into the details. Sendmail does the right thing anyhow. But understanding what&#8217;s going on under the hood is still better, and worth the effort, I think. In particular with a crucial component like sendmail.</p>
<h3><span class="yadayada">Appendix: The Gmail&#8217;s client certificate</span></h3>
<p><span class="yadayada">As it was quite difficult to obtain this certificate (running tcpdump on my server, feeding the result to Wireshark, exporting the certificate as raw bytes and opening the file with openssl as DER), I thought I&#8217;d show its printout:</span></p>
<pre><span class="yadayada">    Data:
        Version: 3 (0x2)
        Serial Number:
            2b:64:a8:5a:82:a3:d2:c2:10:5b:9b:25:ab:75:c1:af
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Google Trust Services, CN = WR4
        Validity
            Not Before: Feb 10 17:58:28 2026 GMT
            Not After : May 11 17:58:27 2026 GMT
        Subject: CN = </span><span class="punch">smtp.gmail.com</span><span class="yadayada">
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:9e:75:cf:b1:84:c9:a8:f2:bb:c8:89:fe:ef:09:
                    ad:71:d7:2a:1e:e2:b0:51:e2:0b:d5:b9:a7:52:70:
                    e8:c1:ff:5b:60:b6:7c:65:c0:b1:8b:90:cb:cd:ab:
                    0c:da:ef:10:8f:17:79:ed:a5:b9:95:57:f2:28:f2:
                    da:3d:d3:1d:ed:03:a2:6f:88:da:7f:0c:cc:b9:f4:
                    f6:44:ac:bc:fa:95:62:c0:7b:31:8d:44:9c:3f:bf:
                    cf:05:66:8b:a2:7d:9a:dd:af:2b:dc:05:16:b8:37:
                    3c:1f:c5:23:9f:4d:2b:15:a4:97:87:ab:a7:70:3a:
                    4a:5d:2a:8d:d4:21:1a:68:48:da:74:89:6e:1a:27:
                    2f:ef:06:4b:38:b5:65:5f:c4:da:49:96:c5:4e:9f:
                    78:7f:cb:2b:6a:61:ff:f7:0f:f6:f3:d4:d0:7d:94:
                    84:a8:0c:21:8a:a2:a4:20:04:f7:83:ac:00:83:85:
                    eb:9e:01:7a:ea:a5:2a:b9:89:3b:ad:94:2d:c4:c1:
                    2f:49:86:17:52:f7:85:1a:97:76:9d:2f:cf:c4:20:
                    a3:9c:c5:7b:74:57:28:f2:35:d8:ab:fa:d8:53:b9:
                    ee:c9:24:cb:f3:aa:d4:0b:f9:1a:8e:3d:b9:ad:16:
                    7c:99:7c:40:ef:3f:25:5a:c7:94:87:e8:20:bb:19:
                    92:6f
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                </span><span class="punch">TLS Web Server Authentication, TLS Web Client Authentication</span><span class="yadayada">
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                65:7C:AF:FE:54:FD:A3:0A:53:90:AB:9A:94:E7:AD:DF:DC:B9:8B:58
            X509v3 Authority Key Identifier:
                keyid:9B:C8:11:BC:3D:AA:36:B9:31:8C:4E:8F:44:D5:57:32:2F:C3:C0:61

            Authority Information Access:
                OCSP - URI:http://o.pki.goog/s/wr4/K2Q
                CA Issuers - URI:http://i.pki.goog/wr4.crt

            X509v3 Subject Alternative Name:
                DNS:smtp.gmail.com
            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.1

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://c.pki.goog/wr4/F-WFK5nQurE.crl

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:
                                E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6
                    Timestamp : Feb 10 18:58:35.396 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:BE:28:85:4E:52:7D:B5:FC:0C:C7:FA:
                                26:98:AE:D5:C4:86:E1:E1:70:A6:6A:3C:CA:CE:9E:21:
                                17:27:D4:09:BE:02:21:00:89:B7:00:57:51:76:41:FB:
                                D3:73:9B:27:FA:E1:40:2F:51:E1:4F:14:D1:65:18:EE:
                                81:C7:7C:A1:60:BA:6A:BF
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : CB:38:F7:15:89:7C:84:A1:44:5F:5B:C1:DD:FB:C9:6E:
                                F2:9A:59:CD:47:0A:69:05:85:B0:CB:14:C3:14:58:E7
                    Timestamp : Feb 10 18:58:35.450 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:21:00:B0:B1:6E:A6:C2:1B:49:2A:28:2C:C9:
                                EC:AE:C6:F9:F4:EC:89:64:AC:88:6A:BE:08:86:09:36:
                                17:66:63:49:D0:02:20:5C:CE:E6:21:C3:21:88:15:E1:
                                D9:17:13:D6:0B:E3:F6:54:71:58:C9:55:9F:DA:14:63:
                                F8:69:F1:BC:DD:4B:32
    Signature Algorithm: sha256WithRSAEncryption
         8f:fa:cf:2b:ab:6a:66:07:2a:32:ae:15:39:c8:bf:a6:22:e1:
         b1:55:6d:1f:04:26:4b:34:54:fe:91:cd:61:92:6c:b1:2a:8b:
         47:81:28:84:ee:d1:b7:c2:fc:da:81:fd:74:c4:bf:6e:ba:f1:
         ef:b2:81:77:f1:0b:80:73:78:e1:86:1f:92:c8:92:a7:45:e6:
         26:93:4d:92:a2:2b:d2:02:db:1c:b8:81:4e:56:79:bc:4a:f6:
         8c:6c:f3:2a:a8:09:b2:5f:c2:74:bb:2d:74:0b:ea:3a:50:e7:
         dd:33:61:fa:ed:df:6c:ed:6e:ba:50:8c:54:9d:19:76:03:1b:
         56:7e:55:be:ee:3f:a3:c5:d6:ad:6b:fc:1b:43:ce:aa:50:52:
         af:f6:83:f0:38:f5:62:8d:0b:91:f3:72:f1:b7:10:64:1a:ca:
         02:97:8e:f9:13:a3:5d:1a:1b:ee:5d:01:dd:b0:48:f2:f3:30:
         cf:8d:6a:98:21:8d:83:23:38:c7:80:22:59:97:f0:45:76:fb:
         8c:a9:4e:f8:37:38:de:ba:4e:94:c5:1f:b1:d0:3c:87:69:11:
         ea:90:0d:75:72:82:5a:a3:c3:99:c6:e5:ce:57:05:ed:63:a9:
         2e:20:ab:b6:41:8c:53:e1:92:5c:55:de:bf:3b:d1:d3:ec:08:
         a8:87:9e:c0
-----BEGIN CERTIFICATE-----
MIIFMjCCBBqgAwIBAgIQK2SoWoKj0sIQW5slq3XBrzANBgkqhkiG9w0BAQsFADA7
MQswCQYDVQQGEwJVUzEeMBwGA1UEChMVR29vZ2xlIFRydXN0IFNlcnZpY2VzMQww
CgYDVQQDEwNXUjQwHhcNMjYwMjEwMTc1ODI4WhcNMjYwNTExMTc1ODI3WjAZMRcw
FQYDVQQDEw5zbXRwLmdtYWlsLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAJ51z7GEyajyu8iJ/u8JrXHXKh7isFHiC9W5p1Jw6MH/W2C2fGXAsYuQ
y82rDNrvEI8Xee2luZVX8ijy2j3THe0Dom+I2n8MzLn09kSsvPqVYsB7MY1EnD+/
zwVmi6J9mt2vK9wFFrg3PB/FI59NKxWkl4erp3A6Sl0qjdQhGmhI2nSJbhonL+8G
Szi1ZV/E2kmWxU6feH/LK2ph//cP9vPU0H2UhKgMIYqipCAE94OsAIOF654Beuql
KrmJO62ULcTBL0mGF1L3hRqXdp0vz8Qgo5zFe3RXKPI12Kv62FO57skky/Oq1Av5
Go49ua0WfJl8QO8/JVrHlIfoILsZkm8CAwEAAaOCAlIwggJOMA4GA1UdDwEB/wQE
AwIFoDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDAYDVR0TAQH/BAIw
ADAdBgNVHQ4EFgQUZXyv/lT9owpTkKualOet39y5i1gwHwYDVR0jBBgwFoAUm8gR
vD2qNrkxjE6PRNVXMi/DwGEwXgYIKwYBBQUHAQEEUjBQMCcGCCsGAQUFBzABhhto
dHRwOi8vby5wa2kuZ29vZy9zL3dyNC9LMlEwJQYIKwYBBQUHMAKGGWh0dHA6Ly9p
LnBraS5nb29nL3dyNC5jcnQwGQYDVR0RBBIwEIIOc210cC5nbWFpbC5jb20wEwYD
VR0gBAwwCjAIBgZngQwBAgEwNgYDVR0fBC8wLTAroCmgJ4YlaHR0cDovL2MucGtp
Lmdvb2cvd3I0L0YtV0ZLNW5RdXJFLmNybDCCAQUGCisGAQQB1nkCBAIEgfYEgfMA
8QB3AJaXZL9VWJet90OHaDcIQnfp8DrV9qTzNm5GpD8PyqnGAAABnEjrcQQAAAQD
AEgwRgIhAL4ohU5SfbX8DMf6Jpiu1cSG4eFwpmo8ys6eIRcn1Am+AiEAibcAV1F2
QfvTc5sn+uFAL1HhTxTRZRjugcd8oWC6ar8AdgDLOPcViXyEoURfW8Hd+8lu8ppZ
zUcKaQWFsMsUwxRY5wAAAZxI63E6AAAEAwBHMEUCIQCwsW6mwhtJKigsyeyuxvn0
7IlkrIhqvgiGCTYXZmNJ0AIgXM7mIcMhiBXh2RcT1gvj9lRxWMlVn9oUY/hp8bzd
SzIwDQYJKoZIhvcNAQELBQADggEBAI/6zyuramYHKjKuFTnIv6Yi4bFVbR8EJks0
VP6RzWGSbLEqi0eBKITu0bfC/NqB/XTEv2668e+ygXfxC4BzeOGGH5LIkqdF5iaT
TZKiK9IC2xy4gU5WebxK9oxs8yqoCbJfwnS7LXQL6jpQ590zYfrt32ztbrpQjFSd
GXYDG1Z+Vb7uP6PF1q1r/BtDzqpQUq/2g/A49WKNC5HzcvG3EGQaygKXjvkTo10a
G+5dAd2wSPLzMM+NapghjYMjOMeAIlmX8EV2+4ypTvg3ON66TpTFH7HQPIdpEeqQ
DXVyglqjw5nG5c5XBe1jqS4gq7ZBjFPhklxV3r870dPsCKiHnsA=
-----END CERTIFICATE-----</span></pre>
<p><span class="yadayada">For comparison, this is the certificate obtained on the same day, when Gmail responded as a server:</span></p>
<pre><span class="yadayada">    Data:
        Version: 3 (0x2)
        Serial Number:
            e9:b6:68:79:fa:91:bf:49:10:8d:b9:1e:cc:e8:63:b0
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Google Trust Services, CN = WR2
        Validity
            Not Before: Feb  2 08:37:58 2026 GMT
            Not After : Apr 27 08:37:57 2026 GMT
        Subject: CN = </span><span class="punch">mx.google.com</span><span class="yadayada">
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:55:ba:49:43:8f:d9:72:9d:f9:d0:fa:1c:76:ec:
                    73:44:39:69:e7:21:68:49:1f:d0:0e:c4:70:bb:1f:
                    61:15:71:58:a7:44:df:bd:9f:d5:f6:e9:d1:8a:77:
                    73:79:ac:82:e7:30:88:53:95:62:ff:f3:cd:32:71:
                    9e:68:21:a7:62
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                </span><span class="punch">TLS Web Server Authentication</span><span class="yadayada">
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                51:B6:13:35:D8:FB:85:27:72:70:77:EE:D7:5B:1D:06:5E:63:FD:51
            X509v3 Authority Key Identifier:
                keyid:DE:1B:1E:ED:79:15:D4:3E:37:24:C3:21:BB:EC:34:39:6D:42:B2:30

            Authority Information Access:
                OCSP - URI:http://o.pki.goog/wr2
                CA Issuers - URI:http://i.pki.goog/wr2.crt

            X509v3 Subject Alternative Name:
</span><span class="punch">                DNS:mx.google.com, DNS:smtp.google.com, DNS:aspmx.l.google.com, DNS:alt1.aspmx.l.google.com, DNS:alt2.aspmx.l.google.com, DNS:alt3.aspmx.l.google.com, DNS:alt4.aspmx.l.google.com, DNS:gmail-smtp-in.l.google.com, DNS:alt1.gmail-smtp-in.l.google.com, DNS:alt2.gmail-smtp-in.l.google.com, DNS:alt3.gmail-smtp-in.l.google.com, DNS:alt4.gmail-smtp-in.l.google.com, DNS:gmr-smtp-in.l.google.com, DNS:alt1.gmr-smtp-in.l.google.com, DNS:alt2.gmr-smtp-in.l.google.com, DNS:alt3.gmr-smtp-in.l.google.com, DNS:alt4.gmr-smtp-in.l.google.com, DNS:mx1.smtp.goog, DNS:mx2.smtp.goog, DNS:mx3.smtp.goog, DNS:mx4.smtp.goog, DNS:aspmx2.googlemail.com, DNS:aspmx3.googlemail.com, DNS:aspmx4.googlemail.com, DNS:aspmx5.googlemail.com, DNS:gmr-mx.google.com
</span><span class="yadayada">            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.1

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://c.pki.goog/wr2/oQ6nyr8F0m0.crl

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : D1:6E:A9:A5:68:07:7E:66:35:A0:3F:37:A5:DD:BC:03:
                                A5:3C:41:12:14:D4:88:18:F5:E9:31:B3:23:CB:95:04
                    Timestamp : Feb  2 09:38:00.395 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:20:21:B9:8B:BD:E8:4E:B3:F4:24:46:6B:25:
                                17:CF:53:2E:2E:B7:83:A3:F5:DB:7B:F7:91:70:62:A2:
                                D5:74:B8:20:02:21:00:C9:3D:D4:79:5C:05:59:7C:68:
                                ED:6F:EA:45:59:55:D5:A6:9B:F8:9B:A3:62:AD:8B:2B:
                                30:A0:CC:4A:62:A1:EB
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 96:97:64:BF:55:58:97:AD:F7:43:87:68:37:08:42:77:
                                E9:F0:3A:D5:F6:A4:F3:36:6E:46:A4:3F:0F:CA:A9:C6
                    Timestamp : Feb  2 09:38:00.184 2026 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:44:02:20:3A:11:AE:85:B9:06:AF:A9:EF:88:25:64:
                                EB:2A:F3:4B:07:50:AF:B9:63:0F:4C:7A:B0:13:F4:CA:
                                0E:58:55:B7:02:20:50:81:1C:CF:06:47:39:AF:8A:F3:
                                27:00:78:34:FD:40:3F:1E:36:E3:2E:42:08:8E:14:B0:
                                09:B0:CA:CE:FD:B9
    Signature Algorithm: sha256WithRSAEncryption
         5e:bf:fc:22:aa:45:d9:35:37:c7:f3:9b:95:5a:e1:eb:2d:72:
         70:ba:ea:c5:ce:10:2e:53:b6:da:f0:54:77:f4:f4:7d:43:df:
         ff:fe:45:18:f3:cb:85:1c:ae:df:0d:a3:10:f1:01:7a:6f:81:
         03:af:c8:1c:d9:26:2b:4d:69:c1:4a:ef:bf:e2:98:cb:a8:c6:
         42:fe:78:4f:d9:82:d9:2c:39:fc:3e:d3:c2:6f:de:b8:e6:dc:
         82:51:04:00:0d:13:1d:2b:0e:fd:2f:56:7c:bf:73:a6:35:46:
         85:12:99:99:1f:1e:cb:9c:a5:e3:64:7f:b0:66:45:f5:ba:97:
         f0:ac:88:41:7e:c7:b0:7d:7f:04:15:c6:8b:0f:58:cd:19:1e:
         fb:b2:8c:f4:a6:dd:7f:8c:84:98:12:49:60:1b:20:c8:14:da:
         b1:fe:11:06:09:be:92:6b:cc:33:cd:e1:93:7c:bd:ca:1c:c9:
         70:71:cf:46:60:6c:db:22:72:9c:0d:00:e0:6a:72:bc:32:13:
         11:f0:8d:2f:95:d5:d9:20:76:9b:86:dd:73:10:8f:fc:a9:51:
         de:1c:90:d2:c8:a6:f9:ff:ab:a9:a8:5f:75:56:ae:a9:25:6a:
         7f:37:ff:67:5e:53:4e:2b:b7:c0:72:3c:9c:1b:68:f9:9a:0a:
         ef:60:6f:f2
-----BEGIN CERTIFICATE-----
MIIGxDCCBaygAwIBAgIRAOm2aHn6kb9JEI25HszoY7AwDQYJKoZIhvcNAQELBQAw
OzELMAkGA1UEBhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczEM
MAoGA1UEAxMDV1IyMB4XDTI2MDIwMjA4Mzc1OFoXDTI2MDQyNzA4Mzc1N1owGDEW
MBQGA1UEAxMNbXguZ29vZ2xlLmNvbTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IA
BFW6SUOP2XKd+dD6HHbsc0Q5aechaEkf0A7EcLsfYRVxWKdE372f1fbp0Yp3c3ms
gucwiFOVYv/zzTJxnmghp2KjggSvMIIEqzAOBgNVHQ8BAf8EBAMCB4AwEwYDVR0l
BAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAdBgNVHQ4EFgQUUbYTNdj7hSdy
cHfu11sdBl5j/VEwHwYDVR0jBBgwFoAU3hse7XkV1D43JMMhu+w0OW1CsjAwWAYI
KwYBBQUHAQEETDBKMCEGCCsGAQUFBzABhhVodHRwOi8vby5wa2kuZ29vZy93cjIw
JQYIKwYBBQUHMAKGGWh0dHA6Ly9pLnBraS5nb29nL3dyMi5jcnQwggKGBgNVHREE
ggJ9MIICeYINbXguZ29vZ2xlLmNvbYIPc210cC5nb29nbGUuY29tghJhc3BteC5s
Lmdvb2dsZS5jb22CF2FsdDEuYXNwbXgubC5nb29nbGUuY29tghdhbHQyLmFzcG14
LmwuZ29vZ2xlLmNvbYIXYWx0My5hc3BteC5sLmdvb2dsZS5jb22CF2FsdDQuYXNw
bXgubC5nb29nbGUuY29tghpnbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIfYWx0
MS5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIfYWx0Mi5nbWFpbC1zbXRwLWlu
LmwuZ29vZ2xlLmNvbYIfYWx0My5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIf
YWx0NC5nbWFpbC1zbXRwLWluLmwuZ29vZ2xlLmNvbYIYZ21yLXNtdHAtaW4ubC5n
b29nbGUuY29tgh1hbHQxLmdtci1zbXRwLWluLmwuZ29vZ2xlLmNvbYIdYWx0Mi5n
bXItc210cC1pbi5sLmdvb2dsZS5jb22CHWFsdDMuZ21yLXNtdHAtaW4ubC5nb29n
bGUuY29tgh1hbHQ0Lmdtci1zbXRwLWluLmwuZ29vZ2xlLmNvbYINbXgxLnNtdHAu
Z29vZ4INbXgyLnNtdHAuZ29vZ4INbXgzLnNtdHAuZ29vZ4INbXg0LnNtdHAuZ29v
Z4IVYXNwbXgyLmdvb2dsZW1haWwuY29tghVhc3BteDMuZ29vZ2xlbWFpbC5jb22C
FWFzcG14NC5nb29nbGVtYWlsLmNvbYIVYXNwbXg1Lmdvb2dsZW1haWwuY29tghFn
bXItbXguZ29vZ2xlLmNvbTATBgNVHSAEDDAKMAgGBmeBDAECATA2BgNVHR8ELzAt
MCugKaAnhiVodHRwOi8vYy5wa2kuZ29vZy93cjIvb1E2bnlyOEYwbTAuY3JsMIIB
AwYKKwYBBAHWeQIEAgSB9ASB8QDvAHYA0W6ppWgHfmY1oD83pd28A6U8QRIU1IgY
9ekxsyPLlQQAAAGcHbdWSwAABAMARzBFAiAhuYu96E6z9CRGayUXz1MuLreDo/Xb
e/eRcGKi1XS4IAIhAMk91HlcBVl8aO1v6kVZVdWmm/ibo2KtiyswoMxKYqHrAHUA
lpdkv1VYl633Q4doNwhCd+nwOtX2pPM2bkakPw/KqcYAAAGcHbdVeAAABAMARjBE
AiA6Ea6FuQavqe+IJWTrKvNLB1CvuWMPTHqwE/TKDlhVtwIgUIEczwZHOa+K8ycA
eDT9QD8eNuMuQgiOFLAJsMrO/bkwDQYJKoZIhvcNAQELBQADggEBAF6//CKqRdk1
N8fzm5Va4estcnC66sXOEC5TttrwVHf09H1D3//+RRjzy4Ucrt8NoxDxAXpvgQOv
yBzZJitNacFK77/imMuoxkL+eE/ZgtksOfw+08Jv3rjm3IJRBAANEx0rDv0vVny/
c6Y1RoUSmZkfHsucpeNkf7BmRfW6l/CsiEF+x7B9fwQVxosPWM0ZHvuyjPSm3X+M
hJgSSWAbIMgU2rH+EQYJvpJrzDPN4ZN8vcocyXBxz0ZgbNsicpwNAOBqcrwyExHw
jS+V1dkgdpuG3XMQj/ypUd4ckNLIpvn/q6moX3VWrqklan83/2deU04rt8ByPJwb
aPmaCu9gb/I=
-----END CERTIFICATE-----</span></pre>
<p><span class="yadayada">Note the long list of alternative names. I wasn&#8217;t sure whether mail servers respect them as well, but here they are. I fetched this certificate from alt1.gmail-smtp-in.l.google.com, actually, and not &#8220;directly&#8221; from mx.google.com.</span></p>
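<p><span class="yadayada">For reference, the raw-bytes export from Wireshark is a DER file, and a printout like the ones above is obtained with &#8220;openssl x509 -inform DER&#8221;. A self-contained sketch (file names and CN are made up, with a throwaway certificate standing in for the export):</span></p>

```shell
# Create a certificate and convert it to DER, standing in for the
# raw-bytes export from Wireshark (all names here are hypothetical).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=smtp.example.com" \
  -keyout /tmp/wk.key -out /tmp/wk.pem 2>/dev/null
openssl x509 -in /tmp/wk.pem -outform DER -out /tmp/exported.der
# Print the DER file as text, like the dumps above:
openssl x509 -inform DER -in /tmp/exported.der -noout -text | head -n 8
```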
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2026/03/tls-sendmail-lets-encrypt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Un-ignore /usr/lib/systemd/ in .gitignore with git repo on root filesystem</title>
		<link>https://billauer.se/blog/2025/12/git-unignore-subdirectory/</link>
		<comments>https://billauer.se/blog/2025/12/git-unignore-subdirectory/#comments</comments>
		<pubDate>Tue, 23 Dec 2025 14:27:35 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[systemd]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7183</guid>
		<description><![CDATA[Actually, this is about un-ignoring any subdirectory that is grandchild to an ignored directory. Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on root filesystem to keep track of the computer&#8217;s configuration, the vast majority of directories are ignored. One of the is /lib, however /lib/systemd/ should not be ignored, [...]]]></description>
			<content:encoded><![CDATA[<p>Actually, this is about un-ignoring any subdirectory that is grandchild to an ignored directory.</p>
<p>Running Linux Mint 22.2 (based upon Ubuntu 24.04), and having a git repository on the root filesystem to keep track of the computer&#8217;s configuration, the vast majority of directories are ignored. One of them is /lib; however, /lib/systemd/ should not be ignored, as it contains crucial files for the system&#8217;s configuration.</p>
<p>On other distributions, the relevant part in .gitignore usually goes:</p>
<pre><span class="yadayada">[ ... ]</span>
bin/
boot/
dev/
home/
<span class="punch">lib/*
!lib/systemd/
</span>lib64/
lib32/
libx32/
lost+found/
media/
mnt/
opt/
proc/
root/
run/
sbin/
<span class="yadayada">[ ... ]</span></pre>
<p>So lib/ isn&#8217;t ignored as a directory, but all of its content, including subdirectories, is. That allows for un-ignoring lib/systemd/ on the following line. That&#8217;s why lib/ isn&#8217;t ignore-listed like the other directories.</p>
<p>But on Linux Mint 22.2, /lib is a symbolic link to /usr/lib. And since git treats a symbolic link just like a file, /lib/systemd/ is treated as /usr/lib/systemd. Ignoring /lib as a directory has no effect, and un-ignoring /lib/systemd has no effect, because to git, this directory doesn&#8217;t even exist.</p>
<p>So go</p>
<pre>$ <strong>man gitignore</strong></pre>
<p>and try to figure out what to do. It&#8217;s quite difficult actually, but it boils down to this:</p>
<pre>usr/*
!usr/lib/
usr/lib/*
!usr/lib/systemd/</pre>
<p>It&#8217;s a bit tangled, but the point is that everything under /usr is ignored first, then /usr/lib itself is un-ignored, then all of its files are ignored again, and finally /usr/lib/systemd is un-ignored.</p>
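<p>The rules can be sanity-checked with git itself, in a scratch repository (the paths below are just for the demonstration):</p>

```shell
# Build a scratch repo with the .gitignore rules from above and probe it.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
printf 'usr/*\n!usr/lib/\nusr/lib/*\n!usr/lib/systemd/\n' > .gitignore
mkdir -p usr/lib/systemd usr/share
touch usr/lib/systemd/foo.service usr/share/junk usr/lib/libjunk.so
# Ignored paths are listed by check-ignore:
git check-ignore usr/share/junk usr/lib/libjunk.so
# An un-ignored path makes check-ignore exit non-zero:
git check-ignore usr/lib/systemd/foo.service || echo "not ignored"
```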
<p>The only good part about this solution is that it works.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2025/12/git-unignore-subdirectory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Footnote and whole-page layout with wkhtmltopdf</title>
		<link>https://billauer.se/blog/2025/11/wkhtmltopdf-footnote/</link>
		<comments>https://billauer.se/blog/2025/11/wkhtmltopdf-footnote/#comments</comments>
		<pubDate>Sat, 29 Nov 2025 11:58:40 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7178</guid>
		<description><![CDATA[This HTML code makes wkhtmltopdf create a single page with a footnote. If the external &#60;div&#62; is duplicated, separate pages are generated. &#60;html&#62; &#60;head&#62; &#60;meta http-equiv="Content-Type" content="text/html; charset=utf-8" /&#62; &#60;/head&#62; &#60;body&#62; &#60;div style="height: 1350px; display: flex; flex-direction: column; break-inside: avoid; border:1px solid #668;"&#62; This is just a test. &#60;div style="margin-top: auto;"&#62; This is a footnote [...]]]></description>
			<content:encoded><![CDATA[<p>This HTML code makes wkhtmltopdf create a single page with a footnote. If the external &lt;div&gt; is duplicated, separate pages are generated.</p>
<pre><span class="hljs-tag">&lt;<span class="hljs-name">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">http-equiv</span>=<span class="hljs-string">"Content-Type"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"text/html; charset=utf-8"</span> /&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"height: 1350px; display: flex; flex-direction: column; break-inside: avoid; border:1px solid #668;"</span>&gt;</span>
This is just a test.
<span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">style</span>=<span class="hljs-string">"margin-top: auto;"</span>&gt;</span>
This is a footnote
<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span></pre>
<p>So how does it work? The important part is the &#8220;style&#8221; attribute of the outer &lt;div&gt; tag:</p>
<ul>
<li><strong>height: 1350px</strong>: This sets the &lt;div&gt; block&#8217;s height to a full A4 page. Why 1350 pixels? I don&#8217;t know. I just tweaked this figure until it came out right. It&#8217;s possible that a different figure is needed on another version of wkhtmltopdf. I tried setting the height with cm as well as pt units, but neither corresponded to the standard figures for an A4 page. So I went with pixels, which clarifies that it&#8217;s a wild guess.</li>
<li><strong>display: flex; flex-direction: column</strong>: This turns this &lt;div&gt; block into a Flexbox container, with vertical packing. This is needed to push the footnote&#8217;s block to the bottom.</li>
<li><strong>break-inside: avoid</strong>: This tells wkhtmltopdf to avoid page breaks in the middle of the block. This makes no difference for a single page, but if this &lt;div&gt; block is repeated, this style attribute ensures that each block gets a separate page (unless any of the pages exceeds a page&#8217;s height).</li>
<li><strong>border:1px solid #668</strong>: This generates a border around the &lt;div&gt; block&#8217;s occupied area. It&#8217;s used only for finding the correct height attribute, and should be removed afterwards (unless this border is desired on every page).</li>
</ul>
<p>The footnote is pushed to the bottom of the page by virtue of the <strong>margin-top: auto</strong> style attribute, combined with the fact that the &lt;div&gt; block carrying this attribute is inside a vertically packed Flexbox container.</p>
<p><strong>Notes:</strong></p>
<ul>
<li>This was done with wkhtmltopdf 0.12.4, without the &#8220;wkhtmltopdf patches&#8221; according to the man page.</li>
<li>If the height is too large on any page, all break-inside attributes are ignored. In other words, the whole PDF document gets garbled, not just the vicinity of the page that went wrong.</li>
<li>I tried changing the resolution on my X11 display, and it didn&#8217;t make any difference. This might sound like a silly thing to check, but wkhtmltopdf depends on the X11 server.</li>
</ul>
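<p>As mentioned, duplicating the outer &lt;div&gt; yields one page per copy. A throwaway generator for such a test input (file names are made up) can be sketched in shell:</p>

```shell
# Write an HTML file with three copies of the full-page flex block
# (style attributes copied from the example above).
{
  echo '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head><body>'
  for i in 1 2 3 ; do
    echo '<div style="height: 1350px; display: flex; flex-direction: column; break-inside: avoid;">'
    echo "This is page $i."
    echo '<div style="margin-top: auto;">This is a footnote</div>'
    echo '</div>'
  done
  echo '</body></html>'
} > /tmp/threepages.html
```

<p>Feeding /tmp/threepages.html to wkhtmltopdf should then produce three pages, each with its own footnote.</p>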
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2025/11/wkhtmltopdf-footnote/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes on installing Linux Mint 22.2 (full disk encryption)</title>
		<link>https://billauer.se/blog/2025/10/install-linux-efi-laptop-luks/</link>
		<comments>https://billauer.se/blog/2025/10/install-linux-efi-laptop-luks/#comments</comments>
		<pubDate>Wed, 08 Oct 2025 12:37:35 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7171</guid>
		<description><![CDATA[Introduction These are my notes to self for the next time I install a Linux system. As if I read my previous posts before attempting. So I installed Linux Mint 22.2 (kernel 6.14.0-29) on a Lenovo V14 G4 IRU laptop. With Cinnamon, of course, not that it&#8217;s relevant. All that I wanted was a full-disk [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>These are my notes to self for the next time I install a Linux system. As if I read my <a rel="noopener" href="https://billauer.se/blog/2018/11/linux-grub-uefi-raid-luks-lvm/" target="_blank">previous</a> <a rel="noopener" href="https://billauer.se/blog/2010/01/howto-lvm-dm-crypt-raid-5-mdadm-fc12-fedora/" target="_blank">posts</a> before attempting.</p>
<p>So I installed Linux Mint 22.2 (kernel 6.14.0-29) on a Lenovo V14 G4 IRU laptop. With Cinnamon, of course, not that it&#8217;s relevant.</p>
<p>All that I wanted was full-disk encryption, but being allowed to choose the setup of the partitions explicitly, rather than letting the installation wizard make the choices for me. In particular, I wanted a swap partition with the size I chose, and even more importantly: almost all disk space in a /storage mount, so that one can fill the hard disk with junk without risking a system failure because the root partition is full.</p>
<h3>Cutting the cake</h3>
<p>/boot/efi is where the BIOS reads from. It&#8217;s natural to put it as the first partition, and it can be very small (even 1 MB can be enough in some cases, but let&#8217;s not push it like I eventually did). But if you make it really small, it becomes a FAT16 partition, not FAT32, and that&#8217;s OK. Don&#8217;t force it into FAT32, because the system won&#8217;t boot with it if it has fewer clusters than required.</p>
<p>So it goes like this:</p>
<ul>
<li>Create three partitions:
<ul>
<li>First one for /boot/efi (e.g. nvme0n1p1), 10 MB. This must be FAT32 or FAT16 (the latter for 10MB). Note that 10MB is a bit too small, because the BIOS won&#8217;t have room for its own backup this way.</li>
<li>Second one for /boot, will contain initramfs images, so ~500 MB. Any filesystem that GRUB can read (so ext4 is definitely OK)</li>
<li>Third partition for LUKS</li>
</ul>
</li>
<li>In the LUKS partition, create an LVM with partitions for / (100G) and swap. The rest is for /storage.</li>
</ul>
<p>It&#8217;s somewhat confusing that /boot/efi is a subdirectory of /boot, but that&#8217;s the way it is.</p>
<h3>Running the installation wizard</h3>
<ul>
<li>Unlock the encrypted partition, if it&#8217;s not already (e.g. with the &#8220;Disks&#8221; GUI utility). This requires giving the passphrase, of course.</li>
<li>Double-click the &#8220;Install Linux Mint&#8221; icon on the desktop.</li>
<li>When reaching the &#8220;Installation type&#8221;, pick &#8220;Something else&#8221;.</li>
<li>Set the following mount points:
<ul>
<li>Set the small FAT partition (nvme0n1p1 in my case) as &#8220;EFI System Partition&#8221;</li>
<li>/boot on the partition allocated for that (non-encrypted partition, possibly ext4).</li>
<li>/ on the relevant LVM partition inside the encrypted block</li>
<li>Set the swap partition</li>
</ul>
</li>
<li>Set the &#8220;Device for boot loader installation&#8221; to the one allocated for the &#8220;EFI System Partition&#8221; (nvme0n1p1 in my case). One may wonder why this isn&#8217;t done automatically. Note that it&#8217;s the first partition (/dev/nvme0n1p1) and <strong>not</strong> the entire disk (<span style="text-decoration: line-through;">/dev/nvme0n1</span>).</li>
<li>Don&#8217;t do anything with the planned /storage partition. Since I didn&#8217;t want to assign it a mount point, I handled it after the installation was done.</li>
</ul>
<p>If the installation ends with a failure to install GRUB, run &#8220;journalctl&#8221; on a terminal window and look for error messages from the grub installer. Don&#8217;t ask ChatGPT to help you with solving any issues, and don&#8217;t ask me why I know it&#8217;s a bad idea.</p>
<h3>When I insisted on FAT32</h3>
<p>Sometimes I&#8217;m too much of a control freak, and when the Disks utility formatted the EFI partition into FAT16, I thought, oh no, it should be FAT32, what if the BIOS won&#8217;t play ball?</p>
<p>Well, that was silly of me, and also silly to ignore the warning about a FAT32 filesystem with just 10 MB having too few clusters.</p>
<p>So even though the installer wizard finished successfully, there was no option to boot from the disk. Secure boot was disabled, of course. And yet, there was no suitable option in the BIOS&#8217; boot menu. There was a &#8220;UEFI&#8221; option there, which is always in black (not possible to select), but that doesn&#8217;t seem to be relevant.</p>
<p>Following the warm advice of ChatGPT, I added an entry while in Live USB mode:</p>
<pre># <strong>efibootmgr -c -d /dev/nvme0n1p1 -p 1 -L "Linux Mint" -l '\EFI\ubuntu\grubx64.efi' </strong>
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:&lt;-&gt;[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC
Boot0003* Linux Mint	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\grubx64.efi)</pre>
<p><span class="yadayada">(some identifying numbers replaced trivially)</span></p>
<p>Cute, heh? But it made no difference. After rebooting with a Live USB again:</p>
<pre># <strong>efibootmgr</strong>
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:&lt;-&gt;[::]:,0,0)RC
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC</pre>
<p>So the entry was gone.</p>
<p>I changed the EFI partition to FAT16 and ran through the installation all the way again. Immediately after the installation was done (before booting from the disk for the first time):</p>
<pre># <strong>efibootmgr</strong>
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0003,2001,2002,2003
Boot0000* EFI USB Device (SanDisk Cruzer Blade)	UsbWwid(781,5567,0,4C53011006040812233)/CDROM(1,0x2104,0xa000)RC
Boot0001* EFI PXE 0 for IPv4 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv4(0.0.0.00.0.0.0,0,0)RC
Boot0002* EFI PXE 0 for IPv6 (AA-BB-CC-DD-EE-FF) 	PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(aabbccddeeff,0)/IPv6([::]:&lt;-&gt;[::]:,0,0)RC
<span class="punch">Boot0003* Ubuntu	HD(1,GPT,12345678-aaaa-bbbb-cccc-dddddddddddd,0x800,0x5000)/File(\EFI\ubuntu\shimx64.efi)</span>
Boot2001* EFI USB Device	RC
Boot2002* EFI DVD/CDROM	RC
Boot2003* EFI Network	RC</pre>
<p>This time, when the laptop went on, the BIOS came up with a &#8220;Please don&#8217;t power off while completing system update&#8221;. What it actually did was to write its own backup file into the EFI partition, which appears as /boot/efi/BackupSbb.bin. It doesn&#8217;t seem to have been successful, though, as the partition ran out of space. So I deleted this file and turned off the &#8220;BIOS Self-Healing&#8221; option in the BIOS&#8217; configuration (it would be much worse if it attempted to self-heal from a faulty backup file).</p>
<p>At this point, there was an &#8220;ubuntu&#8221; entry in the list of boot options in BIOS&#8217; boot menu (Not &#8220;Ubuntu&#8221;, but &#8220;ubuntu&#8221;, probably referring to the directory and not the name). And the black &#8220;UEFI&#8221; option remained in the option list, not possible to choose. So this is why I don&#8217;t think it&#8217;s relevant.</p>
<h3>Asking for a passphrase is too much to ask?</h3>
<p>Having reached this far, I got a nice Linux Mint logo on the screen, but nothing happened, and then I was thrown into an initramfs rescue shell. In other words, no attempt was made to unlock the encrypted partition.</p>
<p>So I ran the live USB again, unlocked the root partition and mounted it as /mnt/root/.</p>
<p>Then, as root (sudo su), bind-mounted the essential directories into the root filesystem:</p>
<pre># <span class="hljs-keyword">for</span> d <span class="hljs-keyword">in</span> /dev /dev/pts /proc /sys /run; <span class="hljs-keyword">do</span> mount --<span class="hljs-built_in">bind</span> <span class="hljs-variable">$d</span> /mnt/root/<span class="hljs-variable">$d</span> ; <span class="hljs-keyword">done</span></pre>
<p>And then chrooted into it.</p>
<pre># chroot /mnt/root</pre>
<p>Of course, there was no /etc/crypttab, so no wonder that the installation didn&#8217;t take unlocking the encrypted partition into account.</p>
<p>So I followed my own instruction from <a rel="noopener" href="https://billauer.se/blog/2018/11/linux-grub-uefi-raid-luks-lvm/" target="_blank">a previous post</a>. First, mount /boot and /boot/efi with</p>
<pre># mount -a</pre>
<p>and then check for the UUID of the encrypted partition:</p>
<pre># <strong>cryptsetup luksUUID /dev/nvme0n1p2</strong>
11223344-5566-7788-99aa-bbccddeeff00</pre>
<p>and then add /etc/crypttab reading</p>
<pre>luks-11223344-5566-7788-99aa-bbccddeeff00 UUID=11223344-5566-7788-99aa-bbccddeeff00 none luks</pre>
<p>Note that the luks-{UUID} part is the name of the unlocked device as it appears in /dev/mapper. In this case, this was the name that the Disks GUI utility chose. Had I done this from the command line, I could have chosen a shorter name. But who cares.</p>
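<p>The crypttab line can also be generated rather than typed, which avoids copy-paste errors in the UUID. A small sketch (the UUID below is the placeholder from this post; on a real system, substitute the output of cryptsetup luksUUID):</p>

```shell
# Placeholder UUID; on a real system obtain it with:
#   UUID=$(cryptsetup luksUUID /dev/nvme0n1p2)
UUID=11223344-5566-7788-99aa-bbccddeeff00
line="luks-$UUID UUID=$UUID none luks"
echo "$line"     # append this line to /etc/crypttab
```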
<p>And finally, edit /etc/default/grub for your preferences, update initramfs and GRUB, exactly as already mentioned in <a rel="noopener" href="https://billauer.se/blog/2018/11/linux-grub-uefi-raid-luks-lvm/" target="_blank">that post</a>:</p>
<pre># update-initramfs -u
# update-grub
# grub-install</pre>
<p>It was really exactly the same as the previous post. And then reboot, and all was finally fine.</p>
<p>And by the way, the initrd file is 77 MB. Running update-initramfs again didn&#8217;t make it smaller. Not a big deal with a flash disk, anyhow.</p>
<h3>But GRUB can open LUKS too!</h3>
<p>GRUB has cryptodisk and luks modules which can open an encrypted partition, so in principle it can read the kernel from an encrypted root partition. However, there is no mechanism I&#8217;m aware of for passing the unlocked encrypted partition over to the kernel, so it would be necessary to supply the passphrase twice when booting.</p>
<p>This is why I went for two partitions for booting. I guess this still is the only sane way.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2025/10/install-linux-efi-laptop-luks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Measuring how much RAM a Linux service eats</title>
		<link>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/</link>
		<comments>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/#comments</comments>
		<pubDate>Fri, 13 Dec 2024 17:20:02 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7142</guid>
		<description><![CDATA[Introduction Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems. The method is to run the service and expose it to a scenario [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>Motivation: I wanted to move a service to another server that is dedicated only to that service. But how much RAM does this new server need? RAM is $$$, so too much is a waste of money, too little means problems.</p>
<p>The method is to run the service and expose it to a scenario that causes it to consume RAM. And then look at the maximal consumption.</p>
<p>This can be done with &#8220;top&#8221; and similar programs, but these show the current use. I needed the maximal RAM use. Besides, a service may spread out its RAM consumption across several processes. It&#8217;s the cumulative consumption that is interesting.</p>
<p>The appealing solution is to use the fact that systemd creates a cgroup for the service. The answer hence lies in the RAM consumption of the cgroup as a whole. It&#8217;s also possible to create a dedicated cgroup and run a program within that one, as shown in <a rel="noopener" href="https://billauer.se/blog/2016/05/linux-cgroups-swap-quartus/" target="_blank">another post of mine</a>.</p>
<p>This method is somewhat crude, because this memory consumption includes disk cache as well. In other words, this method shows how much RAM is consumed when there&#8217;s plenty of memory, and hence when there&#8217;s no pressure to reclaim any RAM. Therefore, if the service runs on a server with less RAM (or the service&#8217;s RAM consumption is limited in the systemd unit file), it&#8217;s more than possible that everything will work just fine. It might run somewhat slower due to disk access that was previously substituted by the cache.</p>
<p>So using a server with as much memory as measured by the test described below (plus some extra for the OS itself) will result in quick execution, but it might be OK to go for less RAM. A tight RAM limit will cause a lot of disk activity at first, and only afterwards will processes be killed by the OOM killer.</p>
<h3>Where the information is</h3>
<p>All said in this post relates to Linux kernel v4.15. Things are different with later kernels, not necessarily for the better.</p>
<p>There are in principle two versions of the interface with cgroup&#8217;s memory management: First, the one I won&#8217;t use, which is <a rel="noopener" href="https://www.kernel.org/doc/Documentation/cgroup-v2.txt" target="_blank">cgroup-v2</a> (or maybe <a rel="noopener" href="https://docs.kernel.org/admin-guide/cgroup-v2.html" target="_blank">this doc</a> for v2 is better?). The sysfs files for this interface for a service named &#8220;theservice&#8221; reside in /sys/fs/cgroup/unified/system.slice/theservice.service.</p>
<p>I shall be working with the <a rel="noopener" href="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt" target="_blank">memory control of cgroup-v1</a>. The sysfs files in question are in /sys/fs/cgroup/memory/system.slice/theservice.service/.</p>
<p>If /sys/fs/cgroup/memory/ doesn&#8217;t exist, it might be necessary to mount it explicitly. Also, if system.slice doesn&#8217;t exist under /sys/fs/cgroup/memory/ it&#8217;s most likely because systemd&#8217;s memory accounting is not in action. This can be enabled globally, or by setting MemoryAccounting=true on the service&#8217;s systemd unit (or maybe any unit?).</p>
<p>Speaking of which, it might be a good idea to set MemoryMax in the service&#8217;s systemd unit in order to see what happens when the RAM is really restricted. Or change the limit dynamically, as shown below.</p>
<p>And there&#8217;s always the alternative of creating a separate cgroup and running the service in that group. I&#8217;ll refer to <a rel="noopener" href="https://billauer.se/blog/2016/05/linux-cgroups-swap-quartus/" target="_blank">my own blog post</a> again.</p>
<h3>Getting the info</h3>
<p>All files mentioned below are in /sys/fs/cgroup/memory/system.slice/theservice.service/ (assuming that the systemd service in question is theservice).</p>
<p><strong>The maximal memory used</strong>: memory.max_usage_in_bytes. As its name implies, this is the maximal amount of RAM used, measured in bytes. This includes disk cache, so the number is higher than what appears in &#8220;top&#8221;.</p>
<p><strong>The memory currently used</strong>: memory.usage_in_bytes.</p>
<p>For more detailed info about memory use: memory.stat. For example:</p>
<pre>$ <strong>cat memory.stat </strong>
<span class="punch">cache 1138688</span>
rss 4268224512
rss_huge 0
shmem 0
mapped_file 516096
dirty 0
writeback 0
pgpgin 36038063
pgpgout 34995738
pgfault 21217095
pgmajfault 176307
inactive_anon 0
active_anon 4268224512
inactive_file 581632
active_file 401408
unevictable 0
hierarchical_memory_limit 4294967296
total_cache 1138688
total_rss 4268224512
total_rss_huge 0
total_shmem 0
total_mapped_file 516096
total_dirty 0
total_writeback 0
total_pgpgin 36038063
total_pgpgout 34995738
total_pgfault 21217095
total_pgmajfault 176307
total_inactive_anon 0
total_active_anon 4268224512
total_inactive_file 581632
total_active_file 401408
total_unevictable 0</pre>
<p>Note the &#8220;cache&#8221; part at the beginning. It&#8217;s no coincidence that it&#8217;s first. That&#8217;s the most important part: How much can be reclaimed just by flushing the cache.</p>
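<p>As a quick way to see how much of the consumption is reclaimable cache, the relevant rows can be picked out of memory.stat and converted to MiB. A sketch of mine, shown here against a captured sample from the listing above (on a live system, point awk at the real memory.stat file instead):</p>

```shell
# Sample values taken from the listing above. On a real system, replace the
# here-string with the real file, e.g.:
#   awk '...' /sys/fs/cgroup/memory/system.slice/theservice.service/memory.stat
stat_sample='cache 1138688
rss 4268224512'
summary=$(printf '%s\n' "$stat_sample" |
  awk '$1=="cache" || $1=="rss" { printf "%s %.1f MiB\n", $1, $2/1048576 }')
echo "$summary"
```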
<p>On a 6.1.0 kernel, I&#8217;ve seen memory.peak and memory.current instead of memory.max_usage_in_bytes and memory.usage_in_bytes. memory.peak wasn&#8217;t writable, however (neither according to its permissions nor in practice), so it wasn&#8217;t possible to reset the max level.</p>
<h3>Setting memory limits</h3>
<p>It&#8217;s possible to set memory limits in systemd&#8217;s unit file, but it can be more convenient to do this on the fly. In order to set the hard limit of memory use to 40 MiB, go (as root)</p>
<pre># echo 40M &gt; memory.limit_in_bytes</pre>
<p>To disable the limit, pick an unreasonably high number, e.g.</p>
<pre># echo 100G &gt; memory.limit_in_bytes</pre>
<p>Note that restarting the systemd service has no effect on these parameters (unless a memory limit is set in the unit file). The cgroup directory remains intact.</p>
<h3>Resetting between tests</h3>
<p>To reset the maximal value that has been recorded for RAM use (as root)</p>
<pre># echo 0 &gt; memory.max_usage_in_bytes</pre>
<p>But to really start from fresh, all disk cache needs to be cleared as well. The sledgehammer way is going</p>
<pre># echo 1 &gt; /proc/sys/vm/drop_caches</pre>
<p>This frees the page caches system-wide, so everything running on the computer will need to re-read things again from the disk. There&#8217;s a slight and temporary global impact on the performance. On a GUI desktop, it gets a bit slow for a while.</p>
<p>A message like this will appear in the kernel log in response:</p>
<pre>bash (43262): drop_caches: 1</pre>
<p>This is perfectly fine, and indicates no error.</p>
<p>Alternatively, set a low limit for the RAM usage with memory.limit_in_bytes, as shown above. This impacts the cgroup only, forcing a reclaim of disk cache.</p>
<p>Two things that have <strong>no effect</strong>:</p>
<ul>
<li>Reducing the soft limit (memory.soft_limit_in_bytes). This limit is relevant only when the system is in a shortage of RAM overall. Otherwise, it does nothing.</li>
<li>Restarting the service with systemd. It wouldn&#8217;t make any sense to flush a disk cache when restarting a service.</li>
</ul>
<p>It&#8217;s of course a good idea to get rid of the disk cache before clearing memory.max_usage_in_bytes, so the max value starts without taking the disk cache into account.</p>
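<p>The reset-and-read cycle described above can be wrapped in two small shell functions. This is a sketch of my own, with the cgroup directory taken as a parameter so it can point at /sys/fs/cgroup/memory/system.slice/theservice.service/ on a real system (root required there); the demonstration below runs against a mock directory, so it can be tried without root:</p>

```shell
# Reset the recorded peak (needs root when used on a real cgroup directory).
reset_peak() { echo 0 > "$1/memory.max_usage_in_bytes"; }

# Read the recorded peak, converted to MiB.
read_peak_mib() { awk '{ printf "%.1f\n", $1/1048576 }' "$1/memory.max_usage_in_bytes"; }

# Demonstration against a mock cgroup directory:
mock=$(mktemp -d)
echo 4268224512 > "$mock/memory.max_usage_in_bytes"
read_peak_mib "$mock"     # the peak before resetting
reset_peak "$mock"
read_peak_mib "$mock"     # 0.0 after the reset
```

<p>On a real run, remember to drop the caches before calling reset_peak, as discussed above.</p>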
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/12/cgroup-peak-ram-consumption/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A function similar to Perl&#8217;s die() in bash</title>
		<link>https://billauer.se/blog/2024/08/bash-exit-with-error/</link>
		<comments>https://billauer.se/blog/2024/08/bash-exit-with-error/#comments</comments>
		<pubDate>Mon, 12 Aug 2024 06:51:13 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7130</guid>
		<description><![CDATA[This is maybe a bit silly, but Perl has a die() function that is really handy for quitting a script with an error message. And I kind of miss it in Bash. So it can be defined with this simple one-liner: function die { echo $1 ; exit 1 ; } And then it can [...]]]></description>
			<content:encoded><![CDATA[<p>This is maybe a bit silly, but Perl has a die() function that is really handy for quitting a script with an error message. And I kind of miss it in Bash. So it can be defined with this simple one-liner:</p>
<pre>function die { echo $1 ; exit 1 ; }</pre>
<p>And then it can be used with something like:</p>
<pre>unzip thefile.zip || die "Unzip returned with error status"</pre>
<p>The Perl feeling, in Bash.</p>
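<p>For a slightly more robust variant (my own extension, not part of the original one-liner): quote the argument so messages with spaces survive word splitting, send the message to stderr where error messages belong, and allow an optional exit status as a second argument:</p>

```shell
die() {
  echo "$1" >&2        # error messages go to stderr
  exit "${2:-1}"       # exit status defaults to 1
}

# Usage, e.g.:
#   unzip thefile.zip || die "Unzip returned with error status" 2
```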
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/08/bash-exit-with-error/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux kernel workqueues: Is it OK for the worker function to kfree its own work item?</title>
		<link>https://billauer.se/blog/2024/07/free-work-struct/</link>
		<comments>https://billauer.se/blog/2024/07/free-work-struct/#comments</comments>
		<pubDate>Tue, 30 Jul 2024 06:37:14 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7122</guid>
		<description><![CDATA[Freeing yourself Working with Linux kernel&#8217;s workqueues, I incremented a kref reference count before queuing a work item, in order to make sure that the data structure that it operated on will still be in memory while it runs. Just before returning, the work item&#8217;s function decremented this reference count, and as a result, the [...]]]></description>
			<content:encoded><![CDATA[<h3>Freeing yourself</h3>
<p>Working with Linux kernel&#8217;s workqueues, I incremented a kref reference count before queuing a work item, in order to make sure that the data structure that it operated on will still be in memory while it runs. Just before returning, the work item&#8217;s function decremented this reference count, and as a result, the data structure&#8217;s memory could be freed at that very moment.</p>
<p>The thing was that this data structure also included the work item&#8217;s own struct work_struct. In other words, the work item&#8217;s function could potentially free the entry that was pushed into the workqueue on its behalf. Could this possibly be allowed?</p>
<p>The short answer is yes. It&#8217;s OK to call kfree() on the memory of the struct work_struct of the currently running work item. No risk for use-after-free (UAF).</p>
<p>It&#8217;s also OK to requeue the work item on the same workqueue (or on a different one). All in all, the work item&#8217;s struct is just a piece of unused memory as soon as the work item&#8217;s function is called.</p>
<p>On the other hand, don&#8217;t think about calling destroy_workqueue() on the workqueue on which the running work item is queued: destroy_workqueue() waits for all work items to finish before destroying the queue, which will never happen if the request to destroy the queue came from one of its own work items.</p>
<h3>From the horse&#8217;s mouth</h3>
<p>I didn&#8217;t find any documentation on this topic, but there are a couple of comments in the source code, namely in the process_one_work() function in kernel/workqueue.c: First, this one by Tejun Heo from June 2010:</p>
<pre>/*
 * It is permissible to free the struct work_struct from
 * inside the function that is called from it, this we need to
 * take into account for lockdep too.  To avoid bogus "held
 * lock freed" warnings as well as problems when looking into
 * work-&gt;lockdep_map, make a copy and use that here.
 */</pre>
<p>And this comes after calling the work item&#8217;s function, worker-&gt;current_func(work). Written by Arjan van de Ven in August 2010.</p>
<pre>/*
 * While we must be careful to not use "work" after this, the trace
 * point will only record its address.
 */
trace_workqueue_execute_end(<span class="punch">work</span>, worker-&gt;current_func);</pre>
<p>The point of this comment is that the <em>value</em> of @work will be used by the call to trace_workqueue_execute_end(), but it won&#8217;t be used as a pointer. This emphasizes the commitment to not touching what @work points at, since that memory segment may have been freed.</p>
<h3>How it&#8217;s done</h3>
<p>process_one_work(), which is the only function that calls the work item&#8217;s function, is clearly written in a way that ignores the work item&#8217;s struct after calling the work item&#8217;s function.</p>
<p>The first thing is that it copies the address of the work function into the worker struct:</p>
<pre>worker-&gt;current_func = work-&gt;func;</pre>
<p>It then removes the work item from the workqueue:</p>
<pre>list_del_init(&amp;work-&gt;entry);</pre>
<p>And later on, it calls the function, using the copy of the pointer (even though it could also have used the original at this point).</p>
<pre>worker-&gt;current_func(work);</pre>
<p>After this, the @work variable isn&#8217;t used anymore as a pointer.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/07/free-work-struct/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Installing GRUB 2 manually with rescue-like techniques</title>
		<link>https://billauer.se/blog/2024/07/installing-grub-rescue/</link>
		<comments>https://billauer.se/blog/2024/07/installing-grub-rescue/#comments</comments>
		<pubDate>Fri, 12 Jul 2024 15:16:20 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7099</guid>
		<description><![CDATA[Introduction It&#8217;s rarely necessary to make an issue of installing and maintaining the GRUB bootloader. However, for reasons explained in a separate post, I wanted to install GRUB 2.12 on an old distribution (Debian 8). So it required some acrobatics. That said, it doesn&#8217;t limit the possibility to install new kernels in the future etc. [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>It&#8217;s rarely necessary to make an issue of installing and maintaining the GRUB bootloader. However, for reasons explained in <a rel="noopener" href="https://billauer.se/blog/2024/07/container-to-kvm-virtualization/" target="_blank">a separate post</a>, I wanted to install GRUB 2.12 on an old distribution (Debian 8). So it required some acrobatics. That said, it doesn&#8217;t limit the possibility to install new kernels in the future etc. If you&#8217;re ready to edit a simple text file, rather than running automatic tools, that is. Which may actually be a good idea anyhow.</p>
<h3>The basics</h3>
<p>Grub has two parts: First, there&#8217;s the initial code that is loaded by the BIOS, either from the MBR or from the EFI partition. That&#8217;s the plain GRUB executable. This executable goes directly to the ext2/3/4 root partition, and reads from /boot/grub/. That directory contains, among others, the precious grub.cfg file, which GRUB reads in order to decide which modules to load, which menu entries to display and how to act if each is selected.</p>
<p>grub.cfg is created by update-grub, which effectively runs &#8220;grub-mkconfig -o /boot/grub/grub.cfg&#8221;.</p>
<p>This file is created from /etc/grub.d/ and settings from /etc/default/grub, and based upon the kernel image and initrd files that are found in /boot.</p>
<p>Hence an installation of GRUB consists of two tasks, which are fairly independent:</p>
<ul>
<li>Running grub-install so that the MBR or EFI partition are set to run GRUB, and that /boot/grub/ is populated with modules and other stuff. The only important thing is that this utility knows the correct disk to target and where the partition containing /boot/grub is.</li>
<li>Running update-grub in order to create (or update) the /boot/grub/grub.cfg file. This is normally done every time the content of /boot is updated (e.g. a new kernel image).</li>
</ul>
<p>Note that grub-install populates /boot/grub with a lot of files that are used by the bootloader, so it&#8217;s necessary to run this command if /boot is wiped and started from fresh.</p>
<p>What made this extra tricky for me was that Debian 8 comes with an old GRUB 1 version. Therefore, the option of chroot&#8217;ing into the filesystem for the purpose of installing GRUB was eliminated.</p>
<p>So there were two tasks to accomplish: Obtaining a suitable grub.cfg and running grub-install in a way that will do the job.</p>
<p>This is a good time to understand what this grub.cfg file is.</p>
<h3>The grub.cfg file</h3>
<p>grub.cfg is a script, written with a <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/html_node/Shell_002dlike-scripting.html" target="_blank">bash-like syntax</a>, and based upon an <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/grub.html" target="_blank">internal command set</a>. This is a plain file in /boot/grub/, owned by root:root and writable by root only, for obvious reasons. But for the purpose of booting, permissions don&#8217;t make any difference.</p>
<p>Despite the &#8220;DO NOT EDIT THIS FILE&#8221; comment at the top of this file, and the suggestion to use grub-mkconfig, it&#8217;s perfectly OK to edit it for the purposes of updating the behavior of the boot menu. This is unnecessarily complicated in most cases, even when rescuing a system from a Live ISO system: There&#8217;s always the possibility to chroot into the target&#8217;s root filesystem and call grub-mkconfig from there. That&#8217;s usually all that is necessary to update which kernel image / initrd should be kicked off.</p>
<p>That said, it might also be easier to edit this file manually in order to add menu entries for new kernels, for example. In addition, automatic utilities tend to add a lot of specific details that are unnecessary, and that can fail the boot process, for example if the file system&#8217;s UUID changes. So maintaining a clean grub.cfg manually can pay off in the long run.</p>
<p>The most interesting part in this file is the menuentry section. Let&#8217;s look at a sample command:</p>
<pre>menuentry <span class="hljs-string">'Ubuntu'</span> --class ubuntu --class gnu-linux --class gnu --class os <span class="hljs-variable">$menuentry_id_option</span> <span class="hljs-string">'gnulinux-simple-a0c2e12e-5d16-4aac-b11d-15cbec5ae98e'</span> {
	recordfail
	load_video
	gfxmode <span class="hljs-variable">$linux_gfx_mode</span>
	insmod gzio
	<span class="hljs-keyword">if</span> [ x<span class="hljs-variable">$grub_platform</span> = xxen ]; <span class="hljs-keyword">then</span> insmod xzio; insmod lzopio; <span class="hljs-keyword">fi</span>
	insmod part_gpt
	insmod ext2
	search --no-floppy --fs-uuid --set=root a0c2e12e-5d16-4aac-b11d-15cbec5ae98e
	linux	/boot/vmlinuz-6.8.0-36-generic root=UUID=a0c2e12e-5d16-4aac-b11d-15cbec5ae98e ro
	initrd	/boot/initrd.img-6.8.0-36-generic
}</pre>
<p>So these are a bunch of commands that run if the related menu entry is chosen. I&#8217;ll discuss &#8220;menuentry&#8221; and &#8220;search&#8221; below. Note the &#8220;insmod&#8221; commands, which load ELF executable modules from /boot/grub/i386-pc/. GRUB also supports lsmod, if you want to try it with GRUB&#8217;s interactive command interface.</p>
<h3>The menuentry command</h3>
<p>The menuentry command is documented <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/grub.html#menuentry" target="_blank">here</a>. Let&#8217;s break down the command in this example:</p>
<ul>
<li>menuentry: Obviously, the command itself.</li>
<li>&#8216;Ubuntu&#8217;: The title, which is the part presented to the user.</li>
<li>--class ubuntu --class gnu-linux --class gnu --class os: The purpose of these class flags is to help GRUB group the menu options nicer. Usually redundant.</li>
<li>$menuentry_id_option &#8216;gnulinux-simple-a0c2e12e-5d16-4aac-b11d-15cbec5ae98e&#8217;: &#8220;$menuentry_id_option&#8221; expands into &#8220;--id&#8221;, so this gives the menu option a unique identifier. It&#8217;s useful for submenus, otherwise not required.</li>
</ul>
<p>Bottom line: If there are no submenus (in the original file there actually are), this header would have done the job as well:</p>
<pre>menuentry <span class="hljs-string">'Ubuntu for the lazy'</span> {</pre>
<h3>The search command</h3>
<p>The other interesting part is this row within the menuentry clause:</p>
<pre>search --no-floppy --fs-uuid --set=root a0c2e12e-5d16-4aac-b11d-15cbec5ae98e</pre>
<p>The search command is documented <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/grub.html#search" target="_blank">here</a>. The purpose of this command is to set the <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/html_node/root.html" target="_blank">$root environment variable</a>, which is what the &#8220;--set=root&#8221; part means (this is an unnecessary flag, as $root is the target variable anyhow). This tells GRUB in which filesystem to look for the files mentioned in the &#8220;linux&#8221; and &#8220;initrd&#8221; commands.</p>
<p>On a system with only one Linux installed, the &#8220;search&#8221; command is unnecessary: Both $root and <a rel="noopener" href="https://www.gnu.org/software/grub/manual/grub/html_node/prefix.html" target="_blank">$prefix</a> are initialized according to the position of /boot/grub, so there&#8217;s no reason to search for it again.</p>
<p>In this example, the filesystem is identified by its UUID, which can be found with this Linux command:</p>
<pre># dumpe2fs /dev/vda2 | grep UUID</pre>
<p>It&#8217;s better to remove this &#8220;search&#8221; command if there&#8217;s only one /boot directory in the whole system (and it contains the Linux kernel files, of course). The advantage is that the Linux system can then be installed just by pouring all files into an ext4 filesystem (including /boot) and then just running grub-install. Something that won&#8217;t work if grub.cfg contains explicit UUIDs. Well, actually, it will work, but with an error message and a prompt to press ENTER: The &#8220;search&#8221; command fails if the UUID is incorrect, but it wasn&#8217;t necessary to begin with, so $root will retain its correct value and the system can boot properly anyhow. Given that ENTER is pressed. That hurdle can be annoying on a remote virtual machine.</p>
<h3>A sample menuentry command</h3>
<p>I added these lines to my grub.cfg file in order to allow my future self to try out a new kernel without being too scared about it:</p>
<pre>menuentry <span class="hljs-string">'Unused boot menu entry for future hacks'</span> {
        recordfail
        load_video
        gfxmode <span class="hljs-variable">$linux_gfx_mode</span>
        insmod gzio
        <span class="hljs-keyword">if</span> [ x<span class="hljs-variable">$grub_platform</span> = xxen ]; <span class="hljs-keyword">then</span> insmod xzio; insmod lzopio; <span class="hljs-keyword">fi</span>
        insmod part_gpt
        insmod ext2
        linux   /boot/vmlinuz-6.8.12 root=/dev/vda3 ro
}</pre>
<p>This is just an implementation of what I said above about the &#8220;menuentry&#8221; and &#8220;search&#8221; commands. In particular, that the &#8220;search&#8221; command is unnecessary. This worked well on my machine.</p>
<p>As for the other rows, I suggest mixing and matching with whatever appears in your own grub.cfg file in the same places.</p>
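<p>For completeness, a more defensive variant keeps the &#8220;search&#8221; line and adds an &#8220;initrd&#8221; line. This is a sketch, not taken from my actual grub.cfg; the UUID and the kernel file name are the example values from above, and the initrd file name is hypothetical:</p>

```
menuentry 'Example entry with search and initrd' {
        insmod part_gpt
        insmod ext2
        # If the UUID is wrong, this line fails with a prompt, as described above
        search --no-floppy --fs-uuid --set=root a0c2e12e-5d16-4aac-b11d-15cbec5ae98e
        linux   /boot/vmlinuz-6.8.12 root=/dev/vda3 ro
        initrd  /boot/initrd.img-6.8.12
}
```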
<h3>Obtaining a grub.cfg file</h3>
<p>So the question is: How do I get the initial grub.cfg file? Just take one from a random system? Will that be good enough?</p>
<p>Well, no, that may not work: The grub.cfg is formed differently, depending in particular on how the filesystems on the hard disk are laid out. For example, comparing two grub.cfg files, one had this row:</p>
<pre>insmod lvm</pre>
<p>and the other didn&#8217;t. Obviously, one computer utilized LVM and the other didn&#8217;t. Also, in relation to setting the $root variable, there were different variations, going from the &#8220;search&#8221; method shown above to simply this:</p>
<pre>set root='hd0,msdos1'</pre>
<p>My solution was to install an Ubuntu 24.04 system on the same KVM virtual machine that I intended to install Debian 8 on later. After the installation, I just copied the grub.cfg and wiped the filesystem. I then installed the required distribution and deleted everything under /boot. Instead, I placed this grub.cfg in /boot/grub/ and edited it manually to load the correct kernel.</p>
<p>As the structure of the hard disk and the hardware environment remained unchanged, this worked perfectly fine.</p>
<h3>Running grub-install</h3>
<p>Truth be told, I probably didn&#8217;t need to use grub-install, since the MBR was already set up with GRUB thanks to the installation I had already carried out for Ubuntu 24.04. Also, I could have copied all other files in /boot/grub from this installation before wiping it. But I didn&#8217;t, and it&#8217;s a good thing I didn&#8217;t, because this way I found out how to do it from a Live ISO. And this might be important for rescue purposes, in the unlikely and very unfortunate event that it&#8217;s necessary.</p>
<p>Luckily, grub-install has an undocumented option, --root-directory, which gets the job done.</p>
<pre># grub-install --root-directory=/mnt/new/ /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.</pre>
<p>Note that using --boot-directory isn&#8217;t good enough, even if it&#8217;s mounted. Only --root-directory makes GRUB detect the correct root directory as the place to fetch the information from. With --boot-directory, the system boots with no menu.</p>
<h3>Running update-grub</h3>
<p>If you insist on running update-grub, be sure to edit /etc/default/grub and set it this way:</p>
<pre>GRUB_TIMEOUT=3
GRUB_RECORDFAIL_TIMEOUT=3</pre>
<p>The previous value for GRUB_TIMEOUT is 0, which is supposed to mean skipping the menu. If GRUB deems the boot media not to be writable, it considers every previous boot a failure (because it can&#8217;t know whether it was successful or not), and sets the timeout to 30 seconds. 3 seconds are enough, thanks.</p>
<p>And then run update-grub.</p>
<pre># <strong>update-grub</strong>
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-36-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done</pre>
<p>Alternatively, edit grub.cfg and fix it directly.</p>
<h3>A note about old GRUB 1</h3>
<p>This is really not related to anything else above, but since I made an attempt to install Debian 8&#8217;s GRUB on the hard disk at some point, this is what happened:</p>
<pre># <strong>apt install grub</strong>
# <strong>grub --version</strong>
grub (GNU GRUB 0.97)

# <strong>update-grub </strong>
Searching for GRUB installation directory ... found: /boot/grub
Probing devices to guess BIOS drives. This may take a long time.
Searching for default file ... Generating /boot/grub/default file and setting the default boot entry to 0
Searching for GRUB installation directory ... found: /boot/grub
Testing for an existing GRUB menu.lst file ... 

Generating /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz
Found kernel: /boot/vmlinuz-6.8.0-31-generic
Updating /boot/grub/menu.lst ... done

# <strong>grub-install /dev/vda</strong>
Searching for GRUB installation directory ... found: /boot/grub
<span class="punch">The file /boot/grub/stage1 not read correctly.</span></pre>
<p>The error message about /boot/grub/stage1 appears to be horribly misleading. According to <a rel="noopener" href="https://velenux.wordpress.com/2014/10/01/grub-install-the-file-bootgrubstage1-not-read-correctly/" target="_blank">this</a> and <a rel="noopener" href="https://blog.widmo.biz/resolved-file-bootgrubstage1-read-correctly/" target="_blank">this</a>, among others, the problem was that the ext4 file system was created with 256 as the inode size, and GRUB 1 doesn&#8217;t support that. Which makes sense, as the installation was done on behalf of Ubuntu 24.04 and not a museum distribution.</p>
<p>The solution is apparently to recreate the filesystem with 128-byte inodes:</p>
<pre># mkfs.ext4 -I 128 /dev/vda3</pre>
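<p>To check in advance whether an existing filesystem would trip GRUB 1, the inode size can be read back with tune2fs. This can be rehearsed safely on a scratch image file (the path below is just an example, not a real device):</p>

```shell
# Format a scratch image file with 128-byte inodes; -F lets mkfs.ext4
# operate on a plain file instead of a block device.
truncate -s 16M /tmp/grub1-demo.img
mkfs.ext4 -q -F -I 128 /tmp/grub1-demo.img

# GRUB 1 only copes with 128-byte inodes:
tune2fs -l /tmp/grub1-demo.img | grep 'Inode size'
```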
<p>Actually, I don&#8217;t know if this was really the problem, because I gave up on this old GRUB version quite soon.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/07/installing-grub-rescue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Migrating an OpenVZ container to KVM</title>
		<link>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/</link>
		<comments>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/#comments</comments>
		<pubDate>Fri, 12 Jul 2024 15:11:13 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Linux kernel]]></category>
		<category><![CDATA[Server admin]]></category>
		<category><![CDATA[systemd]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7097</guid>
		<description><![CDATA[Introduction My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are phased out, and it&#8217;s time to move on to a KVM. This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>My Debian 8-based web server had been running for several years as an OpenVZ container, when the web host told me that containers are phased out, and it&#8217;s time to move on to a KVM.</p>
<p>This is an opportunity to upgrade to a newer distribution, most of you would say, but if a machine works flawlessly for a long period of time, I&#8217;m very reluctant to change anything. Don&#8217;t touch a stable system. It just happened to have an uptime of 426 days, and the last time this server caused me trouble was way before that.</p>
<p>So the question is if it&#8217;s possible to convert a container into a KVM machine, just by copying the filesystem. After all, what&#8217;s the difference if /sbin/init (systemd) is kicked off as a plain process inside a container or if the kernel does the same thing?</p>
<p>The answer is yes-ish, this manipulation is possible, but it requires some adjustments.</p>
<p>These are my notes and action items from finding my way to get this done. Everything below is very specific to my own slightly bizarre case, and at times I ended up carrying out tasks in a different order than listed here. But this can be useful for understanding what&#8217;s ahead.</p>
<p>By the way, the wisest thing I did throughout this endeavor was to go through the whole process on a KVM machine that I built on my own local computer. This virtual machine functioned as a mockup of the server to be installed. Not only did it make the trial and error much easier, but it also allowed me to test all kinds of things after the real server was up and running, without messing with the real machine.</p>
<h3>Faking Ubuntu 24.04 LTS</h3>
<p>To make things even more interesting, I also wanted to push the next time I&#8217;ll be required to mess with the virtual machine as far as possible into the future. Put differently, I wanted to hide the fact that the machine runs on ancient software. There should not be a request to upgrade in the foreseeable future just because the old system isn&#8217;t compatible with some future version of KVM.</p>
<p>So to the KVM hypervisor, my machine should feel like an Ubuntu 24.04, which was the latest server distribution offered at the time I did this trick. Which brings up the question: What does the hypervisor see?</p>
<p>The KVM guest interfaces with its hypervisor in three ways:</p>
<ul>
<li>With GRUB, which accesses the virtual disk.</li>
<li>Through the kernel, which interacts with the virtual hardware.</li>
<li>Through the guest&#8217;s DHCP client, which fetches the IP address, default gateway and DNS from the hypervisor&#8217;s dnsmasq.</li>
</ul>
<p>Or so I hope. Maybe there&#8217;s some aspect I&#8217;m not aware of. It&#8217;s not like I&#8217;m such an expert in virtualization.</p>
<p>So the idea was that both GRUB and the kernel should be the same as in Ubuntu 24.04. This way, any KVM setting that works with this distribution will work with my machine. The Naphthalene smell from the user-space software underneath will not reach the hypervisor.</p>
<p>This presumption can turn out to be wrong, and the third item in the list above demonstrates that: The guest machine gets its IP address from the hypervisor through a DHCP request issued by systemd-networkd, which is part of systemd version 215. So the bluff is exposed. Will there be some kind of incompatibility between the old systemd&#8217;s DHCP client and some future hypervisor&#8217;s response?</p>
<p>Regarding this specific issue, I doubt there will be a problem, as DHCP is such a simple and well-established protocol. And even if that functionality broke, the IP address is fixed anyhow, so the virtual NIC can be configured statically.</p>
<p>But who knows, maybe there is some kind of interaction with systemd that I&#8217;m not aware of? Future will tell.</p>
<p>So it boils down to faking GRUB and using a recent kernel.</p>
<h3>Solving the GRUB problem</h3>
<p>Debian 8 comes with GRUB version 0.97. Could we call that GRUB 1? I can already imagine the answer to my support ticket saying &#8220;please upgrade your system, as our KVM hypervisor doesn&#8217;t support old versions of GRUB&#8221;.</p>
<p>So I need a new one.</p>
<p>Unfortunately, the common way to install GRUB is with a couple of hocus-pocus tools that do the work well in the usual scenario.</p>
<p>As it turns out, there are two parts that need to be installed: The first part consists of the GRUB binary on the boot partition (GRUB partition or EFI, pick your choice), plus several files (modules and other) in /boot/grub/. The second part is a script file, grub.cfg, which is a textual file that can be edited manually.</p>
<p>To make a long story short, I installed the distribution on a virtual machine with the same layout, and made a copy of the grub.cfg file that was created. I then edited this file directly to fit into the new machine. As for installing GRUB binary, I did this from a Live ISO Ubuntu 24.04, so it&#8217;s genuine and legit.</p>
<p>For the full and explained story, I&#8217;ve written <a rel="noopener" href="https://billauer.se/blog/2024/07/installing-grub-rescue/" target="_blank">a separate post</a>.</p>
<h3>Fitting a decent kernel</h3>
<p>One way or another, a kernel and its modules must be added to the filesystem in order to convert it from a container to a KVM machine. This is the essential difference: With a container, one kernel runs all containers and gives each the illusion that it&#8217;s the only one. With KVM, the boot starts from the very beginning.</p>
<p>If there was something I <strong>didn&#8217;t</strong> worry about, it was the concept of running an ancient distribution with a very recent kernel. I have a lot of experience with compiling the hot-hot-latest-out kernel and running it on steam-engine distributions, and very rarely have I seen any issue with that. The Linux kernel is backward compatible in a remarkable way.</p>
<p>My original idea was to grab the kernel image and the modules from a running installation of Ubuntu 24.04. However, the module format of this distro is incompatible with old Debian 8 (ZST compression seems to have been the crux), and as a result, no modules were loaded.</p>
<p>So I took config-6.8.0-36-generic from Ubuntu 24.04 and used it as the starting point for the .config file used for compiling the vanilla stable kernel with version v6.8.12.</p>
<p>And then there were a few modifications to .config:</p>
<ul>
<li>&#8220;make oldconfig&#8221; asked a few questions and made some minor modifications, nothing apparently related.</li>
<li>Dropped kernel module compression (CONFIG_MODULE_COMPRESS_ZSTD off) and set kernel&#8217;s own compression to gzip. This was probably the reason the distribution&#8217;s modules didn&#8217;t load.</li>
<li>Some crypto stuff was disabled: CONFIG_INTEGRITY_PLATFORM_KEYRING, CONFIG_SYSTEM_BLACKLIST_KEYRING and CONFIG_INTEGRITY_MACHINE_KEYRING were dropped, same with CONFIG_LOAD_UEFI_KEYS and most important, CONFIG_SYSTEM_REVOCATION_KEYS was set to &#8220;&#8221;. Its previous value, &#8220;debian/canonical-revoked-certs.pem&#8221; made the compilation fail.</li>
<li>Dropped CONFIG_DRM_I915, which caused some weird compilation error.</li>
<li>After making a test run with the kernel, I also dropped CONFIG_UBSAN with everything that comes with it. UBSAN spat a lot of warning messages on mainstream drivers, and it&#8217;s really annoying. It&#8217;s still unclear to me why these warnings don&#8217;t appear with the distribution kernel. Maybe because of a difference between compiler versions (the warnings stem from checks inserted by gcc).</li>
</ul>
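<p>The .config edits listed above boil down to something like the following. This is a sketch on a mock .config file (the path and values are for demonstration only); in a real kernel tree, the scripts/config helper performs the same kind of edits:</p>

```shell
# Mock .config containing the entries that were changed.
cat > /tmp/demo.config <<'EOF'
CONFIG_MODULE_COMPRESS_ZSTD=y
CONFIG_SYSTEM_REVOCATION_KEYS="debian/canonical-revoked-certs.pem"
CONFIG_DRM_I915=m
EOF

# Turn CONFIG_FOO=... into "# CONFIG_FOO is not set", the way the
# kernel build system expects disabled options to appear.
disable_opt() {
    sed -i "s|^$1=.*|# $1 is not set|" /tmp/demo.config
}

disable_opt CONFIG_MODULE_COMPRESS_ZSTD
disable_opt CONFIG_DRM_I915
sed -i 's|^CONFIG_SYSTEM_REVOCATION_KEYS=.*|CONFIG_SYSTEM_REVOCATION_KEYS=""|' /tmp/demo.config

cat /tmp/demo.config
```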
<p>The compilation took 32 minutes on a machine with 12 cores (6 hyperthreaded). By far the longest and most difficult kernel compilation I can remember in a long time.</p>
<p>Based upon <a rel="noopener" href="https://billauer.se/blog/2015/10/linux-kernel-compilation-jots/" target="_blank">my own post</a>, I created the Debian packages for the whole thing, using the bindeb-pkg make target.</p>
<p>That took an additional 20 minutes, running on all cores. I used two of these packages in the installation of the KVM machine, as shown in the cookbook below.</p>
<h3>Methodology</h3>
<p>So the deal with my web host was like this: They started a KVM machine (with a different IP address, of course). I prepared this KVM machine, and when that was ready, I sent a support ticket asking for swapping the IP addresses. This way, the KVM machine became the new server, and the old container machine went to the junkyard.</p>
<p>As this machine involved a mail server and web sites with user content (comments to my blog, for example), I decided to stop the active server, copy &#8220;all data&#8221;, and restart the server only after the IP swap. In other words, the net result should be as if the same server had been shut down for an hour, and then restarted. No discontinuities.</p>
<p>As it turned out, everything related to the web server and email, including the logs of everything, is in /var/ and /home/. I could therefore copy all files from the old server to the new one for the sake of setting it up, and verify that everything was smooth as a first stage.</p>
<p>Then I shut down the services and copied /var/ and /home/. And then came the IP swap.</p>
<p>These simple commands are handy for checking which files have changed during the past week. The first finds the directories, and the second the plain files.</p>
<pre># find / -xdev -ctime -7 -type d | sort
# find / -xdev -ctime -7 -type f | sort</pre>
<p>The purpose of the -xdev flag is to remain on one filesystem. Otherwise, a lot of files from /proc and such are printed out. If your system has several relevant filesystems, be sure to list them along with &#8220;/&#8221; in this example.</p>
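<p>The semantics of &#8220;-ctime -7&#8221; (changed less than seven days ago) can be sanity-checked on a scratch directory; the directory name below is arbitrary:</p>

```shell
# A freshly touched file has a change time well within the past 7 days,
# so it must show up in the listing.
mkdir -p /tmp/ctime-demo
touch /tmp/ctime-demo/recent.txt
find /tmp/ctime-demo -xdev -ctime -7 -type f | sort   # prints /tmp/ctime-demo/recent.txt
```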
<p>The next few sections below are the cookbook I wrote for myself in order to get it done without messing around (and hence mess up).</p>
<p>In hindsight, I can say that except for dealing with GRUB and the kernel, most of the hassle had to do with the NIC: Its name changed from venet0 to eth0, and it got its address through DHCP relatively late in the boot process. And that required some adaptations.</p>
<h3>Preparing the virtual machine</h3>
<ul>
<li>Start the installation of Ubuntu 24.04 LTS server edition (or whatever is available, it doesn&#8217;t matter much). Possibly stop the installation as soon as files are being copied: The only purpose of this step is to partition the disk neatly, so that /dev/vda1 is a small partition for GRUB, and /dev/vda3 is the root filesystem (/dev/vda2 is a swap partition).</li>
<li>Start the KVM machine with a rescue image (preferably graphical or with sshd running). I went for the Ubuntu 24.04 LTS server Live ISO (the best choice provided by my web host). See notes below on using Ubuntu&#8217;s server ISO as a rescue image.</li>
<li>Wipe the existing root filesystem, if such has been installed. I considered this necessary at the time, because the default inode size may be 256, and GRUB version 1 won&#8217;t play ball with that. But later on I decided on GRUB 2. Anyhow, I forced it to be 128 bytes, despite the warning that 128-byte inodes cannot handle dates beyond 2038 and are deprecated:
<pre># mkfs.ext4 -I 128 /dev/vda3</pre>
</li>
<li>And since I was at it, no automatic fsck check. Ever. It&#8217;s really annoying when you want to kick off the server quickly.
<pre># tune2fs -c 0 -i 0 /dev/vda3</pre>
</li>
<li>Mount new system as /mnt/new:
<pre># mkdir /mnt/new
# mount /dev/vda3 /mnt/new</pre>
</li>
<li>Copy the filesystem. On the OpenVZ machine:
<pre># tar --one-file-system -cz / | nc -q 0 185.250.251.160 1234 &gt; /dev/null</pre>
<p>and the other side goes (run this before the command above):</p>
<pre># nc -l 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
<p>This took about 30 minutes. The purpose of the &#8220;-q 0&#8221; flag and those /dev/null redirections is merely to make nc quit when the tar finishes.<br />
Or, doing the same from a backup tarball:</p>
<pre>$ cat myserver-all-24.07.08-08.22.tar.gz | nc -q 0 -l 1234 &gt; /dev/null</pre>
<p>and the other side goes</p>
<pre># nc 10.1.1.3 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
</li>
<li>Remove old /lib/modules and boot directory:
<pre># rm -rf /mnt/new/lib/modules/ /mnt/new/boot/</pre>
</li>
<li>Create /boot/grub and copy the grub.cfg file that I&#8217;ve prepared in advance to there. <a rel="noopener" href="https://billauer.se/blog/2024/07/installing-grub-rescue/" target="_blank">This separate post</a> explains the logic behind doing it this way.</li>
<li>Install GRUB on the boot partition (this also adds a lot of files to /boot/grub/):
<pre># grub-install --root-directory=/mnt/new /dev/vda</pre>
</li>
<li>In order to work inside the chroot, some bind and tmpfs mounts are necessary:
<pre># mount -o bind /dev /mnt/new/dev
# mount -o bind /sys /mnt/new/sys
# mount -t proc /proc /mnt/new/proc
# mount -t tmpfs tmpfs /mnt/new/tmp
# mount -t tmpfs tmpfs /mnt/new/run</pre>
</li>
<li>Copy the two .deb files that contain the Linux kernel files to somewhere in /mnt/new/</li>
<li>Chroot into the new fs:
<pre># chroot /mnt/new/</pre>
</li>
<li>Check that /dev, /sys, /proc, /run and /tmp are as expected (mounted correctly).</li>
<li>Disable and stop these services: bind9, sendmail, cron.</li>
<li>This wins the prize for the oddest fix: Probably in relation to the OpenVZ container, the LSB modules_dep service is active, and it deletes all module files in /lib/modules on reboot. So make sure to never see it again. Just disabling it wasn&#8217;t good enough.
<pre># systemctl mask modules_dep.service</pre>
</li>
<li>Install the Linux kernel and its modules into /boot and /lib/modules:
<pre># dpkg -i linux-image-6.8.12-myserver_6.8.12-myserver-2_amd64.deb</pre>
</li>
<li>Also install the headers for compilation (why not?)
<pre># dpkg -i linux-headers-6.8.12-myserver_6.8.12-myserver-2_amd64.deb</pre>
</li>
<li>Add /etc/systemd/network/20-eth0.network
<pre>[Match]
Name=eth0

[Network]
DHCP=yes</pre>
<p>The NIC was a given in a container, but now it has to be brought up explicitly, and the IP address possibly obtained from the hypervisor via DHCP, as I&#8217;ve done here.</p></li>
<li>Add the two following lines to /etc/sysctl.conf, in order to turn off IPv6:
<pre>net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1</pre>
</li>
<li>Adjust the firewall rules, so that they don&#8217;t depend on the server having a specific IP address (because a temporary IP address will be used).</li>
<li>Add support for lspci (better to do it now, in case something goes wrong after booting):
<pre># apt install pciutils</pre>
</li>
<li>Ban the evbug module, which is intended to generate debug messages for input events. Unfortunately, it sometimes floods the kernel log when the mouse goes over the virtual machine&#8217;s console window. So ditch it by adding /etc/modprobe.d/evbug-blacklist.conf with this single line:
<pre>blacklist evbug</pre>
</li>
<li>Edit /etc/fstab. Remove everything, and leave only this row:
<pre>/dev/vda3 / ext4 defaults 0 1</pre>
</li>
<li>Remove persistent udev rules, if such exist, in /etc/udev/rules.d. Oddly enough, there was nothing in this directory, neither in the existing OpenVZ server nor in a regular Ubuntu 24.04 server installation.</li>
<li>Boot up the system from disk, and perform post-boot fixes as mentioned below.</li>
</ul>
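<p>The tar-to-tar copy scheme used in the cookbook above can be rehearsed locally on scratch directories (the paths are examples); over the network, nc merely stands in for the pipe:</p>

```shell
# Pack one directory tree and unpack it into another, compressed in
# transit, exactly like the nc-based copy above but without the network.
mkdir -p /tmp/copy-src/sub /tmp/copy-dst
echo hello > /tmp/copy-src/sub/file.txt
tar --one-file-system -cz -C /tmp/copy-src . | tar -C /tmp/copy-dst -xz
cat /tmp/copy-dst/sub/file.txt   # prints "hello"
```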
<h3>Post-boot fixes</h3>
<ul>
<li>Verify that /tmp is indeed mounted as a tmpfs.</li>
<li>Disable (actually, mask) the automount service, which is useless and fails. This makes systemd&#8217;s status degraded, which is practically harmless, but confusing.
<pre># systemctl mask proc-sys-fs-binfmt_misc.automount</pre>
</li>
<li>Install the dbus service:
<pre># apt install dbus</pre>
<p>Not only is it the right thing to do on a Linux system, but it also silences this warning:</p>
<pre>Cannot add dependency job for unit dbus.socket, ignoring: Unit dbus.socket failed to load: No such file or directory.</pre>
</li>
<li>Enable login prompt on the default visible console (tty1) so that a prompt appears after all the boot messages:
<pre># systemctl enable getty@tty1.service</pre>
<p>The other ttys got a login prompt when using Ctrl-Alt-Fn, but not the visible console. So this fixed it. Otherwise, one can be misled into thinking that the boot process is stuck.</p></li>
<li>Optionally: Disable <a rel="noopener" href="https://wiki.openvz.org/Debian_template_creation" target="_blank">vzfifo service</a> and remove /.vzfifo.</li>
</ul>
<h3>Just before the IP address swap</h3>
<ul>
<li>Reboot the openVZ server to make sure that it wakes up OK.</li>
<li>Change the openVZ server&#8217;s firewall, so it works with a different IP address. Otherwise, it becomes unreachable after the IP swap.</li>
<li>Boot the target KVM machine <span class="punch">in rescue mode</span>. No need to set up the ssh server as all will be done through VNC.</li>
<li>On the KVM machine, mount new system as /mnt/new:
<pre># mkdir /mnt/new
# mount /dev/vda3 /mnt/new</pre>
</li>
<li>On the OpenVZ server, check for recently changed directories and files:
<pre># find / -xdev -ctime -7 -type d | sort &gt; recently-changed-dirs.txt
# find / -xdev -ctime -7 -type f | sort &gt; recently-changed-files.txt</pre>
</li>
<li>Verify that the changes are only in the places that are going to be updated. If not, consider if and how to update these other files.</li>
<li>Verify that the mail queue is empty, or let sendmail empty it if possible. Not a good idea to have something firing off as soon as sendmail resumes:
<pre># mailq</pre>
</li>
<li>Disable all services except sshd on the OpenVZ server:
<pre># systemctl disable cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>Run &#8220;mailq&#8221; again to verify that the mail queue is empty (unless there was a reason to leave a message there in the previous check).</li>
<li>Reboot OpenVZ server and verify that none of these is running. This is the point at which this machine is dismissed as a server, and the downtime clock begins ticking.</li>
<li>Verify that this server doesn&#8217;t listen to any ports except ssh, as an indication that all services are down:
<pre># netstat -n -a | less</pre>
</li>
<li>Repeat the check of recently changed files.</li>
<li>On the <strong>KVM machine</strong>, remove /var and /home:
<pre># rm -rf /mnt/new/var /mnt/new/home</pre>
</li>
<li>Copy these parts:<br />
On the KVM machine, <span class="punch">using the VNC console</span>, go
<pre># nc -l 1234 &lt; /dev/null | time tar -C /mnt/new/ -xzv</pre>
<p>and on myserver:</p>
<pre># tar --one-file-system -cz /var /home | nc -q 0 185.250.251.160 1234 &gt; /dev/null</pre>
<p>Took 28 minutes.</p></li>
<li>Check that /mnt/new/tmp and /mnt/new/run are empty, and remove whatever is found if there&#8217;s something there. There&#8217;s no reason for anything to be there, and it would be weird if there was, given the way the filesystem was copied from the original machine. But if there are any files, it&#8217;s just confusing, as /tmp and /run are tmpfs on the running machine, so any files there will be invisible anyhow.</li>
<li>Reboot the KVM machine <strong>with a reboot command</strong>. It will stop anyhow for removing the CDROM.</li>
<li>Remove the KVM&#8217;s CDROM and continue the reboot normally.</li>
<li>Login to the KVM machine with <span class="punch">ssh</span>.</li>
<li>Check that all is OK: systemctl status as well as journalctl. Note that the apache, mysql and dovecot should be running now.</li>
<li>Power down both virtual machines.</li>
<li>Request an IP address swap. Let them do whatever they want with the <span class="punch">IPv6</span> addresses, as they are ignored anyhow.</li>
</ul>
<h3>After IP address swap</h3>
<ul>
<li>Start the KVM server normally, and login normally <span class="punch">through ssh</span>.</li>
<li>Try to browse into the web sites: The web server should already be working properly (even though the DNS is off, there&#8217;s a backup DNS).</li>
<li>Check journalctl and systemctl status.</li>
<li>Resume the original firewall rules and <span class="punch">verify that the firewall works properly</span>:
<pre># systemctl restart netfilter-persistent
# iptables -vn -L</pre>
</li>
<li>Start all services, and check status and journalctl again:
<pre># systemctl start cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>If all is fine, enable these services:
<pre># systemctl enable cron dovecot apache2 bind9 sendmail mysql xinetd</pre>
</li>
<li>Reboot (with reboot command), and check that all is fine.</li>
<li>In particular, send DNS queries directly to the server with dig, and also send an email to a foreign address (e.g. gmail). My web host blocked outgoing connections to port 25 on the new server, for example.</li>
<li>Delete ifcfg-venet0 and ifcfg-venet0:0 in /etc/sysconfig/network-scripts/, as they relate to the venet0 interface that exists only in the container machine. It&#8217;s just misleading to have it there.</li>
<li>Compare /etc/rc* and /etc/systemd with the situation before the transition in the git repo, to verify that everything is like it should be.</li>
<li>Check the server with nmap (run this from another machine):
<pre>$ nmap -v -A <span style="color: #888888;"><em>server</em></span>
$ sudo nmap -v -sU <span style="color: #888888;"><em>server</em></span></pre>
</li>
</ul>
<h3>And then the DNS didn&#8217;t work</h3>
<p>I knew very well why I left plenty of time free for after the IP swap. Something will always go wrong after a maneuver like this, and this time was no different. And for some odd reason, it was the bind9 DNS that played two different kinds of pranks.</p>
<p>I noted immediately that the server didn&#8217;t answer DNS queries. As it turned out, there were two apparently independent reasons for this.</p>
<p>The first was that when I re-enabled the bind9 service (after disabling it for the sake of moving), systemctl went for the SYSV scripts instead of its own unit file. So I got:</p>
<pre># <strong>systemctl enable bind9</strong>
Synchronizing state for bind9.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d bind9 defaults
insserv: warning: current start runlevel(s) (empty) of script `bind9' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `bind9' overrides LSB defaults (0 1 6).
Executing /usr/sbin/update-rc.d bind9 enable</pre>
<p>This could have been harmless and gone unnoticed, had it not been that I had added a &#8220;-4&#8221; flag to bind9&#8217;s command line, without which it wouldn&#8217;t work. Because the SYSV scripts were run instead, my change in /etc/systemd/system/bind9.service wasn&#8217;t in effect.</p>
<p>Solution: Delete all files related to bind9 in /etc/init.d/ and /etc/rc*.d/. Quite aggressive, but did the job.</p>
<p>Having that fixed, it still didn&#8217;t work. The problem now was that eth0 was configured through DHCP after bind9 had begun running. As a result, the DNS didn&#8217;t listen on eth0.</p>
<p>I slapped myself for thinking about adding a &#8220;sleep&#8221; command before launching bind9, and went for the right way to do this. Namely:</p>
<pre>$ <strong>cat /etc/systemd/system/bind9.service</strong>
[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network-online.target <span class="punch">systemd-networkd-wait-online.service</span>
Wants=network-online.target <span class="punch">systemd-networkd-wait-online.service</span>

[Service]
ExecStart=/usr/sbin/named -4 -f -u bind
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop

[Install]
WantedBy=multi-user.target</pre>
<p>The systemd-networkd-wait-online.service is not there by coincidence. Without it, bind9 was launched before eth0 had received an address. With this, systemd consistently waited for the DHCP to finish, and then launched bind9. As it turned out, this also delayed the start of apache2 and sendmail.</p>
<p>If anything, network-online.target is most likely redundant.</p>
<p>And with this fix, the crucial row appeared in the log:</p>
<pre>named[379]: listening on IPv4 interface eth0, 193.29.56.92#53</pre>
<p>Another solution could have been to assign an address to eth0 statically. For some odd reason, I prefer to let DHCP do this, even though the firewall will block all traffic anyhow if the IP address changes.</p>
<h3>Using Live Ubuntu as rescue mode</h3>
<p>Set Ubuntu 24.04 server amd64 as the CDROM image.</p>
<p>After the machine has booted, send a Ctrl-Alt-F2 to switch to the second console. Don&#8217;t go on with the installation wizard, as it will of course wipe the server.</p>
<p>In order to establish an ssh connection:</p>
<ul>
<li>Choose a password for the default user (ubuntu-server).
<pre>$ passwd</pre>
<p>If you insist on a weak password, remember that you can do that only as root.</p></li>
<li>Use ssh to log in:
<pre>$ ssh ubuntu-server@185.250.251.160</pre>
</li>
</ul>
<p>Root login is forbidden (by default), so don&#8217;t even try.</p>
<p>Note that even though sshd apparently listens only to IPv6 ports, it&#8217;s actually accepting IPv4 connections by virtue of IPv4-mapped IPv6 addresses:</p>
<pre># <strong>lsof -n -P -i tcp 2&gt;/dev/null</strong>
COMMAND    PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd      1            root  143u  <span class="punch">IPv6</span>   5323      0t0  <span class="punch">TCP *:22</span> (LISTEN)
systemd-r  911 systemd-resolve   15u  IPv4   1766      0t0  TCP 127.0.0.53:53 (LISTEN)
systemd-r  911 systemd-resolve   17u  IPv4   1768      0t0  TCP 127.0.0.54:53 (LISTEN)
<span class="punch">sshd</span>      1687            root    3u  <span class="punch">IPv6</span>   5323      0t0  <span class="punch">TCP *:22</span> (LISTEN)
sshd      1847            root    4u  <span class="punch">IPv6</span>  11147      0t0  <span class="punch">TCP 185.250.251.160:22-&gt;85.64.140.6:57208</span> (ESTABLISHED)
sshd      1902   ubuntu-server    4u  IPv6  11147      0t0  TCP 185.250.251.160:22-&gt;85.64.140.6:57208 (ESTABLISHED)</pre>
<p>So don&#8217;t get confused by e.g. netstat and other similar utilities.</p>
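<p>To verify that this is the IPv4-mapped mechanism at work (and not a separate IPv4 socket), check the bindv6only sysctl. When it&#8217;s 0, which is the kernel&#8217;s default, an IPv6 wildcard socket accepts IPv4 connections as well:</p>
<pre>$ sysctl net.ipv6.bindv6only
net.ipv6.bindv6only = 0</pre>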
<h3>To NTP or not?</h3>
<p>I wasn&#8217;t sure if I should run an NTP client inside a KVM virtual machine. So these are the notes I took.</p>
<ul>
<li><a rel="noopener" href="https://opensource.com/article/17/6/timekeeping-linux-vms" target="_blank">This</a> is a nice tutorial to start with.</li>
<li>It&#8217;s probably a good idea to run an NTP client in the guest. It <a rel="noopener" href="https://sanjuroe.dev/sync-kvm-guest-using-ptp" target="_blank">would have been better to utilize the PTP protocol</a>, and get the host&#8217;s clock directly. But this is really overkill. The drawback with these daemons is that if the guest goes down and comes back up again, it will start with the old time, and then jump.</li>
<li>It&#8217;s also <a rel="noopener" href="https://doc.opensuse.org/documentation/leap/archive/42.1/virtualization/html/book.virt/sec.kvm.managing.clock.html" target="_blank">a good idea</a> to use kvm_clock in addition to NTP. This kernel feature uses the pvclock protocol to <a rel="noopener" href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sect-virtualization-tips_and_tricks-libvirt_managed_timers#sect-timer-element" target="_blank">let guest virtual machines read the host physical machine’s wall clock time</a> as well as its TSC. See <a rel="noopener" href="https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/" target="_blank">this post for a nice tutorial</a> about kvm_clock.</li>
<li>In order to know which clock source the kernel uses, <a rel="noopener" href="https://access.redhat.com/solutions/18627" target="_blank">look in /sys/devices/system/clocksource/clocksource0/current_clocksource</a>. Quite expectedly, it was kvm-clock (available sources were kvm-clock, tsc and acpi_pm).</li>
<li>It so turned out that systemd-timesyncd started running without my intervention when moving from a container to KVM.</li>
</ul>
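<p>The clock source check from the list above, as plain commands (these were the values on my machine):</p>
<pre>$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock tsc acpi_pm
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock</pre>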
<p>On a working KVM machine, timesyncd tells about its presence in the log:</p>
<pre>Jul 11 20:52:52 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.003s/+0ppm
Jul 11 21:27:00 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.001s/+0ppm
Jul 11 22:01:08 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.002s/0.007s/0.001s/+0ppm
Jul 11 22:35:17 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.001s/+0ppm
Jul 11 23:09:25 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.007s/0.007s/0.003s/+0ppm
Jul 11 23:43:33 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.003s/0.007s/0.005s/+0ppm (ignored)
Jul 12 00:17:41 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.006s/0.007s/0.005s/-1ppm
Jul 12 00:51:50 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.007s/0.005s/+0ppm
Jul 12 01:25:58 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:00:06 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/+0.002s/0.007s/0.005s/+0ppm
Jul 12 02:34:14 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.005s/+0ppm
Jul 12 03:08:23 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.005s/+0ppm
Jul 12 03:42:31 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.001s/0.007s/0.004s/+0ppm
Jul 12 04:17:11 myserver systemd-timesyncd[197]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.007s/0.003s/+0ppm</pre>
<p>So a resync takes place every 2048 seconds (34 minutes and 8 seconds), like clockwork. As apparent from the values, there&#8217;s no dispute about the time between Debian&#8217;s NTP server and the web host&#8217;s hypervisor.</p>
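<p>For a snapshot of timesyncd&#8217;s state without digging through the log, timedatectl can be queried directly (the timesync-status verb requires a reasonably recent systemd):</p>
<pre>$ timedatectl status
$ timedatectl timesync-status</pre>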
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/07/container-to-kvm-virtualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running KVM on Linux Mint 19 random jots</title>
		<link>https://billauer.se/blog/2024/07/virtualization-notes-to-self-2/</link>
		<comments>https://billauer.se/blog/2024/07/virtualization-notes-to-self-2/#comments</comments>
		<pubDate>Fri, 12 Jul 2024 14:54:45 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=7094</guid>
		<description><![CDATA[General Exactly like my previous post from 14 years ago, these are random jots that I took as I set up a QEMU/KVM-based virtual machine on my Linux Mint 19 computer. This time, the purpose was to prepare myself for moving a server from an OpenVZ container to KVM. Other version details, for the record: [...]]]></description>
			<content:encoded><![CDATA[<h3>General</h3>
<p>Exactly like <a rel="noopener" href="https://billauer.se/blog/2010/01/virtualization-notes-to-self/" target="_blank">my previous post</a> from 14 years ago, these are random jots that I took as I set up a QEMU/KVM-based virtual machine on my Linux Mint 19 computer. This time, the purpose was to prepare myself for moving a server from an OpenVZ container to KVM.</p>
<p>Other version details, for the record: libvirt version 4.0.0, QEMU version 2.11.1, Virtual Machine manager 1.5.1.</p>
<h3>Installation</h3>
<p>Install some relevant packages:</p>
<pre># apt install qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients virt-manager virt-viewer ebtables ovmf</pre>
<p>This clearly installed a few services: libvirt-bin, libvirtd, libvirt-guest, virtlogd, qemu-kvm, ebtables, and a couple of sockets: virtlockd.socket and virtlogd.socket with their attached services.</p>
<p>My regular username on the computer was added automatically to the &#8220;libvirt&#8221; group, however that doesn&#8217;t take effect until one logs out and in again. Without belonging to this group, one gets the error message &#8220;Unable to connect to libvirt qemu:///system&#8221; when attempting to run the Virtual Machine Manager. Or in more detail: &#8220;libvirtError: Failed to connect socket to &#8216;/var/run/libvirt/libvirt-sock&#8217;: Permission denied&#8221;.</p>
<p>The lazy and temporary solution is to run the Virtual Machine Manager with &#8220;sg&#8221;. So instead of the usual command for starting the GUI tool (NOT as root):</p>
<pre>$ virt-manager &amp;</pre>
<p>Use &#8220;sg&#8221; (or start a session with the &#8220;newgrp&#8221; command):</p>
<pre>$ sg libvirt virt-manager &amp;</pre>
<p>This is necessary only until next time you log in to the console. I think. I didn&#8217;t get that far. Who logs out?</p>
<p>There&#8217;s also a command-line utility, virsh. For example, to list all running machines:</p>
<pre>$ sudo virsh list</pre>
<p>Or just &#8220;sudo virsh&#8221; for an interactive shell.</p>
<p>Note that without root permissions, the list is simply empty. This is really misleading.</p>
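<p>The reason is that without an explicit URI, virsh defaults to qemu:///session, which is a separate per-user instance. To see the system-wide machines without sudo (given membership in the libvirt group), select the URI explicitly:</p>
<pre>$ virsh -c qemu:///system list --all</pre>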
<h3>General notes</h3>
<ul>
<li>Virtual machines are called &#8220;domains&#8221; in several contexts (within virsh in particular).</li>
<li>To get the mouse out of the graphical window, use Ctrl-Alt.</li>
<li>For networking to work, some rules related to virbr0 are automatically added to the iptables firewall. If these are absent, go &#8220;systemctl restart libvirtd&#8221; (don&#8217;t do this with virtual machines running, of course).</li>
<li>These iptables rules are important in particular for WAN connections. Apparently, these allow virbr0 to make DNS queries to the local machine (adding rules to INPUT and OUTPUT chains). In addition, the FORWARD rule allows forwarding anything to and from virbr0 (as long as the correct address mask is matched). Plus a whole lot of stuff around POSTROUTING. Quite disgusting, actually.</li>
<li>There are two Ethernet interfaces related to KVM virtualization: vnet0 and virbr0 (typically). For sniffing, virbr0 is a better choice, as it&#8217;s the virtual machine&#8217;s own bridge to the system, so there is less noise. This is also the interface that has an IP address of its own.</li>
<li>A vnetN pops up for each virtual machine that is running, virbr0 is there regardless.</li>
<li>The configuration files are kept as fairly readable XML files in /etc/libvirt/qemu</li>
<li>The images are typically held at /var/lib/libvirt/images, owned by root with 0600 permissions.</li>
<li>The libvirtd service runs /usr/sbin/libvirtd as well as two processes of /usr/sbin/dnsmasq. When a virtual machine runs, it also runs an instance of qemu-system-x86_64 on its behalf.</li>
</ul>
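<p>The dnsmasq instances mentioned above also serve as the DHCP server for virbr0, so virsh is a convenient way to find a guest&#8217;s IP address:</p>
<pre>$ sudo virsh net-dhcp-leases default</pre>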
<h3>Creating a new virtual machine</h3>
<p>Start the Virtual Manager. The GUI is good enough for my purposes.</p>
<pre>$ sg libvirt virt-manager &amp;</pre>
<ul>
<li>Click on the &#8220;Create new virtual machine&#8221; and choose &#8220;Local install media&#8221;. Set the other parameters as necessary.</li>
<li>As for storage, choose &#8220;Select or create custom storage&#8221; and create a qcow2 volume in a convenient position on the disk (/var/lib/libvirt/images is hardly a good place for that, as it&#8217;s on the root partition).</li>
<li>In the last step, choose &#8220;customize configuration before install&#8221;.</li>
<li>Network selection: Virtual network &#8216;default&#8217;: NAT.</li>
<li>Change the NIC, Disk and Video to VirtIO as mentioned below.</li>
<li>Click &#8220;Begin Installation&#8221;.</li>
</ul>
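<p>The same steps can be done from the command line with virt-install. Something along these lines (the names and paths are examples only, and flags vary slightly between versions):</p>
<pre>$ sg libvirt -c 'virt-install --name testvm --memory 2048 --vcpus 2 \
  --disk path=/extra/vm/testvm.qcow2,size=20,format=qcow2,bus=virtio \
  --cdrom /extra/iso/ubuntu-24.04-live-server-amd64.iso \
  --network network=default,model=virtio --video virtio'</pre>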
<h3>Do it with VirtIO</h3>
<p>That is, use Linux&#8217; paravirtualization drivers, rather than emulation of hardware.</p>
<p>To set up a machine&#8217;s settings, go View &gt; Details.</p>
<p>This is lspci&#8217;s response with a default virtual machine:</p>
<pre>00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20)
00:04.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:05.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:05.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:05.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:05.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:06.0 Communication controller: Red Hat, Inc Virtio console
00:07.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon</pre>
<p>Cute, but all interfaces are emulations of real hardware. In other words, this will run really slowly.</p>
<p>Testing link speed: On the host machine:</p>
<pre>$ nc -l 1234 &lt; /dev/null &gt; /dev/null</pre>
<p>And on the guest:</p>
<pre>$ dd if=/dev/zero bs=128k count=4k | nc -q 0 10.1.1.3 1234
4096+0 records in
4096+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 3.74558 s, 143 MB/s</pre>
<p>Quite impressive for hardware emulation, I must admit. But it can get better.</p>
<p>Things to change from the default settings:</p>
<ul>
<li>NIC: Choose &#8220;virtio&#8221; as device model, keep &#8220;Virtual network &#8216;default&#8217;&#8221; as NAT.</li>
<li>Disk: On &#8220;Disk bus&#8221;, don&#8217;t use IDE, but rather &#8220;VirtIO&#8221; (it will appear as /dev/vda etc.).</li>
<li>Video: Don&#8217;t use QXL, but Virtio (without 3D acceleration; that wasn&#8217;t supported on my machine). Actually, I&#8217;m not so sure about this one. For example, Ubuntu&#8217;s installation live boot occasionally gave me a black screen with Virtio.</li>
</ul>
<p>Note that it&#8217;s possible to use a VNC server instead of &#8220;Display spice&#8221;.</p>
<p>After making these changes:</p>
<pre>00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Red Hat, Inc <span class="punch">Virtio</span> GPU (rev 01)
00:03.0 Ethernet controller: Red Hat, Inc <span class="punch">Virtio</span> network device
00:04.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:05.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:05.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:05.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:05.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:06.0 Communication controller: Red Hat, Inc Virtio console
00:07.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:08.0 SCSI storage controller: Red Hat, Inc <span class="punch">Virtio</span> block device</pre>
<p>Try the speed test again?</p>
<pre>$ dd if=/dev/zero bs=128k count=4k | nc -q 0 10.1.1.3 1234
4096+0 records in
4096+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.426422 s, 1.3 GB/s</pre>
<p>Almost ten times faster.</p>
<h3>Preparing a live Ubuntu ISO for ssh</h3>
<pre>$ sudo su
# apt install openssh-server
# passwd ubuntu</pre>
<p>In the installation of the openssh-server, there&#8217;s a question of which configuration files to use. Choose the package maintainer&#8217;s version.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2024/07/virtualization-notes-to-self-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
