Heads up—Let’s Encrypt and Dovecot

Let’s Encrypt certificates work just dandy not only for HTTPS, but also for SSL/TLS on IMAP and SMTP services in mailservers. I deployed Let’s Encrypt to replace manually-purchased-and-deployed certificates on a client server in 2019, and today, users started reporting they were getting certificate expiration errors in mail clients.

When I checked the server using TLS checking tools, they reported that the certificate was fine; both the tools and a manual check of the datestamp on the actual .pem file showed that it had been updating just fine, with the most recent update happening in January and extending the certificate's validity until April. WTF?

As it turns out, the problem is that Dovecot—which handles IMAP duties on the server—doesn't notice when the certificate has been updated on disk; it will cheerfully keep using an in-memory cached copy of whatever certificate was present when the service started, until the end of time.
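
For reference, the Dovecot side of this is just the ssl_cert and ssl_key settings (in /etc/dovecot/conf.d/10-ssl.conf on Ubuntu), which Dovecot only reads when the service starts or reloads. The paths below assume the stock Let's Encrypt live/ symlink layout; adjust to your own install:

# /etc/dovecot/conf.d/10-ssl.conf -- read once at service start, not per connection
ssl_cert = </etc/letsencrypt/live/mail.example.com/fullchain.pem
ssl_key = </etc/letsencrypt/live/mail.example.com/privkey.pem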

The way to detect this was to use openssl on the command line to connect directly to the IMAPS port:

you@anybox:~$ openssl s_client -showcerts -connect mail.example.com:993 -servername example.com

Scrolling through the connect data produced this gem:

---
Server certificate
subject=CN = mail.example.com
issuer=C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA
Server Temp Key: ECDH, P-384, 384 bits
---
SSL handshake has read 3270 bytes and written 478 bytes
Verification error: certificate has expired

So obviously, the Dovecot service hadn’t reloaded the certificate after Certbot-auto renewed it. One /etc/init.d/dovecot restart later, running the same command instead produced (among all the other verbiage):

---
Server certificate
subject=CN = mail.example.com
issuer=C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA
Server Temp Key: ECDH, P-384, 384 bits
---
SSL handshake has read 3269 bytes and written 478 bytes
Verification: OK
---

With the immediate problem resolved, the next step was to make sure Dovecot gets automatically restarted frequently enough to pick new certs up before they expire. You could get fancy and modify certbot's cron job to include a Dovecot restart; you can find certbot's cron job with grep -ir certbot /etc/crontab and add a --deploy-hook argument to restart after new certificates are obtained (and only after new certificates are obtained).
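
If you did want to go that route, the hook looks something like this. It's purely a sketch – /etc/cron.d-style syntax with a made-up schedule – and the point is just where the --deploy-hook argument fits:

# hypothetical /etc/cron.d entry; the --deploy-hook only fires when a cert actually renews
17 3 * * * root certbot renew --quiet --deploy-hook "/etc/init.d/dovecot restart"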

But I don’t really recommend doing it that way; the cron job might get automatically updated with an upgraded version of certbot at some point in the future. Instead, I created a new root cron job to restart Dovecot once every Sunday at midnight:

# m h dom mon dow   command
0 0 * * Sun /etc/init.d/dovecot restart

Since Certbot renews any certificate with 30 days or less until expiration, and the Sunday restart will pick up new certificates within 7 days of their deployment, we should be fine with this simple brute-force approach rather than a more efficient—but also more fragile—approach tying the update directly to restarting Dovecot using the --deploy-hook argument.


Heartbleed SSL vulnerability

Last night (2014 Apr 7) a massive security vulnerability was publicly disclosed in OpenSSL, the library that encrypts most of the world's sensitive traffic. The bug in question is approximately two years old – OpenSSL versions released before 2012 are not vulnerable – and affects the TLS "heartbeat" function, which is why the vulnerability has been nicknamed Heartbleed.

The bug allows a malicious remote user to scan arbitrary 64K chunks of the affected server’s memory. This can disclose any and ALL information in that affected server’s memory, including SSL private keys, usernames and passwords of ANY running service accepting logins, and more. Nobody knows if the vulnerability was known or exploited in the wild prior to its public disclosure last night.

If you are an end user:

You will need to change any passwords you use online unless you are absolutely sure that the servers you used them on were not vulnerable. If you are not a HIGHLY experienced admin or developer, you absolutely should NOT assume that sites and servers you use were not vulnerable. They almost certainly were. If you are a highly experienced ops or dev person… you still absolutely should not assume that, but hey, it’s your rope, do what you want with it.

Note that most sites and servers are not yet patched, meaning that changing your password right now will only expose that password as well. If you have not received any notification directly from the site or server in question, you may try a scanner like the one at http://filippo.io/Heartbleed/ to see if your site/server has been patched. Note that this script is not bulletproof, and in fact it’s less than 24 hours old as of the time of this writing, up on a free site, and under massive load.

The most important thing for end users to understand: You must not, must not, MUST NOT reuse passwords between sites. If you have been using one or two passwords for every site and service you access – your email, forums you post on, Facebook, Twitter, chat, YouTube, whatever – you are now compromised everywhere and will continue to be compromised everywhere until ALL sites are patched. Further, this will by no means be the last time a site is compromised. Criminals can and absolutely DO test compromised credentials from one site on other sites and reuse them elsewhere when they work! You absolutely MUST use different passwords – and I don’t just mean tacking a “2” on the end instead of a “1”, or similar cheats – on different sites if you care at all about your online presence, the money and accounts attached to your online presence, etc.

If you are a sysadmin, ops person, dev, etc:

Any systems, sites, services, or code that you are responsible for need to be checked to see whether they link against OpenSSL versions 1.0.1 through 1.0.1f. Note that that's the upstream OpenSSL versioning scheme – your individual distribution, if you are using repo versions like a sane person, may use a different numbering scheme. (For example, Ubuntu is vulnerable from 1.0.1-0 through 1.0.1-4ubuntu5.11.)
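
A quick way to see what you're actually running, on Debian/Ubuntu at least (package names vary by release, and the last line's path is just a placeholder for whatever daemon you care about):

you@anybox:~$ openssl version
you@anybox:~$ dpkg -l | grep -i libssl
you@anybox:~$ ldd /path/to/some/daemon | grep -i ssl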

Examples of affected services: HTTPS, IMAPS, POP3S, SMTPS, OpenVPN. Fabulously enough, for once OpenSSH is not affected, even in versions linking to the affected OpenSSL library, since OpenSSH did not use the Heartbeat function. If you are a developer and are concerned about code that you wrote, the key here is whether your code exposed access to the Heartbeat function of OpenSSL. If it was possible for an attacker to access the TLS heartbeat functionality, your code was vulnerable. If it was absolutely not possible to check an SSL heartbeat through your application, then your application was not vulnerable even if it linked to the vulnerable OpenSSL library.

In contrast, please realize that just because your service passed an automated scanner like the one linked above doesn’t mean it was safe. Most of those scanners do not test services that use STARTTLS instead of being TLS-encrypted from the get-go, but services using STARTTLS are absolutely still affected. Similarly, none of the scanners I’ve seen will test UDP services – but UDP services are affected. In short, if you as a developer don’t absolutely know that you weren’t exposing access to the TLS heartbeat function, then you should assume that your OpenSSL-using application or service was/is exploitable until your libraries are brought up to date.
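
For the STARTTLS case specifically, openssl s_client will at least negotiate the upgrade so you can see what those services present – note that this only shows the TLS negotiation and certificate; it does not by itself prove or disprove Heartbleed exposure (host and ports here are placeholders):

you@anybox:~$ openssl s_client -connect mail.example.com:587 -starttls smtp
you@anybox:~$ openssl s_client -connect mail.example.com:143 -starttls imap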

You need to update all copies of the OpenSSL library to 1.0.1g or later (or your distribution’s equivalent), both dynamically AND statically linked (PS: stop using static links, for exactly things like this!), and restart any affected services. You should also, unfortunately, consider any and all credentials, passwords, certificates, keys, etc. that were used on any vulnerable servers, whether directly related to SSL or not, as compromised and regenerate them. The Heartbleed bug allowed scanning ALL memory on any affected server and thus could be used by a sufficiently skilled user to extract ANY sensitive data held in server RAM. As a trivial example, as of today (2014-Apr-08) users at the Ars Technica forums are logging on as other users using password credentials held in server RAM, as exposed by standard exploit test scripts publicly disclosed.
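
On a Debian/Ubuntu box, the mechanical part looks roughly like this – a sketch only, since package names differ by release and the service list is whatever actually links OpenSSL on your particular machine:

root@anybox:~# apt-get update && apt-get install --only-upgrade openssl libssl1.0.0
root@anybox:~# /etc/init.d/apache2 restart    # repeat for dovecot, postfix, openvpn, and friends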

Completely eradicating all potential vulnerability is a STAGGERING amount of work and will involve a lot of user disruption. When estimating your paranoia level, please do remember that the bug itself has been in the wild since 2012 – the public disclosure was not until 2014-Apr-07, but we have no way of knowing how long private, possibly criminal entities have been aware of and/or exploiting the bug in the wild.

Slow performance with dovecot – mysql – roundcube

This drove me crazy forever, and Google wasn't too helpful.  If you're running dovecot with mysql authentication, your logins will be exceedingly slow.  This isn't much of a problem with traditional mail clients – just an annoying bit of a hiccup you probably won't even notice except for SASL authentication when sending mail – but it makes Roundcube webmail PAINFULLY slow in a VERY obvious way.

The issue is due to PAM authentication being enabled by default in Dovecot, and on Ubuntu at least, it's done in a really hidden little out-of-the-way file with no easy way (that I'm aware of) to forcibly override it elsewhere.

Again on Ubuntu, you’ll find the file in question at /etc/dovecot/conf.d/auth-system.conf.ext, and the relevant block should be commented out COMPLETELY, like this:

# PAM authentication. Preferred nowadays by most systems.
# PAM is typically used with either userdb passwd or userdb static.
# REMEMBER: You'll need /etc/pam.d/dovecot file created for PAM
# authentication to actually work. <doc/wiki/PasswordDatabase.PAM.txt>
#passdb {
  # driver = pam
  # [session=yes] [setcred=yes] [failure_show_msg=yes] [max_requests=]
  # [cache_key=] []
  #args = dovecot
#}

Once you've done this (and remember, we're assuming you're using SQL auth here, and NOT actually USING PAM!) you'll auth immediately instead of having to fail PAM and then fall back to SQL auth on every auth request, and things will speed up IMMENSELY. This turns Roundcube from "painfully slow" to "blazing fast".
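
For completeness, the SQL passdb that should be doing the work instead looks like this in stock Dovecot – typically in /etc/dovecot/conf.d/auth-sql.conf.ext on Ubuntu, though the paths here are assumptions about your particular install:

# e.g. /etc/dovecot/conf.d/auth-sql.conf.ext -- the passdb that should actually be answering
passdb {
  driver = sql
  args = /etc/dovecot/dovecot-sql.conf.ext
}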

Block common trojans in SpamAssassin

If you have a reasonably modern (>= 3.1) version of SpamAssassin, you should by default have a MIMEHeader plugin available (at least on Ubuntu).  This enables you to create a couple of custom rules that block the more pernicious “open this ZIP file” style trojans.

Put the following in /etc/spamassassin/local.cf:

loadplugin Mail::SpamAssassin::Plugin::MIMEHeader

mimeheader ZIP_ATTACHED Content-Type =~ /zip/i
describe ZIP_ATTACHED email contains a zip file attachment
score ZIP_ATTACHED 0.1

header SUBJ_PACKAGE_PICKUP Subject =~ /(parcel|package).*avail.*pickup/i
describe SUBJ_PACKAGE_PICKUP 1 of 2 for meta-rule TROJAN_PACKAGE_PICKUP
score SUBJ_PACKAGE_PICKUP 0.1

header FROM_IRS_GOV From =~ /irs\.gov/i
describe FROM_IRS_GOV 1 of 2 for meta-rule TROJAN_IRS_ZIPFILE
score FROM_IRS_GOV 0.1

meta TROJAN_PACKAGE_PICKUP (SUBJ_PACKAGE_PICKUP &&  ZIP_ATTACHED)
describe TROJAN_PACKAGE_PICKUP Nobody sends a ZIP file to say “your package is ready”.
score TROJAN_PACKAGE_PICKUP 4.0

meta TROJAN_IRS_ZIPFILE (FROM_IRS_GOV &&  ZIP_ATTACHED)
describe TROJAN_IRS_ZIPFILE If the IRS really wants to send a ZIP, they’ll have to find another way
score TROJAN_IRS_ZIPFILE 4.0

As always, spamassassin --lint to make sure the new rules work okay, and /etc/init.d/spamassassin reload to activate them in your running spamd process.
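
If you want to watch the new rules actually fire before trusting them, feed a saved sample message through spamassassin in test mode (the filename here is just a placeholder):

you@anybox:~$ spamassassin --lint && echo "rules parse OK"
you@anybox:~$ spamassassin -t < package-scam-sample.eml | grep -E 'TROJAN|ZIP_ATTACHED'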

Configuring retry duration in Postfix

By default, Postfix (like most mailservers) will keep trying to deliver mail for unconscionably long periods of time – 5 days. Most users do NOT expect mail that isn't getting delivered to just sorta hang out for 5 days before they find out it didn't go through. The typical scenario is that a mistyped email address (or a receiving mailserver that decides to SMTP 4xx you instead of SMTP 5xx you when it really doesn't want that email, ever) means that three or four days later, the message still hasn't gotten there, the user has no idea anything went wrong, and then there's a giant kerfuffle of "but I sent that to you days ago!"
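
You can confirm the default on your own box with postconf; the -d flag shows Postfix's built-in defaults rather than whatever main.cf currently overrides:

you@anybox:~$ postconf -d maximal_queue_lifetime
maximal_queue_lifetime = 5d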

If you want to reconfigure Postfix to something a little more in tune with modern sensibilities, add the following to /etc/postfix/main.cf near the top:

# if you can't deliver it in an hour - it can't be delivered!
maximal_queue_lifetime = 1h
maximal_backoff_time = 15m
minimal_backoff_time = 5m
queue_run_delay = 5m
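
Once that's in main.cf, reload Postfix so the queue manager picks up the new settings:

root@anybox:~# /etc/init.d/postfix reload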

Note that this will cause you problems if you need to deliver to someone who has a particularly aggressive greylisting setup in place that would require retries at or near a full hour after the original temporary rejection it sends you. But that's okay – greylisting is bad, and the remote admin should FEEL bad for doing it. (Alternately – adjust the numbers above as desired to 2h, 4h, whatever will allow you to pass the greylisting on the remote end. Ultimately, you're the one who has to answer to your users about the balance between "knowing right away that the mail didn't go through" and "managing to survive dumb things the other mail admin may have done on his end".)