Avahi killed my server :'(

Avahi is the equivalent of Apple’s “Bonjour” zeroconf network service. It installs by default with the ubuntu-desktop meta-package, which I generally use to get, you guessed it, a full desktop on virtualization host servers. This never caused me any issues until today.

Today, though – on a server with dual network interfaces, both used as bridge ports on its br0 adapter – Avahi apparently decided “screw the configuration you specified in /etc/network/interfaces, I’m going to give your production virt host bridge an autoconf address. Because I want to be helpful.”

When it did so, the host dropped off the network, I got alarms on my monitoring service, and I couldn’t so much as arp the host, much less log into it. So I drove down to the affected office and did an ifconfig br0, which showed me the following damning bit of evidence:

me@box:~$ ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
         inet6 addr: fe80::20a:e4ff:feae:7e4c/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:11 errors:0 dropped:0 overruns:0 frame:0
         TX packets:96 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:3927 (3.8 KB)  TX bytes:6970 (6.8 KB)

br0:avahi Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
         inet addr:169.254.6.229  Bcast:169.254.255.255  Mask:255.255.0.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Oh, Avahi, you son-of-a-bitch. Was there anything wrong with the actual NIC? Certainly didn’t look like it – had link lights on the NIC and on the switch, and sure enough, ifdown br0 ; ifup br0 brought it right back online again.

Can we confirm that avahi really was the culprit?

/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Withdrawing address record for [redacted IP] on br0.
/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Leaving mDNS multicast group on interface br0.IPv4 with address [redacted IP].
/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Interface br0.IPv4 no longer relevant for mDNS.
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Found user 'avahi-autoipd' (UID 111) and group 'avahi-autoipd' (GID 121).
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully called chroot().
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully dropped root privileges.
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Starting with address 169.254.6.229
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-autoipd(br0)[12460]: Callout BIND, address 169.254.6.229 on interface br0
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: Joining mDNS multicast group on interface br0.IPv4 with address 169.254.6.229.
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: New relevant interface br0.IPv4 for mDNS.
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: Registering new address record for 169.254.6.229 on br0.IPv4.
/var/log/syslog:Jan  9 09:11:07 virt0 avahi-autoipd(br0)[12460]: Successfully claimed IP address 169.254.6.229

I know I said this already, but – oh, avahi, you worthless son of a bitch!

Next step was to kill it and disable it.

me@box:~$ sudo stop avahi-daemon
me@box:~$ echo manual | sudo tee /etc/init/avahi-daemon.override

Grumble grumble grumble. Now I’m just wondering why I’ve never had this problem before… I suspect it’s something to do with having dual NICs on the bridge, and one of them not being plugged in (I only added them both so it wouldn’t matter which one actually got plugged in if the box ever got moved somewhere).

The SSLv3 “POODLE” attack in a (large) nutshell

A summary of the POODLE SSLv3 vulnerability and attack:

A vulnerability has been discovered in a decrepit-but-still-widely-supported version of SSL, SSLv3, which gives an attacker a good chance of determining the true value of a single byte of encrypted traffic. This is of limited use in most applications, but in HTTPS (e.g. your web browser, many mobile applications, etc.) an attacker in an MITM (Man-In-The-Middle) position, such as someone operating a wireless router you connect to, can capture and resend the traffic repeatedly until they manage to get a valuable chunk of it assembled in the clear. (This is done by manipulating cleartext traffic, to the same or any other site, injecting some Javascript into that traffic to get your browser to run it. The rogue JS function is what reloads the secure site, offscreen where you can’t see it happening, until the attacker gets what s/he needs out of it.)

That “valuable chunk” is the cookie that validates your user login on whatever secure website you happen to be browsing – your bank, webmail, ebay or amazon account, etc. By replaying that cookie, the attacker can now hijack your logged in session directly on his/her own device, and from there can do anything that you would be able to do – make purchases, transfer funds, change the password, change the associated email account, et cetera.

It reportedly takes 60 seconds or less for an attacker in a MITM position (again, typically someone in control of a router your traffic is being directed through, which is most often going to be a wireless router – maybe even one you don’t realize you’ve connected to) to replay traffic enough to capture the cookie using this attack.

Worth noting: SSLv3 is hopelessly obsolete, but it’s still widely supported in part because IE6/Windows XP need it, and many large enterprises are STILL using IE6. Many sites and servers have proactively disabled SSLv3 for quite some time already, and for those, you’re fine. However, many large sites still have not – a particularly egregious example being Citibank, to whom you can still connect with SSLv3 today. As long as both your client application (web browser) and the remote site (web server) support SSLv3, a MITM can force a downgrade dance, telling each side that the OTHER side only supports SSLv3, forcing that protocol even though it’s strongly deprecated.

I’m an end user – what do I do?

Disable SSLv3 in your browser. If you use IE, there’s a checkbox in Internet Options you can uncheck to remove SSLv3 support. If you use Firefox, there’s a plugin for that. If you use Chrome, you can start Chrome with a command-line option that disables SSLv3 for now, but that’s kind of a crappy “fix”, since you’d have to make sure to start Chrome either from the command line or from a particular shortcut every time (and, for example, clicking a link in an email that started up a new Chrome instance would launch it without the flag, leaving you vulnerable again).

Instructions, with screenshots, are available at https://zmap.io/sslv3/ and I won’t try to recreate them here; they did a great job.

I will note specifically here that there’s a fix for Chrome users on Ubuntu that does fairly trivially mitigate even use-cases like clicking a link in an email with the browser not already open:


* Open /usr/share/applications/google-chrome.desktop in a text editor
* For any line that begins with "Exec", add the argument --ssl-version-min=tls1
* For instance the line "Exec=/usr/bin/google-chrome-stable %U" should become "Exec=/usr/bin/google-chrome-stable --ssl-version-min=tls1 %U"

You can test to see if your fix for a given browser worked by visiting https://zmap.io/sslv3/ again afterwards – there’s a banner at the top of the page which will warn you if you’re vulnerable. WARNING, caching is enabled on that page, meaning you will have to force-refresh to make certain that you aren’t seeing the old cached version with the banner intact – on most systems, pressing ctrl-F5 in your browser while on the page will do the trick.

I’m a sysadmin – what do I do?

Disable SSLv3 support in any SSL-enabled service you run – Apache, nginx, postfix, dovecot, etc. Worth noting – there is currently no known way to usefully exploit the POODLE vulnerability with IMAPS or SMTPS or any other arbitrary SSL-wrapped protocol; currently HTTPS is the only known protocol that allows you to manipulate traffic in a useful enough way. I would not advise banking on that, though. Disable this puppy wherever possible.

The simplest way to test if a service is vulnerable (at least, from a real computer – Windows-only admins will need to do some more digging):

openssl s_client -connect mail.jrs-s.net:443 -ssl3

The above snippet would check my mailserver. The correct (sslv3 not available) response begins with a couple of error lines:

CONNECTED(00000003)
140301802776224:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1260:SSL alert number 40
140301802776224:error:1409E0E5:SSL routines:SSL3_WRITE_BYTES:ssl handshake failure:s3_pkt.c:596:

What you DON’T want to see is a return with a certificate chain in it:

CONNECTED(00000003)
depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = PositiveSSL CA 2
verify error:num=20:unable to get local issuer certificate
verify return:0
---
Certificate chain
0 s:/OU=Domain Control Validated/OU=PositiveSSL/CN=mail.jrs-s.net
i:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=PositiveSSL CA 2
1 s:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=PositiveSSL CA 2
i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root

On Apache on Ubuntu, you can edit /etc/apache2/mods-available/ssl.conf and find the SSLProtocol line and change it to the following:

SSLProtocol all -SSLv2 -SSLv3

Then restart Apache with /etc/init.d/apache2 restart, and you’re golden.
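
If you’re running nginx instead of (or alongside) Apache, the equivalent knob is the ssl_protocols directive – a minimal sketch, assuming a stock Ubuntu nginx with SSL already configured (the directive can live in the http block of /etc/nginx/nginx.conf or in an individual server block):

# TLS only - no SSLv2, no SSLv3
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

Then reload nginx with service nginx reload and re-test with the openssl one-liner above.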

I haven’t had time to research Postfix or Dovecot yet, which are my other two big concerns (even though they theoretically shouldn’t be vulnerable since there’s no way for the attacker to manipulate SMTPS or IMAPS clients into replaying traffic repeatedly).

Possibly also worth noting – I can’t think of any way for an attacker to exploit POODLE without access to web traffic running both in the clear and in a Javascript-enabled browser, so if you wanted to disable Javascript completely (which is pretty impractical, since it would break the vast majority of the web), or if you’re using a command-line tool like wget for something, it should be safe.

Allowing traceroutes to succeed with iptables

There is a LOT of bogus half-correct information about traceroutes and iptables floating around out there.  It took me a bit of sifting through it all this morning to figure out the real deal and the best way to allow traceroutes without negatively impacting security, so here’s some documentation in case I forget before the next time.

Traceroute from Windows machines typically uses ICMP Type 8 (echo request) packets.  Traceroute from Unix-like machines typically uses UDP packets with sequentially increasing destination ports starting at 33434 – with the usual defaults of 30 hops and 3 probes per hop, that works out to ports 33434 through 33523.  So your server (the traceroute destination) must not silently drop incoming ICMP Type 8 or UDP packets in that port range.

Here’s where it gets tricky: it really doesn’t need to accept those packets either, which is what the vast majority of sites addressing this issue recommend.  It just needs to be able to reject them, which won’t happen if they’re being dropped.  If you implement the typical advice – accepting those packets – traceroute basically ends up sort of working by accident: those ports shouldn’t be in use by any running applications, and since nothing is listening on them, the server will issue an ICMP Type 3 response (destination unreachable).  However, if you’re accepting packets to these ports, then a rogue application listening on those ports also becomes reachable – which is the sort of thing your firewall should be preventing in the first place.

The good news is, DROP and ACCEPT aren’t your only options – you can REJECT these packets instead, which will do exactly what we want here: allow traceroutes to work properly without also potentially enabling some rogue application to listen on those UDP ports.

So all you really need on your server to allow incoming traceroutes to work properly is:

# allow ICMP Type 8 (ping, ICMP traceroute)
-A INPUT -p icmp --icmp-type 8 -j ACCEPT
# enable UDP traceroute rejections to get sent out
-A INPUT -p udp --dport 33434:33523 -j REJECT

Note: you may very well need and/or want more ICMP functionality than this in general – but this is all you need for incoming traceroutes to complete properly.
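
If you manage your rules interactively with the iptables command rather than through an iptables-restore file like the snippet above, the equivalent would look something like this (a sketch only – you may want -I instead of -A so the rules land ahead of any catch-all DROP):

sudo iptables -A INPUT -p icmp --icmp-type 8 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 33434:33523 -j REJECT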

OpenVPN on BeagleBone Black

This is my new Beaglebone Black. Enormous, isn’t it?

I needed an inexpensive embedded device for OpenVPN use, and my first thought (actually, my tech David’s first thought) was the obvious in this day and age: “Raspberry Pi.”

Unfortunately, the Pi didn’t really fit the bill.  Aside from the unfortunate fact that my particular Pi arrived with a broken ethernet port, doing some quick network-less testing of OpenSSL gave me very disappointing numbers – 5 Mbps or so, running flat out, for the encryption alone, before any actual routing.  That lined up with some reviews I found online, so I had to give up on the Pi as an embedded solution for OpenVPN use.

Luckily, that didn’t mean I was sunk yet – enter the Beaglebone Black.  Beaglebone doesn’t get as much press as the Pi does, but it’s an interesting device with an interesting history – the BeagleBoard line it comes from has been around longer than the Pi (the original BeagleBoard dates back to 2008), it’s fully open source where the Pi is not (hardware plans are published online, and other vendors are not only allowed but encouraged to build bit-for-bit identical devices!), and although it doesn’t have the video chops of the Pi (no 1080p resolution supported), it has a much better CPU – a 1GHz Cortex-A8, vs the Pi’s 700MHz ARM11.  If all that isn’t enough, the Beaglebone also has built-in 2GB eMMC flash with a preloaded installation of Angstrom Linux, and – again unlike the Pi – directly supports being powered from plain old USB connected to a computer.  Pretty nifty.

The only real hitch I had with my Beaglebone was not realizing that if I had an SD card in, it would attempt to boot from the SD card, not from the onboard eMMC.  Once I disconnected my brand new Samsung MicroSD card and power cycled the Beaglebone, though, I was off to the races.  It boots into Angstrom pretty quickly, and thanks to the inclusion of the Avahi daemon in the default installation, you can discover the device (from linux at least – haven’t tested Windows) by just pinging beaglebone.local.  Once that resolves, ssh root@beaglebone.local with a default password, and you’re embedded-Linux-ing!

Angstrom doesn’t have any prebuilt packages for OpenVPN, so I downloaded the source from openvpn.net and did the usual ./configure ; make ; make install fandango.  I did have one minor hitch – the system clock wasn’t set, so ./configure bombed out complaining about files in the future.  Easily fixed – ntpdate us.pool.ntp.org updated my clock, and this time the package built without incident, needing somewhere south of 5 minutes to finish.  After that, it was time to test OpenVPN’s throughput – which, spoiler alert, was a total win!
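
For reference, the build itself boiled down to roughly the following – the version number is a placeholder (grab whatever tarball is current from openvpn.net), and depending on the image you may need the OpenSSL and LZO development headers installed first, or to pass --disable-lzo to configure:

root@beaglebone:~# ntpdate us.pool.ntp.org     # fix the clock first, or configure complains about timestamps
root@beaglebone:~# tar xzf openvpn-2.x.x.tar.gz && cd openvpn-2.x.x
root@beaglebone:~/openvpn-2.x.x# ./configure && make && make install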

root@beaglebone:~# openvpn --genkey --secret beagle.key ; scp beagle.key me@locutus:/tmp/
root@beaglebone:~# openvpn --secret beagle.key --port 666 --ifconfig 10.98.0.1 10.98.0.2 --dev tun
me@locutus:/tmp$ sudo openvpn --secret beagle.key --remote beaglebone.local --port 666 --ifconfig 10.98.0.2 10.98.0.1 --dev tun

Now I have a working tunnel between locutus and my beaglebone.  Opening a new terminal on each, I ran iperf to test throughput.  To run iperf (which was already available on Angstrom), you just run iperf -s on the server machine, and run iperf -c [ip address] on the client machine to connect to the server.  I tested connectivity both ways across my OpenVPN tunnel:

me@locutus:~$ iperf -c 10.98.0.1
------------------------------------------------------------
Client connecting to 10.98.0.1, TCP port 5001
TCP window size: 21.9 KByte (default)
------------------------------------------------------------
[ 3] local 10.98.0.2 port 55873 connected with 10.98.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 46.2 MBytes 38.5 Mbits/sec
me@locutus:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.98.0.2 port 5001 connected with 10.98.0.1 port 32902
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 47.0 MBytes 39.2 Mbits/sec

38+ mbps from an inexpensive embedded device?  I’ll take it!

Apache 2.4 / Ubuntu Trusty problems

Found out the hard way today that there’ve been SIGNIFICANT changes in configuration syntax and requirements since Apache 2.2, when I tried to set up a VERY simple couple of vhosts on Apache 2.4.7 on a brand new Ubuntu Trusty Tahr install.

First – the a2ensite/a2dissite scripts refuse to work unless your vhost config files end in .conf. BE WARNED. Example:

you@trusty:~$ ls /etc/apache2/sites-available
000-default.conf
default-ssl.conf
testsite.tld
you@trusty:~$ sudo a2ensite testsite.tld
ERROR: Site testsite.tld does not exist!

The solution is a little annoying; you MUST end the filename of your vhost configs in .conf – after that, a2ensite and a2dissite work as you’d expect.

you@trusty:~$ sudo mv /etc/apache2/sites-available/testsite.tld /etc/apache2/sites-available/testsite.tld.conf
you@trusty:~$ sudo a2ensite testsite.tld
Enabling site testsite.tld
To activate the new configuration, you need to run:
  service apache2 reload

After that, I had a more serious problem. The “site” I was trying to enable was nothing other than a simple exposure of a directory (a local ubuntu mirror I had set up) – no php, no cgi, nothing fancy at all. Here was my vhost config file:

<VirtualHost *:80>
        ServerName us.archive.ubuntu.com
        ServerAlias us.archive.ubuntu.local 
        Options Includes FollowSymLinks MultiViews Indexes
        DocumentRoot /data/apt-mirror/mirror/us.archive.ubuntu.com
	<Directory /data/apt-mirror/mirror/us.archive.ubuntu.com/>
	        Options Indexes FollowSymLinks
	        AllowOverride None
	</Directory>
</VirtualHost>

Can’t get much simpler, right? This would have worked fine in any previous version of Apache, but not in Apache 2.4.7, the version supplied with Trusty Tahr 14.04 LTS.

Every attempt to browse the directory gave me a 403 Forbidden error, which confused me to no end, since the directories were chmod 755 and chgrp www-data. Checking Apache’s error log gave me pages on pages of lines like this:

[Mon Jun 02 10:45:19.948537 2014] [authz_core:error] [pid 27287:tid 140152894646016] [client 127.0.0.1:40921] AH01630: client denied by server configuration: /data/apt-mirror/mirror/us.archive.ubuntu.com/ubuntu/

What I eventually discovered was that since 2.4, Apache not only requires explicit authorization configuration for every directory to be browsed, but the syntax for it has changed as well. The old “Order allow,deny” and “Allow from all” won’t cut it – you now need “Require all granted”. Here is my final working vhost .conf file:

<VirtualHost *:80>
        ServerName us.archive.ubuntu.com
        ServerAlias us.archive.ubuntu.local 
        Options Includes FollowSymLinks MultiViews Indexes
        DocumentRoot /data/apt-mirror/mirror/us.archive.ubuntu.com
	<Directory /data/apt-mirror/mirror/us.archive.ubuntu.com/>
	        Options Indexes FollowSymLinks
	        AllowOverride None
                Require all granted
	</Directory>
</VirtualHost>
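
Once the config is fixed, it’s worth letting Apache sanity-check the syntax before reloading – nothing Trusty-specific here, just a habit that’s saved me more than once:

you@trusty:~$ sudo apache2ctl configtest
Syntax OK
you@trusty:~$ sudo service apache2 reload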

Hope this helps someone else – this was a frustrating start to the morning for me.

Heartbleed SSL vulnerability

Last night (2014 Apr 7) a massive security vulnerability was publicly disclosed in OpenSSL, the library that encrypts most of the world’s sensitive traffic. The bug in question is approximately two years old – OpenSSL releases older than 2012 (anything before the 1.0.1 series) are not vulnerable – and affects the TLS “heartbeat” function, which is why the vulnerability has been nicknamed HeartBleed.

The bug allows a malicious remote user to repeatedly read 64K chunks of the affected server’s memory. This can disclose any and ALL information in that affected server’s memory, including SSL private keys, usernames and passwords of ANY running service accepting logins, and more. Nobody knows if the vulnerability was known or exploited in the wild prior to its public disclosure last night.

If you are an end user:

You will need to change any passwords you use online unless you are absolutely sure that the servers you used them on were not vulnerable. If you are not a HIGHLY experienced admin or developer, you absolutely should NOT assume that sites and servers you use were not vulnerable. They almost certainly were. If you are a highly experienced ops or dev person… you still absolutely should not assume that, but hey, it’s your rope, do what you want with it.

Note that most sites and servers are not yet patched, meaning that changing your password right now will only expose that password as well. If you have not received any notification directly from the site or server in question, you may try a scanner like the one at http://filippo.io/Heartbleed/ to see if your site/server has been patched. Note that this script is not bulletproof, and in fact it’s less than 24 hours old as of the time of this writing, up on a free site, and under massive load.

The most important thing for end users to understand: You must not, must not, MUST NOT reuse passwords between sites. If you have been using one or two passwords for every site and service you access – your email, forums you post on, Facebook, Twitter, chat, YouTube, whatever – you are now compromised everywhere and will continue to be compromised everywhere until ALL sites are patched. Further, this will by no means be the last time a site is compromised. Criminals can and absolutely DO test compromised credentials from one site on other sites and reuse them elsewhere when they work! You absolutely MUST use different passwords – and I don’t just mean tacking a “2” on the end instead of a “1”, or similar cheats – on different sites if you care at all about your online presence, the money and accounts attached to your online presence, etc.

If you are a sysadmin, ops person, dev, etc:

Any systems, sites, services, or code that you are responsible for needs to be checked for links against OpenSSL versions 1.0.1 through 1.0.1f. Note, that’s the OpenSSL vendor versioning system – your individual distribution, if you are using repo versions like a sane person, may have different numbering schemes. (For example, Ubuntu is vulnerable from 1.0.1-0 through 1.0.1-4ubuntu5.11.)
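
On an Ubuntu box, a quick way to see what you’re actually running looks something like this (package names vary on other distributions, and the dpkg query below is Ubuntu/Debian-specific):

# what does the library itself report?
openssl version
# what's actually installed from the repos on Ubuntu/Debian?
dpkg -l openssl libssl1.0.0 | grep ^ii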

Examples of affected services: HTTPS, IMAPS, POP3S, SMTPS, OpenVPN. Fabulously enough, for once OpenSSH is not affected, even in versions linking to the affected OpenSSL library, since OpenSSH did not use the Heartbeat function. If you are a developer and are concerned about code that you wrote, the key here is whether your code exposed access to the Heartbeat function of OpenSSL. If it was possible for an attacker to access the TLS heartbeat functionality, your code was vulnerable. If it was absolutely not possible to check an SSL heartbeat through your application, then your application was not vulnerable even if it linked to the vulnerable OpenSSL library.

On the flip side, please realize that just because your service passed an automated scanner like the one linked above doesn’t mean it was safe. Most of those scanners do not test services that use STARTTLS instead of being TLS-encrypted from the get-go, but services using STARTTLS are absolutely still affected. Similarly, none of the scanners I’ve seen will test UDP services – but UDP services are affected. In short, if you as a developer don’t absolutely know that you weren’t exposing access to the TLS heartbeat function, then you should assume that your OpenSSL-using application or service was/is exploitable until your libraries are brought up to date.

You need to update all copies of the OpenSSL library to 1.0.1g or later (or your distribution’s equivalent), both dynamically AND statically linked (PS: stop using static links, for exactly this kind of thing!), and restart any affected services. You should also, unfortunately, consider any and all credentials, passwords, certificates, keys, etc. that were used on any vulnerable servers, whether directly related to SSL or not, as compromised and regenerate them. The Heartbleed bug allowed scanning ALL memory on any affected server and thus could be used by a sufficiently skilled user to extract ANY sensitive data held in server RAM. As a trivial example, as of today (2014-Apr-08) users at the Ars Technica forums are logging on as other users using password credentials held in server RAM, as exposed by publicly disclosed exploit test scripts.
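
On Ubuntu, the mechanical part of the library update looks roughly like this – a sketch, not gospel – and the lsof trick at the end just flags processes still mapping the old, now-deleted copy of the library so you know what needs restarting:

sudo apt-get update && sudo apt-get install --only-upgrade openssl libssl1.0.0
# anything still holding the old library open shows up as DEL or (deleted)
sudo lsof -n | grep libssl | grep -E 'DEL|deleted'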

Completely eradicating all potential vulnerability is a STAGGERING amount of work and will involve a lot of user disruption. When estimating your paranoia level, please do remember that the bug itself has been in the wild since 2012 – the public disclosure was not until 2014-Apr-07, but we have no way of knowing how long private, possibly criminal entities have been aware of and/or exploiting the bug in the wild.

Restoring Legacy Boot (Linux Boot) on a Chromebook

press ctrl-alt-forward to jump to TTY2 and a standard login prompt

I let the battery die completely on my Acer C720 Chromebook, and discovered that unfortunately if you do that, your Chromebook will no longer Legacy boot when you press Ctrl-L – it just beeps at you despondently, with no error message to indicate what’s going wrong.

Sadly, I found message after message on forums indicating that people encountering this issue just reinstalled ChrUbuntu from scratch. THIS IS NOT NECESSARY!

If you just get several beeps when you press Ctrl-L to boot into Linux on your Chromebook, don’t fret – press Ctrl-D to boot into ChromeOS, but DON’T LOG IN. Instead, change terminals to get a shell. The function keys at the top of the keyboard (the row with “Esc” at the far left) map to the F-keys on a normal keyboard, and ctrl-alt-[Fkey] works here just as it would in Linux. The [forward arrow] key two keys to the right of Esc maps to F2, so pressing ctrl-alt-[forward arrow on top row] will bring you to tty2, which presents you with a standard Linux login prompt.

Log in as chronos (no password, unless you’d previously set one). Now, one command will get you right:

sudo crossystem dev_boot_usb=1 dev_boot_legacy=1

That’s it. You’re now ready to reboot and SUCCESSFULLY Ctrl-L into your existing Linux install.
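
If you’re paranoid (I was), crossystem should also read a flag back when you pass just its name, so you can confirm the change stuck before you reboot – a quick sanity check, going from memory here:

sudo crossystem dev_boot_legacy    # should print 1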

btrfs RAID awesomeness

I am INDECENTLY excited about some plans for further btrfs development that I just got confirmed on the mailing list.

Btrfs can already support a redundant array on an arbitrary collection of “mutt” hard drives via btrfs-raid1 – as an example, say you’ve got five hard drives lying around; a 4TB drive, two 2TB drives, a 1TB drive, and an old 750GB drive. You can dump all of those mutts into one btrfs-raid1 array for a total of 9.75TB of raw capacity, and roughly half that (a hair under 4.9TB) of usable capacity. The btrfs-raid1 will just store two copies of each data block it writes on two separate drives, doing its best to keep each drive in the array about the same percentage of its overall capacity “full”.

[Diagrams: 9.75TB btrfs-raid1 empty / 9.75TB btrfs-raid1 half full]
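
To make that concrete, here’s roughly what creating and checking an array like that looks like today – device names and the mount point are hypothetical placeholders, so point it at your own mutts:

# mirror both data (-d) and metadata (-m) across the whole pile of disks
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
mkdir -p /mnt/mutts && mount /dev/sdb /mnt/mutts   # mounting any member device mounts the whole array
btrfs filesystem df /mnt/mutts                     # confirms the raid1 data/metadata profiles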

This is actually a huge win for midmarket and enterprise as well as for small business and prosumers – if you assume a more business-y environment with matched drives, the performance characteristics of an array like this are very interesting; you aren’t tied to stripes, you don’t have to do a read-write-rewrite dance when data is modified, and you don’t have to have an extremely performance-penalized restriping event if a drive fails and is replaced – the array can just quickly write new redundant copies of any orphaned blocks out to the new replacement drives, without needing to disturb (or, worse, read / write / rewrite) blocks that weren’t orphaned.

For small business and especially prosumer, of course, the benefits are even more obvious – the world is littered with people who want to throw an arbitrary number of arbitrarily sized “mutt” drives they have lying around into one big array, and finally, they can do so – quickly, easily, effectively. The only drawback is that you’re locked into n/2 storage efficiency – since this is still a system that relies on a redundant copy of each data block to be written to a different disk than the first copy is written to, the 9.75TB array up there will only be able to store about 4.5TB of data, which is just not efficient enough for a lot of people, especially the small business and prosumer types.

OK, now we’ve wrapped our brain around the idea of a “raid1” that just distributes data blocks evenly around an arbitrary array while making sure that redundant blocks are on separate physical disks – and can handle weird numbers and sizes of disks with aplomb. (I’ve personally tested a btrfs-RAID1 with three equal sized drives, and yes, it can store half the raw capacity of all three drives just fine.) What’s next?

For those folks who aren’t satisfied with n/2 storage efficiency… there are plans on the roadmap for striped/parity storage that ALSO isn’t tied to the number of physical drives present. Let’s say you have five disks and you’re satisfied with single-disk fault tolerance – you might choose to create a striped array with four data blocks and one parity block per stripe. This is pretty obvious and easy to picture (if it were actually tied directly to disks, and the disks had to be identically sized, we’d have just described a RAID3 array). Now, what happens if you add two more disks to your array without changing your stripe size and parity level? No problem – (future versions of) btrfs will just distribute blocks from each stripe equally around the array of physical disks, so that “half full” on the array roughly corresponds to “half full” on each individual disk as well, just like we demo’ed above on the btrfs-raid1 array.

I am VERY excited about being able to decide, for example, “I want 80% storage efficiency and single-failure fault tolerance”, and therefore being able to select a stripe length of 4 data blocks and one parity block… or “I want 75% storage efficiency and two-failure fault tolerance” and selecting a stripe length of 6 data blocks and 2 parity blocks… and in either case, to then be able to say “OK, now I want to expand my array with these six new drives I just bought” and be able to do so just as simply as adding the new drives, without worrying about how that will affect my storage efficiency or fault tolerance level, and without having to do a severely expensive and somewhat dangerous “restriping” of the array.

We are living in very interesting times, when it comes to data storage, and I have a feeling the majority of the storage industry – even the relatively well-informed bits that keep up with ZFS, Ceph, OrangeFS, etc – are going to be utterly gobsmacked and scrambling for footing when btrfs hits mainstream.

Slow performance with dovecot – mysql – roundcube

This drove me crazy forever, and Google wasn’t too helpful.  If you’re running dovecot with mysql authentication, your logins will be exceedingly slow.  This isn’t much of a problem with traditional mail clients – just an annoying bit of a hiccup you probably won’t even notice except for SASL authentication when sending mail – but it makes Roundcube webmail PAINFULLY slow in a VERY obvious way.

The issue is due to PAM authentication being enabled by default in Dovecot, and on Ubuntu at least, it’s done in a really hidden little out-of-the-way file with no easy way to forcibly override it elsewhere that I’m aware of.

Again on Ubuntu, you’ll find the file in question at /etc/dovecot/conf.d/auth-system.conf.ext, and the relevant block should be commented out COMPLETELY, like this:

# PAM authentication. Preferred nowadays by most systems.
# PAM is typically used with either userdb passwd or userdb static.
# REMEMBER: You'll need /etc/pam.d/dovecot file created for PAM
# authentication to actually work. <doc/wiki/PasswordDatabase.PAM.txt>
#passdb {
  # driver = pam
  # [session=yes] [setcred=yes] [failure_show_msg=yes] [max_requests=]
  # [cache_key=] []
  #args = dovecot
#}

Once you’ve done this (and remember, we’re assuming you’re using SQL auth here, and NOT actually USING the PAM!) you’ll auth immediately instead of having to fail PAM and then fall back to SQL auth on every auth request, and things will speed up IMMENSELY. This turns Roundcube from “painfully slow” to “blazing fast”.
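
One last step so the change actually takes effect – restart dovecot (on Ubuntu, at least, this is all it takes):

sudo service dovecot restart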