Open Source – Page 7 – JRS Systems: the blog

ZFS dedup: tested, found wanting

Even if you have the RAM for it (and we’re talking a good 6GB or so per TB of storage), ZFS deduplication is, unfortunately, almost certainly a lose.

I don’t usually have that much RAM to spare, but one server has 192GB of RAM and only a few terabytes of storage – and it stores a lot of VM images, with obvious serious block-level duplication between images. Dedup shows at 1.35+ on all the datasets, and would be higher if one VM didn’t have a couple of terabytes of almost dup-free data on it.

That server’s been running for a few years now, and nobody using it has complained. But I was doing some maintenance on it today, splitting up VMs into their own datasets, and saw some truly abysmal performance.

root@virt0:/data/images# pv < jabberserver.qcow2 > jabber/jabberserver.qcow2 206MB 0:00:31 [7.14MB/s] [>                  ]  1% ETA 0:48:41

7MB/sec? UGH! And that’s not even a sustained average; that’s just where it happened to be when I killed the process. This server should be able to sustain MUCH better performance than that, even though it’s reading and writing from the same pool. So I checked, and saw that dedup was on:

root@virt0:~# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data  7.06T  2.52T  4.55T    35%  1.35x  ONLINE  -

In theory, you’d think that dedup would help tremendously with exactly this operation: copying a quiesced VM from one dataset to another on the same pool. There’s no need for a single block of data to be rewritten, just more pointers added to the metadata for the existing blocks. However, dedup looked like the obvious culprit for my performance woes here, so I disabled it and tried again:

root@virt0:/data/images# pv < jabberserver.qcow2 > jabber/jabberserver.qcow219.2GB 0:04:58 [65.7MB/s] [============>] 100%

Yep, that’s more like it.

TL;DR: ZFS dedup sounds like a great idea, but in the real world, it sucks. Even on a machine built to handle it. Even on exactly the kind of storage (a bunch of VMs with similar or identical operating systems) that seems tailor-made for it. I do not recommend its use for pretty much any conceivable workload.

(On the other hand, LZ4 compression is an unqualified win.)

Exploring copyleft

Recently, the ElementaryOS Linux distribution screwed the pooch pretty badly, PR-wise – TL;DR if ElementaryOS nerfs the blog post they never should have made: “if you don’t donate money when you download ElementaryOS, you’re a cheater”. Yeah, that didn’t go down so well.

Where it got interesting was this Reddit thread discussing the gaffe. It never ceases to amaze me what a slippery concept copyleft really is, and how easy it is for people to get confused in discussing it and its ramifications. In case it isn’t obvious, by the way, I’m “people” too, so I’m going to revisit copyleft here with lots of citations for my own benefit as well as that of anyone reading. There are quite a lot of copyleft licenses, but I’m going to look at the GPLv3 (at this time, the most current version). I promise not to cite any other copyleft license in this article, and if I want to talk about a different one (GPLv2, Affero GPL, Lesser GPL, etc) I’ll make it perfectly clear when (if) I am.

First Things First

The first first thing – gnu.org itself provides a quick guide to the GPLv3. There is also a GPL FAQ that covers all the GPL licenses, not just GPLv3. There are instructions for the usage of the GPL licenses so you know how to make sure a license you’ve selected is properly applied to your work. And finally, and most authoritatively (this is the part which might be judged in a court of law; the rest is helpful but not legally binding), here is the gnu.org hosted authoritative copy of the most current version of the GPL itself – at the time of this writing, GPLv3. (There is sadly no way to specifically link to GPLv3 at gnu.org right now – the “old licenses” area direct links to old versions by specific number, but treats the GPLv3 as though there will never, ever be further alterations to the main GPL. Time will tell, I suppose.)

I will be citing the GPLv3’s terms directly from the version at gnu.org as linked above. I’m not going to attempt to move line by line through the entire license; if you want to do that, you should read the entire license yourself. Instead, I’m going to be moving through the basic rights afforded by copyleft and how they affect the usage, distribution, and commercial value of GPLv3 licensed software, referring to the GPLv3 directly where appropriate.

I am not a lawyer. I am not your lawyer. I am just a reasonably bright person who is reading through the text of the GPLv3, and interpreting it with the assistance of a couple of decades of industry experience and conversations with other reasonably bright people – some lawyers, some not – about what the license is intended to do, and what it actually does.

About the Preamble…

The GPL in general is a very unusual legal document – not specifically in terms of the rights it offers, and certainly not in terms of its complexity (there are contracts you wouldn’t want to carry around printed out without a wheelbarrow; no version of the GPL is one of them), but in terms of the fact that Richard Stallman is not a lawyer, but is an extremely, extremely stubborn man who will listen to advice from lawyers, but not take it.

It is incredibly abnormal for a legal contract to try to give you a plain English overview of what its actual goals, and put said overview in the legally binding contract itself. That’s incredibly abnormal because legal contracts are much like source code – comments may be helpful to you, but the compiler ignores them, and the computer itself is only “bound” by the actual instructions. RMS, for better or for worse, decided that was bullshit and declared that his Preamble is part of the contract itself. This frustrates the living hell out of actual lawyers, because – just like the comments in source code – text that humans in general find “easy reading” tend to be abstract, vague, and difficult to concretely interpret in predictable, enforceable ways. That’s why they’re comments, and why the actual executable instructions are there – if they were equivalent, we’d just write the comments and not bother with the code!

I’m not going to go into the Preamble here – it’s just a summary of the general idea of the license itself – but it’s probably worth realizing that technically, at least in theory, it’s legally enforceable. God help us all, lawyers, internet lawyers, and little fishies alike.

Your Copyleft Rights

Rather than trying to abstract the Preamble any further than it already is, I’m going to reproduce the list of basic rights which is laid out in gnu.org’s Quick Guide to the GPL linked above. While not a part of the license itself, understanding them is key to understanding why the license says the things it does, and what it’s trying to accomplish.

Nobody should be restricted by the software they use. There are four freedoms that every user should have:

the freedom to use the software for any purpose,
the freedom to change the software to suit your needs,
the freedom to share the software with your friends and neighbors, and
the freedom to share the changes you make.

When a program offers users all of these freedoms, we call it free software.

The GPL sets out to accomplish the granting of these basic freedoms almost entirely by governing the distribution of source code. (If you aren’t sure what source code, object code, or binaries are – or the differences between them – please visit the text of the GPL itself. There’s a Terms and Definitions section at the top.)

When aren’t you bound by the GPL?

This is a pretty key component to the GPL that most people don’t understand. Without distribution, it’s basically impossible to violate the GPL. You can modify GPL licensed code without having to give it to the whole world. The catch happens the first time you want somebody else to use your code – when you distribute the code to that other person, your GPL obligations kick in, and now you have to make the source code available.

Let’s look at the relevant bit from the GPLv3 itself:

You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.

Want to make your own super special fork of the Linux kernel that nobody gets to use but you, ’cause you’re so awesome? You can do that.

Own a company, and want to build your own super special private Linux kernel in-house, and have your staff of minions do the work, and your employees use workstations and servers running that kernel? You can do that.

Want to sell – or even give away – your new Awesum Kernel Terbo 2000? Welp, now your GPL obligations kick in. You can do that, but now you’re going to be bound by the conditions of the GPL, as we see in the very next line:

Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

Conditions/Obligations For Distribution

OK, here’s where we get into the actual viral license and copyleft and enforcing your freedoms and all that other good stuff.

Here’s the basic overview:

You may distribute your code, source or otherwise, to anybody you want to, at any price or none
Anyone who you have distributed code to, you must also offer source code to, at no additional charge
Anyone who you have distributed code to has the same rights, obligations, and conditions to that code that you did to begin with

Judging by the Reddit thread that started all this, there seems to be some contention on this point. So let’s dive into the license.

5. Conveying Modified Source Versions.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:

(The ways boil down to “the ways you’d normally get a copy of source code, if you intended to actually read, modify, compile, and/or further distribute it yourself.”)

This one, honestly, is a bit of a brain-bender, and I’ll admit I had some misconceptions about it until reading and re-reading and vacuuming the steam off of my skull afterwards. It’s very obvious that the GPL requires you to convey source, but does not require you to convey compiled code in any form. What’s easy to miss, in true Purloined Letter fashion, is that very first part of the sentence: “You may convey a covered work in object code form.”

So if – for example – you wanted to give somebody a copy of an RHEL install ISO, you could do that, even if you hadn’t compiled the code yourself. (You would be obligated to provide the full source code to whoever you gave the copy of the object code to if they asked for it, of course.)

That isn’t necessarily a useful right, though, which we’ll cover next.

Thinking several moves ahead

OK, so we’ve looked at enough of the GPL to understand the general thrust. (We skipped the patent section in the GPLv3, but for my purposes here, it’s enough to say “it’s there, and it tries to indemnify users from patent abuse in the same ways that the GPL has always tried to indemnify them from copyright abuse.”)

But what about loopholes and ramifications? This is a license that fits in fewer than ten generously-margined pages, even with headers and footers and website fluff prepended and appended; it doesn’t really attempt to and can’t possibly specifically try to chink or even specify every possible loophole or ramification. So you need to do a little of it yourself, especially if you’re going to play the ever-popular sport of internet lawyer – or put your own money on the line.

Can people just make copies of your actual binaries?

Yes, they can – it’s easy to miss, but the GPL gives them the right to copy your actual binaries, not just your source code. The more interesting question here is “does that matter?” and the answer is “not really.” If you want people to have to compile their own binaries rather than redistributing yours, the GPL does nothing to keep you from littering the source with boobytraps – effectively if not literally. Want to put in a routine that checks for a support contract and immediately refuses to continue if none is found? That’s perfectly legit, and it will keep people from just using your binaries. It won’t, however, keep somebody from modifying your source, stripping out the unwanted routine, and then themselves compiling and distributing that.

Obfuscating the source to try to keep people from dyking out your CheckCustomerStatus() routine isn’t an option, either:

“Installation Information” for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information.

This section could frankly be better written, but basically if you make it impossible to dyke unpleasant things out of your code, you’re just going to have to give people support in dyking them out anyway.

The TL;DR on this one is you can effectively prevent people from distributing your binaries – but if you do, one of two things will happen. One, somebody sets up shop modifying, compiling, and distributing your code without the unpleasant bits in it, and now people get binaries and source from them, not from you. Or two, your project just isn’t interesting enough for anyone to bother with it if you’re going to be unpleasant about it, and it therefore dies on the vine.

An excellent example of this playing out in real life is Red Hat Enterprise Linux and CentOS. CentOS is a RHEL clone. CentOS is largely if not entirely compiled from source, rather than merely being copies of the RHEL binaries, because there are enough instances of RHEL referring to resources that are only available to Red Hat customers that it’s better to do it that way even though it’s a pain in the butt. I can tell you informally and you’ll-just-have-to-believe-me-but-it-shouldn’t-be-difficult that the relationship between Red Hat and CentOS was a little strained. Nobody would say anything bad about CentOS, publicly or for the most part even privately on the Red Hat campus, but it’s hard not to instinctively feel “those guys are stealing our customers.” But the official position was that a CentOS user wasn’t a RHEL customer – one of the most important selection criteria for a RHEL customer being “someone who wants to give us money.” This was a very wise position, IMO. In the end, Red Hat actually acquired CentOS itself – not to kill it on the vine, but to make sure it was being run well, and reap the PR benefits of running it well.

Red Hat is pretty awesome.

What if I sell one single copy of my GPL program, but then four billion people demand the source code?

This is a pretty reasonable fear; there’s a significant cost associated with distributing a program to thousands, hundreds of thousands, millions, or maybe even more people. “Internet scale” is thrilling, but it can be scary! But relax – remember the bit about how the GPL’s terms only kick in on distribution, and only grant rights to the person you distributed a covered work to? The rest of the world didn’t buy (or otherwise acquire) a copy of your program, so you don’t owe them jack in terms of copies of the source code.

The very first customer you sell to might decide to offer your source code to all those people, of course, with or without your permission. But you won’t be forced to do it yourself.

Let’s take this the next logical step, though: if your program is worthwhile, somebody will want to distribute it. If that person is you, you stand a chance to reap some benefit from it – that could be literal money in the form of sales, donations, or support contracts. Or it could be more immaterial, in the form of fame, respect, job offers, consulting gigs. But if you aren’t the one doing the distribution, it’s going to be less likely for you to be the one reaping the reward. So you should probably plan on distributing your code to all who ask rather than only to those who purchased it, even though you don’t actually have to. (See again: Red Hat eventually deciding to acquire CentOS… and continue their operation.)

What if I change my mind about GPL? Can the cat be put back into the bag?

Well, yes and no. You can always relicense your own code, but:

Anyone who’s already acquired it from you under the GPL will still have their own rights of modification and redistribution
If you’ve ever accepted contributions from community developers, you won’t have the rights to relicense them unless you suckered those developers into specifically signing full – not merely GPL – rights to their own contributions to you
Information wants to be free.

In theory, you could release your awesome project, only sell it to five people, never let anyone else contribute to your own copy of it, never have anyone else fork it, outlive those original five people, and then relicense it proprietary. Mwahahaha! In practice, if your project ever got any market penetration or adoption at all, that’s never going to happen, even if you never let anyone else contribute to the codebase. Somebody out there is still going to have a copy, have distribution rights, and if there’s a significant group of people wanting it, they’ll distribute it. If there’s not a significant market for the project, then why worry about all this anyway?

If you’ve never accepted code contributions from anyone else, you can always distribute your work under any license you see fit, regardless of what licenses you’ve distributed it under before. The question quickly reduces to “does that matter?”, though, since anyone who ever received it from you under the GPL can themselves distribute it to whoever they like, and you have no right to prevent them from doing it.

Taking this a step further, what if you wrote a few thousand more lines of code adding lots of awesome new features and you don’t want to release that under the GPL, you want it to be proprietary? Yep, you can do that – again assuming you are the only author and/or all actual authors of all of the original code that’s still a part of YourProject 2.0 have granted you full control of the license – but you better think about that decision long and hard. If you’re the sole author, you have the freedom to stop distributing under the GPL – but everybody you’ve ever distributed to, and everybody they’ve ever distributed to, and so forth down the line equally have the freedom to just stop getting code from you. You might easily discover that nobody wants to buy your new v2.0 with the onerous license – now they prefer to get the v1.0 licensed GPL, and a community has sprung up around that codebase, and the next thing you know more new features and bugfixes and sweet, sweet PR is going to your “old and busted” codebase that you no longer really control, rather than the new and supposedly awesome one you’re sitting on.

Final Conclusions

The GPL is awesome. Don’t fear it, embrace it.
You can try to put the cat back in the bag, but don’t be surprised when the cat doesn’t actually go.
If your customers want to do business with you, there’s money to be made.
If your customers don’t want to do business with you, they aren’t – and won’t be – your customers.

A lot of these lessons are ultimately true for all business, not just FOSS business, of course. The GPL just forces you to realize them more quickly, and works harder to keep you from exploiting your way around it temporarily.

ZFS: You should use mirror vdevs, not RAIDZ.

Continuing this week’s “making an article so I don’t have to keep typing it” ZFS series… here’s why you should stop using RAIDZ, and start using mirror vdevs instead.

The basics of pool topology

A pool is a collection of vdevs. Vdevs can be any of the following (and more, but we’re keeping this relatively simple):

single disks (think RAID0)
redundant vdevs (aka mirrors – think RAID1)
parity vdevs (aka stripes – think RAID5/RAID6/RAID7, aka single, dual, and triple parity stripes)

The pool itself will distribute writes among the vdevs inside it on a relatively even basis. However, this is not a “stripe” like you see in RAID10 – it’s just distribution of writes. If you make a RAID10 out of 2 2TB drives and 2 1TB drives, the second TB on the bigger drives is wasted, and your after-redundancy storage is still only 2 TB. If you put the same drives in a zpool as two mirror vdevs, they will be a 2x2TB mirror and a 2x1TB mirror, and your after-redundancy storage will be 3TB. If you keep writing to the pool until you fill it, you may completely fill the two 1TB disks long before the two 2TB disks are full. Exactly how the writes are distributed isn’t guaranteed by the specification, only that they will be distributed.

What if you have twelve disks, and you configure them as two RAIDZ2 (dual parity stripe) vdevs of six disks each? Well, your pool will consist of two RAIDZ2 arrays, and it will distribute writes across them just like it did with the pool of mirrors. What if you made a ten disk RAIDZ2, and a two disk mirror? Again, they go in the pool, the pool distributes writes across them. In general, you should probably expect a pool’s performance to exhibit the worst characteristics of each vdev inside it. In practice, there’s no guarantee where reads will come from inside the pool – they’ll come from “whatever vdev they were written to”, and the pool gets to write to whichever vdevs it wants to for any given block(s).

Storage Efficiency

If it isn’t clear from the name, storage efficiency is the ratio of usable storage capacity (after redundancy or parity) to raw storage capacity (before redundancy or parity).

This is where a lot of people get themselves into trouble. “Well obviously I want the most usable TB possible out of the disks I have, right?” Probably not. That big number might look sexy, but it’s liable to get you into a lot of trouble later. We’ll cover that further in the next section; for now, let’s just look at the storage efficiency of each vdev type.

single disk vdev(s) – 100% storage efficiency. Until you lose any single disk, and it becomes 0% storage efficency…
eight single-disk vdevs
RAIDZ1 vdev(s) – (n-1)/n, where n is the number of disks in each vdev. For example, a RAIDZ1 of eight disks has an SE of 7/8 = 87.5%.
an eight disk raidz1 vdev
RAIDZ2 vdev(s) – (n-2)/n. For example, a RAIDZ2 of eight disks has an SE of 6/8 = 75%.
an eight disk raidz2 vdev
RAIDZ3 vdev(s) – (n-3)/n. For example, a RAIDZ3 of eight disks has an SE of 5/8 = 62.5%.
an eight disk raidz3 vdev
mirror vdev(s) – 1/n, where n is the number of disks in each vdev. Eight disks set up as 4 2-disk mirror vdevs have an SE of 1/2 = 50%.
a pool of four 2-disk mirror vdevs

One final note: striped (RAIDZ) vdevs aren’t supposed to be “as big as you can possibly make them.” Experts are cagey about actually giving concrete recommendations about stripe width (the number of devices in a striped vdev), but they invariably recommend making them “not too wide.” If you consider yourself an expert, make your own expert decision about this. If you don’t consider yourself an expert, and you want more concrete general rule-of-thumb advice: no more than eight disks per vdev.

Fault tolerance / degraded performance

Be careful here. Keep in mind that if any single vdev fails, the entire pool fails with it. There is no fault tolerance at the pool level, only at the individual vdev level! So if you create a pool with single disk vdevs, any failure will bring the whole pool down.

It may be tempting to go for that big storage number and use RAIDZ1… but it’s just not enough. If a disk fails, the performance of your pool will be drastically degraded while you’re replacing it. And you have no fault tolerance at all until the disk has been replaced and completely resilvered… which could take days or even weeks, depending on the performance of your disks, the load your actual use places on the disks, etc. And if one of your disks failed, and age was a factor… you’re going to be sweating bullets wondering if another will fail before your resilver completes. And then you’ll have to go through the whole thing again every time you replace a disk. This sucks. Don’t do it. Conventional RAID5 is strongly deprecated for exactly the same reasons. According to Dell, “Raid 5 for all business critical data on any drive type [is] no longer best practice.”

RAIDZ2 and RAIDZ3 try to address this nightmare scenario by expanding to dual and triple parity, respectively. This means that a RAIDZ2 vdev can survive two drive failures, and a RAIDZ3 vdev can survive three. Problem solved, right? Well, problem mitigated – but the degraded performance and resilver time is even worse than a RAIDZ1, because the parity calculations are considerably gnarlier. And it gets worse the wider your stripe (number of disks in the vdev).

Saving the best for last: mirror vdevs. When a disk fails in a mirror vdev, your pool is minimally impacted – nothing needs to be rebuilt from parity, you just have one less device to distribute reads from. When you replace and resilver a disk in a mirror vdev, your pool is again minimally impacted – you’re doing simple reads from the remaining member of the vdev, and simple writes to the new member of the vdev. In no case are you re-writing entire stripes, all other vdevs in the pool are completely unaffected, etc. Mirror vdev resilvering goes really quickly, with very little impact on the performance of the pool. Resilience to multiple failure is very strong, though requires some calculation – your chance of surviving a disk failure is 1-(f/(n-f)), where f is the number of disks already failed, and n is the number of disks in the full pool. In an eight disk pool, this means 100% survival of the first disk failure, 85.7% survival of a second disk failure, 66.7% survival of a third disk failure. This assumes two disk vdevs, of course – three disk mirrors are even more resilient.

But wait, why would I want to trade guaranteed two disk failure in RAIDZ2 with only 85.7% survival of two disk failure in a pool of mirrors? Because of the drastically shorter time to resilver, and drastically lower load placed on the pool while doing so. The only disk more heavily loaded than usual during a mirror vdev resilvering is the other disk in the vdev – which might sound bad, but remember that it’s no more heavily loaded than it would’ve been as a RAIDZ member. Each block resilvered on a RAIDZ vdev requires a block to be read from each surviving RAIDZ member; each block written to a resilvering mirror only requires one block to be read from a surviving vdev member. For a six-disk RAIDZ1 vs a six disk pool of mirrors, that’s five times the extra I/O demands required of the surviving disks.

Resilvering a mirror is much less stressful than resilvering a RAIDZ.

One last note on fault tolerance

No matter what your ZFS pool topology looks like, you still need regular backup.

Say it again with me: I must back up my pool!

ZFS is awesome. Combining checksumming and parity/redundancy is awesome. But there are still lots of potential ways for your data to die, and you still need to back up your pool. Period. PERIOD!

Normal performance

It’s easy to think that a gigantic RAIDZ vdev would outperform a pool of mirror vdevs, for the same reason it’s got a greater storage efficiency. “Well when I read or write the data, it comes off of / goes onto more drives at once, so it’s got to be faster!” Sorry, doesn’t work that way. You might see results that look kinda like that if you’re doing a single read or write of a lot of data at once while absolutely no other activity is going on, if the RAIDZ is completely unfragmented… but the moment you start throwing in other simultaneous reads or writes, fragmentation on the vdev, etc then you start looking for random access IOPS. But don’t listen to me, listen to one of the core ZFS developers, Matthew Ahrens: “For best performance on random IOPS, use a small number of disks in each RAID-Z group. E.g, 3-wide RAIDZ1, 6-wide RAIDZ2, or 9-wide RAIDZ3 (all of which use ⅓ of total storage for parity, in the ideal case of using large blocks). This is because RAID-Z spreads each logical block across all the devices (similar to RAID-3, in contrast with RAID-4/5/6). For even better performance, consider using mirroring.“

Please read that last bit extra hard: For even better performance, consider using mirroring. He’s not kidding. Just like RAID10 has long been acknowledged the best performing conventional RAID topology, a pool of mirror vdevs is by far the best performing ZFS topology.

Future expansion

This is one that should strike near and dear to your heart if you’re a SOHO admin or a hobbyist. One of the things about ZFS that everybody knows to complain about is that you can’t expand RAIDZ. Once you create it, it’s created, and you’re stuck with it.

Well, sorta.

Let’s say you had a server with 12 slots to put drives in, and you put six drives in it as a RAIDZ2. When you bought it, 1TB drives were a great bang for the buck, so that’s what you used. You’ve got 6TB raw / 4TB usable. Two years later, 2TB drives are cheap, and you’re feeling cramped. So you fill the rest of the six available bays in your server, and now you’ve added an 12TB raw / 8TB usable vdev, for a total pool size of 18TB/12TB. Two years after that, 4TB drives are out, and you’re feeling cramped again… but you’ve got no place left to put drives. Now what?

Well, you actually can upgrade that original RAIDZ2 of 1TB drives – what you have to do is fail one disk out of the vdev and remove it, then replace it with one of your 4TB drives. Wait for the resilvering to complete, then fail a second one, and replace it. Lather, rinse, repeat until you’ve replaced all six drives, and resilvered the vdev six separate times – and after the sixth and last resilvering finishes, you have a 24TB raw / 16TB usable vdev in place of the original 6TB/4TB one. Question is, how long did it take to do all that resilvering? Well, if that 6TB raw vdev was nearly full, it’s not unreasonable to expect each resilvering to take twelve to sixteen hours… even if you’re doing absolutely nothing else with the system. The more you’re trying to actually do in the meantime, the slower the resilvering goes. You might manage to get six resilvers done in six full days, replacing one disk per day. But it might take twice that long or worse, depending on how willing to hover over the system you are, and how heavily loaded it is in the meantime.

What if you’d used mirror vdevs? Well, to start with, your original six drives would have given you 6TB raw / 3TB usable. So you did give up a terabyte there. But maybe you didn’t do such a big upgrade the first time you expanded. Maybe since you only needed to put in two more disks to get more storage, you only bought two 2TB drives, and by the time you were feeling cramped again the 4TB disks were available – and you still had four bays free. Eventually, though, you crammed the box full, and now you’re in that same position of wanting to upgrade those old tiny 1TB disks. You do it the same way – you replace, resilver, replace, resilver – but this time, you see the new space after only two resilvers. And each resilvering happens tremendously faster – it’s not unreasonable to expect nearly-full 1TB mirror vdevs to resilver in three or four hours. So you can probably upgrade an entire vdev in a single day, even without having to hover over the machine too crazily. The performance on the machine is hardly impacted during the resilver. And you see the new capacity after every two disks replaced, not every six.

TL;DR

Too many words, mister sysadmin. What’s all this boil down to?

don’t be greedy. 50% storage efficiency is plenty.
for a given number of disks, a pool of mirrors will significantly outperform a RAIDZ stripe.
a degraded pool of mirrors will severely outperform a degraded RAIDZ stripe.
a degraded pool of mirrors will rebuild tremendously faster than a degraded RAIDZ stripe.
a pool of mirrors is easier to manage, maintain, live with, and upgrade than a RAIDZ stripe.
BACK. UP. YOUR POOL. REGULARLY. TAKE THIS SERIOUSLY.

TL;DR to the TL;DR – unless you are really freaking sure you know what you’re doing… use mirrors. (And if you are really, really sure what you’re doing, you’ll probably change your mind after a few years and wish you’d done it this way to begin with.)

Will ZFS and non-ECC RAM kill your data?

This comes up far too often, so rather than continuing to explain it over and over again, I’m going to try to do a really good job of it once and link to it here.

What’s ECC RAM? Is it a good idea?

The ECC stands for Error Correcting Checksum. In a nutshell, ECC RAM is a special kind of server-grade memory that can detect and repair some of the most common kinds of in-memory corruption. For more detail on how ECC RAM does this, and which types of errors it can and cannot correct, the rabbit hole’s over here.

Now that we know what ECC RAM is, is it a good idea? Absolutely. In-memory errors, whether due to faults in the hardware or to the impact of cosmic radiation (yes, really) are a thing. They do happen. And if it happens in a particularly strategic place, you will lose data to it. Period. There’s no arguing this.

What’s ZFS? Is it a good idea?

ZFS is, among other things, a checksumming filesystem. This means that for every block committed to storage, a strong hash (somewhat misleadingly AKA checksum) for the contents of that block is also written. (The validation hash is written in the pointer to the block itself, which is also checksummed in the pointer leading to itself, and so on and so forth. It’s turtles all the way down. Rabbit hole begins over here for this one.)

Is this a good idea? Absolutely. Combine ZFS checksumming with redundancy or parity, and now you have a self-healing array. If a block is corrupt on disk, the next time it’s read, ZFS will see that it doesn’t match its checksum and will load a redundant copy (in the case of mirror vdevs or multiple copy storage) or rebuild a parity copy (in the case of RAIDZ vdevs), and assuming that copy of the block matches its checksum, will silently feed you the correct copy instead, and log a checksum error against the first block that didn’t pass.

ZFS also supports scrubs, which will become important in the next section. When you tell ZFS to scrub storage, it reads every block that it knows about – including redundant copies – and checks them versus their checksums. Any failing blocks are automatically overwritten with good blocks, assuming that a good (passing) copy exists, either redundant or as reconstructed from parity. Regular scrubs are a significant part of maintaining a ZFS storage pool against long term corruption.

Is ZFS and non-ECC worse than not-ZFS and non-ECC? What about the Scrub of Death?

OK, it’s pretty easy to demonstrate that a flipped bit in RAM means data corruption: if you write that flipped bit back out to disk, congrats, you just wrote bad data. There’s no arguing that. The real issue here isn’t whether ECC is good to have, it’s whether non-ECC is particularly problematic with ZFS. The scenario usually thrown out is the the much-dreaded Scrub Of Death.

TL;DR version of the scenario: ZFS is on a system with non-ECC RAM that has a stuck bit, its user initiates a scrub, and as a result of in-memory corruption good blocks fail checksum tests and are overwritten with corrupt data, thus instantly murdering an entire pool. As far as I can tell, this idea originates with a very prolific user on the FreeNAS forums named Cyberjock, and he lays it out in this thread here. It’s a scary idea – what if the very thing that’s supposed to keep your system safe kills it? A scrub gone mad! Nooooooo!

The problem is, the scenario as written doesn’t actually make sense. For one thing, even if you have a particular address in RAM with a stuck bit, you aren’t going to have your entire filesystem run through that address. That’s not how memory management works, and if it were how memory management works, you wouldn’t even have managed to boot the system: it would have crashed and burned horribly when it failed to load the operating system in the first place. So no, you might corrupt a block here and there, but you’re not going to wring the entire filesystem through a shredder block by precious block.

But we’re being cheap here. Say you only corrupt one block in 5,000 this way. That would still be hellacious. So let’s examine the more reasonable idea of corrupting some data due to bad RAM during a scrub. And let’s assume that we have RAM that not only isn’t working 100% properly, but is actively goddamn evil and trying its naive but enthusiastic best to specifically kill your data during a scrub:

First, you read a block. This block is good. It is perfectly good data written to a perfectly good disk with a perfectly matching checksum. But that block is read into evil RAM, and the evil RAM flips some bits. Perhaps those bits are in the data itself, or perhaps those bits are in the checksum. Either way, your perfectly good block now does not appear to match its checksum, and since we’re scrubbing, ZFS will attempt to actually repair the “bad” block on disk. Uh-oh! What now?

Next, you read a copy of the same block – this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize – you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did).

Well, huh. That doesn’t sound so bad. So what does your evil RAM need to do in order to actually overwrite your good data with corrupt data during a scrub? Well, first it needs to flip some bits during the initial read of every block that it wants to corrupt. Then, on the second read of a copy of the block from parity or redundancy, it needs to not only flip bits, it needs to flip them in such a way that you get a hash collision. In other words, random bit-flipping won’t do – you need some bit flipping in the data (with or without some more bit-flipping in the checksum) that adds up to the corrupt data correctly hashing to the value in the checksum. By default, ZFS uses 256-bit SHA validation hashes, which means that a single bit-flip has a 1 in 2^256 chance of giving you a corrupt block which now matches its checksum. To be fair, we’re using evil RAM here, so it’s probably going to do lots of experimenting, and it will try flipping bits in both the data and the checksum itself, and it will do so multiple times for any single block. However, that’s multiple 1 in 2^256 (aka roughly 1 in 10^77) chances, which still makes it vanishingly unlikely to actually happen… and if your RAM is that damn evil, it’s going to kill your data whether you’re using ZFS or not.

But what if I’m not scrubbing?

Well, if you aren’t scrubbing, then your evil RAM will have to wait for you to actually write to the blocks in question before it can corrupt them. Fortunately for it, though, you write to storage pretty much all day long… including to the metadata that organizes the whole kit and kaboodle. First time you update the directory that your files are contained in, BAM! It’s gotcha! If you stop and think about it, in this evil RAM scenario ZFS is incredibly helpful, because your RAM now needs to not only be evil but be bright enough to consistently pull off collision attacks. So if you’re running non-ECC RAM that turns out to be appallingly, Lovecraftianishly evil, ZFS will mitigate the damage, not amplify it.

If you are using ZFS and you aren’t scrubbing, by the way, you’re setting yourself up for long term failure. If you have on-disk corruption, a scrub can fix it only as long as you really do have a redundant or parity copy of the corrupted block which is good. Once you corrupt all copies of a given block, it’s too late to repair it – it’s gone. Don’t be afraid of scrubbing. (Well, maybe be a little wary of the performance impact of scrubbing during high demand times. But don’t be worried about scrubbing killing your data.)

I’ve constructed a doomsday scenario featuring RAM evil enough to kill my data after all! Mwahahaha!

OK. But would using any other filesystem that isn’t ZFS have protected that data? ‘Cause remember, nobody’s arguing that you can lose data to evil RAM – the argument is about whether evil RAM is more dangerous with ZFS than it would be without it.

I really, really want to use the Scrub Of Death in a movie or TV show. How can I make it happen?

What you need here isn’t evil RAM, but an evil disk controller. Have it flip one bit per block read or written from disk B, but leave the data from disk A alone. Now scrub – every block on disk B will be overwritten with a copy from disk A, but the evil controller will flip bits on write, so now, all of disk B is written with garbage blocks. Now start flipping bits on write to disk A, and it will be an unrecoverable wreck pretty quickly, since there’s no parity or redundancy left for any block. Your choice here is whether to ignore the metadata for as long as possible, giving you the chance to overwrite as many actual data blocks as you can before the jig is up as they are written to by the system, or whether to pounce straight on the metadata and render the entire vdev unusable in seconds – but leave the actual data blocks intact for possible forensic recovery.

Alternately you could just skip straight to step B and start flipping bits as data is written on any or all individual devices, and you’ll produce real data loss quickly enough. But you specifically wanted a scrub of death, not just bad hardware, right?

I don’t care about your logic! I wish to appeal to authority!

OK. “Authority” in this case doesn’t get much better than Matthew Ahrens, one of the cofounders of ZFS at Sun Microsystems and current ZFS developer at Delphix. In the comments to one of my filesystem articles on Ars Technica, Matthew said “There’s nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem.”

Hope that helps. =)

Three Step Guide to X11 Forwarding

Got a graphical application you want to run on a Linux box, but display on a Windows box? It’s stupidly easy. I can’t believe how long it took me to learn how to do this, even though I knew it was possible to. Hopefully, this will save some other sysadmin from not having this trick in the toolbox. (It’s particularly useful for running virt-manager when you don’t have a Linux machine to sit in front of.)

Step 1: download and install Xming (probably from Softpedia, since Sourceforge is full of malware and BS misleading downloads now)

Step 2: in PuTTY’s configs on your Windows box, Connection –> SSH –> X11 –> check the “Enable X11 Forwarding” box.

Step 3: SSH into a Linux box, and run a GUI application from the command line. Poof, the app shows up on your Windows desktop!

Avahi killed my server :'(

Avahi is the equivalent to Apple’s “Bonjour” zeroconf network service. It installs by default with the ubuntu-desktop meta-package, which I generally use to get, you guessed it, a full desktop on virtualization host servers. This never caused me any issues until today.

Today, though – on a server with dual network interfaces, both used as bridge ports on its br0 adapter – Avahi apparently decided “screw the configuration you specified in /etc/network/interfaces, I’m going to give your production virt host bridge an autoconf address. Because I want to be helpful.”

When it did so, the host dropped off the network, I got alarms on my monitoring service, and I couldn’t so much as arp the host, much less log into it. So I drove down to the affected office and did an ifconfig br0, which showed me the following damning bit of evidence:

me@box:~$ ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
         inet6 addr: fe80::20a:e4ff:feae:7e4c/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:11 errors:0 dropped:0 overruns:0 frame:0
         TX packets:96 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:3927 (3.8 KB)  TX bytes:6970 (6.8 KB)

br0:avahi Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
         inet addr:169.254.6.229  Bcast:169.254.255.255  Mask:255.255.0.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Oh, Avahi, you son-of-a-bitch. Was there anything wrong with the actual NIC? Certainly didn’t look like it – had link lights on the NIC and on the switch, and sure enough, ifdown br0 ; ifup br0 brought it right back online again.

Can we confirm that avahi really was the culprit?

/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Withdrawing address record for [redacted IP] on br0.
/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Leaving mDNS multicast group on interface br0.IPv4 with address [redacted IP].
/var/log/syslog:Jan  9 09:10:58 virt0 avahi-daemon[1357]: Interface br0.IPv4 no longer relevant for mDNS.
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Found user 'avahi-autoipd' (UID 111) and group 'avahi-autoipd' (GID 121).
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully called chroot().
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully dropped root privileges.
/var/log/syslog:Jan  9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Starting with address 169.254.6.229
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-autoipd(br0)[12460]: Callout BIND, address 169.254.6.229 on interface br0
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: Joining mDNS multicast group on interface br0.IPv4 with address 169.254.6.229.
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: New relevant interface br0.IPv4 for mDNS.
/var/log/syslog:Jan  9 09:11:03 virt0 avahi-daemon[1357]: Registering new address record for 169.254.6.229 on br0.IPv4.
/var/log/syslog:Jan  9 09:11:07 virt0 avahi-autoipd(br0)[12460]: Successfully claimed IP address 169.254.6.229

I know I said this already, but – oh, avahi, you worthless son of a bitch!

Next step was to kill it and disable it.

me@box:~$ sudo stop avahi-daemon
me@box:~$ echo manual | sudo tee /etc/init/avahi-daemon.override

Grumble grumble grumble. Now I’m just wondering why I’ve never had this problem before… I suspect it’s something to do with having dual NICs on the bridge, and one of them not being plugged in (I only added them both so it wouldn’t matter which one actually got plugged in if the box ever got moved somewhere).

The SSLv3 “POODLE” attack in a (large) nutshell

A summary of the POODLE sslv3 vulnerability and attack:

A vulnerability has been discovered in a decrepit-but-still-widely-supported version of SSL, SSLv3, which allows an attacker a good chance at determining the true value of a single byte of encrypted traffic. This is of limited use in most applications, but in HTTPS (eg your web browser, many mobile applications, etc) an attacker in an MITM (Man-In-The-Middle) position, such as someone operating a wireless router you connect to, can capture and resend the traffic repeatedly until they manage to get a valuable chunk of it assembled in the clear. (This is done by manipulating cleartext traffic, to the same or any other site, injecting some Javascript in that traffic to get your browser to run it. The rogue JS function is what reloads the secure site, offscreen where you can’t see it happening, until the attacker gets what s/he needs out of it.)

That “valuable chunk” is the cookie that validates your user login on whatever secure website you happen to be browsing – your bank, webmail, ebay or amazon account, etc. By replaying that cookie, the attacker can now hijack your logged in session directly on his/her own device, and from there can do anything that you would be able to do – make purchases, transfer funds, change the password, change the associated email account, et cetera.

It reportedly takes 60 seconds or less for an attacker in a MITM position (again, typically someone in control of a router your traffic is being directed through, which is most often going to be a wireless router – maybe even one you don’t realize you’ve connected to) to replay traffic enough to capture the cookie using this attack.

Worth noting: SSLv3 is hopelessly obsolete, but it’s still widely supported in part because IE6/Windows XP need it, and so many large enterprises STILL are using IE6. Many sites and servers have proactively disabled SSLv3 for quite some time already, and for those, you’re fine. However, many large sites still have not – a particularly egregious example being Citibank, to whom you can still connect with SSLv3 today. As long as both your client application (web browser) and the remote site (web server) both support SSLv3, a MITM can force a downgrade dance, telling each side that the OTHER side only supports SSLv3, forcing that protocol even though it’s strongly deprecated.

I’m an end user – what do I do?

Disable SSLv3 in your browser. If you use IE, there’s a checkbox in Internet Options you can uncheck to remove SSLv3 support. If you use Firefox, there’s a plugin for that. If you use Chrome, you can start Chrome with a command-line option that disables SSLv3 for now, but that’s kind of a crappy “fix”, since you’d have to make sure to start Chrome either from the command line or from a particular shortcut every time (and, for example, clicking a link in an email that started up a new Chrome instance would fail to do so).

Instructions, with screenshots, are available at https://zmap.io/sslv3/ and I won’t try to recreate them here; they did a great job.

I will note specifically here that there’s a fix for Chrome users on Ubuntu that does fairly trivially mitigate even use-cases like clicking a link in an email with the browser not already open:

* Open /usr/share/applications/google-chrome.desktop in a text editor * For any line that begins with "Exec", add the argument --ssl-version-min=tls1 * For instance the line "Exec=/usr/bin/google-chrome-stable %U" should become "Exec=/usr/bin/google-chrome-stable --ssl-version-min=tls1 %U

You can test to see if your fix for a given browser worked by visiting https://zmap.io/sslv3/ again afterwards – there’s a banner at the top of the page which will warn you if you’re vulnerable. WARNING, caching is enabled on that page, meaning you will have to force-refresh to make certain that you aren’t seeing the old cached version with the banner intact – on most systems, pressing ctrl-F5 in your browser while on the page will do the trick.

I’m a sysadmin – what do I do?

Disable SSLv3 support in any SSL-enabled service you run – Apache, nginx, postfix, dovecot, etc. Worth noting – there is currently no known way to usefully exploit the POODLE vulnerability with IMAPS or SMTPS or any other arbitrary SSL-wrapped protocol; currently HTTPS is the only known protocol that allows you to manipulate traffic in a useful enough way. I would not advise banking on that, though. Disable this puppy wherever possible.

The simplest way to test if a service is vulnerable (at least, from a real computer – Windows-only admins will need to do some more digging):

openssl s_client -connect mail.jrs-s.net:443 -ssl3

The above snippet would check my mailserver. The correct (sslv3 not available) response begins with a couple of error lines:

CONNECTED(00000003) 140301802776224:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1260:SSL alert number 40 140301802776224:error:1409E0E5:SSL routines:SSL3_WRITE_BYTES:ssl handshake failure:s3_pkt.c:596:

What you DON’T want to see is a return with a certificate chain in it:

CONNECTED(00000003) depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = PositiveSSL CA 2 verify error:num=20:unable to get local issuer certificate verify return:0 --- Certificate chain 0 s:/OU=Domain Control Validated/OU=PositiveSSL/CN=mail.jrs-s.net i:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=PositiveSSL CA 2 1 s:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=PositiveSSL CA 2 i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root

On Apache on Ubuntu, you can edit /etc/apache2/mods-available/ssl.conf and find the SSLProtocol line and change it to the following:

SSLProtocol all -SSLv2 -SSLv3

Then restart Apache with /etc/init.d/apache2 restart, and you’re golden.

I haven’t had time to research Postfix or Dovecot yet, which are my other two big concerns (even though they theoretically shouldn’t be vulnerable since there’s no way for the attacker to manipulate SMTPS or IMAPS clients into replaying traffic repeatedly).

Possibly also worth noting – I can’t think of any way for an attacker to exploit POODLE without access to web traffic running both in the clear and in a Javascript-enabled browser, so if you wanted to disable Javascript completely (which is pretty useless since it would break the vast majority of the web) or if you’re using a command-line tool like wget for something, it should be safe.

Allowing traceroutes to succeed with iptables

There is a LOT of bogus half-correct information about traceroutes and iptables floating around out there. It took me a bit of sifting through it all to figure out the real deal and the best way to allow traceroutes without negatively impacting security this morning, so here’s some documentation in case I forget before the next time.

Traceroute from Windows machines typically uses ICMP Type 8 packets. Traceroute from Unixlike machines typically uses UDP packets with sequentially increasing destination ports, from 33434 to 33534. So your server (the traceroute destination) must not drop incoming ICMP Type 8 or UDP 33434:33534.

Here’s where it gets tricky: it really doesn’t need to accept those packets either, which the vast majority of sites addressing this issue recommends. It just needs to be able to reject them, which won’t happen if they’re being dropped. If you implement the typical advice – accepting those packets – traceroute basically ends up sort of working by accident: those ports shouldn’t be in use by any running applications, and since nothing is monitoring them, the server will issue an ICMP Type 3 response (destination unreachable). However, if you’re accepting packets to these ports, then a rogue application listening on those ports also becomes reachable – which is the sort of thing your firewall should be preventing in the first place.

The good news is, DROP and ACCEPT aren’t your only options – you can REJECT these packets instead, which will do exactly what we want here: allow traceroutes to work properly without also potentially enabling some rogue application to listen on those UDP ports.

So all you really need on your server to allow incoming traceroutes to work properly is:

# allow ICMP Type 8 (ping, ICMP traceroute)
-A INPUT -p icmp --icmp-type 8 -j ACCEPT
# enable UDP traceroute rejections to get sent out
-A INPUT -p udp --dport 33434:33523 -j REJECT

Note: you may very well need and/or want more ICMP functionality than this in general – but this is all you need for incoming traceroutes to complete properly.

OpenVPN on BeagleBone Black

I needed an inexpensive embedded device for OpenVPN use, and my first thought (actually, my tech David’s first thought) was the obvious in this day and age: “Raspberry Pi.”

Unfortunately, the Pi didn’t really fit the bill. Aside from the unfortunate fact that my particular Pi arrived with a broken ethernet port, doing some quick network-less testing of OpenSSL gave me very disappointing 5mbps-ish numbers – 5mbps or so, running flat out, encryption alone, let alone any actual routing. This bore up with some reviews I found online, so I had to give up on the Pi as an embedded solution for OpenVPN use.

Luckily, that didn’t mean I was sunk yet – enter the Beaglebone Black. Beaglebone doesn’t get as much press as the Pi does, but it’s an interesting device with an interesting history – it’s been around longer than the Pi (more than ten years!), it’s fully open source where the Pi is not (hardware plans are published online, and other vendors are not only allowed but encouraged to build bit-for-bit identical devices!), and although it doesn’t have the video chops of the Pi (no 1080p resolution supported), it has a much better CPU – a 1GHZ Cortex A8, vs the Pi’s 700MHz A7. If all that isn’t enough, the Beaglebone also has built-in 2GB eMMC flash with a preloaded installation of Angstrom Linux, and – again unlike the Pi – directly supports being powered from plain old USB connected to a computer. Pretty nifty.

The only real hitch I had with my Beaglebone was not realizing that if I had an SD card in, it would attempt to boot from the SD card, not from the onboard eMMC. Once I disconnected my brand new Samsung MicroSD card and power cycled the Beaglebone, though, I was off to the races. It boots into Angstrom pretty quickly, and thanks to the inclusion of the Avahi daemon in the default installation, you can discover the device (from linux at least – haven’t tested Windows) by just pinging beaglebone.local. Once that resolves, ssh root@beaglebone.local with a default password, and you’re embedded-Linux-ing!

Angstrom doesn’t have any prebuilt packages for OpenVPN, so I downloaded the source from openvpn.net and did the usual ./configure ; make ; make install fandango. I did have one minor hitch – the system clock wasn’t set, so ./configure bombed out complaining about files in the future. Easily fixed – ntpdate us.pool.ntp.org updated my clock, and this time the package built without incident, needing somewhere south of 5 minutes to finish. After that, it was time to test OpenVPN’s throughput – which, spoiler alert, was a total win!

root@beaglebone:~# openvpn --genkey --secret beagle.key ; scp beagle.key me@locutus:/tmp/
root@beaglebone:~# openvpn --secret beagle.key --port 666 --ifconfig 10.98.0.1 10.98.0.2 --dev tun

me@locutus:/tmp$ sudo openvpn --secret beagle.key --remote beaglebone.local --port 666 --ifconfig 10.98.0.2 10.98.0.1 --dev tun

Now I have a working tunnel between locutus and my beaglebone. Opening a new terminal on each, I ran iperf to test throughput. To run iperf (which was already available on Angstrom), you just run iperf -s on the server machine, and run iperf -c [ip address] on the client machine to connect to the server. I tested connectivity both ways across my OpenVPN tunnel:

me@locutus:~$ iperf -c 10.98.0.1
------------------------------------------------------------
Client connecting to 10.98.0.1, TCP port 5001
TCP window size: 21.9 KByte (default)
------------------------------------------------------------
[ 3] local 10.98.0.2 port 55873 connected with 10.98.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 46.2 MBytes 38.5 Mbits/sec

me@locutus:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.98.0.2 port 5001 connected with 10.98.0.1 port 32902
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 47.0 MBytes 39.2 Mbits/sec

38+ mbps from an inexpensive embedded device? I’ll take it!

Apache 2.4 / Ubuntu Trusty problems

Found out the hard way today that there’ve been SIGNIFICANT changes in configuration syntax and requirements since Apache 2.2, when I tried to set up a VERY simple couple of vhosts on Apache 2.4.7 on a brand new Ubuntu Trusty Tahr install.

First – the a2ensite/a2dissite scripts refuse to work unless your vhost config files end in .conf. BE WARNED. Example:

you@trusty:~$ ls /etc/apache2/sites-available
000-default.conf
default-ssl.conf
testsite.tld
you@trusty:~$ sudo a2ensite testsite.tld
ERROR: Site testsite.tld does not exist!

The solution is a little annoying; you MUST end the filename of your vhost configs in .conf – after that, a2ensite and a2dissite work as you’d expect.

you@trusty:~$ sudo mv /etc/apache2/sites-available/testsite.tld /etc/apache2/sites-available/testsite.tld.conf
you@trusty:~$ sudo a2ensite testsite.tld
Enabling site testsite.tld
To activate the new configuration, you need to run:
  service apache2 reload

After that, I had a more serious problem. The “site” I was trying to enable was nothing other than a simple exposure of a directory (a local ubuntu mirror I had set up) – no php, no cgi, nothing fancy at all. Here was my vhost config file:

<VirtualHost *:80>
        ServerName us.archive.ubuntu.com
        ServerAlias us.archive.ubuntu.local 
        Options Includes FollowSymLinks MultiViews Indexes
        DocumentRoot /data/apt-mirror/mirror/us.archive.ubuntu.com
	*lt;Directory /data/apt-mirror/mirror/us.archive.ubuntu.com/>
	        Options Indexes FollowSymLinks
	        AllowOverride None
	</Directory>
</VirtualHost>

Can’t get much simpler, right? This would have worked fine in any previous version of Apache, but not in Apache 2.4.7, the version supplied with Trusty Tahr 14.04 LTS.

Every attempt to browse the directory gave me a 403 Forbidden error, which confused me to no end, since the directories were chmod 755 and chgrp www-data. Checking Apache’s error log gave me pages on pages of lines like this:

[Mon Jun 02 10:45:19.948537 2014] [authz_core:error] [pid 27287:tid 140152894646016] [client 127.0.0.1:40921] AH01630: client denied by server configuration: /data/apt-mirror/mirror/us.archive.ubuntu.com/ubuntu/

What I eventually discovered was that since 2.4, Apache not only requires explicit authentication setup and permission for every directory to be browsed, the syntax has changed as well. The old “Order Deny, Allow” and “Allow from all” won’t cut it – you now need “Require all granted”. Here is my final working vhost .conf file:

<VirtualHost *:80>
        ServerName us.archive.ubuntu.com
        ServerAlias us.archive.ubuntu.local 
        Options Includes FollowSymLinks MultiViews Indexes
        DocumentRoot /data/apt-mirror/mirror/us.archive.ubuntu.com
	<Directory /data/apt-mirror/mirror/us.archive.ubuntu.com/>
	        Options Indexes FollowSymLinks
	        AllowOverride None
                Require all granted
	</Directory>
</VirtualHost>

Hope this helps someone else – this was a frustrating start to the morning for me.