Potentially malicious filenames

How can I rename all the files of a certain pattern to a different pattern in Bash?

This question comes up all the time. We’ll walk you through the common answers, how to make them bulletproof as possible, and how they fail. Then we’ll do it right, in Perl.

TL;DR – you cannot safely handle malicious filenames in bash scripts, you need a real language – like Perl – for that.

Simple, and wrong: for loops in Bash

Let’s say you have a simple set of files:

me@banshee:~/demo$ ls
test1.txt test2.txt test3.txt

Maybe you want to rename all the .txt files to .bin files. You could do this with a bash for loop:

for file in `ls test*.txt` ; do newfile=`echo $file | sed 's/\.txt$/.bin/'`; mv $file $newfile; done

For this simple set of files with no nasty pitfalls in the names, this works fine:

me@banshee:~/demo$ ls
test1.bin test2.bin test3.bin

But what if we want to make it ugly? Let’s add a couple of files with some really malicious names to our original list:

me@banshee:~/demo$ ls -l
total 2
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test1.txt
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test2.txt
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test3.txt
-rw-rw-r-- 1 me me 0 Jun 20 17:24 test 4 $?!*--""'';:()[]{}.txt
-rw-rw-r-- 1 me me 0 Jun 20 17:26 test 5 ; rm * ; .txt

Ah, our old friend Little Bobby Tables! What happens if we run our naive little for loop on that set of files?

me@banshee:~/demo$ for file in `ls test*.txt`; do newfile=`echo $file | sed 's/\.txt$/.bin/'`; mv $file $newfile; done
mv: cannot stat ‘test’: No such file or directory
mv: cannot stat ‘4’: No such file or directory
mv: cannot stat ‘lolwtf’: No such file or directory
mv: cannot stat ‘$?!*--;:(){}[].txt’: No such file or directory
mv: cannot stat ‘test’: No such file or directory
mv: cannot stat ‘5’: No such file or directory
mv: cannot stat ‘;’: No such file or directory
mv: cannot stat ‘rm’: No such file or directory
mv: cannot stat ‘test1.txt’: No such file or directory
mv: cannot stat ‘test2.txt’: No such file or directory
mv: cannot stat ‘test3.txt’: No such file or directory
mv: target ‘$?!*--;:(){}[].bin’ is not a directory
mv: target ‘.bin’ is not a directory
mv: cannot stat ‘;’: No such file or directory
mv: cannot stat ‘.txt’: No such file or directory
me@banshee:~/demo$ ls -l
total 2
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test1.bin
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test2.bin
-rw-rw-r-- 1 me me 0 Jun 20 17:19 test3.bin
-rw-rw-r-- 1 me me 0 Jun 20 17:24 test 4 lolwtf $?!*--""'';:()[]{}.txt
-rw-rw-r-- 1 me me 0 Jun 20 17:26 test 5 ; rm * ; .txt

Well, it could have been worse. At least we didn’t end up actually executing that ; rm * ; we snuck in there. Still, the operation clearly didn’t complete successfully – and if it isn’t making you nervous, it damn well should be. So now what?

Complex, and wrong: while read loops in Bash

The for loop broke on spaces, meaning any filename with a space in it would get handled as separate pieces. Upgrading from a for loop to a while read loop will mitigate that. Adding some strategic encapsulation with “double quotes” will mitigate a few more expansion issues beyond that.

me@banshee:~/demo$ ls *.txt | while read file; do newfile=`echo "$file" | sed 's/\.txt$/.bin/'`; mv "$file" "$newfile"; done

Let’s look at each piece of this puzzle in order:

  • ls *.txt
    I think you can figure this one out.

  • while read file; do
    for each complete line of the input, put it in a variable named $file, then execute this loop.

  • newfile=`echo “$file” | sed ‘s/\.txt$/.bin/’`
    Actually we need to break this down further:

    • newfile=“ – execute the contents of the `backticks` as a command, and put the output in the variable $newfile.
    • echo “$file” | – dump the contents of the variable $echo into the input of the command after the pipe. Note how we encapsulated “$file” with “double quotes” here – this prevents bash from expanding the contents of $file, as well as expanding $file itself. This is worth a demonstration of its own:
      me@banshee:/tmp/tmp$ touch 1 2 3
      me@banshee:/tmp/tmp$ ls 
      1  2  3
      me@banshee:/tmp/tmp$ echo 1 * 3
      1 1 2 3 3
      me@banshee:/tmp/tmp$ echo "1 * 3"
      1 * 3
      

      See?

    • sed ‘s/\.txt$/.bin/’ – we’re asking sed to replace any instance of .txt at the end of the line only with .bin. The $ in \.txt$ tells us only to match if .txt is at the end of the line. The \ in \.txt$ tells sed to “escape” the period character, which otherwise would be a wildcard that matches ANY character in the input. We don’t need to escape the period character in .bin, since that’s in the replacement section, where wildcard characters would never be interpreted anyway.
    • mv “$file” “$newfile” – fairly self-explanatory, but note again the “double” “quotes” encapsulation of “$file” and “$newfile” – it’s necessary; this tells bash that the variables $file and $newfile are one single argument apiece, no matter how many spaces, wildcard characters, or anything else might be present once $file and $newfile are expanded to their literal text. You might think that the double quotes in that nasty test 4 lolwtf \$?!*–“””;:()[]{}.txt filename would still break this – but, thankfully, you’d be wrong.
    • done – this justcompletes the loop.

    But did it work?

    me@banshee:~/demo$ ls -l
    total 3
    -rw-rw-r-- 1 me me 0 Jun 20 17:19 test1.bin
    -rw-rw-r-- 1 me me 0 Jun 20 17:19 test2.bin
    -rw-rw-r-- 1 me me 0 Jun 20 17:19 test3.bin
    -rw-rw-r-- 1 me me 0 Jun 20 17:24 test 4 lolwtf $?!*--""'';:()[]{}.bin
    -rw-rw-r-- 1 me me 0 Jun 20 17:27 test 5 ; rm * ; .bin
    

    Woohoo!

    There’s still one trick we haven’t thrown at our little bash loop, though, and that’s escape characters in the filenames. Behold, the mighty backslash:

    me@banshee:~/demo$ ls -l | grep 6
    -rw-rw-r-- 1 me me 0 Jun 20 18:17 test 6 \$?!*.txt
    

    Will it blend?

    me@banshee:~/demo$ ls *.txt | while read file; do newfile="`echo "$file" | sed 's/\.txt$/.bin/'`"; mv "$file" "$newfile"; done
    mv: cannot stat ‘test 6 $?!*.txt’: No such file or directory

    ‘Fraid not – we successfully handled wildcards, quotes, and shell specials, but the lowly backslash brought us down with ease. If there’s a safe and sane way to handle that in Bash, I don’t know it. Also, by now my brain hurts with all the double quoting and the tricky loop selection and the AAAAIIIIEEEE make it stop already!

    So how DO you REALLY safely handle insanely dirty filenames? With Perl. Duh.

    Simple, and correct: while loops in Perl

    #!/usr/bin/perl
    
    # demonstration - rename all files in the current directory ending in .txt
    #                 with the same filename, but ending in .bin
    
    # open the current directory, or die with a useful error message if we can't
    opendir (DIR, '.') or die $!;
    
    # loop through the current directory, file by file
    while (my $file = readdir(DIR)) {
    	# don't do anything unless this is a .txt file
    	if ($file =~ /\.txt$/) {
    		# first set $newfile to $file
    		my $newfile = $file;
    		# now replace .txt at the end of $newfile with .bin
    		$newfile =~ s/\.txt$/.bin/;
    		# now rename the actual file from .txt to .bin
    		rename $file, $newfile;
    	}
    }
    # close the current directory
    closedir (DIR);
    

    This is a lot easier to read and understand. There’s no crazy double and triple encapsulation, we can properly indent things, we can see what the hell we’re doing. Yes.

    But does it work?

    me@banshee:~/demo$ perl /tmp/test.pl
    me@banshee:~/demo$ ls -l
    total 3
    -rw-rw-r-- 1 me me   0 Jun 20 17:19 test1.bin
    -rw-rw-r-- 1 me me   0 Jun 20 17:19 test2.bin
    -rw-rw-r-- 1 me me   0 Jun 20 17:19 test3.bin
    -rw-rw-r-- 1 me me   0 Jun 20 17:24 test 4 lolwtf $?!*--;:(){}[].bin
    -rw-rw-r-- 1 me me   0 Jun 20 17:27 test 5 ; rm * ; .bin
    -rw-rw-r-- 1 me me   0 Jun 20 18:17 test 6 \$?!*.bin
    

    Damn right it does – they don’t call Perl the Swiss Army Chainsaw for nothin’.

    endfile TL;DR: don’t try to handle potentially-malicious filenames with Bash in the first place – handle them with Perl.

PSA: Snapshots are better than ZVOLs

A lot of people new to ZFS, and even a lot of people not-so-new to ZFS, like to wax ecstatic about ZVOLs. But they never seem to mention the very real pitfalls ZVOLs present.

What’s a ZVOL?

Well, if you know what LVM is, a ZVOL is like an LV, but for ZFS. If you don’t know what LVM is, you can think of a ZVOL as, basically, a dynamically allocated “raw partition” inside ZFS. Unlike a normal dataset, a ZVOL doesn’t have a filesystem of its own. And you can access it by a raw devicename, like /dev/zvol/poolname/zvolname. This looks ideal for those use-cases where you want to nest a legacy filesystem underneath ZFS – for example, virtual machine images. Once you have the ZVOL, you have a raw block storage device to interact with – think mkfs.ext4 /dev/zvol/poolname/zvolname, for example – but you still get all the normal ZFS features along with it, like data integrity, compression, snapshots, and so forth. Plus you don’t have to mess with a loopback device, so that should be higher performance, right? What’s not to love?

ZVOLs perform better, though, right?

AFAICT, the increased performance is pretty much a lie. I’ve benchmarked ZVOLs pretty extensively against raw disk partitions, raw LVs, raw files, and even .qcow2 files and there really isn’t much of a performance difference to be seen. A partially-allocated ZVOL isn’t going to perform any better than a partially-allocated .qcow2 file, and a fully-allocated ZVOL isn’t going to perform any better than a fully-allocated .qcow2 file. (Raw disk partitions or LVs don’t really get any significant boost, either.)

Let’s talk about snapshots.

If snapshots aren’t one of the biggest reasons you’re using ZFS, they should be, and ZVOLs and snapshots are really, really tricky and weird. If you have a dataset that’s occupying 85% of your pool, you can snapshot that dataset any time you like. If you have a ZVOL that’s occupying 85% of your pool, you cannot snapshot it, period. This is one of those things that both tyros and vets tend to immediately balk at – I must be misunderstanding something, right? Surely it doesn’t work that way? Afraid it does.

Ooh, is it demo-in-a-VM-time again?! =)

root@xenial:~# zfs create target/dataset -o compress=off -o quota=15G
root@xenial:~# pv < /dev/zero > /target/dataset/0.bin
  15GiB 0:01:13 [10.3MiB/s] [            <=>                            ]
pv: write failed: Disk quota exceeded
root@xenial:~# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
target          15.3G  3.93G    19K  /target
target/dataset  15.0G      0  15.0G  /target/dataset
root@xenial:~# zfs snapshot target/dataset@1
root@xenial:~# 

Above, we created a dataset on a 20G pool, we dumped 15G of data into it, and we snapshotted the dataset. No surprises here, this is exactly what we expect.

But what happens when we try the same thing with a ZVOL?

root@xenial:~# zfs create target/zvol -V 15G -o compress=off
root@xenial:~# pv < /dev/zero > /dev/zvol/target/zvol
  15GiB 0:03:22 [57.3MiB/s] [========================================>  ] 99% ETA 0:00:00
pv: write failed: No space left on device
NAME          USED  AVAIL  REFER  MOUNTPOINT
target       15.8G  3.46G    19K  /target
target/zvol  15.5G  3.90G  15.0G  -
root@xenial:~# zfs snapshot target/zvol@1
cannot create snapshot 'target/zvol@1': out of space

Despite having 3.9G free on our pool, we can’t snapshot the zvol. If you don’t have at least as much free space in a pool as the REFER of a ZVOL on that pool, you can’t snapshot the ZVOL, period. This means for our little baby demonstration here we’d need 15G free to snapshot our 15G ZVOL. In a real-world situation with VM images, this could easily be a case where you can’t snapshot your 15TB VM image without 15 terabytes of free space available – where if you’d stuck with standard datasets, you’d be able to snapshot that same 15TB VM even with just a few hundred megabytes of AVAIL at your disposal.

TL;DR:

Think long and hard before you implement ZVOLs. Then, you know… don’t.

Could not complete SSL handshake, Socket timeout after 10 seconds with Nagios and NSClient++ 0.4x

Adding a couple of new Windows hosts to my monitoring network this morning, my NRPE plugin checks against them were failing.

me@nagios:~$ /usr/lib/nagios/plugins/check_nrpe -H monitoredwindowsserver.vpn
CHECK_NRPE: Error - could not complete SSL handshake.

The usual culprits – password set but not being used, or hosts_allowed not including the host doing the checking – were already set correctly.

Turns out the NSClient++ folks changed up the default configs. In order to get it working again with a relatively vanilla Nagios server on the other end, I needed to set two new directives under [/settings/NRPE/server]:

verify mode = none
insecure = true

With those directives set and a restart to the nsclient service on the Windows end, manual tests to NRPE worked properly:

me@nagios:~$ /usr/lib/nagios/plugins/check_nrpe -H monitoredwindowsserver.vpn
I (0.4.4.19 2015-12-08) seem to be doing fine...

One of these days I should figure out how to get the default modes, which use peer-to-peer certificate checks, working… but for the moment, I’m only allowing traffic over a VPN tunnel anyway, so it was more important to get it working in its existing (secured by VPN) configuration than to blow a few hours untangling the new defaults.

Wasn’t out of the woods yet, though – that got NRPE working, but not NSClientServer, which is what I’m actually using to monitor these Windows hosts for the most part. So I was still seeing “CRITICAL – Socket timeout after 10 seconds” on a lot of tests against the new hosts in Nagios. Doing a netstat -an on the Windows hosts themselves showed that they were listening on 5666 – the NRPE port – but nothing was listening on 12489, the NSClientServer port. This required another fix in the nsclient.ini file. Just underneath [/modules], you’ll need to add (not just uncomment!) this line:

NSClientServer = enabled

And restart the NSClient++ service again, either from Services applet or with net stop nscp ; net start nscp. Now we test the check_nt plugin against the host…

me@nagios:~$ /usr/lib/nagios/plugins/check_nt -H monitoredwindowsserver.vpn -v UPTIME -p 12489
System Uptime - 0 day(s) 1 hour(s) 34 minute(s)

Now, finally, all of my tests are working. Hope this saves somebody else from having the kind of morning I had!

ZFS: Practicing failures on virtual hardware

I always used to sweat, and sweat bullets, when it came time to replace a failed disk in ZFS. It happened infrequently enough that I never did remember the syntax quite right in between issues, and the last thing you want to do with production hardware and data is fumble around in a cold sweat – you want to know what the correct syntax for everything is ahead of time, and be practiced and confident.

Wonderfully, there’s no reason for that anymore – particularly on Linux, which boasts a really excellent set of tools for simulating storage hardware quickly and easily. So today, we’re going to walk through setting up a pool, destroying one of the disks in it, and recovering from the failure. The important part here isn’t really the syntax for the replacement, though… it’s learning how to set up the simulation in the first place, which allows you to test lots of things properly when your butt isn’t on the line!

Prerequisites

Obviously, you need to have ZFS available. If you’re on Ubuntu Xenial, you can get it with apt update ; apt install zfs-linux. If you’re on an earlier version of Ubuntu, you’ll need to add the zfs-native PPA from the ZFS on Linux project, and install from there: apt-add-repository ppa:zfs-native/stable ; apt update ; apt install ubuntu-zfs.

You’ll also need the QEMU tools, to create and manage .qcow2 storage files, and qemu-nbd loopback type devices that access them like real hardware. Again on Ubuntu, that’s apt update ; apt install qemu-utils.

If you aren’t running Ubuntu, the tools are definitely still available, but you’ll need to consult your own distro’s documentation / forums / etc to figure out how to install ’em.

Creating the virtual disk back-end

First up, we create the back end files for the storage. In today’s example, that’ll be a pair of 1GB disks.

root@banshee:~# qemu-img create -f qcow2 /tmp/0.qcow2 1G ; qemu-img create -f qcow2 /tmp/1.qcow2 1G
Formatting '/tmp/0.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting '/tmp/1.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 

That does exactly what it looks like: creates a pair of 1GB virtual disks, /tmp/0.qcow2 and /tmp/1.qcow2. Qcow2 files are sparsely allocated, so they don’t actually take up any room at all yet – they’ll expand as and if you put data in them. (If you omit the -f qcow2 argument, qemu-img will create fully allocated RAW files instead, which will take longer.)

Creating the virtual disk devices

By themselves, our .qcow2 files don’t help us much. In order to use them as disks with ZFS, what we really need are block devices, in the /dev filesystem, which reference our qcow2 files. That’s where qemu-nbd comes in.

root@banshee:~# modprobe nbd max_part=63

You might or might not actually need that bit – but it never hurts. This makes sure the nbd kernel module is loaded, and, for safety’s sake, that it won’t try to load more than 63 partitions on a single virtual device. This might keep your system from crashing if you try to access a stupendously corrupt or maliciously crafted qcow2 file – I’m not sure what happens if a system thinks it has devices sda1 through sda65537, and I’d really rather not find out!

root@banshee:~# qemu-nbd -c /dev/nbd0 /tmp/0.qcow2 ; qemu-nbd -c /dev/nbd1 /tmp/1.qcow2

Easy peasy – we now have virtual disks /dev/nbd0 and /dev/nbd1, which can be accessed by ZFS (or any other filesystem or linux system utility) just like any other disk would be.

Setting up a zpool

This looks just like setting up any other pool. We’ll use the ashift argument to make the pool use 4K blocks, since that matches my underlying block device (which might or might not really matter, but it’s a good habit to get into anyway, and this is all about building good habits and muscle memory, right?)

root@banshee:~# zpool create -oashift=12 test mirror /dev/nbd0 /dev/nbd1
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0

errors: No known data errors

Easy peasy! We now have a pool with a simple two-disk mirror vdev.

Playing with topology: add more devices

What if we wanted to make it a pool of mirrors, with a second mirror vdev?

root@banshee:~# qemu-img create -f qcow2 /tmp/2.qcow2 1G ; qemu-img create -f qcow2 /tmp/3.qcow2 1G
Formatting '/tmp/2.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting '/tmp/3.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
root@banshee:~# qemu-nbd -c /dev/nbd2 /tmp/2.qcow2 ; qemu-nbd -c /dev/nbd3 /tmp/3.qcow2
root@banshee:~# zpool add -oashift=12 test mirror /dev/nbd2 /dev/nbd3
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Couldn’t be easier.

Save some data on your new virtual hardware

Before we do anything else, let’s store some data on the new pool, so that we have something to detect corruption on later. (ZFS doesn’t checksum or scrub empty blocks, so won’t find corruption in them.)

root@banshee:~# pv < /dev/urandom > /test/urandom.bin
 408MB 0:00:37 [  15MB/s] [                                          <=>                    ]
^C

I used the pipe viewer utility here, pv, which is available on Ubuntu with apt update ; apt install pv.

The ^C is because I hit control-C to interrupt it once I felt that I’d written enough data. In this case, a bit over 400MB of random data, saved on our new pool in the file /test/urandom.bin.

root@banshee:~# zpool list test
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
test  1.97G   414M  1.56G         -    26%    20%  1.00x  ONLINE  -

root@banshee:~# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   413M  1.50G   413M  /test

root@banshee:~# ls -lh /test
total 413M
-rw-r--r-- 1 root root 413M May 16 12:12 urandom.bin

Gravy.

I just want to kill something beautiful

What happens if a drive or a controller port goes berserk, and overwrites large swathes of a disk with zeroes? No time like the present to find out!

We don’t really want to write directly over a .qcow2 file, since our goal today is to test ZFS, not to test the QEMU infrastructure itself. We want to write to the actual device we created, and let the device put the data in the .qcow2 file. Let’s pick on /dev/nbd0, backed by /tmp/0.qcow2 – it looks uppity and in need of some comeuppance.

root@banshee:~# pv < /dev/zero > /dev/nbd0
 777MB 0:00:06 [48.9MB/s] [       <=>                                                     ]
^C

Boom. Errors galore. Does ZFS know about them yet?

root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Nope! We really, really did simulate on-disk corruption, which ZFS has no way of knowing about until it tries to actually access the data.

Easter egging for errors

So, let’s actually access the data by reading in the entire /test/urandom.bin file, and then check our status:

root@banshee:~# pv < /test/urandom.bin > /dev/null
 412MB 0:00:00 [6.99GB/s] [=============================================>] 100%            
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Hey – still nothing! Why not? Because /test/urandom.qcow2 is still in the ARC, that’s why! ZFS didn’t actually need to hit the storage to hand us the file, so it didn’t – and we still haven’t detected the massive amount of corruption we injected into /dev/nbd0.

We can get ZFS to actually look for the problem in a couple of ways. The cruder way is to export the pool and reimport it, which will have the side effect of dumping the ARC as well. The more professional way is to scrub the pool, which is a technique with the explicit design of reading and verifying every block. But first, let’s see what happens just reading from the storage normally, after the ARC isn’t holding the data anymore:

root@banshee:~# zpool export test
root@banshee:~# zpool import test
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     4
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Ooh, we’ve already found a few errors – we injected so much corruption that we nailed a few of ZFS’ metadata blocks on /dev/nbd0, which were found immediately on importing the pool again. There were redundant copies of the metadata blocks available, though, so ZFS repaired the errors it found already with those.

Now that we’ve exported and imported the pool, which also dumped the ARC, let’s re-read the file in its entirety again, then see what our status looks like:

root@banshee:~# pv < /test/urandom.bin > /dev/null
 412MB 0:00:00 [ 563MB/s] [================================================>] 100%            
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     9
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Five more blocks detected corrupt. Doesn’t seem like a lot, does it? Keep in mind, ZFS is only finding blocks that we specifically attempt to read from right now – so a lot of what would be corrupt blocks on /dev/nbd0, we actually read from /dev/nbd1 instead. And only half-ish of the data was saved to mirror-0 in the first place – the rest went to mirror-1. So, ZFS is finding and repairing corrupt blocks… but only a few of them so far. This was worth playing with to see what happens with undetected errors in normal use, but now that we’ve done this, let’s look for errors the right way.

It isn’t really clean until it’s been scrubbed

When we scrub the pool, we explicitly read every single block that’s been written and is actively in use on every single device, all at once.

So far, we’ve found nine corrupt blocks just poking around the system like we normally would in normal use. Will a scrub find more?

root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 170M in 0h0m with 0 errors on Mon May 16 12:32:00 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0 1.40K
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

There we go! We just went from 9 checksum errors, to 1.4 thousand checksum errors detected.

And that, folks, is why you want to regularly scrub your pool.

Can we do worse? What happens if we blow through every block on /dev/nbd0, instead of “just” 3/4 or so of them?

root@banshee:~# pv < /dev/zero > /dev/nbd0
pv: write failed: No space left on device<=>                                                 ]
root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: ONLINE
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:36:38 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    UNAVAIL      0     0 1.40K  corrupted data
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

The CKSUM column hasn’t changed – but /dev/nbd0 now shows as UNAVAIL, with the corrupted data flag, because we blew through all the metadata including all copies of the disk label itself. We still have no data errors on the pool itself, but mirror-0, and the pool itself, are operating degraded now. Which we’ll want to fix.

Interestingly, we also just found a bug in ZFS – the pool itself as well as the mirror-0 vdev, should be showing DEGRADED in the STATE column, not ONLINE! I’m gonna go report that, be back in a minute…

Replacing a failed disk (Rise chicken. Chicken, rise.)

OK, now that we’ve thoroughly failed a disk, we can replace it. We happen to know – because we’re evil bastards who did it ourselves – that the actual disk itself is fine on the hardware level, we just basically degaussed it. So we could just replace it in the array in situ. That’s not usually going to be best practice with real hardware, though, so let’s more thoroughly simulate actually removing and replacing the disk with a new one.

First, the removal:

root@banshee:~# qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected

Simple enough. ZFS doesn’t really know the disk is gone yet, but we can force it to figure out by scrubbing again before checking the status:

root@banshee:~# zpool scrub test
root@banshee:~# zpool status test
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 16 12:42:35 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    nbd0    UNAVAIL      0     0 1.40K  corrupted data
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Interestingly, now that we’ve actually removed /dev/nbd0 in a hardware sense, our pool and vdev show – correctly – as DEGRADED, in the actual STATUS column as well as in the status message.

Regardless, now that we’ve “pulled” the disk, let’s “replace” it.

root@banshee:~# rm /tmp/0.qcow2
root@banshee:~# qemu-img create -f qcow2 /tmp/0.qcow2 1G
Formatting '/tmp/0.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
root@banshee:~# qemu-nbd -c /dev/nbd0 /tmp/0.qcow2

Easy enough – now, we’re in exactly the same boat we’d be in if this was a real machine and we’d physically removed and replaced the offending drive, but done nothing else. ZFS, of course, is still going to show degraded status – we need to give the new drive to the pool, as well as physically plugging it in. So let’s do that:

root@banshee:~# zpool replace test /dev/nbd0

Is it really that easy? Let’s check the status and find out:

root@banshee:~# zpool status test
  pool: test
 state: ONLINE
  scan: resilvered 206M in 0h0m with 0 errors on Mon May 16 12:47:20 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nbd0    ONLINE       0     0     0
	    nbd1    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    nbd2    ONLINE       0     0     0
	    nbd3    ONLINE       0     0     0

errors: No known data errors

Yep, it really is that easy.

Party’s over: clean up before you go home

Now that you’ve successfully tested what you set out to test, the last step is to clean up your virtual mess. In order, you need to destroy the test pool, disconnect the virtual devices, and delete the back-end storage files they referenced.

root@banshee:~# zpool destroy test
root@banshee:~# qemu-nbd -d /dev/nbd0 ; qemu-nbd -d /dev/nbd1 ; qemu-nbd -d /dev/nbd2 ; qemu-nbd -d /dev/nbd3
/dev/nbd0 disconnected
/dev/nbd1 disconnected
/dev/nbd2 disconnected
/dev/nbd3 disconnected
root@banshee:~# rm /tmp/0.qcow2 ; rm /tmp/1.qcow2 ; rm /tmp/2.qcow2 ; rm /tmp/3.qcow2

All done – clean as a whistle, and ready to set up the next simulation!

If you learned how to fail out a disk, awesome, but…

If you ended up here trying to figure out how to deal with and replace a failed disk, cool, and I hope you got what you were looking for. But please, remember the testing steps we did for the environment – they’re what this post is actually about. Learning how to set up your own virtual test environment will make you a much, much better admin down the line – and make you as cool as an action hero that one fateful day when it’s a real disk that’s faulted out of your real pool, and your data’s on the line. Nothing gives you confidence and takes away the stress like plenty of experience having done the exact same thing before.

Testing copies=n resiliency

I decided to see how well ZFS copies=n would stand up to on-disk corruption today. Spoiler alert: not great.

First step, I created a 1GB virtual disk, made a zpool out of it with 8K blocks, and set copies=2.

me@locutus:~$ sudo qemu-img create -f qcow2 /data/test/copies/0.qcow2 1G
me@locutus:~$ sudo qemu-nbd -c /dev/nbd0 /data/test/copies/0.qcow2 1G
me@locutus:~$ sudo zpool create -oashift=12 test /data/test/copies/0.qcow2
me@locutus:~$ sudo zfs set copies=2 test

Now, I wrote 400 1MB files to it – just enough to make the pool roughly 85% full, including the overhead due to copies=2.

me@locutus:~$ cat /tmp/makefiles.pl
#!/usr/bin/perl

for ($x=0; $x<400 ; $x++) {
	print "dd if=/dev/zero bs=1M count=1 of=$x\n";
	print `dd if=/dev/zero bs=1M count=1 of=$x`;
}

With the files written, I backed up my virtual filesystem, fully populated, so I can repeat the experiment later.

me@locutus:~$ sudo zpool export test
me@locutus:~$ sudo cp /data/test/copies/0.qcow2 /data/test/copies/0.qcow2.bak
me@locutus:~$ sudo zpool import test

Now, I write corrupt blocks to 10% of the filesystem. (Roughly: it's possible that the same block was overwritten with a garbage block more than once.) Note that I used a specific seed, so that I can recreate the scenario exactly, for more runs later.

me@locutus:~$ cat /tmp/corruptor.pl
#!/usr/bin/perl

# total number of blocks in the test filesystem
$numblocks=131072;

# desired percentage corrupt blocks
$percentcorrupt=.1;

# so we write this many corrupt blocks
$corruptloop=$numblocks*$percentcorrupt;

# consistent results for testing
srand 32767;

# generate 8K of garbage data
for ($x=0; $x<8*1024; $x++) {
	$garbage .= chr(int(rand(256)));
}

open FH, "> /dev/nbd0";

for ($x=0; $x<$corruptloop; $x++) {
	$blocknum = int(rand($numblocks-100));
	print "Writing garbage data to block $blocknum\n";
	seek FH, ($blocknum*8*1024), 0;
	print FH $garbage;
}

close FH;

Okay. When I scrub the filesystem I just wrote those 10,000 or so corrupt blocks to, what happens?

me@locutus:~$ sudo zpool scrub test ; sudo zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 133M in 0h1m with 1989 errors on Mon May  9 15:56:11 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0 1.94K
	  nbd0      ONLINE       0     0 8.94K

errors: 1989 data errors, use '-v' for a list
me@locutus:~$ sudo zpool status -v test | grep /test/ | wc -l
385

OUCH. 385 of my 400 total files were still corrupt after the scrub! Copies=2 didn't do a great job for me here. 🙁

What if I try it again, this time just writing garbage to 1% of the blocks on disk, not 10%? First, let's restore that backup I cleverly made:

root@locutus:/data/test/copies# zpool export test
root@locutus:/data/test/copies# qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected
root@locutus:/data/test/copies# pv < 0.qcow2.bak > 0.qcow2
 999MB 0:00:00 [1.63GB/s] [==================================>] 100%            
root@locutus:/data/test/copies# qemu-nbd -c /dev/nbd0 /data/test/copies/0.qcow2
root@locutus:/data/test/copies# zpool import test
root@locutus:/data/test/copies# zpool status test | tail -n 5
	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  nbd0      ONLINE       0     0     0

errors: No known data errors

Alright, now let's change $percentcorrupt from 0.1 to 0.01, and try again. How'd we do after only corrupting 1% of the blocks on disk?

root@locutus:/data/test/copies# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 101M in 0h0m with 72 errors on Mon May  9 16:13:49 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0    72
	  nbd0      ONLINE       0     0 1.08K

errors: 64 data errors, use '-v' for a list
root@locutus:/data/test/copies# zpool status test -v | grep /test/ | wc -l
64

Still not great. We lost 64 out of our 400 files. Tenth of a percent?

root@locutus:/data/test/copies# zpool status -v test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 12.1M in 0h0m with 2 errors on Mon May  9 16:22:30 2016
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     2
	  nbd0      ONLINE       0     0   105

errors: Permanent errors have been detected in the following files:

        /test/300
        /test/371

Damn. We still lost two files, even with only writing 130 or so corrupt blocks. (The missing 26 corrupt blocks weren't picked up by the scrub because they happened in the 15% or so of unused space on the pool, presumably: a scrub won't check unused blocks.) OK, what if we try a control - how about we corrupt the same tenth of a percent of the filesystem (105 blocks or so), this time without copies=2 set? To make it fair, I wrote 800 1MB files to the same filesystem this time using the default copies=1 - this is more files, but it's the same percentage of the filesystem full. (Interestingly, this went a LOT faster. Perceptibly, more than twice as fast, I think, although I didn't actually time it.)

Now with our still 84% full /test zpool, but this time with copies=1, I corrupted the same 0.1% of the total block count.

root@locutus:/data/test/copies# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 8K in 0h0m with 98 errors on Mon May  9 16:28:26 2016
config:

	NAME                         STATE     READ WRITE CKSUM
	test                         ONLINE       0     0    98
	  /data/test/copies/0.qcow2  ONLINE       0     0   198

errors: 93 data errors, use '-v' for a list

Without copies=2 set, we lost 93 files instead of 2. So copies=n was definitely better than nothing for our test of writing 0.1% of the filesystem as bad blocks... but it wasn't fabulous, either, and it fell flat on its face with 1% or 10% of the filesystem corrupted. By comparison, a truly redundant pool - one with two disks in it, in a mirror vdev - would have survived the same test (corrupting ANY number of blocks on a single disk) with flying colors.

The TL;DR here is that copies=n is better than nothing... but not by a long shot, and you do give up a lot of performance for it. Conclusion: play with it if you like, but limit it only to extremely important data, and don't make the mistake of thinking it's any substitute for device redundancy, much less backups.

zfs: copies=n is not a substitute for device redundancy!

I’ve been seeing a lot of misinformation flying around the web lately about the zfs dataset-level feature copies=n. To be clear, dangerous misinformation. So dangerous, I’m going to go ahead and give you the punchline in the title of this post and in its first paragraph: copies=n does not give you device fault tolerance!

Why does copies=n actually exist then? Well, it’s a sort of (extremely) poor cousin that helps give you a better chance of surviving data corruption. Let’s say you have a laptop, you’ve set copies=3 on some extremely critical work-related datasets, and the drive goes absolutely bonkers and starts throwing tens of thousands of checksum errors. Since there’s only one disk in the laptop, ZFS can’t correct the checksum errors, only detect them… except on that critical dataset, maybe and hopefully, because each block has multiple copies. So if a given block has been written three times and any single copy of that block reads so as to match its validation hash, that block will get served up to you intact.

So far, so good. The problem is that I am seeing people advocating scenarios like “oh, I’ll just add five disks as single-disk vdevs to a pool, then make sure to set copies=2, and that way even if I lose a disk I still have all the data.” No, no, and no. But don’t take my word for it: let’s demonstrate.

First, let’s set up a test pool using virtual disks.

root@banshee:/tmp# qemu-img create -f qcow2 0.qcow2 10G ; qemu-img create -f qcow2 1.qcow2 10G
Formatting '0.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting '1.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off 
root@banshee:/tmp# qemu-nbd -c /dev/nbd0 /tmp/0.qcow2 ; qemu-nbd -c /dev/nbd1 /tmp/1.qcow2
root@banshee:/tmp# zpool create test /dev/nbd0 /dev/nbd1
root@banshee:/tmp# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    test        ONLINE       0     0     0
      nbd0      ONLINE       0     0     0
      nbd1      ONLINE       0     0     0

errors: No known data errors

Now let’s set copies=2, and then create a couple of files in our pool.

root@banshee:/tmp# zfs set copies=2 test
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.310805 s, 13.5 MB/s
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test2
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.285544 s, 14.7 MB/s

Let’s confirm that copies=2 is working.

We should see about 8MB of data on each of our virtual disks – one for each copy of each of our 4MB test files.

root@banshee:/tmp# ls -lh *.qcow2
-rw-r--r-- 1 root root 9.7M May  2 13:56 0.qcow2
-rw-r--r-- 1 root root 9.8M May  2 13:56 1.qcow2

Yep, we’re good – we’ve written a copy of each of our two 4MB files to each virtual disk.

Now fail out a disk:

root@banshee:/tmp# zpool export test
root@banshee:/tmp# qemu-nbd -d /dev/nbd1
/dev/nbd1 disconnected

Will a pool with copies=2 and one missing disk import?

root@banshee:/tmp# zpool import
   pool: test
     id: 15144803977964153230
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

    test         UNAVAIL  missing device
      nbd0       ONLINE

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

That’s a resounding “no”.

Can we force an import?

root@banshee:/tmp# zpool import -f test
cannot import 'test': one or more devices is currently unavailable

No – your data is gone.

Please let this be a lesson: no, copies=n is not a substitute for redundancy or parity, and yes, losing any vdev does lose the pool.

FIO cheat sheet

With any luck I’ll turn this into a real blog post soon, but for the moment: a cheat sheet for simple usage of fio to benchmark storage. This command will run 16 simultaneous 4k random writers in sync mode. It’s a big enough config to push through the ZIL on most zpools and actually do some testing of the real hardware underneath the cache.

fio --name=random-writers --ioengine=sync --iodepth=4 --rw=randwrite --bs=4k --direct=0 --size=256m --numjobs=16 --end_fsync=1

For reference, a Sanoid Standard host with two 1TB solid state pro mirror vdevs gets 429MB/sec throughput with 16 4k random writers in sync mode:

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=426293KB/s, minb=26643KB/s, maxb=28886KB/s, mint=9075msec, maxt=9839msec

Awww, yeah.

378MB/sec for a single 4k random writer, so don’t think you have to have a ridiculous queue depth to see outstanding throughput, either:

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=378444KB/s, minb=378444KB/s, maxb=378444KB/s, mint=11083msec, maxt=11083msec

That’s a spicy meatball.

A note from the future: to do a 50% mix of reads and writes:

fio --name=random-readwrite --ioengine=sync --iodepth=4 --rw=randrw --bs=4k --direct=0 --size=256m --numjobs=16 --end_fsync=1

Dual-NIC fanless Celeron 1037u router test – promising!

fanless_celeron_1037u_box_routingFinally found the time to set up my little fanless Celeron 1037u router project today. So far, it’s very promising!

I installed Ubuntu Server on an elderly 4GB SD card I had lying around, with no problems other than the SD card being slow as molasses – which is no fault of the Alibaba machine, of course. Booted from it just fine. I plan on using this little critter at home and don’t want to deal with glacial I/O, though, so the next step was to reinstall Ubuntu Server on a 60GB Kingston SSD, which also had no problems.

With Ubuntu Server (14.04.3 LTS) installed, the next step was getting a basic router-with-NAT iptables config going. I used MASQUERADE so that the LAN side would have NAT, and I went ahead and set up a couple of basic service rules – including a pinhole for forwarding iperf from the WAN side to a client machine on the LAN side – and saved them in /etc/network/iptables, suitable for being restored using /sbin/iptables-restore (ruleset at the end of this post).

Once that was done and I’d gotten dhcpd serving IP addresses on the LAN side, I was ready to plug up the laptop and go! The results were very, very nice:

root@demoserver:~# iperf -c springbok
------------------------------------------------------------
Client connecting to 192.168.0.125, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local demoserver port 48808 connected with springbok port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   935 Mbits/sec
You have new mail in /var/mail/root
root@demoserver:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local demoserver port 5001 connected with springbok port 40378
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec

935mbps up and down… not too freakin’ shabby for a lil’ completely fanless Celeron. What about OpenVPN, with 2048-bit SSL?

------------------------------------------------------------
Client connecting to 10.8.0.38, TCP port 5001
TCP window size: 22.6 KByte (default)
------------------------------------------------------------
[  3] local 10.8.0.1 port 45727 connected with 10.8.0.38 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-11.6 sec   364 MBytes   264 Mbits/sec 

264mbps? Yeah, that’ll do.

To be fair, though, LZO compression is enabled in my OpenVPN setup, which is undoubtedly improving our iperf run. So let’s be fair, and try a slightly more “real-world” test using ssh to bring in a hefty chunk of incompressible pseudorandom data, instead:

root@router:/etc/openvpn# ssh -c arcfour jrs@10.8.0.1 'cat /tmp/test.bin' | pv > /dev/null
 333MB 0:00:17 [19.5MB/s] [                         <=>                                  ]

Still rockin’ a solid 156mbps, over OpenVPN, after SSH overhead, using incompressible data. Niiiiiiice.

For posterity’s sake, here is the iptables ruleset I’m using for testing on the little Celeron.

*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]

# p4p1 is WAN interface
-A POSTROUTING -o p4p1 -j MASQUERADE

# NAT pinhole: iperf from WAN to LAN
-A PREROUTING -p tcp -m tcp -i p4p1 --dport 5001 -j DNAT --to-destination 192.168.100.101:5001

COMMIT

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:LOGDROP - [0:0]

# create LOGDROP target to log and drop packets
-A LOGDROP -j LOG
-A LOGDROP -j DROP

##### basic global accept rules - ICMP, loopback, traceroute, established all accepted
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -m state --state ESTABLISHED -j ACCEPT

# enable traceroute rejections to get sent out
-A INPUT -p udp -m udp --dport 33434:33523 -j REJECT --reject-with icmp-port-unreachable

##### Service rules
#
# OpenVPN
-A INPUT -p udp -m udp --dport 1194 -j ACCEPT

# ssh - drop any IP that tries more than 10 connections per minute
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -m state --state NEW -m recent --set --name DEFAULT --mask 255.255.255.255 --rsource
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 11 --name DEFAULT --mask 255.255.255.255 --rsource -j LOGDROP
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT

# www
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 443 -j ACCEPT

# default drop because I'm awesome
-A INPUT -j DROP

##### forwarding ruleset
#
# forward packets along established/related connections
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# forward from LAN (p1p1) to WAN (p4p1)
-A FORWARD -i p1p1 -o p4p1 -j ACCEPT

# NAT pinhole: iperf from WAN to LAN
-A FORWARD -p tcp -d 192.168.100.101 --dport 5001 -j ACCEPT

# drop all other forwarded traffic
-A FORWARD -j DROP

COMMIT

Emoji on Ubuntu Trusty

OK, so this is maybe kinda useless. But I wanted an emoji ONCE for a presentation, and the ONCE I wanted it, the fool thing wouldn’t display on my presentation laptop and I had to scramble at the last minute to do something that wasn’t quite as entertaining. So here’s how you fix that problem:

you@box:~$ sudo apt-get update ; sudo apt-get install ttf-ancient-fonts unifont

Poof, you got emojis. In my case, the one I wanted was this:

?

Yes, I AM comfortable in my masculinity, why do you ask…?

Blindrename.pl – a tool to aid blinded analysis in a lab setting

I made a tiny contribution to science this morning – a friend in neuroscience lamented that she couldn’t find any tools to automate the process of renaming a set of images for blinded analysis, so I made one.

https://github.com/jimsalterjrs/blindanalysis

TL;DR on what it does: you feed it a folder full of files, and it renames them all to random names while preserving their original extensions (such as .tif, .lsm, .jpeg, etc). While doing so, it creates a keyfile.csv which ties the original filename to the new, randomized filename – so that you can open up keyfile.csv in Excel, LibreOffice Calc, etc after your blind analysis is done and associate your blind results with your original data.

It’s reasonably smart and cautious – it refuses to run as root, won’t mess with dotfiles or subdirectories, won’t traverse subdirectories, won’t let you accidentally randomize the same folder twice, and spits out human-readable errors if things go wrong.

This is what it looks like in operation:

me@banshee:~$ ls -l /tmp/test
total 24
-rw-rw-r-- 1 me me 2 Oct 14 13:44 1.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 2.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 3.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 4.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 5
drwxrwxr-x 2 me me 4096 Oct 14 12:56 subdir

me@banshee:~$ blindrename.pl /tmp/test
Renaming: 1.tif... 2.tif... 3.tif... 4.tif... 5...
5 files successfully blind renamed; keyfile saved to /tmp/test/keyfile.csv.

me@banshee:~$ ls -l /tmp/test
total 28
-rw-rw-r-- 1 me me 2 Oct 14 13:44 B4LOz.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 Ek76e.tif
-rw-rw-r-- 1 me me 2 Oct 14 13:44 kdVFM.tif
-rw-rw-r-- 1 me me 131 Oct 14 14:02 keyfile.csv
-rw-rw-r-- 1 me me 2 Oct 14 13:44 Oklr1
drwxrwxr-x 2 me me 4096 Oct 14 12:56 subdir
-rw-rw-r-- 1 me me 2 Oct 14 13:44 wsy7e.tif

me@banshee:~$ cat /tmp/test/keyfile.csv
"Original Filename","Cloaked Filename"
"1.tif","kdVFM.tif"
"2.tif","Ek76e.tif"
"3.tif","B4LOz.tif"
"4.tif","wsy7e.tif"
"5","Oklr1"

There are no dependencies other than Perl itself, and the script is licensed GPLv3 – free for all to use, as in beer and as in speech. I hope this helps somebody (else); this task has got to come up frequently enough in all sorts of labwork that a free tool should be easy to find!

Future science workers: if this helped you and you’re feeling grateful, the EFF can always use a donation, whether large, small, or micro. =)