A lot of people new to ZFS, and even a lot of people not-so-new to ZFS, like to wax ecstatic about ZVOLs. But they never seem to mention the very real pitfalls ZVOLs present.
What’s a ZVOL?
Well, if you know what LVM is, a ZVOL is like an LV, but for ZFS. If you don’t know what LVM is, you can think of a ZVOL as, basically, a dynamically allocated “raw partition” inside ZFS. Unlike a normal dataset, a ZVOL doesn’t have a filesystem of its own, and you access it by a raw device name, like /dev/zvol/poolname/zvolname. This looks ideal for those use-cases where you want to nest a legacy filesystem on top of ZFS – for example, virtual machine images. Once you have the ZVOL, you have a raw block storage device to interact with – think mkfs.ext4 /dev/zvol/poolname/zvolname, for example – but you still get all the normal ZFS features along with it, like data integrity, compression, snapshots, and so forth. Plus you don’t have to mess with a loopback device, so that should be higher performance, right? What’s not to love?
ZVOLs perform better, though, right?
AFAICT, the increased performance is pretty much a lie. I’ve benchmarked ZVOLs pretty extensively against raw disk partitions, raw LVs, raw files, and even .qcow2 files and there really isn’t much of a performance difference to be seen. A partially-allocated ZVOL isn’t going to perform any better than a partially-allocated .qcow2 file, and a fully-allocated ZVOL isn’t going to perform any better than a fully-allocated .qcow2 file. (Raw disk partitions or LVs don’t really get any significant boost, either.)
Let’s talk about snapshots.
If snapshots aren’t one of the biggest reasons you’re using ZFS, they should be, and ZVOLs and snapshots are really, really tricky and weird. If you have a dataset that’s occupying 85% of your pool, you can snapshot that dataset any time you like. If you have a ZVOL that’s occupying 85% of your pool, you cannot snapshot it, period. This is one of those things that both tyros and vets tend to immediately balk at – I must be misunderstanding something, right? Surely it doesn’t work that way? Afraid it does.
Ooh, is it demo-in-a-VM-time again?! =)
root@xenial:~# zfs create target/dataset -o compress=off -o quota=15G
root@xenial:~# pv < /dev/zero > /target/dataset/0.bin
15GiB 0:01:13 [10.3MiB/s] [ <=> ]
pv: write failed: Disk quota exceeded
root@xenial:~# zfs list
NAME            USED   AVAIL  REFER  MOUNTPOINT
target          15.3G  3.93G    19K  /target
target/dataset  15.0G      0  15.0G  /target/dataset
root@xenial:~# zfs snapshot target/dataset@1
root@xenial:~#
Above, we created a dataset on a 20G pool, we dumped 15G of data into it, and we snapshotted the dataset. No surprises here, this is exactly what we expect.
But what happens when we try the same thing with a ZVOL?
root@xenial:~# zfs create target/zvol -V 15G -o compress=off
root@xenial:~# pv < /dev/zero > /dev/zvol/target/zvol
15GiB 0:03:22 [57.3MiB/s] [========================================> ] 99% ETA 0:00:00
pv: write failed: No space left on device
root@xenial:~# zfs list
NAME         USED   AVAIL  REFER  MOUNTPOINT
target       15.8G  3.46G    19K  /target
target/zvol  15.5G  3.90G  15.0G  -
root@xenial:~# zfs snapshot target/zvol@1
cannot create snapshot 'target/zvol@1': out of space
Despite having 3.9G free on our pool, we can’t snapshot the ZVOL. If you don’t have at least as much free space in a pool as the REFER of a default, thick-provisioned ZVOL on that pool, you can’t snapshot that ZVOL, period. This means for our little baby demonstration here we’d need 15G free to snapshot our 15G ZVOL. In a real-world situation with VM images, this could easily be a case where you can’t snapshot your 15TB VM image without 15 terabytes of free space available – where if you’d stuck with standard datasets, you’d be able to snapshot that same 15TB VM even with just a few hundred megabytes of AVAIL at your disposal.
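The arithmetic behind that failure can be sketched in a few lines of shell. This is only a rough pre-flight check, assuming a ZVOL created the default way and byte values as printed by zfs get -Hp; the helper name snapshot_fits is made up for illustration, and ZFS's real space accounting has more nuance than one comparison.

```shell
#!/bin/sh
# Rough pre-flight check: will a snapshot of a default (thick) ZVOL fit?
# Assumption: the snapshot needs about as much free pool space as the
# ZVOL's REFER, which is the rule of thumb described above.
# snapshot_fits is a hypothetical helper name, not a ZFS command.
snapshot_fits() {
    refer_bytes=$1   # e.g. from: zfs get -Hp -o value referenced target/zvol
    avail_bytes=$2   # e.g. from: zfs get -Hp -o value available target
    if [ "$avail_bytes" -ge "$refer_bytes" ]; then
        echo "ok"
    else
        echo "short by $((refer_bytes - avail_bytes)) bytes"
    fi
}

GIB=$((1024 * 1024 * 1024))
# Figures rounded from the demo: 15G referenced, roughly 3G available.
snapshot_fits $((15 * GIB)) $((3 * GIB))
```

With the demo's numbers the check reports a shortfall of about 12G, which matches the "out of space" error even though the pool is far from full.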
Think long and hard before you implement ZVOLs. Then, you know… don’t.
4 thoughts on “PSA: Snapshots are better than ZVOLs”
Thank you, this blog post seems to explain why my snapshots are failing, as described on this forum: https://forums.freenas.org/index.php?threads/interpreting-free-space-outputs-to-fix-failing-snapshots.52740/#post-367001
Any idea why the pool needs as much free space as the zvol’s “Refer”? Would love to understand the technical reasons why.
Also, what is the alternative for ZFS and storage for VMWare virtual machines? Normal datastore with NFS instead of iSCSI?
You should be able to fix the out-of-space issue by setting the refreservation property of the ZVOL to none, or to something less than the ZVOL size. Snapshotting should then work without out-of-space errors.
Otherwise ZFS will try to guarantee that you can (over)write the volume all over again without running out of space, and this means doubling the reservation when you snapshot a fully written volume.
Hope this helps!
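A minimal sketch of that workaround, assuming the target/zvol volume from the demo above. The helper name unreserve_and_snapshot and the ZFS variable are illustrative only; setting ZFS=echo previews the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the workaround: drop the reservation, then snapshot.
# ZFS defaults to the real zfs binary; set ZFS=echo to preview the commands.
# unreserve_and_snapshot is a hypothetical helper name.
ZFS=${ZFS:-zfs}

unreserve_and_snapshot() {
    vol=$1    # e.g. target/zvol
    snap=$2   # e.g. 1
    # Without a refreservation, ZFS no longer guarantees room to rewrite
    # the whole volume, so pool usage has to be watched by hand afterwards.
    $ZFS set refreservation=none "$vol" &&
    $ZFS snapshot "$vol@$snap"
}

# Usage (needs root and an existing pool):
#   unreserve_and_snapshot target/zvol 1
```

The trade-off is the one described in the comment above: the snapshot succeeds, but writes to the volume can now fail if the pool fills up.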
Is there no workaround? One reason I want to use zvols isn’t so much for snapshotting (though it’d be handy) but for quick throwaway block devices without having to junk up my directories.
“By default, when you create a zvol in ZFS, if you don’t specify ‘sparse’ (with -s on the command line option), ZFS will make it a ‘thick provisioned’ zvol. All that actually is is ZFS making a ‘refreservation’ equivalent to the ‘volsize’ you specified for the zvol. Create one some time, and take a look for yourself. The fact is, if you’re not planning on making snapshots, refreservations are a very sane way of not only guaranteeing space to your clients, but of easily keeping yourself from overprovisioning your pool. If you want to skip them, don’t set a refreservation — but be warned; if you do so, the onus is now on you as the administrator to keep a close eye on the pool utilization and take action before it gets too full. ”
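To "take a look for yourself" as the quote suggests, something like the following sketch may help. It assumes a scratch pool you can create volumes on; the dataset names thick and sparse, the helper name compare_zvols, and the ZFS variable are all made up for illustration.

```shell
#!/bin/sh
# Create one thick and one sparse ZVOL and compare their reservations.
# ZFS defaults to the real zfs binary; set ZFS=echo to preview the commands.
# compare_zvols and the names "thick" and "sparse" are hypothetical.
ZFS=${ZFS:-zfs}

compare_zvols() {
    pool=$1   # e.g. target
    size=$2   # e.g. 1G
    $ZFS create -V "$size" "$pool/thick"       # default: refreservation is set
    $ZFS create -s -V "$size" "$pool/sparse"   # -s: sparse, no refreservation
    $ZFS get -o name,property,value volsize,refreservation \
        "$pool/thick" "$pool/sparse"
}

# Usage (needs root and an existing pool):
#   compare_zvols target 1G
```

On a real pool, the thick volume should show a refreservation of roughly its volsize (possibly a bit more for metadata), while the sparse one should show none.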