Rebalancing data on ZFS mirrors

One of the questions that comes up time and time again about ZFS is “how can I migrate my data to a pool on a few of my disks, then add the rest of the disks afterward?”

If you only want to get the data moved and don’t care about balance, you can copy the data over, then add the new disks and be done with it. But the data won’t be distributed evenly over the vdevs in your pool.

Don’t fret, though, it’s actually pretty easy to rebalance mirrors. In the following example, we’ll assume you’ve got four disks in a RAID array on an old machine, and two disks available to copy the data to in the short term.

Step one: create the new pool, copy data to it

First up, we create a simple temporary zpool with the two available disks.

zpool create -o ashift=12 temp mirror /dev/disk/by-id/disk0 /dev/disk/by-id/disk1

Simple. Now you’ve got a ZFS mirror named temp, and you can start copying your data to it.
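
For that first copy, plain old rsync is fine. A minimal sketch, assuming the old array is mounted at /mnt/oldraid and the new pool at /temp (both paths are placeholders for your own mountpoints):

rsync -avxHAX /mnt/oldraid/ /temp/

The -x keeps rsync on a single filesystem, and -HAX preserves hard links, ACLs, and extended attributes.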

Step two: scrub the pool

Do not skip this step!

zpool scrub temp

Once this is done, do a zpool status temp to make sure you don’t have any errors. Assuming you don’t, you’re ready to proceed.

Step three: break the mirror, create a new pool

zpool detach temp /dev/disk/by-id/disk1

Now your temp pool is down to a single-disk vdev, and you’ve freed up one of its original disks. You’ve also got a known-good copy of all your data on disk0, verified by the scrub in step two. So, destroy the old machine’s storage, freeing up its four disks for use.

zpool create -o ashift=12 tank /dev/disk/by-id/disk1 mirror /dev/disk/by-id/disk2 /dev/disk/by-id/disk3 mirror /dev/disk/by-id/disk4 /dev/disk/by-id/disk5

Now you’ve got your original temporary pool named temp, and a new permanent pool named tank. Pool “temp” is down to one single-disk vdev, and pool “tank” has one single-disk vdev, and two mirror vdevs.
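
If you want to eyeball the new layout before you start copying, a quick status check will show exactly that: one single-disk vdev followed by two mirror vdevs.

zpool status tank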

Step four: copy your data from temp to tank

Copy all your data one more time, from the single-disk pool “temp” to the new pool “tank.” You can use zfs replication for this, or just plain old cp or rsync. Your choice.
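
If you go the replication route, a minimal sketch looks like this (the snapshot name is arbitrary):

zfs snapshot -r temp@migrate
zfs send -R temp@migrate | zfs receive -uF tank

The -R replicates every dataset and snapshot under temp, -F lets the stream land on the freshly created (and still empty) root dataset of tank, and -u keeps the received filesystems unmounted so they don’t fight temp over mountpoints.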

Step five: scrub tank, destroy temp

Do not skip this step.

zpool scrub tank

Once this is done, do a zpool status tank to make sure you don’t have any errors. Assuming you don’t, now it’s time to destroy your temporary pool to free up its disk.

zpool destroy temp

Almost done!

Step six: attach the final disk from temp to the single-disk vdev in tank

zpool attach tank /dev/disk/by-id/disk1 /dev/disk/by-id/disk0
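
The attach kicks off a resilver of disk1’s data onto disk0; keep an eye on it the same way you watched the scrubs, and the vdev is fully redundant once it finishes.

zpool status tank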

That’s it—you now have all of your data imported to a six-disk pool of mirrors, and all of the data is evenly distributed (according to disk size, at least) across all vdevs, not all clumped up on the first one to be added.

You can obviously adjust this formula for larger (or smaller!) pools, and it doesn’t necessarily require importing from an older machine—you can use this basic technique to redistribute data across an existing pool of mirrors, too, if you add a new mirror vdev. 

The important concept here is the idea of breaking mirror vdevs using zpool detach, and creating mirror vdevs from single-disk vdevs using zpool attach.
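
If the argument order trips you up: detach takes the pool and the device to pull, while attach takes the device that’s already in the pool first and the newcomer second.

zpool detach <pool> <device>
zpool attach <pool> <existing-device> <new-device>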

 

Published by

Jim Salter

Mercenary sysadmin, open source advocate, and frotzer of the jim-jam.

7 thoughts on “Rebalancing data on ZFS mirrors”

  1. First: It seems to me like you didn’t need to make the temporary zpool a mirror, a single vdev=single disk should have worked just as well… Once you detached the second disk from the temporary pool, and attached it to the “actual” pool, any data redundancy was lost anyway (you only had the redundancy for the short time it took to copy the data from the original pool, but since you scrubbed before destroying the original pool, it wasn’t needed)… Or am I missing something?

    Second: Small typo near the first “/dev/….disk0” path.

    Third (off-topic): Is there any use case where it would make sense to NOT have an entire disk as a vdev? For example /dev/sda2 (instead of /dev/sda)?
    I could imagine using the first partition as a NTFS/FAT32/ext4/whatever backup area, with the second partition belonging to ZFS, but I don’t know whether it’s even possible to configure this under ZFS. (Assume no-one accesses both partitions at once, the performance hit would obviously be bad otherwise.)

  2. Jarek—we didn’t use zpool split because the whole point is creating a brand new pool. Zpool split is for maintaining a copy of the current pool on the detached disks. That doesn’t help when your whole goal is to write everything over in a balanced way.

  3. Niklas—we started out with a mirror vdev for the temp pool because there’s no good reason not to. It doesn’t slow anything down, and it gives us an extra chance to recover if a bit got flipped in-flight somewhere or one of the disks turns out to be dodgy.

    There’s no reason to risk having to start entirely from scratch due to a dodgy disk when you could have let ZFS instead capture and deal with the errors intelligently.

    Also, it’s handy to write it this way, even though we’re starting from conventional RAID here, because it means this is the same set of steps for rebalancing an EXISTING pool of mirrors:

    1. scrub old pool and check status
    2. break mirrors on old pool
    3. create new pool of single disk vdevs
    4. copy data from old pool to new
    5. scrub new pool and check status
    6. destroy old pool
    7. attach remaining disks from old pool to disks of new pool
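
    Assuming the existing pool is named old and built from two mirrors on diskA/diskB and diskC/diskD (all hypothetical names), the bare commands for those steps might look something like this, checking zpool status after each scrub:

    zpool scrub old
    zpool detach old /dev/disk/by-id/diskB
    zpool detach old /dev/disk/by-id/diskD
    zpool create -o ashift=12 new /dev/disk/by-id/diskB /dev/disk/by-id/diskD
    (copy the data from old to new)
    zpool scrub new
    zpool destroy old
    zpool attach new /dev/disk/by-id/diskB /dev/disk/by-id/diskA
    zpool attach new /dev/disk/by-id/diskD /dev/disk/by-id/diskC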

  4. > Is there any use case where it would make sense to NOT have an entire disk as a vdev? For example /dev/sda2 (instead of /dev/sda)? I could imagine using the first partition as a NTFS/FAT32/ext4/whatever

    Very limited use cases, but yes. Right now it’s frequently not feasible or reliable enough to do ZFS on root on Linux, so you can do a disk layout as follows:

    partition 0 – 1G UEFI
    partition 1 – 60G mdraid1
    partition 2 – (remaining space) ZFS

    You set up an mdraid1 mirror on the first two disks (or on all disks), format it ext4 for / (preferably with the news usage type, to give you more inodes available), then install your system. After it’s installed, do your ZFS setup using the last partition on each disk.
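
    A rough sketch of that layout on a hypothetical pair of disks (device names, pool name, and sizes are all placeholders, and note that Linux numbers the partitions 1 through 3 rather than 0 through 2):

    sgdisk -n1:0:+1G -t1:EF00 -n2:0:+60G -t2:FD00 -n3:0:0 -t3:BF01 /dev/sda
    sgdisk -n1:0:+1G -t1:EF00 -n2:0:+60G -t2:FD00 -n3:0:0 -t3:BF01 /dev/sdb
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkfs.ext4 -T news /dev/md0
    zpool create -o ashift=12 tank mirror /dev/sda3 /dev/sdb3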

    I don’t recommend the “backup area” type idea you mentioned, but it certainly makes sense to reserve system partitions as outlined above, and in fact that’s my typical production build.

  5. Cool guide you have. Many people may not realize how permanent ZFS is supposed to be. It’s best to set it up right the first time. I prefer it though, even over the more flexible file systems due to its complexity, configurability and reliability. BTRFS is trying to best it but I still, even now, do not trust it.
