When a Terabyte is not a Terabyte
It seems like a stupid question, if you’re not an IT professional – and maybe even if you are – how much storage does it take to store 1TB of data? Unfortunately, it’s not a stupid question in the vein of “what weighs more, a pound of feathers or a pound of bricks”, and the answer isn’t “one terabyte” either. I’m going to try to break down all the various things that make the answer harder – and unhappier – in easy steps. Not everybody will need all of these things, so I’ll try to lay it out in a reasonably likely order from “affects everybody” to “only affects mission-critical business data with real RTO and RPO defined”.
Counting the Costs
Simple Local Storage
Computer TB vs Manufacturer TB
To your computer, and to all computers since the dawn of computing, a KB is actually a “kibibyte”, a megabyte a “mebibyte”, and so forth – they’re powers of two, not of ten. So 1 KiB = 2^10 = 1024. That’s an extra 24 bytes from a proper Kilobyte, which is 10^3 = 1000. No big deal, right? Well, the difference squares itself with each hop up from KB to MB to GB to TB, and gets that much more significant. Storage manufacturers prefer – and also have, since the dawn of time – to measure in those proper power-of-ten units, since that means they get to put bigger numbers on a device of a given actual size and thus try to trick you into thinking it’s somehow better.
At the Terabyte/Tebibyte level, you’re talking about the difference between 2^40 and 10^12. So 1 TiB, as your computer measures data, is 1.0995 TB as the rat bastards who sell hard drives measure storage. Let’s just go ahead and round that up to a nice easy 1.1.
TL;DR: multiply times 1.1 to account for vendor units.
Working Free Space
Remember those sliding number puzzles you had as a kid, where the digits 1-8 were embedded in a 9-square grid, and you were supposed to slide them around one at a time until you got them in order? Without the “9” missing, you wouldn’t be able to slide them. That’s a pretty decent rough analogy of how storage generally works, for all sorts of general reasons. If you don’t have any free space, you can’t move the tiles around and actually get anything done. For our sliding number puzzles when we were kids, that was 8/9 of the available storage occupied. A better rule of thumb for us is 8/10, or 80%. Once your disk(s) are 80% full, you should consider them full, and you should immediately be either deleting things or upgrading. If they hit 90% full, you should consider your own personal pants to be on actual fire, and react with an appropriate amount of immediacy to remedy that.
TL;DR: multiply times 1.25 to account for working free space.
You’re probably not really planning on just storing one chunk of data you have right now and never changing it. You’re almost certainly talking about curating an ever-growing collection of data that changes and accumulates as time goes on. Most people and businesses should plan on their data storage needs to double about every five years – that’s pretty conservative; it can easily get worse than that. Still, five years is also a pretty decent – and very conservative, not aggressive – hardware refresh cycle. So let’s say we want to plan for our storage needs to be fulfilled by what we buy now, until we need new everything anyway. That means doubling everything so you don’t have to upgrade for another few years.
TL;DR: multiply times 2.0 to account for data growth over the next few years.
What, you weren’t planning on not backing your stuff up, were you? At a bare minimum, you’re going to need as much storage for backup as you did for production – most likely, you’ll need considerably more. We’ll be super super generous here and assume all you need is enough space for one single full backup – which usually only applies if you also have redundancy and very heavy-duty “oops recovery” and maybe hotspares as well. But if you don’t have all those things… this really isn’t enough. Really.
TL;DR: multiply times 2.0 to account for one full backup, as disaster recovery.
Redundancy, Hotspares, and Snapshots
Snapshots / “Oops Recovery” Schemes
You want to have a way to fix it pretty much immediately if you accidentally break a document. What this scheme looks like may differ depending on the sophistication of the system you’re working on. At best, you’re talking something like ZFS snapshots. In the middle of the road, Windows’ Volume Shadow Copy service (what powers the “Previous Versions” tab in Windows Explorer). At worst, the Recycle Bin. (And that’s really not good enough and you should figure out a way to do better.) What these things all have in common is that they offer a limiting factor to how badly you can screw yourself with the stroke of a key – you can “undo” whatever it is you broke to a relatively recent version that wasn’t broken in just a few clicks.
Different “oops recovery” schemes have different levels of efficiency, and different amounts of point-in-time granularity. My own ZFS-based systems maintain 30 hourly snapshots, 30 daily snapshots, and 3 monthly snapshots. I generally plan for snapshot space to take up about 33% as much space as my production storage, and that’s not a bad rule of thumb across the board, even if you can’t cram as many of your own schemes level of “oops points” in the same amount of space.
TL;DR: multiply times 1.3 to account for snapshots, VSS, or other “oops recovery”.
Redundancy – in the form of mirrored drives, striped RAID arrays, and so forth – is not a backup! However, it is a very, very useful thing to help you avoid the downtime monster, and in the case of more advanced storage schema like ZFS, to avoid corruption and bitrot. If you’re using 1:1 redundancy – RAID1, RAID10, ZFS mirrors, or btrfs-RAID1 distributed redundancy – this means you need two of every drive. If you’re using two blocks of parity in each eight block stripe (think RAID6 or ZFS RAIDZ2 with eight drives in each vdev), you’re going to be looking at 75% theoretical efficiency that comes out to more like 70% actual efficiency after stripe overhead. I’m just going to go ahead and say “let’s calculate using the more pessimistic number”. So, double everything to account for redundancy.
TL;DR: multiply times 2.0 to account for redundant storage scheme.
This is probably going to be the least common item on the list, but the vast majority of my clients have opted for it at this point. A hotspare server is ready to take over for the production server at a moment’s notice, without an actual “restore the backup” type procedure. With Sanoid, this most frequently means hourly replication from production to hotspare, with the ability to spin up the replicated VMs – both storage and hypervisor – directly on the hotspare server. The hotspare is thus promoted to being production, and what was the production server can be repaired with reduced time pressure and restored into service as a hotspare itself.
If you have a hotspare – and if, say, ten or more people’s payroll and productivity is dependent on your systems being up and running, you probably should – that’s another full redundancy to add to the bill.
TL;DR: bump your “backup” allowance up from x2.0 to x3.0 if you also use hotspare hardware.
The Butcher’s Bill
If you have, and account for, everything we went through above, to store 1 “terabyte” of data you’ll need:
1 “terabyte” (really a tebibyte) of data
x 1.1 TiB per TB
x 1.25 for working free space
x 2.0 for planned growth over the next few years
x 3.0 for disaster recovery + hotspare systems
x 1.3 for snapshot or other “oops recovery”
x 2.0 for redundancy
21.45 TB of actual storage hardware.
That can’t be right! You’re insane!
Alright, let’s break that down somewhat differently, then. Keep in mind that we’re talking about three separate computer systems in the above example, each with its own storage (production, hotspare, and disaster recovery). Now let’s instead assume that we’re talking about using drives of a given size, and see what that breaks down to in terms of actual usable storage on them.
Let’s forget about the hotspare and the disaster recovery boxes, so we’re looking at the purely local level now. Then let’s toss out the redundancy, since we’re only talking about one individual drive. That leaves us with 1TB / 1.1 TiB per TB / 1.25 working TiB per stored TiB / 1.3 TiB of prod+snapshots for every TiB of prod = 0.559 TiB of usable capacity per 1TB drive. Factor in planned growth by cutting that in half, and that means you shouldn’t be planning to start out storing more than 0.28 TiB of data on 1TB of storage.
TL;DR: If you have 280GiB of existing data, you need 1TB of local capacity.
That probably sounds more reasonable in terms of your “gut feel”, right? You have 280GiB of data, so you buy a 1TB disk, and that’ll give you some breathing room for a few years? Maybe you think it feels a bit aggressive (it isn’t), but it should at least be within the ballpark of how you’re used to thinking and feeling.
Now multiply by 2 for storage redundancy (mirrored disks), and by 3 for site/server redundancy (production, hotspare, and DR) and you’re at six 1TB disks total, to store 280GiB of data. 6/.28 = 21.43, and we’re right back where we started from, less a couple of rounding errors: we need to provision 21.45 TB for every 1TiB of data we’ve got right now.
8:1 rule of thumb
Based on the same calculations and with a healthy dose of rounding, we come up with another really handy, useful, memorable rule of thumb: when buying, you need eight times as much raw storage in production as the amount of data you have now.
So if you’ve got 1TiB of data, buy servers with 8TB of disks – whether it’s two 4TB disks in a single mirror, or four 2TB disks in two mirrors, or whatever, your rule of thumb is 8:1. Per system, so if you maintain hotspare and DR systems, you’ll need to do that twice more – but it’s still 8:1 in raw storage per machine.