Unstructured Data Tiering: The Step That Cuts Storage Cost


"Move the cold stuff to a cheaper tier" reads like the most routine line item on the storage roadmap. It sounds like housekeeping, the kind of slide everyone nods through on the way to the interesting problems. That is exactly why it bites. The real question hiding under that bland phrase isn't where the bytes live; it's whether you actually know which bytes are cold, and what happens to your recovery posture when you guess wrong. The cheapest petabyte I ever bought cost us a four-hour outage, and the lifecycle rule that moved it was technically flawless.

Years ago, on an e-commerce platform I helped run, someone wrote a rule to sweep "old" objects to a colder class overnight. The rule was correct. The classification under it was not. A chunk of the data it moved was still being read by a nightly reconciliation job nobody had documented. The job started timing out against archive-latency objects, backed up the queue, and took payments reporting down with it. We didn't have a storage problem that night. We had a data-value problem, and we'd encoded our ignorance of it into an automated policy.

That memory is why I read Datadobi CRO Michael Jack's argument, in Chris Mellor's recent Blocks & Files piece, with both agreement and a raised eyebrow. Jack's thesis is sharp and correct: enterprises cannot keep buying capacity. Storage is a budget slice, and you cannot let that slice grow "like a tumour" until it starves everything else an organization exists to do. NAND flash prices have run up hard in 2026 under what the trade calls "Memflation," so the reflex of bolting on another five petabytes now lands a much bigger invoice than it did two years ago. None of that is in dispute.

What I want to push on is the implied happy ending: classify, tier, and the savings (and the safety) follow. They don't follow automatically. Tiering is leverage, and leverage cuts both ways. Done on bad metadata, it does exactly what it did to us at 2 a.m. It converts a quiet cost problem into a loud availability and recovery problem. Here is where I think the analysis is right, where it's optimistic, and the order of operations that keeps a tiering project from becoming an incident report.

The "expense problem, not a storage problem" framing is the most useful idea here

Jack's reframe is the part worth keeping. As he relays it, Jensen Huang's line that 90 percent of enterprise data is unstructured is usually quoted as a scale problem. Jack reads it as an economics problem instead: we don't have an unstructured data storage problem, we have an unstructured data expense problem. That is the right lens. The fix isn't a bigger reservoir. It's matching the cost of holding each byte to what that byte is worth.

The reservoir-and-rain analogy in the source captures the intuition and then, to the author's credit, breaks it on purpose: water is fungible and data is not. A drop of rainwater equals any other drop. A byte in a closed cleaning contract does not equal a byte in an active product launch. That non-fungibility is the entire game. It is also why "freeze the old rain into a digital north pole" is harder than it sounds, because deciding which rain is old is a content and access question. A date on a folder will not answer it.

In S3-compatible object storage terms, this is just lifecycle policy and storage classes doing their job: hot data on the performant tier, dormant data transitioned to cold, with the transition keyed to real access patterns. The mechanism is mature. The hard part was never the buckets. It's the truth you feed the rules.
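As a rough illustration of "the truth you feed the rules," the sketch below builds a lifecycle configuration only for prefixes whose measured reads are at or below a threshold. The output uses the dictionary shape that S3-style lifecycle APIs commonly accept (for example, the LifecycleConfiguration argument of boto3's put_bucket_lifecycle_configuration). The prefixes, read counts, threshold, 90-day age, and the GLACIER class name are all illustrative assumptions, and storage class names vary by provider. Standard lifecycle transitions are age-based, so the access measurement here decides which prefixes get a rule at all.

```python
from typing import Dict, List

# Hypothetical measured reads per prefix over the last 90 days (illustrative numbers).
READS_LAST_90_DAYS: Dict[str, int] = {
    "projects/active-launch/": 48_210,
    "contracts/closed-2019/": 0,
    "reports/reconciliation/": 92,  # old by folder date, still read by a batch job
}


def build_lifecycle_rules(
    reads: Dict[str, int],
    max_reads_for_cold: int = 0,
    transition_after_days: int = 90,
    storage_class: str = "GLACIER",  # assumption: class names differ per provider
) -> Dict[str, List[dict]]:
    """Emit transition rules only for prefixes whose measured reads are at or below the threshold."""
    rules = []
    for prefix, read_count in sorted(reads.items()):
        if read_count > max_reads_for_cold:
            continue  # still being read: leave it on the hot tier
        rules.append(
            {
                "ID": "cold-" + prefix.strip("/").replace("/", "-"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_after_days, "StorageClass": storage_class}
                ],
            }
        )
    return {"Rules": rules}


lifecycle_config = build_lifecycle_rules(READS_LAST_90_DAYS)
```

With these sample numbers only the closed-contracts prefix gets a transition rule. The reconciliation reports stay hot even though a date-based sweep would have moved them.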

Where I part ways: tiering is sold as cost savings, but cold data is a ransomware decision

Here is my disagreement with the optimistic reading. The source frames every extra petabyte as a cyber-threat-surface increase, which is true, and then treats tiering as the cure. Move dormant data to cheap object storage, shrink the attack surface, done. In my operational experience, moving data to a cold tier without deciding its protection model doesn't shrink your risk. It relocates it, and often hides it.

Cold, rarely-touched data is the most attractive ransomware target you own. Nobody is watching it. Restore tests rarely cover it. And if your cold tier is just the same data at a lower price, an attacker who reaches your control plane encrypts or deletes it alongside everything else. The lifecycle rule that saved you money did nothing for your blast radius.

The lever that actually changes this is immutability. The tier alone does nothing for it. Object Lock with a retention policy (write-once-read-many) on the cold tier means the archived copy cannot be altered or deleted for its retention window, by anyone, including a compromised admin account. That is the difference between a cheap copy and a recovery copy. So my position is blunt: a tiering plan that doesn't specify an immutability and recovery model for the cold tier is a cost optimization with a security label it didn't earn. Tier for cost. Lock for survival. Treat them as two separate decisions you make on purpose.
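To make "lock for survival" concrete, here is a minimal sketch that computes a retention setting for an archived object. The dictionary uses the Mode and RetainUntilDate fields that S3 Object Lock retention calls take (for instance, the Retention argument of boto3's put_object_retention). The seven-year window and the archive date are hypothetical. The sketch assumes the bucket was created with Object Lock enabled and that the provider's compliance mode behaves like AWS S3's, where retention cannot be shortened or removed until it expires.

```python
from datetime import datetime, timedelta, timezone


def object_lock_retention(retention_days: int, now: datetime) -> dict:
    """Build a write-once-read-many retention setting for one archived object version."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        # Assumption: COMPLIANCE is the mode that not even an admin can lift early.
        "Mode": "COMPLIANCE",
        "RetainUntilDate": now + timedelta(days=retention_days),
    }


# Hypothetical archive date and a seven-year hold expressed in days.
archived_at = datetime(2026, 1, 1, tzinfo=timezone.utc)
retention = object_lock_retention(retention_days=7 * 365, now=archived_at)
```

The point of computing this separately from the lifecycle rule is the one made above: the transition and the lock are two decisions, and a plan should be able to show both.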

The failure modes nobody puts in the slide deck

Three things go wrong with automated tiering, and I've run into all three on live systems.

Recall latency surprises live workloads. The source notes critics worry about latency on data recalls, then waves it off with non-disruptive migration claims. Migration being non-disruptive is a different thing from access being non-disruptive afterward. A "dormant" object that turns out to feed a quarterly job will, on the day that job runs, hit cold-tier retrieval latency and possibly retrieval fees. That is precisely the class of failure that took down our payments reporting.

Egress and per-operation costs quietly eat the savings. The headline win is per-GB price. The bill you actually get includes retrieval requests and egress when that cold data moves or is read across regions or out to a cloud. I've watched "70% cheaper" storage post a net-flat invoice because the access pattern was wrong for the cold class. Model the egress before you celebrate the per-GB number.
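The arithmetic behind a net-flat invoice is short enough to sketch. Every price below is a hypothetical placeholder, not any provider's rate card, and the model ignores per-request fees and minimum storage durations for simplicity.

```python
def monthly_cost(
    stored_gb: float,
    price_per_gb: float,
    recalled_gb: float = 0.0,
    retrieval_per_gb: float = 0.0,
    egress_per_gb: float = 0.0,
) -> float:
    """Storage charge plus the cost of reading recalled_gb back during the month."""
    return stored_gb * price_per_gb + recalled_gb * (retrieval_per_gb + egress_per_gb)


# Hypothetical figures for illustration only.
STORED_GB = 500_000   # 500 TB classified as dormant
HOT_PRICE = 0.020     # $/GB-month on the hot tier
COLD_PRICE = 0.006    # $/GB-month on the cold tier (70% cheaper per GB)
RETRIEVAL = 0.030     # $/GB retrieved from the cold tier
EGRESS = 0.090        # $/GB moved across regions or out to a cloud

hot = monthly_cost(STORED_GB, HOT_PRICE)
cold_if_dormant = monthly_cost(STORED_GB, COLD_PRICE)
# A batch job that rereads 12% of the data every month changes the picture.
cold_if_read = monthly_cost(
    STORED_GB,
    COLD_PRICE,
    recalled_gb=60_000,
    retrieval_per_gb=RETRIEVAL,
    egress_per_gb=EGRESS,
)
```

With these placeholder numbers the hot tier costs about 10,000 a month, the cold tier about 3,000 if the data really is dormant, and about 10,200 once 12% of it is recalled monthly. The per-GB discount is real; the saving is not.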

Aggressive policies break things that assume local, low-latency access. Legacy apps that mmap a file or expect millisecond reads do not care that your tiering was elegant. Validate every policy against a shadow run before you let it delete or transition anything automatically. I won't enable auto-deletion on a policy that hasn't survived a dry run against real access logs. That is a hard rule on my teams.

The tooling described in the source is real and capable: Datadobi's StorageMAP uses a metadata scanning engine to profile access patterns and can convert SMB and NFS files into S3 objects, deploying as a Linux VM rather than another appliance, with documented migrations in the hundreds of terabytes. That's a legitimate, vendor-neutral way to discover what you have. It tells you what's cold. It does not decide your retention or recovery posture for you. That call stays with you.

How I qualify a data class before a single rule goes live

Before any lifecycle rule ships, I make the protection decision explicit so the cheapest tier never wins by default. The way I do that is to walk each data class through a small grid and refuse to tier anything until its row is filled out with real answers. The grid below is the one I actually use.

| Data class | Access pattern | Tier | Protection model |
| --- | --- | --- | --- |
| Active transactional / project | Read-write, daily | Hot object or block | Versioning + replication |
| Compliance / legal hold | Rare reads, must survive | Cold object | Object Lock (WORM), retention = hold period |
| Backup / DR copies | Write often, read on disaster | Cold object, separate account | Object Lock + isolated credentials |
| Genuinely dormant, low-value | Near-zero access | Cheapest archive | Lifecycle to delete after policy date |

The second and third rows are where most cost-only plans fail. They put compliance and DR data on a cheap tier and call it done, with no immutability and no credential isolation, which means a single compromised key reaches the one copy that was supposed to save you.
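One way to enforce "refuse to tier anything until its row is filled out" is to treat each grid row as data and check it before any rule is generated. This is a sketch under assumptions: the DataClassRow fields, the must_survive and is_recovery_copy flags, and the wording of the gap messages are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataClassRow:
    name: str
    access_pattern: str
    tier: str
    protection_model: str
    must_survive: bool = False          # compliance, legal hold, backup, DR
    is_recovery_copy: bool = False      # backup / DR copies specifically
    object_lock: bool = False
    isolated_credentials: bool = False


def blocking_gaps(row: DataClassRow) -> List[str]:
    """Return the reasons this row may not be tiered yet; an empty list means go."""
    gaps = [
        f"{field_name} is blank"
        for field_name in ("access_pattern", "tier", "protection_model")
        if not getattr(row, field_name).strip()
    ]
    if row.must_survive and not row.object_lock:
        gaps.append("must-survive data without Object Lock")
    if row.is_recovery_copy and not row.isolated_credentials:
        gaps.append("recovery copy shares production credentials")
    return gaps


# A cost-only plan: DR data on a cheap tier with no protection decision made.
cost_only_plan = DataClassRow(
    name="Backup / DR copies",
    access_pattern="Write often, read on disaster",
    tier="Cold object",
    protection_model="",
    must_survive=True,
    is_recovery_copy=True,
)

# The same row once the protection model has been decided on purpose.
qualified = DataClassRow(
    name="Backup / DR copies",
    access_pattern="Write often, read on disaster",
    tier="Cold object, separate account",
    protection_model="Object Lock + isolated credentials",
    must_survive=True,
    is_recovery_copy=True,
    object_lock=True,
    isolated_credentials=True,
)
```

The cost-only row comes back with three blocking gaps and the qualified row with none, which is the gate: no lifecycle rule gets written for a row that still has gaps.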

Filling out that grid is only half the work. The other half is proving the grid is true before automation acts on it, and I'd rather argue that proof in plain sentences than reduce it to a wishlist. Classification has to be driven by measured access rate, never by file age alone, because an object untouched for months can still feed a quarterly process. With the rate in hand, I go find the jobs that actually read each candidate dataset: the documented ones, and the undocumented batch and reporting jobs that are the usual culprits behind a "dormant" misclassification.
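A minimal sketch of access-driven classification, assuming access logs can be reduced to (dataset, reader, timestamp) events. The AccessEvent shape, the dataset and job names, and the 120-day window (chosen here so that one quarterly cycle fits inside it) are assumptions for illustration, not a prescribed method.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Set


class AccessEvent(NamedTuple):
    dataset: str
    reader: str          # job or service name, as recorded in the access log
    timestamp: datetime


def access_profile(
    events: Iterable[AccessEvent], now: datetime, window_days: int = 120
) -> Dict[str, dict]:
    """Per dataset read inside the window: reads per 30 days and who did the reading."""
    cutoff = now - timedelta(days=window_days)
    counts: Dict[str, int] = defaultdict(int)
    readers: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event.timestamp < cutoff:
            continue  # outside the measurement window
        counts[event.dataset] += 1
        readers[event.dataset].add(event.reader)
    return {
        dataset: {
            "reads_per_30_days": counts[dataset] * 30 / window_days,
            "readers": sorted(readers[dataset]),
        }
        for dataset in counts
    }


def cold_candidates(all_datasets: Iterable[str], profile: Dict[str, dict]) -> List[str]:
    """Only datasets with no reads at all in the window are candidates for tiering."""
    return sorted(d for d in all_datasets if d not in profile)


# Hypothetical log: an old ledger that a quarterly job still reads.
now = datetime(2026, 6, 1)
events = [
    AccessEvent("ledger-2023", "quarterly-close", datetime(2026, 4, 2)),
    AccessEvent("scans-2018", "migration-tool", datetime(2025, 1, 10)),
]
profile = access_profile(events, now)
cold = cold_candidates(["ledger-2023", "scans-2018"], profile)
```

Here ledger-2023 is years old by date but shows a reader named quarterly-close inside the window, so it stays off the candidate list. The readers field is the useful part: it names the job to go and talk to.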

Only then does the policy get to run in shadow mode against real access logs, and it has to come back with zero unexpected cold hits before I trust it. Any cold-tier object holding compliance or DR data has to carry Object Lock with an explicit retention window, and the DR and backup copies have to live under isolated credentials rather than the same admin scope as production.
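A shadow run can be as simple as replaying the access log against the set of keys the policy would have moved, without moving anything. The sketch below assumes log entries reduce to (object_key, reader) pairs and that some readers, such as a restore drill, are expected to touch cold data. All keys and reader names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def shadow_run(
    would_transition: Set[str],
    access_log: Iterable[Tuple[str, str]],
    expected_readers: Set[str],
) -> Dict[str, List[str]]:
    """Return {key: [readers]} for every unexpected read that would have hit the cold tier."""
    hits: Dict[str, Set[str]] = {}
    for key, reader in access_log:
        if key in would_transition and reader not in expected_readers:
            hits.setdefault(key, set()).add(reader)
    return {key: sorted(readers) for key, readers in sorted(hits.items())}


def safe_to_enable(unexpected_hits: Dict[str, List[str]]) -> bool:
    """The policy only goes live when the shadow run reports zero unexpected cold hits."""
    return not unexpected_hits


# Keys the candidate policy would transition (nothing is actually moved).
would_transition = {"orders/2021/01.parquet", "scans/2018/a.tif"}
access_log = [
    ("orders/2021/01.parquet", "nightly-reconciliation"),
    ("orders/2026/05.parquet", "checkout-api"),
    ("scans/2018/a.tif", "restore-drill"),
]
unexpected = shadow_run(would_transition, access_log, expected_readers={"restore-drill"})
ready = safe_to_enable(unexpected)
```

In this sample the run flags one object read by nightly-reconciliation, so ready is False and the policy stays in shadow mode until that reader is accounted for.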

The last two checks are the ones people skip. I model retrieval and egress cost for the realistic recall pattern before I quote anyone a saving, and I restore from the cold tier at least once so the archive is a proven backup and not a hopeful one. Miss any of those steps and tiering stops saving money; it just defers an outage to whichever day reality disagrees with your metadata.
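For the restore check, one hedged way to turn "a proven backup" into something testable is to record a checksum per object at archive time and compare it against what the restore returns. The manifest format and object keys below are assumptions; in practice the restored bytes would come from an actual cold-tier recall, not an in-memory dictionary.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_restore(manifest: Dict[str, str], restored: Dict[str, bytes]) -> List[str]:
    """Return keys that are missing or whose restored bytes differ from the archived checksum."""
    failures = []
    for key, expected_digest in sorted(manifest.items()):
        data = restored.get(key)
        if data is None or sha256_hex(data) != expected_digest:
            failures.append(key)
    return failures


# Hypothetical objects; the manifest is written when the data is archived.
archived = {
    "dr/db-dump.sql": b"CREATE TABLE orders (id INT);",
    "dr/config.yaml": b"replicas: 3\n",
}
manifest = {key: sha256_hex(data) for key, data in archived.items()}

# Simulated restore in which one object came back altered.
restored = {
    "dr/db-dump.sql": b"CREATE TABLE orders (id INT);",
    "dr/config.yaml": b"replicas: 1\n",
}
failures = verify_restore(manifest, restored)
```

An empty failures list after a real recall is the evidence that the archive restores cleanly. Here the altered config file is reported, which is exactly the kind of result you want to see in a drill and not during an incident.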

On the vendor paradigm shift: Jack is right, and it cuts toward open formats

Jack's broader call is for storage suppliers to become data-lifecycle partners rather than capacity sellers, and he reaches for the VMware-and-Jevons-paradox analogy: vendors feared virtualization would kill server sales, and instead efficiency expanded the whole market. I think that read is correct, and the Manchester University example in the source, a 10 PB PowerScale purchase avoided by moving stale data to public cloud, is a clean illustration that classification beats reflexive capacity buys.

I'd add the part that matters most to anyone implementing this: the leverage only holds if you avoid trading hardware lock-in for software lock-in. The whole benefit of moving dormant data to object storage evaporates if your "lifecycle layer" can only address one proprietary back end. Insist on the standard S3 API for the cold tier so the data, and the lifecycle rules around it, stay portable. A vendor-neutral software layer over a standard API is what makes Jack's paradigm shift real instead of a relabeled procurement cycle.

About

I'm Alex Kumar, a Senior Platform Engineer and Infrastructure Architect at Rabata.io. We run S3-compatible object storage for teams who got tired of surprise egress bills and eight-tier pricing tables. Most of my career has been spent on the on-call side of storage: Kubernetes persistent volumes, backup and DR, and the cost work that comes from staring at a bill nobody can fully explain.

I've cut an infrastructure spend in half through tiering and lifecycle policy, and I've also caused an outage with a lifecycle policy, which is the more instructive of the two. My bias, earned the hard way, is that reliability enables velocity, and that no storage decision is finished until you've written down what happens when it fails.

Conclusion

The bottom line is simple. The Blocks & Files argument is right where it counts: continually buying capacity is a losing financial strategy, and the reframe from "storage problem" to "expense problem" is the correct mental model. The piece overreaches only in its implied conclusion that tiering is the answer. Tiering is an instrument. It rewards accurate, access-driven classification and an explicit protection model, and it punishes guesses with latency, egress charges, and a cold tier that's a ransomware liability instead of a recovery asset.

Before you sign off on any tiering plan, go restore one dataset from the proposed cold tier and read the bill that recall generates. If the archive comes back clean and the number still looks like a saving, you have a real saving. If not, you have a deferred incident waiting for the quarter your metadata turns out to be wrong.

Frequently Asked Questions

Does moving data to a cold tier always cut storage cost?

Only if access patterns match the tier. Per-GB price drops, but cold tiers add retrieval and egress charges. If "dormant" data turns out to be read by batch jobs or pulled across regions, those fees can erase the savings. Model the realistic recall pattern before committing, not just the per-GB number.

Does tiering cold data reduce ransomware risk?

Not by itself. Cold data is an attractive target because nobody watches it, and a compromised control plane reaches it like everything else. What reduces the risk is immutability: Object Lock with a retention window makes the cold copy unalterable even by an admin account. Tier for cost; lock for survival. They are separate decisions.

How do I decide which data is actually cold?

Age is a weak proxy. Drive classification by measured access rate, then reconcile it against the jobs that actually read the data, including undocumented batch and reporting jobs. An object untouched for months can still feed a quarterly process. The cheapest reliable answer is access logs, not folder dates.

What is the most common way automated tiering fails?

A "dormant" dataset that secretly feeds a live job. When the policy transitions it to cold, the next run hits archive-latency reads and times out, often cascading into dependent systems. The fix is a shadow run against real access logs that shows zero unexpected cold hits before you ever enable auto-transition or deletion.

How do I avoid vendor lock-in when tiering to object storage?

Keep the cold tier on the standard S3 API so data and lifecycle rules stay portable across providers, and use a vendor-neutral software layer to drive classification rather than a single-back-end tool. The point of moving off proprietary capacity is portability; you lose it the moment your lifecycle engine only speaks one vendor's dialect.