Cloud Storage Egress Fees Decide Your AI Data Bill, Not Per-GB Price

It was a Tuesday standup when the finance lead pulled up the storage line and asked why a migration we had called the cheap option was now the most expensive thing in the cloud budget. The team I supported had moved a training-data lake to the provider with the lowest per-GB storage rate on the comparison sheet. Three weeks later the bill had tripled. Storage itself was a rounding error. The damage was egress: every epoch of model training pulled the same hot shards out across the network boundary, and every pull metered.

That same dynamic is hiding in Google's latest Data Cloud update, and almost nobody framing the announcement is saying it out loud. Google now lists Cloud Storage standard at $0.020/GB, narrowly under Amazon S3's $0.023/GB. Read only that line and GCS looks like the cheaper home for the petabytes of "dark data" that its new agentic tooling wants to feed into Gemini and BigQuery. Google says this dark data, the contracts and images and emails most companies never touch, makes up roughly 90% of enterprise data.

But GCS egress runs $0.12/GB, about 33% above the $0.09/GB that S3 charges on its first egress tier, and at around 10TB of monthly egress the total cost of GCS and S3 converges to the same ~$1,060–$1,100. The per-GB storage number is a decoy. For any AI/ML workload that actually reads its data, egress is the bill.

I run S3-compatible object storage for a living, so I have a stake here, and I want it on the table from the start. My argument is not "pick a cheaper provider." It is that the unit you are shown, price per GB stored, is the wrong unit to decide on, and that the storage-resilience and DR posture you build on top of that decision matters more than the sticker price either way.

The per-GB number is engineered to mislead you

Cloud storage pricing is quoted in the dimension that flatters the seller. Storage-at-rest is cheap to provision and easy to advertise, so that is the headline. The costs that actually move with an AI workload sit in the footnotes: egress when you train, request charges when you list and GET millions of small objects, cross-region replication when you build for resilience.

The Google Data Cloud announcement is a clean case study because the misdirection is visible in Google's own numbers. The $0.020/GB storage rate is real and competitive. The $0.12/GB egress rate is also real, and it is the one that decides whether the migration pays off. The 10TB crossover point matters because 10TB of monthly egress is not exotic for a serious training pipeline. One mid-sized dataset re-read across a few hundred epochs gets you there without trying.

The discipline that fixes this is boring and it works: model the read pattern before you model the store. Estimate egress-GB-per-month from how the data is actually consumed, multiply by the egress rate, and only then add storage-at-rest.

Cost line | What it tracks | When it dominates
Storage per GB/month | Bytes at rest | Cold archives, backups you rarely read
Egress per GB | Bytes leaving the provider boundary | Training reads, multi-cloud, CDN origin pulls
Request charges (PUT/GET/LIST) | Operation count | Many small objects, frequent listings
Cross-region replication | Bytes copied for resilience | DR copies, geo-redundant buckets

A provider that wins row one and loses row two is the wrong choice for a workload that lives in row two. That is the whole calculation, and it is the calculation the per-GB headline is designed to skip.
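To make that calculation concrete, here is a minimal Python sketch of rows one and two only. The storage rates and the GCS egress rate are the ones quoted in this post; the S3 egress rate of $0.09/GB is an assumed list price you should check against the current price sheet, and the function names, the 50 GB dataset, and the 100 TB stored volume are hypothetical. It ignores request charges, replication, and tiered egress pricing, so treat it as a starting point rather than a quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-GB list prices in USD. Illustrative, not authoritative."""
    storage_per_gb: float  # monthly storage-at-rest
    egress_per_gb: float   # bytes leaving the provider boundary


# Storage rates and GCS egress are the figures quoted in this post.
# The S3 egress figure is an ASSUMPTION; verify it before relying on it.
GCS = Rates(storage_per_gb=0.020, egress_per_gb=0.12)
S3 = Rates(storage_per_gb=0.023, egress_per_gb=0.09)


def monthly_egress_gb(dataset_gb: float, epochs_per_month: float,
                      fraction_crossing_boundary: float = 1.0) -> float:
    """Egress implied by a training read pattern (full re-read per epoch)."""
    return dataset_gb * epochs_per_month * fraction_crossing_boundary


def monthly_cost(rates: Rates, stored_gb: float, egress_gb: float) -> float:
    """Egress first, then storage-at-rest. Other cost lines are ignored."""
    return egress_gb * rates.egress_per_gb + stored_gb * rates.storage_per_gb


def break_even_egress_gb(a: Rates, b: Rates, stored_gb: float) -> float:
    """Monthly egress at which providers a and b cost the same."""
    egress_gap = a.egress_per_gb - b.egress_per_gb
    if egress_gap == 0:
        raise ValueError("identical egress rates: there is no crossover")
    return stored_gb * (b.storage_per_gb - a.storage_per_gb) / egress_gap


if __name__ == "__main__":
    stored_gb = 100_000  # hypothetical: 100 TB at rest
    egress_gb = monthly_egress_gb(dataset_gb=50, epochs_per_month=200)
    print(f"projected egress: {egress_gb:,.0f} GB/month")
    print(f"GCS: ${monthly_cost(GCS, stored_gb, egress_gb):,.2f}")
    print(f"S3:  ${monthly_cost(S3, stored_gb, egress_gb):,.2f}")
    print(f"break-even egress: "
          f"{break_even_egress_gb(GCS, S3, stored_gb):,.0f} GB/month")
```

Under these assumed rates the crossover scales with how much you store: the lower storage rate only wins while monthly egress stays below roughly a tenth of the stored volume. Plug in your own read pattern before drawing any conclusion from it.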

Egress lock-in is the real vendor lock-in

The phrase "vendor lock-in" usually conjures proprietary APIs. In object storage that fear is mostly solved. The S3 API is the de-facto standard, and a genuinely compatible store is a drop-in where you change an endpoint and a credential rather than rewriting anything. The lock-in that survives is economic, and egress is the bolt on the door.

Here is the mechanism. The cost to *leave* a provider is the cost to egress everything you have stored there. As your data grows, the exit toll grows with it, and at some volume the toll alone is a multi-month budget line. You did not sign a contract that holds you in place. The pricing model did that for you. This is why I treat egress rate as a strategic number rather than an operational one: it sets the price of every future architectural decision.
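A rough way to see the exit toll grow is to price it month by month. The sketch below assumes a flat per-GB egress rate (real price sheets are usually tiered) and linear data growth; the function names and every input value are hypothetical.

```python
from typing import Optional


def exit_toll_usd(stored_gb: float, egress_per_gb: float) -> float:
    """Cost to pull everything out, assuming a flat per-GB egress rate."""
    return stored_gb * egress_per_gb


def months_until_toll_exceeds(budget_usd: float, start_gb: float,
                              growth_gb_per_month: float,
                              egress_per_gb: float,
                              max_months: int = 600) -> Optional[int]:
    """First month (0 = today) in which the exit toll exceeds budget_usd.

    Returns None if that never happens within max_months.
    """
    for month in range(max_months + 1):
        stored_gb = start_gb + growth_gb_per_month * month
        if exit_toll_usd(stored_gb, egress_per_gb) > budget_usd:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical lake: 100 TB today, growing 25 TB a month, $0.12/GB egress.
    print(exit_toll_usd(100_000, 0.12))
    print(months_until_toll_exceeds(31_000, 100_000, 25_000, 0.12))
```

The useful output is the second number: how long you have before leaving costs more than whatever you would be willing to spend on a migration.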

This is also where I will state my company's position in the open. At Rabata.io we price S3-compatible storage at roughly $10/TB against AWS S3's ~$23/TB and do not meter egress the way the hyperscalers do, which is the entire reason the model exists: to make leaving cheap so that staying is a choice, not a sentence. Take that as the interested testimony it is. The point holds whoever you buy from. When you evaluate any provider, price the exit and not only the entry. A store you cannot afford to leave has quietly stopped competing for your business.

A storage decision is a disaster-recovery decision

The part of these cloud-data announcements that gets the least attention is the part that wakes you at 3 AM. Every choice above is also a resilience choice, and the cost lines double as failure modes.

Cross-region replication is the obvious one: geo-redundancy is how you survive a region outage, and it is billed as inter-region egress every time bytes copy. Teams that optimized purely for the lowest storage tier routinely discover their DR copy is the line item they cut first, which means their recovery point quietly degrades to "whatever was in the primary region when it died." The cheaper the storage looked, the more tempting it was to skip the replica. That is how a cost optimization becomes a data-loss incident.

There is a second failure mode specific to the AI era: ransomware against training data. An attacker who encrypts or deletes your dataset is betting you have no clean, immutable copy. The defense is object-level immutability, versioning plus a write-once retention lock (S3 calls it Object Lock; compatible stores implement the same semantics), held on a copy with an independent blast radius from your primary. All of it is standard S3-API functionality, which is the argument for staying inside the S3 contract no matter whose backend serves it.
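As a sketch of what that looks like through the S3 API, the helper below enables versioning and a default retention rule on a recovery bucket. It takes any client exposing the boto3-style `put_bucket_versioning` and `put_object_lock_configuration` calls; the helper name, bucket name, and retention period are hypothetical. Support also varies by backend: some stores only allow Object Lock on buckets created with it enabled, so check your provider's documentation first.

```python
from typing import Any, Protocol


class S3Like(Protocol):
    """The two boto3-style S3 client calls this sketch relies on."""
    def put_bucket_versioning(self, **kwargs: Any) -> Any: ...
    def put_object_lock_configuration(self, **kwargs: Any) -> Any: ...


def lock_recovery_bucket(client: S3Like, bucket: str, retention_days: int,
                         mode: str = "COMPLIANCE") -> None:
    """Turn on versioning plus a default write-once retention rule."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unsupported Object Lock mode: {mode!r}")
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")

    # Object Lock depends on versioning, so enable that first.
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # New object versions inherit this retention unless overridden.
    client.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration={
            "ObjectLockEnabled": "Enabled",
            "Rule": {
                "DefaultRetention": {"Mode": mode, "Days": retention_days}
            },
        },
    )


# Usage with boto3 (not imported here; endpoint and bucket are placeholders):
#   client = boto3.client("s3", endpoint_url="https://s3.example.invalid")
#   lock_recovery_bucket(client, "training-data-dr", retention_days=30)
```

Run it against the recovery copy, not the primary, and with credentials the training pipeline does not hold. That separation is what gives the copy its independent blast radius.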

How do you know your storage-cost comparison is actually sound? Start by confirming there is a second copy in a separate failure domain, and that you have priced its replication egress in dollars rather than assumed it away. Then check that immutability, meaning versioning together with a retention lock, is enabled on the recovery copy so ransomware cannot reach through to it.

The criterion most teams skip is the restore test itself: a backup job reporting success proves nothing, so the bytes have to be pulled back and checksummed against the source before you trust them. The last question is whether your RPO survives the cost cuts, because if trimming the bill means dropping the replica, your real recovery point is worse than the runbook claims. A storage tier that wins the cost sheet and fails those questions is the expensive option. You just pay for it later, all at once.
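The restore test can be as plain as the following sketch: after pulling the recovery copy back to local disk with whatever tool you already use, hash both trees and list anything missing or different. The function names and directory layout are hypothetical, and it assumes you still have a trusted source tree to compare against.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    failures = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            failures.append(rel.as_posix())
    return failures


if __name__ == "__main__":
    import sys
    bad = verify_restore(Path(sys.argv[1]), Path(sys.argv[2]))
    for rel in bad:
        print(f"MISMATCH {rel}")
    sys.exit(1 if bad else 0)
```

An empty list is the only result that counts as a passed restore test. Note that this only checks files present in the source; extra files in the restored tree are not reported.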

About

I am Alex Kumar. My title is Senior Platform Engineer and Infrastructure Architect at Rabata.io, which runs S3-compatible object storage out of GDPR-compliant facilities in the EU and US; I do the job from Toronto on a fully remote setup. The lessons I trust most arrived as cleanup. A lifecycle policy I once misconfigured queued a few million objects for deletion in one shot, and a different night taught me that a disaster-recovery copy nobody has actually restored is a hope, not a backup.

The technical center of my week is Kubernetes persistent storage, the design of backup and recovery flows, and keeping infrastructure spend legible as it grows. A tiering project I led once pulled 52% off a bill simply because terabytes had been left sitting on premium fast storage with no reason to be there. I carry the CKA and CKS, and I would rather ship proven, unglamorous tooling than chase whatever launched last week. Yes, I am partial to S3-compatible storage that does not punish you on egress; read this with that interest in mind, because the egress arithmetic above stands no matter where you store your bytes.

Conclusion

Google's Data Cloud refresh is a genuinely strong set of tools for turning dormant enterprise data into something agents can use, and the $0.020/GB storage rate is a fair price for bytes at rest. The mistake is letting that number decide the architecture. For any AI/ML workload that reads its data, which is all of them, egress at $0.12/GB is the figure that determines the bill, the migration math flips at roughly 10TB of monthly egress, and the exit toll on accumulated data is the lock-in that actually constrains you.

Decide in the right unit. Estimate egress-GB-per-month from your real read pattern, price the cost to leave alongside the cost to store, and confirm your disaster-recovery copy survives the budget rather than being its first casualty.

Bottom line: before you sign off on any storage migration, put three numbers on the same page: storage-at-rest, projected monthly egress at the provider's per-GB rate, and the dollar cost of pulling everything back out. If the egress and exit figures stay invisible, the quoted per-GB price is telling you almost nothing about what this workload will actually cost. The one figure worth watching after you migrate is your monthly egress spend against forecast; that line, not the storage rate, is where a cheap-looking decision turns expensive.

Frequently Asked Questions

Not usually. Per-GB storage covers bytes at rest, but training and analytics workloads are dominated by egress and request charges. Google Cloud Storage is $0.020/GB to store yet $0.12/GB to egress, and at about 10TB of monthly egress its total cost matches Amazon S3 despite the lower storage rate. Price your read pattern first.

Because GCS trades a slightly lower storage rate for a higher egress rate: $0.12/GB, about 33% above S3's $0.09/GB. As egress volume climbs, the higher per-GB egress charge eats the at-rest discount, and at roughly 10TB per month the totals converge near $1,060–$1,100. Below that GCS can win; above it the storage discount stops mattering.

Egress lock-in is the cost of moving your data out of a provider. It grows with the data you accumulate, so a large dataset can be effectively trapped by its own exit toll. Avoid it by treating egress rate as a strategic number, preferring stores that do not meter egress punitively, and pricing the cost to leave before you commit, not after.

The opposite, when the compatibility is genuine. A true S3-compatible store is a drop-in replacement where you change an endpoint and credentials rather than rewriting application code, and existing tools like Terraform, boto3, rclone, and the AWS CLI keep working. The S3 API being a de-facto standard is what makes leaving any single provider cheap.

Add the resilience copy to the math explicitly. Cross-region replication for geo-redundancy is billed as inter-region egress, so a DR copy has an ongoing cost the lowest storage tier tempts you to cut. Keep an immutable, versioned copy with a retention lock in a separate failure domain, price its replication egress honestly, and test the restore end to end before you trust it.