Diskless Kafka Hits Sub-10ms Latency: Read the Throughput Cap First
Picture the day someone has to decide whether to move a real-time Kafka cluster off broker-local disks and onto a diskless, S3-backed design. The pitch on the table is a 94% cost cut. The question that should dominate the room is which line on the current bill that cut actually attacks. It is rarely the line people argue about. It is not the brokers and it is not S3 storage. It is the cross-zone replication traffic: the data your cluster ships between availability zones so that every record exists in three of them, just to stay durable. Teams routinely shave compute for a quarter only to find the network line untouched and still growing. That is the line Diskless Kafka goes after.
On 12 June 2026 AWS published a write-up showing AutoMQ, an S3-backed Diskless Kafka, hitting average write latency of 5.98 ms while claiming 94% lower cost than a traditional cluster sized to the same latency target. The mechanism is a Write-Ahead Log on Amazon FSx for NetApp ONTAP sitting in front of S3. The headline numbers are real and they are good. I work on storage TCO for a living, though, and the figure that decides whether this design fits your workload never makes the headline. It is the FSx throughput cap, and it behaves nothing like the elastic compute layer above it.
My argument is narrow and practical. Diskless Kafka on FSx for ONTAP genuinely removes the cross-AZ tax that dominates traditional Kafka bills, but it trades an elastic problem for a capacity-planning one. You stop tuning replication and start sizing a fixed storage tier you cannot resize at runtime. Get that sizing wrong and the latency win evaporates into backpressure. Read the cap before you read the savings.
Why the cross-AZ bill, not the storage bill, is the real driver
A traditional Kafka cluster in three availability zones pays cross-AZ data transfer from three directions at once: triplicate replication between brokers, cross-AZ consumer reads, and cross-AZ producer writes. The source's own cost model makes the proportion brutally clear. In its comparison sized to a P99 write latency under 10 ms, the traditional Apache Kafka column carries $175,200 a month in cross-AZ payload fees alone, against a total AutoMQ bill of $18,345. The cross-AZ line by itself is nearly ten times the entire diskless cluster.
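The source gives the dollar figures but not the per-byte mechanics behind them. The back-of-envelope model below is my own sketch of why that line grows so large, assuming partition leaders are spread evenly across three zones, clients land in a random zone, consumers fetch from the leader, and each replica sits in a different zone. The function names are illustrative, and the model ignores compression and protocol overhead.

```python
# Back-of-envelope model of cross-AZ payload bytes per byte produced.
# Assumptions (mine, not from the AWS write-up): leaders spread evenly
# across zones, clients in a random zone, consumers fetch from the leader,
# and replication_factor <= zones so every follower lives in another zone.


def cross_az_multiplier(zones: int = 3, replication_factor: int = 3,
                        read_fanout: float = 1.0) -> float:
    """Cross-AZ bytes transferred for every byte a producer writes."""
    # Probability that a client and the partition leader are in different zones.
    p_remote = (zones - 1) / zones
    producer = p_remote                   # producer -> leader
    replication = replication_factor - 1  # leader -> followers in other zones
    consumers = read_fanout * p_remote    # leader -> consumers
    return producer + replication + consumers


def diskless_multiplier() -> float:
    """RF=1, zone-routed writes, rack-aware reads: no payload crosses zones."""
    return 0.0


traditional = cross_az_multiplier(read_fanout=1.0)
print(f"traditional: {traditional:.2f}x, diskless: {diskless_multiplier():.2f}x")
# -> traditional: 3.33x, diskless: 0.00x

# Ratio quoted in the source's cost model: cross-AZ fees vs. the whole AutoMQ bill.
print(f"{175_200 / 18_345:.2f}")  # -> 9.55
```

With the benchmark's 4:1 read-to-write ratio, `cross_az_multiplier(read_fanout=4.0)` rises to about 5.33x, because every extra consumer byte pays the same two-in-three chance of crossing a zone.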
That is the leverage point. AutoMQ runs the Kafka layer at a replication factor of 1 and leans on the durability already baked into S3 and FSx, so brokers stop copying data to each other. Combined with rack-aware reads and a zone-routing interceptor that proxies a producer's write to a broker in its own zone, only control metadata crosses zone boundaries. The data plane goes near-silent across AZs.
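To make the routing idea concrete, here is a minimal sketch of zone-local broker selection. `Broker` and `route_write` are hypothetical names, not AutoMQ's interceptor API, and the zone IDs are illustrative. The sketch only shows the choice the interceptor makes (prefer a broker in the producer's zone, fall back elsewhere); it says nothing about how the proxied write is handled afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    broker_id: int
    zone: str  # availability-zone ID, e.g. "use1-az1" (illustrative)


def route_write(client_zone: str, brokers: list[Broker]) -> Broker:
    """Hypothetical zone-aware routing: prefer a broker in the producer's zone."""
    if not brokers:
        raise ValueError("no brokers available")
    local = [b for b in brokers if b.zone == client_zone]
    # Fall back to any broker only when the producer's zone has none;
    # that fallback is the case where payload bytes would cross zones.
    candidates = local or brokers
    # Deterministic pick keeps the sketch simple; a real router would balance load.
    return min(candidates, key=lambda b: b.broker_id)


brokers = [Broker(1, "use1-az1"), Broker(2, "use1-az2"), Broker(3, "use1-az4")]
print(route_write("use1-az2", brokers).broker_id)  # 2
print(route_write("use1-az6", brokers).broker_id)  # 1 (no local broker)
```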
AutoMQ's own materials put the broader infrastructure reduction at 80% to 90% versus disk-based architectures. One point on provenance, because it matters for how much you lean on it: that 80-90% figure comes from AutoMQ, not from AWS. Treat it as a vendor number and verify it against your bill.
What does not change is the protocol. AutoMQ keeps Kafka's network and compute layers intact and replaces only the storage layer's LogSegment with an S3-native engine, preserving full API compatibility with existing clients, Kafka Connect, and ksqlDB. The swap happens at the storage engine; everything your clients talk to stays where it was.
The latency tradeoff that diskless was supposed to lose
Object storage was never built for hot writes. S3 I/O latency runs to tens of milliseconds because S3 is optimised for cheap, durable, write-once-read-many storage rather than for sub-millisecond acknowledgement. Write every record synchronously to S3 and you lose the one capability latency-sensitive workloads cannot give up.
The obvious patch is a fast buffer in the middle. The obvious buffer is EBS, and that is where teams get caught. S3 is a regional service; EBS volumes are zonal. An EBS-backed WAL cannot be shared across zones, so it forces every broker into a single availability zone to reach the same log. You have just rebuilt the single-AZ fragility that cloud-native design exists to avoid. The historical choice on AWS was exactly this binary: EBS gives you low latency but cross-AZ replication cost and zonal exposure; S3 directly gives you no cross-AZ cost but latency too high for real-time work. Cheap and slow on one side, fast and fragile on the other.
FSx for ONTAP is the third option because it is a Multi-AZ shared file system that delivers sub-millisecond latency on its SSD tier and does not bill cross-AZ traffic. Because AutoMQ's WAL is a fixed-size circular buffer holding only the hot tail of the log, the FSx footprint stays small and fixed regardless of retention. Writes land on FSx first, then flush asynchronously to S3 in batched payloads that also cut S3 API call volume.
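The circular-buffer property is what keeps the FSx footprint fixed regardless of retention. Below is a minimal sketch of the accounting only, assuming byte-level offsets; `RingWal` and its methods are hypothetical names, not AutoMQ's implementation, which also persists record data, recovers from crashes, and batches uploads.

```python
class RingWal:
    """Fixed-size circular write-ahead log that tracks byte offsets only.

    Hypothetical model: it ignores record contents, durability, and crash
    recovery, and exists to show why the footprint never grows.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.head = 0  # total bytes ever appended
        self.tail = 0  # total bytes ever flushed to object storage

    @property
    def used(self) -> int:
        return self.head - self.tail

    def append(self, nbytes: int) -> bool:
        """Accept a write if it fits; False means the writer must wait (backpressure)."""
        if self.used + nbytes > self.capacity:
            return False
        self.head += nbytes
        return True

    def flush(self, max_bytes: int) -> int:
        """Release up to max_bytes of the oldest data (an async S3 upload in practice)."""
        drained = min(max_bytes, self.used)
        self.tail += drained
        return drained

    def position(self) -> int:
        """Physical write offset inside the fixed region; it wraps around."""
        return self.head % self.capacity


wal = RingWal(capacity=100)
print(wal.append(60))                # True
print(wal.append(50))                # False: 60 + 50 exceeds capacity
print(wal.flush(40))                 # 40
print(wal.append(50))                # True: 20 + 50 fits after the flush
print(wal.used, wal.position())      # 70 10
```

Retention lives in S3, so the only number that matters for the WAL is how much unflushed data it must hold during a burst, not how many days of history you keep.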
The benchmark backs the claim. In us-east-1 on 3x m7g.4xlarge brokers with FSx provisioned at 736 MBps, sustaining 300 MBps writes and 1.2 GiBps reads (a 4:1 read-to-write ratio), write latency averaged 5.98 ms (P99 12.87 ms) and end-to-end latency averaged 7.79 ms (P99 18.04 ms). That approaches local-disk Kafka while keeping S3 economics.
One caveat on those numbers: they come from a single vendor benchmark on one instance shape, one region, and one message size. I would not carry 5.98 ms into a capacity plan as a guarantee. I would treat it as the best case you reach when the WAL is correctly provisioned, not a figure that survives when it is not.
Where this design bites: the fixed throughput cap
Here is the part that gets glossed over. The compute layer is elastic; the storage layer is not. AutoMQ brokers scale in seconds because they are stateless. FSx for ONTAP throughput, by contrast, is a fixed instance specification you choose at deploy time and cannot resize on the fly. The available tiers map directly to Kafka write ceilings:
| FSx for ONTAP throughput tier | Max Kafka write throughput | Scaling behaviour |
|---|---|---|
| 384 MBps | 150 MiBps | Add instances, not resize |
| 768 MBps | 300 MiBps | Add instances, not resize |
| 1,536 MBps | 600 MiBps | Add instances, not resize |
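Mapping a measured peak onto that table is simple enough to script. This is a minimal sketch, assuming the tier figures above and a 20% headroom margin that is my own placeholder rather than a vendor recommendation; it also assumes partitions can be spread evenly when you add instances past the largest tier.

```python
import math

# Tier table from the article: FSx throughput (MBps) -> max Kafka writes (MiBps).
TIERS = [(384, 150), (768, 300), (1536, 600)]


def pick_tier(peak_write_mibps: float, headroom: float = 0.2) -> tuple[int, int]:
    """Return (fsx_mbps_per_instance, instance_count) for a peak write rate.

    headroom is an assumed safety margin over the measured peak.
    """
    required = peak_write_mibps * (1 + headroom)
    for fsx_mbps, kafka_mibps in TIERS:
        if required <= kafka_mibps:
            return fsx_mbps, 1
    # Past the largest tier you add instances rather than resize one,
    # assuming partitions can be balanced evenly across them.
    fsx_mbps, kafka_mibps = TIERS[-1]
    return fsx_mbps, math.ceil(required / kafka_mibps)


print(pick_tier(200))   # (768, 1)
print(pick_tier(1100))  # (1536, 3)
```

Feed it peak ingestion, not the average; an average-sized tier is exactly the one that backpressures during a spike.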
The consequence is a planning tension absent from broker-local Kafka. Under-provision the WAL and a traffic spike fills the circular buffer faster than the async flush drains it; new writes block and your sub-10 ms latency collapses into backpressure. Over-provision and you have locked capital into bandwidth you never use. Adding storage capacity rescues neither case, because throughput is tied to the instance class, not the gigabytes. To scale past a tier you add another FSx instance, which in turn shapes how you distribute partitions. None of this is hard, but all of it is upfront, and it is the opposite of the elasticity the compute layer promises.
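How long a spike can run before backpressure is just the free buffer divided by the net fill rate. The sketch below assumes you know the WAL buffer size and a sustained flush rate; the numbers in the example are hypothetical, not taken from the benchmark.

```python
import math


def seconds_until_backpressure(buffer_mib: float, used_mib: float,
                               ingest_mibps: float, flush_mibps: float) -> float:
    """Time until the WAL ring fills during a sustained spike.

    Returns math.inf when the async flush keeps pace with (or outruns) ingest.
    """
    net_fill = ingest_mibps - flush_mibps  # MiB/s accumulating in the ring
    if net_fill <= 0:
        return math.inf
    return (buffer_mib - used_mib) / net_fill


# Hypothetical: 9,000 MiB ring, empty, spike of 450 MiB/s against a 300 MiB/s flush.
print(seconds_until_backpressure(9000, 0, 450, 300))  # 60.0
```

The useful reading is the denominator: a spike only has to exceed the flush rate modestly, for long enough, to turn a latency tier into a queue.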
This is also where the spot-instance economics get interesting. Because durability lives in S3 rather than on local disk, brokers can run on spot capacity without risking data loss on termination, and the source prices an r6i.large spot instance at roughly a quarter of on-demand. The risk shifts from data loss to availability: a mass spot eviction with slow node replacement still hurts, so you want strong auto-scaling groups behind it. You are not removing risk; you are moving it from the storage layer to the scheduler, where it is easier to manage.
How to check the fit before you commit
Before recommending FSx-backed Diskless Kafka over a simpler EBS or S3 WAL, I run a short reconciliation. The table below is how I weigh it: what to check, the answer that points to FSx, and why each one swings the decision rather than just decorating it.
| What to check | The answer that earns FSx | Why it changes the call |
|---|---|---|
| Do you actually need Multi-AZ? | Cross-AZ durability is a hard requirement | For single-AZ deployments AWS recommends EBS WAL outright for best cost and performance; FSx earns its keep only across zones |
| Is your latency budget genuinely sub-10 ms? | Yes, the floor justifies the premium tier | If you tolerate tens of milliseconds end-to-end, an S3 WAL with no cross-AZ fee is the cheaper, simpler answer |
| Have you sized peak, not average, ingestion? | Peak mapped to a 150, 300, or 600 MiBps tier | The cap is fixed; the spike is not, so an average-sized tier backpressures under load |
| Can you tolerate manual storage scaling? | Yes, traffic is predictable enough to pre-provision | Storage tiers do not auto-scale; spiky, unpredictable traffic will fight this design |
| Is monitoring watching flush lag and buffer fill? | Yes, dashboards surface the WAL, not just CPU | The failure mode is the WAL filling because the async flush fell behind; miss it and the cap arrives as an outage |
If the last three rows give you pause, the lower-risk path is to start with an S3 WAL, measure your real peaks, and migrate to FSx once you know the number you are sizing for.
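For teams that prefer their checklists executable, here is the table above reduced to a rough decision function. It is my simplification, not AWS guidance: the 10 ms threshold and the boolean inputs are assumptions, and a real decision would also weigh cost figures this ignores.

```python
def recommend_wal(multi_az: bool, latency_budget_ms: float,
                  peak_known: bool, traffic_predictable: bool,
                  monitors_flush_lag: bool) -> str:
    """Rough encoding of the fit table (author's reading, not vendor guidance)."""
    if not multi_az:
        return "EBS WAL"  # single-AZ: AWS recommends EBS WAL outright
    if latency_budget_ms >= 10:
        return "S3 WAL"  # tens of ms tolerated: no cross-AZ fee, simpler
    if peak_known and traffic_predictable and monitors_flush_lag:
        return "FSx for ONTAP WAL"
    # Sub-10 ms and Multi-AZ, but sizing unknowns remain: measure first.
    return "S3 WAL first, migrate to FSx once peak is measured"


print(recommend_wal(True, 8, True, True, True))   # FSx for ONTAP WAL
print(recommend_wal(True, 8, True, False, True))  # S3 WAL first, ...
```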
About
I am Marcus Chen, a Cloud Solutions Architect and Developer Advocate at Rabata.io, working remotely out of Singapore. Day to day I live in S3-compatible object storage, Kubernetes persistent storage, and the data plumbing behind AI/ML workloads. I came to it from the operator's chair: solutions engineering at Wasabi, then DevOps at a Kubernetes-native startup, where I learned firsthand how a storage tier that cannot scale behaves when the traffic finally arrives.
My bias as an author is one I will state plainly. I distrust any architecture pitched on a single headline number, because the figure that lands on your invoice is almost never the one in the press release. Diskless Kafka is a genuinely good idea, and FSx for ONTAP is a clean way to keep the WAL fast and cross-AZ durable. My only ask is that you size the cap before you sign the bill.
Conclusion
Diskless Kafka on FSx for ONTAP does what it claims: it removes the cross-AZ replication tax that dominates multi-AZ Kafka bills and holds write latency under 10 ms by keeping the hot path on a shared, fast, cross-AZ file system while S3 carries cold storage. That combination is real and, for cross-AZ workloads that need single-digit-millisecond writes, it is hard to beat on cost.
The catch is that you trade an elastic problem for a fixed one. Compute scales in seconds; FSx throughput is a tier you choose once and resize by adding instances rather than by turning a dial. The architecture rewards teams with mature capacity planning and frustrates teams who treat storage like the elastic compute above it. Profile your peak ingestion first, pick the tier that clears it with headroom, then claim the savings.
If you take this design into production, the signal to watch is WAL flush lag against buffer fill. CPU and broker count will look healthy right up to the cap, so they are the wrong gauges here. Set an alert on the gap between how fast writes hit FSx and how fast the async flush drains them to S3. When that gap starts trending up under load, you are approaching the throughput cap, and that trend line gives you warning before latency breaks. Watch it before the spike, and the cap stays a planning number instead of an incident.
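One way to turn "the gap is trending up" into an alert is to fit a slope to the unflushed backlog and convert it into remaining runway. This is a minimal sketch, assuming you can sample backlog (bytes written to FSx but not yet flushed to S3) from your own metrics; the function names and the 600-second runway threshold are hypothetical.

```python
from statistics import fmean


def lag_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of WAL backlog over time, in MiB per second.

    samples: (timestamp_s, backlog_mib). A positive slope means writes
    are reaching FSx faster than the async flush drains them to S3.
    """
    ts = [t for t, _ in samples]
    ys = [y for _, y in samples]
    t_bar, y_bar = fmean(ts), fmean(ys)
    num = sum((t - t_bar) * (y - y_bar) for t, y in samples)
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den if den else 0.0


def should_alert(samples: list[tuple[float, float]], buffer_mib: float,
                 min_runway_s: float = 600.0) -> bool:
    """Alert when the backlog trend would fill the buffer within min_runway_s."""
    slope = lag_slope(samples)
    if slope <= 0:
        return False  # flush is keeping up or catching up
    runway_s = (buffer_mib - samples[-1][1]) / slope
    return runway_s < min_runway_s


rising = [(0, 100), (60, 160), (120, 220)]   # backlog grows 1 MiB/s
print(lag_slope(rising))                     # 1.0
print(should_alert(rising, buffer_mib=700))  # True: 480 s of runway left
```

Alerting on runway rather than on a raw backlog threshold means a slowly rising gap on a large buffer and a fast one on a small buffer page you at the same distance from the cap.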
Frequently Asked Questions
Does Diskless Kafka eliminate cross-AZ data transfer fees?
It removes the largest source of them. Because brokers run at a replication factor of 1 and rely on S3 and FSx for durability, they stop copying data to each other across zones, and a zone-routing interceptor keeps producer writes local. Only control metadata crosses AZ boundaries. In the source's cost model the cross-AZ payload line drops from $175,200 a month to $0.
Do I need FSx for ONTAP for every Diskless Kafka deployment?
No. For single-AZ deployments AWS recommends EBS WAL for the best cost and performance. For Multi-AZ deployments where you can tolerate higher latency, an S3 WAL avoids cross-AZ fees and is simpler. FSx for ONTAP is the right choice specifically when you need Multi-AZ durability and sub-10 ms write latency at the same time. Match the WAL tier to your actual latency budget, not the headline.
What happens if I under-provision FSx throughput?
Backpressure. The WAL is a fixed-size circular buffer, so if a traffic spike fills it faster than the async flush drains to S3, new writes block and your latency advantage collapses. Throughput is tied to the instance class, not the allocated capacity, so over-provisioning gigabytes does not help. Size against peak ingestion, map it to the 150, 300, or 600 MiBps tier, and instrument flush lag before you go live.
Can I run the brokers on spot instances?
You can, because durability lives in S3 rather than on local disk, so a terminated broker does not lose committed data the way a traditional Kafka broker would. The source prices an r6i.large spot instance at roughly a quarter of on-demand. The remaining risk is availability during a mass spot eviction, so back it with strong auto-scaling so node replacement keeps up. You are moving risk from data loss to scheduling, not erasing it.
How much should I trust the 94% cost reduction?
Treat the 94% as the result of one vendor benchmark sized to a specific latency target, not a promise. The cross-AZ elimination is structural and will likely show up on any Multi-AZ cluster, but the exact percentage depends on your retention, throughput, and how over-provisioned your current cluster is. The broader 80-90% infrastructure-reduction range is AutoMQ's own claim, not an AWS finding. Model it against your real bill before you commit.