Google Unveils Two New AI Chips, One for Training Models and One for Running Them

Google's TPU 8 Split Is a Market Structure Event, Not a Chip Release

At Google Cloud Next 2026, opening April 22, Google formally unveiled two distinct eighth-generation Tensor Processing Units: the TPU 8t for large-scale pre-training and the TPU 8i for inference, reasoning, and agent serving. The split confirms what supply-chain rumors had telegraphed for months, and it matters far more than a routine product launch.

Google published hard specs. TPU 8t carries 216 GB of HBM, 128 MB of SRAM, and 12.6 FP4 petaflops; it scales to 9,600-chip superpods via a 3D-torus fabric called Virgo, which delivers 47 petabits per second of non-blocking bisection bandwidth. TPU 8i carries 288 GB of HBM, 384 MB of SRAM, and 10.1 FP4 petaflops; it connects up to 1,152 chips via a new topology called Boardfly, which cuts network diameter in a 1,024-chip pod from 16 hops to 7. Google claims a 2.7× improvement in training performance-per-dollar for 8t over the prior Ironwood generation, an 80% improvement in inference performance-per-dollar for 8i over Ironwood, and 2× performance-per-watt for both.
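To put the per-chip figures in pod terms, here is a rough back-of-the-envelope sketch. It simply multiplies the published per-chip numbers by the published maximum pod sizes. The `ChipSpec` class and `pod_totals` helper are illustrative names, not anything Google ships, and the totals assume ideal linear scaling and decimal units, so they are peak upper bounds rather than delivered performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Per-chip figures as quoted in the announcement (hypothetical container)."""
    name: str
    hbm_gb: float      # HBM capacity per chip, GB
    sram_mb: float     # on-chip SRAM per chip, MB
    fp4_pflops: float  # peak FP4 petaflops per chip
    max_chips: int     # largest published pod size


def pod_totals(spec: ChipSpec) -> dict:
    """Peak aggregate figures for a full pod.

    Assumption: totals scale linearly with chip count, which ignores
    network, memory, and utilization losses. Units are decimal.
    """
    return {
        "hbm_tb": spec.hbm_gb * spec.max_chips / 1000,
        "sram_gb": spec.sram_mb * spec.max_chips / 1000,
        "fp4_eflops": spec.fp4_pflops * spec.max_chips / 1000,
    }


TPU_8T = ChipSpec("TPU 8t", hbm_gb=216, sram_mb=128, fp4_pflops=12.6, max_chips=9600)
TPU_8I = ChipSpec("TPU 8i", hbm_gb=288, sram_mb=384, fp4_pflops=10.1, max_chips=1152)

for spec in (TPU_8T, TPU_8I):
    totals = pod_totals(spec)
    print(
        f"{spec.name}: {totals['hbm_tb']:.1f} TB HBM, "
        f"{totals['sram_gb']:.1f} GB SRAM, "
        f"{totals['fp4_eflops']:.2f} FP4 exaflops peak"
    )
```

On these assumptions, a full 8t superpod works out to roughly 2 PB of HBM and about 121 peak FP4 exaflops, while a full 8i pod holds about 332 TB of HBM and roughly 442 GB of on-chip SRAM.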

TPU 8t Is a Defensive Move; TPU 8i Is the Strategic Bet

The market will instinctively focus on 8t because training launches feel frontier. That instinct is backwards.

TPU 8t, codenamed Sunfish, designed by Broadcom, and built on TSMC's 2nm process, exists because Google cannot afford to cede the frontier training race. It supports SparseCore for irregular memory access, native four-bit quantization, and 3D-torus networking that scales modestly beyond Ironwood's 9,216-chip limit. The 2.7× performance-per-dollar claim is generation-over-generation against Ironwood, not a neutral comparison against Nvidia's current Blackwell or forthcoming Rubin systems. Investors should note the benchmark baseline.
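A quick sketch of what the headline multiples imply for cost. It assumes that a performance-per-dollar gain of g means a fixed workload costs 1/g of what it did before, and it reads the "80% improvement" for 8i as a 1.8× multiple; both are interpretations, not Google's stated method. The function name `cost_per_unit_work` is hypothetical.

```python
def cost_per_unit_work(perf_per_dollar_gain: float) -> float:
    """Relative cost of a fixed workload after a perf-per-dollar gain.

    A gain of 2.7 means 2.7x the work per dollar, so the same work
    costs 1/2.7 of the baseline. The baseline here is Ironwood only.
    """
    if perf_per_dollar_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_dollar_gain


# Multiples relative to Ironwood, as quoted; 1.8 is our reading of "80% improvement".
claims = {
    "TPU 8t training vs Ironwood": 2.7,
    "TPU 8i inference vs Ironwood": 1.8,
}

for label, gain in claims.items():
    relative_cost = cost_per_unit_work(gain)
    print(f"{label}: {relative_cost:.0%} of prior cost, "
          f"{1 - relative_cost:.0%} cheaper")
```

Under that reading, the same training job on 8t costs about 37% of its Ironwood price, and the same inference load on 8i about 56%. Neither number says anything about cost relative to Blackwell or Rubin, because both ratios share Ironwood as the denominator.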

TPU 8i — codenamed Zebrafish, developed with MediaTek — is the more important chip because it reflects where cloud margins and customer retention will actually be decided: not in rare pretraining runs, but in the relentless, continuous economics of serving reasoning models, agents, and retrieval-heavy workloads under latency SLAs. Its architecture encodes three deliberate bets. First, 384 MB of on-chip SRAM — triple Ironwood's — attacks the KV-cache and data-movement bottleneck that stalls inference at scale. Second, the Collectives Acceleration Engine replaces prior sparse-core arrangements and reduces collective latency by 5×, targeting the synchronization overhead that grows severe in autoregressive decoding and chain-of-thought reasoning. Third, Boardfly's seven-hop topology directly addresses the all-to-all communication patterns of mixture-of-experts models, where tail latency, not peak throughput, determines real-world cost. The 8i is less a chip than a thesis: inference is becoming a distributed systems problem before it is a pure silicon problem.
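For a sense of where a 16-hop baseline could come from, the sketch below computes the worst-case hop count (network diameter) of a wraparound torus, which is the sum of floor(d/2) over its dimensions. The pod shapes are assumptions chosen only because they multiply to 1,024; Google has not stated the shape behind its 16-hop figure, and nothing here models Boardfly itself.

```python
from math import prod


def torus_diameter(dims):
    """Worst-case hops between any two nodes in a wraparound torus.

    With wraparound links, the farthest node along a ring of size d is
    floor(d / 2) hops away, and the dimensions add independently.
    """
    return sum(d // 2 for d in dims)


# Hypothetical 1,024-chip 3D torus shapes (assumptions, not published layouts).
CANDIDATE_SHAPES = [(8, 8, 16), (4, 16, 16), (4, 8, 32)]

for shape in CANDIDATE_SHAPES:
    assert prod(shape) == 1024  # every candidate must describe a 1,024-chip pod
    print(f"{shape}: diameter {torus_diameter(shape)} hops")
```

The most balanced of these shapes, 8×8×16, gives a diameter of 16 hops, which is consistent with the quoted baseline; the less balanced shapes are worse. Going from 16 hops to 7 shortens the worst-case path by more than half, which matters most for all-to-all traffic where the slowest path sets the pace.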

The Inference Era Has Changed the Competitive Map

Google is not inventing a new game. It is proving that AWS was right first.

Amazon's Trainium (training) and Inferentia (inference) product lines already embed the same economic logic. CEO Andy Jassy disclosed this month that Amazon's chip business, which includes Graviton, Trainium, and Nitro, runs at over $20 billion annualized and is growing at triple-digit rates. He also said that custom silicon could save AWS tens of billions in annual capex while delivering several hundred basis points of operating-margin advantage over relying on merchant GPUs for inference. Trainium3, now shipping, carries 144 GB of HBM3e and explicitly targets agentic and reasoning workloads.

When both Google and AWS independently arrive at the same architecture split, that is not marketing. It is market structure. The AI hardware market is segmenting: pre-training, post-training, test-time compute, and agentic inference are now distinct economic regimes requiring distinct memory hierarchies and network topologies.

What This Means for Nvidia, Broadcom, Marvell, and Alphabet

Nvidia is not the loser here, but it is no longer the only answer to the right question. Its structural strengths (CUDA, ecosystem, merchant-silicon portability, and developer familiarity) remain decisive for broad enterprise standardization and multi-cloud optionality. What narrows is the set of workloads where Nvidia can charge unquestioned scarcity rents: specifically, the high-volume, repetitive inference lanes where hyperscaler ASICs earn their economics at scale. Anthropic's commitment of up to one million TPUs to Google Cloud, while it also runs on AWS Trainium and Nvidia GPUs, is the clearest external signal: even frontier labs want hardware diversity, not a new monoculture.

Broadcom remains a net winner, but Google's concurrent talks with Marvell for additional inference silicon and a memory processing unit should be read as negotiating leverage over Broadcom, not as abandonment. Marvell is strategically relevant, but its revenue is not yet de-risked at current share prices. Alphabet's TPU program, long dismissed as internal infrastructure, is now a commercial wedge with anchor customers. That improves the odds that AI becomes a cloud-margin story for the stock, though at a market cap of roughly $4T, a single chip launch is not a clean rerating event.

The deepest truth from today's announcement is this: the industry has stopped debating whether inference economics matter and started building dedicated hardware to win them. That shift was always inevitable. It is now official.

Not investment advice.