Powering Proactive AI: Google’s Ironwood and the Agent Revolution

  • August 17, 2025
  • Technology

Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and a major step forward in the specialized AI hardware it builds to drive artificial intelligence. The new chip is designed to meet the intricate computational demands of Google’s flagship Gemini models, particularly the simulated reasoning tasks the company calls “thinking.”

Custom hardware paired with advanced AI models is the foundation of Google’s strategic approach to artificial intelligence. Ironwood is a core component of that strategy: it accelerates inference and enlarges the context windows of these advanced models. Google describes Ironwood as its most scalable and powerful TPU to date and sees it as essential to developing “agentic AI” capabilities, part of an “age of inference” in which AI performs actions on users’ behalf.

The Architecture and Performance of Ironwood

Ironwood delivers a notable improvement in throughput over previous TPU generations. Google plans to deploy the chips in expansive liquid-cooled clusters of up to 9,216 units, with an enhanced Inter-Chip Interconnect (ICI) providing direct chip-to-chip communication for rapid, efficient data transfer across the entire system.

Google’s robust infrastructure will support both its internal AI projects and external developers using Google Cloud services. Ironwood will be available in two configurations: a 256-chip server optimized for controlled environments, and the full 9,216-chip cluster built to manage the most challenging AI workloads.

A full Ironwood pod delivers extraordinary computational power: 42.5 exaflops for inference tasks. Each chip provides a peak throughput of 4,614 TFLOPS, which Google describes as substantial progress over earlier TPU generations. Per-chip memory now stands at 192 GB, a six-fold increase over the Trillium TPU, and memory bandwidth reaches 7.2 Tbps, a 4.5-fold improvement.
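The pod-level figure follows directly from the per-chip numbers. A quick back-of-the-envelope check (a sketch using only the values quoted above; the pod-level memory total is derived arithmetic, not an official Google figure):

```python
# Sanity-check the pod-level figures from the stated per-chip specs.
CHIPS_PER_POD = 9_216
PEAK_TFLOPS_PER_CHIP = 4_614   # peak throughput per chip, per Google
HBM_GB_PER_CHIP = 192          # per-chip memory, per Google

pod_exaflops = CHIPS_PER_POD * PEAK_TFLOPS_PER_CHIP / 1e6   # TFLOPS -> exaflops
pod_memory_tb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1_000     # GB -> TB (decimal)

print(f"Pod peak compute: {pod_exaflops:.1f} exaflops")  # ≈ 42.5, matching the claim
print(f"Pod memory (derived, not an official figure): {pod_memory_tb:,.0f} TB")
```

Multiplying 9,216 chips by 4,614 TFLOPS each yields about 42.5 exaflops, so Google’s pod-level claim is consistent with its per-chip specification.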

Understanding the Benchmarks

Direct comparisons among AI chips are difficult because benchmarking methodologies differ across vendors and models. Google benchmarks its latest TPU at FP8 precision, so the company’s claim that Ironwood “pods” are 24 times faster than the world’s top supercomputers deserves careful examination: some of those supercomputing systems lack native hardware support for FP8.

Notably, Google’s comparisons omit its own TPU v6 (Trillium), although the company says Ironwood delivers twice Trillium’s performance per watt. According to company representatives, Ironwood succeeds the TPU v5p, while Trillium followed the less powerful TPU v5e. Trillium reached a peak of roughly 918 TFLOPS at FP8 precision.
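Putting the two quoted peak figures side by side gives a rough sense of the raw per-chip gap (a sketch using only the article’s numbers; it compares peak throughput, not the performance-per-watt metric Google cites):

```python
# Rough per-chip comparison derived from the figures quoted above.
IRONWOOD_FP8_TFLOPS = 4_614
TRILLIUM_FP8_TFLOPS = 918   # approximate peak at FP8, per the article

ratio = IRONWOOD_FP8_TFLOPS / TRILLIUM_FP8_TFLOPS
print(f"Ironwood vs. Trillium, peak FP8 throughput: {ratio:.1f}x")  # ≈ 5.0x
```

On raw peak throughput the gap is about five-fold; the two-fold figure Google quotes is performance per watt, a different and more conservative metric.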

The Implications for the Future of AI

Despite the complexities inherent in benchmarking AI hardware, the underlying message is clear: Ironwood marks a major advance in Google’s AI infrastructure. Its improved speed and efficiency build on the foundation that allowed models such as Gemini 2.5 to advance rapidly on the previous TPU generation.

Google expects Ironwood’s improved inference performance to drive major AI advancements over the next year. The chip will be a vital component of the company’s “age of inference” vision, supplying the computational power needed to build complex models and enable genuine agentic capabilities, making AI increasingly proactive and central to our digital experiences.