August 17, 2025
Google has introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and a major step forward in its proprietary AI hardware. The chip is engineered specifically for the heavy computational demands of Google’s Gemini models, which perform simulated reasoning known as “thinking.”
Google’s AI strategy depends on pairing custom-built hardware with sophisticated models, and Ironwood is a key component: it speeds up inference and expands the context range of its most powerful models. Google positions Ironwood as its most powerful and scalable TPU to date, one that will enable advanced “agentic AI” capabilities and usher in an “age of inference” in which AI systems act on behalf of users.
The Architecture and Performance of Ironwood
Ironwood delivers substantial throughput improvements over earlier generations. Google plans to deploy the chips in massive liquid-cooled clusters of up to 9,216 units, linked by an enhanced Inter-Chip Interconnect (ICI) that lets the chips communicate directly, enabling fast and efficient data movement across the entire system.
This infrastructure will support both Google’s internal AI projects and developers on Google Cloud. Ironwood will be available in two configurations: a 256-chip server for smaller-scale environments and a 9,216-chip cluster for the most extreme AI demands.
A full Ironwood pod delivers 42.5 Exaflops of inference compute. According to Google’s specifications, each chip achieves a peak throughput of 4,614 TFLOPS, a major advance over earlier TPU generations. Per-chip memory has grown to 192 GB, six times the Trillium TPU’s capacity, and memory bandwidth reaches 7.2 TB/s, a 4.5x increase over the previous generation.
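The headline figures above are internally consistent, which a quick sanity check confirms. A minimal sketch (the constant names are illustrative, not any Google API; all numbers are the article's own):

```python
# Sanity-check of the quoted Ironwood specs.
CHIPS_PER_POD = 9_216        # full Ironwood cluster size
TFLOPS_PER_CHIP = 4_614      # peak per-chip throughput (FP8)
HBM_PER_CHIP_GB = 192        # per-chip memory capacity

# 1 Exaflop = 1e6 TFLOPS, so pod compute follows directly:
pod_exaflops = CHIPS_PER_POD * TFLOPS_PER_CHIP / 1e6
print(f"Pod compute: {pod_exaflops:.1f} Exaflops")  # 42.5, matching the article

# The "sixfold increase" over Trillium implies Trillium's capacity:
implied_trillium_gb = HBM_PER_CHIP_GB / 6
print(f"Implied Trillium HBM: {implied_trillium_gb:.0f} GB per chip")
```

Running this reproduces the 42.5-Exaflop pod figure and implies 32 GB of memory per Trillium chip.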
Understanding the Benchmarks
Directly comparing AI chips is difficult because vendors use different benchmarking methodologies, and Google quotes Ironwood’s performance at FP8 precision. The company claims an Ironwood “pod” outperforms the world’s most powerful supercomputers by 24 times, but this comparison warrants caution, since some supercomputing systems lack native FP8 hardware support.
Google did not include TPU v6 (Trillium) in its direct performance comparisons, though it reports that Ironwood delivers twice the performance per watt of TPU v6. According to a company representative, Ironwood is the successor to the TPU v5p, while Trillium descends from the less powerful TPU v5e. Trillium peaked at 918 TFLOPS at FP8 precision.
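Even without an official head-to-head, the published per-chip FP8 numbers imply a rough generational ratio. A minimal sketch using only the figures quoted above (the variable names are illustrative):

```python
# Implied per-chip FP8 comparison from the quoted figures.
IRONWOOD_FP8_TFLOPS = 4_614   # Ironwood peak per-chip throughput
TRILLIUM_FP8_TFLOPS = 918     # Trillium peak per-chip throughput

ratio = IRONWOOD_FP8_TFLOPS / TRILLIUM_FP8_TFLOPS
print(f"Ironwood vs Trillium, peak FP8 per chip: {ratio:.1f}x")  # ~5.0x
```

Note that this ~5x figure is raw peak throughput per chip; it is distinct from the 2x performance-per-watt claim, which accounts for power draw.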
The Implications for the Future of AI
Despite the complexities of benchmarking AI hardware, the underlying message is clear: Google’s AI infrastructure has advanced significantly with Ironwood. Its improved speed and efficiency build on the groundwork that enabled rapid progress in models such as Gemini 2.5, which runs on earlier TPU generations.
Google expects Ironwood’s improved inference performance and efficiency to drive substantial AI advances over the coming year. By supplying the compute needed for more complex models and true agentic capabilities, Ironwood is positioned as an essential element of Google’s “age of inference” vision, in which AI becomes an active, integral part of digital life.