Google Ironwood TPU Powers Advanced AI Inference

Ironwood’s Memory Muscle: Fueling Larger Context Windows

Benchmarking complexities aside, the underlying message is clear: Ironwood marks a significant advance in Google’s AI infrastructure. It builds on the foundation that allowed models like Gemini 2.5 to progress quickly while running on older TPUs.

Google expects Ironwood’s gains in inference speed and efficiency to underpin major AI advances over the next year. The chip is positioned as a critical component of Google’s “age of inference,” supplying the computational power that sophisticated models need and enabling the agentic AI capabilities that Google says will make AI systems more proactive and more capable.

Decoding the Numbers: Ironwood’s Performance Context

Comparing AI chips is complicated because vendors use different benchmarking methodologies. Google benchmarks Ironwood at FP8 precision. The company states that Ironwood “pods” are 24 times faster than comparable segments of the world’s top supercomputers, but that claim deserves caution because not all supercomputing systems support FP8 natively.

Google did not include its TPU v6 (Trillium) in its direct performance comparisons. It does say that Ironwood is twice as energy efficient as v6. According to Google, Ironwood succeeds the TPU v5p, while Trillium followed the TPU v5e. Trillium peaked at about 918 TFLOPS at FP8 precision.
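Since Google offers no direct Ironwood-to-Trillium throughput comparison, a rough one can be derived from the two peak figures quoted in this article. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and the result takes both vendor-stated peaks at face value rather than reflecting any measured workload.

```python
# Peak per-chip throughput as quoted in this article (vendor-stated, not measured).
IRONWOOD_PEAK_TFLOPS = 4_614  # per Ironwood chip, FP8
TRILLIUM_PEAK_TFLOPS = 918    # per Trillium chip, as quoted above


def peak_ratio(new_tflops: float, old_tflops: float) -> float:
    """Return how many times larger the new peak figure is than the old one."""
    if old_tflops <= 0:
        raise ValueError("old_tflops must be positive")
    return new_tflops / old_tflops


per_chip_ratio = peak_ratio(IRONWOOD_PEAK_TFLOPS, TRILLIUM_PEAK_TFLOPS)
print(f"Ironwood vs. Trillium peak, per chip: {per_chip_ratio:.2f}x")
# prints: Ironwood vs. Trillium peak, per chip: 5.03x
```

A ratio of peak numbers says nothing about sustained performance on a real model, which is one reason the benchmarking caveats above matter.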

Inside Ironwood: A Performance Powerhouse

Ironwood delivers significantly more processing capability than previous Google TPUs. Google plans to deploy it in large liquid-cooled clusters of up to 9,216 chips. A new Inter-Chip Interconnect (ICI) links these chips, enabling high-speed data transfer across the whole system.

This processing capability will serve both Google’s internal AI research and development teams and external developers who use Google Cloud services. Ironwood will be available in two configurations: a 256-chip server for moderate AI loads and a full-scale 9,216-chip cluster for extremely demanding AI tasks.

A fully operational Ironwood pod delivers 42.5 exaflops of compute for inference tasks. Google states that each Ironwood chip delivers a peak throughput of 4,614 TFLOPS, a major advance over earlier TPU generations. Each chip also carries 192GB of memory, six times the capacity of the Trillium TPU, and memory bandwidth has grown 4.5 times to 7.2 TBps.
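The pod-level figure follows directly from the per-chip numbers, and the stated ratios also imply Trillium’s memory capacity and bandwidth. The Python sketch below reproduces that arithmetic from the values quoted in this article; the names are illustrative, the inputs are vendor-stated peaks, and decimal units (1 exaflop = 1,000,000 TFLOPS, 1 PB = 1,000,000 GB) are assumed.

```python
# Figures quoted in the article (vendor-stated peaks, not measured results).
CHIPS_PER_POD = 9_216
PEAK_TFLOPS_PER_CHIP = 4_614        # per-chip peak throughput
MEMORY_GB_PER_CHIP = 192
MEMORY_RATIO_VS_TRILLIUM = 6        # "six times" Trillium's capacity
BANDWIDTH_PER_CHIP = 7.2            # per-chip memory bandwidth, units as quoted
BANDWIDTH_RATIO_VS_TRILLIUM = 4.5   # "4.5 times" Trillium's bandwidth

# Assumed decimal unit conversions.
TFLOPS_PER_EXAFLOP = 1_000_000
GB_PER_PB = 1_000_000

# Aggregate peak compute and memory across a full pod.
pod_peak = CHIPS_PER_POD * PEAK_TFLOPS_PER_CHIP / TFLOPS_PER_EXAFLOP
pod_memory_pb = CHIPS_PER_POD * MEMORY_GB_PER_CHIP / GB_PER_PB

# Trillium figures implied by the stated ratios.
implied_trillium_memory_gb = MEMORY_GB_PER_CHIP / MEMORY_RATIO_VS_TRILLIUM
implied_trillium_bandwidth = BANDWIDTH_PER_CHIP / BANDWIDTH_RATIO_VS_TRILLIUM

print(f"Pod peak compute: {pod_peak:.1f} exaflops")
print(f"Pod total memory: {pod_memory_pb:.2f} PB")
print(f"Implied Trillium memory: {implied_trillium_memory_gb:.0f} GB per chip")
print(f"Implied Trillium bandwidth: {implied_trillium_bandwidth:.1f} (same units)")
```

Multiplying 9,216 chips by 4,614 TFLOPS gives roughly 42.5 exaflops, matching Google’s headline number, so the pod figure is a straight sum of per-chip peaks rather than a measured system result.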

Google has unveiled its latest custom silicon: Ironwood, the seventh generation of its Tensor Processing Unit (TPU) architecture. The new chip targets the demands of Google’s most powerful Gemini models, in particular what the company describes as “thinking,” meaning simulated reasoning.

The company maintains that its sophisticated AI models perform as well as they do because of the specialized infrastructure beneath them. Ironwood is central to that strategy, delivering major improvements in inference speed and allowing these advanced models to process larger amounts of contextual data. Google presents Ironwood as its most scalable and powerful TPU yet, one that will let future AI systems autonomously collect data and produce outputs that support users in real time. That is the core idea behind the “agentic AI” Google expects to define what it calls the “age of inference.”