Nvidia Halves AI Training Time with New Blackwell Chips

Nvidia’s latest generation of chips has significantly reduced the resources required to train large artificial intelligence models, according to new benchmark data released by MLCommons, a nonprofit that tracks AI system performance. The findings indicate a major leap in chip efficiency, positioning Nvidia further ahead in the AI hardware race.

MLCommons’ newest results assessed how chips from major manufacturers—including Nvidia and AMD—perform in training scenarios, where large datasets are processed to teach AI systems. While much investor attention has shifted to AI inference—the part where AI models generate responses—training remains crucial and resource-intensive, especially for the largest models with trillions of parameters.

Blackwell chips outperform Hopper in Llama 3.1 training

In a test involving Meta’s open-source Llama 3.1 405B model, whose 405 billion parameters make it a demanding benchmark for training performance, Nvidia’s new Blackwell chips emerged as the top performer. The data showed that 2,496 Blackwell GPUs completed the training run in just 27 minutes.

By comparison, Nvidia’s previous-generation Hopper chips required more than three times as many units to beat that time. Because Hopper needed over three times the silicon to finish only marginally faster, Blackwell delivers more than double the training performance per chip, a substantial generational leap.
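
As a rough sanity check on the per-chip claim: the benchmark gives Blackwell’s figures (2,496 GPUs, 27 minutes), but the exact Hopper cluster size and runtime are not stated in the source, so the Hopper numbers in the sketch below are illustrative assumptions rather than reported data.

```python
# Back-of-the-envelope check of the per-chip performance claim.
# Reported in the benchmark: 2,496 Blackwell GPUs trained Llama 3.1 405B
# in 27 minutes.
blackwell_chips = 2496
blackwell_minutes = 27

# ASSUMPTION: Hopper needed "more than three times as many units" and a
# slightly faster wall-clock time to beat Blackwell. 7,488 chips and
# 25 minutes are illustrative stand-ins, not figures from the source.
hopper_chips = 3 * blackwell_chips  # 7,488
hopper_minutes = 25

# Per-chip performance scales as 1 / (chips * minutes), so the ratio of
# total chip-minutes gives Blackwell's per-chip speedup over Hopper.
blackwell_chip_minutes = blackwell_chips * blackwell_minutes  # 67,392
hopper_chip_minutes = hopper_chips * hopper_minutes           # 187,200

speedup = hopper_chip_minutes / blackwell_chip_minutes
print(f"Blackwell per-chip speedup vs Hopper: ~{speedup:.1f}x")  # ~2.8x
```

With these assumed numbers the ratio works out to roughly 2.8x, consistent with the “more than double” characterization; the exact figure depends on the Hopper runtime, which the source does not specify.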

AI industry shifts toward modular chip training architectures

Only Nvidia and its partners submitted results for the Llama 3.1 training category, underscoring the firm’s dominance in the high-end AI hardware segment. At a press briefing, CoreWeave Chief Product Officer Chetan Kapoor said the industry is moving toward modular training architectures. Instead of training AI models on single massive clusters of 100,000 chips or more, developers are now assembling smaller subsystems optimized for specific tasks.

“Using a methodology like that, they’re able to continue to accelerate or reduce the time to train some of these crazy, multi-trillion parameter model sizes,” Kapoor explained.

The benchmark also arrives as Chinese AI firm DeepSeek makes headlines for developing a competitive chatbot using significantly fewer chips than its US rivals, suggesting that efficiency gains will play a central role in the global AI race.

This news was first reported by Reuters.
