NVIDIA's Blackwell GPUs have been heavily promoted, but just how powerful are they in practice?
Beyond the official marketing figures, we now have a first look at real-world results from MLCommons MLPerf Training v4.1, the industry-standard platform for assessing AI training performance.
The comparison involves two generations of servers: the HGX B200 and the HGX H200. The former is equipped with up to eight Blackwell B200 GPUs, each drawing up to 1000 watts.
The GPT-3 pre-training benchmark shows Blackwell delivering roughly double the per-GPU performance of the previous Hopper generation.
In the Llama 2 70B fine-tuning task, Blackwell shows a per-GPU improvement of up to 2.2 times.
Notably, the Blackwell platform integrates ConnectX-7 SuperNICs, Quantum-2 InfiniBand switches, and the fifth-generation NVLink interconnect. Together these ensure robust inter-node communication, enabling a balanced distribution of AI training workloads and improving overall efficiency.
For example, reaching the same performance on GPT-3 (175 billion parameters) requires 256 Hopper-generation GPUs, while Blackwell accomplishes it with only 64 — one quarter as many.
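As a back-of-envelope check, the GPU-count figures above imply a 4x reduction in the number of accelerators needed for equivalent GPT-3 throughput. A minimal sketch (the numbers are taken directly from the claim; this ignores differences in per-GPU power draw and time-to-train):

```python
# GPU counts quoted for equivalent GPT-3 175B training performance.
hopper_gpus = 256     # HGX H200 configuration
blackwell_gpus = 64   # HGX B200 configuration

# Implied reduction factor in accelerator count.
reduction = hopper_gpus / blackwell_gpus
print(f"Blackwell needs {reduction:.0f}x fewer GPUs for the same result")
```

Note that a 4x reduction in GPU count is larger than the ~2x per-GPU benchmark gain; the difference reflects scaling efficiency, since larger Hopper clusters lose more performance to inter-node communication overhead.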