In addition to officially announcing the upgraded Instinct MI325X GPU accelerator card, AMD has unveiled the first product in its next-generation Instinct MI350 series: the Instinct MI355X. Some specifications and performance data have been disclosed. The MI355X will be available in the second half of 2025, which is almost a year away.
The MI350 series will be the first to use the TSMC 3nm process and the CDNA 4 architecture, and it introduces the FP6 and FP4 floating-point data types. It still pairs with HBM3E memory, but capacity rises to 288GB. Specific power consumption figures haven't been released. The MI325X already reaches up to 1000W, and AMD indicates that the MI355X will align with industry trends (NVIDIA B200 at 1000W, GB200 at 1700W), which suggests it will significantly exceed 1000W.
FP6 and FP4 are 6-bit and 4-bit floating-point data formats. They offer substantially less precision than FP16 and FP8 but also reduce data volume, which makes them well suited to quantizing large models, particularly large language models and mixture-of-experts models. Where speed matters more than precision, FP6 and FP4 excel.
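To make the data-volume point concrete, here is a minimal sketch of how weight storage scales with bit width. The 70-billion-parameter model size is a hypothetical chosen for illustration, not a figure from AMD, and the calculation ignores per-block scaling factors and other overhead that real quantization schemes add.

```python
# Bit widths of the formats discussed above.
FORMAT_BITS = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (overhead ignored)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model size, assumed for illustration only.
NUM_PARAMS = 70e9

for name, bits in FORMAT_BITS.items():
    print(f"{name}: {weight_footprint_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from 140 GB at FP16 to 35 GB at FP4, which is why the lower-precision formats matter for fitting large models on a single card.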
On the MI355X, FP6 and FP4 each deliver 9.2 PFlops (9.2 quadrillion operations per second), while FP16 and FP8 performance improves by 80%, reaching 2.3 and 4.6 PFlops respectively.
NVIDIA's Blackwell GPUs also introduce FP6 and FP4, but at higher quoted performance levels of 20 PFlops and 40 PFlops respectively.
The 288GB of HBM3E memory on a single card is unmatched, and bandwidth reaches up to 8TB/s. Compared to the MI325X, capacity and bandwidth increase by one-eighth and one-third respectively, and both are 50% higher than on the currently available MI300X. By contrast, the Blackwell B200 offers only 192GB of HBM3E, though it matches the 8TB/s bandwidth.
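The generational ratios can be checked with a short sketch. The MI355X figures come from the text above; the MI325X (256GB, 6TB/s) and MI300X (192GB, 5.3TB/s) values are assumptions taken from AMD's published specifications rather than from this article.

```python
# (HBM capacity in GB, memory bandwidth in TB/s) per card.
# MI325X and MI300X values are assumed from AMD's published specs.
SPECS = {
    "MI300X": (192, 5.3),
    "MI325X": (256, 6.0),
    "MI355X": (288, 8.0),
}


def relative_gain(new: float, old: float) -> float:
    """Fractional increase of `new` over `old` (0.5 means +50%)."""
    return new / old - 1


new_cap, new_bw = SPECS["MI355X"]
for older in ("MI325X", "MI300X"):
    old_cap, old_bw = SPECS[older]
    print(
        f"vs {older}: capacity +{relative_gain(new_cap, old_cap):.1%}, "
        f"bandwidth +{relative_gain(new_bw, old_bw):.1%}"
    )
```

With those assumed baselines, the capacity gain over the MI325X is exactly one-eighth and the bandwidth gain is one-third, while both gains over the MI300X come out at about 50%.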
The MI355X supports eight cards on a single platform, for a combined 2.3TB of HBM3E memory, 64TB/s of bandwidth, and performance of up to 18.5 PFlops in FP16, 37 PFlops in FP8, and 74 PFlops in FP6/FP4. This platform will also be available in the second half of next year.
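The platform totals follow from multiplying the per-card figures by eight, as the sketch below shows. It assumes simple linear scaling with no interconnect or software overhead, and the dictionary keys are illustrative names rather than anything AMD uses.

```python
CARDS_PER_PLATFORM = 8

# Per-card MI355X figures as quoted in the article.
PER_CARD = {
    "hbm_gb": 288,
    "bandwidth_tbs": 8,
    "fp16_pflops": 2.3,
    "fp8_pflops": 4.6,
    "fp6_fp4_pflops": 9.2,
}


def platform_totals(per_card: dict, cards: int) -> dict:
    """Scale every per-card figure linearly by the number of cards."""
    return {name: value * cards for name, value in per_card.items()}


totals = platform_totals(PER_CARD, CARDS_PER_PLATFORM)
for name, value in totals.items():
    print(f"{name}: {value:g}")
```

Memory and bandwidth land on 2304GB (about 2.3TB) and 64TB/s. The compute products come to 18.4, 36.8, and 73.6 PFlops, slightly below the quoted 18.5, 37, and 74, presumably because the per-card figures are themselves rounded.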
AMD's Instinct series has made striking generation-over-generation gains. Comparing the MI355X to the MI300X, AMD quotes a 7.4-fold increase in FP16 performance and 1.5 times the HBM capacity, while the model size that can be handled jumps from 714 billion to 4.2 trillion parameters, a roughly sixfold increase.
Looking ahead, 2026 will bring the next-generation Instinct MI400 series, which is expected to use the next CDNA architecture (possibly CDNA 5) and to raise both specifications and performance again.