According to media reports, AMD plans to release its next-generation Instinct MI400 series accelerators in the second half of 2026, with two distinct models: the MI450X for artificial intelligence (AI) and the MI430X for high-performance computing (HPC).
The MI400 series is built on AMD's latest CDNA Next architecture. Unlike the MI300 series, whose general-purpose design serves both AI and HPC but caps peak performance in either, the MI400 series splits into specialized parts. The MI450X is optimized for AI workloads, focusing on lower-precision compute formats such as FP4, FP8, and BF16 while omitting FP32 and FP64 logic. Conversely, the MI430X targets HPC, supporting high-precision FP32 and FP64 computation while dropping the lower-precision AI formats. This split lets each chip excel in its own domain: AI training and inference for the MI450X, scientific computing for the MI430X.
On the technical side, the MI400 series is expected to extend AMD's strengths in memory capacity and bandwidth. In the MI300 series, the MI300X carries 192GB of HBM3 memory at 5.3TB/s of bandwidth, and the MI325X upgrades to 256GB of HBM3E at 6TB/s. The MI400 series may adopt HBM3E or HBM4, reaching up to 288GB of memory with even higher bandwidth to accommodate large-scale AI models and HPC applications. The MI300X achieves a theoretical peak of 2,614.9 TFLOPS at FP8 precision, and the MI400 series is expected to raise this figure significantly through architectural upgrades and process improvements, such as a move to a 3nm process.
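The balance between these compute and bandwidth figures can be sanity-checked with a quick calculation. The sketch below uses only the published MI300X numbers cited above; the helper function is ours, not an AMD tool, and the ratio is a rough roofline-style indicator, not a benchmark.

```python
# Back-of-envelope check: how many FP8 operations the MI300X can
# theoretically perform per byte of HBM traffic. A higher ratio means
# the chip is more likely to be memory-bound on bandwidth-hungry
# workloads such as LLM inference.

def flops_per_byte(peak_tflops: float, bandwidth_tbs: float) -> float:
    """Ratio of peak compute (TFLOPS) to memory bandwidth (TB/s)."""
    return peak_tflops / bandwidth_tbs

# MI300X: 2,614.9 TFLOPS peak FP8, 5.3 TB/s HBM3 bandwidth
mi300x_ratio = flops_per_byte(2614.9, 5.3)
print(f"MI300X: ~{mi300x_ratio:.0f} FP8 ops per byte of HBM bandwidth")
```

If the MI400 series raises peak FP8 throughput faster than it raises bandwidth, this ratio grows, which is one reason the rumored move to higher-bandwidth HBM4 matters.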
A standout feature of the MI400 series is the adoption of the UALink interconnect. This high-performance, scalable GPU interconnect, developed by AMD together with Intel, Microsoft, and others, competes directly with Nvidia's NVLink. UALink supports high-bandwidth, low-latency data transfers, well suited to building large AI and HPC clusters. Its commercialization faces hurdles, however: external vendors such as Astera Labs and Enfabrica are unlikely to ship mature UALink switch silicon before 2026. Because AMD depends on partners for these switches, the MI400 series' use of UALink may initially be limited to small mesh or ring topologies, adding deployment uncertainty. Meanwhile, the Ultra Ethernet Consortium's more mature networking stack may offer an alternative path to scale.
Beyond UALink, the MI400 series will continue to support AMD's Infinity Fabric technology, which provides high-throughput, low-latency inter-chip communication. AMD plans to introduce Infinity Fabric-based system-level solutions, the MI450X IF64 and MI450X IF128, supporting 64 and 128 GPUs respectively in clustered configurations. These setups will connect via Ethernet, pitting them against Nvidia's rack-level offerings such as the VR200 NVL144. Infinity Fabric has already proven its worth in the MI300 series, where the MI300A APU reaches up to 5.3 TB/s of bandwidth through a unified CPU-GPU memory structure, a capability expected to be further refined in the MI400 series.
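To put the rumored IF64 and IF128 configurations in perspective, a short sketch of the aggregate memory they would offer. The 288GB-per-GPU figure is the speculative capacity mentioned earlier, not a confirmed spec, so treat the totals as illustrative.

```python
# Aggregate HBM capacity across the rumored Infinity Fabric-based
# cluster configurations, assuming the speculative 288 GB per GPU.

GPU_MEMORY_GB = 288  # speculative per-GPU HBM capacity for MI400

for name, gpus in [("MI450X IF64", 64), ("MI450X IF128", 128)]:
    total_tb = gpus * GPU_MEMORY_GB / 1024  # GB -> TB (binary-style)
    print(f"{name}: {gpus} GPUs, ~{total_tb:.1f} TB of HBM across the cluster")
```

Tens of terabytes of aggregate HBM is the scale at which trillion-parameter training runs and very long-context inference become practical on a single tightly coupled domain.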
The MI400's design also reflects AMD's ongoing work on modular architecture. Recent reports indicate the MI400 will use a chiplet design with two Active Interposer Dies (AIDs), each housing four Accelerated Compute Dies (XCDs), for eight XCDs in total, up from two XCDs per AID in the MI300 series. A new Multimedia IO Die (MID) is meant to raise data throughput and processing efficiency, while the chiplet approach also helps contain manufacturing cost and increases product flexibility.
Positioned to compete directly with Nvidia's Hopper and Blackwell architectures, the MI400 series targets performance gains through low-precision AI optimization and high-precision HPC support. Nvidia's H100 GPU delivers a peak of 1,978.9 TFLOPS at FP8, a figure AMD's MI325X has already surpassed. AMD's ROCm software platform will support the MI400 series as well: the latest ROCm 6.2 release improves inference and training performance by 2.4x and 1.8x respectively, and adds key AI features such as FP8 support and Flash Attention 3, keeping the MI400 series' software ecosystem competitive.
Nonetheless, challenges remain for the MI400 series. Beyond UALink's constraints, AMD's brand influence in AI lags Nvidia's, which benefits from an entrenched CUDA ecosystem and an early market presence. AMD aims to attract users through its open ROCm platform and stronger price/performance. The fast-moving AI and HPC market also demands consistent iteration and rapid development from AMD: reports suggest the MI350 series, built on the CDNA 4 architecture, is set for a mid-2025 release, adding FP4 and FP6 formats and potentially delivering FP16 performance of up to 2.3 PFLOPS.
As demand in the AI and HPC markets continues to climb, generative AI models such as Llama 3.1 70B require ever more memory and compute, while HPC applications demand high-precision computation at large scale. AMD's MI400 series answers these challenges with a strategy of differentiation and specialization. At the same time, advances in open interconnect standards such as UALink and Ultra Ethernet pave the way for more flexible, scalable architectures that stand to benefit companies like AMD.
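The memory pressure from models like Llama 3.1 70B can be made concrete with simple arithmetic. The sketch below counts weight storage only, ignoring KV cache, activations, and optimizer state, and illustrates why the low-precision formats the MI450X emphasizes matter for fitting a model on fewer GPUs.

```python
# Approximate weight storage for a 70B-parameter model at different
# precisions. Each parameter takes bits/8 bytes; totals are in decimal GB.

PARAMS_B = 70  # billions of parameters (e.g., Llama 3.1 70B)

for fmt, bits in [("FP16/BF16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = PARAMS_B * 1e9 * bits / 8 / 1e9  # parameters -> bytes -> GB
    print(f"{fmt}: ~{gb:.0f} GB of weights")
```

At FP16 the weights alone approach the MI300X's 192GB; at FP8 or FP4 they fit with ample headroom for the KV cache, which is exactly the trade-off the MI450X's format support is aimed at.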
The AMD Instinct MI400 series aims to prove itself in AI and HPC through tailored designs, advanced interconnects, and modular architecture. With the MI450X and MI430X approaching release, users can expect specialized solutions, further strengthened by Infinity Fabric and UALink, which bolster the series' clustering potential. Despite the hurdles in interconnect technology and market rivalry, the MI400 series' design and AMD's rapid iteration put it on a path to becoming a formidable presence in the data center GPU landscape by 2026.