IBM Cloud Is First to Widely Implement Intel Gaudi 3 AI Chips

IBM is the first cloud customer for Intel's Gaudi 3 AI accelerator chip, which IBM will make available in early 2025. Gaudi 3 will be offered for hybrid and on-premises environments via IBM Cloud, as part of Watsonx AI and on IBM data platforms. The chip, which began shipping in Q2 and is expected to enter mass production later this year, is Intel's challenger to GPU accelerators from Nvidia and AMD, the latter of which began shipping its own HPC accelerator, the MI300X, in January. Unlike that chip and Nvidia's Hopper H100 and more recent Blackwell B200, Gaudi 3 is not a GPU; it is built on an architecture designed specifically for deep learning and inference.

By integrating Gaudi 3 AI Accelerators and Intel's Xeon 6 CPUs with the IBM Cloud, "we are creating new AI capabilities and meeting the demand for affordable, secure, and innovative AI computing solutions," Justin Hotard, Intel Data Center and AI Group EVP and GM, said in IBM's announcement. "AI requires an open and collaborative ecosystem that provides customers with choice and accessible solutions."

"Gaudi 3 delivers impressively high performance per dollar, early benchmarks show, but bringing on board customers who already have strong relationships with Nvidia is presenting a challenge," TechCrunch reports.

TechCrunch pegs Intel’s 2024 revenue projections from Gaudi 3 at about $500 million, calling that “a paltry sum compared to the $4.5 billion AMD expects to rake in from sales of its Instinct MI300-series GPUs, and the $40 billion Nvidia expects from its data center business.”

The Gaudi 3 is based on TSMC's five-nanometer node. "Gaudi 3's processing power is provided by two sets of onboard computing modules, dubbed MMEs and TPCs, that are each optimized for a different set of tasks," SiliconANGLE explains. The MMEs, or matrix multiplication engines, specialize in the mathematical calculations known as matrix multiplications.
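To illustrate the kind of operation an MME is built for, below is a minimal matrix multiplication sketch in plain Python. It is purely conceptual: Gaudi's actual workloads run through Intel's accelerator software stack, not interpreted Python loops.

```python
# Minimal sketch of the matrix multiplication an MME accelerates.
# Conceptual only: on real hardware this runs as an optimized
# kernel, not as nested Python loops.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    result = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                result[i][j] += a[i][p] * b[p][j]
    return result

# A 2x2 example: each output element is the dot product
# of a row of a with a column of b.
print(matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
# [[19.0, 22.0], [43.0, 50.0]]
```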

"Certain AI models, such as those used for object recognition tasks, carry out most of their processing with matrix multiplications," SiliconANGLE writes, noting that the TPCs (tensor processor cores) are based on a very long instruction word (VLIW) architecture that speeds AI performance through parallel processing. This structure pushes Gaudi 3's throughput as high as 1,835 TFLOPS, where one TFLOPS is a trillion floating-point operations per second, for BF16 data, a compact floating-point format common in AI workloads.
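For context on the format: BF16 (bfloat16) keeps float32's 8-bit exponent but truncates the mantissa to 7 bits, so it covers the same numeric range in half the storage. The hedged sketch below demonstrates that truncation in plain Python; it is illustrative only and unrelated to Gaudi's software stack.

```python
import struct

# bfloat16 is the upper 16 bits of an IEEE 754 float32:
# 1 sign bit, 8 exponent bits, 7 mantissa bits. Truncating the low
# 16 bits keeps float32's dynamic range at half the storage cost.

def float32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (truncation, no rounding)."""
    f32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32_bits >> 16

def bf16_bits_to_float32(bits: int) -> float:
    """Expand a bfloat16 bit pattern back into a float32 value."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

x = 3.14159
bits = float32_to_bf16_bits(x)
print(f"{bits:016b}")              # sign | 8-bit exponent | 7-bit mantissa
print(bf16_bits_to_float32(bits))  # ~3.140625 -- precision is reduced
```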

Another plus: an onboard, high-speed Ethernet module that can link multiple Gaudi 3 processors in an AI server and even connect several servers.

Next up for Intel: the Falcon Shores chips, which “take the massively parallel Ethernet fabric and matrix math units of the Gaudi line and merge it with the Xe GPU engines,” as detailed on The Next Platform (which also compares Gaudi 3’s performance against Nvidia GPUs).
