Intel’s Gaudi 2 silicon has emerged as a formidable competitor in the AI acceleration arena, challenging Nvidia’s long-standing dominance. The evidence comes from a recent fine-tuning benchmark for BridgeTower, a cutting-edge Vision-Language (VL) AI model, conducted by Hugging Face. The results have the tech industry buzzing, and they suggest there is more to the AI acceleration race than Nvidia’s green dominance.
The Rise of Intel’s Gaudi 2
Intel’s entry into the AI market came through its $2 billion acquisition of Habana in 2019, and it has been quietly developing its own AI acceleration technology ever since. The standout moment was the recent benchmarking of its Gaudi 2 silicon, which outperformed Nvidia’s A100 80 GB by a factor of 2.5. Even Nvidia’s newer H100 trailed Gaudi 2 by a factor of 1.4.
A Game-Changing Acceleration Technique
The secret behind Gaudi 2’s performance lies in a hardware-accelerated data-loading system that addresses a critical bottleneck in AI model fine-tuning, particularly for VL models. Traditionally, resource-intensive operations such as image decoding and augmentation run on the CPU, which struggles to keep up. The AI accelerator then often stalls while it waits for the CPU to process and send the next batch of data. Gaudi 2’s integrated hardware acceleration removes much of that wait.
Revolutionizing Data Loading
With Gaudi 2’s hardware acceleration, the CPU carries a much lighter load, freeing up resources for other tasks during fine-tuning. The data-loading pipeline then runs through the following steps:
- Fetch data
- CPU reads encoded images
- Encoded images are sent to devices
- Devices decode images
- Devices apply image transformations to augment images
By moving image decoding and transformation onto the device, Gaudi 2 keeps the accelerator supplied with data and improves overall performance.
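To make the bottleneck concrete, here is a minimal Python sketch of a two-stage pipeline model. It is only an illustration, not how Gaudi 2 or any benchmark software works internally: the function name `batch_time` and every timing value are hypothetical, and the model assumes that on-device decoding adds no time to the training step.

```python
def batch_time(cpu_prep_s: float, device_step_s: float, workers: int = 0) -> float:
    """Seconds per batch under a simple two-stage pipeline model.

    workers == 0: data is prepared in the main process, so preparation
    and the training step run back to back.
    workers >= 1: preparation is split across dedicated processes and
    overlaps with the training step, so the slower stage sets the pace.
    """
    if workers == 0:
        return cpu_prep_s + device_step_s
    return max(cpu_prep_s / workers, device_step_s)


# Hypothetical timings, chosen for illustration only (not measurements).
CPU_DECODE_AND_AUGMENT_S = 0.12  # CPU reads, decodes and augments a batch
CPU_READ_ONLY_S = 0.02           # CPU only reads the encoded images
DEVICE_STEP_S = 0.10             # accelerator runs one training step

for workers in (0, 1, 2, 4):
    cpu_path = batch_time(CPU_DECODE_AND_AUGMENT_S, DEVICE_STEP_S, workers)
    offloaded = batch_time(CPU_READ_ONLY_S, DEVICE_STEP_S, workers)
    print(f"workers={workers}: CPU decode {cpu_path:.2f}s, device decode {offloaded:.2f}s")
```

In this toy model, adding workers stops helping once the accelerator becomes the slower stage, and offloading the decode work lets a single worker reach that point.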
Benchmarking Unveils Gaudi 2’s Prowess
To showcase the performance gains of hardware-accelerated image loading, the benchmark fine-tuned a pre-trained BridgeTower checkpoint with 866 million parameters. Each workload ran across 8 devices, and the hardware tested included Nvidia’s A100 80 GB and H100 as well as Intel’s Gaudi 2. The test was repeated with progressively more dedicated CPU processes for data loading, and the results were averaged over three runs.
The Dominance of Gaudi 2
The results were clear. In the best-case scenario, where data loading occurred alongside the main training process, Gaudi 2 outperformed Nvidia’s H100 by a factor of 1.79 and the A100 by a factor of 2.23. Even in non-optimized scenarios, Gaudi 2 held its ground.
Diminishing Returns for Nvidia
Spawning additional data-loading processes yielded diminishing returns for Nvidia. For instance, introducing a single dedicated data-loading process improved performance by a factor of 1.72, but adding a second process contributed only a marginal 3% improvement. In contrast, Habana’s Gaudi 2 achieved an additional 10% performance boost over its own best score by handling most data-loading steps internally.
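The diminishing returns are easier to see when the gains are compounded. The short Python sketch below uses only the figures quoted above (a factor of 1.72, then 3%). The helper name `compound` is ours, and the sketch assumes both gains are measured on the same hardware and multiply one after the other.

```python
def compound(*factors: float) -> float:
    """Multiply successive speedup factors into one overall factor."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


first_worker = 1.72   # one dedicated data-loading process (from the article)
second_worker = 1.03  # a second process adds only 3% (from the article)

overall = compound(first_worker, second_worker)
# Share of the total gain over the baseline that the second process provides.
second_share = (overall - first_worker) / (overall - 1.0)

print(f"overall speedup: {overall:.2f}x")                     # 1.77x
print(f"share from the second process: {second_share:.1%}")   # 6.7%
```

Under these assumptions, nearly all of the improvement comes from the first dedicated process.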
The Ongoing AI Acceleration Race
Nvidia, with its exceptional product and software stack, enjoys the first-mover advantage in the AI acceleration market. However, this development shows that Intel and other contenders like AMD are determined to challenge Nvidia’s reign.
The AI acceleration space remains fiercely competitive. Nvidia still leads, but the underdogs are catching up and may yet surpass the current favorite. The tech industry is witnessing a seismic shift, and the future of AI acceleration has never looked more intriguing.