MLCommons, a vendor-neutral, multi-stakeholder organization, has expanded its suite of MLPerf AI benchmarks to include an inference test for large language models (LLMs) and a new benchmark for measuring the performance of storage systems under machine learning (ML) workloads. MLPerf aims to provide a level playing field on which vendors can report on various aspects of AI performance.
The MLPerf Inference 3.1 benchmarks, released as the second major update this year, comprise more than 13,500 performance results from a wide range of submitters, including ASUSTeK, Azure, Dell, Google, Intel, Nvidia, Oracle, Qualcomm, and many others. Notably, many submitters posted performance improvements of 20% or more over the previous benchmark version.
The MLPerf 3.1 benchmarks not only highlight continued performance gains but also reflect the evolving AI landscape. The LLM benchmark introduced this round acknowledges the growing importance of generative AI language models. While MLCommons added an LLM to the MLPerf Training 3.0 benchmarks earlier this year, running inference with LLMs presents distinct challenges because of their generative nature.
Unlike the training benchmark, which is built around a large foundation model, the LLM inference benchmark uses the much smaller GPT-J 6B model and targets a more readily deployable use case: text summarization. As MLCommons founder David Kanter noted, many organizations lack the computing resources or data needed to support very large models, so the inference benchmark is designed to cover scenarios that a wider range of organizations can adopt.
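To make the workload concrete, the sketch below runs GPT-J 6B as a text summarizer with the Hugging Face transformers library. It is illustrative only, not the MLPerf benchmark harness; the checkpoint name, prompt format, and generation settings are assumptions made for this example.

```python
# A minimal sketch (not the MLPerf harness) of GPT-J 6B text summarization
# using Hugging Face transformers. Checkpoint name, prompt format, and
# generation settings are illustrative assumptions, not benchmark settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6b"  # ~6B-parameter open model (assumed checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

article = "..."  # a news article to summarize

# Simple instruction-style prompt; the benchmark defines its own input format.
prompt = f"Summarize the following article:\n\n{article}\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the run deterministic; cap the summary at 128 new tokens.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```

Even this toy version hints at why generative inference is harder to benchmark than classification: the output is produced token by token, so latency depends on how long the generated summary is, not just on a single forward pass.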
While high-end GPU accelerators tend to dominate the MLPerf rankings for both training and inference, the results also make the case for weighing diverse compute options against specific organizational needs. Intel, for instance, argues that not every organization prioritizes chart-topping performance numbers, and advocates deploying AI models on whatever mix of compute resources suits individual requirements.
Intel, a prominent player in the AI space, contributed significantly to the MLPerf Inference 3.1 benchmarks, submitting results for Habana Gaudi accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series processors. According to Intel, the 4th Gen Xeon Scalable processors performed well on the GPT-J news summarization task, summarizing roughly one paragraph per second in real-time server mode, the MLPerf scenario in which queries arrive at unpredictable intervals and must be answered within a latency bound.
Diverse Hardware Represented in MLPerf Inference 3.1
The MLPerf Inference 3.1 benchmarks showcase hardware from a broad range of vendors. While Intel emphasizes the value of CPUs for inference, Nvidia's GPUs also feature prominently in the results. Nvidia's GH200 Grace Hopper Superchip, designed for the most demanding workloads, delivered performance gains of up to 17% over the company's previous GPU submissions. Nvidia's L4 GPUs, meanwhile, demonstrated up to 6x better performance than the best x86 CPUs submitted for MLPerf Inference 3.1.
Driving Market Scalability with MLPerf
The diverse software and hardware represented in MLPerf Inference 3.1 points to a market focused on scaling out AI models, not just building them. That emphasis on scalability eases AI deployment in production environments across varied computing infrastructures. As businesses and enterprises seek to integrate AI into their operations, the range of compute options covered by the MLPerf benchmarks becomes a crucial factor for success.
MLPerf AI benchmarks, with their expansion to include language models and storage systems, provide valuable insights into the performance of AI systems. The MLPerf Inference 3.1 benchmarks demonstrate ongoing performance improvements among vendors while addressing the evolving landscape of generative AI language models. By considering a wide range of compute options and diverse hardware representation, MLPerf aims to empower organizations to deploy AI models effectively and efficiently.