Servers & Systems: The Right Compute
ComputeExperts

Qualcomm unveils AI inference for large language models

HPE is collaborating with Qualcomm on the integration of Qualcomm Cloud AI 100 Ultra accelerators in HPE ProLiant Gen11 Servers to deliver exceptional performance for generative AI/LLM inference solutions.

At SC23, Qualcomm announced the addition of the Qualcomm Cloud AI 100 Ultra accelerator, designed for deploying LLMs, to its Cloud AI 100 product lineup. In support of this announcement, HPE is collaborating with Qualcomm on integration testing and delivery of the Qualcomm Cloud AI 100 Ultra with select HPE ProLiant Gen11 Servers.

This blog delves into how this offering benefits our customers, its performance capabilities, and who the ideal customers are for this next-gen AI inference solution.

Introducing Qualcomm Cloud AI 100 Ultra

The Qualcomm Cloud AI 100 Ultra is an advanced AI accelerator designed to deliver exceptional performance and efficiency for Generative AI and LLMs. It's built to address the skyrocketing demand for AI Inference workloads for both enterprise and cloud-service provider customers.

The Qualcomm Cloud AI 100 Ultra is optimized for a range of AI workloads, including large language models (LLMs), natural language processing (NLP), and computer vision. It's capable of supporting 100-billion-parameter models on a single-slot, 150 W PCIe card. Larger models are supported with Qualcomm's multi-card software stack.

Performance for deploying LLMs

The Qualcomm Cloud AI 100 Ultra delivers exceptional performance. The accelerator boasts impressive throughput and low-latency AI processing capabilities, making it ideal for time-sensitive AI applications. Its peak performance and support for LLMs are particularly noteworthy, with inferences per second (IPS) comparable to AI inference accelerators from leading GPU vendors. This computational power makes it well-suited for applications that require real-time decision-making, such as text-to-code, chatbots, and language translation.

Qualcomm Cloud AI 100 Ultra is also energy efficient: with a single-width PCIe design operating at a mere 150 W TDP, it performs at levels similar to AI accelerators operating at twice the wattage (and price). Thus, the Qualcomm Cloud AI 100 Ultra delivers industry-leading AI inferences per watt, significantly reducing the total cost of ownership for data centers and cloud service providers. In addition, Qualcomm Cloud AI 100 Ultra delivers up to 4x the performance of the Qualcomm Cloud AI 100 Standard and Pro models.

The Qualcomm Cloud AI 100 Ultra accelerator also supports leading industry-standard frameworks (e.g., PyTorch, ONNX, TensorFlow) and tools, ensuring compatibility with existing AI software ecosystems. This makes the transition to the Qualcomm Cloud AI 100 Ultra smooth for businesses already invested in AI technologies.

Ideal customer

Customers for this AI inference solution are deploying LLM, NLP, and computer vision (CV) models and require high performance and energy efficiency. Example customers and industries include:

Cloud Service Providers: Qualcomm Cloud AI 100 Ultra provides an AI inference solution to a wide range of clients, from e-commerce platforms to content streaming services.

Data Centers: Data centers housing massive amounts of data will appreciate the performance and energy efficiency of this AI accelerator. It enables data centers to manage workloads more efficiently and cost-effectively, reducing the environmental impact.

AI Researchers and Developers: For those pushing the boundaries of AI research and development, the Qualcomm Cloud AI 100 Ultra with HPE Servers will offer industry-leading performance for experimentation and innovation. It can accelerate the development of new AI applications and algorithms.

Product availability

Qualcomm Cloud AI 100 Ultra will be offered with select HPE ProLiant Gen11 Servers that can fit 8 Qualcomm Cloud AI 100 Ultra accelerator cards in a single 2U server form factor. Expect HPE integrated product and pricing to be available in H1 2024.

Read the press release. 



Server Experts
Hewlett Packard Enterprise

twitter.com/HPE_HPC
linkedin.com/showcase/hpe-servers-and-systems/
hpe.com/servers

 

About the Author

ComputeExperts

Our team of Hewlett Packard Enterprise server experts helps you to dive deep into relevant infrastructure topics.