Qualcomm Introduces Inference Chip for Cloud Data Centers

April 10, 2019

Qualcomm, the world's largest mobile chip designer, plans to start selling a family of server chips to accelerate artificial intelligence processing in data centers operated by cloud computing companies including Amazon, Microsoft and Google. The company's ASIC is specifically designed to handle inferencing, where algorithms trained on large sets of sample data are used to tell the difference between objects or understand voices.

The Cloud AI 100 family is designed to deliver 350 trillion operations per second (TOPS), which Qualcomm says translates to 10 times the performance per watt of the GPUs and FPGAs used for inferencing today. The chip is also roughly 50 times faster at AI workloads than the company's Snapdragon 845 smartphone chip. The 7-nanometer accelerator is set to start sampling in the second half of 2019, with production scheduled for 2020.

Qualcomm is trying to translate the energy efficiency of its Snapdragon chips into the data center, said Keith Kressin, the company's SVP of product management. More than one billion client devices today take advantage of the neural processing unit (NPU) inside its latest Snapdragon chips, which combines the Kryo CPU, Adreno GPU and Hexagon DSP. The NPU handles facial recognition, natural language processing and other tasks.

This is the company's latest attempt to challenge rivals Intel and Nvidia in the data center. Last year, Qualcomm began dismantling the unit that designed ARM-based server CPUs to compete against Intel in cloud data centers, cutting hundreds of employees from the division, which was led by former SVP Anand Chandrasekher. The company also discontinued its homegrown line of 10-nanometer server chips, called Centriq.

While many companies, including Qualcomm, are trying to move inferencing out of the cloud and onto smartphones, factory equipment and cars, the potential payoff in data centers is still significant. Qualcomm estimates that sales of server chips targeting inferencing will grow to $17 billion by 2025, and many industry analysts expect the inferencing market to expand faster than the training market.

Intel, which holds over 95 percent market share in server chips, is currently the dominant player in inferencing. Intel has moved to protect its early lead, enhancing its Cascade Lake line of Xeon processors to handle inferencing more efficiently than general-purpose graphics accelerators. Nvidia GPUs are the current standard for training neural networks, the building blocks of deep learning, a type of artificial intelligence.

Intel also designed its latest line of FPGAs, Agilex, to handle artificial intelligence processing with significantly less latency than CPUs and GPUs. The chips also give customers the ability to reprogram the processor as the fundamental algorithms of artificial intelligence change. The 10-nanometer Agilex FPGAs aim to compete against Xilinx's latest line of FPGAs, Versal, which is also designed to run inferencing in data centers. 

Nvidia is also trying to take over more of the inferencing space. Last year, the company released the latest generation of its TensorRT software, which can be used to significantly speed up inferencing running on top of Nvidia's GPUs. Nvidia also announced an accelerator for inferencing in data centers based on its Turing architecture. The 12-nanometer GPU delivers up to 260 TOPS with power consumption as low as 70W.

Qualcomm also has to contend with custom chip startups including Mythic, ThinCi and Habana. Some of the semiconductor industry's largest customers, including Amazon, Microsoft and Google, the three biggest players in cloud computing, are building their own chips for inferencing, which could cut into Qualcomm's potential sales. Google, for instance, is on the third generation of its tensor processing unit (TPU).

Qualcomm said that its latest line of accelerators can hold its own. The Cloud AI 100 family "will significantly raise the bar for AI inference processing relative to any combination of CPUs, GPUs, and/or FPGAs used in today's data centers," Kressin said in a statement. The company said that the chip family would support some of the most popular software stacks, including Glow, PyTorch, TensorFlow, Keras, and ONNX.
