At the GPU Technology Conference in China, Nvidia unveiled the Tesla P40 and Tesla P4 GPU accelerators, which aim to speed up research in fields like artificial intelligence and neural networks.

The P40 and P4 will be released in October and November respectively.


A matter of inference

The accelerators are designed specifically for inferencing. Marc Hamilton, Nvidia’s VP of solutions architecture and engineering, explained in a press call: “The way to think about this is: a typical data center running deep learning applications has two different halves.

“It really starts by collecting all sorts of data, about users, about devices connected to the network, and these are all fed into a very large training system, where that data is analyzed by a deep neural network and the network parameters are modeled or trained to recognize new images or speech patterns or data patterns that it hasn’t actually seen before.

“That part of recognizing individual patterns is called inferencing, and as you inference data you then generate more and that gets fed back into the training system.”
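As a rough illustration of the two halves Hamilton describes, the Python sketch below trains a toy one-variable model on collected data and then runs inference on an input it has not seen. It is not Nvidia's software or a neural network; the function names `train` and `infer`, the sample data and the learning settings are all assumptions chosen for illustration.

```python
# Toy sketch of the training/inference split (hypothetical names and data).

def train(samples, epochs=2000, lr=0.05):
    """Training half: fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(params, x):
    """Inference half: apply the already-trained parameters to a new input."""
    w, b = params
    return w * x + b


# Collected data (assumed for the example): points on the line y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = train(training_data)      # expensive, done once on the training system
prediction = infer(params, 10.0)   # cheap, repeated for every incoming request

# Inputs seen at inference time can be logged for a later training run.
seen_at_inference = [10.0]

print(f"prediction for x=10: {prediction:.3f}")
```

Training loops over the whole data set many times, while inference is a single cheap evaluation per request, which is why the two workloads can call for different hardware.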

The specs, according to Nvidia, are:

| Specification | Tesla P4 | Tesla P40 |
| --- | --- | --- |
| Single Precision TFLOPS | 5.5 | 12 |
| INT8 TOPS (Tera-Operations Per Second) | 22 | 47 |
| CUDA Cores | 2,560 | 3,840 |
| GPU GDDR5 Memory | 8GB | 24GB |
| Memory Bandwidth | 192GB/s | 346GB/s |
| Power | 50 Watt (or higher) | 250 Watt |
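The figures above allow a back-of-the-envelope efficiency comparison. The Python sketch below divides each card's peak INT8 TOPS by its listed power; this is only a proxy for inferences per watt, since real throughput depends on the model and software, and it assumes the P4 runs at the 50 Watt lower bound. The dictionary layout and the `tops_per_watt` helper are illustrative, not from Nvidia.

```python
# Peak INT8 throughput per watt, using Nvidia's published figures from the table.
# Assumption: the P4 is taken at its 50 W lower bound.
specs = {
    "Tesla P4": {"int8_tops": 22, "watts": 50},
    "Tesla P40": {"int8_tops": 47, "watts": 250},
}


def tops_per_watt(card):
    """Return peak INT8 tera-operations per second per watt of listed power."""
    return card["int8_tops"] / card["watts"]


efficiency = {name: tops_per_watt(card) for name, card in specs.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} INT8 TOPS per watt")

ratio = efficiency["Tesla P4"] / efficiency["Tesla P40"]
print(f"P4 advantage: {ratio:.2f}x")
```

On these numbers the P4 comes out at 0.440 peak INT8 TOPS per watt against 0.188 for the P40, roughly 2.3 times higher, while the P40 delivers more than twice the absolute throughput per card.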

Ian Buck, GM of accelerated computing at Nvidia, said: “With the Tesla P100 and now Tesla P4 and P40, Nvidia offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries.

“They slash training time from days to hours. They enable insight to be extracted instantly. And they produce real-time responses for consumers from AI-powered services.”

Greg Diamos, senior researcher at Baidu, added: “Delivering simple and responsive experiences to each of our users is very important to us.

“We have deployed NVIDIA GPUs in production to provide AI-powered services such as our Deep Speech 2 system and the use of GPUs enables a level of responsiveness that would not be possible on un-accelerated servers. Pascal with its INT8 capabilities will provide an even bigger leap forward and we look forward to delivering even better experiences to our users.”

Speaking about the P4, Hamilton said: “This is the best accelerator in the world for inferencing from an efficiency perspective, the number of inferences per watt that it can deliver… and it’s designed to fit in literally any server in the world.”

“The P40 is designed for multi-GPU scale up types of servers,” he said.