Google has finally rolled out the second generation of its Tensor Processing Units (TPUs) on its cloud platform, albeit in beta.

TPUs are Google's own AI accelerators: application-specific integrated circuits (ASICs) designed to speed up machine learning workloads built with the TensorFlow framework.

The company began using TPUs in its data centers in 2015, and announced the second generation last year. At the time, Google said it planned to offer the in-house machine learning chips as part of a cloud service, but it took the company almost a year to follow through.

Sit tight for the TPU pods

The TPU2 offers floating-point performance of up to 180 teraflops and 64GB of high-bandwidth memory, according to a blog post written by John Barrus, product manager for Cloud TPUs at Google Cloud, and Zak Stone, product manager for TensorFlow and Cloud TPUs on the Google Brain team.

The pair claim that the service allows customers to work faster and more independently: “Instead of waiting for a job to schedule on a shared compute cluster, you can have interactive, exclusive access to a network-attached Cloud TPU via a Google Compute Engine VM that you control and can customize.”

“Rather than waiting days or weeks to train a business-critical ML model, you can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate trained model in production the next day.” 
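
In practice, that "interactive, exclusive access" means the TPU sits on the network and a VM you control talks to it directly. As a rough sketch, this is what attaching to a Cloud TPU looks like with TensorFlow's current 2.x APIs; at the time of the announcement the equivalent workflow went through the older TPUEstimator API, and the TPU name "demo-tpu" below is a placeholder for whatever name you gave the device when provisioning it:

```python
import tensorflow as tf

# "demo-tpu" is a placeholder: the name given to the Cloud TPU when it
# was provisioned alongside this Compute Engine VM.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="demo-tpu")

# Connect to the network-attached TPU and initialize its cores.
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# A single Cloud TPU device exposes eight cores.
print("TPU cores:", tf.config.list_logical_devices("TPU"))
```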

The post adds that using Cloud TPUs presents a low-cost alternative to committing “the capital, time and expertise required to design, install and maintain an on-site machine learning computing cluster with specialized power, cooling, networking and storage requirements.”

And finally, the authors state that Google has made the custom ASICs easier to program by open-sourcing a set of reference models: “We have open-sourced a set of reference high-performance Cloud TPU model implementations to help you get started right away: ResNet-50 and other popular models for image classification, Transformer for machine translation and language modeling, and RetinaNet for object detection.”
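
To give a flavor of what those reference implementations wrap, here is a hedged sketch of training an off-the-shelf ResNet-50 on a Cloud TPU using TensorFlow's Keras and TPUStrategy APIs. This is not Google's open-sourced reference code itself, and the input pipeline is elided:

```python
import tensorflow as tf

# Placeholder TPU name, as in the earlier example.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="demo-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates each training step across the TPU's eight cores.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created inside the scope are placed on the TPU replicas.
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# train_ds is assumed: a tf.data.Dataset of (image, label) batches,
# typically built from TFRecords stored in Cloud Storage.
# model.fit(train_ds, epochs=90)
```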

The devices can be linked together into supercomputer-like networks of up to 64 TPUs, which Google calls TPU pods; at 180 teraflops apiece, a full pod works out to roughly 11.5 petaflops.

For now, the Cloud TPUs are available in limited quantities, priced at $6.50 per TPU per hour, with pods expected to follow later this year.
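
At that rate, the overnight scenario the blog post describes is easy to cost out. The variant count and run length below are illustrative assumptions, not Google's figures:

```python
# Back-of-envelope cost of training several model variants overnight,
# each on its own Cloud TPU.
RATE_PER_TPU_HOUR = 6.50   # beta price quoted by Google
variants = 8               # assumed: eight model variants in parallel
hours = 12                 # assumed: one overnight run

total = RATE_PER_TPU_HOUR * variants * hours
print(f"Total: ${total:,.2f}")   # Total: $624.00
```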