First In-Depth Look at Google’s New Second-Generation TPU

It was only just last month that we spoke with Google distinguished hardware engineer, Norman Jouppi, in depth about the tensor processing unit used internally at the search giant to accelerate deep learning inference, but that device—that first TPU—is already appearing rather out of fashion.

This morning at the Google’s I/O event, the company stole Nvidia’s recent Volta GPU thunder by releasing details about its second-generation tensor processing unit (TPU), which will manage both training and inference in a rather staggering 180 teraflops system board, complete with custom network to lash several together into “TPU pods” that can deliver Top 500-class supercomputing might at up to 11.5 petaflops of peak performance.

“We have a talented ASIC design tea that worked on the first generation TPU and many of the same people were involved in this. The second generation is more of a design of an entire system versus the first, which was a smaller thing because we were just running inference on a single chip. The training process is much more demanding, we need to think holistically about not just the underlying devices, but how they are connected into larger systems like the Pods,” Dean explains.

We will follow up with Google to understand this custom network architecture but below is what were able to glean from the first high-level pre-briefs offered on the newest TPU and how it racks and stacks to get that supercomputer-class performance. Google did not provide the specifications of the TPU2 chip or its motherboard, but here is the only image out there that we can start doing some backwards math with.


full post: