NVIDIA Teases Xavier, a High-Performance ARM SoC for Drive PX & AI
by Ryan Smith on September 28, 2016 7:45 AM EST

Ever since NVIDIA bowed out of the highly competitive (and high pressure) market for mobile ARM SoCs, there has been quite a bit of speculation over what would happen with NVIDIA’s SoC business. With the company enjoying a good degree of success with projects like the Drive system and Jetson, signs have pointed towards NVIDIA continuing their SoC efforts. But what direction they would go in remained a mystery, as the public roadmap ended with the current-generation Parker SoC. However, we finally have an answer to that, and the answer is Xavier.
At NVIDIA’s GTC Europe 2016 conference this morning, the company teased just a bit of information on its next-generation Tegra SoC, which it is calling Xavier (ed: in keeping with comic book codenames, this is Professor Xavier of the X-Men). Details are light – the chip won’t even sample until more than a year from now – but NVIDIA has laid out just enough information to make it clear that the Tegra group has left mobile behind for good, and that the company is now focused on high-performance SoCs for cars and other devices further up the power/performance spectrum.
NVIDIA ARM SoCs

| | Xavier | Parker | Erista (Tegra X1) |
|---|---|---|---|
| CPU | 8x NVIDIA Custom ARM | 2x NVIDIA Denver + 4x ARM Cortex-A57 | 4x ARM Cortex-A57 + 4x ARM Cortex-A53 |
| GPU | Volta, 512 CUDA Cores | Pascal, 256 CUDA Cores | Maxwell, 256 CUDA Cores |
| Memory | ? | LPDDR4, 128-bit Bus | LPDDR3, 64-bit Bus |
| Video Processing | 7680x4320 Encode & Decode | 3840x2160p60 Decode, 3840x2160p60 Encode | 3840x2160p60 Decode, 3840x2160p30 Encode |
| Transistors | 7B | ? | ? |
| Manufacturing Process | TSMC 16nm FinFET+ | TSMC 16nm FinFET+ | TSMC 20nm Planar |
So what’s Xavier? In a nutshell, it’s the next generation of Tegra, done bigger and badder. NVIDIA is essentially aiming to capture much of the complete Drive PX 2 system’s computational power (2x SoC + 2x dGPU) on a single SoC. This SoC will have 7 billion transistors – about as many as a GP104 GPU – and will be built on TSMC’s 16nm FinFET+ process. (To put this in perspective, at GP104-like transistor density, we'd be looking at an SoC nearly 300mm² in size.)
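For those who want to check the math, here’s the back-of-the-envelope version of that estimate. The GP104 figures used for the density assumption (7.2B transistors on a 314mm² die) are that GPU’s commonly cited specs, not anything NVIDIA has disclosed about Xavier:

```python
# Rough die-size estimate for Xavier at GP104-like transistor density.
# GP104's 7.2B transistors / 314 mm^2 are the commonly cited specs for
# that GPU; nothing here is an official Xavier figure.
gp104_transistors = 7.2e9
gp104_area_mm2 = 314.0
density = gp104_transistors / gp104_area_mm2  # ~22.9M transistors per mm^2

xavier_transistors = 7e9
print(f"{xavier_transistors / density:.0f} mm^2")  # ~305 mm^2, i.e. "nearly 300mm^2"
```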
Under the hood, NVIDIA has revealed just a bit of information about what to expect. The CPU will be composed of 8 custom ARM cores. The name “Denver” wasn’t used in this presentation, so at this point it’s anyone’s guess whether this is Denver 3 or another new design altogether. Meanwhile on the GPU side, we’ll be looking at a Volta-generation design with 512 CUDA Cores. Unfortunately we don’t know anything substantial about Volta at this time; the architecture was bumped further down NVIDIA’s previous roadmaps in favor of Pascal, and as Pascal just launched in the last few months, NVIDIA hasn’t said anything further about its successor.
Meanwhile NVIDIA’s performance expectations for Xavier are significant. As mentioned before, the company wants to condense much of Drive PX 2 into a single chip. With Xavier, NVIDIA wants to get to 20 Deep Learning Tera-Ops (DL TOPS), a metric for measuring 8-bit integer operations. 20 DL TOPS happens to be what Drive PX 2 can hit, and about 43% of what NVIDIA’s flagship Tesla P40 can offer in a 250W card. Perhaps more surprising still, NVIDIA wants to do this all at 20W – 1 DL TOPS-per-watt, and one-quarter of the power consumption of Drive PX 2 – a lofty goal given that Xavier is based on the same 16nm process as Pascal and all of the Drive PX 2’s various processors.
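To put those throughput and efficiency claims in one place, here’s a quick sanity check. The Tesla P40 figures (47 INT8 TOPS at 250W) are that card’s commonly cited specs, which we’re assuming here since NVIDIA only quoted the percentage:

```python
# Efficiency math behind the paragraph above. The P40's 47 INT8 TOPS / 250W
# are its commonly cited specs, assumed here for the comparison.
xavier_tops, xavier_watts = 20, 20
p40_tops, p40_watts = 47, 250

print(xavier_tops / p40_tops)      # ~0.43 -> "about 43%" of a Tesla P40
print(xavier_tops / xavier_watts)  # 1.0 DL TOPS-per-watt for Xavier
print(p40_tops / p40_watts)        # ~0.19 TOPS-per-watt for the 250W card

# The "one-quarter the power of Drive PX 2" claim implies the PX 2 board
# draws around 80W on this workload:
print(xavier_watts * 4)            # 80
```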
NVIDIA’s envisioned application for Xavier, as you might expect, is focused on further ramping up their automotive business. They are pitching Xavier as an “AI Supercomputer” on the strength of its planned high INT8 performance, which in turn is a key component of fast neural network inferencing. What NVIDIA is essentially proposing, then, is a beast of an inference processor, one that, unlike their Tesla discrete GPUs, can function on a stand-alone basis. Coupled with this will be some new computer vision hardware to feed Xavier, including a pair of 8K video processors and what NVIDIA is calling a “new computer vision accelerator.”
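For readers wondering why INT8 throughput translates so directly into inferencing speed, the sketch below shows the general technique of 8-bit quantized inference – a generic illustration of how such hardware is typically fed, not NVIDIA’s actual implementation:

```python
import numpy as np

# Minimal sketch of INT8 quantized inference: float weights and activations
# are mapped to 8-bit integers, multiply-accumulated in int32 (as INT8
# dot-product hardware does), then rescaled back to floating point.
def quantize(x, scale):
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
weights = rng.standard_normal(256).astype(np.float32)      # hypothetical layer
activations = rng.standard_normal(256).astype(np.float32)  # hypothetical inputs

w_scale = float(np.abs(weights).max()) / 127
a_scale = float(np.abs(activations).max()) / 127
w_q, a_q = quantize(weights, w_scale), quantize(activations, a_scale)

# Accumulate in int32 to avoid overflow, then one float rescale at the end.
acc = np.dot(w_q.astype(np.int32), a_q.astype(np.int32))
print(acc * w_scale * a_scale)       # INT8 result
print(np.dot(weights, activations))  # float32 reference -- a close match
```

The bulk of the work is done entirely with narrow integers, which is why a chip can pack in several times more of these operations per watt than full floating-point math.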
Wrapping things up, as we mentioned before, Xavier is a far-future product for NVIDIA. While the company is teasing it today, the SoC won’t begin sampling until Q4 of 2017, which in turn implies that volume shipments won’t begin until 2018. That said, with their new focus on the automotive market, NVIDIA has shifted from an industry of agile competitors and cut-throat competition to one where their customers would like as much of a heads-up as possible. So these kinds of early announcements are likely to become par for the course for NVIDIA.
35 Comments
Yojimbo - Wednesday, September 28, 2016 - link
I dunno if you can assume they won't be introducing further Shield products, or even won't have their chips in tablets. This advance reveal is targeted at a specific segment, one with a lot of forward-looking reveals from their competition (Intel/Movidius, NXP, CEVA). The segment is very young and these companies are all competing to convince developers to use their platforms.

It's entirely possible that NVIDIA would release 2 Tegra-based SKUs: one high-power SKU geared towards Drive PX and Jetson, and one low-power SKU geared towards consumer electronics products, with the computer vision accelerator and various other blocks stripped out of the design of the latter. Although the fact they are calling this "Xavier" seems to suggest it's taken over the entire Tegra line, and so we won't see any more consumer electronics Tegras, I don't think we can be completely sure what "Parker" and "Xavier" really mean to NVIDIA, or whether they'd switch a consumer electronics version of the technology to a different code name scheme.
SquarePeg - Wednesday, September 28, 2016 - link
Agreed. There's a huge hole in the Chromebook processor landscape. You have Intel Atom-based Celerons and Rockchip at the bottom, and from there you jump up to more expensive Intel Broadwell Celerons and i3's with a TDP of 15 watts that require a fan. Nvidia needs an SOC to fill this gap. Something like quad core A73's at 3GHz with a 256 core Pascal GPU and a TDP of 3.5 to 4 watts. This would be great for Chromebooks and tablets and would slot in between the already existing options on the market. Just using stock A73's with their own GPU would make it much quicker and cheaper to bring to market, and they would have that "tweener" space to themselves.

milli - Wednesday, September 28, 2016 - link
"Something like quad core A73's at 3ghz with a 256 core Pascal GPU and a TDP of 3.5 to 4 watts."Well that's just not possible on 16nm. Maybe in the future on 10 or 7nm but on 16nm that would result in a 10w SOC (if not more).
Yojimbo - Wednesday, September 28, 2016 - link
If a 2560 Pascal core Tesla P4 can operate on a 50W TDP by clocking low enough, why can't a mobile SOC with 256 Pascal cores have a TDP under 10 watts? Maybe not 3.5 to 4, but somewhere from 5 to 10.

SquarePeg - Wednesday, September 28, 2016 - link
It would be my assumption that a hypothetical SOC like the one I would like to see would be built on 10nm. TSMC already produced quad core A73 test chips at 10nm for ARM back in May. Tegra X1 was a 10 watt SOC and was built on TSMC 20nm (Planar?). It is my thinking that much more efficient A73's plus Pascal @ 10nm would be possible in the 3.5 to 4 watt TDP range.

Yojimbo - Wednesday, September 28, 2016 - link
10nm will ramp up for volume production in 2017. Apple, Samsung, and Qualcomm will buy all the early capacity. NVIDIA has a history of not producing on a new node, anyway. Therefore an NVIDIA 10nm chip wouldn't arrive until 2018. By that time a Volta GPU would make more sense, even if it meant waiting another quarter.

Yojimbo - Wednesday, September 28, 2016 - link
Besides, if NVIDIA isn't planning on putting this Xavier chip on 10nm, I doubt they would put a hypothetical consumer electronics chip on the node, as such a chip would be much less important to them.

Ktracho - Wednesday, September 28, 2016 - link
It could be a decision similar to IBM's in the 2005 time frame, where they decided overall it was better for their business (and shareholders) to pursue development of higher power CPUs (which eventually evolved into today's POWER8), rather than lower power CPUs appropriate for laptops, for example. While Apple made prototypes of laptops with IBM's G5, which was not easy, they ultimately decided to switch to Intel CPUs so they could make their computers smaller.

TheinsanegamerN - Thursday, October 6, 2016 - link
Unfortunately that leaves us with 0 good, powerful tablets. Samsung seems content to put weak GPUs in their newest Tab models, and nobody else will use anything other than bottom-barrel midrange chips, and they don't dare make a product over $150. Nobody wants to make a 10 inch $400 tablet with a Snapdragon 820 and microSD support, it seems.

I'm glad I grabbed a Shield tablet when I did. Looks like there will not be a good non-$500 Pixel tablet for quite some time (and the Pixel tablet doesn't have microSD, so it's a bit of a non-starter).
name99 - Wednesday, September 28, 2016 - link
There are two ways designers seem able to track the path they need to follow: you can start at high performance and then try to maintain that while reducing energy every iteration. That's basically been the Intel path. Or you can start at low energy and try to maintain that while improving performance (the Apple/ARM path).

Maybe we don't have enough data to draw strong conclusions, but it is notable that Apple and ARM have done OK following their track, while Intel and nV have not. Both have managed to stand still, to protect their existing markets, but they have not managed to grow the market substantially in the way that Apple and ARM have.
Trying to grow downward seems fundamentally more problematic than trying to grow upward. I don't know if that's because of business psychology (management is scared that cheaper chips will steal high-end sales, so they cripple the chips so much as to be useless), or technology (it's just a more complicated problem to strip the energy out of a complicated fast design than to add performance to a low-energy design [being very careful to make sure that everything you add does not add extra energy costs]).