NVIDIA's GeForce GTX 560 Ti: Upsetting The $250 Market
by Ryan Smith on January 25, 2011 9:00 AM EST

The GF104/GF110 Refresher: Different Architecture & Different Transistors
For all practical purposes GF100 is the Fermi base design, but for sub-high-end cards in particular NVIDIA has made a number of changes since we first saw the Fermi architecture a year and a half ago. For those of you who don’t regularly keep up with NVIDIA’s hardware releases, we’re going to quickly recap what makes GF114 and the GTX 560 Ti different from the original GF100/GF110 Fermi architecture, and in turn what makes GF114 different from GF104 through NVIDIA’s transistor optimizations. If you’re already familiar with all of this, feel free to skip ahead.
With that said, let’s start with the architecture. The GF100/GF110 design is ultimately the compute and graphics monster that NVIDIA meant for Fermi to be. It has fantastic graphics performance, and it also has extremely solid GPU computing performance in the right scenarios, which is why GF100/GF110 is the backbone of not just NVIDIA’s high-end video cards, but also their Tesla line of GPU computing cards.
But Fermi’s compute characteristics only make complete sense at the high end: large institutions utilizing GPU computing have no need for weaker GPUs in their servers, while home users don’t need features like ECC or full-speed FP64 (at least not at this time) so much as they need a more reasonably priced graphics card. As a result only the high-end GF100/GF110 GPUs feature Fermi’s base design, while GF104 and later GPUs use a tweaked design that strips away some aspects of Fermi’s GPU compute hardware while leaving much of the graphics hardware intact.
NVIDIA GF104 SM
With GF104 we saw the first GPU to use NVIDIA’s streamlined Fermi architecture – the design that forms the basis of GF104/GF106/GF108/GF114 – and with it a number of firsts from the company. Chief among these was superscalar execution, the first time we’ve seen such a design in an NVIDIA part. Superscalar execution allows NVIDIA to take advantage of instruction level parallelism (ILP) – executing the next instruction in a thread when it doesn’t rely on the result of the previous instruction – and it makes the streamlined design notably different from GF100/GF110. Ultimately this design is more efficient than GF100/GF110 on average, while having a wider spread between its best and worst case scenarios – a tradeoff that doesn’t necessarily make sense for GPU computing purposes, but does for mainstream graphics.
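To make the ILP point concrete, here is a minimal CUDA sketch of our own – purely illustrative, not NVIDIA code. On a scalar design a warp’s instructions issue one at a time; on GF104’s superscalar SM, the extra dispatch hardware can issue the two independent multiplies below together, since neither depends on the other:

__global__ void ilp_sketch(const float *a, const float *b, float *out, int n)
{
    // Hypothetical kernel for illustration only
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = a[i] * 2.0f;  // independent of y
        float y = b[i] * 3.0f;  // independent of x: a candidate to issue alongside it
        out[i] = x + y;         // depends on both x and y, so it must wait
    }
}

When a shader boils down to one long chain of dependent instructions there is nothing to pair and the extra execution units sit idle, which is exactly where that wider worst case comes from.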
Meanwhile, starting with GF110, NVIDIA began revising the low-level design of their GPUs for production purposes. NVIDIA’s choice of transistors in the GF10x series was suboptimal: leaky transistors ended up in functional units and parts thereof where NVIDIA didn’t want them, limiting the number of functional units that could be utilized and the overall performance that could be achieved in the power envelopes NVIDIA was targeting.
For GF110 NVIDIA focused on better matching the types of transistors they used to what each block needed, allowing them to reduce leakage in parts of the chip that didn’t require fast, leaky transistors. This meant not only replacing fast, leaky transistors with slower, less leaky ones where speed wasn’t critical, but also introducing a third, mid-grade transistor type to bridge the gap between the fast and slow types. With three speed grades of transistors, NVIDIA only had to use the leakiest transistors where they were truly needed, and could conserve power elsewhere.
A typical CMOS transistor: Thin gate dielectrics lead to leakage
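While the figure highlights gate-oxide leakage, the speed/leakage tradeoff behind transistor speed grades is usually framed in terms of subthreshold leakage. As a piece of textbook CMOS behavior (not an NVIDIA-specific figure), subthreshold leakage falls off exponentially as the threshold voltage is raised, while switching delay grows as the gate overdrive shrinks:

$$ I_{\text{leak}} \propto e^{-V_{th}/(n V_T)}, \qquad t_{\text{delay}} \propto \frac{C_L V_{DD}}{(V_{DD} - V_{th})^{\alpha}} $$

Here V_T is the thermal voltage (about 26mV at room temperature), n is a process-dependent slope factor, and α (roughly 1 to 2) models velocity saturation. Raising V_th cuts leakage exponentially but slows switching, which is why the fastest, leakiest grade is worth reserving for the truly speed-critical paths.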
GF110 wasn’t the only chip to see this kind of optimization, however; the rest of the GF11x line is getting the same treatment. GF114 is in a particularly interesting position, since as a smaller GPU its predecessor GF104 wasn’t as badly affected. While we can’t speak to whether leakage kept NVIDIA from enabling additional functional units, at the clockspeeds and voltages NVIDIA was targeting we did not have any issues at stock voltage. In short, while GF100 suffered notably from leakage, GF104 either didn’t suffer from it or did a good job of hiding it. For this reason GF114 doesn’t necessarily stand to gain as much from the new transistors.
As we touched upon in our introduction, NVIDIA is putting their gains here into performance rather than power consumption. The official TDP is going up 10W, while performance is going up anywhere between 10% and 40%. This is the only difference compared to GF104, as GF114 does not contain any architectural changes (GF110’s changes were backported from GF104). Everything we see today will be the result of a better-built chip.
87 Comments
surt - Tuesday, January 25, 2011
It doesn't work like that. Computer games generally have no more than 3 frames rendered at a time. The remainder is pre-loaded textures, models, etc. – everything that might be needed to render a frame. They need to have everything handy that might be needed to make a picture of your viewpoint: any enemy that might step into view, the mountains behind you in case you turn around, and so on. If it isn't already stored on the card, they have to go get it from disk, which is comparatively extremely slow (thousands of times slower than anything that is already in memory).

ATOmega - Tuesday, January 25, 2011
Okay, so it's released today. Where are they? I can't find one at local stores and they don't even know when they're getting them in.

MrJim - Tuesday, January 25, 2011
Full bitstreaming audio capabilities? Thinking about this card and Adobe Premiere CUDA-hack :)

heflys - Tuesday, January 25, 2011
The 560 is only faster when you feature games that show a bias towards Nvidia products. Heck, in some of those tests, it was beating a 6970!

Lolimaster - Tuesday, January 25, 2011
And now you can adjust those absurd tessellation levels (that no one notices) with the 11.1a hotfix. The 560 is unimpressive; without much effort the 6870 1GB / 6950 1GB are taking the bang-for-buck crown, especially the 6870 at near $200.
If you want future-proofing, don't look back: 6950 2GB all the way.
cknobman - Tuesday, January 25, 2011
That's exactly what I was thinking.

heflys - Tuesday, January 25, 2011
I just don't see how a game like HAWX, with its history, can be an indicator of the 560 being faster.

Sufo - Wednesday, January 26, 2011
Yeah, I have to agree here. In the two games generally considered the most taxing of modern systems (Crysis and Metro) the 6950 comes out on top (by like 15% at that) - I think it's a mistake to say: "The GTX 560 Ti ultimately has the edge: it’s a bit faster and it’s quieter than the 6950, and if that’s all you care about then there’s the answer you seek."
I haven't totted it up, but even if the 560 has a few frames over the 6950 across all the benches (tho tbh, it doesn't even look that way :/), the fact that it loses in games like this means people buying it on that recommendation, thinking it's the faster card, will be disappointed.
murray13 - Tuesday, January 25, 2011
I just don't get your conclusion. You say that the 560 is faster than the 6950 1GB, but looking at the graphs it's a draw. It's faster in some games, the AMD is faster in others.

I originally just looked at the games I play, and when I did that the 6950 won all of them. It's a good thing I went back and looked at all of them before I posted; I was ready to really give it to ya. But as it is, I just think these two cards are about as evenly matched, both in performance and price, as I've seen from the two camps in a long time.
Running an 8800GTX, I'm about due for a gfx card upgrade. With what is out right now the 6970 is about the best bang for my buck.
Ever since Anand stopped doing the vid card reviews (yeah it's been a while) I haven't exactly agreed with the conclusions being given. Everyone's entitled to theirs I guess.
vol7ron - Tuesday, January 25, 2011
I'm still on the E6600 C2D with a 9800GTX - the next upgrade is going to be crucial. It all hinges on that Z68 chipset.

We've been looking at i7-2G for our workstations as well. I don't think we're considering the Z68 for that, since it's not necessary and the P67 is a good capable board.
All in all, the personal upgrade is still on schedule (sometime around 2011Q2-2012Q2), hinging on Z68 and SSD releases.