Blackwell: Nvidia's GPU

by pella on 6/29/25, 12:40 AM with 31 comments
by ggreg84 on 6/29/25, 10:59 AM

Chips and Cheese's GPU analyses are pretty detailed, but they need to be taken with a huge grain of salt because the results only really apply to OpenCL, and nobody buying NVIDIA or AMD GPUs for compute runs OpenCL on them; it's either CUDA or HIP, which differ widely in parts of their compilation stack.

After reading the entire analysis, I'm left wondering: which observations in it, if any, actually apply to CUDA?
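
For concreteness, here is the kind of microbenchmark these articles run (via OpenCL), re-expressed through the CUDA stack. This is a rough sketch, not Chips and Cheese's actual harness; the working-set size and iteration count are arbitrary placeholders.

  // Pointer-chasing latency microbenchmark sketch, compiled with nvcc
  // rather than the driver's OpenCL compiler.
  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void chase(const unsigned* next, unsigned start, int iters,
                        unsigned* sink) {
      unsigned idx = start;
      for (int i = 0; i < iters; ++i)
          idx = next[idx];   // each load depends on the previous result
      *sink = idx;           // keep the compiler from deleting the loop
  }

  int main() {
      const int n = 1 << 20;                  // 4 MiB working set
      unsigned* h = new unsigned[n];
      for (int i = 0; i < n; ++i)
          h[i] = (i + 1) % n;                 // a real test would randomize the chain

      unsigned *d_next, *d_sink;
      cudaMalloc(&d_next, n * sizeof(unsigned));
      cudaMalloc(&d_sink, sizeof(unsigned));
      cudaMemcpy(d_next, h, n * sizeof(unsigned), cudaMemcpyHostToDevice);

      cudaEvent_t t0, t1;
      cudaEventCreate(&t0);
      cudaEventCreate(&t1);
      const int iters = 1 << 22;
      cudaEventRecord(t0);
      chase<<<1, 1>>>(d_next, 0, iters, d_sink);  // one thread: pure latency
      cudaEventRecord(t1);
      cudaEventSynchronize(t1);

      float ms = 0.f;
      cudaEventElapsedTime(&ms, t0, t1);
      printf("%.2f ns per dependent load\n", ms * 1e6f / iters);
      return 0;
  }

The same measurement can plausibly come out differently once nvcc/ptxas generate the machine code instead of the driver's OpenCL compiler, which is exactly why the OpenCL numbers are hard to carry over.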

by CalChris on 6/29/25, 6:28 AM

The Nvidia technical brief says 208 billion transistors.

https://resources.nvidia.com/en-us-blackwell-architecture

Blackwell uses the TSMC 4NP process, and the flagship part is built from two dies. A very back-of-the-envelope estimate:

  750 mm^2 = 750 * 10^12 nm^2 per die
  750 * 10^12 nm^2 / (208/2 * 10^9 transistors) ≈ 7211 nm^2 per transistor
  sqrt(7211 nm^2) ≈ 85 nm x 85 nm
NB: process feature size does not equal transistor size. Process feature size doesn't even equal process feature size.
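
Spelling the units out in code, for anyone who wants to check the arithmetic (the 750 mm^2 per die and 208 billion transistors over two dies are this comment's inputs, not independently verified figures):

  // Unit-explicit version of the estimate above.
  #include <cmath>
  #include <cstdio>

  int main() {
      double die_area_nm2 = 750.0 * 1e12;          // 1 mm^2 = 1e12 nm^2
      double transistors_per_die = 208e9 / 2.0;    // 104 billion per die
      double nm2_per_t = die_area_nm2 / transistors_per_die;
      printf("%.0f nm^2 per transistor (~%.0f nm x %.0f nm)\n",
             nm2_per_t, std::sqrt(nm2_per_t), std::sqrt(nm2_per_t));
      return 0;                                    // ~7212 nm^2, ~85 nm square
  }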

by ksec on 6/29/25, 5:10 AM

I hear it is still hard to buy consumer-grade Nvidia GPUs. At this point I am wondering whether it is gaming demand, AI demand, or simply a supply issue.

On another note, I am waiting for Nvidia's entry into CPUs. At some point down the line I expect the CPU to matter less, relatively speaking, and Nvidia could afford to throw a CPU into the system as a bonus, especially when we are expecting the ARM X930 to rival Apple's M4 in terms of IPC. CPU design has become somewhat of a commodity.

by Aissen on 6/29/25, 12:34 PM

Does the comparison even make sense, considering there's (more than) an order of magnitude difference in price between AMD's desktop GPU and NVIDIA's workstation accelerator?

by dist-epoch on 6/29/25, 6:34 AM

Why doesn't NVIDIA also build something like Google's TPU, a systolic array processor? Less programmable, but better throughput and power efficiency?

It seems there is a huge market for inference.
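
For anyone unfamiliar with the term, here is a toy host-side simulation of an output-stationary systolic array doing a matrix multiply. It is purely illustrative and makes no claim about how the TPU actually wires its cells; the point is that every cell performs one multiply-accumulate per cycle as operands march in from the edges, with no per-operation instruction fetch or scheduling, which is where the efficiency comes from.

  // Toy 3x3 output-stationary systolic array computing C = A * B.
  #include <cstdio>

  const int N = 3;

  int main() {
      int A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
      int B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};

      int a_reg[N][N] = {0};   // A values flowing right through each row
      int b_reg[N][N] = {0};   // B values flowing down through each column
      int acc[N][N]   = {0};   // per-cell accumulators (the outputs)

      // Enough cycles for the last skewed operands to reach the far corner.
      for (int cycle = 0; cycle < 3 * N - 2; ++cycle) {
          // Shift right / down, from the far edge so each value moves once.
          for (int i = 0; i < N; ++i)
              for (int j = N - 1; j > 0; --j) a_reg[i][j] = a_reg[i][j-1];
          for (int j = 0; j < N; ++j)
              for (int i = N - 1; i > 0; --i) b_reg[i][j] = b_reg[i-1][j];

          // Inject skewed inputs at the edges: row i lags by i cycles and
          // column j by j cycles, so matching operands meet in cell (i,j).
          for (int i = 0; i < N; ++i) {
              int k = cycle - i;
              a_reg[i][0] = (k >= 0 && k < N) ? A[i][k] : 0;
          }
          for (int j = 0; j < N; ++j) {
              int k = cycle - j;
              b_reg[0][j] = (k >= 0 && k < N) ? B[k][j] : 0;
          }

          // Every cell does one multiply-accumulate per cycle, in lockstep.
          for (int i = 0; i < N; ++i)
              for (int j = 0; j < N; ++j)
                  acc[i][j] += a_reg[i][j] * b_reg[i][j];
      }

      for (int i = 0; i < N; ++i, puts(""))
          for (int j = 0; j < N; ++j) printf("%4d", acc[i][j]);
      return 0;
  }

Arguably NVIDIA's answer to this is the tensor core: a matrix-multiply unit embedded inside an otherwise programmable SM, rather than a standalone systolic chip.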