The Nvidia technical brief says 208 billion transistors.
https://resources.nvidia.com/en-us-blackwell-architecture
Blackwell uses the TSMC 4NP process, and the package is made of two dies. A very back-of-the-envelope estimate of the average area per transistor on one die:

750 mm^2 / ((208/2) * 10^9 transistors) ≈ 7,211 nm^2 per transistor

i.e. roughly an 85 nm x 85 nm square.
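In Python, for anyone who wants to play with the numbers (the ~750 mm^2 per-die figure and the even 104B-per-die split are the assumptions above, not official numbers):

    # Back-of-envelope: average silicon area per transistor on one Blackwell die.
    die_area_mm2 = 750                         # assumed reticle-sized die
    transistors_per_die = 208e9 / 2            # 208B split evenly across two dies

    die_area_nm2 = die_area_mm2 * 1e12         # 1 mm^2 = 1e12 nm^2
    area_per_transistor = die_area_nm2 / transistors_per_die
    side = area_per_transistor ** 0.5          # side of an equivalent square

    print(f"{area_per_transistor:.0f} nm^2 per transistor")   # ~7211 nm^2
    print(f"~{side:.0f} nm x {side:.0f} nm")                   # ~85 nm x 85 nm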
NB: process feature size does not equal transistor size. Process feature size doesn't even equal process feature size.

I heard there is still trouble buying consumer-grade Nvidia GPUs. At this point I am wondering if it is gaming market demand, AI, or simply a supply issue.
On another note, I am waiting for Nvidia's entry into CPUs. At some point down the line I expect the CPU to matter less (relatively speaking), and Nvidia could afford to throw a CPU into the system as a bonus, especially with the ARM X930 expected to rival Apple's M4 in IPC. CPU design has become somewhat of a commodity.
Does the comparison even make sense, considering there's (more than) an order of magnitude difference in price between AMD's desktop GPU and NVIDIA's workstation accelerator?
Why doesn't NVIDIA also build something like Google's TPU, a systolic array processor? Less programmable, but better throughput and power efficiency?
It seems there is a huge market for inference.
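For anyone unfamiliar with the term, here is a toy NumPy simulation of the dataflow a systolic array uses for matrix multiplication: an output-stationary array with skewed input wavefronts, purely illustrative and not modeled on any particular TPU generation.

    import numpy as np

    def systolic_matmul(A, B):
        """Cycle-by-cycle toy model of an output-stationary n x n systolic array.

        A values stream in from the left edge, B values from the top edge, each
        row/column skewed by one cycle; every cell does one multiply-accumulate
        per cycle and forwards its operands to the right / downward neighbour.
        """
        n = A.shape[0]
        C = np.zeros((n, n))
        a_reg = np.zeros((n, n))   # operand held in each cell's horizontal register
        b_reg = np.zeros((n, n))   # operand held in each cell's vertical register

        for t in range(3 * n - 2):                    # enough cycles to drain the array
            a_reg[:, 1:] = a_reg[:, :-1].copy()       # shift operands right...
            b_reg[1:, :] = b_reg[:-1, :].copy()       # ...and down
            for i in range(n):                        # inject the skewed wavefronts
                k = t - i
                a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
                b_reg[0, i] = B[k, i] if 0 <= k < n else 0.0
            C += a_reg * b_reg                        # one MAC per cell per cycle
        return C

    A = np.random.rand(4, 4)
    B = np.random.rand(4, 4)
    assert np.allclose(systolic_matmul(A, B), A @ B)

The appeal is that operands only move between neighbouring cells and every cell does useful work each cycle, which is where the throughput-per-watt advantage comes from; the cost is that the dataflow is fixed, hence "less programmable".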
Chips and Cheese's GPU analyses are pretty detailed, but they need to be taken with a huge grain of salt, because the results only really apply to OpenCL, and nobody buying NVIDIA or AMD GPUs for compute runs OpenCL on them; it's either CUDA or HIP, which differ widely in parts of their compilation stack.
After reading the entire analysis, I'm left wondering which observations in it, if any, actually apply to CUDA.
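One way to answer that would be to rerun the latency-style measurements directly against the CUDA stack. As a rough sketch of what such a check could look like, here is a pointer-chasing memory-latency microbenchmark using CuPy's RawKernel; the array size, iteration count, and single-thread launch are arbitrary illustrative choices, not Chips and Cheese's actual methodology.

    import numpy as np
    import cupy as cp

    chase = cp.RawKernel(r'''
    extern "C" __global__
    void chase(const unsigned int* next, unsigned int start,
               unsigned long long iters, unsigned int* sink) {
        unsigned int p = start;
        for (unsigned long long i = 0; i < iters; ++i) {
            p = next[p];             // each load depends on the previous one
        }
        *sink = p;                   // keep the chain from being optimized away
    }
    ''', 'chase')

    n = 1 << 20                                  # 4 MiB of 32-bit indices
    iters = np.uint64(1 << 20)

    # Build one random cycle through the array so prefetching can't help.
    perm = np.random.permutation(n).astype(np.uint32)
    nxt = np.empty(n, dtype=np.uint32)
    nxt[perm] = np.roll(perm, -1)
    d_next = cp.asarray(nxt)
    d_sink = cp.zeros(1, dtype=cp.uint32)

    start_ev, end_ev = cp.cuda.Event(), cp.cuda.Event()
    start_ev.record()
    chase((1,), (1,), (d_next, np.uint32(perm[0]), iters, d_sink))  # one thread
    end_ev.record()
    end_ev.synchronize()

    ms = cp.cuda.get_elapsed_time(start_ev, end_ev)
    print(f"~{ms * 1e6 / int(iters):.0f} ns per dependent load")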