Good article, but a couple of remarks.
> most hardware seemingly just runs workgroups in a serial order
The hardware runs them in parallel, but it’s complicated.
The NVIDIA GPU I’m currently using has 32-wide SIMD, which means groups of 32 threads run in parallel, exactly in lockstep. Different GPU APIs call such a group of threads a wavefront or a warp. Each core (my particular GPU has 28 of them) can run 4 such wavefronts = 128 threads in parallel.
When a shader has more than 128 threads, or when the GPU core is multi-tasking, running multiple workgroups of the same or different shaders, the wavefronts will run sequentially. And one more thing: the entire workgroup runs within a single GPU core, even when the shader pushes the workgroup size to the limit of 1024 threads per workgroup.
“Sequentially” doesn’t mean the order of execution is fixed, predefined, or fair. Instead, the GPU does rather complicated scheduling, trying to hide the latency of computations and memory transactions. While some wavefront is waiting for data to arrive from memory, instead of sleeping the GPU will typically switch to another active wavefront. Many modern CPUs do that too, thanks to hyperthreading, but CPUs only have 2 threads per core, visible to the OS as two distinct virtual cores. For GPUs the number is way higher, limited only by the amount of in-core memory and by how much of that memory the running shaders require.
> as the difference between running a shader with @workgroup_size(64) or @workgroup_size(8, 8) is negligible. So this concept is considered somewhat legacy.
I think it’s convenience, not legacy. When a shader handles 2D data like a matrix or an image, it’s natural to have a 2D workgroup size like 8x8. Similarly, when a shader processes 3D data, like a field defined on the elements or nodes of a 3D Cartesian grid, it can be slightly easier to write compute shaders with workgroups of 4x4x4 or 8x8x8 threads.
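For example, with @workgroup_size(8, 8) each invocation’s global ID maps directly onto a pixel coordinate. A rough sketch (the WGSL string you’d pass to createShaderModule; the bindings and names are invented for illustration):

```ts
// Illustrative sketch only -- binding layout and names are made up.
const invertImage = /* wgsl */ `
  struct Params { width: u32, height: u32 }

  @group(0) @binding(0) var<uniform> params: Params;
  @group(0) @binding(1) var<storage, read_write> pixels: array<vec4f>;

  @compute @workgroup_size(8, 8)
  fn main(@builtin(global_invocation_id) id: vec3u) {
    // id.xy is already a 2D pixel coordinate; with @workgroup_size(64)
    // you would derive x and y from a 1D index yourself.
    if (id.x >= params.width || id.y >= params.height) { return; }
    let i = id.y * params.width + id.x;
    pixels[i] = vec4f(1.0) - pixels[i]; // e.g. invert the image
  }
`;

// The dispatch is 2D as well, e.g.:
// pass.dispatchWorkgroups(Math.ceil(width / 8), Math.ceil(height / 8));
```

The math is the same either way; the 2D form just saves the index juggling.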
Following the article, you build a simple 2D physics simulation (only for balls). Did anyone by chance expand on that to include boxes, or know of a different approach to building a physics engine in WebGPU?
I experimented a bit with it and implemented raycasting, but it is really not trivial getting the data in and out. (Limiting it to boxes and circles would satisfy my use case and seems doable, but handling polygons would be very hard, as then you have a dynamic number of edges to account for, and that gives me a headache.)
A 3D physics engine on the GPU would be the obvious dream goal to get maximum performance for more advanced stuff, but that is really not an easy thing to do.
Right now I am using Box2D for wasm and it has good performance, but it could be better.
https://github.com/Birch-san/box2d-wasm
The main problem with all this is the overhead of getting data onto the GPU and back. Once it is on the GPU it is amazingly fast. But the back and forth can really make your framerate drop - so to make it worth it, most of the simulation data has to remain on the GPU and you only move the small chunks of data that have changed in and out. And ideally render it all on the GPU in the next step.
(The performance bottleneck of this simulation is exactly that: it gets simulated on the GPU, then retrieved and drawn with the normal canvas API, which is slow.)
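To make the "keep it on the GPU, read back only small chunks" idea concrete, here is a rough sketch, assuming the full simulation state lives in a simBuffer that never leaves the GPU (the names and sizes are made up):

```ts
// Assumes WebGPU type definitions (e.g. @webgpu/types) are available.
// Reads back only a small, fixed-size slice of the simulation state;
// everything else stays resident on the GPU.
const READBACK_BYTES = 256; // e.g. a few positions, not the whole state

async function readSmallChunk(
  device: GPUDevice,
  simBuffer: GPUBuffer,
): Promise<Float32Array> {
  // In real code you would create this staging buffer once and reuse it.
  const staging = device.createBuffer({
    size: READBACK_BYTES,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });

  const encoder = device.createCommandEncoder();
  // Copy only the slice we care about; the bulk of the data never leaves the GPU.
  encoder.copyBufferToBuffer(simBuffer, 0, staging, 0, READBACK_BYTES);
  device.queue.submit([encoder.finish()]);

  await staging.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  return result;
}
```

Even this small readback costs a round trip, since mapAsync only resolves once the GPU has finished the copy, which is exactly why rendering straight from the GPU-resident buffers avoids the canvas-API bottleneck mentioned above.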
My disappointment with WebGPU has been the limited data type support. I wanted to write some compute stuff with it, but the lack of support for many integer sizes made it undesirable.
Does anyone know if the spec is likely to be revised to add more support over time?
Will there be better typography in WebGPU?
can't wait to see what exciting new exploits are in store for us with this
> The most popular of the next-gen GPU APIs are Vulkan by the Khronos Group, Metal by Apple and DirectX 12 by Microsoft. ... (WebGPU) introduces its own abstractions and doesn’t directly mirror any of these native APIs.
Huh. I was wondering about that.
Until now I just figured every "Web*" thing was browsers exposing (to JS alone) something that they already compiled in:
- WebRTC is ffmpeg
- Canvas is Skia
- WebGL is ANGLE
- WebCodecs is also ffmpeg
- WebTransport is QUIC
- WebSockets are TCP
I might be wrong on some of those.
I'm a graphics programmer who has quite a bit of experience with WebGL, and (disclaimer) I've also contributed to the WebGPU spec.
> Quite honestly, I have no idea how ThreeJS manages to be so robust, but it does manage somehow.
> To be clear, me not being able to internalize WebGL is probably a shortcoming of my own. People smarter than me have been able to build amazing stuff with WebGL (and OpenGL outside the web), but it just never really clicked for me.
WebGL (and OpenGL) are awful APIs that can give you a very backwards impression about how to use them, and are very state-sensitive. It is not your fault for getting stuck here. Basically one of the first things everybody does is build a sane layer on top of OpenGL; if you are using gl.enable(gl.BLEND) in your core render loop, you have basically already failed.
The first thing everybody does when they start working with WebGL is basically build a little helper on top that makes it easier to control its state logic and do draws all in one go. You can find this helper in three.js here: https://github.com/mrdoob/three.js/blob/master/src/renderers...
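That helper is essentially a state cache: draw code declares the state it needs, and the wrapper only touches the GL state machine when something actually changed. This is not three.js’s actual implementation, just a sketch of the shape:

```ts
// Illustrative sketch of a WebGL state cache -- not three.js's real code.
class GLStateCache {
  private enabled = new Map<GLenum, boolean>();

  constructor(private gl: WebGLRenderingContext) {}

  setEnabled(cap: GLenum, on: boolean): void {
    if (this.enabled.get(cap) === on) return; // skip redundant gl.enable/disable calls
    if (on) this.gl.enable(cap); else this.gl.disable(cap);
    this.enabled.set(cap, on);
  }
}

// Usage: instead of calling gl.enable(gl.BLEND) directly in the render loop,
// each draw states its requirements and the cache de-duplicates the GL calls:
//   state.setEnabled(gl.BLEND, material.transparent);
```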
> Luckily, accessing an array is safe-guarded by an implicit clamp, so every write past the end of the array will end up writing to the last element of the array
This article might be a bit out of date (mind putting a publish date on these articles?), but these days the spec language has been relaxed a bit. From https://gpuweb.github.io/gpuweb/#security-shader :
> If the shader attempts to write data outside of physical resource bounds, the implementation is allowed to:
> * write the value to a different location within the resource bounds
> * discard the write operation
> * partially discard the draw or dispatch call
The rest seems accurate.
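One practical consequence of the relaxed wording: shaders shouldn’t count on the old clamp-to-the-last-element behavior and should check or clamp potentially out-of-bounds indices themselves. A minimal sketch (the binding and array name are invented):

```ts
const safeWrite = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> data: array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id: vec3u) {
    // Don't rely on implementation-defined out-of-bounds behavior:
    // either skip the write entirely...
    if (id.x >= arrayLength(&data)) { return; }
    data[id.x] = f32(id.x);
    // ...or clamp the index yourself if a redundant write to the last
    // element is acceptable:
    //   let i = min(id.x, arrayLength(&data) - 1u);
  }
`;
```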