A high-throughput parser for the Zig programming language

by jedisct1 on 4/16/25, 2:11 PM with 16 comments
by ww520 on 4/16/25, 9:06 PM

This is very cool. An extremely fast lexical tokenizer is the basis for a fast compiler. Zig has good integration and support for SIMD operations, which is perfect for this kind of thing. It's definitely doable. A while back I did a proof of concept that uses SIMD to operate on 32-byte chunks to parse identifiers.

https://github.com/williamw520/misc_zig/blob/main/identifier...
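
For readers who haven't seen the chunked approach, here is a minimal sketch of the general idea in Rust. It is not the code from the linked repo: the names `CHUNK`, `is_ident_byte`, `ident_mask`, and `ident_len` are made up for illustration, and it uses a plain fixed-size loop as a portable stand-in for real SIMD compare and movemask instructions. It classifies 32 bytes at a time into a bitmask, then counts the leading run of identifier bytes with a single bit operation.

```rust
/// Bytes examined per step, matching the 32-byte chunks mentioned above.
const CHUNK: usize = 32;

/// True for bytes that may appear inside an identifier: [A-Za-z0-9_].
/// (Assumption: ASCII-only identifiers.)
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Classify one chunk: bit i of the result is set when byte i is an
/// identifier byte. A SIMD version would do this with vector compares
/// followed by a movemask; whether the compiler vectorizes this loop
/// is not guaranteed.
fn ident_mask(chunk: &[u8; CHUNK]) -> u32 {
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= (is_ident_byte(b) as u32) << i;
    }
    mask
}

/// Length of the run of identifier bytes at the start of `src`.
/// Hypothetical helper: it does not reject a leading digit.
fn ident_len(src: &[u8]) -> usize {
    let mut pos = 0;
    while pos < src.len() {
        // Copy up to 32 bytes into a zero-padded buffer so the tail is
        // handled the same way as full chunks. Byte 0 is not an
        // identifier byte, so the padding always ends the run.
        let mut chunk = [0u8; CHUNK];
        let n = (src.len() - pos).min(CHUNK);
        chunk[..n].copy_from_slice(&src[pos..pos + n]);

        // Consecutive set bits starting at bit 0 = identifier bytes
        // at the front of this chunk.
        let run = ident_mask(&chunk).trailing_ones() as usize;
        pos += run;
        if run < CHUNK {
            break; // the identifier ended inside this chunk
        }
    }
    pos
}
```

With this sketch, `ident_len(b"foo_bar1 = 42;")` returns 8, and an identifier longer than 32 bytes just takes a second trip through the loop.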

by dreamoffire on 4/16/25, 5:05 PM

The talks that Niles gave at the Utah Zig meetups (linked in the repo) were great; I just wish the AV setup had been a little smoother. It seemed like there were some really neat visualizations that Niles prepared that flopped. Either way, I recommend them. They inspired me to read a lot more machine code these days.

by neerajsi on 4/16/25, 4:43 PM

Very interesting project!

I wonder if there's a way to make this set of techniques less brittle and more applicable to any language. I guess you're looking at a new backend or some enhancements to one of the parser generator tools.

by matu3ba on 4/16/25, 5:47 PM

It would be very cool if, once finished, the techniques were applied to user-schedulable languages https://www.hytradboi.com/2025/7d2e91c8-aced-415d-b993-f6f85....

I guess they are too tailored to the actual memory layout and the corresponding memory access latencies of the architecture, but I would like to be shown that I am wrong and that it is feasible.

by asdfman123 on 4/16/25, 11:48 PM

This really moves Zig