The Alder Lake anomaly, explained

by bo0tzz on 1/4/25, 7:48 PM with 9 comments
by pbsd on 1/4/25, 9:04 PM

While the discussion has so far focused on latency, the anomaly also affects throughput: whereas you could previously run 2 independent shifts per cycle, each shift going to either p0 or p6, the anomaly lowers this to a single shift per cycle.

Besides the shifts and rotations, BZHI and BEXTR are also affected. While they are not shifts per se, it makes sense that they would be implemented with the same circuitry (e.g., BZHI is dst = arg1 & ~(-1 << arg2)). BEXTR in particular goes from 2 to 6 cycle latency!
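
To make the shift connection concrete, here is a rough C model of what the two instructions compute for 32-bit operands. It is only a sketch of the architectural result, not of how the hardware implements it. The function names bzhi32 and bextr32 are made up for illustration, and bextr32 takes start and length as separate arguments, whereas the real instruction packs them into bits 7:0 and 15:8 of one control register.

```c
#include <stdint.h>

/* Model of BZHI: keep the low `index` bits of src, clear the rest.
   Only the low 8 bits of the index are used; an index of 32 or more
   leaves src unchanged. */
static uint32_t bzhi32(uint32_t src, uint32_t index)
{
    uint32_t n = index & 0xFF;
    if (n >= 32)
        return src;                   /* avoid an undefined 32-bit shift in C */
    return src & ~(UINT32_MAX << n);  /* dst = arg1 & ~(-1 << arg2) */
}

/* Model of BEXTR: extract `len` bits of src starting at bit `start`.
   This is a right shift followed by the same masking BZHI does. */
static uint32_t bextr32(uint32_t src, uint32_t start, uint32_t len)
{
    start &= 0xFF;
    if (start >= 32)
        return 0;                     /* nothing left after the shift */
    return bzhi32(src >> start, len);
}
```

Both reduce to a shift plus a mask, which fits with them sharing the shifter on p06.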

Another thing I'm noticing is that the affected instructions are all p06 shift ops. You can alternatively implement rotation using SHLD r,r,c, but this is a p1 operation, and I have not seen it slow down because of this issue.
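
For anyone wondering why SHLD r,r,c is a rotation: SHLD shifts the destination left and fills the vacated low bits from the top of the source, so using the same register for both gives a rotate-left. A small C model of the 32-bit case, with hypothetical helper names (shld32, rotl32_via_shld), just to show the equivalence:

```c
#include <stdint.h>

/* Model of 32-bit SHLD dst, src, count: shift dst left by count and
   fill the low bits with the high bits of src. The count is masked
   to 5 bits, as for other 32-bit shifts. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;
    if (count == 0)
        return dst;                   /* avoid an undefined shift by 32 in C */
    return (dst << count) | (src >> (32 - count));
}

/* With dst == src the double shift is exactly a rotate-left. */
static uint32_t rotl32_via_shld(uint32_t x, unsigned count)
{
    return shld32(x, x, count);
}
```

Whether a compiler actually emits SHLD for this is a separate matter; the model only shows that the result is the same as ROL.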

by ChrisArchitect on 1/4/25, 7:49 PM

Related:

The Alder Lake SHLX Anomaly

https://news.ycombinator.com/item?id=42579969