For what it's worth, it appears the paper "Everything we know about CRC but afraid to forget" was originally published as part of the release of crcutil on Google Code [1]. It is a Mercurial (hg) repository with a single commit containing the paper, the paper's source, and an implementation.
FYI, I forked and improved [1] a Rust implementation that supports both table-based and SIMD-accelerated CRC-64/NVME [2] calculations. The SIMD-accelerated (x86/x86_64 and aarch64) version delivers roughly 10x the throughput of the table-based (16 bytes at a time) implementation.
The original implementation [3] did the same thing but for CRC-64/XZ [4].
[1]: https://github.com/awesomized/crc64fast-nvme
[2]: https://reveng.sourceforge.io/crc-catalogue/all.htm#crc.cat....
[3]: https://github.com/tikv/crc64fast
[4]: https://reveng.sourceforge.io/crc-catalogue/all.htm#crc.cat....
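For context on what the SIMD path is being compared against, here is a rough sketch of a table-based (byte-at-a-time) CRC-64/NVME in Python. This is my own illustration, not code from either Rust crate; it assumes the CRC-64/NVME parameters from the reveng catalogue (reflected polynomial 0x9A6C9329AC4BC9B5, init and xorout of all ones):

```python
# Reflected form of the CRC-64/NVME polynomial 0xAD93D23594C93659
# (parameters per the reveng catalogue; this is an illustrative sketch,
# not the crc64fast-nvme implementation).
POLY = 0x9A6C9329AC4BC9B5

# Precompute the 256-entry lookup table: one shift-and-xor pass per bit.
TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    TABLE.append(crc)

def crc64_nvme(data: bytes) -> int:
    crc = 0xFFFFFFFFFFFFFFFF  # init = all ones
    for b in data:
        # Consume one input byte per table lookup.
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFFFFFFFFFF  # xorout = all ones
```

Fast implementations extend this idea to slicing-by-8/16 (several tables, processing 8 or 16 bytes per iteration), and the SIMD versions go further still with carry-less multiplication instructions.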
https://github.com/py2many/py2many/pull/653
Transpiling the python version in the blog to mojo gives me a 4x speedup.
Had to hand edit a couple of things:
- List to bytearray conversion is not working yet
- Can't iterate over SIMD. So a "for x in list" loop has to be rewritten as a range based loop.
With a bit more work, the manual edits won't be necessary.

How is Mojo doing? Has it made its way in as a niche language anywhere?
Great post @fnands!
For those who found the process of speeding up CRC interesting, I strongly recommend reading [1]. It describes a step-by-step process by which a naive CRC implementation can be improved, until finally arriving at an assembly implementation with a staggering throughput of 62 processed bits (almost 8 bytes) per CPU cycle. Yes, you read that right.
[1]: https://github.com/komrad36/CRC
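To make the scale of that improvement concrete, here is roughly what the naive starting point of such a walkthrough looks like: a bit-at-a-time CRC-32 that does eight shift-and-xor steps per input byte. This is my own sketch using the standard CRC-32 (ISO-HDLC) parameters, not code from the linked repo:

```python
def crc32_bitwise(data: bytes) -> int:
    # Naive bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320,
    # init and xorout of all ones). This is the slow baseline that
    # table lookups, slicing-by-N, and carry-less multiplication
    # successively improve upon.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF
```

At one bit of work per inner-loop iteration, getting from here to ~62 processed bits per cycle is roughly a three-orders-of-magnitude jump in per-cycle throughput.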