For what it's worth, it appears the paper "Everything we know about CRC but afraid to forget" was originally published as part of the release of crcutil on Google Code [1]. It is a Mercurial (hg) repository with a single commit containing the paper, the paper's source, and an implementation.
FYI, I forked and improved [1] a Rust implementation that supports both table-based and SIMD-accelerated CRC-64/NVME [2] calculations. The SIMD-accelerated (x86/x86_64 and aarch64) version delivers roughly 10x the throughput of the table-based (16 bytes at a time) implementation.
The original implementation [3] did the same thing but for CRC-64/XZ [4].
[1]: https://github.com/awesomized/crc64fast-nvme
[2]: https://reveng.sourceforge.io/crc-catalogue/all.htm#crc.cat....
[3]: https://github.com/tikv/crc64fast
[4]: https://reveng.sourceforge.io/crc-catalogue/all.htm#crc.cat....
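For context on what the SIMD path is being compared against, here is a rough sketch of a table-based (byte-at-a-time) CRC-64/NVME in Python. This is my own illustration, not code from either Rust crate; it assumes the CRC-64/NVME parameters from the reveng catalogue (reflected polynomial 0x9A6C9329AC4BC9B5, init and xorout of all ones):

```python
# Reflected form of the CRC-64/NVME polynomial 0xAD93D23594C93659
# (parameters per the reveng catalogue; this is an illustrative sketch,
# not the crc64fast-nvme implementation).
POLY = 0x9A6C9329AC4BC9B5

# Precompute the 256-entry lookup table: one shift-and-xor pass per bit.
TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    TABLE.append(crc)

def crc64_nvme(data: bytes) -> int:
    crc = 0xFFFFFFFFFFFFFFFF  # init = all ones
    for b in data:
        # Consume one input byte per table lookup.
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFFFFFFFFFF  # xorout = all ones
```

Fast implementations extend this idea to slicing-by-8/16 (several tables, processing 8 or 16 bytes per iteration), and the SIMD versions go further still with carry-less multiplication instructions.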
https://github.com/py2many/py2many/pull/653
Transpiling the python version in the blog to mojo gives me a 4x speedup.
Had to hand edit a couple of things:
- List to bytearray conversion is not working yet
- Can't iterate over SIMD. So a "for x in list" loop has to be rewritten as a range based loop.
With a bit more work, the manual edits won't be necessary.

How is Mojo doing? Has it made its way in as a niche language anywhere?
Great post @fnands!
For those who found the process of speeding up CRC interesting, I strongly recommend reading [1]. It describes a step-by-step process by which a naive CRC implementation can be improved, until finally arriving at an assembly implementation with a staggering throughput of 62 processed bits (almost 8 bytes) per CPU cycle. Yes, you read that right.
[1]: https://github.com/komrad36/CRC
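To make the scale of that improvement concrete, here is roughly what the naive starting point of such a walkthrough looks like: a bit-at-a-time CRC-32 that does eight shift-and-xor steps per input byte. This is my own sketch using the standard CRC-32 (ISO-HDLC) parameters, not code from the linked repo:

```python
def crc32_bitwise(data: bytes) -> int:
    # Naive bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320,
    # init and xorout of all ones). This is the slow baseline that
    # table lookups, slicing-by-N, and carry-less multiplication
    # successively improve upon.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF
```

At one bit of work per inner-loop iteration, getting from here to ~62 processed bits per cycle is roughly a three-orders-of-magnitude jump in per-cycle throughput.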