CRC32 & Adler-32 Calculator
Non-cryptographic checksums for ZIP files, network frames & integrity checks
CRC32 and Adler-32: The Workhorses of Lightweight Data Integrity
When a ZIP file lands on your hard drive intact, when an Ethernet frame traverses a noisy cable without corruption, when a PNG image renders pixel-perfect after crossing the internet, you have a non-cryptographic checksum to thank. CRC32, its modern sibling CRC32C, and Adler-32 are three of the most widely deployed integrity mechanisms in computing history. They are not designed to resist adversaries (that job belongs to SHA-256 or BLAKE3), but they are spectacularly good at one specific task: detecting accidental corruption quickly and cheaply.
The Mathematics of CRC32
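CRC stands for Cyclic Redundancy Check. Despite the intimidating name, the concept is elegant: treat the entire input as one enormous binary polynomial (each bit is a coefficient, and arithmetic is done modulo 2, so addition and subtraction are both XOR), divide it by a fixed degree-32 generator polynomial, and the 32-bit remainder is your checksum. A single flipped bit adds one term x^k to the message polynomial, and because the generator has more than one term it can never divide x^k, so every single-bit error is guaranteed to change the remainder. More generally, corruption goes unnoticed only if the error pattern happens to be an exact multiple of the generator.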
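To make the polynomial picture concrete, here is a minimal Python sketch of the bare division step, assuming the IEEE generator written out in full as 0x104C11DB7 (including the x^32 term that the usual notation leaves implicit). It deliberately skips the bit reflection, starting value, and final inversion that the standard CRC32 adds, so its output does not match zlib's; `gf2_mod` and `poly_crc` are illustrative names, not a library API. The loop checks the single-bit guarantee by brute force.

```python
# A bare polynomial CRC: the remainder of M(x) * x^32 divided by G(x) over GF(2).
# Illustrative only: no bit reflection, no 0xFFFFFFFF start value, no final
# inversion, so the results do NOT match zlib.crc32.

G = 0x104C11DB7  # IEEE 802.3 generator with its x^32 term written out (33 bits)


def gf2_mod(dividend: int, divisor: int) -> int:
    """Polynomial long division over GF(2); subtraction is XOR, so no carries."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # Line the divisor's leading term up with the dividend's leading term.
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def poly_crc(data: bytes) -> int:
    """Read data as one big polynomial (first byte's top bit = highest degree)."""
    m = int.from_bytes(data, "big")
    return gf2_mod(m << 32, G)  # multiply by x^32, keep the 32-bit remainder


msg = b"123456789"
reference = poly_crc(msg)

# A single flipped bit adds one term x^k; G never divides x^k, so the
# remainder must change for every one of the 72 possible single-bit errors.
for bit in range(len(msg) * 8):
    flipped = (int.from_bytes(msg, "big") ^ (1 << bit)).to_bytes(len(msg), "big")
    assert poly_crc(flipped) != reference

# With no nonzero start value, a leading zero byte is invisible.
assert poly_crc(b"\x00" + msg) == reference
```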
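The standard CRC32 used in ZIP archives, gzip, PNG files, and Ethernet (IEEE 802.3), and exposed by zlib's crc32() function, uses the IEEE 802.3 generator polynomial 0x04C11DB7 (the leading x^32 term is implicit). Because this CRC processes each byte least-significant bit first, matching the order in which serial links such as Ethernet transmit bits, implementations use the bit-reversed form 0xEDB88320 and shift the register right. The algorithm initializes a 32-bit register to 0xFFFFFFFF and XORs each input byte into the register's low byte; then, eight times, it shifts the register right by one bit, XOR-ing in the polynomial whenever the bit that falls off the bottom is a 1. The final register is XOR-ed with 0xFFFFFFFF (a bitwise NOT) to give the output. The nonzero starting value is what makes leading zero bytes count: with a register starting at zero, prepending zeros would leave the checksum unchanged. The final inversion is part of the standard definition, and every compatible implementation must apply it.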
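The register-level procedure described above can be sketched bit by bit in Python. This is an illustrative, deliberately slow version (the name `crc32_bitwise` is hypothetical), checked against the standard library's `zlib.crc32` and the standard check value for the string 123456789.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # bit-reversed form of 0x04C11DB7


def crc32_bitwise(data: bytes) -> int:
    """Standard (IEEE / zlib) CRC32, computed one bit at a time."""
    crc = 0xFFFFFFFF                  # pre-conditioning
    for byte in data:
        crc ^= byte                   # XOR the byte into the register's low 8 bits
        for _ in range(8):
            if crc & 1:               # a 1-bit is about to fall off the bottom
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # post-conditioning (bitwise NOT)


assert crc32_bitwise(b"123456789") == 0xCBF43926
assert crc32_bitwise(b"hello world") == zlib.crc32(b"hello world")
```

Because the register starts at 0xFFFFFFFF, prepending a zero byte to the input does change the result here.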
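To avoid processing the polynomial one bit at a time, practical implementations precompute a 256-entry lookup table, one entry per possible byte value, reducing the cost to one table lookup, one shift, and one XOR per byte of input. That is fast enough for most software uses; implementations that need more throughput use multi-table variants such as slicing-by-8, which consume several bytes per step, or hardware instructions where the CPU provides them.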
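A minimal table-driven version, again in Python and assuming the reflected polynomial 0xEDB88320: the table is built once, and each input byte then costs one lookup, one shift, and one XOR. The names `make_crc32_table` and `crc32_table` are hypothetical; in practice `zlib.crc32` does this job in C. The optional `crc` argument follows the same convention as `zlib.crc32(data, value)`, so a stream can be checksummed in pieces.

```python
import zlib


def make_crc32_table(poly: int = 0xEDB88320) -> list[int]:
    """Precompute the register update for every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC32; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF  # undo the final inversion to recover the register
    for byte in data:
        # One lookup and one XOR replace eight shift/XOR steps.
        crc = (crc >> 8) ^ CRC32_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


# Checksumming in two pieces gives the same answer as one pass.
assert crc32_table(b"world", crc32_table(b"hello ")) == zlib.crc32(b"hello world")
```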
CRC32C: The Castagnoli Variant
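CRC32C uses a different generator polynomial, 0x1EDC6F41 (reversed: 0x82F63B78), chosen by Guy Castagnoli and colleagues in 1993 for its superior error-detection profile. Both polynomials, like any well-formed 32-bit CRC, detect every burst error of 32 bits or fewer; the Castagnoli polynomial's advantage is a higher minimum Hamming distance over the message lengths common in storage and networking, so it is guaranteed to catch more combinations of scattered bit flips for the same 32-bit cost.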
More practically, Intel's SSE4.2 instruction set (introduced in 2008) added a CRC32 instruction that natively computes CRC32C (not the IEEE CRC32, despite its name), consuming up to eight bytes per instruction. This made CRC32C the preferred checksum for storage systems: it is used in iSCSI, SCTP (a transport-layer network protocol), Google's LevelDB and its Facebook-developed fork RocksDB, and the Btrfs filesystem. If you are choosing between CRC32 and CRC32C for a new application, CRC32C wins unless you need compatibility with existing ZIP/PNG/zlib ecosystems.
The standard test vector for both is the ASCII string 123456789, which yields 0xCBF43926 under CRC32 and 0xE3069283 under CRC32C. These are the canonical check values listed for each algorithm in published CRC parameter catalogues; if your implementation does not hit these numbers, something is wrong.
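Since CRC32C differs from CRC32 only in its polynomial (both start at 0xFFFFFFFF, process bits least-significant first, and invert the result), one table-driven routine can produce either, and the two check values make a convenient self-test. A sketch in Python; the standard library has no CRC32C function, so `make_table` and `crc` here are illustrative helpers, not a library API.

```python
import zlib


def make_table(poly: int) -> list[int]:
    """256-entry lookup table for a reflected 32-bit CRC polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_table(0xEDB88320)   # IEEE 802.3, reflected
CRC32C_TABLE = make_table(0x82F63B78)  # Castagnoli, reflected


def crc(data: bytes, table: list[int]) -> int:
    """Same init, bit order, and final inversion for both variants."""
    c = 0xFFFFFFFF
    for byte in data:
        c = (c >> 8) ^ table[(c ^ byte) & 0xFF]
    return c ^ 0xFFFFFFFF


# The canonical check values for the string 123456789.
assert crc(b"123456789", CRC32_TABLE) == 0xCBF43926
assert crc(b"123456789", CRC32C_TABLE) == 0xE3069283
# The IEEE variant agrees with the standard library.
assert crc(b"hello", CRC32_TABLE) == zlib.crc32(b"hello")
```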
Adler-32: Speed Over Polynomial Rigor
Mark Adler (co-author of gzip and zlib) designed Adler-32 in 1995 as a faster alternative to CRC32 for the zlib compression library. The algorithm is almost childishly simple: maintain two 16-bit running sums, A and B, initialized to 1 and 0 respectively. For each input byte, add it to A; then add the new value of A to B. Both sums are taken modulo 65521 (the largest prime below 2^16). The final 32-bit output packs B in the high 16 bits and A in the low 16 bits.
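The whole algorithm fits in a few lines. A straightforward Python sketch (`adler32` is an illustrative name; the standard library's `zlib.adler32` is the production version). For the string 123456789, whose bytes 49 through 57 sum to 477, A ends at 1 + 477 = 478 (0x01DE) and B, the sum of the nine intermediate A values, ends at 2334 (0x091E), giving 0x091E01DE.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # A: 1 plus the running sum of the bytes
        b = (b + a) % MOD_ADLER     # B: running sum of every value A takes
    return (b << 16) | a            # B in the high 16 bits, A in the low 16


assert adler32(b"123456789") == 0x091E01DE
assert adler32(b"123456789") == zlib.adler32(b"123456789")
```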
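Why 65521 instead of 65536? A prime modulus removes a class of systematic collisions. Suppose one byte is raised by some amount d and a byte k positions later is lowered by d: A is unchanged, and B changes by d × k. Modulo 2^16 that product vanishes whenever it is a multiple of 65,536, for example d = 128 and k = 512. Modulo the prime 65521, the product of two nonzero values below 65521 is never zero, so such a compensating change is always caught unless the distance k is itself a multiple of 65,521.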
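To see the difference a prime makes, the sketch below moves a single byte of value 128 by 512 positions in an otherwise all-zero 600-byte input. A is unchanged, and B changes by 128 × 512 = 65,536, which vanishes modulo 2^16 but not modulo 65521. The function `adler_like` is hypothetical: Adler-32's structure with a configurable modulus, where 65536 is an invented variant for comparison, not a real checksum.

```python
import zlib


def adler_like(data: bytes, modulus: int) -> int:
    """Adler-32 structure with a configurable modulus (hypothetical variant)."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % modulus
        b = (b + a) % modulus
    return (b << 16) | a


x = bytearray(600)
x[522] = 128   # the value 128 sits at position 522 ...
y = bytearray(600)
y[10] = 128    # ... or 512 positions earlier, at position 10

# Power-of-two modulus: the two inputs collide.
assert adler_like(bytes(x), 65536) == adler_like(bytes(y), 65536)
# Prime modulus (real Adler-32): they do not.
assert adler_like(bytes(x), 65521) != adler_like(bytes(y), 65521)
```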
The trade-off is detection quality. Adler-32 is weaker than CRC32 for short inputs (up to a few hundred bytes) because A and B have not yet accumulated enough to spread across their range: with 256 bytes or fewer, A can be at most 1 + 255 × 256 = 65,281, so it never even wraps modulo 65521, and ordinary text reaches far less of that range. For bulk data transfer, which is exactly what zlib processes, it performs adequately and runs faster in software because it avoids table lookups entirely. The PNG specification, for instance, uses CRC32 for per-chunk integrity, while the zlib stream carrying the compressed image data ends with its own Adler-32.
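The short-input bound is easy to confirm with Python's built-in `zlib` module. As an illustration of the detection gap, the same sketch also shows a pair of 3-byte inputs that Adler-32 cannot tell apart: raising the first byte by 1, lowering the second by 2, and raising the third by 1 leaves both A and B unchanged. CRC32 must catch that change, because a 32-bit CRC detects every error burst of 32 bits or fewer, and this one spans only 24.

```python
import zlib

# Short inputs: with at most 256 bytes, A = 1 + sum(bytes) <= 1 + 255 * 256
# = 65281 < 65521, so A never wraps and the top of its range is unreachable.
worst_case = b"\xff" * 256
assert zlib.adler32(worst_case) & 0xFFFF == 65281

# A +1, -2, +1 change to three adjacent bytes leaves A and B unchanged.
original = b"ABC"   # bytes 65, 66, 67
tampered = b"B@D"   # bytes 66, 64, 68
assert zlib.adler32(original) == zlib.adler32(tampered)

# The error fits in a 24-bit burst, so CRC32 is guaranteed to notice it.
assert zlib.crc32(original) != zlib.crc32(tampered)
```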
Where These Algorithms Actually Appear in Production
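CRC32 (IEEE): Every entry in a ZIP archive records the CRC32 of its original uncompressed content, both in its local file header (or in a data descriptor after the compressed data) and in the central directory at the end of the archive. PNG chunks (IHDR, IDAT, IEND, etc.) each carry a CRC32 trailer computed over the chunk type and data. The Ethernet FCS (Frame Check Sequence) is CRC32. SATA frames and USB 3.x SuperSpeed data packets carry 32-bit CRCs as well. So does gzip: every .gz file ends with the CRC32 of the original uncompressed data followed by its length modulo 2^32. Unlike the zlib format, gzip carries no Adler-32.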
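These trailers can be inspected with Python's standard library alone. A sketch using `gzip`, `zipfile`, `struct`, and `zlib`: it reads the CRC32 and length from the last eight bytes of a gzip stream, compares the CRC that a ZIP entry records (`ZipInfo.CRC`), and recomputes the CRC of a PNG IEND chunk, which is the same in every PNG file because IEND has no data.

```python
import gzip
import io
import struct
import zipfile
import zlib

payload = b"The quick brown fox jumps over the lazy dog"

# gzip (RFC 1952): the last 8 bytes are CRC32 and length mod 2**32, little-endian.
gz = gzip.compress(payload)
crc, isize = struct.unpack("<II", gz[-8:])
assert crc == zlib.crc32(payload)
assert isize == len(payload) % 2**32

# ZIP: each entry records the CRC32 of its uncompressed content.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("fox.txt", payload)
with zipfile.ZipFile(buf) as zf:
    assert zf.getinfo("fox.txt").CRC == zlib.crc32(payload)

# PNG: a chunk's CRC covers its type and data; IEND carries no data.
assert zlib.crc32(b"IEND") == 0xAE426082
```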
CRC32C: Used by Google Spanner's storage layer, Apache Kafka, NVMe drives (in the optional end-to-end data integrity feature), and Amazon S3, which now accepts x-amz-checksum-crc32c headers as a first-class integrity mechanism for object uploads.
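Adler-32: Embedded at the end of every zlib stream. That includes the compressed image data in every PNG file (a single zlib stream split across the file's IDAT chunks) and HTTP responses sent with Content-Encoding: deflate, which HTTP defines as the zlib format. Java JAR files, by contrast, are ZIP archives: their entries are stored as raw deflate data and checked with CRC32, not Adler-32.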
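A short Python check of where the Adler-32 lives: a zlib stream, as produced by `zlib.compress`, ends with the Adler-32 of the uncompressed data in big-endian byte order, while a raw deflate stream (requested with a negative `wbits`, the form ZIP entries use) carries no checksum at all.

```python
import zlib

payload = b"hello, zlib " * 100

# zlib format (RFC 1950): header, deflate data, then a 4-byte big-endian Adler-32.
stream = zlib.compress(payload)
assert int.from_bytes(stream[-4:], "big") == zlib.adler32(payload)

# Raw deflate: no header, no trailer, no checksum.
co = zlib.compressobj(wbits=-15)
raw = co.compress(payload) + co.flush()
assert zlib.decompress(raw, wbits=-15) == payload
```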
What These Algorithms Cannot Do
Non-cryptographic checksums guarantee nothing against intentional manipulation. Given a target CRC32 value, an attacker can append four bytes to any file to produce exactly that checksum; this is a trivial algebraic inversion. They can also construct two different files with the same CRC32 in seconds on commodity hardware. For anything adversarial (file authenticity, tamper detection, digital signatures), use SHA-256, SHA-3, or BLAKE3. CRC32 is not a substitute for a cryptographic hash.
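To show how little effort the four-byte trick takes, here is a Python sketch of the standard CRC-reversal technique. It relies on one property of the reflected CRC32 table: the top byte of each of the 256 entries is distinct, so it identifies which entry was used. The function works backwards from the desired final register to find the four table indices, then forwards from the file's real register to choose bytes that select them. `forge_suffix` and the sample messages are illustrative.

```python
import zlib


def make_table(poly: int = 0xEDB88320) -> list[int]:
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()
# Each entry's top byte is unique, so it tells us which index was used.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}
assert len(TOP_BYTE_TO_INDEX) == 256


def forge_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    # Backwards: from the desired final register (before the final NOT),
    # recover the table index each of the four appended bytes must select.
    reg = target_crc ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        idx = TOP_BYTE_TO_INDEX[reg >> 24]
        indices.append(idx)
        reg = ((reg ^ TABLE[idx]) << 8) & 0xFFFFFFFF  # low byte unknown, unused
    indices.reverse()

    # Forwards: from the real register after `data`, pick each byte so the
    # lookup lands on the required index.
    reg = zlib.crc32(data) ^ 0xFFFFFFFF
    suffix = bytearray()
    for idx in indices:
        suffix.append((reg ^ idx) & 0xFF)
        reg = (reg >> 8) ^ TABLE[idx]
    return bytes(suffix)


original = b"Pay Alice $10"
tampered = b"Pay Mallory $10,000"
forged = tampered + forge_suffix(tampered, zlib.crc32(original))
assert zlib.crc32(forged) == zlib.crc32(original)
```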
Also worth noting: all three algorithms are non-keyed. There is no secret involved. If you need to authenticate a message as coming from a specific sender, you want HMAC-SHA256, not a CRC.
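For contrast, a minimal Python sketch of keyed authentication using the standard `hmac` and `hashlib` modules; the key and messages are placeholders. Anyone can recompute `zlib.crc32(message)`, but producing a valid HMAC tag requires the key, and `hmac.compare_digest` compares tags in constant time to avoid leaking timing information.

```python
import hashlib
import hmac
import zlib

key = b"shared-secret-key"              # placeholder: known only to the two parties
message = b"transfer 100 to account 42"

# A CRC has no secret input: whoever alters the message can simply
# recompute it, so it says nothing about who sent the message.
crc_tag = zlib.crc32(message)

# An HMAC-SHA256 tag can only be produced by someone holding the key.
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


assert verify(key, message, tag)
assert not verify(key, b"transfer 100 to account 666", tag)
```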
Performance Characteristics
On modern hardware, a software CRC32 table implementation processes roughly 500 to 1,000 MB/s per core. Using SSE4.2 hardware instructions, CRC32C reaches 10 to 30 GB/s. Adler-32 in software sits in the 800 to 1,500 MB/s range. Its per-byte work is just two additions; the costly part is the modulo by a prime, which is why implementations such as zlib's defer it, reducing the sums only once every 5,552 bytes, the longest run for which 32-bit accumulators cannot overflow. For embedded systems with no hardware acceleration, Adler-32's lack of a lookup table makes it preferable when ROM or RAM is scarce.
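The deferred-modulo idea can be sketched as follows, modeled on the bound stated in zlib's source for its NMAX constant. In Python, integers never overflow, so the block size only matters for illustration; in C, it is what keeps two 32-bit accumulators from overflowing between reductions. `adler32_deferred` is a hypothetical name.

```python
import zlib

BASE = 65521
NMAX = 5552  # largest n with 255*n*(n+1)/2 + (n+1)*(BASE-1) <= 2**32 - 1
assert 255 * NMAX * (NMAX + 1) // 2 + (NMAX + 1) * (BASE - 1) <= 2**32 - 1


def adler32_deferred(data: bytes, value: int = 1) -> int:
    """Adler-32 with the modulo applied once per NMAX-byte block."""
    a, b = value & 0xFFFF, value >> 16
    for start in range(0, len(data), NMAX):
        for byte in data[start:start + NMAX]:
            a += byte  # plain additions in the hot loop, no modulo
            b += a
        a %= BASE      # reduce once per block
        b %= BASE
    return (b << 16) | a


big = b"\xff" * 20000
assert adler32_deferred(big) == zlib.adler32(big)
```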
The 32-bit output size is a deliberate engineering choice. At 32 bits, the probability of an undetected random error is roughly 1 in 4 billion โ adequate for the frame sizes and file sizes these algorithms were designed to protect. For multi-gigabyte objects where you want even stronger accidental-error detection, CRC64 variants exist, but they remain niche compared to the ubiquitous 32-bit forms.
Understanding these three algorithms (their polynomials, their edge cases, their strengths, and their deliberate limitations) gives you a solid foundation for making the right integrity-checking choice in your own storage, networking, or compression code. Sometimes the right tool really is a 1970s cyclic polynomial that, with a little hardware help, keeps pace with your memory bus.