5 May 2024

The mistakes i made when developing bzip3 are as follows:

applying LZP front-to-back (a mistake shared by most codecs)
no end-of-file marker
including an RLE stage (it looked lucrative at first when i tested it on some corpora, but it turned out to be a mistake)
following LZ4's idea of separate frame and block formats (people get confused to hell and back when they learn that bz3_decompress uses a different format than the bzip3 CLI tool; tl;dr: the API function knows more about the input, so it can compress more efficiently, while the CLI tool has to take pipe input)
using a strong statistical model by default without a knob to turn it off and replace it with something faster
not taking advantage of block size reduction on known input sizes
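
To make the frame/block, end-of-file marker and known-input-size points a bit more concrete, here is a minimal sketch of the two framing situations. It is not bzip3's actual format: zlib stands in for the real codec, and every name, header layout and constant below is a hypothetical chosen for illustration. The one-shot path knows the input size up front, so it can record it and shrink the block size to fit; the streaming path reads from a pipe-like source, does not know the total size, and so relies on an explicit end marker to tell a finished stream from a truncated one.

```python
import io
import struct
import zlib

DEFAULT_BLOCK = 1 << 20  # hypothetical configured block size (1 MiB)


def effective_block_size(configured, input_size=None):
    """Shrink the block size to the input size when that size is known."""
    if input_size is None:
        return configured  # streaming: nothing to clamp against
    return max(1, min(configured, input_size))


def encode_known_size(data, block=DEFAULT_BLOCK):
    """One-shot framing: the header carries the total size and the clamped block size."""
    block = effective_block_size(block, len(data))
    out = [struct.pack("<QI", len(data), block)]
    for i in range(0, len(data), block):
        chunk = zlib.compress(data[i:i + block])  # stand-in codec, not bzip3
        out.append(struct.pack("<I", len(chunk)) + chunk)
    return b"".join(out)


def decode_known_size(buf):
    """Decode until the recorded total size is reached; no end marker is needed."""
    total, _block = struct.unpack_from("<QI", buf, 0)
    pos = struct.calcsize("<QI")
    out = bytearray()
    while len(out) < total:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        out += zlib.decompress(buf[pos:pos + n])
        pos += n
    return bytes(out)


def encode_stream(reader, block=DEFAULT_BLOCK):
    """Streaming framing: size unknown, so a zero-length chunk marks the end."""
    out = [struct.pack("<I", block)]
    while True:
        raw = reader.read(block)
        if not raw:
            break
        chunk = zlib.compress(raw)
        out.append(struct.pack("<I", len(chunk)) + chunk)
    out.append(struct.pack("<I", 0))  # explicit end-of-stream marker
    return b"".join(out)


def decode_stream(buf):
    """Decode chunks until the end marker; a missing marker means truncation."""
    pos = 4  # skip the block-size header
    out = bytearray()
    while True:
        if pos + 4 > len(buf):
            raise ValueError("truncated stream: missing end marker")
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if n == 0:
            return bytes(out)
        if pos + n > len(buf):
            raise ValueError("truncated stream: incomplete chunk")
        out += zlib.decompress(buf[pos:pos + n])
        pos += n
```

With a 100-byte input, `effective_block_size(DEFAULT_BLOCK, 100)` gives 100 rather than 1 MiB, which is the kind of saving the last item refers to. Cutting the final four bytes off the output of `encode_stream` makes `decode_stream` raise instead of silently returning partial data, which is what the missing end-of-file marker would have bought.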