5 May 2024

The mistakes i made when developing bzip3 are as follows:

  • applying LZP front-to-back (mistake shared by most codecs)
  • no end of file marker
  • RLE stage (it looked promising at first when i tested it on some corpora, but it turned out to be a mistake)
  • following LZ4's idea of separate frame and block formats (people get confused to hell and back when they learn that bz3_decompress uses a different format than the bzip3 CLI tool; tl;dr: the API function knows more about the input, so it can compress more efficiently, while the CLI tool has to accept pipe input)
  • using a strong statistical model by default, with no knob to turn it off and swap in something faster
  • not taking advantage of block size reduction on known input sizes
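
To make two of these points concrete (the missing end of file marker and the block size reduction for known input sizes), here is a rough sketch of a toy block container. It is not bzip3's actual format: the record layout, the names MAX_BLOCK, EOS, pick_block_size, write_stream and read_stream are all made up for illustration, and zlib stands in for the real codec.

```python
import struct
import zlib

MAX_BLOCK = 1 << 20  # hypothetical default block size (1 MiB)
EOS = 0              # a zero-length record marks the end of the stream


def pick_block_size(known_size=None, max_block=MAX_BLOCK):
    """Shrink the block to the input size when the size is known up front."""
    if known_size is None:  # pipe input: size unknown, keep the maximum
        return max_block
    return max(1, min(known_size, max_block))


def write_stream(src, dst, known_size=None):
    """Write [block size] then [length][payload] records, then an EOS record."""
    block = pick_block_size(known_size)
    dst.write(struct.pack("<I", block))
    while True:
        chunk = src.read(block)
        if not chunk:
            break
        payload = zlib.compress(chunk)  # stand-in codec, never empty
        dst.write(struct.pack("<I", len(payload)))
        dst.write(payload)
    dst.write(struct.pack("<I", EOS))


def read_stream(src):
    """Decode a stream; raise if it ends before the EOS record."""
    if len(src.read(4)) != 4:  # block size header (unused by this toy reader)
        raise ValueError("truncated stream: missing header")
    out = bytearray()
    while True:
        raw = src.read(4)
        if len(raw) != 4:
            raise ValueError("truncated stream: missing end marker")
        (length,) = struct.unpack("<I", raw)
        if length == EOS:
            return bytes(out)
        payload = src.read(length)
        if len(payload) != length:
            raise ValueError("truncated block")
        out += zlib.decompress(payload)
```

With an explicit end marker, a reader can tell a complete stream from one that was cut off on a block boundary. And when the caller passes known_size, a small input no longer advertises (or has buffers allocated for) the full default block.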