The mistakes I made when developing bzip3 are as follows:
- applying LZP front-to-back (mistake shared by most codecs)
- no end of file marker
- RLE stage (it looked lucrative at first when I tested it on some corpora, but ultimately it was a mistake)
- following LZ4's idea of separate frame and block formats (people get confused to hell and back when they learn that bz3_decompress uses a different format than the bzip3 CLI tool; tl;dr: the API function knows more about the input, so it can compress more efficiently, while the CLI tool has to accept piped input)
- using a strong statistical model by default, with no knob to swap it for something faster
- not taking advantage of block size reduction on known input sizes
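
To make the last point concrete, here is a rough sketch of what shrinking the block size for a known input size could look like. It is only an illustration: `effective_block_size`, `MIN_BLOCK_SIZE`, and `MAX_BLOCK_SIZE` are hypothetical names and limits, not taken from the bzip3 sources, and this is my reading of the idea rather than how bzip3 would have to do it.

```python
from typing import Optional

# Hypothetical limits for illustration only; not taken from the bzip3 sources.
MIN_BLOCK_SIZE = 64 * 1024
MAX_BLOCK_SIZE = 256 * 1024 * 1024


def effective_block_size(requested: int, input_size: Optional[int]) -> int:
    """Return the block size to allocate for one compression job.

    requested  -- block size the caller asked for, in bytes
    input_size -- total input length in bytes, or None when it is unknown
                  (for example when reading from a pipe)
    """
    if not MIN_BLOCK_SIZE <= requested <= MAX_BLOCK_SIZE:
        raise ValueError("requested block size out of range")
    if input_size is None:
        # Streaming input: the size is unknown, so keep the requested size.
        return requested
    if input_size < 0:
        raise ValueError("input size must be non-negative")
    # Known input: never allocate a block larger than the data itself,
    # but stay at or above the smallest allowed block size.
    return max(MIN_BLOCK_SIZE, min(requested, input_size))


if __name__ == "__main__":
    sixteen_mib = 16 * 1024 * 1024
    print(effective_block_size(sixteen_mib, 100_000))  # known, small input
    print(effective_block_size(sixteen_mib, None))     # piped input
```

With a 16 MiB request, a known 100,000-byte input gets a 100,000-byte block, while piped input keeps the full 16 MiB because nothing is known about its length up front.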
