:: commit 2d1e49d0d2fed95a13928d58fb4363de3d37720f

Kamila Szewczyk <27734421+kspalaiologos@users.noreply.github.com> — 2022-05-15 16:08

parents: 8f95b59f79

Update README.md

diff --git a/README.md b/README.md
index 7227378..8544d82 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,59 @@ $ make
 $ sudo make install
 ```
 
+## Perl source code benchmark
+
+First, I downloaded every Perl 5 release ever published and decompressed the archives.
+
+```bash
+% wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories https://www.cpan.org/src/5.0/
+% for g in *.gz; do gunzip "$g"; done
+% ls -la | wc -l
+262
+```
+
+Then, I packed all the resulting `.tar` files into a single `all.tar` archive and compressed it with several compressors:
+
+```
+xz -T16 -9 -k all.tar  10829.91s user 26.91s system 1488% cpu 14658M memory 12:09.24 total
+bzip2 -9 -k all.tar  981.78s user 9.77s system 95% cpu 8M memory 17:16.64 total
+bzip3 -e -b 256 -j 12 all.tar  2713.81s user 16.28s system 634% cpu 18301M memory 7:10.10 total
+zstd -T12 -16 all.tar  4162.94s user 16.40s system 1056% cpu 687M memory 6:35.62 total
+```
+
+The resulting file sizes:
+
+* LZMA (xz) - 2'056'645'240 bytes
+* bzip2 - 3'441'163'911 bytes
+* bzip3 - 1'001'957'587 bytes
+* Zstandard - 3'076'143'660 bytes
+
+Next, I ran `lrzip` on the original `all.tar` to perform long-range deduplication: once without a backend compressor, once with LZMA and once with bzip2:
+
+```
+% time lrzip -n -o all_none.tar.lrz all.tar
+546.17s user 160.87s system 102% cpu 10970M memory 11:28.00 total
+
+% time lrzip --lzma -o all_lzma.tar.lrz all.tar
+702.16s user 161.87s system 122% cpu 10792M memory 11:44.83 total
+
+% time lrzip -b -o all_bzip2.tar.lrz all.tar
+563.93s user 147.38s system 112% cpu 10970M memory 10:34.10 total
+```
+
+Finally, I compressed the deduplicated but otherwise uncompressed `all_none.tar.lrz` file with bzip3:
+
+```
+% time bzip3 -e -b 256 -j 2 all_none.tar.lrz
+32.05s user 0.76s system 146% cpu 2751M memory 22.411 total
+```
+
+The resulting file sizes:
+
+* lrzip + bzip3 - 60'672'608 bytes
+* lrzip + lzma - 64'774'202 bytes
+* lrzip + bzip2 - 75'685'065 bytes
+
 ## Disclaimers
 
 **I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE USE OF THIS PROGRAM/LIBRARY, HOWSOEVER CAUSED.**
@@ -30,7 +83,7 @@ That is not to say this program is inherently unreliable. Indeed, I very much ho
 
 **Bzip3's performance is _heavily_ dependent on the compiler. x64 Linux clang13 builds usually can go as high as 17MiB/s compression and 23MiB/s decompression _per thread_. Windows and 32-bit builds might be considerably slower.**
 
-## Benchmarks
+## Corpus benchmarks
 
 ![visualisation of the benchmarks](etc/benchmark.png)
 