Which Linux/UNIX compression algorithm is best?

Intro

In this article, we'll show compression and decompression benchmarks for four of the most popular Linux compression tools: gzip, bzip2 (using lbzip2), xz, and lz4.

Spoiler: There isn't a single "best" compression algorithm. Every tool and algorithm has tradeoffs: speed, RAM usage, CPU usage, compression ratio, and whether decompression is faster or slower than compression.

However, after seeing these benchmarks and our summary, you should be able to make an educated decision on which algorithm to use per file/folder you're compressing.

Important notes

  • The compression benchmark file is a 619 MB uncompressed tar file containing a debootstrap'd Ubuntu 20.04 (Focal Fossa) root filesystem. It holds a broad mixture of file types, some of which compress well and some of which don't compress at all because they are already compressed.
  • These stats come from a single run at each of level 1 and level 9, so expect some run-to-run variation.
  • YMMV: depending on what you're (de-)compressing, different compression algorithms may work better or worse.
  • Compression tools which support threading (xz and lbzip2) were run with 10 threads in these benchmarks.
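
The article does not list the exact command lines used, so the following is only a sketch of how a run at a given level and thread count might be invoked. The function name `compress_cmd` is hypothetical; the flags are the standard ones for each tool (`-T` for xz threads, `-n` for lbzip2 worker threads, `-c` to write to standard output). gzip and lz4 are shown without a thread option, matching the note above.

```shell
# Print a command line for one compressor at a given level and thread count.
# Assumption: these are plausible invocations, not the article's exact ones.
# Usage: compress_cmd TOOL LEVEL THREADS
compress_cmd() {
    case "$1" in
        xz)     echo "xz -$2 -T$3 -c" ;;        # -T sets the number of threads
        lbzip2) echo "lbzip2 -$2 -n $3 -c" ;;   # -n sets the number of worker threads
        gzip)   echo "gzip -$2 -c" ;;           # threads argument is ignored
        lz4)    echo "lz4 -$2 -c" ;;            # threads argument is ignored
        *)      echo "unknown tool: $1" >&2; return 1 ;;
    esac
}
```

With a hypothetical input file `rootfs.tar`, a level 1 xz run with 10 threads would then look like `$(compress_cmd xz 1 10) rootfs.tar > rootfs.tar.xz`.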

Benchmark System:

Distro:          Ubuntu Linux (Server) 20.04 (Focal Fossa)
Kernel:          Linux 5.4.0-66-generic #74-Ubuntu SMP x86_64
CPU:             2x Intel(R) Xeon(R) CPU E5-2630L 0 @ 2.00GHz (6c/12t per CPU, total 12c/24t)
RAM:             64GB DDR3 ECC (1333 MHz) - 8x 8GB DIMMs
Disks:           2x 500GB Samsung 860 EVO SSDs
Filesystem/RAID: ZFS with SSDs in ZFS Mirror, ashift 12, all pools filesystem compressed with lz4

Columns:

  • SECS PER MB SAVED measures how many seconds, on average, the compressor takes to remove one megabyte (1024 * 1024 = 1,048,576 bytes) from the file, i.e. COMP TIME divided by REDUCTION. The lower the number, the better: a lower number means the compressor saves space faster than the others.
  • RATIO is the size of the compressed file as a percentage of the original file size (lower is better).
  • REDUCTION refers to how many megabytes the original file was reduced by after compression.
  • FSIZE refers to the size of the resulting compressed file.
  • (DE)COMP TIME is the benchmarked time the compressor took to compress/decompress the file.
  • MB DECOMP PER SEC is the estimated number of megabytes the compressor can decompress per second (based on uncompressed megabytes, not compressed megabytes).
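
As a rough illustration of how these columns relate to each other, the sketch below derives them from four measured values using awk. The function name `derive_metrics` is hypothetical, and the original size of 618 MB used in the example is an assumption taken from FSIZE + REDUCTION in the tables (for example 163M + 455M for xz), not a figure stated in the article.

```shell
# Derive the table columns from raw measurements.
# Usage: derive_metrics ORIG_MB COMPRESSED_MB COMP_SECS DECOMP_SECS
derive_metrics() {
    awk -v orig="$1" -v comp="$2" -v ctime="$3" -v dtime="$4" 'BEGIN {
        saved = orig - comp                      # REDUCTION in MB
        printf "RATIO=%d%% REDUCTION=%dM SECS_PER_MB_SAVED=%.3f MB_DECOMP_PER_SEC=%.2f\n",
            int(comp / orig * 100), saved, ctime / saved, orig / dtime
    }'
}

# xz at level 1, using the values from the benchmark tables:
derive_metrics 618 163 8.782 18.737
# prints: RATIO=26% REDUCTION=455M SECS_PER_MB_SAVED=0.019 MB_DECOMP_PER_SEC=32.98
```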

Where to download/install each compressor?

All of the compressors mentioned in this article are generally available via Linux/UNIX package managers, such as apt, yum, dnf, and pacman.

Debian/Ubuntu

apt update -y
apt install -y xz-utils gzip lbzip2 liblz4-tool

CentOS/Fedora/Oracle/RHEL

dnf install -y lz4 xz gzip

# For 'lbzip2', you'll likely need to enable the EPEL repo for your distro.
# On CentOS, Oracle Linux and RHEL 8, this can usually be done by installing the
# EPEL release package like so (Fedora does not use EPEL, so skip this step there):
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

# Now you can install lbzip2 using your package manager
dnf install -y lbzip2

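Arch Linux

# -Syu refreshes the package databases and upgrades the system before installing,
# which avoids the partial-upgrade state that 'pacman -Sy <package>' can cause.
pacman -Syu lz4 xz lbzip2 gzip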

Looking to buy a Virtual or Dedicated server? Do you like privacy and low prices? Try Privex!

We have virtual servers starting from just US$0.99/mo, and dedicated servers starting from as low as US$50/mo

Unlike other hosts, we don't ask for any personal details - only a name (can be an alias / username), and an e-mail address so we can send you your server details and renewal invoices.

We also accept several different cryptocurrencies with our own in-house payment processor - no third parties involved in processing your payments.

At the time of writing, we currently accept: Bitcoin (BTC), Litecoin (LTC), Monero (XMR), Dogecoin (DOGE), HIVE, and HBD

Order a server TODAY! Privacy is affordable™


Compress/Decompress Benchmarks

LEVEL 1 COMPRESSION (minimum compression)

Level 1 compression is used when you want to compress a file quickly, rather than with the best possible compression ratio.

The following benchmarks are from compressing and decompressing a 619MB tarball containing Ubuntu 20.04 using level 1 compression.

With level 1 compression, lz4 is the clear winner in terms of pure speed, at 0.008 seconds per megabyte saved, while gzip is the slowest at 0.039 seconds per megabyte saved.

In terms of the "strongest" compression, xz is king, shrinking the file to 26% of its original size, with bzip2 (via lbzip2) in second place at 30%.

COMPRESSION

NAME        COMP TIME          FSIZE     RATIO       REDUCTION     SECS PER MB SAVED
lbzip2:     6.023 seconds      190M      30.00 %     428M          0.014
lz4:        2.768 seconds      286M      46.00 %     332M          0.008
gzip:       15.645 seconds     223M      36.00 %     395M          0.039
xz:         8.782 seconds      163M      26.00 %     455M          0.019

DECOMPRESSION

As mentioned at the start of the article, every compression algorithm/tool has its tradeoffs. xz's high compression is paid for by very slow decompression, while lz4 decompresses even faster than it compresses.

lz4's extremely fast compression and decompression is one of the reasons it's used in realtime / on-the-fly compression systems such as ZRAM, a Linux kernel module that provides a compressed block device in RAM. It is commonly used as a swap device, so more data fits in RAM than is physically available, without swapping to disk.

NAME        DECOMP TIME        MB DECOMP PER SEC
lbzip2:     2.142 seconds      288.51 MB/S
lz4:        1.590 seconds      388.67 MB/S
gzip:       6.832 seconds      90.45 MB/S
xz:         18.737 seconds     32.98 MB/S
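
How the timings were collected is not stated in the article. One way to take a comparable wall-clock measurement is sketched below; the function name `elapsed_secs` is hypothetical, and the sketch assumes GNU `date` (for the `%N` nanosecond format). It discards the command's output, so writing the decompressed file to disk is not included in the measurement.

```shell
# Print the wall-clock seconds a command takes, with its output discarded.
# Assumption: GNU date is available (supports the %N nanoseconds format).
# Usage: elapsed_secs COMMAND [ARGS...]
elapsed_secs() {
    _start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    _end=$(date +%s.%N)
    awk -v s="$_start" -v e="$_end" 'BEGIN { printf "%.3f\n", e - s }'
}
```

For example, `elapsed_secs lz4 -d -c rootfs.tar.lz4` would time an lz4 decompression of a hypothetical `rootfs.tar.lz4`. Since a single run can be noisy, repeating the measurement a few times and comparing the results gives a better picture.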

LEVEL 9 COMPRESSION (maximum compression)

Level 9 compression is used when you want to compress a file with the best possible compression ratio, allowing the compressor to take as much time and CPU/RAM as it needs to maximize compression.

The following benchmarks are from compressing and decompressing a 619MB tarball containing Ubuntu 20.04 using level 9 compression.

You'll notice that the completion time is much higher than with level 1 compression, with xz now taking 136 seconds to compress the file, instead of 8.8 seconds with level 1.

Most compressors had a considerable boost in compression ratio (lower is better), with lz4 dropping from 46% (level 1) down to 37% (level 9), while xz dropped from 26% to 20%. lbzip2, however, only dropped by 2 percentage points (30% to 28%).

With xz, it's clear that increasing the compression level dramatically reduces its speed per megabyte saved, with a poor 0.276 seconds per megabyte, compared to the roughly 14.5x faster 0.019 seconds per megabyte it achieved with level 1.

COMPRESSION

NAME        COMP TIME           FSIZE     RATIO       REDUCTION     SECS PER MB SAVED
lbzip2:     6.028 seconds       176M      28.00 %     442M          0.013
lz4:        33.698 seconds      233M      37.00 %     385M          0.087
gzip:       118.157 seconds     199M      32.00 %     419M          0.281
xz:         136.160 seconds     125M      20.00 %     493M          0.276
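
The cost of moving from level 1 to level 9 can be checked directly from the two compression tables. The sketch below divides the level 9 figure by the level 1 figure; `slowdown` is a hypothetical helper name, and the inputs are the xz values from the tables.

```shell
# Print how many times larger the first value is than the second.
# Usage: slowdown LEVEL9_VALUE LEVEL1_VALUE
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 0.276 0.019     # xz, SECS PER MB SAVED: prints 14.5
slowdown 136.160 8.782   # xz, COMP TIME:         prints 15.5
```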

DECOMPRESSION

You may notice that lz4 decompresses even faster despite a higher compression level. While its compression ratio is weak compared to the other algorithms, it makes up for that in pure speed: higher compression levels take more time to compress, but actually REDUCE decompression time.

xz also decompressed faster at level 9 (15.5 seconds, versus 18.7 seconds at level 1). lbzip2 and gzip, however, took a little more time to decompress than they did with level 1. Their decompression time is only slightly affected, unlike gzip's compression time, which is considerably higher (lbzip2's compression time barely changed).

NAME        DECOMP TIME        MB DECOMP PER SEC
lbzip2:     2.776 seconds      222.62 MB/S
lz4:        1.484 seconds      416.44 MB/S
gzip:       7.666 seconds      80.61 MB/S
xz:         15.532 seconds     39.78 MB/S

Summary - which compression algorithms are best for each use case?

Fastest compression and decompression: LZ4

Highest overall compression ratio: XZ

Compromise between compression ratio and speed: BZIP2 using LBZIP2

Most widely used / compatible compression algorithm: GZIP
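
The summary above could be turned into a small helper for scripts, sketched below. The function name `compressor_for`, the use-case keywords, and the chosen levels are all assumptions made for illustration; the article only names the tool for each use case.

```shell
# Map a use case from the summary to a compressor command writing to stdout.
# Assumption: keywords and levels are illustrative, not from the article.
# Usage: compressor_for speed|ratio|balance|compat
compressor_for() {
    case "$1" in
        speed)   echo "lz4 -1 -c" ;;      # fastest compression and decompression
        ratio)   echo "xz -9 -c" ;;       # highest overall compression ratio
        balance) echo "lbzip2 -9 -c" ;;   # compromise between ratio and speed
        compat)  echo "gzip -c" ;;        # most widely used / compatible
        *)       echo "unknown use case: $1" >&2; return 1 ;;
    esac
}
```

A directory could then be archived with, for example, `tar -cf - mydir | $(compressor_for ratio) > mydir.tar.xz`, where `mydir` is a hypothetical directory name.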

