Sunday, August 2, 2009

A comparison of compressors on SQL

I was cleaning and reorganizing data on my computers, and taking the opportunity to compress anything large that I wouldn't need to look inside of for, e.g., indexing.



I looked at one of my snapshots of Rosetta Code's database. Uncompressed, it occupied over 600MB. Compressed with bzip2, it occupied about one tenth of that. I decompressed and recompressed it with rzip, and was sufficiently surprised at the results that I tried to do a fairly thorough comparison of bzip2, rzip and gzip. Based on my use case, I collected data on compression ratio and speed. I did not collect data on RAM usage. (Though I do know that rzip at max compression exceeds the amount of RAM available on my basic Slice.)



Compression took 5m29s with bzip2, 3m1s with rzip, and 1m33s with gzip, yielding compression ratios of 11.2, 60.5 and 7.48 respectively.
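As a sanity check on those figures: the ratio is just the original size divided by the compressed size. This short Python sketch recomputes the ratios, plus a rough throughput in MB/s, from the byte counts and `real` times in the raw data further down. The variable and function names are my own, made up for illustration.

```python
# Sizes in bytes, copied from the "ls -l" lines in the raw data.
ORIGINAL_SIZE = 633_986_350

compressed_sizes = {
    "bzip2": 56_789_389,
    "rzip": 10_472_407,
    "gzip": 84_747_906,
}

# Wall-clock ("real") times in seconds, copied from the time(1) output.
wall_times = {
    "bzip2": 5 * 60 + 28.838,
    "rzip": 3 * 60 + 0.765,
    "gzip": 1 * 60 + 33.356,
}


def compression_ratio(original: int, compressed: int) -> float:
    """Input size over output size; higher means better compression."""
    return original / compressed


def throughput_mb_s(original: int, seconds: float) -> float:
    """Input consumed per second, in MB of 10**6 bytes (the unit dd reports)."""
    return original / seconds / 1e6


for name, size in compressed_sizes.items():
    ratio = compression_ratio(ORIGINAL_SIZE, size)
    speed = throughput_mb_s(ORIGINAL_SIZE, wall_times[name])
    print(f"{name:6} ratio {ratio:6.2f}  {speed:5.2f} MB/s")
```

Run as-is, it gives ratios that round to 11.2, 60.5 and 7.48, and shows gzip getting through the input roughly 3.5 times as fast as bzip2.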



Here's the raw data:

shortcircuit@dodo~/comprcompa
04:54:56 $ ./comprcompa.sh rcode_20090704_2029.sql
The primary purpose here is to compare compression ratios for database SQL dump backups.
Running environment is a Gentoo system running an AMD Phenom 9650.
As most things in Gentoo are typically compiled from source, these are the CFLAGS used:
CFLAGS="-march=amdfam10 -O2 -pipe"
our source database dump
-rw------- 1 shortcircuit shortcircuit 633986350 2009-08-02 02:58 rcode_20090704_2029.sql
Starting memory conditions; If there's a great deal of room for cache, we won't hit disk as frequently
total used free shared buffers cached
Mem: 7936588 7730932 205656 0 207452 6504216
-/+ buffers/cache: 1019264 6917324
Swap: 0 0 0
streaming original file into /dev/null via dd, to pull it into cache
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.07369 s, 590 MB/s
starting uptime: 04:55:01 up 13 days, 17:13, 3 users, load average: 0.67, 0.85, 0.71
Starting bzip2 -fk9

real 5m28.838s
user 5m28.279s
sys 0m0.537s
Post-bzip2 uptime: 05:00:29 up 13 days, 17:18, 3 users, load average: 1.05, 1.03, 0.83
Pull original back into cache, for fair comparison
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.097 s, 578 MB/s
Starting rzip -k9

real 3m0.765s
user 3m0.445s
sys 0m0.307s
Post-rzip uptime: 05:03:31 up 13 days, 17:21, 3 users, load average: 1.21, 1.08, 0.87
Pull original back into cache, for fair comparison
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.0498 s, 604 MB/s
Starting gzip -9

real 1m33.356s
user 1m32.011s
sys 0m0.890s
Post-gzip uptime: 05:05:06 up 13 days, 17:23, 3 users, load average: 1.11, 1.07, 0.89
Final file sizes:
-rw------- 1 shortcircuit shortcircuit 56789389 2009-08-02 02:58 rcode_20090704_2029.sql.bz2
-rw------- 1 shortcircuit shortcircuit 84747906 2009-08-02 02:58 rcode_20090704_2029.sql.gz
-rw------- 1 shortcircuit shortcircuit 10472407 2009-08-02 05:03 rcode_20090704_2029.sql.rz
shortcircuit@dodo~/comprcompa
05:05:06 $
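If you want to repeat something like this, here's a rough Python sketch of the same procedure: warm the page cache by reading the dump once (the dd step above), then time each compressor at level 9 and record the output size. It's a hypothetical stand-in, not the comprcompa.sh script I actually ran. It uses Python's standard bz2 and gzip modules, which wrap libbz2 and zlib, so timings should be in the same ballpark as the command-line tools but won't match them exactly. rzip has no standard-library binding, so it's left out.

```python
import bz2
import gzip
import os
import shutil
import sys
import time

CHUNK = 1 << 20  # 1 MiB read/write buffer


def warm_cache(path: str) -> None:
    """Read the whole file once and discard it, like dd'ing it to /dev/null."""
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass


def time_compressor(path: str, suffix: str, opener) -> tuple[float, int]:
    """Stream `path` through `opener` at level 9; return (seconds, output bytes)."""
    out_path = path + suffix  # overwrites any existing output file
    warm_cache(path)  # re-warm before each run, for a fair comparison
    start = time.perf_counter()  # wall-clock, comparable to time(1)'s "real"
    with open(path, "rb") as src, opener(out_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst, CHUNK)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(out_path)


def benchmark(path: str) -> dict[str, tuple[float, int, float]]:
    """Return {name: (seconds, compressed size, compression ratio)}."""
    original = os.path.getsize(path)
    results = {}
    for name, suffix, opener in (("bzip2", ".bz2", bz2.open), ("gzip", ".gz", gzip.open)):
        seconds, size = time_compressor(path, suffix, opener)
        results[name] = (seconds, size, original / size)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, (secs, size, ratio) in benchmark(sys.argv[1]).items():
        print(f"{name:6} {secs:8.2f}s {size:>12} bytes  ratio {ratio:.2f}")
```

Saved as, say, `compare.py` (a made-up name), it would be run as `python3 compare.py rcode_20090704_2029.sql`.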
