[lug] ssh compression

Paul Walmsley shag-blug at booyaka.com
Wed May 1 13:28:13 MDT 2002


On Wed, 1 May 2002, D. Stimits wrote:

> "Sexton, George" wrote:
> >
> > The best solution I have found for things like this is to write a cron job
> > that bzips the data, pipes the output through gpg, and then use FTP to move
> > the encrypted data. You could use NCFTPGET at the destination site to
> > retrieve the file. I move a client's SQL Database to my site once a week
> > this way. 2.5GB compresses down to 350MB. I also do it early in the morning,
> > so it doesn't use bandwidth during production hours.
>
> One advantage here is that bzip is the better compression, and any ssh
> tunneling should then have compression turned off. The bzip with -9
> compression (or even just -7) is very very good, and a cron job can use
> a nice level to cut down how much cpu is used, whereas you might find
> problems if you cut the nice level to very small priority in ssh.
> Separation of compression from transmission is appealing for batch
> processes.
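for reference, the batch job George describes could be sketched roughly like this.  (the paths, the gpg recipient, and the ftp host below are placeholders of mine, not from his setup.)

```shell
# rough sketch of the batch job described above; the paths, the gpg
# recipient, and the ftp host are placeholders, not from the post.
dump=/tmp/db.sql
printf 'create table t (id int);\n' > "$dump"   # stand-in for the real dump

# run the compression at a low nice level so it doesn't compete with
# production load; -9 is bzip2's best (and slowest) setting, -k keeps
# the original file around.
nice -n 19 bzip2 -9 -k -f "$dump"

# encrypt and transfer -- these need a real key and a real host, so
# they are shown here but left commented out:
#   gpg --encrypt -r backup-key -o "$dump.bz2.gpg" "$dump.bz2"
#   ncftpput -u user ftp.example.com /incoming "$dump.bz2.gpg"
```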

the added compression from bzip2 comes at a price, though: it is much much
slower than gzip.

the upshot is, on my 700MHz P3 with PC100 SDRAM, at default compression
settings, bzip2 is between two and seven times slower than gzip.

two somewhat pathological examples follow.  (data that's already
compressed, such as image data or MP3s, will behave like the second file
below.)

a 10MB file full of zeros (bzip2 timing first, then gzip):
[shag at localhost good]$ dd if=/dev/zero of=foo count=10 bs=1024k ; time bzip2 foo ; bzip2 -d foo.bz2 ; time gzip foo ; gzip -d foo.gz
10+0 records in
10+0 records out
1.970u 0.050s 0:02.06 98.0%	0+0k 0+0io 120pf+0w
0.820u 0.050s 0:00.89 97.7%	0+0k 0+0io 97pf+0w

a 10MB file full of pseudorandom data (bzip2 timing first, then gzip):
[shag at localhost good]$ dd if=/dev/urandom of=foo count=10 bs=1024k ; time bzip2 foo ; bzip2 -d foo.bz2 ; time gzip foo ; gzip -d foo.gz
10+0 records in
10+0 records out
21.540u 0.330s 0:22.04 99.2%	0+0k 0+0io 121pf+0w
3.360u 0.310s 0:03.72 98.6%	0+0k 0+0io 97pf+0w
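if you want to run the same comparison on your own hardware, something like this works.  (the file location and size are arbitrary; substitute any file you care about for the dd step.)

```shell
# compress the same file both ways and compare wall time and output size.
f=/tmp/sample
dd if=/dev/zero of="$f" bs=1024k count=10 2>/dev/null   # or any file you like
time gzip  -c "$f" > "$f.gz"
time bzip2 -c "$f" > "$f.bz2"
ls -l "$f" "$f.gz" "$f.bz2"   # compare the resulting sizes too
```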

my guess is that bzip2 will also exercise your main memory bandwidth more
than gzip will, which could make the speed difference even more pronounced
on systems where the cpu/memory speed ratio is higher than mine.

but of course, bzip2 does provide somewhat better compression, as others
have noted.  so if your link is slow enough, your files compressible
enough, and cpu time cheap enough, bzip2 might work best.
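the trade-off can be made concrete: total time is roughly compression cpu time plus compressed size divided by link bandwidth.  a back-of-envelope calculation -- the 350MB bzip2 size comes from George's post, but the gzip size, both cpu times, and the 64 KB/s link speed are illustrative assumptions of mine:

```shell
# break-even sketch: total time = compress cpu time + transfer time.
# only the 350MB bzip2 figure is from the quoted post; everything else
# here is an assumed, illustrative number.
link_kbs=64
gz_total=$(awk "BEGIN { print 300  + 500*1024/$link_kbs }")   # gzip:  ~500MB out, ~300s cpu
bz_total=$(awk "BEGIN { print 2100 + 350*1024/$link_kbs }")   # bzip2: ~350MB out, ~2100s cpu
echo "gzip total:  ${gz_total}s"
echo "bzip2 total: ${bz_total}s"
# on a link this slow, bzip2's smaller output wins despite the extra cpu.
```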


- Paul



