[lug] recovering a tar file that spans CDs

Thu Jun 19 06:20:00 MDT 2008

Sean Reifschneider wrote:
> Kenneth D Weinert wrote:
>   
>> As a separate point, anyone have a better scheme for creating backups?
>> I don't mind saving to CD/DVD, but perhaps one file that spans disks
>> isn't the best choice, but I'd prefer to not have to sort out ahead of
>> time exactly which files will fit on each disk.
>>     
>
> If the stream is compressed without specific support for restarting the
> compression, you're probably screwed unless you can recover the bad part.
>   
Deflate streams can contain compression reset points (zlib calls them
full flushes), but only if the writer put them there, and I think you
may have to write some code with libz to recover the stream from one.
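
For what it's worth, here is a rough sketch of that kind of recovery in
Python's zlib module rather than C.  It assumes the writer really did do
full flushes, and the function name and the "start" offset are made up
for illustration.  It looks for the 4-byte empty stored block that a
flush leaves behind and tries a raw inflate from just after it:

```python
import zlib

# Empty stored block that zlib emits on a sync or full flush.
SYNC = b"\x00\x00\xff\xff"


def salvage_after_damage(stream: bytes, start: int = 0) -> bytes:
    """Return data inflated from the first usable flush point at or after start.

    Assumption: the stream was written with Z_FULL_FLUSH, so the dictionary
    was reset and inflation can restart cleanly right after a marker.
    """
    pos = stream.find(SYNC, start)
    while pos != -1:
        # Negative wbits means raw deflate: no gzip/zlib header is expected.
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        try:
            return inflater.decompress(stream[pos + len(SYNC):])
        except zlib.error:
            # The byte pattern occurred by chance inside compressed data.
            pos = stream.find(SYNC, pos + 1)
    raise ValueError("no usable flush point found")
```

Everything before the flush point is still lost, and the gzip CRC at the
end is never checked, so treat whatever comes back as unverified.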

> If you really must write tars to CDs, I wrote a tool years ago called
> "pytarsplit" that you can find on ftp.tummy.com:/pub/tummy/pytarsplit/
> which will take a tar file from stdin and a size and resulting file name
> and split the tar so that it's smaller than that size:
>
>    tar c . | pytarsplit 5000000 /tmp/mytarfile.%05d.tar
>   
Tar has native support for multi-volume archives, btw (GNU tar's -M /
--multi-volume option).  It would have to, when you consider its
original purpose, tape archives.  I don't know how well it fits our
typical needs though -- I think it's more of a "change media and press
return" type of splitter.

> That doesn't really allow you to compress the files though, as they won't
> consistently compress to the same size, so you're just wasting space if you
> compress...
I'm not sure what you're saying here.

BTW zip format handles compression (and encryption) of individual files,
and can be used in a streaming mode.  It requires a well-known extension
for unix ownership & permissions.  It also has a central directory at
the end of the file, so you can quickly find individual files.  Tar
requires you to scan the entire archive.
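
To illustrate the directory lookup, a small sketch with Python's
standard zipfile module.  The helper names and the demo file names are
invented for the example; only the zipfile calls are real:

```python
import io
import zipfile


def make_demo_zip() -> bytes:
    """Build a small in-memory zip with individually compressed members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(3):
            zf.writestr(f"file{i}.txt", f"contents of file {i}\n" * 50)
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Fetch one member by name without reading the others."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # Opening the archive reads the directory stored at the end of the
        # file; getinfo() is a lookup in that directory, not a scan.
        info = zf.getinfo(name)
        # read() seeks to the member's recorded offset and inflates only it.
        return zf.read(info)
```

Since each member is compressed on its own, a bad spot in one file
should not stop the others from being read.
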
> Ideally what you'd want is a "pytargzip" that you could put in the middle of
> that pipe that compressed each tar entry as it went by, but because the tar
> header is BEFORE the file, and it has the size, you would need to spool it
> off to disc, get the resulting compressed size, and then write the header
> and the file...  But that would compress each file independently.
That's an unusual format that breaks tools.  With libz it's trivial to
watch for tar headers and just reset the compression engine (a full
flush) at each one.  The result is still an ordinary compressed tar
that any tool which understands that format can read.
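
A minimal sketch of the idea, using Python's zlib binding instead of
libz directly.  The function names are hypothetical, it works on a tar
held in memory rather than a pipe, and it assumes plain members under
8 GiB whose size is stored as octal text in the header:

```python
import io
import tarfile
import zlib

BLOCK = 512  # tar headers and data are padded to 512-byte blocks


def compress_tar_with_resets(raw_tar: bytes) -> bytes:
    """gzip a tar stream, doing a full flush after every member."""
    # 16 + MAX_WBITS asks zlib for gzip framing instead of zlib framing.
    comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    out = []
    pos = 0
    while pos + BLOCK <= len(raw_tar):
        header = raw_tar[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # all-zero block: end-of-archive marker
            break
        # Size field: 12 bytes at offset 124, octal text.
        size = int(header[124:136].decode("ascii").strip("\0 ") or "0", 8)
        member_len = BLOCK + -(-size // BLOCK) * BLOCK  # header + padded data
        out.append(comp.compress(raw_tar[pos:pos + member_len]))
        out.append(comp.flush(zlib.Z_FULL_FLUSH))  # reset point
        pos += member_len
    out.append(comp.compress(raw_tar[pos:]))  # trailing zero blocks
    out.append(comp.flush())  # finish the gzip stream
    return b"".join(out)


def make_demo_tar() -> bytes:
    """Build a small uncompressed tar in memory for trying this out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"alpha\n" * 100), ("b.txt", b"bravo\n" * 300)]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The output is one continuous gzip stream, so "tar xzf" or
tarfile.open(..., "r:gz") reads it as usual.  The cost is somewhat worse
compression, since the dictionary is thrown away at every member.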