[lug] Bacula

George Sexton gsexton at mhsoftware.com
Thu Sep 15 08:44:51 MDT 2005


> -----Original Message-----
> From: lug-bounces at lug.boulder.co.us 
> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of Alan Robertson
> Sent: Wednesday, September 14, 2005 10:50 PM
> To: Boulder (Colorado) Linux Users Group -- General Mailing List
> Subject: Re: [lug] Bacula
> 
> 
> A simple example:
> 	If you want to back up 10 million small files, that means
> 	probably 10 million inserts to a database - which is a HUGE
> 	amount of work - and incredibly slow when compared to writing a
> 	10 million records to a flat file.
> 	And when you recycle a backup, that means 10 million deletions.
> 

Actually, for the single backup case, your argument is sound. For the second
backup, it's wrong. Consider a table structure:

BackedUpFiles
-------------
File_ID
FileName
Size
Owner
Attributes
Date
MD5SUM

Tape
------------
Tape_ID
InitialUseDate
UseCount
Etc.

FilesTape
---------
File_ID
Tape_ID

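It becomes pretty trivial to enumerate the tapes a specific file is on. For
the 2nd and subsequent backups, the space usage is vastly more efficient:
an unchanged file already has its row in BackedUpFiles, so each new backup
only adds a small (File_ID, Tape_ID) row to FilesTape.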
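
To make the lookup concrete, here is a minimal sketch of the three tables
using Python's built-in sqlite3 module. The column types, the sample file
name, the placeholder digest and the helper tapes_for_file are all
assumptions made for illustration; they are not taken from Bacula's actual
catalog schema.

```python
import sqlite3

# In-memory database; column types are assumed, only the names come from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BackedUpFiles (
    File_ID    INTEGER PRIMARY KEY,
    FileName   TEXT NOT NULL,
    Size       INTEGER,
    Owner      TEXT,
    Attributes TEXT,
    Date       TEXT,
    MD5SUM     TEXT
);
CREATE TABLE Tape (
    Tape_ID        INTEGER PRIMARY KEY,
    InitialUseDate TEXT,
    UseCount       INTEGER
);
CREATE TABLE FilesTape (
    File_ID INTEGER REFERENCES BackedUpFiles(File_ID),
    Tape_ID INTEGER REFERENCES Tape(Tape_ID),
    PRIMARY KEY (File_ID, Tape_ID)
);
""")


def tapes_for_file(conn, file_name):
    """Return the Tape_IDs that hold a copy of the named file (hypothetical helper)."""
    rows = conn.execute(
        "SELECT ft.Tape_ID FROM FilesTape AS ft "
        "JOIN BackedUpFiles AS f ON f.File_ID = ft.File_ID "
        "WHERE f.FileName = ? ORDER BY ft.Tape_ID",
        (file_name,),
    ).fetchall()
    return [tape_id for (tape_id,) in rows]


conn.execute("INSERT INTO Tape (Tape_ID, UseCount) VALUES (1, 1), (2, 1)")

# First backup: one row describing the file, one row linking it to tape 1.
conn.execute(
    "INSERT INTO BackedUpFiles (File_ID, FileName, Size, MD5SUM) "
    "VALUES (1, '/etc/hosts', 158, 'placeholder-digest')"
)
conn.execute("INSERT INTO FilesTape (File_ID, Tape_ID) VALUES (1, 1)")

# Second backup of the unchanged file: only the small link row is added.
conn.execute("INSERT INTO FilesTape (File_ID, Tape_ID) VALUES (1, 2)")

print(tapes_for_file(conn, "/etc/hosts"))  # [1, 2]
```

Recycling a tape in this sketch would be a single
DELETE FROM FilesTape WHERE Tape_ID = ? statement, which still touches one
row per file on that tape.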

I'll give you that removing 10 million entries from FilesTape is a job, but
the advantage of being able to quickly enumerate the tapes a specific file
is on far outweighs that issue.

From a space perspective, for the second backup the database would use
something like 30MB additional to store the contents, while the flat file
would probably use something like:

10^6*(80(avg file name length)+6(owner and attributes)+8(file
size)+8(MD5SUM)+8(file date)+10(delimiters and CRLF))

Or approximately 120MB
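
The flat-file estimate can be checked with a few lines of Python. The
per-record byte counts are the ones from the formula above; the names
FLAT_RECORD_BYTES and flat_file_bytes are just labels chosen for this
sketch. The file count is left as a parameter, since the formula uses 10^6
records while the quoted example speaks of 10 million files.

```python
# Per-record byte estimates, copied from the formula in the post.
FLAT_RECORD_BYTES = {
    "avg file name length": 80,
    "owner and attributes": 6,
    "file size": 8,
    "MD5SUM": 8,
    "file date": 8,
    "delimiters and CRLF": 10,
}


def flat_file_bytes(num_files):
    """Estimated flat-file catalog size in bytes for one backup of num_files files."""
    return num_files * sum(FLAT_RECORD_BYTES.values())


print(sum(FLAT_RECORD_BYTES.values()))   # 120 bytes per record
print(flat_file_bytes(10**6) / 1e6)      # 120.0 (MB, for the 10^6 records in the formula)
print(flat_file_bytes(10**7) / 1e6)      # 1200.0 (MB, for 10 million files)
```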


George Sexton
MH Software, Inc.
http://www.mhsoftware.com/
Voice: 303 438 9585
  



