[lug] Bacula
D. Stimits
stimits at comcast.net
Sat Sep 17 10:36:45 MDT 2005
Alan Robertson wrote:
> George Sexton wrote:
>
>>> -----Original Message-----
>>> From: lug-bounces at lug.boulder.co.us
>>> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of Alan Robertson
>>> Sent: Wednesday, September 14, 2005 10:50 PM
>>> To: Boulder (Colorado) Linux Users Group -- General Mailing List
>>> Subject: Re: [lug] Bacula
>>>
>>>
>>> A simple example:
>>> If you want to back up 10 million small files, that means
>>> probably 10 million inserts to a database - which is a HUGE
>>> amount of work - and incredibly slow when compared to writing
>>> 10 million records to a flat file.
>>> And when you recycle a backup, that means 10 million deletions.
>>>
>>
>> Actually, for the single backup case, your argument is sound. For the
>> second backup, it's wrong. Consider a table structure:
>>
>> BackedUpFiles
>> -------------
>> File_ID
>> FileName
>> Size
>> Owner
>> Attributes
>> Date
>> MD5SUM
>
>
> AND very importantly you need a reference count - or you can never
> delete anything from this relation. And, this relation has to be
> indexed by anything you want to search on. This includes AT LEAST the
> file id, and the combination of (FileName, Size, Owner, Attributes, Date
> and MD5sum). Or you could add another field, which you might call
> attribute_sum, which is the md5 sum of all the fields except the
> reference count.
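For concreteness, the relation being described above might be sketched
roughly like this (SQLite used purely for illustration; the ref_count and
attribute_sum columns are just the suggestion above spelled out, not
anything from an actual Bacula catalog):

import hashlib
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS BackedUpFiles (
        file_id       INTEGER PRIMARY KEY,
        filename      TEXT,
        size          INTEGER,
        owner         TEXT,
        attributes    TEXT,
        date          TEXT,
        md5sum        TEXT,
        ref_count     INTEGER DEFAULT 0,  -- how many backups still reference this row
        attribute_sum TEXT                -- md5 over every field except ref_count
    )
""")
# Index the lookup key so "have we already cataloged this file?" does not
# become a table scan across 10 million rows.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_attr_sum ON BackedUpFiles (attribute_sum)")

def attribute_sum(filename, size, owner, attributes, date, md5sum):
    # md5 of all the descriptive fields, per the attribute_sum idea above
    key = "|".join([filename, str(size), owner, attributes, date, md5sum])
    return hashlib.md5(key.encode()).hexdigest()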
Ok, this is not practical info, I'm just musing, so skip this if you are
looking for practical info. This is just for people who are entertained
by odd uses of software.

This whole thing reminds me of version control systems. I was just
thinking it would be interesting if you could take a version control
system (any will do, so long as it handles binary files, symlinks, and
other special file types), check the entire operating system into a
repository, and then create new installs by checking it out. Each new
install would be a working copy, and an incremental backup would merely
be a commit operation. Most repositories also have some ability to run
triggers on certain operations, so for example a checkout to a given
MAC address could configure certain things differently than a checkout
to a different MAC address...like hardware lists and network config.

The hard part would be that the computer acting as the working copy
(the day-to-day computer being backed up) would be littered with
metadata, such as .svn subdirectories; but if you could turn the
metadata into an overlay filesystem that exists only at the moment a
repository command is run, then the original filesystem would not need
the metadata at all.

A central repository could even be updated via something like yum, and
the machines being backed up could then be updated with a typical
repository update command. The advantage of updating this way would be
for large commercial environments using a uniform install that mainly
needs minor variations. One could even think of things like creating
repository exports on NFS for diskless machines.
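To make the musing slightly more concrete, here is a rough sketch of what
the commit-as-incremental-backup cycle might look like with Subversion,
driven from Python. The repository URL and paths are made up for the
example, and it glosses over all the real problems (permissions, device
files, and the .svn metadata issue mentioned above):

import subprocess

REPO_URL = "svn://backup-server/os-image"   # hypothetical repository
WORKING_COPY = "/"                          # the day-to-day machine is the working copy

def run(*args):
    subprocess.run(args, check=True)

# Initial "install": check the whole tree out as a working copy.
#   run("svn", "checkout", REPO_URL, WORKING_COPY)

def incremental_backup(message="nightly backup"):
    # Pick up any files created since the last backup...
    run("svn", "add", "--force", WORKING_COPY)
    # ...then the commit itself is the incremental backup.
    run("svn", "commit", "-m", message, WORKING_COPY)

# A diskless client could instead take a clean, metadata-free copy:
#   run("svn", "export", REPO_URL, "/srv/nfs/diskless-root")

Deleted files would still need an "svn delete" pass, and restoring to a
point in time would just be a checkout of the right revision.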
D. Stimits, stimits AT comcast DOT net