[lug] Bacula
D. Stimits
stimits at comcast.net
Sat Sep 17 10:36:45 MDT 2005
Alan Robertson wrote:
> George Sexton wrote:
>
>>> -----Original Message-----
>>> From: lug-bounces at lug.boulder.co.us
>>> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of Alan Robertson
>>> Sent: Wednesday, September 14, 2005 10:50 PM
>>> To: Boulder (Colorado) Linux Users Group -- General Mailing List
>>> Subject: Re: [lug] Bacula
>>>
>>>
>>> A simple example:
>>> If you want to back up 10 million small files, that means
>>> probably 10 million inserts to a database - which is a HUGE
>>> amount of work - and incredibly slow when compared to writing
>>> 10 million records to a flat file.
>>> And when you recycle a backup, that means 10 million deletions.
>>>
>>
>> Actually, for the single backup case, your argument is sound. For the
>> second backup, it's wrong. Consider a table structure:
>>
>> BackedUpFiles
>> -------------
>> File_ID
>> FileName
>> Size
>> Owner
>> Attributes
>> Date
>> MD5SUM
>
>
> AND very importantly you need a reference count - or you can never
> delete anything from this relation. And, this relation has to be
> indexed by anything you want to search on. This includes AT LEAST the
> file id, and the combination of (FileName, Size, Owner, Attributes, Date
> and MD5sum). Or you could add another field, which you might call
> attribute_sum, which is the md5 sum of all the fields except the
> reference count.
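For concreteness, the relation being described above might be sketched
roughly like this (SQLite used purely for illustration; the ref_count and
attribute_sum columns are just the suggestion above spelled out, not
anything from an actual Bacula catalog):

import hashlib
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS BackedUpFiles (
        file_id       INTEGER PRIMARY KEY,
        filename      TEXT,
        size          INTEGER,
        owner         TEXT,
        attributes    TEXT,
        date          TEXT,
        md5sum        TEXT,
        ref_count     INTEGER DEFAULT 0,  -- how many backups still reference this row
        attribute_sum TEXT                -- md5 over every field except ref_count
    )
""")
# Index the lookup key so "have we already cataloged this file?" does not
# become a table scan across 10 million rows.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_attr_sum ON BackedUpFiles (attribute_sum)")

def attribute_sum(filename, size, owner, attributes, date, md5sum):
    # md5 of all the descriptive fields, per the attribute_sum idea above
    key = "|".join([filename, str(size), owner, attributes, date, md5sum])
    return hashlib.md5(key.encode()).hexdigest()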
Ok, this is not practical info, I'm just musing, so skip this if you are
looking for practical info. This is just for people who are entertained
by odd uses of software.

This whole thing reminds me of version control systems. I was just
thinking it would be interesting if you could take a version control
system (any will do, so long as it handles binary files, symlinks, and
other special file types), check the entire operating system into a
repository, and then create new installs by checking it out. Each new
install would be a working copy, and an incremental backup would merely
be a commit operation. Most repositories also have some ability to run
triggers on certain operations, so for example a checkout to a given
MAC address could configure certain things differently than a checkout
to a different MAC address...like hardware lists and network config.

The hard part would be that the computer acting as the working copy
(the day-to-day computer being backed up) would be littered with
metadata, such as .svn subdirectories; but if you could turn the
metadata into an overlay filesystem that exists only at the moment a
repository command is run, then the original filesystem would not need
the metadata at all.

A central repository could even be updated via something like yum, and
the machines being backed up could then be updated with a typical
repository update command. The advantage of updating this way would be
for large commercial environments using a uniform install that mainly
needs minor variations. One could even think of things like creating
repository exports on NFS for diskless machines.
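To make the musing slightly more concrete, here is a rough sketch of what
the commit-as-incremental-backup cycle might look like with Subversion,
driven from Python. The repository URL and paths are made up for the
example, and it glosses over all the real problems (permissions, device
files, and the .svn metadata issue mentioned above):

import subprocess

REPO_URL = "svn://backup-server/os-image"   # hypothetical repository
WORKING_COPY = "/"                          # the day-to-day machine is the working copy

def run(*args):
    subprocess.run(args, check=True)

# Initial "install": check the whole tree out as a working copy.
#   run("svn", "checkout", REPO_URL, WORKING_COPY)

def incremental_backup(message="nightly backup"):
    # Pick up any files created since the last backup...
    run("svn", "add", "--force", WORKING_COPY)
    # ...then the commit itself is the incremental backup.
    run("svn", "commit", "-m", message, WORKING_COPY)

# A diskless client could instead take a clean, metadata-free copy:
#   run("svn", "export", REPO_URL, "/srv/nfs/diskless-root")

Deleted files would still need an "svn delete" pass, and restoring to a
point in time would just be a checkout of the right revision.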
D. Stimits, stimits AT comcast DOT net