[lug] Bacula

Wed Sep 14 22:50:25 MDT 2005

Dan Ferris wrote:
> Thanks.
> 
> I hate AMANDA for the following reasons:
> 
> 1.  can't span tapes without major hassle
> 2.  insecure, sorry, IP address authentication went out about 1996.
> 3.  doesn't back up to disk easily (yes it can be done, but it's not easy)
> 4.  Doesn't restore easily with just a rescue disk (you have to go 
> through grief with dd).
> 5.  Doesn't use a database.
> 6.  It just reeks of kludge.
> 
> It works, but it could be WAY better in my opinion.
> 
> I have about 2TB of data.  So I need something that can span tapes 
> effortlessly.  I'm not going to spend hours going through my filesystem 
> with du to find stuff that's under 200GB to fit on an LTO tape.
> 
> So I'll try Bacula.  It looks cool.  If it does what I want I'll be sold.

Please report back what your impressions were.  I didn't much like 
Amanda either - for a variety of reasons.

Warning:  Backup rant follows...

Interestingly enough, I think the major scalability _problem_ with 
backup software is its use of a conventional database - because 
databases are designed for managing data with nearly the exact opposite 
characteristics of backup indexes.

A simple example:
	If you want to back up 10 million small files, that means
	probably 10 million inserts into a database - which is a HUGE
	amount of work - and incredibly slow compared to writing
	10 million records to a flat file.
	And when you recycle a backup, that means 10 million deletions.
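A minimal sketch of that comparison, assuming SQLite (through Python's built-in sqlite3 module) stands in for the "conventional database". The table layout, the index, the generated paths and the file names are all hypothetical, and the record count is scaled down from 10 million so it runs quickly. Actual ratios depend heavily on the database engine, its indexes and its transaction settings, so treat the timings as illustrative only.

```python
import os
import sqlite3
import tempfile
import time

N = 100_000  # scaled down from 10 million for a quick run

def sample_paths(n):
    # Hypothetical file names, emitted in sorted order as a directory walk might
    return (f"/home/user/file{i:08d}" for i in range(n))

def db_backup_and_recycle(db_path, n):
    """Index one backup in a database, then recycle it: n inserts + n deletes."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (tape INTEGER, path TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS files_path ON files(path)")
    with conn:  # commit all inserts as one transaction
        conn.executemany("INSERT INTO files VALUES (1, ?)",
                         ((p,) for p in sample_paths(n)))
    inserted = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    with conn:  # recycling the tape: every one of its rows must be deleted
        conn.execute("DELETE FROM files WHERE tape = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return inserted, remaining

def flat_backup_and_recycle(index_path, n):
    """Index one backup as a flat file, then recycle it: one write + one unlink."""
    with open(index_path, "w") as f:
        for p in sample_paths(n):
            f.write(p + "\n")
    with open(index_path) as f:
        written = sum(1 for _ in f)
    os.remove(index_path)  # recycling the tape: drop the whole file at once
    return written, os.path.exists(index_path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t0 = time.perf_counter()
        db_backup_and_recycle(os.path.join(d, "index.db"), N)
        t1 = time.perf_counter()
        flat_backup_and_recycle(os.path.join(d, "tape.idx"), N)
        t2 = time.perf_counter()
        print(f"database: {t1 - t0:.2f}s  flat file: {t2 - t1:.2f}s")
```

Both functions do the same logical job (record every file, then forget all of them); the difference is that the database has to maintain its B-trees row by row in both directions.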

So, if you keep a constant number of backups (finite media), then you 
have to do 10 million inserts AND 10 million deletes just to do one 
backup.  That's an incredible amount of resources just to write 
indexes that you hardly ever use.  Add to that the fact that find(1) 
often emits files in mostly-sorted order, and you find that you're 
pretty much in the worst possible case for B-trees.

But if you put the data into flat files - one per tape - then all you 
have to do is create one file with 10 million lines in it, and remove
one file.  This is orders of magnitude faster than a database.
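A sketch of the one-file-per-tape scheme with a fixed pool of tapes, assuming a hypothetical index directory, a hypothetical `tape-<label>.idx` naming convention, and a hypothetical per-line format of path plus file position on the tape. Reusing a tape costs one file removal and one sequential write, no matter how many files the old backup held.

```python
import os
import tempfile

def index_path(index_dir, label):
    # Hypothetical naming convention: one index file per tape label
    return os.path.join(index_dir, f"tape-{label}.idx")

def write_tape_index(index_dir, label, entries):
    """Write the whole index for one tape in a single sequential pass.

    entries: iterable of (path, file_number) pairs; file_number is a
    hypothetical position on the tape. Assumes paths contain no tab or newline.
    """
    final = index_path(index_dir, label)
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        for path, file_number in entries:
            f.write(f"{path}\t{file_number}\n")
    os.replace(tmp, final)  # publish the finished index in one step

def recycle_tape(index_dir, label):
    """Forget everything on a tape by removing its single index file."""
    try:
        os.remove(index_path(index_dir, label))
    except FileNotFoundError:
        pass  # tape has never been written

def backup_to_pool(index_dir, pool, backup_number, entries):
    """Rotate through a finite pool of tapes: recycle the oldest, then write."""
    label = pool[backup_number % len(pool)]
    recycle_tape(index_dir, label)
    write_tape_index(index_dir, label, entries)
    return label

if __name__ == "__main__":
    pool = ["A", "B", "C"]  # hypothetical three-tape pool
    with tempfile.TemporaryDirectory() as d:
        for n in range(5):
            files = [(f"/data/run{n}/file{i}", i) for i in range(3)]
            print("backup", n, "->", backup_to_pool(d, pool, n, files))
        print(sorted(os.listdir(d)))
```

The index directory never holds more files than there are tapes in the pool, which is the "constant number of backups" case described above.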

B-trees are just the wrong data structure for keeping this kind of data.
You never update it.  You just write it, keep it unmodified for a 
while and throw away the whole thing all at once.  B-trees are really 
nice if you update a lot over a long period of time.  In that case 
they're a huge win.

One can argue that flat files become hard to search.  This is not 
hard to solve.  There are a number of ways - some more clever than others.
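One simple possibility, not necessarily any of the ways the author had in mind: because a tape's index is written once and never modified, it can be sorted once at write time and then searched with binary search. This sketch assumes the same hypothetical tab-separated "path, file number" line format and loads the index into memory; a larger index could instead be searched by seeking within the file. If the input really is mostly sorted, as suggested above for find(1), Python's sort handles nearly-sorted data cheaply.

```python
import bisect
import os
import tempfile

def write_sorted_index(index_file, entries):
    """Write (path, file_number) pairs sorted by path, once, at backup time."""
    with open(index_file, "w") as f:
        for path, file_number in sorted(entries):
            f.write(f"{path}\t{file_number}\n")

def load_index(index_file):
    """Read the index into two parallel lists; paths are already sorted."""
    paths, numbers = [], []
    with open(index_file) as f:
        for line in f:
            path, file_number = line.rstrip("\n").split("\t")
            paths.append(path)
            numbers.append(int(file_number))
    return paths, numbers

def lookup(paths, numbers, target):
    """Binary search for an exact path: O(log n) comparisons."""
    i = bisect.bisect_left(paths, target)
    if i < len(paths) and paths[i] == target:
        return numbers[i]
    return None

def under_directory(paths, prefix):
    """Paths sharing a prefix form one contiguous run in sorted order."""
    result = []
    i = bisect.bisect_left(paths, prefix)
    while i < len(paths) and paths[i].startswith(prefix):
        result.append(paths[i])
        i += 1
    return result

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        index_file = os.path.join(d, "tape-A.idx")
        write_sorted_index(index_file,
                           [("/home/b", 2), ("/etc/passwd", 1), ("/home/a", 3)])
        paths, numbers = load_index(index_file)
        print(lookup(paths, numbers, "/home/a"))   # prints 3
        print(under_directory(paths, "/home/"))    # prints ['/home/a', '/home/b']
```

Restoring a whole directory is then a single contiguous scan of the sorted index rather than a query against a constantly changing B-tree.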

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce