[lug] clustering/network help and ideas...

Alan Robertson alanr at unix.sh
Thu Aug 4 00:36:30 MDT 2005


Dallas Masters wrote:
> I mostly lurk, but I'm working on some ideas for a network or
> pseudo-cluster that need help with the file system.  The LAN will be
made up of Gigabit-connected "clients."  I want to use the cheap disks in
each machine to be part of a growable, LAN-wide, transparent file
> system rather than use a centralized file system server.  In fact, I
> don't foresee that I will have any real server, except for one or two
clients which do web services, etc.  This is because we frequently buy
new desktop machines, which always have lots of disk space and are
under-utilized (the LAN will be a new model network
> in the Aerospace Dept. at CU).  I have looked into GFS, PVFS, and AFS
> for the file system.  Not exactly sure which one is best.  I'm sure
> NFS would work, but it seems slow and insecure (NFSv4?).  I am
> imagining GFS using local disks as global network block devices (GNBD)
> served from each client machine.  But my understanding of GFS is still
> vague.  Is it a faster or easier solution than NFS serving and
> mounting on each client?  Is GFS only really useful in "real"
> clusters?  Any ideas or advice would be appreciated.

I'm not sure you can easily use GFS in this way.  GFS mostly expects 
that the disks for a filesystem be "directly" accessible from each 
machine.  You can (as you said) get at them over GNBD, but I'm having 
trouble seeing how you will do what you want (paste them together into 
one coherent whole).
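To make the GNBD mechanics concrete, here's a rough sketch based on the 
Red Hat Cluster Suite tools (the export name, device, and hostname are 
placeholders, and the exact flags may vary with your version - check the 
gnbd_export/gnbd_import man pages):

```
# On the machine donating a disk (the GNBD "server"):
gnbd_export -e scratch1 -d /dev/hdb1    # export /dev/hdb1 under the name "scratch1"

# On each machine that wants to use it:
gnbd_import -i donor-host               # imported devices appear under /dev/gnbd/
```

Even then, each imported device is still just a raw block device; 
layering one GFS filesystem across many of them is exactly the "pasting 
together" step that I'm not sure works well on a pile of desktops.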

Another filesystem which might be a better match is Lustre.  It does not 
assume that each machine can access every disk.  But it's fairly 
difficult to set up, and the open source version is not the newest one.

The implications of any of these arrangements for backups, crashes, and 
down machines aren't trivial - and will require a lot of careful 
planning.  I don't think this is easily done in such a decentralized 
arrangement.

Another thought would be to look at 3ware controllers (or equivalent) 
and cheap IDE or SATA drives on a server - maybe running DRBD between 
two of them to get high availability (and even greater redundancy).  You 
can get terabytes of storage for pretty cheap - and then not be bothered 
when someone decides to do something weird with their workstation.

Another way of putting it:
	You can probably make what you want work.
	But do you really want your storage accessibility to be a
		research project?

Let's look at some sample costs:
	4x 300GB drives		$203 * 4	$812
	1x 3ware controller	$360		$360
	Total				$1172

This gives you 900GB of effective storage for $1172 (RAID5 gives up one 
drive's worth of capacity to parity).

If you replace those with 500GB drives, you spend $1800 for 1.5 TB 
(delivered capacity).

But the point is that it's not _that_ expensive.  And it's 
RAID-protected, always available, and fairly easy to manage.
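If you want to play with the numbers yourself, the arithmetic above is 
simple enough to sketch (the helper function is mine; the prices are the 
same 2005 street prices quoted above, and the RAID5 overhead is one 
drive's capacity for parity):

```python
def raid5_cost(n_drives, drive_gb, drive_price, controller_price=360):
    """Back-of-the-envelope RAID5 sizing: capacity loses one drive to parity."""
    capacity_gb = (n_drives - 1) * drive_gb
    cost = n_drives * drive_price + controller_price
    return capacity_gb, cost

print(raid5_cost(4, 300, 203))  # -> (900, 1172)
print(raid5_cost(4, 500, 360))  # -> (1500, 1800)
```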


If you want even better protection from data loss, you can replicate 
the data with DRBD - and locate the two complete (RAID-protected) 
copies of the data up to 100m away from each other.  It does double the 
cost, raising it to $3600 for your 1.5 TB.  But you can take either 
server down for maintenance without losing access to your data, and you 
get two fully RAID-protected copies of the data - located in different 
parts of your building.

If you want to forgo the RAID5 protection and just go with DRBD, the 
cost stays the same, but you get 2 TB instead (no drive is given up to 
parity - DRBD itself provides the redundancy).
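For the curious, mirroring one array between two servers with DRBD is 
configured with a resource block in drbd.conf, roughly like this 
(hostnames, devices, and addresses here are placeholders - see the 
drbd.conf man page for your version):

```
resource r0 {
  protocol C;                  # synchronous: a write completes on both nodes
  on server-a {
    device    /dev/drbd0;      # the mirrored device you actually mount
    disk      /dev/sda3;       # the local RAID array (or raw disk) backing it
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on server-b {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

You mount /dev/drbd0 on whichever node is primary; if it dies, the 
other node has an up-to-date copy and can take over.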

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce


