[lug] clustering/network help and ideas...
Alan Robertson
alanr at unix.sh
Thu Aug 4 00:36:30 MDT 2005
Dallas Masters wrote:
> I mostly lurk, but I'm working on some ideas for a network or
> pseudo-cluster, and I need help with the file system. The LAN will be
> made up of Gigabit-connected "clients." I want to use the cheap disks in
> each machine to be part of a growable, LAN-wide, transparent file
> system rather than use a centralized file system server. In fact, I
> don't foresee that I will have any real server, except for one or two
> clients which do web services, etc. This is because we frequently buy
> new desktop machines, which always have lots of
> disk space and are under-utilized (the LAN will be a new model network
> in the Aerospace Dept. at CU). I have looked into GFS, PVFS, and AFS
> for the file system. I'm not exactly sure which one is best. I'm sure
> NFS would work, but it seems slow and insecure (NFSv4?). I am
> imagining GFS using local disks as global network block devices (GNBD)
> served from each client machine. But my understanding of GFS is still
> vague. Is it a faster or easier solution than NFS serving and
> mounting on each client? Is GFS only really useful in "real"
> clusters? Any ideas or advice would be appreciated.
I'm not sure you can easily use GFS in this way. GFS mostly expects
that the disk for a filesystem is "directly" accessible from each machine.
You can (as you said) get at these disks via GNBD, but I'm having trouble
seeing how you will do what you want (paste them together into one
coherent whole).
Another filesystem which might be a little better match would be Lustre.
It does not assume that each machine can access every disk. But, it's
kind of difficult to set up, and the open source version is not the
newest one.
The implications of any of these arrangements for backups, crashes, and
down machines aren't trivial - and will require a lot of careful
planning. I don't think this is easily done in such a decentralized
arrangement.
Another thought would be to look at 3ware controllers (or equivalent)
and cheap IDE or SATA drives on a server - maybe running DRBD between
two such servers to get high availability (and even greater redundancy). You
can get terabytes of storage for pretty cheap - and then not be bothered
when someone decides to do something weird with their workstation.
Another way of putting it:
You can probably make what you want work.
But do you really want your storage accessibility to be a
research project?
Let's look at some sample costs:
  4x 300GB drives        $203 * 4     $ 812
  1x 3ware controller    $360         $ 360
                                      -----
                                      $1172
This gives you 900GB (effective) of storage for $1172.
If you replace those with 500GB drives, you spend $1800 for 1.5 TB
(delivered capacity).
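Just to make that arithmetic explicit, here's the same math as a quick
Python sketch (prices are the 2005 figures quoted above; the per-drive
price for the 500GB case is inferred from the $1800 total; RAID5 usable
capacity is assumed to be (n-1) drives' worth):

    # Sanity-check the storage cost figures above.
    def raid5_server(n_drives, drive_gb, drive_price, controller_price=360):
        """Usable capacity (GB) and cost ($) of one RAID5 server."""
        usable_gb = (n_drives - 1) * drive_gb  # RAID5 spends one drive on parity
        cost = n_drives * drive_price + controller_price
        return usable_gb, cost

    print(raid5_server(4, 300, 203))  # (900, 1172)
    print(raid5_server(4, 500, 360))  # (1500, 1800); $360/drive inferred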
But, the point is that it's not _that_ expensive. And, it's
RAID-protected, always available, and fairly easy to manage.
If you want to get even better protection from data loss, you can
replicate the data with DRBD - and locate the two complete
(RAID-protected) copies of the data up to 100 meters away from each
other. It does double the cost, raising it to $3600 for your 1.5 TB.
But, you can take either server down for maintenance without losing
access to your data, and you get two fully RAID-protected copies of the
data - located in different parts of your building.
If you want to forgo the RAID5 protection and just go with DRBD, the
cost stays the same, but you get 2 TB of storage instead.
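To put the three configurations side by side, here's a sketch under the
same assumptions (DRBD mirrors a whole server, so it doubles the cost
while usable capacity stays at one copy; without RAID5, all four drives
count toward capacity):

    # Compare the three configurations with 4x 500GB drives per server.
    N, GB, DRIVE, CTRL = 4, 500, 360, 360
    server = N * DRIVE + CTRL  # $1800 per server, as above

    print("single RAID5 server:   %4d GB for $%d" % ((N - 1) * GB, server))
    print("DRBD pair with RAID5:  %4d GB for $%d" % ((N - 1) * GB, 2 * server))
    print("DRBD pair, no RAID5:   %4d GB for $%d" % (N * GB, 2 * server))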
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce