[lug] Clustering and GFS
D. Stimits
stimits at attbi.com
Thu Sep 12 15:33:59 MDT 2002
Jeff Schroeder wrote:
> D. wrote:
>
>
>>In any case, before
>>you can figure out how you can best obtain 100 MB/s, you need to know
>>what kind of files are used: lots of small files, or a few very large
>>files.
>
>
> It's image-processing stuff, so there are a handful of gigantic files.
SGI designed XFS to handle enormous files used for graphics/sound rendering,
editing, and polygon-crunching on large clusters. You will not find another
underlying filesystem that handles that much data in large files with
continuous output as efficiently. And it is a journaling filesystem.
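If you want a rough sanity check of sustained streaming writes on whatever
filesystem ends up under the data, something like the following works. This
is only a minimal sketch: the path, the 2 GB file size, and the 4 MB block
size are placeholders, and a serious test would use a file much larger than
RAM (or O_DIRECT) so the page cache doesn't flatter the numbers.

/* Rough sequential-write throughput check: streams one big file to disk
 * and reports MB/s.  Compile with `gcc -O2 -o seqwrite seqwrite.c` and
 * run it on the mounted filesystem you want to test (XFS, GFS, etc.). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <time.h>

#define BLOCK_SIZE  (4 * 1024 * 1024)           /* 4 MB per write() */
#define TOTAL_BYTES (2LL * 1024 * 1024 * 1024)  /* 2 GB test file   */

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/mnt/test/bigfile.dat";
    char *buf = malloc(BLOCK_SIZE);
    if (buf == NULL) { perror("malloc"); return 1; }
    memset(buf, 0xAB, BLOCK_SIZE);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    time_t start = time(NULL);
    long long written = 0;
    while (written < TOTAL_BYTES) {
        ssize_t n = write(fd, buf, BLOCK_SIZE);
        if (n < 0) { perror("write"); return 1; }
        written += n;
    }
    fsync(fd);  /* flush so the timing reflects the disk, not the cache */
    close(fd);

    double secs = difftime(time(NULL), start);
    if (secs < 1.0) secs = 1.0;                 /* avoid divide-by-zero */
    printf("%lld MB in %.0f s  =>  %.1f MB/s\n",
           written / (1024 * 1024), secs,
           (double)written / (1024 * 1024) / secs);

    free(buf);
    return 0;
}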
>
>
>>Frankly, I cannot imagine a cluster of
>>machines with sustained disk throughput of 100 MB/s without lots of
>>cash.
>
>
> The disks themselves (over which I have no control) are in the $30,000
> range and use fiber channel for the high data rates. So yes, there's
> definitely a recognition that a filesystem with the throughput and
> accessibility desired is going to cost a bundle.
>
>
>>PS: Has the client stated why GFS is so important?
>
>
> The disks being used are a custom hardware solution which have only been
> tested with GFS. While it's certainly possible to hash over the merits
> and detriments of all sorts of filesystems, the client is sticking to
> this point.
But what is GFS? Is it the base filesystem, or is it a layer over the
real filesystem? For example, NFS is called a filesystem, but it always
has something else *under* it, e.g., an exported ext2 partition. Or is
GFS natively a base filesystem that also does something more?
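One way to see how it presents itself on a running system is to look at what
the kernel reports for each mount. A minimal sketch, assuming a Linux box
where /proc/mounts is available; the output alone won't settle the question,
but it shows whether GFS sits directly on a block device (the way ext2 or
XFS would) or whether the "device" column points at something else entirely,
the way an NFS mount points at a server path.

/* Walk /proc/mounts and print device, mount point, and filesystem type.
 * Compile with `gcc -o lsmounts lsmounts.c`. */
#include <stdio.h>
#include <mntent.h>

int main(void)
{
    FILE *mtab = setmntent("/proc/mounts", "r");
    if (mtab == NULL) {
        perror("setmntent");
        return 1;
    }

    struct mntent *ent;
    while ((ent = getmntent(mtab)) != NULL) {
        printf("%-30s on %-25s type %s\n",
               ent->mnt_fsname, ent->mnt_dir, ent->mnt_type);
    }

    endmntent(mtab);
    return 0;
}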
I am also curious whether the machines in question have a short test period
available where you could at least do a technology demo with something other
than GFS, and simply wipe it clean if the results are not acceptable.
D. Stimits, stimits AT attbi.com