[lug] strange nfs behavior

D. Stimits stimits at idcomm.com
Mon Jul 15 13:44:15 MDT 2002


Stroud James wrote:
> 
> I am having strange behavior with my nfs from our sgi machines (exporting)
> onto our linux machines (mounting). Under certain circumstances the linux
> machines fail to see certain files. For example, if in tcsh
> 
> structure 66% ls
> total 4
>    0 dbate/     0 han/     4 stroud/     0 lin/
> structure 67% echo *
> dbate han lin
> 
> Notice how stroud (me, unfortunately) is missing in the echo. csh has
> similar behavior. However bash works fine in this case:
> 
> [adm at structure chn]$ ls
> dbate  han  stroud  lin
> [adm at structure chn]$ echo *
> dbate han stroud lin

Sometimes ls or echo is built into the shell, but there are separate
binaries as well. For example, ls is sometimes aliased to ls-F in tcsh,
which is a faster ls -F built into the shell. Not all of these programs
are compiled to see large files over 2 GB (and I think this can include
the NFS code at either side of the NFS). Is it possible to find the
reported size of any of the "missing" directories or files? If you are
exporting from a non-linux filesystem, or even some of the journaled
linux filesystems, directories can be listed as rather large rather
than just 512 bytes.

There might also be some version sensitivity just between the brands of
NFS, and the filesystem type that is being mounted as NFS might also
matter (especially since the SGI side probably supports ACLs and EAs;
it could be that those files are marked in a way that your local
filesystem does not support). I'd strongly suggest you go to
oss.sgi.com and find a mailing list on the topic. The XFS filesystem
list might even be the right list if the SGI machine is running XFS
(these guys have their act together: they listen, fix, and respond
quickly).

> 
> (some names have been changed to protect the innocent)
> 
> Here I exist in both cases, thankfully. This is not an isolated
> phenomenon. I find many files that display this behavior. Also there seems
> to be subclasses within this phenomenon. For instance, 'ls' will display
> all, 'echo *' will display less than 'ls' and a mozilla window will
> display only a subset from the 'echo *'. You may suggest to just stick with
> bash. For me that would be okay, but to the innocents (non-administrative
> types), learning a new shell is impractical--if you are an administrator
> you will understand how regular-old users, perhaps justifiably so, resist
> learning new ways to use computers. And also, we all must use mozilla and
> other browsers for accessing log files produced in html.

If you are feeling motivated, try to find the source to tcsh, and
compile it both with and without large file support:
 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 \
   -D_LARGEFILE_SOURCE

> 
> I should also point out that Mandrake 8.2 seems better behaved in csh
> behavior than Redhat 7.1 (less missing entries), but Mandrake 8.2 is worse
> in its mozilla behavior (more missing entries).

This tends to say it is the shell, and large file support is the most
likely cause. Note that SGI filesystems can support absolutely enormous
sizes; it could be that the volume being mounted in some way marks
files as exceeding the size that particular programs can deal with.
The above flags are a large part of what is needed in those cases to
get them to work right (there might be some sort of file open with "64"
in its name that is substituted, but I believe the above flags cause
the headers to use the right open call, unless a custom header defines
it). In fact, you might find the source to any failing program, compile
it with those flags, and see if that fixes it.

> 
> So, my question is whether this is a problem with linux or a problem with
> the way sgi exports its file systems? Also, is there a way to get
> csh, bash, tcsh, mozilla, etc., on the same page by renaming a library or
> something?

If the problem is the NFS version, then a kernel patch and/or selecting
a different version might work. If it is due to ACLs (Access Control
Lists, an advanced permissions mechanism that is part of the XFS
filesystem), then the ACL would have to be altered directly or
indirectly (I doubt this is the case, because then all shells or apps
would behave the same). If it is due to large file size, then either
compiling the apps to support large files will do it, or else moving
the files in question to a lower offset on the disk will fix it (this
assumes small files offset large distances from the beginning of the
disk; if the files themselves were over some large limit like 2 GB,
then the files would have to be broken into pieces).

D. Stimits, stimits at idcomm.com

PS: Find out what filesystem type is natively holding those files,
whether that filesystem type supports ACLs, and whether it supports
very large files that exceed 2 GB. If it is an XFS filesystem, then you
can get a lot of good help from the oss.sgi.com XFS list.

> 
> Any help would be greatly appreciated.
> 
> James
> 
> ---------------------------
> James Stroud
> University of Colorado
> Boulder, CO 80309
> USA
> Tel: 303-492-4503
> Fax: 303-735-1347
> ---------------------------
> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug


