[lug] Stale NFS File Handle
Dan Ferris
dan at usrsbin.com
Thu Aug 4 19:42:47 MDT 2005
It was occurring under load. I wrote a shell script to see if the
automounter was causing the problem. I'm running it on our Sun E450
Solaris POS, a Fedora Core 3 box, and a Red Hat Enterprise 3 box. Most
of the problems were actually on the Sun box. The Sun box is MUCH
more susceptible to this problem than any of the Linux boxes. The Sun
box runs Apache from /usr/local, which is an NFS mount. Its DocumentRoot
is also coming via NFS. I have grad students that use the web
server on the Sun box all day, every day. Plus there are people from all
over the world that access it 24x7. So the load isn't high, but it's
constant. The rest of the load on the NFS server comes from the Linux
workstations. During the mornings the load does get pretty high. I've
seen load averages well over 5 at times and over 10 when it gets really bad.
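For anyone who wants to run the same kind of test, a minimal sketch of
such a loop (the mount point below is a placeholder, not our real path)
would be:

    #!/bin/sh
    # Repeatedly walk an automounted tree and collect any
    # "Stale NFS file handle" errors for later review.
    MNT=/home/testuser      # placeholder automount point
    LOG=/tmp/stale.log
    while true; do
        ls -lR "$MNT" > /dev/null 2>> "$LOG"
        sleep 1
    done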
According to the NFS FAQ (http://nfs.sourceforge.net/), there are only
five reasons you will get this error:
- The file resides in an export that is not accessible. It could have
been unexported, the export's access list may have changed, or the
server could be up but simply not exporting its shares.
I don't think this is my problem. All the clients and the server have
been rebooted since the migration to LVM2. Some have been rebooted
multiple times.
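This one is also easy to rule out from a client, for what it's worth;
"fileserver" below is a stand-in for the real hostname:

    # Ask the server what it is currently exporting
    showmount -e fileserver
    # Compare against what this client thinks it has mounted
    mount | grep fileserver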
- The file handle refers to a deleted file. After a file is deleted on
the server, clients don't find out until they try to access the file
with a file handle they had cached from a previous LOOKUP. Using rsync
or mv to replace a file while it is in use on another client is a common
scenario that results in an ESTALE error.
I doubt this as well. Most people use only one workstation; if somebody
deleted somebody else's file, the problem would only happen to one or
two people, once or twice.
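That failure mode is easy to demonstrate by hand if anyone is curious.
A rough sketch with made-up paths:

    # Client A: open the file so a handle gets cached
    tail -f /mnt/nfs/report.txt &
    # Client B: replace the file, which unlinks the inode
    # that client A's cached handle points at
    mv /mnt/nfs/report.new /mnt/nfs/report.txt
    # Client A's next access via the old handle gets ESTALE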
- The file was renamed to another directory, and subtree checking is
enabled on a share exported by a Linux NFS server. See question C7
<http://nfs.sourceforge.net/#faq_c7> for more details on subtree
checking on Linux NFS servers.
This makes the most sense, especially with the automounter. I did turn
off subtree checking, which seems to have helped.
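For anyone else who hits this, it's one option in /etc/exports on the
Linux server; the path and netmask here are just examples:

    # /etc/exports -- turn off subtree checking on the export
    /export/home  192.168.1.0/24(rw,no_subtree_check)

    # Re-read the exports table without restarting nfsd
    exportfs -ra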
- The device ID of the partition that holds your exported files has
changed. File handles often contain all or part of a physical device
ID, and that ID can change after a reboot, RAID-related changes, or a
hardware hot-swap event on your server. Using the "fsid" export option
on Linux will force the fsid of an exported partition to remain the
same. See the "exports" man page for more details.
I might set this at a later date, but it's not worth the hassle of
rebooting everything again, especially since I'm not doing any failover
or redundancy.
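If I do get around to it, it's just another export option; the fsid
value is arbitrary as long as it stays unique and constant per export:

    # /etc/exports -- pin the filesystem ID so file handles
    # survive device renumbering after reboots or hot swaps
    /export/home  192.168.1.0/24(rw,no_subtree_check,fsid=1)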
- The exported file system doesn't support permanent inode numbers.
Exporting FAT file systems via NFS is problematic for this reason. This
problem can be avoided by exporting only local filesystems which have
good NFS support. See question C6 <http://nfs.sourceforge.net/#faq_c6>
for more information.
Nope. According to the FAQ, ext3, jfs, xfs, and reiser should all work
fine over NFS.
The only other possibility is "other," which would mean problems with LVM2.
At the moment everything finally seems stable.
I can use rsync to put a fairly high load on the ext3 volume to see if
the problem occurs there.
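Something along these lines should do it (the source tree and mount
point are placeholders):

    # Generate sustained load on the NFS-mounted ext3 volume
    # and watch the rsync output for stale handle errors
    while true; do
        rsync -a --delete /usr/share/ /mnt/ext3test/ 2>&1 | grep -i stale
    done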
Dan Ferris
Lee Woodworth wrote:
> Dan Ferris wrote:
>
>> An Update...
>>
>> Turning off subtree checking seems to have mitigated the problem, but
>> not solved it.
>
> Do things change under load? The problem I had didn't show up unless
> there was some load. But once there was load the stale handle error
> consistently occurred -- rsyncing a dir tree with 110,000+ files would
> always trigger the problem. I use xfs with lvm2. I didn't change file
> systems to fix the problem.
>
>>
>> Tomorrow I'm going to test out an LVM volume with ext3 instead of
>> jfs. I'm going to test with both fstab mounts and the automounter to
>> see if the automounter makes any difference. If ext3 solves the
>> problem, I guess I'll just have to figure out a way to live with its
>> limits. Or I'll try reiserfs to see if that works. I've had
>> problems with reiser in the past, maybe it's better now. It would
>> kind of suck to move off of jfs, I've been very happy with it.
>>
>> I'll report back to the list what I find out.
>>
>> Thank you again for all the responses, they have been most helpful.
>>
>> Dan Ferris
>>