[lug] Interesting Common File Locking Problem

D. Stimits stimits at comcast.net
Sat Mar 6 16:15:27 MST 2004


Zan Lynx wrote:
> On Fri, 2004-03-05 at 19:20, D. Stimits wrote:
> 
>>Zan Lynx wrote:
>>
>>>I spent a few hours scratching my head over this one, so I thought I
>>>would share it for any of you programmers out there.  Also, I just
>>>have to vent the built-up frustration to _someone_, or it'll bother me
>>>all weekend!  Heh.
>>>
>>>Some programs appear to lock files using this method:
>>>
>>>- open the file, get a file descriptor.
>>>- lock the file descriptor.
>>>- write data into a temporary file.
>>>- optionally fsync the data to ensure it is really in there.
>>>- rename temporary into real file name, which deletes the original,
>>>which removes the lock.
>>>
>>>
>>>
>>>At first glance, this looks reasonable and safe.  What I discovered is
>>>that this can happen:
>>>- ProgA opens the file, gets a file descriptor.
>>>- ProgB opens the file, gets a file descriptor.
>>>- ProgB locks the file descriptor.  ProgA now tries to lock and waits...
>>>- ProgB creates temporary file, writes data into it.
>>>- ProgB now renames temporary to real name.
>>>- ProgB unlocks, closes, exits, etc.
>>>- ProgA finally gets its lock!  Yay for ProgA!
>>>But wait!  What lock is this!?
>>>Could it be a lock on the now removed original file!?  The file ProgB
>>>just deleted by renaming over it?
>>>Why Yes!  It could!
>>>
>>>Does anything stop ProgC from coming along and getting a lock on the
>>>file name?
>>>Why No, nothing does!  Because ProgA's lock is on a file that no longer
>>>has that name!  In fact, ProgA's locked file no longer has any name!
>>>
>>>And then does it stop there?  No indeed!  Because ProgC has the lock on
>>>the file name, it assumes no one else is using the file.  Now, using the
>>>same temporary file name ProgA is also using, ProgC goes ahead to
>>>truncate and write into the temporary file.  What does this lead to?  Me
>>>getting large chunks of zeros in my mail client spool file.
>>>
>>
>>You just described a textbook example from a thread programming book. 
>>Sounds like you need a mutex/semaphore system built into the filesystem 
>>itself. It might be interesting to see how journals in various 
>>journaling filesystems do it.
> 
> 
> Well, that is what file locks are for.  The filesystem locks implemented
> by lockf, flock and fcntl are all intended to make this work.  They are
> the equivalent of the mutex.
> 
> Programmers must use them _correctly_, however.

Heck, I know of one successful software company that says the *marketing 
department* has to use them correctly...programmers are not required to 
do that. :P

Seriously though, threaded programming has so many gotchas that doing it 
correctly often requires several failures first.
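
For the archive, here is a minimal sketch of one common way around the
race Zan describes: once the lock is granted, check that the descriptor
you locked is still the file the name points at, and start over if it is
not. This is only an illustration, not what any particular mail client
does. The function name open_locked and the max_tries limit are made up
for the example, and it assumes flock-style advisory locks on a POSIX
system.

```python
import fcntl
import os


def open_locked(path, max_tries=10):
    """Open `path` and take an exclusive flock on it.

    Hypothetical helper: retries when the file was renamed over while we
    were waiting, so the lock we return is on the file that currently
    carries the name, not on an unlinked leftover.
    """
    for _ in range(max_tries):
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # may block behind another holder
            held = os.fstat(fd)             # the file we actually locked
            try:
                named = os.stat(path)       # the file the name points at now
            except FileNotFoundError:
                named = None
            if named is not None and (
                (held.st_dev, held.st_ino) == (named.st_dev, named.st_ino)
            ):
                return fd                   # lock and name agree
        except BaseException:
            os.close(fd)
            raise
        # Stale: the name was replaced under us. Closing drops the lock.
        os.close(fd)
    raise RuntimeError("file kept being replaced while waiting for lock")
```

The caller owns the returned descriptor and releases the lock by closing
it. A writer would still want a unique temporary file name (for example
from tempfile.mkstemp) instead of a fixed one, so that two writers can
never truncate each other's temporary file.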

D. Stimits, stimits AT comcast DOT net


