[lug] System.map issue

D. Stimits stimits at idcomm.com
Fri Feb 22 17:24:51 MST 2002


Chris Riddoch wrote:
> 
> "D. Stimits" <stimits at idcomm.com> writes:
> > > http://www.peakpeak.com/~socket/System.map-corrupted.txt
> >
> > Corrupted is indeed strange. Did you test if the actual copy from
> > /usr/src/linux/ to the /boot/ directory is uncorrupted *prior* to
> > running lilo?
> 
> I'm quite positive that /usr/src/linux/System.map is NOT corrupted.
> Again, it's at http://www.peakpeak.com/~socket/System.map-original.txt
> if you're unconvinced.

Nope... I said just the opposite: I *am* convinced that the corrupted
copy is corrupted and that the original is not.

> 
> > > I didn't, originally. I apparently use the same naming convention as
> > > you for System.map, and I changed lilo.conf appropriately. I tried it
> > > both ways, and both ways, it gets corrupted.
> >
> > Before or after running lilo or your boot loader program[...]?
> 
> It gets corrupted sometime between the time I run 'shutdown -r now'
> and the time the init scripts start trying to do things that touch
> System.map for whatever reason.
> 
> grepping /etc/init.d/ for System.map shows that it only gets passed as
> a parameter to /sbin/klogd. I wonder if klogd is doing it?

Unlikely. There are actually a number of programs that might access it
indirectly if module support is used during a boot.

> 
> > > And this is what it gets changed to, in its corrupted form.
> > >
> > > socket at laptop:~$ ls -l /boot/System.map
> > > -rw-------    1 root     root        26624 Feb 22 12:53 /boot/System.map
> >
> > It is truncated.
> 
> Nope.
> 
> socket at laptop:/boot$ sudo md5sum System.map
> 78274e85a299d62404d8776859e235ea  System.map
> socket at laptop:/boot$ cd
> socket at laptop:~$ dd if=/usr/src/linux/System.map of=./Sys.map bs=1 count=22624
> 22624+0 records in
> 22624+0 records out
> socket at laptop:~$ ls -l Sys.map
> -rw-rw-r--    1 socket   socket      22624 Feb 22 16:37 Sys.map
> socket at laptop:~$ md5sum Sys.map
> 330fd046a390c73e2fb80fc2809da7a7  Sys.map
> 
> Now, if it were only *truncated*, the md5sum would match here, wouldn't it?

I did not mean to say it was *only* truncated. My point is that the size
changed, so altered data was not the only event. Because it shrank, it
could be that an EOF was inserted, or it could have lost inodes during
fsck. Determining that it did not lose inodes is important because it
limits the problem to applications writing to the file, rather than an
unclean umount.
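
One way to test the truncation theory directly is to compare the damaged
file against the first bytes of the original, using the damaged file's
actual size rather than a hand-typed count. This is only a sketch: the
name is_prefix is made up for illustration, and it assumes a head that
accepts -c (the GNU and BSD versions do).

```shell
# is_prefix SMALL BIG
# Succeeds (exit 0) when SMALL is byte-for-byte the beginning of BIG.
# Hypothetical helper, not part of any package.
is_prefix() {
    small=$1
    big=$2
    # Arithmetic expansion strips any padding that wc may print.
    n=$(( $(wc -c < "$small") ))
    # Take the first n bytes of BIG and compare them with SMALL.
    head -c "$n" "$big" | cmp -s - "$small"
}

# Example with the paths from this thread (run as root, since
# /boot/System.map is readable by root only):
#   is_prefix /boot/System.map /usr/src/linux/System.map \
#       && echo "pure truncation" || echo "contents differ too"
```

If it reports that the contents differ too, then something rewrote the
file rather than merely cutting it short.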

> 
> socket at laptop:~$ md5sum System.map-corrupted.txt
> 78274e85a299d62404d8776859e235ea  System.map-corrupted.txt
> 
> If you, or someone else, could help me figure out what the heck it's
> being replaced *with*, that would probably be a hint about what's
> going on here. 'file System.map' just says a rather uninformative
> "data". It's at a URL I've already given in this thread.

One thing that is interesting is that a chunk appears to just be plain
text and other parts look like random binary, at least on an earlier
corrupted version that was posted. [but I notice the current web site
corrupted version is now much smaller than it used to be]
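
To put a rough number on how much of the file is plain text, something
like the following might help. It is a sketch only: printable_bytes is a
name I made up, and "printable" here means ASCII printable characters
plus newline and tab.

```shell
# printable_bytes FILE
# Prints how many bytes of FILE are printable ASCII, newline, or tab.
# Hypothetical helper for illustration.
printable_bytes() {
    # LC_ALL=C makes tr treat the input as single bytes.
    # -c complements the set, -d deletes: everything else is dropped.
    LC_ALL=C tr -cd '[:print:]\n\t' < "$1" | wc -c
}

# Example: compare against the total size of the file.
#   printable_bytes /boot/System.map
#   wc -c < /boot/System.map
```

If the two numbers are close, the file is mostly text; if they are far
apart, most of it is binary.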

> 
> > Is /boot/ on the same partition as the main partition, or is it a
> > separate partition?
> 
> Separate partition.
> 
> socket at laptop:~$ ls /boot/lost+found/
> socket at laptop:~$ ls /lost+found/

Actually, you'd want ls -aF, else "hidden" dot files won't show up. But
likely there is nothing here.

> socket at laptop:~$
> 
> And there's no fscking going on at bootup. No evidence of crashing or
> general trashing of the filesystem.

Is /boot/ ext2? Also, as an experiment, try this. Copy the uncorrupted
version to /boot/. Verify that it is not corrupted. Then use the mount
command to remount /boot/ read-only. Verify once more. Then reboot and
see if it still gets corrupted. If it does, then you can probably
eliminate the shutdown scripts as a cause and consider it a bootup
script problem.
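
Roughly, the experiment could look like the sketch below. The helper
name same_md5 is invented for illustration; the paths are the ones from
this thread, and the steps in the comments need root.

```shell
# same_md5 FILE1 FILE2
# Succeeds (exit 0) when both files have the same MD5 checksum.
# Hypothetical helper; reads via stdin so the file names do not
# appear in the md5sum output being compared.
same_md5() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Steps of the experiment, to be run by hand as root:
#   cp /usr/src/linux/System.map /boot/System.map
#   same_md5 /usr/src/linux/System.map /boot/System.map && echo "copy ok"
#   mount -o remount,ro /boot
#   same_md5 /usr/src/linux/System.map /boot/System.map && echo "still ok"
#   shutdown -r now
# After the reboot, run the same_md5 check once more.
```

With /boot/ read-only during shutdown, nothing in the shutdown scripts
should be able to write to the file, which is what narrows the problem
down.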

> 
> > > I even tried removing write permissions from System.map. It *still*
> > > got changed.
> >
> > Most bootup scripts will use root permissions and force any writes. I'm
> > curious if there is some access to /boot/ that is stalling out and thus
> > causing an unclean umount. Now if /boot/ is a separate partition, you
> > can run fuser -v on /boot and I would expect the only access just before
> > reboot would show up as "/boot  root kernel  mount  /boot". If you have
> > a shell open and have done a cd to that directory, it could conceivably
> > cause an unclean umount if the process of the shell or something the
> > shell is running fails to terminate before the filesystem umount.
> 
> User processes *should* get killed long before the unmount processes
> run.  In my experience, unmount operations are relatively conservative
> and will refuse to unmount a busy system - so if this were the case,
> I'd at least notice something like what happens when I try to unmount
> my CDROM when I'm playing MP3s from it:

If it remains busy, a timeout occurs and shutdown continues. This can
result in corruption. So this is the very point... if something is
accessing it and making it refuse to umount cleanly, the system will
still shut down, possibly doing damage. There are conditions under
which normal termination signals do not work, such as during some crash
conditions or filesystem errors, so although it *should* kill processes,
it might not. Your system has an error; don't expect it to do what it
*should* do, otherwise this would not be happening.

> 
> umount: /cdrom: device is busy
> 
> > >
> > > *Something* is breaking this, and it's clearly not a problem with the
> > > original file.
> >
> > What is in your lilo.conf? I would have to wonder if the System.map is
> > being named for something other than what it should be named as.
> 
> This is the significant part of what's in my lilo.conf:
> map=/boot/System.map

BINGO. "map" is NOT System.map; it is part of the boot loader. In fact
System.map should not be listed in lilo.conf at all. System.map is
accessed by programs directly or via kernel programming; it is never
hard coded into the boot sector. In your /boot/ there should be a file
literally called "map". Try this instead:
map=/boot/map

One is a map for the boot loader, another is a symbol map for module
symbol exports. Very very different.
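
A quick way to catch this mistake is to search lilo.conf for a map=
line that names a System.map file. This is a sketch under my own
assumptions: find_bad_map is a made-up name, and the pattern only looks
for "System.map" somewhere after "map=" on the same line.

```shell
# find_bad_map CONF
# Prints (with line numbers) any "map=" line in CONF that points at a
# System.map file. Exit status follows grep: 0 if found, 1 if not.
# Hypothetical helper for illustration.
find_bad_map() {
    grep -En '^[[:space:]]*map[[:space:]]*=.*System\.map' "$1"
}

# Example:
#   find_bad_map /etc/lilo.conf && echo "map= points at a System.map file"
```

No output means the map= line, if there is one, points somewhere else.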

> 
> Previously, it was:
> map=/boot/System.map-2.4.17

This too was incorrect. lilo would have overwritten that file as well.
There is no bug and no unexpected corruption; it is doing exactly what
you told it to do.

D. Stimits, stimits at idcomm.com

> 
> The only thing this seems to determine is which file gets corrupted.
> The same thing happens either way.
> 
> In case it's relevant, lilo is the one from debian's testing package:
> 22.1-6
> 
> --
> Chris Riddoch       | epistemological
> socket at peakpeak.com | humility
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug


