[lug] Unable to cleanly reboot

Michael Deck deckm at cleansoft.com
Thu Oct 18 15:04:51 MDT 2001


On Thursday 18 October 2001 02:59 pm, you wrote:
> Michael Deck wrote:
> > Thanks. I'm learning something here but, unfortunately, I haven't yet
> > collared the problem.
> >
> > First I did "init 3" which sent me back to a command-line prompt. At that
> > run level a lot of stuff was still in use but there were a number of
> > services that I saw I didn't want or need on this machine so I killed
> > them and took them out of chkconfig. Then I did "init 2" and that killed
> > a few more services. I did lsof on all the partitions again and, after
> > killing some services manually, saw that only 'login', 'bash', and 'lsof'
> > were using files on /dev/hda6. I assumed that was OK and did "init 1",
> > which froze the system. It told me it had stopped the random service and
> > eth0 successfully, then "no more processes in this runlevel" and it's
> > once again time for a hard restart and fsck.
>
> I wonder if possibly it is eth0 that killed things. This was the last
> thing it did before apparently locking up? While in init 2 or init 1,

I can't get to init 1. When in init 2, if I type "init 1" it hangs. I can 
kill eth0 without apparent problem. When I ctrl-alt-del it says "initlevel 6" 
and hangs. 

> can you do "init 6" and have it reboot? Or "init 0" and have it
> correctly halt? Also, for your normal shutdown, instead of calling
> "reboot", try:
> shutdown -r now
> OR:
> shutdown -h now

Doesn't matter, they all have the same effect. 

>
> If those work right, maybe the problem is just that the "reboot" script
> is messed up.
>
> > What else should I be looking for? Or was there something I missed in all
> > that?
>
> Look at the output of "ps aux" for more hints of what is still running.
> Look at "chkconfig --list" and do "/etc/rc.d/init.d/whatever status" on
> everything that is supposed to be running, see if it is really running,
> and not failed but subsystem locked (PostgreSQL does this if the system
> isn't cleanly rebooted). Maybe show a list of "lsof /dev/whatever" for
> all partitions that are relevant, just before you get to a stage where
> it locks up.
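
A rough sketch of that chkconfig check, in case it helps anyone following
along. It assumes the Red Hat style "chkconfig --list" output (a service
name followed by fields like "3:on") and that every listed service has an
init script that accepts "status". The function names services_on and
check_services are made up for illustration.

```shell
# services_on LEVEL: read "chkconfig --list" output on stdin and print the
# name of every service marked "on" for runlevel LEVEL.
# (Hypothetical helper; assumes fields of the form "3:on".)
services_on() {
    awk -v lvl="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == (lvl ":on")) print $1
    }'
}

# check_services LEVEL: ask each such service for its status, so a
# "stopped but subsystem locked" answer stands out.
check_services() {
    for svc in $(chkconfig --list | services_on "$1"); do
        echo "== $svc =="
        /etc/rc.d/init.d/"$svc" status
    done
}
```

As root, "check_services 3" would then walk everything that is supposed to
be running in runlevel 3.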

>
> And a very minor thing, when doing your last init or shutdown command,
> first cd to "/". I say this because it is possible that a subdirectory
> can be considered in use if someone has done a cd to that directory. In
> theory it should kill your shell and not lock it in use.

That sounds promising. I'll try it. 
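
A minimal sketch of the "cd to / first" idea, wrapped in a shell function
so the step can't be forgotten. The name from_root is hypothetical; the
shutdown commands are the ones suggested above.

```shell
# from_root CMD [ARGS...]: change to / first, then run the given command,
# so the shell's working directory cannot keep a partition busy.
# (from_root is a hypothetical helper name.)
from_root() {
    cd / && "$@"
}
```

Usage would be "from_root shutdown -r now" or "from_root shutdown -h now".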

>
> D. Stimits, stimits at idcomm.com
>
> > -Mike
> >
> > On Thursday 18 October 2001 12:53 pm, you wrote:
> > > Michael Deck wrote:
> > > > Help! I'm unable to reboot my Linux box without a hard reset (and
> > > > fsck all drives). When, as root, I type "reboot" it goes through the
> > > > steps of stopping relevant services and then says, "No more processes
> > > > in this runlevel" but then it just hangs. Before (and on my other
> > > > Linux boxen) it would pause, then the md recovery thread would get
> > > > woken up, and the box would power off or reboot. This is a huge
> > > > drag. I really need some advice on how to fix this because it's 15
> > > > minutes to reboot otherwise. I'm not (presently) running X so the
> > > > default runlevel is 3.
> > > >
> > > > Here's what has been happening on this box, in case it's of use.
> > > >
> > > > I've been trying to replace Mandrake 7.2 with KRUD 9-01 for the past
> > > > 3 days. First, I couldn't get initrd.img to boot and the hardware was
> > > > suspected. So I put a newer CDROM drive in and tried again.
> > > >
> > > > Booting worked, but I was getting sporadic failures to find rpm files
> > > > during the actual install. Each of these install attempts left the
> > > > system in a more or less unusable state. I did go back and re-install
> > > > Mandrake from CD successfully but it has many older RPMs and
> > > > (unfortunately) at some point I trashed /usr so my various patches
> > > > and updates were lost. Sigh.
> > > >
> > > > This morning I figured out how to do a hard-disk based install and
> > > > tried that. This particular box can't successfully copy the CDs but
> > > > I was able to use another Linux box to copy them into ISO images and
> > > > then upload them to the target system. Voila!, I thought, and I (once
> > > > again) commenced the KRUD installation. Text mode, but it got done.
> > > >
> > > > But it still has this same ugly problem of not shutting down cleanly.
> > > >
> > > > Your suggestions appreciated!
> > > >
> > > > -Mike
> > > >
> > > > _______________________________________________
> > > > Web Page:  http://lug.boulder.co.us
> > > > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > >
> > > A couple of general tools. The main one is lsof. It lists the users
> > > of a filesystem resource. For example, if you have hard drive hda, and
> > > partition hda1, and a program is using a file on hda1, then you can do
> > > "lsof /dev/hda1" and it'll list the users.
> > >
> > > Second tool, you can go to /etc/rc.d/init.d and use "./whatever stop"
> > > to stop a given service in the same way that a shutdown would do (it'll
> > > be back after reboot or "./whatever start").
> > >
> > > I believe you will also find that runlevel 2 is multiuser without
> > > NFS, and runlevel 1 is single user without networking. Manually
> > > run "init 2" to drop to runlevel 2. If nothing was hung up, run "init
> > > 1" and drop to runlevel 1. If nothing hung up, then you have basically
> > > the minimum of services running before hang. If you want to go back to
> > > runlevel 3, just "init 3".
> > >
> > > While in your lowest runlevel (runlevel 0 is halt, runlevel 6 reboots,
> > > runlevel 1 is the lowest interactive level), run lsof against the hard
> > > drives that are mounted. You will have to run it against each
> > > partition, not just the drive (I once had a similar hang because of a
> > > bug with a sym link from a man page causing the partition to think it
> > > was still in use, had to delete the sym and relink it once a newer
> > > kernel version was in). Before you bother with panicking over a large
> > > number of users of a partition, run "chkconfig --list" and view
> > > services that are running from your current runlevel (or rather, for
> > > services that are supposed to be running). If you see something
> > > optional that will complicate your search, go to /etc/rc.d/init.d/ and
> > > run the "./whatever status" to see if it really runs (maybe it'll say
> > > "service is stopped but subsystem is locked" instead); then run
> > > "./whatever stop" to stop the service. Just be careful not to stop
> > > something you need. Eventually you can
> > > investigate the partition with lsof and decide exactly what processes
> > > are candidates for the lockup, and attempt to work on each in turn. If
> > > for example you saw that netscape was still locking, you know damn well
> > > you found your problem. Maybe it'll be like the problem I found long
> > > ago, and a sym link will be mistaken for an open file.
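
The per-partition lsof pass described above can be sketched roughly like
this. It assumes /proc/mounts is available and lists the partitions under
their /dev names, and that lsof is installed; list_dev_partitions and
lsof_all_partitions are hypothetical helper names.

```shell
# list_dev_partitions [FILE]: print each /dev/* device found in a mounts
# table (default: /proc/mounts), once each, in order of first appearance.
list_dev_partitions() {
    awk '$1 ~ /^\/dev\// && !seen[$1]++ { print $1 }' "${1:-/proc/mounts}"
}

# lsof_all_partitions [FILE]: run lsof against every mounted partition,
# with a header line so the output for each one is easy to tell apart.
lsof_all_partitions() {
    for dev in $(list_dev_partitions "$@"); do
        echo "== $dev =="
        lsof "$dev"
    done
}
```

Running "lsof_all_partitions" as root just before the step that hangs
gives one listing per partition to compare.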
> > >
> > > D. Stimits, stimits at idcomm.com



