[lug] Unable to cleanly reboot

D. Stimits stimits at idcomm.com
Thu Oct 18 12:53:31 MDT 2001


Michael Deck wrote:
> 
> Help! I'm unable to reboot my Linux box without a hard reset (and fsck all
> drives). When, as root, I type "reboot" it goes through the steps of stopping
> relevant services and then says, "No more processes in this runlevel" but
> then it just hangs. Unlike it did before (or on my other Linux boxen) where
> it pauses and then the md recovery thread gets woken up which powers off or
> reboots the box. This is a huge drag. I really need some advice on how to fix
> this because it's 15 minutes to reboot otherwise. I'm not (presently) running
> X so the default runlevel is 3.
> 
> Here's what has been happening on this box, in case it's of use.
> 
> I've been trying to replace Mandrake 7.2 with KRUD 9-01 for the past 3 days.
> First, I couldn't get initrd.img to boot and the hardware was suspected. So I
> put a newer CDROM drive in and tried again.
> 
> Booting worked, but I was getting sporadic failures to find rpm files during
> the actual install. Each of these install attempts left the system in a more
> or less unusable state. I did go back and re-install Mandrake from CD
> successfully but it has many older RPMs and (unfortunately) at some point I
> trashed /usr so my various patches and updates were lost. Sigh.
> 
> This morning I figured out how to do a hard-disk based install and tried
> that. This particular box can't successfully copy the CDs but I was able to
> use another Linux box to copy them into ISO images and then upload them to the
> target system. Voila!, I thought, and I (once again) commenced the KRUD
> installation. Text mode, but it got done.
> 
> But it still has this same ugly problem of not shutting down cleanly.
> 
> Your suggestions appreciated!
> 
> -Mike
> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug

A couple of general tools. The main one is lsof, which lists the
processes that have files open on a filesystem. For example, if you have
hard drive hda with partition hda1, and a program is using a file on
hda1, then "lsof /dev/hda1" will list the users.

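A rough sketch of how that output could be boiled down to one line per
process. The helper name summarize_lsof is made up for illustration, and
it assumes lsof's default column layout (a header line, then COMMAND and
PID as the first two columns):

```shell
#!/bin/sh
# summarize_lsof: read default-format lsof output on stdin and print
# each "PID COMMAND" pair once. The name is hypothetical, and the
# column positions are an assumption about lsof's default layout.
summarize_lsof() {
    # NR > 1 skips the header line; $2 is the PID, $1 the command name.
    awk 'NR > 1 { print $2, $1 }' | sort -u
}

# Typical use, as root (the device name is only an example):
#   lsof /dev/hda1 | summarize_lsof
```

A process with many files open on the partition then shows up once
instead of dozens of times, which makes the list easier to scan.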
Second tool, you can go to /etc/rc.d/init.d and use "./whatever stop" to
stop a given service in the same way that a shutdown would do (it'll be
back after reboot or "./whatever start").
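If typing the path each time gets old, a small wrapper along these lines
might help. This is only a sketch: the function name svc and the INITDIR
variable are invented here, and /etc/rc.d/init.d is assumed as the
default location.

```shell
#!/bin/sh
# Directory holding the SysV init scripts. INITDIR is a made-up
# override knob; the default is the path mentioned above.
INITDIR=${INITDIR:-/etc/rc.d/init.d}

# svc NAME ACTION: run one init script with one action
# (start, stop, status, ...). Hypothetical helper.
svc() {
    script="$INITDIR/$1"
    if [ ! -x "$script" ]; then
        echo "svc: no executable init script $script" >&2
        return 1
    fi
    "$script" "$2"
}

# Typical use, as root ("whatever" is a placeholder service name):
#   svc whatever status
#   svc whatever stop
```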

I believe you will also find that, on a Red Hat style setup like KRUD,
runlevel 2 is multi-user without NFS, and runlevel 1 is single user
without networking. Manually run "init 2" to drop to runlevel 2. If
nothing hangs, run "init 1" to drop to runlevel 1. If nothing hangs
there either, then you have basically the minimum of services running
before the hang. If you want to go back to runlevel 3, just run
"init 3".
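To confirm where you ended up after each step, the "runlevel" command
prints the previous and current runlevel (for example "N 3"). A minimal
sketch for pulling out just the current one; the function name
current_level is hypothetical:

```shell
#!/bin/sh
# current_level: read the output of "runlevel" on stdin and print only
# the current runlevel, which is the second field. Hypothetical helper.
current_level() {
    awk '{ print $2; exit }'
}

# Stepping down by hand, as root, checking after each change:
#   runlevel | current_level
#   init 2
#   runlevel | current_level
#   init 1
```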

While in your lowest runlevel (runlevel 0 is halt, runlevel 6 reboots,
runlevel 1 is the lowest interactive level), run lsof against the hard
drives that are mounted. You will have to run it against each partition,
not just the drive. (I once had a similar hang because of a bug where a
symlink from a man page made the partition think it was still in use; I
had to delete the symlink and recreate it once a newer kernel version
was in.)

Before you panic over a large number of users of a partition, run
"chkconfig --list" and look at the services that are supposed to be
running in your current runlevel. If you see something optional that
will complicate your search, go to /etc/rc.d/init.d/ and run
"./whatever status" to see if it really is running (maybe it'll say
"service is stopped but subsystem is locked" instead); then run
"./whatever stop" to stop the service. Just be careful not to stop
something you need.

Eventually you can investigate the partition with lsof, decide exactly
which processes are candidates for the lockup, and work on each in
turn. If, for example, you saw that netscape was still holding files
open, you know damn well you found your problem. Maybe it'll be like
the problem I found long ago, and a symlink will be mistaken for an
open file.
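The per-partition check and the chkconfig filtering could be scripted
roughly as below. Both function names are made up for illustration. The
first assumes /proc/mounts style input (device in the first column), the
second assumes the usual "name 0:off 1:off ... 6:off" lines that
"chkconfig --list" prints.

```shell
#!/bin/sh
# mounted_partitions: read /proc/mounts style lines on stdin and print
# each mounted /dev/... device once. Hypothetical helper.
mounted_partitions() {
    awk '$1 ~ /^\/dev\// { print $1 }' | sort -u
}

# services_on_at LEVEL: read "chkconfig --list" output on stdin and
# print the services marked "on" for that runlevel. Hypothetical helper.
services_on_at() {
    awk -v lvl="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == lvl ":on") print $1
    }'
}

# Typical use, as root:
#   for part in $(mounted_partitions < /proc/mounts); do
#       echo "== $part"
#       lsof "$part"
#   done
#   chkconfig --list | services_on_at 3
```

Keep in mind that chkconfig only reports what is configured to run at a
runlevel, so "./whatever status" is still the way to see what is really
up.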

D. Stimits, stimits at idcomm.com


