[lug] Unable to cleanly reboot

Michael Deck deckm at cleansoft.com
Thu Oct 18 13:40:37 MDT 2001


Thanks. I'm learning something here but, unfortunately, I haven't yet 
collared the problem. 

First I did "init 3" which sent me back to a command-line prompt. At that run 
level a lot of stuff was still in use but there were a number of services 
that I saw I didn't want or need on this machine so I killed them and took 
them out of chkconfig. Then I did "init 2" and that killed a few more 
services. I did lsof on all the partitions again and, after killing some 
services manually, saw that only 'login', 'bash', and 'lsof' were using files 
on /dev/hda6. I assumed that was OK and did "init 1", which froze the system. 
It told me it had stopped the random service and eth0 successfully, then "no 
more processes in this runlevel", and it's once again time for a hard restart 
and fsck. 
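
A minimal sketch of that check, for anyone following along: it filters
"chkconfig --list" output for services switched on at a run level, and runs
lsof against every mounted /dev partition. The function names are made up for
illustration, and it assumes /proc/mounts exists and that awk and lsof are
installed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any distribution.

# Print the names of services marked "on" for run level $1,
# reading "chkconfig --list" style output from standard input.
services_on_at() {
    awk -v want="$1:on" '{ for (i = 2; i <= NF; i++) if ($i == want) print $1 }'
}

# Print "device mountpoint" for every /dev-backed filesystem,
# reading mount-table lines (the /proc/mounts format) from standard input.
mounted_partitions() {
    awk '$1 ~ /^\/dev\// { print $1, $2 }'
}

# Run lsof against each mounted partition in turn and label the output.
# Assumes /proc/mounts is readable and lsof is installed; run as root.
report_open_files() {
    mounted_partitions < /proc/mounts | while read -r dev mnt; do
        echo "== $dev mounted on $mnt =="
        lsof "$dev" 2>/dev/null || echo "(nothing reported)"
    done
}
```

After sourcing it, "chkconfig --list | services_on_at 3" shows what is meant to
run at level 3, and "report_open_files" can be repeated after each "init N" to
see what is still holding files open.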

What else should I be looking for? Or was there something I missed in all 
that?

-Mike

On Thursday 18 October 2001 12:53 pm, you wrote:
> Michael Deck wrote:
> > Help! I'm unable to reboot my Linux box without a hard reset (and fsck
> > all drives). When, as root, I type "reboot" it goes through the steps of
> > stopping relevant services and then says, "No more processes in this
> > runlevel" but then it just hangs. Before (and on my other Linux boxen)
> > it would pause, then the md recovery thread would get woken up, and
> > the box would power off or reboot. This is a huge drag. I really
> > need some advice on how to fix this because it's 15 minutes to reboot
> > otherwise. I'm not (presently) running X so the default runlevel is 3.
> >
> > Here's what has been happening on this box, in case it's of use.
> >
> > I've been trying to replace Mandrake 7.2 with KRUD 9-01 for the past 3
> > days. First, I couldn't get initrd.img to boot and the hardware was
> > suspected. So I put a newer CDROM drive in and tried again.
> >
> > Booting worked, but I was getting sporadic failures to find rpm files
> > during the actual install. Each of these install attempts left the system
> > in a more or less unusable state. I did go back and re-install Mandrake
> > from CD successfully but it has many older RPMs and (unfortunately) at
> > some point I trashed /usr so my various patches and updates were lost.
> > Sigh.
> >
> > This morning I figured out how to do a hard-disk based install and tried
> > that. This particular box can't successfully copy the CDs but I was able
> > to use another Linux box to copy them into ISO images and then upload them
> > to the target system. Voila!, I thought, and I (once again) commenced the
> > KRUD installation. Text mode, but it got done.
> >
> > But it still has this same ugly problem of not shutting down cleanly.
> >
> > Your suggestions appreciated!
> >
> > -Mike
> >
> > _______________________________________________
> > Web Page:  http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>
> A couple of general tools. The main one is lsof, which lists the users
> of a filesystem resource. For example, if you have hard drive hda with
> partition hda1, and a program is using a file on hda1, then you can do
> "lsof /dev/hda1" and it'll list the users.
>
> Second tool, you can go to /etc/rc.d/init.d and use "./whatever stop" to
> stop a given service in the same way that a shutdown would do (it'll be
> back after reboot or "./whatever start").
>
> I believe you will also find that, on Red Hat-style systems, runlevel 2
> is multi-user with networking but without NFS, and runlevel 1 is single
> user without networking. Manually
> run "init 2" to drop to runlevel 2. If nothing was hung up, run "init 1"
> and drop to runlevel 1. If nothing hung up, then you have basically the
> minimum of services running before hang. If you want to go back to
> runlevel 3, just "init 3".
>
> While in your lowest runlevel (runlevel 0 is halt, runlevel 6 reboots,
> runlevel 1 is the lowest interactive level), run lsof against the hard
> drives that are mounted. You will have to run it against each partition,
> not just the drive (I once had a similar hang because of a bug with a
> sym link from a man page causing the partition to think it was still in
> use, had to delete the sym and relink it once a newer kernel version was
> in). Before you bother with panicking over a large number of users of a
> partition, run "chkconfig --list" and view services that are running
> from your current runlevel (or rather, for services that are supposed to
> be running). If you see something optional that will complicate your
> search, go to /etc/rc.d/init.d/ and run "./whatever status" to see
> if it really runs (maybe it'll say "service is stopped but subsystem is
> locked" instead); then run "./whatever stop" to stop the service. Just
> be careful not to stop something you need. Eventually you can
> investigate the partition with lsof and decide exactly what processes
> are candidates for the lockup, and attempt to work on each in turn. If
> for example you saw that netscape was still locking, you know damn well
> you found your problem. Maybe it'll be like the problem I found long
> ago, and a sym link will be mistaken for an open file.
>
> D. Stimits, stimits at idcomm.cmo
