[lug] My last hope....and nerve

D. Stimits stimits at comcast.net
Mon Oct 29 18:28:17 MDT 2007


Steve A Hart wrote:
> I'm still dealing with two Promise UltraTrak RM8000 raid arrays and 
> I'm getting desperate to find an answer to my problem.  Here's the 
> rundown and hopefully someone out there can help.
>
> Let's keep this simple.   I have a single Promise UltraTrak RM8000 
> connected to an LSI logic SCSI card.  The OS is Fedora Core 6 and when 
> the OS starts up, all I see is a repeating SCSI bus reset over and over.
>
> I can say with 100% certainty that the problem is NOT the following:
> SCSI host ID
> The SCSI cable
> the terminator (terminated correctly)
> LSI card
> motherboard of the host system
>
> That only leaves the OS and the promise raid itself.  I know the 
> RM8000 did run on FC4 running the 2.6.16 kernel but ever since the 
> 2.6.18 kernels came out it has not worked.  Now I have it connected 
> to a FC6 system and still no luck.
A few things for the desperate to try or consider...

Drives themselves often go bad, and it isn't unusual for a batch of 
brand new drives to arrive with several bad ones. Can you try swapping 
the drives themselves between the good array and the bad array? Look 
very closely at the pins, and the "feel" of how they seat into the bays. 
Swap the entire set of drives, not just one at a time. Also get a single 
bay hot swap carrier/tray and test the drives one at a time (this is an 
extraordinarily useful and cheap test tool).

Feel the temperature of each drive in the working and non-working sets 
with your fingers. If one stands out as significantly colder or hotter 
than the rest, it might point something out.

Try to format and mount individual partitions made from each disk, not 
in a RAID set or LVM logical volume...reduce it to the simplest use of 
each disk if possible, without other layers on top of it.

Perhaps if volume labels are used, there is a naming problem...try 
mounting those individual partitions by exact /dev/ name, without any 
kind of automount and without any kind of label. Remove all fstab 
entries with labels while doing this.
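
As a rough sketch of the label check, the helper below lists fstab lines 
that mount by LABEL= instead of by /dev name. The function name 
label_mounts and the device and mount point in the comments are made up 
for illustration; adjust them to the real system.

```shell
# List fstab lines that mount by LABEL= rather than by /dev name.
# Optional argument: path to the fstab file (defaults to /etc/fstab).
# Commented-out lines are ignored because they start with '#'.
label_mounts() {
    grep -nE '^[[:space:]]*LABEL=' "${1:-/etc/fstab}"
}

# Typical use (hypothetical device and mount point):
#   label_mounts                  # shows line numbers of LABEL= entries
#   mount /dev/sdb1 /mnt/test     # then mount by exact /dev/ name instead
```

Anything it prints is a candidate to comment out while testing.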

If the disks are identical, and formatted identically, then the output 
of an fdisk geometry dump piped through sort and uniq should be short:
fdisk -l | cut -d ' ' -f 2- | sort | uniq

(the cut removes the first field, which on the partition lines is the 
device name, e.g., /dev/sda1; sort followed by uniq removes duplicates, 
and there should be many duplicates...what remains should be similar 
from disk to disk)
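
A variation on the same idea, offered only as a sketch: instead of 
cutting off the first field, replace every device name with a 
placeholder so lines from different disks compare equal, and let uniq -c 
show how many times each line occurred. The function name 
normalize_devices is invented here, and the pattern assumes names of the 
/dev/sdX style; other naming schemes would need a different pattern.

```shell
# Rewrite /dev/sda, /dev/sdb1, ... to /dev/X, /dev/X1, ... (the partition
# number is kept), then count how often each resulting line occurs.
normalize_devices() {
    sed -E 's#/dev/[a-z]+([0-9]*)#/dev/X\1#g' | sort | uniq -c
}

# Typical use (fdisk -l usually needs root):
#   fdisk -l | normalize_devices
```

With identical disks every count should equal the number of disks; a 
line with a count of 1 belongs to the odd disk out.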

Perhaps searching with badblocks might indicate trouble on a boot 
record...a long process.
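
A minimal sketch of such a scan, assuming badblocks from e2fsprogs is 
installed. Without -n or -w it only reads, so it should not alter data. 
The wrapper name scan_disk and the device in the comment are 
placeholders.

```shell
# Read-only surface scan of one disk. Refuses anything that is not a
# block device so a typo in the path does not start scanning a file.
scan_disk() {
    if [ ! -b "$1" ]; then
        echo "not a block device: $1" >&2
        return 1
    fi
    # -s shows progress, -v reports errors verbosely; default mode is read-only
    badblocks -sv "$1"
}

# Typical use (hypothetical device, run as root):
#   scan_disk /dev/sdb
```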

If you run lspci -b, you'll notice that PCI devices are listed in the 
format bus:device.function (a PCI bus can be bridged to other PCI 
busses, and a given physical device can contain more than one function, 
e.g., a sound card can contain standard sound + joystick controller + 
midi). Remove or swap devices which compete on the same bus...sometimes 
a buggy device does not play fairly for DMA control and collides with 
another PCI device (in which case moving it to another slot will make 
it work).
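
To see quickly which busses are crowded, something along these lines 
could count devices per bus. It is only a sketch: devices_per_bus is a 
made-up name, and it assumes the default lspci address form 
bus:device.function with no PCI domain prefix.

```shell
# Read lspci output on stdin and print "bus count" for each PCI bus.
# The bus number is the part of the first column before the colon.
devices_per_bus() {
    awk '{ split($1, addr, ":"); count[addr[1]]++ }
         END { for (bus in count) print bus, count[bus] }' | sort
}

# Typical use:
#   lspci | devices_per_bus
```

If the SCSI card shares a bus with several other devices, that is a 
reason to try it in a different slot.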

Physically swap anything involved and look for any change in behavior; 
see whether the failures have anything in common.

D. Stimits, stimits AT comcast DOT net


