ARMServer problems or hardware problems?

 
Alex Georgiev
Regular Advisor

I'm new to HP-UX and the HP RAID arrays, so please bear with me.

A few days ago I noticed an error message from the array monitor daemon saying that /dev/rdsk/c1t0d0 is inaccessible. I then called the people who support our HP-UX box and asked for help. They said it's a software problem with the ARMServer, and that the arraymond message comes up when the ARMServer isn't running.

The bottom line: ARMServer stops running by itself!? Crashing might be a better word, but I did not find any core files. Nobody seems to know why the ARMServer would stop all by itself... Could a hardware problem be causing that?

I get the feeling that the company responsible for supporting the HP-UX 10.20 machine is not as knowledgeable as it should be, and of course I myself am more into the Linux/Solaris world and have little experience troubleshooting RAID arrays and SCSI problems.

I did find a bunch of registry dumps and SCSI error/diag messages in the syslog. It says that the SCSI bus was reset. The messages sometimes appear every 2 minutes, other times every 20 or so. If anyone could tell me what the syslog messages mean and why ARMServer is stopping, I would very, very much appreciate it! Even just ideas as to where I should look for clues would be helpful. A small chunk of the syslog file is attached as syslog.txt.
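To put numbers on how often the messages appear, something like the following could tally matching syslog lines per hour. This is only a sketch: `count_per_hour` is a hypothetical helper, it assumes the usual `Mon DD HH:MM:SS` syslog timestamp at the start of each line, and the search word `reset` is an assumption, since the exact message text is only in the attachment.

```shell
#!/bin/sh
# count_per_hour PATTERN
# Hypothetical helper: reads syslog text on stdin and prints one line per
# hour ("Mon DD HH:00 COUNT") for lines containing PATTERN, ignoring case.
# Assumes each line starts with a "Mon DD HH:MM:SS" timestamp.
count_per_hour() {
    awk -v pat="$1" '
        index(tolower($0), tolower(pat)) {
            # Build an hour bucket from month, day and the hour digits.
            key = $1 " " $2 " " substr($3, 1, 2) ":00"
            # Remember first-seen order so output follows the log order.
            if (!(key in count)) order[++n] = key
            count[key]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}

# Example use (the search word is an assumption, adjust to the real message):
#   count_per_hour reset < /var/adm/syslog/syslog.log
```

A sudden jump in the hourly count, or a steady pattern, may help when comparing the log against cabling changes or reboots.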

To cap this off, I tried getting some info with arraydsp and arraylog, and those commands don't seem to work. For example, arraydsp -i just sits there without printing anything. I will try tracing it when I get a chance, but I doubt that will help me much. Things are looking pretty scary...
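When a command such as arraydsp -i may hang, it can help to run it under a watchdog so a script or terminal is not blocked indefinitely. The following is a minimal sketch, not an HP-supplied tool: `run_with_timeout` is a hypothetical helper and the 30-second limit in the usage comment is arbitrary. A process stuck in uninterruptible I/O may not respond to the kill at all, in which case this wrapper will still wait.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and sends it SIGTERM
# if it is still running after SECONDS. Returns the command's exit status;
# a status above 128 normally means it was killed by the watchdog.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleeps, then kills the command. Its output is discarded so
    # it never holds a pipe open when the caller captures output.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command finished (or was killed); stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$status"
}

# Example use:
#   run_with_timeout 30 arraydsp -i || echo "arraydsp failed or hung"
```

This only detects the hang; it says nothing about why the command is stuck.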

Let me know if I need to post any more info. Thanks for any responses in advance!
3 REPLIES
Insu Kim
Honored Contributor

Re: ARMServer problems or hardware problems?

ARMServer can die intermittently; this can be solved by applying the latest ARMServer patch.

Run "/sbin/init.d/hparaymgr start" or /opt/hparray/bin/ARMServer to start the server, then check the status of the array:

# arraydsp -i
# arraydsp -a

Alternatively, verify which component has failed by examining the logs with the logprint command.

You said that this error message appears frequently, so there must be an error in the AutoRAID.
From experience, I know that the errors will not stop until you fix the underlying problem.

First, take a look at the front panel to see if it's in the READY state, especially the controller with SCSI ID set to 0.
Then use "arraydsp -a" and "logprint".
Finally, you can also use mstm -> logtools to get a clue.

Hope this helps,
Never say "no" first.
Bill McNAMARA_1
Honored Contributor

Re: ARMServer problems or hardware problems?

There's no question you need to install the latest SCSI patches.

arraydsp -R
should rescan for hardware.

If your f/s and vgs are okay on the autoraid, i.e. data access is fine, then there is typically no problem with the h/w.

ioscan -fnkCdisk
look for the C5447A device files
do a diskinfo -v /dev/rdsk/c...
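The lookup above can be sketched as a small filter that pulls the raw device files for one model out of ioscan output. This is a sketch under assumptions: `rdsk_for_model` is a hypothetical helper, and it assumes the usual `ioscan -fn` layout where each device line starts with its class name and the device files follow on indented lines.

```shell
#!/bin/sh
# rdsk_for_model MODEL
# Hypothetical helper: reads "ioscan -fn" style output on stdin and prints
# the /dev/rdsk/... device files listed under entries mentioning MODEL.
rdsk_for_model() {
    awk -v model="$1" '
        # A line starting with a lowercase class name begins a new entry;
        # remember whether its description mentions the model.
        /^[a-z]/ { wanted = (index($0, model) > 0); next }
        # Indented lines carry the device files of the current entry.
        wanted {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\/dev\/rdsk\//) print $i
        }
    '
}

# Example use:
#   ioscan -fnkCdisk | rdsk_for_model C5447A | while read dev; do
#       diskinfo -v "$dev"
#   done
```

If diskinfo itself hangs on a device, that device path is a useful clue about which SCSI path is affected.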

The ARMServer won't start properly if LUN 0 isn't accessible.

You should always have a lun0 show up on ioscan whether or not it exists on the array.

After arraydsp -R
run top
wait for around 2 full ioscan processes to finish;
the second one runs after you see the ARMServer process appear.

Then quit top,
arraydsp -i
should work.
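Instead of watching top by hand, the wait could be scripted roughly as follows. This is a sketch that only approximates the steps above: the helper names are hypothetical, the retry counts in the usage comment are arbitrary, and it checks that ARMServer is present and no ioscan is left running rather than counting two separate ioscan runs.

```shell
#!/bin/sh
# list_processes: prints the process table; kept separate so it can be
# swapped out (for example for "ps -ef") if the match needs full commands.
list_processes() { ps -e; }

# process_running NAME: succeeds if NAME appears in the process list.
process_running() {
    list_processes | grep -v grep | grep "$1" >/dev/null
}

# wait_for_process NAME MAX_TRIES [DELAY]
# Polls until NAME shows up; fails after MAX_TRIES checks.
wait_for_process() {
    tries=0
    while [ "$tries" -lt "$2" ]; do
        process_running "$1" && return 0
        tries=$((tries + 1))
        sleep "${3:-5}"
    done
    return 1
}

# wait_until_gone NAME MAX_TRIES [DELAY]
# Polls until NAME is no longer listed; fails after MAX_TRIES checks.
wait_until_gone() {
    tries=0
    while [ "$tries" -lt "$2" ]; do
        process_running "$1" || return 0
        tries=$((tries + 1))
        sleep "${3:-5}"
    done
    return 1
}

# Example use after the rescan:
#   arraydsp -R
#   wait_for_process ARMServer 60 && wait_until_gone ioscan 60 && arraydsp -i
```

If ARMServer never appears within the retry window, that by itself points back at the LUN 0 and bus checks rather than at arraydsp.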

man logprint
and run it to interrogate the autoraid for h/w errors.
Your problem, though, is possibly a bad cable or a missing terminator.

Are there any other devices on the same bus as the autoraid? What's the autoraid SCSI ID/priority wrt the other IDs on the same bus? Think about increasing it. Don't put the autoraid on the same bus as a streaming device.

arraydsp -a
to see your autoraid f/w.
The ARMServer PHCO must match the f/w revision, i.e. don't install the latest PHCO for ARMServer unless you have read the release notes and have upgraded the autoraid controller f/w before the PHCO upgrade.

what /opt/hparray/bin/ARMServer

swlist -l product | grep -i arm

Later,
Bill

It works for me (tm)
Alex Georgiev
Regular Advisor

Re: ARMServer problems or hardware problems?

Thanks for the responses! The problem turned out to be a SCSI controller gone bad. Thanks to the fact that we have 2 SCSI channels to "talk" to the RAID, we could access the data just fine, but all the array* commands would hang. What bugs me the most is that the 3 HP engineers who worked on this problem could not figure anything out by looking at the error logs and the output of various diagnostic commands. It was a machine reboot that finally brought this server to its knees and had us calling HP hardware people on site.