
AutoRAID timeout

 
Dan Rosen
Frequent Advisor

Yes, someone is still using the AutoRAID 12H...

We recently relocated the D-class server and 12H AutoRAID to a datacenter. The move went fine: all connections were clearly labeled, and we had no problems on startup. The system name did not change, just the IP address information.

After the system started up, I have been getting intermittent errors from the arraymgr:

=======================================
Mon Sep 26 04:11:56 2005
Array Monitor Daemon
=======================================
access error: autoraid server timed out getting disk array status on /dev/rdsk/c3t0d0 at tsihp1

I get one or two of these a day, at no specific interval. I have not seen these messages in the 4 years we have had this system running. System performance seems fine otherwise, and the user load is much lower than it used to be.

Is this the equivalent of the check engine light coming on when you need your oil changed?
Or is this a deeper issue?

TIA
7 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: AutoRAID timeout

I still have a few of these old beasts running as well. The most likely cause of your problem is the default IO timeout setting of 30 seconds. You should use pvchange to set the IO timeout to 120-180 seconds, a much more appropriate value for an array LUN. To be on the safe side, I would also shut down the array using the front panel, power off the array, then reseat each disk, controller, and power supply and carefully check each cable connection. Poor termination can cause this kind of error as well. However, because you are only getting these errors on 1 LUN (at least that's what you seem to be saying), and because a LUN bears no relationship to individual disks, I'm more inclined to believe the problem is related to the LUN itself, so pvchange -t has a very good chance of fixing it. Man pvchange for details.
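
A minimal sketch of the timeout change, written as a dry run that only prints the commands so they can be reviewed first. It assumes the LVM physical volume is the block device /dev/dsk/c3t0d0 (the counterpart of the raw device in the error message) and uses 120 seconds from the suggested range; the function name pv_timeout_cmds is made up for illustration.

```shell
# Dry run: prints the commands for review instead of executing them.
# ASSUMPTION: the LVM physical volume is the block device /dev/dsk/c3t0d0,
# the counterpart of the raw device /dev/rdsk/c3t0d0 named in the error.
pv_timeout_cmds() {
    # $1 = PV block device path, $2 = IO timeout in seconds
    echo "pvchange -t $2 $1"   # set the per-PV IO timeout
    echo "pvdisplay $1"        # review the PV settings afterwards
}

pv_timeout_cmds /dev/dsk/c3t0d0 120
```

Run the printed commands as root on the HP-UX host once the device path has been confirmed.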
If it ain't broke, I can fix that.
Stefan Stechemesser
Honored Contributor

Re: AutoRAID timeout

Hi,

This may be a simple network problem introduced by changing the IP address.
Most likely, a DNS lookup by the Array Monitor daemon did not succeed.

Please check if

nslookup hostname
nslookup ipaddress

work fine and give the correct information.

best regards

Stefan
Sameer_Nirmal
Honored Contributor

Re: AutoRAID timeout

I guess you might have already tried restarting the ARMserver and arraymond using
/sbin/init.d/hparray stop/start.
If not, do it and observe.

It seems from the error that the arraymond daemon is not able to get the status of that LUN from the array, which causes the timeout. The daemon runs every 15 minutes.

Are you getting this error only for LUN /dev/rdsk/c3t0d0?
Do you get a slow response when running pvdisplay on this LUN, or vgdisplay on the VG to which it belongs?

What about the output shown by
arraydsp -a
arraydsp -l /dev/rdsk/c3t0d0
logprint
arraylog -u
arraylog -e

As ACS said above, you need to reseat each disk after shutting down the array. I believe that if any one of the disks underneath the LUN is not working properly (initialising, rotation delays, media problems, etc.), it adds to the overall delay of the LUN response. If the response time of a LUN exceeds what the arraymond daemon expects, a timeout occurs.
This may happen because of a badly seated disk or a similar physical hardware problem, which is possible after transit.

Check for termination as well.

Nemer_1
Regular Advisor

Re: AutoRAID timeout

Hi,
Also examine your /var/adm/syslog/syslog.log file for any "SCSI time out" messages. Most probably this is an issue on your SCSI bus (cables and terminators). Check them well.

Another thing: check the firmware version of your array and the installed disks.

Check whether your array responds well to arraymgr commands. Try to shut down the array using

arraymgr -s shut

Then go and see whether the array LCD shows that the array is shut down. If not, this may indicate that you have a problem on your SCSI bus.

Use "arraymgr -s start" to start the array again.

Dan Rosen
Frequent Advisor

Re: AutoRAID timeout

Thanks for all the replies.

ACS - I changed the timeout to 60 seconds and still got them this morning, so I will pump it up to 120.

I did find a SCSI timeout this morning (the only one in syslog.log, even though the system has been running for 10 days). Is it possible that it will see a timeout when the array is backing up to tape? If arraymond is checking every 15 minutes, I am only seeing the errors once, maybe twice a day.
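
One way to test the backup theory is to tally the timeout reports by hour of day and compare the result with the backup schedule. The sketch below assumes the monitor messages have been saved to a text file in the format quoted in the first post (a timestamp line followed by the error line); the function name timeouts_by_hour and the file name are hypothetical.

```shell
# Tally "timed out" reports by hour of day.
# ASSUMPTION: the monitor messages have been saved to a text file in the
# format quoted in the original post (timestamp line, then the error line).
timeouts_by_hour() {
    # $1 = file of saved messages (hypothetical, e.g. arraymon.txt)
    awk '
        # Timestamp line such as "Mon Sep 26 04:11:56 2005"
        NF == 5 && $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":"); hour = t[1]; next
        }
        # Count one timeout against the most recent timestamp
        /timed out/ && hour != "" { n[hour]++; hour = "" }
        END { for (h in n) print h, n[h] }
    ' "$1" | sort
}
```

Each output line is an hour (00-23) followed by the number of timeouts reported in that hour. If the counts cluster in the hours when the tape backup runs, that would support the idea that backup load is involved.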

I don't think it is network related (all the internal references check out).

I did also see some SCSI: Abort Tag messages and an LVM: Recovered Path message.

If I try to look at arraydsp or any of the other commands, I get "arraydsp cannot display - may be initializing."

I am having someone look at the unit (we moved it 400 miles away) to see what it reads onboard. But, during testing, it did read all clear.

Thanks everyone, points are forthcoming...
Nemer_1
Regular Advisor

Re: AutoRAID timeout

Hi,

Check the SCSI cables and terminators. Try replacing them.
Stefan Stechemesser
Honored Contributor

Re: AutoRAID timeout

Hi Dan,

It looks like the ARMserver daemon is not running, and this is the reason why you got this message.

I don't think that this is a problem with the AutoRAID, because otherwise your syslog would be filled with SCSI errors and/or the LUNs would not be accessible.

In many cases, I have seen this error when /etc/hosts contained a wrong IP address for the host (in your case: still the old IP?), and the IP address was one of the things you changed in this environment.
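
A quick way to see which address /etc/hosts currently maps to the host, as a rough sketch. The function name hosts_ip_for is made up for illustration; tsihp1 is the host name from the error message. More than one line of output, or an address from the old site, would point at a stale entry.

```shell
# Print every IP address that a hosts-format file maps to a given name.
hosts_ip_for() {
    # $1 = host name, $2 = hosts file (defaults to /etc/hosts)
    awk -v h="$1" '
        {
            sub(/#.*/, "")                  # drop comments
            for (i = 2; i <= NF; i++)       # names and aliases follow the IP
                if ($i == h) print $1
        }
    ' "${2:-/etc/hosts}"
}

# Usage on the server: hosts_ip_for tsihp1
```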

If the IP configuration is OK, then please try the following:

- /sbin/init.d/hparray stop
- ps -ef | grep ARM
- if ARMserver is still running kill the process
- /sbin/init.d/hparray start
- see if there are any messages on the console or in syslog.log
- after a few minutes run "arraydsp -i". Is this still giving the same message?

The ARMserver is not needed to access the LUNs, only to control the disk array (diagnostics, creating new LUNs, etc.), so the above can be done online.
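
The restart steps above could be scripted roughly as follows. This is a sketch, not a tested procedure: the function names arm_pids and restart_hparray are made up for illustration, and restart_hparray is only defined here, not called, since it must run as root on the HP-UX host.

```shell
# Extract the PIDs of ARMserver processes from `ps -ef` output on stdin.
arm_pids() {
    awk '$0 ~ /ARMserver/ && $0 !~ /grep/ { print $2 }'
}

# Stop the array daemons, kill any leftover ARMserver, then start again.
restart_hparray() {
    /sbin/init.d/hparray stop
    pids=$(ps -ef | arm_pids)
    if [ -n "$pids" ]; then
        kill $pids          # ARMserver survived the stop script
    fi
    /sbin/init.d/hparray start
    # Then watch the console and syslog.log, and after a few minutes
    # run: arraydsp -i
}
```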

best regards

Stefan