
I got the critical alarm " Driver detected that device is offline." on server

AWEB
Occasional Visitor

I got the critical alarm "Driver detected that device is offline." on server 2, and the cluster switched over to server 1. The message I received was Event 64: "Driver detected that device is offline."

How can I check for more details, and how can I solve the problem?

Thank you

2 REPLIES
Bill Hassell
Honored Contributor

Re: I got the critical alarm " Driver detected that device is offline." on server

To locate the cause of the event, look in the Serviceguard directory /etc/cmcluster for the cluster log file. The end of the log will provide details about the error and the switch to the alternate node. Also look at the end of /var/adm/syslog/syslog.log (on the node that failed) for error messages.
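
As a rough sketch of the "look at the end of the log" step, something like the following could be used. The helper name recent_matches and the default of 200 lines are illustrative assumptions, not part of any HP-UX or Serviceguard tool; the syslog path is the one mentioned above and the search word comes from the alarm text.

```shell
#!/bin/sh
# Sketch: show lines near the end of a log file that mention a given text.
# recent_matches is a hypothetical helper, not an HP-UX or Serviceguard command.
recent_matches() {
    logfile=$1
    pattern=$2
    count=${3:-200}   # how many trailing lines to inspect (arbitrary default)
    # Restrict the search to the tail of the file, then match case-insensitively.
    tail -n "$count" "$logfile" | grep -i "$pattern"
}

# Only attempt the real check where the HP-UX syslog file is readable.
if [ -r /var/adm/syslog/syslog.log ]; then
    recent_matches /var/adm/syslog/syslog.log "offline"
fi
```

The same helper can be pointed at whichever cluster log file you find under /etc/cmcluster; run it on the node that failed.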



Bill Hassell, sysadmin
Matti_Kurkela
Honored Contributor

Re: I got the critical alarm " Driver detected that device is offline." on server

How exactly did you get the alarm? Did your alarm system parse it from syslog? Was the alert received via WBEM? Or something else?

Is this a blade or a stand-alone server? Which HP-UX version? Which hardware model? (The outputs of commands "uname -a" and "model" would be enlightening.)

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02279502

It looks like it might be an alert from a Fibre Channel HBA, indicating that a connection to a disk or LUN was lost.

- Read the recent messages from the system's syslog file: /var/adm/syslog/syslog.log

- Run "ioscan -fnkCdisk" to see if any disk/LUN is in a NO_HW state. On HP-UX 11.31 you can add -N to the options; there are also other ioscan options you can use to query the health of disks/LUNs.

- Use the fcmsutil command to check the state of the Fibre Channel HBAs. If an HBA has lost its link, check the cable. (Someone accidentally disconnected the wrong cable, or kinked a fibre in a rack door, perhaps?)

- If it's just a single LUN, see if that LUN was actually in use. If not, was it unpresented on purpose from the SAN side without removing the device from the operating system using the "rmsf" command first?
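
The ioscan check from the list above could be scripted roughly like this. It is only a sketch: no_hw_lines is a made-up helper name that filters text on standard input, and the script assumes that the string NO_HW appears on the line of any affected disk/LUN in the ioscan listing.

```shell
#!/bin/sh
# Sketch: list disks/LUNs that ioscan reports in the NO_HW state.
# no_hw_lines is a hypothetical helper; it just filters whatever it reads on stdin.
no_hw_lines() {
    grep 'NO_HW'
}

# Run the real check only where ioscan is available (i.e. on HP-UX).
if command -v ioscan >/dev/null 2>&1; then
    # On HP-UX 11.31, -N can be added to the ioscan options, as noted above.
    if ioscan -fnkCdisk | no_hw_lines; then
        echo "The disks/LUNs listed above are in NO_HW state."
    else
        echo "No disks/LUNs in NO_HW state."
    fi
fi
```

If a LUN shows up here, compare its hardware path with the syslog messages, then check the corresponding HBA with fcmsutil before deciding whether it is a cabling problem or a LUN that was unpresented on the SAN side.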

MK