NIC Failure

 

Hi

 

I have an rx7640 running HP-UX 11.31 in a two-node cluster using MC/Serviceguard. On one of the nodes, two NICs and one fiber card are apparently not working, so the cluster cannot form and reports network errors. I then physically checked the hardware and found that the NICs show no activity (no LEDs lit), while on the fiber card all the LEDs (amber, green and red) are lit.

The errors in the logs are as follows:

 

 

 DLPI error ack for primitive 11 with 8 0
Oct 28 13:33:31 dbnode0 cmclconfd[5782]: Unable to attach to network interface 2
Oct 28 13:33:31 dbnode0 cmclconfd[5782]: Unable to attach to DLPI: I/O error
Oct 28 13:33:34 dbnode0 cmclconfd[5783]: Request from root on node dbnode1 to start the cluster on this node
Oct 28 13:33:36 dbnode0 cmcld[5784]: Unable to get IPv6 interface information.
Oct 28 13:33:41 dbnode0 cmcld[5784]: Daemon Initialization - Maximum number of packages supported for this incarnation is 300.
Oct 28 13:33:41 dbnode0 cmcld[5784]: Global Cluster Information:
Oct 28 13:33:41 dbnode0 cmcld[5784]: Network Polling Interval is 2.00 seconds.
Oct 28 13:33:41 dbnode0 cmcld[5784]: IO Timeout Extension is 0.00 seconds.
Oct 28 13:33:41 dbnode0 cmcld[5784]: Auto Start Timeout is 600.00 seconds.
Oct 28 13:33:41 dbnode0 cmcld[5784]: Failover Optimization is disabled.
Oct 28 13:33:41 dbnode0 cmcld[5784]: Information Specific to node dbnode0:
Oct 28 13:33:41 dbnode0 cmcld[5784]: Cluster lock disk: /dev/dsk/c18t1d5.
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan900  0x001a4b098e7a  10.1.20.240  bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan2  0x001a4b098efc  192.168.20.103  bridged net:2
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan0  0x002264340c88  192.168.30.103  bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan8  0x001a4b098e2a    standby    bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan6  0x0024812408b8    standby    bridged net:2
Oct 28 13:33:42 dbnode0 cmcld[5784]: Heartbeat Subnet: 10.1.20.0
Oct 28 13:33:42 dbnode0 cmcld[5784]: Heartbeat Subnet: 192.168.20.0
Oct 28 13:33:42 dbnode0 cmcld[5784]: Heartbeat Subnet: 192.168.30.0
Oct 28 13:33:42 dbnode0 cmcld[5784]: Failed to bind to 192.168.20.103:5300: Can't assign requested address
Oct 28 13:33:42 dbnode0 cmclconfd[5783]: The Serviceguard daemon, cmcld[5784], exited with a status of 1.
Oct 28 13:51:11 dbnode0 cmclconfd[11143]: DLPI error ack for primitive 11 with 8 0
Oct 28 13:51:11 dbnode0 cmclconfd[11143]: Unable to attach to network interface 2
Oct 28 13:51:11 dbnode0 cmclconfd[11143]: Unable to attach to DLPI: I/O error

 

 

Another error that I found in rc.log:

 

 

mountall: cannot mount /dev/vgora/orabin
mountall: diagnostics from mount
UX:vxfs mount: ERROR: V-3-20003: Cannot open /dev/vgora/orabin: No such device or address
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version

 

 

 

A simple "netstat -in" shows only lo0 and lan900:

 

 

netstat -in
Name      Mtu  Network         Address         Ipkts              Ierrs Opkts              Oerrs Coll
lo0      32808 127.0.0.0       127.0.0.1       30687              0     30687              0     0
lan900    1500 10.1.20.0       10.1.20.240     313571             0     300203             0     0
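
To make the mismatch between the two outputs explicit, here is a minimal sketch that compares the interfaces cmcld lists for this node with the ones netstat actually reports. The input text is copied from the log and netstat output above; the function names and the regular expression are my own illustration, not part of any Serviceguard tooling, and the script assumes the log lines keep the "name, MAC, address or standby, bridged net" layout shown here.

```python
import re

# Interface lines copied from the cmcld log above.
CMCLD_LOG = """\
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan900  0x001a4b098e7a  10.1.20.240  bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan2  0x001a4b098efc  192.168.20.103  bridged net:2
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan0  0x002264340c88  192.168.30.103  bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan8  0x001a4b098e2a    standby    bridged net:1
Oct 28 13:33:41 dbnode0 cmcld[5784]: lan6  0x0024812408b8    standby    bridged net:2
"""

# Output of "netstat -in" copied from above.
NETSTAT_IN = """\
Name      Mtu  Network         Address         Ipkts              Ierrs Opkts              Oerrs Coll
lo0      32808 127.0.0.0       127.0.0.1       30687              0     30687              0     0
lan900    1500 10.1.20.0       10.1.20.240     313571             0     300203             0     0
"""

# Hypothetical pattern for the interface lines; assumes the layout shown in this log.
IFACE_RE = re.compile(
    r"cmcld\[\d+\]:\s+(lan\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+bridged net:(\d+)"
)


def configured_interfaces(log_text):
    """Map interface name -> IP address (or 'standby') as logged by cmcld."""
    result = {}
    for line in log_text.splitlines():
        match = IFACE_RE.search(line)
        if match:
            name, _mac, addr, _net = match.groups()
            result[name] = addr
    return result


def reported_interfaces(netstat_text):
    """Return the set of interface names in the netstat output (header skipped)."""
    names = set()
    for line in netstat_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_ip_interfaces(log_text, netstat_text):
    """Interfaces that cmcld expects to carry an IP but netstat does not list."""
    configured = configured_interfaces(log_text)
    present = reported_interfaces(netstat_text)
    return sorted(
        name
        for name, addr in configured.items()
        if addr != "standby" and name not in present
    )


if __name__ == "__main__":
    print(missing_ip_interfaces(CMCLD_LOG, NETSTAT_IN))
```

With the data above, the interfaces flagged are lan0 and lan2. lan2 is the one that owns 192.168.20.103, the address cmcld failed to bind to. The standby interfaces (lan8, lan6) are skipped because they have no IP address to show up in netstat.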

 

 

Using ioscan to list the fiber cards gives:

 

 

#ioscan -fnC fc
Class     I  H/W Path    Driver S/W State   H/W Type     Description
==================================================================
fc        2  1/0/12/1/0  fcd   CLAIMED     INTERFACE    HP AB379-60101 4Gb Dual Port PCI/PCI-X Fibre Channel Adapter (FC Port 1)
                        /dev/fcd2
fc        3  1/0/12/1/1  fcd   CLAIMED     INTERFACE    HP AB379-60101 4Gb Dual Port PCI/PCI-X Fibre Channel Adapter (FC Port 2)
                        /dev/fcd3

 

 

Here I should see 4 entries, but it is only showing 2.
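
The same count can be taken from the ioscan text with a small script, which is handy when comparing against the healthy node. This is only a sketch: the ioscan output is pasted from above, the expected count of 4 is simply what I expect on this machine, and the function name fc_ports is made up for the example.

```python
# Output of "ioscan -fnC fc" copied from above.
IOSCAN_FC = """\
Class     I  H/W Path    Driver S/W State   H/W Type     Description
==================================================================
fc        2  1/0/12/1/0  fcd   CLAIMED     INTERFACE    HP AB379-60101 4Gb Dual Port PCI/PCI-X Fibre Channel Adapter (FC Port 1)
                        /dev/fcd2
fc        3  1/0/12/1/1  fcd   CLAIMED     INTERFACE    HP AB379-60101 4Gb Dual Port PCI/PCI-X Fibre Channel Adapter (FC Port 2)
                        /dev/fcd3
"""

# Assumption taken from the post: four FC ports should be visible on this node.
EXPECTED_PORTS = 4


def fc_ports(ioscan_text):
    """Return one dict per 'fc' class line; header and device-file lines are ignored."""
    ports = []
    for line in ioscan_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "fc" and fields[1].isdigit():
            ports.append(
                {
                    "instance": int(fields[1]),
                    "hw_path": fields[2],
                    "driver": fields[3],
                    "state": fields[4],
                }
            )
    return ports


if __name__ == "__main__":
    found = fc_ports(IOSCAN_FC)
    for port in found:
        print(port["instance"], port["hw_path"], port["state"])
    print("missing ports:", EXPECTED_PORTS - len(found))
```

Both ports that do show up sit under the same path prefix (1/0/12/1), so they belong to one dual-port adapter; the other two expected entries are absent from the scan entirely rather than listed in an unclaimed state.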

For now I have ended up running on only one node of the cluster.

But I am still not sure whether the cards are really dead.