
Host intermittently losing connection to DS2405 array (FC) (help!! :S)

Andrew Beal_1
Occasional Visitor

Hi,

I have a very strange problem with our HP-UX host.

It is an L3000 running HP-UX 11.11, with the latest Fibre Channel patches, which we applied after we started experiencing this problem.

It all started with EMS reporting a faulty HBA in the host. I contacted HP and had the faulty HBA replaced.

The problem then got worse... EMS was not only reporting problems with the new HBA, but with everything on the path below 0/3/0/0!! HP then came and replaced the LCC in the DS2405 and the FC cable between the HBA and the LCC.

This did not resolve the issue...

When running ioscan in a loop, devices randomly appear offline. The example below shows ioscan running every second on bus 0/3/0/0, grepping for NO_HW (this example ran for approximately 2 minutes):

[root@host] /var/adm/syslog->while true; do ioscan -fnH 0/3/0/0 | grep NO; sleep 1; done
disk 27 0/3/0/0.8.0.255.0.4.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
disk 27 0/3/0/0.8.0.255.0.4.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
disk 27 0/3/0/0.8.0.255.0.4.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
disk 28 0/3/0/0.8.0.255.0.5.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
disk 25 0/3/0/0.8.0.255.0.2.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
target 14 0/3/0/0.8.0.255.0.6 tgt NO_HW DEVICE
disk 29 0/3/0/0.8.0.255.0.6.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
target 17 0/3/0/0.8.0.255.0.9 tgt NO_HW DEVICE
disk 32 0/3/0/0.8.0.255.0.9.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
disk 27 0/3/0/0.8.0.255.0.4.0 sdisk NO_HW DEVICE HP 36.4GST336752FC
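
To see which paths drop out most often, output like the above could be saved to a file and tallied. This is only a minimal sketch, assuming the `ioscan -f` column layout shown above (class, instance, H/W path, driver, S/W state); the function name `tally_no_hw` and the log file name in the comment are hypothetical.

```shell
# Count NO_HW sightings per hardware path from captured ioscan lines on stdin.
# Assumption: field 3 is the H/W path and field 5 is the S/W state,
# as in the "ioscan -fnH 0/3/0/0" lines shown above.
tally_no_hw() {
    awk '$5 == "NO_HW" { count[$3]++ }
         END { for (p in count) print count[p], p }' |
        sort -k1,1nr -k2,2   # most frequent first, then by path
}

# Hypothetical usage, with the loop output saved to a file first:
#   tally_no_hw < /tmp/ioscan_nohw.log
```

If the drops are spread evenly over all targets, that would point at something shared on the path (HBA, cable, LCC, backplane) rather than at individual disks.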

cstm also has issues obtaining information from the devices on that path...

16 0/3 PCI Bus Adapter (782) Information Successful
17 0/3/0/0 Fibre Channel Interface ( Information Successful
18 0/3/0/0.8 Fibre Channel Driver (Mas
19 0/3/0/0.8.0.255.0.0. SCSI Disk (HP36.4GST33675 Information Incomplete
20 0/3/0/0.8.0.255.0.1. SCSI Disk (HP36.4GST33675 Information Warning
21 0/3/0/0.8.0.255.0.2. SCSI Disk (HP36.4GST33675 Information Warning
22 0/3/0/0.8.0.255.0.3. SCSI Disk (HP36.4GST33675 Information Warning
23 0/3/0/0.8.0.255.0.4. SCSI Disk (HP36.4GST33675 Information Warning
24 0/3/0/0.8.0.255.0.5. SCSI Disk (HP36.4GST33675 Information Warning
25 0/3/0/0.8.0.255.0.6. SCSI Disk (HP36.4GST33675 Information Warning
26 0/3/0/0.8.0.255.0.7. SCSI Disk (HP36.4GST33675 Information Warning
27 0/3/0/0.8.0.255.0.8. SCSI Disk (HP36.4GST33675 Information Warning
28 0/3/0/0.8.0.255.0.9. SCSI Disk (HP36.4GST33675 Information Warning
29 0/3/0/0.8.0.255.0.15 Disk Enclosure (HPA6255A) Information FAILED

When running dd on a block device on that bus, it takes an incredibly long time to read any disk device. Example:

[root@host] /var/adm/syslog->fg %1
timex dd if=/dev/dsk/c6t6d0 of=/dev/null bs=2048k
25+0 records in
25+0 records out

real 7:24.06
user 0.00
sys 0.37

[root@host] /var/adm/syslog->iostat 1 50 | grep c6t6
c6t6d0 0 0.0 1.0
c6t6d0 70 17.8 1.0
c6t6d0 247 62.0 1.0
c6t6d0 7 2.0 1.0
c6t6d0 74 18.6 1.0
c6t6d0 271 68.0 1.0
c6t6d0 71 18.0 1.0
c6t6d0 155 39.0 1.0
c6t6d0 54 13.7 1.0
c6t6d0 70 17.8 1.0
c6t6d0 238 59.8 1.0
c6t6d0 79 20.0 1.0
c6t6d0 70 17.8 1.0
c6t6d0 159 40.0 1.0

However, when running dd on the raw device, the performance is much better...

[root@host] /var/adm/syslog->timex dd if=/dev/rdsk/c6t6d0 of=/dev/null bs=2048k &
[1] 17090
[root@host] /var/adm/syslog->iostat 1 50 | grep c6t6
c6t6d0 0 0.0 1.0
c6t6d0 15615 61.0 1.0
c6t6d0 501 2.0 1.0
c6t6d0 5068 19.8 1.0
c6t6d0 16981 66.3 1.0
c6t6d0 4561 17.8 1.0
c6t6d0 1520 5.9 1.0
c6t6d0 4768 18.6 1.0
c6t6d0 11912 46.5 1.0
c6t6d0 3548 13.9 1.0
c6t6d0 506 2.0 1.0
c6t6d0 5119 20.0 1.0
c6t6d0 17324 67.7 1.0

[root@host] /var/adm/syslog->fg %1
timex dd if=/dev/rdsk/c6t6d0 of=/dev/null bs=2048k
94+0 records in
94+0 records out

real 53.64
user 0.00
sys 0.04
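
To put numbers on the two runs, the throughput can be worked out from the dd record counts and the timex "real" values. A rough sketch, assuming `bs=2048k` means 2048 KB per record and that every record was a full one (the `+0` in the dd output); the function name `dd_throughput` is hypothetical.

```shell
# dd_throughput RECORDS BLOCK_KB ELAPSED
# ELAPSED is the timex "real" value, either plain seconds (53.64)
# or minutes:seconds (7:24.06).
dd_throughput() {
    awk -v recs="$1" -v bkb="$2" -v t="$3" 'BEGIN {
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + part[i]   # fold h:m:s into seconds
        printf "%.2f MB/s\n", (recs * bkb / 1024) / secs
    }'
}

dd_throughput 25 2048 7:24.06   # block device run: 50 MB in 444.06 s
dd_throughput 94 2048 53.64     # raw device run: 188 MB in 53.64 s
```

That works out to roughly 0.11 MB/s through the block device against about 3.50 MB/s through the raw device, so both are far below what a healthy FC disk should deliver.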

The firmware on the LCCs and HBA was at HP01 (the latest was HP05). Apparently HP01 had a problem that could produce symptoms such as the ones described above. We then upgraded the firmware to HP05 on the LCC and HBA, and upgraded all the disks in the array to HP06.

This still did not resolve the issue... :S

So, I have had this call open with HP for the last 3 weeks, and they have no idea what is going on... We have not made any progress for days... So I thought I would call on the experience of you guys.

Any help / suggestions would be greatly appreciated.

Regards,

Andrew
1 REPLY
Andrew Rutter
Honored Contributor
Solution

Re: Host intermittently losing connection to DS2405 array (FC) (help!! :S)

Hi,

Is the PDC firmware up to date as well?

Sounds like this could be a backplane or core I/O issue, but I'd suspect the backplane first.
Are there any errors in the GSP logs or other HP-UX logs?

andy