SAN read/write issues...

Allen Brand
Occasional Advisor

# uname -a
OSF1 xxxx.xxxxx-xx.com V5.1 1885 alpha

I have an Alpha ES40 connected to an HP EVA3000 via FC HBAs. The EVA3000 is presenting vdisks to two other Alphas--an ES45 running V5.1 2650 and a DS20 also running V5.1 1885--and all is fine between them.

The ES40 sees the vdisks:
# hwmgr -show scsi
102: 42 nova disk none 2 4 dsk39 [1/0/1]
103: 43 nova disk none 0 4 dsk40 [1/0/2]
104: 44 nova disk none 0 4 dsk41 [1/0/3]
105: 45 nova disk none 2 4 dsk42 [1/0/4]
106: 46 nova disk none 0 4 dsk43 [1/0/5]

If I attempt any kind of read or write to any of these vdisks, however, the process hangs indefinitely and cannot be killed. Even reading the disklabel hangs, as does every 'dd' test I have tried.
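Since a hung read ties up the terminal, one way to test each vdisk is to start the read in the background and only watch it for a fixed time. The sketch below is a hedged illustration, not a tested Tru64 procedure: `probe_with_timeout` is a hypothetical helper name, it assumes a POSIX-style shell (ksh or similar), and the exit code 124 for a timeout is my own convention. It does not try to kill the stuck process, since as noted above that does not work anyway.

```shell
# Run a command in the background and give up watching it after a deadline.
# Returns the command's own exit status, or 124 if it is still running
# when the deadline expires (the process is left alone, not killed).
probe_with_timeout() {
    limit=$1; shift
    "$@" &
    pid=$!
    waited=0
    # kill -0 only checks that the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "pid $pid still running after ${limit}s: $*" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

A possible use would be `probe_with_timeout 30 dd if=/dev/rdisk/dsk39c of=/dev/null bs=64k count=16`, where the raw device path is an assumption based on the usual V5.x naming and should be checked against the actual device special files on the system.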

I have unpresented one of the vdisks and re-presented it to another Alpha, where it works fine. Initially I thought bad hardware was unlikely but possible, so I replaced both HBAs and the FC cable connecting them to the switch, and moved them to different ports on the switch. None of this changed the behavior of the system.

I found this odd, considering that the DS20 with the same HBAs and the same version of the OS had no issues. In an effort to rule out the EVA, I unpresented and then deleted every vdisk I had created for this system and rebuilt them. As with the other two Alphas, the EVA sees the WWNs from the system, I'm able to present the vdisks to the ES40 host, and as far as the EVA knows, everything is fine.

Clearly it isn't.

I saw no errors in the logs regarding driver issues with the HBAs (they're FCA-2354 adapters...actually, the Emulex-branded version, but the same thing nonetheless), and obviously the communication between the EVA and the ES40 is "working" because hwmgr was able to rescan the SCSI bus(es) and discover the newly-created vdisks on different HWIDs. I even see the driver owner value change for the devices, indicating that the driver has opened the device for reading or writing...

And still, the process hangs.
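The driver owner check mentioned above can be scripted rather than read by eye. This is a minimal sketch under one assumption: that the columns in the listing follow the usual `hwmgr -show scsi` layout (HWID, device ID, hostname, type, subtype, driver owner, number of paths, device file, first valid path), which would make the driver owner the sixth field. `opened_devices` is a hypothetical helper name.

```shell
# Print device file, driver owner count and first valid path for every
# device line whose driver owner column (assumed to be field 6) is non-zero.
opened_devices() {
    awk '$1 ~ /^[0-9]+:$/ && NF >= 9 && $6 != 0 { print $8, $6, $9 }'
}

# Sample input copied from the listing earlier in this post
opened_devices <<'EOF'
102: 42 nova disk none 2 4 dsk39 [1/0/1]
103: 43 nova disk none 0 4 dsk40 [1/0/2]
104: 44 nova disk none 0 4 dsk41 [1/0/3]
105: 45 nova disk none 2 4 dsk42 [1/0/4]
106: 46 nova disk none 0 4 dsk43 [1/0/5]
EOF
# prints:
# dsk39 2 [1/0/1]
# dsk42 2 [1/0/4]
```

On the live system the same filter would be fed with `hwmgr -show scsi | opened_devices`, run before and after starting a read to see which device the driver has opened.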

Obviously, the Alpha in question is a production system, and I can't very well start installing the latest patches and changing things around without scheduling it first. I have some things I can try...moving the HBAs to different PCI slots, recompiling the kernel again, installing updates...

Pertinent info:

FCA-2354 : Driver Rev 2.11 : F/W Rev 3.93A0(1.40A1)

Anyone have any ideas?

4 REPLIES
Hein van den Heuvel
Honored Contributor

Re: SAN read/write issues...

V5.1 build 1885 is basic 5.1A
Maybe you need to add more recent patch kits?

And yes, it would remove doubts if you could go to V5.1B (2650), but like you wrote, it seems to work on another V5.1A box.

It's a stretch, but how about booting genvmunix and rebuilding the kernel (doconfig -c) with the FCA-2354 emx0 controller showing?

I would also play around with the various hwmgr commands to make sure all is as expected:

hwmgr -show scsi
hwmgr -show fibre -adapter
hwmgr -show fibre -topo

fwiw,
Hein.
Allen Brand
Occasional Advisor

Re: SAN read/write issues...

Hmmm, after some time spent seriously thinking about this, I believe I have found the answer to my own problem.

First, the driver version is not 2.11; it's 1.30A. Second, the patch that contains the updated driver is not installed on that system.

I've scheduled a tentative date for downtime on that machine, and am fairly certain that the patch (T64V51AB24AS0006-20031031) will fix this problem. I'm sure most will concur with my guess.

I will, of course, relay my experiences on this at a later date.
DCBrown
Frequent Advisor

Re: SAN read/write issues...

Driver Rev 1.30 was released ~03/2001. 1.30A would seem to indicate that this was some type of patch, possibly manually installed. That may be the reason why it didn't get replaced during patch kit installation.

Bud
DCBrown
Frequent Advisor

Re: SAN read/write issues...

Forgot to mention, you could use scu to send a TUR (test unit ready) or INQUIRY to the devices and see what the response is. This is just about the simplest I/O that can be done against a device.

# scu
scu> sbtl
scu> TUR

bud