
John Paget Bourke
Occasional Advisor

MSA500 HA Linux multipath can't see one path

Hi,

We have two identical systems, but on one of them md can only see one path of the multipath set.

md: autorun ...
md: considering cciss/c1d0p1 ...
md: adding cciss/c1d0p1 ...
md: created md0
md: bind
md: running:
md: cciss/c1d0p1's event counter: 0000012a

The other system can see both paths:

md: considering cciss/c0d1p1 ...
md: adding cciss/c0d1p1 ...
md: adding cciss/c1d0p1 ...
md: created md0
md: bind
md: bind
md: running:
md: cciss/c0d1p1's event counter: 00000552
md: cciss/c1d0p1's event counter: 00000552


The Linux driver can see both paths at boot-up:

SCSI subsystem driver Revision: 1.00
HP CISS Driver (v 2.4.54.RH1)
cciss: Device 0x46 has been found at bus 4 dev 3 func 0
blocks= 142253280 block_size= 512
heads= 255, sectors= 32, cylinders= 17433 RAID 1(0+1)

blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blk: queue c04fefa0, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Partition check:
cciss/c0d0: p1 p2 p3
cciss/c0d1: p1
cciss/c0d2: p1
cciss/c0d3: p1 p2
cciss: Device 0x46 has been found at bus 10 dev 1 func 0
blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blocks= 142261440 block_size= 512
heads= 255, sectors= 32, cylinders= 17434 RAID 1(0+1)

blk: queue c04ff070, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
cciss/c1d0: p1
cciss/c1d1: p1
cciss/c1d2: p1 p2


So Linux can see both paths, but md multipath cannot.
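For what it's worth, in the 2.4 md driver, autorun only considers partitions whose type is 0xfd (Linux raid autodetect) and which carry a valid md superblock; a path whose partition type or superblock is wrong is skipped silently rather than reported. A hedged diagnostic sketch (device names are taken from the dmesg excerpts above; mdadm must be installed, and these commands need root on the affected host):

```shell
# Check the partition type of both paths: md autorun only scans
# partitions marked 0xfd (Linux raid autodetect).
fdisk -l /dev/cciss/c0d1
fdisk -l /dev/cciss/c1d0

# Dump the md superblock on each path; on a healthy multipath set the
# array UUID and event counters should match.
mdadm --examine /dev/cciss/c0d1p1
mdadm --examine /dev/cciss/c1d0p1

# Compare against what the kernel actually assembled.
cat /proc/mdstat
```

If one path shows no superblock (or a different UUID/event count) while the kernel clearly enumerates the device, that would explain md considering only one member at autorun time.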

Any ideas ?

Thanks

john
John Paget Bourke
Occasional Advisor

Re: MSA500 HA Linux multipath can't see one path

During array controller upgrade activities we somehow lost one of the redundant multipath paths to the disk array: the path via the onboard controller on each POP.

The activation/deactivation of multipath paths is independent on each server, so either it was a coincidence that both servers had the same problem, or the action that caused the path to be deactivated was common to both servers.

The deactivated path did not appear at boot time and did not appear when we ran the multipath checks. So it did not show up as deactivated; it simply did not show up at all.

If you physically disconnect the working path and try to run on the deactivated path only, you get a kernel panic, which is effectively a crash. That is not a lot of help when trying to troubleshoot.
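Since a panic at boot leaves nothing in the on-disk logs, the panic message itself is usually lost. One general 2.4-era technique for capturing it (not something specific to this setup) is to send the console to a serial port and record it from a second machine:

```shell
# Append to the kernel line in /etc/lilo.conf or /boot/grub/grub.conf
# (ttyS0 and 9600 baud are assumptions -- match your serial hardware):
#   console=ttyS0,9600 console=tty0
#
# Then capture the output on a second machine connected via
# null-modem cable:
minicom -D /dev/ttyS0 -b 9600
```

With that in place, the full panic trace survives the crash and can be attached to a support case.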

To be clear, here is the boot behaviour for each combination:

Path deactivated, onboard physically disconnected - Server boots
Path deactivated, onboard physically connected - Server crashes
Path activated, onboard physically disconnected - Server boots
Path activated, onboard physically connected - Server boots

So this is bizarre. If the path is deactivated and the cable is disconnected, the server boots. If the path is deactivated and the cable is connected, the server crashes. So leaving a deactivated path connected will cause a server crash (assuming the other cable is disconnected). This is not right in any sense.

It is clear that the firmware/drivers for multipath have some issues, and the fact that they caused a kernel panic rather than just reporting an error made this look like a hardware problem. That sent us chasing a hardware fault which did not exist.

Finally, the other confusing thing for me is this: if one path was deactivated and stayed deactivated across every boot, where is the activation/deactivation state stored? It is not in any config file. I have asked Red Hat to tell me. If there were a config file, I could at least have looked there and seen that the path had been deactivated.
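One place this kind of state does live with 0.90-format md arrays is not a config file at all but the on-disk superblock near the end of each member partition, which is what `mdadm --examine` decodes. A sketch of the placement arithmetic, using a synthetic file rather than a real device (the 10 MiB size and /tmp path are illustrative assumptions):

```shell
# On a real member you would just run:
#   mdadm --examine /dev/cciss/c1d0p1
#
# To show where the 0.90 superblock lives, build a synthetic 10 MiB
# "partition" and write the md magic (0xa92b4efc, little-endian, so
# raw bytes fc 4e 2b a9) at the computed offset: the superblock sits
# in the last 64 KiB-aligned 64 KiB block of the partition.
SIZE_KB=10240
SB_OFFSET_KB=$(( (SIZE_KB & ~63) - 64 ))
dd if=/dev/zero of=/tmp/fakepart bs=1k count=$SIZE_KB 2>/dev/null
printf '\xfc\x4e\x2b\xa9' | \
    dd of=/tmp/fakepart bs=1k seek=$SB_OFFSET_KB conv=notrunc 2>/dev/null

# Read it back the way the md driver would locate it:
dd if=/tmp/fakepart bs=1k skip=$SB_OFFSET_KB count=1 2>/dev/null | \
    od -A n -t x1 -N 4
```

The superblock also carries the per-device state flags and the event counters visible in the boot logs above, so a deactivated/faulty member could persist across boots with no config file involved.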