
Markus Wiedner
Occasional Visitor

multipath -ll showing [faulty] path

Hello,

We are running SLES 10 SP2 (device-mapper 1.02.13-6.14) with eight paths to an EVA LUN.

There has been a path interruption in one fabric, and multipath -ll now gives me this output:

\_ round-robin 0 [prio=100][active]
\_ 5:0:3:1 sdh 8:112 [active][ready]
\_ 5:0:2:1 sdg 8:96 [active][ready]
\_ 4:0:3:1 sdd 8:48 [failed][faulty]
\_ 4:0:2:1 sdc 8:32 [failed][faulty]
\_ round-robin 0 [prio=20][enabled]
\_ 5:0:1:1 sdf 8:80 [active][ready]
\_ 5:0:0:1 sde 8:64 [active][ready]
\_ 4:0:1:1 sdb 8:16 [failed][faulty]
\_ 4:0:0:1 sda 8:0 [failed][faulty]

However, when I look at the paths with adapter_info (/opt/hp/hp_fibreutils), I can see that all eight paths are online again, and each of them is transmitting requests, so I see I/O going through both fabrics.

My questions are:
1) Shouldn't multipath -ll update the state of the paths automatically to [active][ready] once they are available again?
2) Will it help to issue a multipath -v1 command to update the multipath map?
3) Will the command in 2) disrupt traffic to the SAN LUN and impact the running OS? (It is a boot-from-SAN blade, and this LUN holds the root FS.)
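
Regarding 1) and 2): as far as I understand, the kernel's own view of the multipath map can be cross-checked read-only with dmsetup before refreshing anything. A minimal sketch (the map name below is only a placeholder for the WWID-based map name on this system):

dmsetup ls --target multipath   # list device-mapper maps using the multipath target
dmsetup status <map_name>       # kernel's per-path state: A = active, F = failed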

Thanks a lot for clarification and best regards.

Markus
5 REPLIES
Michal Kapalka (mikap)
Honored Contributor

Re: multipath -ll showing [faulty] path

hi,

check the connection to your SAN.

mikap
Matti_Kurkela
Honored Contributor

Re: multipath -ll showing [faulty] path

Do you have the multipathd daemon running?

It is the element that periodically checks for failed paths, and presumably also checks if the previously-failed paths become functional again.

I guess the kernel dm-multipath module would be smart enough to stop using a path if it produces errors, but it would not necessarily automatically resume using it when it starts to work again.

1) multipath -l and -ll just display the current state: I would not expect them to update anything.

2) Yes, it might help.

3) If it would have to change the configuration of a multipath device that is already in use, it won't do it; it will produce a "path in use" error message instead. But unless your WWIDs have somehow changed, there should be no reason for it to change the existing devices.
So it should be harmless.
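
If multipathd turns out not to be running, a rough sequence I would try (only a sketch; the init script path assumes the stock SLES 10 multipath-tools package):

/etc/init.d/multipathd status   # is the path checker daemon running?
/etc/init.d/multipathd start    # if not, start it
multipath -v2                   # re-discover paths and reload the maps
multipathd -k"show paths"       # if your version has the interactive shell, query the daemon's view of the paths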

MK
Markus Wiedner
Occasional Visitor

Re: multipath -ll showing [faulty] path

Hi,

thanks for your replies so far!

Now I've given the multipath -v2 command a try and received the following output for each of the four devices in the fabric where the path interruption occurred:

sdd: not found in pathvec
sdd: mask = 0x1f
sdd: dev_t = 8:48
sdd: size = 419430400
sdd: subsystem = scsi
sdd: vendor = HP
sdd: product = HSV210
sdd: rev = 6200
sdd: h:b:t:l = 4:0:3:1
sdd: serial =
sdd: getuid = /sbin/scsi_id -g -u -s /block/%n (controller setting)
error calling out /sbin/scsi_id -g -u -s /block/sdd
sdd: prio = alua (controller setting)
sdd: couldn't get supported alua states
sdd: alua prio error
error calling out /sbin/scsi_id -g -u -s /block/sdd
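
The failing getuid callout can also be reproduced by hand to see whether the device answers at all; this is simply the command from the output above, run directly (plus its exit status):

/sbin/scsi_id -g -u -s /block/sdd
echo $?    # a non-zero exit status would explain the "error calling out" line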

Again, I'm almost 100 percent sure that physically, the paths in that fabric are working fine.

What makes me certain is "adapter_info -d 4" (with 4 being the HBA pointing to the fabric in question) showing me:

LUNs
----------
( 0: 0): Total reqs 3, Pending reqs 0, flags 0x0*, Dflags 0x0, 0:0:1000 00
( 0: 1): Total reqs 5286631, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 1: 0): Total reqs 3, Pending reqs 0, flags 0x0*, Dflags 0x0, 0:0:1000 00
( 1: 1): Total reqs 5263887, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 2: 0): Total reqs 3, Pending reqs 0, flags 0x0*, Dflags 0x0, 0:0:1000 00
( 2: 1): Total reqs 24103445, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 3: 0): Total reqs 3, Pending reqs 0, flags 0x0*, Dflags 0x0, 0:0:1000 00
( 3: 1): Total reqs 24082057, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00

The I/O requests are increasing whenever I refresh the output, so there is traffic going to the LUN through that fabric.

Only multipath -ll has not recognized that yet, and the question is: where is the problem?
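
One thing I still want to verify is whether the failed sd devices answer I/O at all, independently of multipath. A sketch of what I have in mind (assuming a 2.6 sysfs and a dd that supports iflag=direct):

cat /sys/block/sdd/device/state                          # should report "running" for a healthy device
dd if=/dev/sdd of=/dev/null bs=4k count=1 iflag=direct   # small direct read, bypassing the page cache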

Interestingly enough, output of multipath -ll has changed over the weekend to:

\_ round-robin 0 [prio=100][active]
\_ 5:0:3:1 sdh 8:112 [active][ready]
\_ 5:0:2:1 sdg 8:96 [active][ready]
\_ 4:0:3:1 sdd 8:48 [failed][faulty]
\_ 4:0:2:1 sdc 8:32 [failed][faulty]
\_ round-robin 0 [prio=20][enabled]
\_ 5:0:1:1 sdf 8:80 [active][ready]
\_ 5:0:0:1 sde 8:64 [active][ready]
\_ 4:0:1:1 sdb 8:16 [active][faulty]
\_ 4:0:0:1 sda 8:0 [active][faulty]

So the Device Mapper status for sdb and sda has changed from failed to active. However, the path status is still faulty.

My next idea is to simply restart the multipathd daemon and see what happens.
Does anyone think that this will significantly interrupt or otherwise impact I/O to the LUN? (We boot from SAN, and this is the server's only LUN.)
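
As far as I can tell, restarting the daemon only affects the userspace path checker; the kernel's device-mapper maps stay in place, so I/O should keep flowing over the currently active paths. A sketch of what I intend to run (assuming the stock SLES init script):

/etc/init.d/multipathd restart
multipath -ll    # re-check the path states afterwards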

Hints and comments are very welcome!

Thx,

Markus
dirk dierickx
Honored Contributor

Re: multipath -ll showing [faulty] path

Did you configure multipath correctly for EVA devices? Different SAN arrays need different settings (I don't know what the EVA needs; we only have EMC here).
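
For comparison, this is the general shape of a device section in /etc/multipath.conf that I mean; the EVA-specific values below are only placeholders I cannot vouch for, so check HP's EVA connectivity guide for the settings that match your controller firmware:

device {
        vendor                  "HP"
        product                 "HSV2.*"
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/sbin/mpath_prio_alua /dev/%n"
        path_grouping_policy    group_by_prio
        path_checker            tur
        failback                immediate
        no_path_retry           12
}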
Markus Wiedner
Occasional Visitor

Re: multipath -ll showing [faulty] path

I checked again, and my assumption that the paths to the SAN are back again was wrong.

I was misled by the output of adapter_info (hp_fibreutils), which shows requests for every path:

LUNs
----------
( 0: 0): Total reqs 246, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 0: 1): Total reqs 6076381, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 1: 0): Total reqs 246, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 1: 1): Total reqs 6053637, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 2: 0): Total reqs 246, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 2: 1): Total reqs 24893276, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 3: 0): Total reqs 246, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00
( 3: 1): Total reqs 24871807, Pending reqs 0, flags 0x0, Dflags 0x0, 0:0:1000 00

I realized that all the paths are pointing to 1000 and cross-checked with /opt/hp/hp_fibreutils/lssd -w whether the sdX devices are actually bound to the SAN LUN - they are not!

So it is not Device Mapper that has a problem, but the connection to the SAN. I will do an hp_rescan -a, and if that doesn't help I will need to reboot the server.
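
Roughly the sequence I plan to run (hp_rescan and lssd as used above, everything else standard multipath-tools; treat this as a sketch):

hp_rescan -a                    # ask the HBA driver to re-scan for the LUN
/opt/hp/hp_fibreutils/lssd -w   # verify the sd devices are bound to the SAN LUN again
multipath -v2                   # let multipath pick the recovered paths back up
multipath -ll                   # confirm the paths return to [active][ready]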