HPE EVA Storage

Identifying faulty disk on EVA4000

 
SOLVED
R.ob
Occasional Advisor

I would be most grateful if someone could point me in the right direction on how to identify which disk in a loop is causing problems. The most common error is: "A Fibre Channel exchange to a physical disk drive has completed but is missing data."

The log points to almost every disk between 1 and 14 in the cabinet at random times, when there is probably only one disrupting communication on that loop.

Right now the only solution I see is to ungroup and remove one disk at a time, but that takes forever with leveling and all.

Another thing I have been wondering about: is the array protected during leveling, or will a disk failure during leveling destroy the whole group?

7 REPLIES
IBaltay
Honored Contributor

Re: Identifying faulty disk on EVA4000

Hi,
there could be
a) media errors (block reallocation within the disk)
b) mechanical errors (servo, positioning)
c) port interface errors (loop failures)
d) SMART (prediction - spindles, arms, heads, platters)

When a controller puts an FC loop in a failed state due to excessive port errors, the other controller also puts the same loop in a failed state, so that access occurs only over the properly functioning loop.

Disk errors related to loop failures can impact other disks and controller operation. Failures affecting one port also affect other disks and can be detected and reported by any device "downstream" from the problem.

There is special functionality in Webes/IRS/RSP: call-out/notification rules to the HP HW teams for proactive disk replacement. These should be kept strict, so that you do not wait until the controller log shows the disk in a failed state and thus risk a double disk failure in the same disk group/same RSS. Usually during a failure the controller ungroups the disk (migrates all its data to other disks in the disk group). Once the disk is ungrouped and its data migrated, another disk failure will be only a single-disk failure in the disk group.

Yes, the disk group has a disk protection setting that reserves free virtual space for automatic or manual addition to the disk group when needed in emergency situations.
Single disk protection reserves the capacity of 2 virtual spare drives and Double disk protection reserves the capacity of 4 virtual spare drives, so avoid using NONE in production.
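
As a rough illustration of the arithmetic above, here is a minimal sketch of how much space each protection level sets aside. It assumes that one "virtual spare drive" is sized like the largest disk in the disk group; that sizing rule, the function name and the 146 GB figure are assumptions for illustration, not taken from the array documentation.

```python
# Number of virtual spare drives reserved per protection level, as described above.
SPARES_PER_LEVEL = {"none": 0, "single": 2, "double": 4}


def reserved_capacity_gb(level: str, largest_disk_gb: float) -> float:
    """Return the space reserved for disk protection, in GB.

    Assumption: each virtual spare is as large as the largest disk
    in the disk group.
    """
    return SPARES_PER_LEVEL[level.lower()] * largest_disk_gb


if __name__ == "__main__":
    # Hypothetical disk group whose largest disk is 146 GB.
    for level in ("None", "Single", "Double"):
        print(level, reserved_capacity_gb(level, 146), "GB reserved")
```

With NONE, nothing is reserved, which is why a failing disk may have nowhere to migrate its data.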

the pain is one part of the reality
R.ob
Occasional Advisor

Re: Identifying faulty disk on EVA4000

Maybe I was unclear. We are having problems on the FC loop that generate all kinds of errors, probably because of one bad disk in the cabinet, although none is shown as failed in the EVA.

Is there an easy way to determine which disk it is, or do I have to ungroup and remove one disk at a time until the problems clear? That is how I have had to do it before, since almost every disk in the box is generating errors right now because of garbage on the loop.

IBaltay
Honored Contributor

Re: Identifying faulty disk on EVA4000

Hi,
if there is an excessive number of port problems on the loop, HP support could be called to run the special loop test (FCLX).
Víctor Cespón
Honored Contributor
Solution

Re: Identifying faulty disk on EVA4000

R.ob, you need to look at two things:

1) Are the "missing data" events always on the same loop?

2) Are any disks giving "SCSI parity error"?

If all disks give "missing data" on the same loop, and there are no "SCSI parity error" events, it's the I/O module.

Get the log checked by someone with HP experience.
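
The two checks above can be sketched as a small triage function. This is only an illustration: the `Event` record, its field names, the loop labels and the `min_disks` threshold are hypothetical and do not reflect the real EVA controller log format. Reading parity errors as pointing at the reporting disk is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    disk: int   # disk number in the cabinet
    loop: str   # hypothetical loop label, e.g. "1A"
    kind: str   # "missing_data" or "scsi_parity"


def triage(events, min_disks=3):
    """Apply the two checks: same loop? any SCSI parity errors?"""
    missing = [e for e in events if e.kind == "missing_data"]
    parity = [e for e in events if e.kind == "scsi_parity"]

    if parity:
        # Assumption: the disk reporting the most parity errors is the first suspect.
        worst = Counter(e.disk for e in parity).most_common(1)[0][0]
        return f"suspect disk {worst}"
    if not missing:
        return "no missing-data events"

    loops = {e.loop for e in missing}
    disks = {e.disk for e in missing}
    if len(loops) == 1 and len(disks) >= min_disks:
        # Many disks, one loop, no parity errors: something shared on that loop.
        return f"suspect shared loop hardware on loop {next(iter(loops))}"
    return "inconclusive"


if __name__ == "__main__":
    sample = [Event(d, "1A", "missing_data") for d in (1, 5, 9, 14)]
    print(triage(sample))
```

"Shared loop hardware" here covers the I/O module as well as anything else common to the loop, such as a cable or transceiver.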
R.ob
Occasional Advisor

Re: Identifying faulty disk on EVA4000

That makes sense. There are no parity errors and just now I got one "A drive enclosure transceiver error has been detected" error as well.

So I'll try replacing the cable tonight and we'll see.

/Rob
R.ob
Occasional Advisor

Re: Identifying faulty disk on EVA4000

Yes!

I changed the cable for the loop and no errors since.

Thanks a bunch for the help. Weekend saved!

/Rob
R.ob
Occasional Advisor

Re: Identifying faulty disk on EVA4000

Closing thread.