HPE EVA Storage

EVA8000 disk failure messages

Mel Nugent
Regular Advisor

Last night a 1TB FATA drive failed in a disk group of 16 disks. In Command View I keep getting a pop-up that I never got with any previous disk failure. The disk has been replaced and the rebuild is at 76%. The protection level on the disk group is single.
Does anyone know what this message means? Does it mean another disk failure would lead to data loss?

"A hardware failure has occurred in this disk group. The advanced virtualization features of your system have prevented any data from being lost, but the level of protection in the disk group is degraded. Please repair the hardware failure as soon as possible to restore maximum hardware failure protection to your system.

If you have already repaired this disk group, the system is in the process of redistributing data across the group's disk drives. Redistribution may take some time, depending on the size of the group and its drives. This message will continue to appear until that action is complete"
Mel Nugent
Regular Advisor

Re: EVA8000 disk failure messages

After the disk failed last night the controller did a "controller resynchronization". This seems to be some sort of "quick reboot". Has anyone seen this before? I looked it up and found the following:

The following functions cause the controllers to resynchronize automatically:
-Initialization of a storage cell
-New XCS software loaded on the controllers with HP Command View EVA
-Debug flags set through the command line interface or the OCP
-Unresolved disk group hardware issue (that is, meltdown) on a Vraid1 disk group or return of a disk during this condition
-Bad block replacement (BBR) performed on a disk
-Memory allocation failure and other very rare errors
-Deletion of the default disk group
Note that "meltdown" is not my word but what it says on the HP website! Also, since I had a disk fail, the disk group hardware issue seems the most likely cause of my resynch.

I have also noticed that two other disk groups are levelling at the moment:
32*450GB disk group
16*450GB disk group

The resynch seems to have fixed a "cosmetic" issue I had with the second of these groups, which was reporting an incorrect occupancy level (twice the disk space actually used, and more than was physically available).

Does anyone have any experience of these controller resynchs, or of multiple disk groups levelling after a disk failure?

Uwe Zessin
Honored Contributor

Re: EVA8000 disk failure messages

If the *rebuilding* is at 76% it means that redundancy of the data has not been fully restored. 24% of the data currently has no protection if the 'wrong' disk fails. A disk group with 16 disks should have two RSS, so it could survive another disk failure if it happens in the unaffected RSS. Unfortunately neither we nor the EVA knows when and which disk will fail next...
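
To make the arithmetic above concrete, here is a rough Python sketch. It is only a simplified model of the reasoning, not how the EVA firmware works: the function names and the RSS numbering (0 and 1) are made up for illustration, and it assumes a second failure is only dangerous when it lands in the same RSS while the rebuild is still incomplete.

```python
def unprotected_fraction(rebuild_percent: float) -> float:
    """Fraction of data whose redundancy has not been restored yet."""
    if not 0 <= rebuild_percent <= 100:
        raise ValueError("rebuild_percent must be between 0 and 100")
    return (100 - rebuild_percent) / 100


def second_failure_is_safe(affected_rss: int, failing_rss: int,
                           rebuild_percent: float) -> bool:
    """Simplified model (an assumption, not EVA logic): a second disk
    failure is safe if the rebuild has finished, or if the failing disk
    sits in a different RSS than the one still rebuilding."""
    return rebuild_percent >= 100 or failing_rss != affected_rss


# Rebuild at 76%: roughly a quarter of the data is still exposed.
exposed = unprotected_fraction(76)
print(f"{exposed:.0%} of the data is not yet protected")

# Hypothetical RSS numbering: RSS 0 lost the disk, RSS 1 is untouched.
print(second_failure_is_safe(affected_rss=0, failing_rss=1, rebuild_percent=76))
print(second_failure_is_safe(affected_rss=0, failing_rss=0, rebuild_percent=76))
```

The point is just that the exposure shrinks as the rebuild progresses, and that which RSS the next failing disk belongs to decides whether data is at risk.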

I've never seen 'unexpected' reboots, erm, 'resyncs' ;-), but I have never run EVAs in production so I did not pay close attention.
But I _have_ seen EVAs start leveling (data redistribution) for no _apparent_ reason. I've been told that the EVA monitors I/Os to disks and can move data around if some disk drives have developed 'hot spots'.