
Re: Disk Group Failure

 
Jacky Wing
Regular Advisor

Disk Group Failure

I have an EVA4000 with a disk group of 28 disks. One of the disks failed, so we replaced it. The data leveling is complete, but now we get an error in the disk group with a popup that says: "A hardware failure has occurred in this disk group. The advanced virtualization ....".
I did the following:
1- Restarted the Command View service --> same problem.
2- Restarted the SMA --> same problem.
3- Reduced the usage to about 2 TB out of 7 TB (I am running a test environment so I can do that) --> same problem.
4- Ungrouped 3 disks --> same problem.
5- Checked the redundancy status: it says that a disk lost some data because one disk was removed ???

Well, all my vdisks are running with no problem. I browsed the whole EVA storage but did not find any Vraid0 disk. Everything runs normally from the vdisk point of view, but I keep getting this error.

Any idea how to troubleshoot this? I didn't find anything significant in the error log, just that one disk was removed, ....
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

Hi, can you please paste/attach the whole error message about the "hardware failure"?

It should tell you specifically which vdisk has failed.

If you have a test environment, wouldn't you be able to accept the losses? Do you get that option?

Also, which firmware and CV version are you running?
Jacky Wing
Regular Advisor

Re: Disk Group Failure

I am running a test environment, which is why I can reduce the storage to 2 TB out of 7 TB, but I cannot wipe out the whole disk group.

I am running CV 6.0 with CR0D63xc3p-6000.

The redundancy error says: "Your storage system contains data that is not protected by disk drive redundancy. Refer to your system documentation or your HP representative before removing disk drives from your system.
Removing a disk drive may cause loss of data because the following conditions exist in your system:

• Data on one or more disk drives is currently unavailable. A disk drive may have been removed from the system.
"

The disk group error is attached.

I am thinking of restarting both controllers one at a time.
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

When you look at the disk group in CV, does it say that everything is OK?
Jacky Wing
Regular Advisor

Re: Disk Group Failure

It gives a warning and the error message I attached before. It also says that all the vdisks are ok in the disk group (in the vdisk tab).

Re: Disk Group Failure

Can you please post the Controller Event Log and Configuration Log here?

I would suggest upgrading your CV EVA to the latest version:
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=CommandViewEVA9.3

Also, please let me know the controller firmware version.

Is it the latest version as well?


I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

See two posts up, Subhajit.

Upgrading beyond 6.000 and CV 6.x may not be possible for a test environment (licensing etc.).

But I guess it's very likely that in one of the releases after 6.x there is a fix that relates to this.

Re: Disk Group Failure

According to the error message, data redistribution appears to be in progress across the drives of the disk group.

If this is a false alert, it may be resolved by upgrading CV EVA to the latest version or by upgrading the controller firmware.

https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=T4256-63136



I work at HPE

Re: Disk Group Failure

Hello Johan/Jacky,

I am sorry that I missed it, and you are correct that later firmware releases have this fix.

Since this is a test environment, Jacky can also try the trial version of CV EVA9.3.

Is it possible to build a separate CV EVA server with CV EVA9.3?

Otherwise, you can try rebooting the master controller and check again.


I work at HPE
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

Jacky: maybe you can provide a screenshot of the disk group page as well?
The configuration log is hard to get out of CV 6 unless you have a script, which I don't have. But maybe somebody with a good tool can help you look at the log.
It is probably a good idea to do that before attempting to restart the controller.

But if you do try restarting a controller, I would suggest starting with the master (you can find out which one that is via the OCP, the display on the front of the controller).
On the one that is not master, the OCP says 'slave'.

Command View only speaks with the master controller.
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

I've read on several other occasions in this forum that 'trying' a later CV version is not a wise idea.

Also there are lots of 'licensing problems' seen when upgrading the CV from 6 to over 7, for people who have the license.

How long can you play with the EVA with the trial license? 60 days?

Re: Disk Group Failure

Please find the procedure below; the scripts for collecting the logs are attached.

1. Log on to the SAN Appliance and open "Command View EVA".
2. Click on the storage subsystem (the Storage Cell).
3. The storage subsystem properties are now displayed on the right side. Click the "View Events" button.
4. Click "Controller Event Log".
5. Click "Get log file" to download the logs.
Repeat from step 4 for "Controller Termination logs".

Procedure to collect the Configuration Log (SSSU log)

SSSU Logs: [From EVA LUN PERSPECTIVE]

The SSSU log (SSSU is a utility) is used to capture the configuration of the EVA.

The SSSU utility is attached to this post.

Please extract the contents of the file to a folder and run the utility against the EVAs.

Use SCANV6 for Command View 6. Instructions are in readme1st.txt, which is available once you unzip the file.




I work at HPE
Jacky Wing
Regular Advisor

Re: Disk Group Failure

As I said before, not the whole EVA is a test environment. The test environment occupied 4 TB of the disk group, which I deleted to enable faster leveling.

Upgrading CV or firmware is not an option right now. Also, we are not sure that it would solve the issue.
In the logs I see many events of "A Volume has transitioned to the MISSING state." with a UUID that I cannot find.

The logs are too big to attach. I am attaching the one failure I always see.

I searched all the vdisks, the disks, the .... and I was not able to find anything failed.

I tried to log in to SSSU to check the available vdisks or the UUID, but I got stuck at "SELECT SYSTEM". The LS SYSTEM command says: EVASYSTEMNAME available. But when I do a "SELECT SYSTEM EVASYSTEMNAME", it says the system EVASYS... is not available.
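
For anyone facing the same oversized event log: a rough sketch of how the "MISSING state" events could be tallied per UUID from a downloaded log file, so that only the summary needs to be posted. This is not an HP tool. It assumes the log was saved as plain text, that the UUID appears on the same line as the event text, and that UUIDs are printed as hyphen-separated groups of four hex digits. The function names and the UUID pattern are hypothetical and will likely need adjusting to the real export format.

```python
import re
from collections import Counter

# Event text as quoted in the post above.
MISSING_PHRASE = "transitioned to the MISSING state"

# ASSUMPTION: UUIDs are printed as 4 to 8 hyphen-separated groups of four
# hex digits. Change this pattern if the exported log uses another layout.
UUID_PATTERN = re.compile(r"\b[0-9A-Fa-f]{4}(?:-[0-9A-Fa-f]{4}){3,7}\b")

# Bucket for MISSING events where no identifier was found on the line.
NO_UUID = "<no uuid on line>"


def count_missing_volumes(lines):
    """Return a Counter mapping each UUID to its number of MISSING events."""
    counts = Counter()
    for line in lines:
        if MISSING_PHRASE not in line:
            continue  # some other event, ignore it
        ids = UUID_PATTERN.findall(line)
        if not ids:
            counts[NO_UUID] += 1
        for uuid in ids:
            counts[uuid.lower()] += 1  # normalise case before counting
    return counts


def summarize(path):
    """Tally MISSING events in a text log file at the given (hypothetical) path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_missing_volumes(handle)
```

Calling `summarize("controller_event_log.txt")` (file name is only an example) and printing `most_common()` on the result would show which UUIDs go MISSING most often. Those identifiers can then be compared against the vdisk and disk properties shown in CV.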
Johan Guldmyr
Honored Contributor

Re: Disk Group Failure

You need to put in the name of your EVA.

You can see the name of it in CV.
Uwe Zessin
Honored Contributor

Re: Disk Group Failure

> the trial version of CV EVA9.3

Be careful with these "trial" versions. From what I have read, they make irrecoverable meta-data changes deep in the controller's NVRAM so that it is no longer manageable by CV V6!

And XCS 6.000 has space management bugs. If the disk group runs out of free space, data corruption might result.
D.Glass
Advisor

Re: Disk Group Failure

I had a similar error recently on an EVA5000. The CV EVA options to ungroup or remove the failed disk had not been available so I just pulled it from the array. I put a new disk in the same bay and grouped it, but the reconstruction kept failing because it could not find the disk for that bay even though CV EVA and Controller Configuration dump showed it was good. Eventually I ungrouped the new disk and removed it in the normal way, then put another new disk in the bay and grouped it. This time the reconstruction succeeded.

 

I suspect the reconstruction would have worked first time if I had put the first new disk through a normal Remove process in CV EVA after installing it, followed by re-insertion and grouping.

 

WilliamSmith11
Super Advisor

Re: Disk Group Failure

Hi 

I have a similar problem, but in my case two disk groups show the error mentioned above.

The IT staff took too long to request replacement disks for the failed disk groups.

Both disk groups are at 100% occupancy.

Seven physical disks failed in total (spread across the two disk groups).

The failed disk drives have been replaced and the new ones are available to be grouped, but when I try to group them the pop-up message (A hardware failure has occurred ...) appears.

Also, both controllers had a Soft Diagnostic Failure, which was corrected by entering the fieldservice option and resetting the master controller.

I rebooted the two controllers, one at a time.
Going back to both disk groups, I can see that the error persists.

I shut down the EVA4400 and disconnected power from all the disk enclosures and the controller enclosure.

I waited 10 minutes, reconnected everything, and powered on.

The error on the disk groups persists and I cannot add the available new disks.

Any idea or clue on how to resolve this?

Regards 

rperez
WilliamSmith11
Super Advisor

Re: Disk Group Failure

HardwareFailure.jpg

rperez