Re: Random Errors after drive replacement

Tim Poulter · ‎12-07-2021

Lately we have been getting flooded with errors after we had to replace a hard drive due to a drive failure.

This is on BL460c Gen9 / D2220sb blades with HP CMC 12.8.

The errors are as follows; they always seem to come in batches of 4:

The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Not Ready'. CRITICAL (E00060202)

Message :The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Not Ready'.

Volume status update from management group 'MGMTG'. WARNING (EF2000400)

Message : Volume status update from management group 'CCGMGMTG'.

Volume Status VMSTORE1 = Unprotected Total Volumes = 1

The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Up'. INFORMATIONAL (E00060206)

Message: The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Up'.

Volume status update from management group 'MGMTG'. WARNING (EF2000400)

Message : Volume status update from management group 'MGMTG'.

Volume Status Normal
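For anyone trying to confirm the "batches of 4" pattern from a mailbox export, here is a minimal Python sketch that splits alert subject lines into severity and event code and groups them into cycles. The subject lines are based on the ones quoted above; the function names, the regular expression, and the idea that each cycle starts with the 'Not Ready' code E00060202 are assumptions for illustration, not anything provided by the CMC.

```python
import re

# Subject lines based on the alert emails quoted in this post.
SUBJECTS = [
    "The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Not Ready'. CRITICAL (E00060202)",
    "Volume status update from management group 'MGMTG'. WARNING (EF2000400)",
    "The storage system 'VSA01' status in cluster 'CLUSTERNAME' is 'Up'. INFORMATIONAL (E00060206)",
    "Volume status update from management group 'MGMTG'. WARNING (EF2000400)",
]

# Hypothetical pattern: free text, then a severity word, then the code in parentheses.
SUBJECT_RE = re.compile(
    r"^(?P<text>.*)\s+(?P<severity>CRITICAL|WARNING|INFORMATIONAL)\s+\((?P<code>E[0-9A-F]+)\)$"
)


def parse_subject(line):
    """Return a dict with text, severity and code, or None if the line does not match."""
    match = SUBJECT_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def group_batches(subjects, start_code="E00060202"):
    """Group parsed events into batches; a new batch starts at each start_code event.

    Assumption: every cycle begins with the 'Not Ready' alert (E00060202).
    """
    batches = []
    for line in subjects:
        event = parse_subject(line)
        if event is None:
            continue  # skip lines that are not alert subjects
        if event["code"] == start_code or not batches:
            batches.append([])
        batches[-1].append(event)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(group_batches(SUBJECTS + SUBJECTS), start=1):
        print(number, [(event["severity"], event["code"]) for event in batch])
```

If each batch consistently contains the same four codes in the same order, that supports the reading that the storage system briefly drops to 'Not Ready', the volume goes unprotected, and both then recover.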

I have tried to look up details about these error codes but cannot find anything, and the links in the emails do not go anywhere; they were probably lost in the site changes over the years (it baffles me that they were never fixed in the latest version).

We have rebooted the VSAs but not the host yet. We are not seeing anything that would explain why these are triggered, as the VSA is up whenever we look in the CMC.

Tim Poulter · ‎12-08-2021

I have done a Host reboot and it seems to have resolved the issues

Tim Poulter · ‎12-09-2021

Well, they are back again. I am not sure how to proceed with this, as the answer links in the email notifications go nowhere.

Assen · ‎12-09-2021

Hello Tim,

This looks like the underlying RAID has problems. It could be an HDD, the BBU, or the Smart Array settings.

Generate an Array Diagnostic report from the Smart Array controller and check it, or have it checked, for HDD errors.
In the Array Diagnostic report, also check the cache settings of the Smart Array controller. If Read or Write is set to 0, then this would explain the issue.
If the write cache is disabled, the cause could be the battery/backup unit.

Cheers

Assen

I am an HPE employee
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Tim Poulter · ‎12-09-2021

Hi Assen

I will double-check the RAID. We did have a drive failure and the drive was replaced, and iLO shows all green for the storage.

Tim Poulter · ‎12-09-2021

So it seems that the RAID is fine. There are no errors in iLO or the ACU. The logs show that there is a battery, report the cache size, and show the write cache percentage:

Percent Write Cache 85% (0x55)

Cache Size In MiB 2 GiB (0x0800)

Cache Battery Count 1 (0x01)
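As a rough way to apply Assen's check to a longer report, the sketch below parses lines in the same "name value (0xhex)" layout as the three lines above and flags a write cache of 0% or a battery count of 0. The parser, the function names, and the assumption that every relevant line ends with a hexadecimal value in parentheses are mine, not part of the HPE tooling; a read cache field is not in the excerpt, so it is not checked here.

```python
import re

# The three lines quoted from the diagnostic report in this post.
REPORT_LINES = """\
Percent Write Cache 85% (0x55)
Cache Size In MiB 2 GiB (0x0800)
Cache Battery Count 1 (0x01)
"""

# Assumed layout: field name, displayed value starting with a digit, raw hex in parentheses.
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<display>\d.*?)\s+\((?P<raw>0x[0-9A-Fa-f]+)\)$"
)


def parse_report(text):
    """Map each field name to the integer decoded from its raw hex value."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fields[match.group("name")] = int(match.group("raw"), 16)
    return fields


def check_cache(fields):
    """Return a list of findings for the two conditions named in the thread."""
    findings = []
    if fields.get("Percent Write Cache") == 0:
        findings.append("write cache is set to 0%")
    if fields.get("Cache Battery Count") == 0:
        findings.append("no cache battery/backup unit detected")
    return findings


if __name__ == "__main__":
    parsed = parse_report(REPORT_LINES)
    print(parsed)
    print(check_cache(parsed) or "no cache findings")
```

For the values posted here the check finds nothing, which matches the conclusion that the cache and battery do not explain the alerts.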

So I guess I will have to open a support case, unless there is something else I could look at or someone has an idea of why this is happening.

One other thing I noticed is that when this is happening, the host shows really high CPU usage.

The average is about 15% (it varies between 10% and 20%), but during these events it spikes to 90-100% on this host. The other host is fine.
