MSA2012fc vdisk problem

 
Tiago Andrade e Silva
Occasional Advisor

Hello!

I have a single RAID 6 vdisk (12 HDs) with 3 volumes on a dual-controller MSA2012fc. In the logs I got several strange messages like the following (more details in the attachment):
*** Event code 207 vdisk scrub failed, error code 1. 100 errors found;

After some consecutive entries like that I get:
*** Event code 207 vdisk scrub complete, 37 errors found;

I guess this now happens every time a scrub operation ends.

My questions are: why did the vdisk scrub fail in the first place when the system has almost no workload at all (still testing and configuring)? How can this be avoided and corrected? I guess this is serious, right? I'm starting to think this equipment sucks... if this is the case now, I wonder what will happen when it goes into production with several VMs!

Thanks in advance,
RC
15 REPLIES 15
skris
Trusted Contributor

Re: MSA2012fc vdisk problem

Hi Tiago,
Event code 207 is classified as informational only (severity: medium). It looks like one of the drives in the vdisk is faulty and causing problems.

I'll answer the questions but in a different order :)

Why did the vdisk scrub fail in the first place when the system has almost no workload at all (still testing and configuring)?

The scrub is a background process that runs when the device is idle. It makes a pass over every sector, simulating the I/O the disks would see in production (this is a normal process).
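
To make the idea of a scrub concrete, here is a minimal Python sketch of a parity consistency check. It assumes simple single-parity XOR (RAID 5 style); RAID 6 keeps a second, differently computed parity block, and this is not the MSA firmware's actual algorithm. The stripe layout and the names `xor_blocks` and `scrub` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(stripes):
    """Return the indices of stripes whose stored parity does not match the data.

    Each stripe is a (data_blocks, stored_parity) pair; this layout is
    hypothetical and only serves the illustration.
    """
    mismatches = []
    for index, (data_blocks, stored_parity) in enumerate(stripes):
        if xor_blocks(data_blocks) != stored_parity:
            mismatches.append(index)
    return mismatches


good = ([b"\x01\x02", b"\x04\x08"], b"\x05\x0a")
bad = ([b"\x01\x02", b"\x04\x08"], b"\x05\x0b")  # last parity byte is wrong
print(scrub([good, bad]))  # [1]
```

Note that with single parity a mismatch only says that a stripe is inconsistent, not which block in it is the wrong one.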

I guess this is serious, right?
No, it is an ongoing issue that is expected to be addressed by newer firmware.

How can this be avoided and corrected?
Find the faulty drive and replace it (not possible until the next firmware).

Workaround:
You could probably create a new vdisk (RAID 6), add disks one by one, let the scrub complete each time, and watch for errors. Try removing the drive that logs errors and get it replaced.

PS: If you really want to get rid of the errors, disable disk scrubbing.

Cheers!
Shiva


skris
Trusted Contributor

Re: MSA2012fc vdisk problem

The issue was identified with version J200P19. I wonder what version you are on now.
Tiago Andrade e Silva
Occasional Advisor

Re: MSA2012fc vdisk problem

skris,

Thanks for your reply.
The controller has the latest available firmware - J200P19.
The storage and disk drives are new. The vdisk configuration was created from scratch and was not used until creation completed. I know a failure is still possible... not usual, but possible.
I don't want to get rid of the errors by disabling scrub; if something is wrong, I want to know about it, correct it, and be warned of any eventuality.
I have no desire to put this "thing" into production knowing that one (of 12) problematic drive makes my volume (vdisk) inconsistent!?!? Who can tell me it isn't more than one hard disk? Shouldn't SMART detect such a thing and light up the disk LED?
And what if I told you I know of another MSA2012fc (also brand new) with the same issue? I don't think this equipment has the minimum quality needed to be sold yet. This is a Dot Hill product rebranded by HP, and I wonder if this is also a common issue with their equivalent product.

Regards,
RC
skris
Trusted Contributor

Re: MSA2012fc vdisk problem

Hi Tiago,

Looks like newer firmware updates are available:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3687119&prodTypeId=12169&prodSeriesId=3687115&swLang=13&taskId=135&swEnvOID=1005


But the release notes now don't say a word about any fix for this problem. Ironically, the J200P19 release notes boast of having fixed the issue we are discussing.

The newer updates look like minor changes to existing code and should surely help. I would try out the newer firmware before anything else.


Cheers!
Shiva

Vahur
New Member

Re: MSA2012fc vdisk problem

Hey,

Any updates on this?
We just received our MSA2012fc and have populated it with 6 SAS and 4 SATA disks, making 2 virtual disks.
Now that their creation has finished, I see that both vdisks are in background scrub "mode" basically all the time. Since the device is idle at the moment, I suppose that's normal? But what worries me is that every couple of hours one of the vdisks reports:
207 A317 Vdisk scrub complete, no errors found. (Vdisk: SAS-RAID6, SN: 00c0ffd70ebf00006fd62f4900000000)

And this is considered a critical event, so of course I get a notification about it.

One other thing: it seems that every time the "box" stays idle for a while (overnight, for example), the web interface stops functioning. IE just displays an empty page with the status "done with errors on page" (I have tried it from several computers). All other protocols seem to work (telnet, for example). When I restart the controller (via telnet), the WebUI comes back. But I don't want to restart the storage controller every time I need to make a change. Any ideas?
BTW: I'm on the latest firmware, J200P24.
Vahur
New Member

Re: MSA2012fc vdisk problem

Oh, and as additional info:
Disabling and re-enabling HTTP/HTTPS has no effect. Only a restart of the management controller helps.
Zsolt Ponyiczki
New Member

Re: MSA2012fc vdisk problem

Hello Tiago!

We have the same log entries on our MSA2000.

Have you solved this problem by now?

Best regards!

Zsolt
Venkadesh Kannan
Frequent Advisor

Re: MSA2012fc vdisk problem

As far as I know, the best solution for now is to back up the vdisk, delete it, recreate it using offline initialization, and restore the data.

For offline initialization you need to switch to an advanced user and then select create vdisk; you will get a new menu called advanced options where offline is visible.
SWAP
Advisor

Re: MSA2012fc vdisk problem

Hello,

We are also facing the same issue with an MSA 2012fc:
Event Code : 207
Message: Vdisk scrub complete, 1 error(s) found (Vdisk: SAN-DISK-02, SN: 00c0ffd55f2500489539874800000000)

Where SAN-DISK-02 is configured with RAID5.
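
For anyone collecting these entries from saved logs, a small parser can pull out the error count and the vdisk name. This is a minimal sketch: the regular expression is fitted only to the message variants quoted in this thread, other firmware versions may word them differently, and the name `parse_scrub_event` is hypothetical.

```python
import re

# Fitted to the event 207 wordings quoted in this thread only (an assumption).
SCRUB_RE = re.compile(
    r"scrub (?P<result>complete|failed)"
    r".*?(?P<count>no|\d+) (?:errors?|error\(s\)) found"
    r"(?:.*?\(Vdisk: (?P<vdisk>[^,]+), SN: (?P<sn>[0-9a-f]+)\))?",
    re.IGNORECASE,
)


def parse_scrub_event(line):
    """Return a dict describing a scrub message, or None if the line does not match."""
    match = SCRUB_RE.search(line)
    if match is None:
        return None
    count = match.group("count")
    return {
        "result": match.group("result").lower(),
        "errors": 0 if count.lower() == "no" else int(count),
        "vdisk": match.group("vdisk"),  # None when the message carries no vdisk name
        "sn": match.group("sn"),
    }


line = ("Vdisk scrub complete, 1 error(s) found "
        "(Vdisk: SAN-DISK-02, SN: 00c0ffd55f2500489539874800000000)")
print(parse_scrub_event(line))
```

Tracking the error count per vdisk over time shows whether the number is stable or growing between scrubs.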

1# Is vdisk scrub an automatic process? If yes, what is the time interval?
2# Is this a hardware-related error?
3# Do we need to plan for contingency?

Thanks in advance for support.
Carl-Martell Sippel
Occasional Advisor

Re: MSA2012fc vdisk problem

Hi,
trying to answer the latest questions:

1. Scrub is an automatic but configurable process. You can turn it on or off (WebUI: Manage -> General Config -> System Configuration). It doesn't run at a specific time interval, but continuously. You can, however, specify the utility priority on the same config page.

2. According to HP support, this is caused by (at least) one disk that shows a high error rate but is still operational. Due to a FW bug or limitation, the system cannot identify exactly which disk is causing this.
It was supposed to be fixed with the FW upgrade released at the end of October 2008, but wasn't. Now it's supposed to be in the next FW upgrade, whenever that may be released :-(

3. Again according to HP support, the system will still be able to identify when a disk encounters a "hard error". So this shouldn't directly cause any data loss (assuming you have properly configured RAID sets and such). Personally, I keep the system running as is and hope the next upgrade fixes it... However, I still feel uncomfortable knowing there might be a (in some way) faulty disk, in particular if another disk fails and data consistency then depends on all the other disks holding up until the RAID is reconstructed. So I'd say you have to decide for yourself how critical you consider this bug and the data on your MSA (we happen to have the data mirrored to another MSA anyway, so I am still fairly relaxed :-). This thread shows some advice on what you could do to isolate the faulty disk. But regardless of this specific issue, you should of course have a regular, valid backup of the data on your MSA...

Best Regards,

Carl Martell
W. Voos
New Member

Re: MSA2012fc vdisk problem

We have found out that enabling the INFORMATIONAL Event #58 (Recoverable disk drive errors) shows the failing drive ...
Stephen Denton
New Member

Re: MSA2012fc vdisk problem

Hi All,

I'm currently in the same situation...

MSA2012fc Critical: SCRUB_ARRAY_COMPLETE
Scrub failed on vdisk
51 parity mismatches detected
It's imperative you contact technical support...
EVENT CODE:207
EVENT SEVERITY:Error

Current firmware: J200P30

HP have recommended the following...

1. Take a backup of the data on the vdisk.
2. Delete the existing vdisk.
3. Update the controller firmware to J200P39.
4. Recreate the vdisk in "OFFLINE" mode.
5. Restore the data from backup.

As a best practice, we have to create a vdisk in the offline mode.
You can still update the firmware to J200P39 to resolve the issue about the controller reboot, however the vdisk scrub errors would still persist if the vdisk has not been deleted and recreated in offline mode.

WOW!

We are a school running a virtualised environment hosted on the SAN. There are 16 VMs (mostly servers, including Domain Controllers, Exchange, Intranet, File, Print, Application, TS, basically everything!) on a single volume that uses all the space on the SAN. As you can appreciate, the data is critical, and the time/downtime needed to resolve this issue will be high. I do not have a spare SAN and do not have the budget for one. Can anybody out there help with the risks involved in not resolving our problem, or indeed the best way to resolve this situation in a timely and realistic fashion?

Also, if this is just a faulty hard drive, then resolving the situation as per HP's advice will surely leave us in the same situation, minus my sanity! Will upgrading to firmware J200P39 provide any more useful information/tools for fixing my issue? Or will it just stop the errors from increasing?

Hi W. Voos...

How do you enable this event, and do I have to be on the latest firmware version to do so? When I do a full debug log dump I can see that I have 0 Event #58 errors!

Thanks in advance to all!
Chris Ciapala
Trusted Contributor

Re: MSA2012fc vdisk problem

I'd first upgrade the firmware to the latest version: controller firmware first, then HDD. Make sure to stop all I/O before upgrading HDDs, especially SATA disks.
I had similar problems in the past, but they are gone now, without recreating vdisks.
Stephen Denton
New Member

Re: MSA2012fc vdisk problem

I cannot find a firmware upgrade for the SAS 300GB 15k Dual Port P/N: AJ736A aka ST3300655SS disks! I will still try the controller firmware update to J200P39, though, and let you all know the outcome...

Re: MSA2012fc vdisk problem

I was told by an HP engineer to upgrade the HDD firmware on the SATA drives first, then the controller, and the enclosure last.
The reason is apparently that there is a small risk of the controller losing connection to the drives if power fails.

If your disks are vdisk members, I recommend that you upgrade one at a time and, after each upgrade, check that the disk is online again and the vdisk is OK.
Remember to stop all I/O access and the background scrub process before you begin (I turned off the SAN switches just to be sure).
Also be aware that when upgrading one drive, all drives managed by that controller will be taken offline temporarily, even if they are in another enclosure.
If you have dual controllers and a drive firmware upgrade fails, you could try the upgrade from the other controller.

SATA drive updates take about 5 min/disk.
Controller upgrade will take about 15 min/controller.
Enclosure upgrade will take about 5 min/enclosure.
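
Those per-item figures can be turned into a rough maintenance-window estimate. This is a minimal sketch, assuming the upgrades run strictly one after another and ignoring the time spent on the per-disk checks; the function name and the example counts are illustrative only.

```python
# Approximate durations quoted in the post above, in minutes.
MINUTES_PER_SATA_DISK = 5
MINUTES_PER_CONTROLLER = 15
MINUTES_PER_ENCLOSURE = 5


def estimate_upgrade_minutes(sata_disks, controllers, enclosures):
    """Rough total for a sequential firmware upgrade, excluding verification time."""
    return (
        sata_disks * MINUTES_PER_SATA_DISK
        + controllers * MINUTES_PER_CONTROLLER
        + enclosures * MINUTES_PER_ENCLOSURE
    )


# Example: 4 SATA disks, 2 controllers, 1 enclosure.
print(estimate_upgrade_minutes(4, 2, 1))  # 4*5 + 2*15 + 1*5 = 55
```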

Hope this information helps someone as it helped me.