MSA2324fc Vdisk fault

 
k_a_tumanov
Occasional Advisor

MSA2324fc Vdisk fault

  Hello!

  We have been using an MSA2324fc for a couple of months. It is equipped with 14 hard drives, configured as a single RAID-6 array.

  This morning we found that our data was inaccessible. SMU shows the Vdisk status as "Fault".

  In addition, the Fault LEDs on 3 hard drives (#7, #10, #13) are on (solid), and their Online LEDs are also on. The user guide does not describe this LED combination.

  SMU shows all hard drives as "Healthy" and "Up". The same goes for the controller and power supplies.

What should we do to resolve this and make the data accessible again?

Thank you!

7 REPLIES
PAC-MAN
Frequent Advisor

Re: MSA2324fc Vdisk fault

In SMU, click on the critical events and post the most recent ones; they will help us understand why the Vdisk went offline.

If you feel this was helpful please click the KUDOS! star in the left column!
k_a_tumanov
Occasional Advisor

Re: MSA2324fc Vdisk fault

  PAC-MAN, thanks for your response!

  Unfortunately, the log now contains only recent records, because we tried to restore the system quickly using some crude methods (removing and reinserting the "fault" hard drives, rebooting the MSA), which generated many new log records. For now we have restored the main data from backup onto a reserve storage system. We have also recreated the Vdisk on the MSA2324fc.

  After recreating the Vdisk, one hard drive (#7) was in "leftover" status and another (#1) was in "spare" status. We then cleared the metadata of hard drive #7. But after the second Vdisk recreation, hard drive #10 was in "leftover" status and hard drive #1 was in "spare" status. We then cleared the metadata of hard drives #7 and #10. Now, after the third Vdisk recreation, the Vdisk is in "fault tolerant" status. The MSA has been performing a "Media scrub" on the Vdisk for many hours.

 Now we are waiting for what comes next...

k_a_tumanov
Occasional Advisor

Re: MSA2324fc Vdisk fault

The newly created Vdisk is now usable, but a third "media scrub" job is already in progress.

PAC-MAN
Frequent Advisor

Re: MSA2324fc Vdisk fault

Thanks for the update. Media scrub is a normal process, and we only have to intervene if it gets stuck.
k_a_tumanov
Occasional Advisor

Re: MSA2324fc Vdisk fault

The MSA has been running "Media scrub" jobs for 3 days now. Should we wait patiently, or should we interrupt the process somehow?

k_a_tumanov
Occasional Advisor

Re: MSA2324fc Vdisk fault

  After 3 days of "Media scrub" we have 16 "informational" records in the event log with text like "An error was detected by a disk drive. (disk: channel: 0, ID: 6, SN: TP3105138711, enclosure: 1, slot: 7)(Key,Code,Qual:0x1,0x17,0x3)(CDB:Rd 0874a980 0080)(Info:0x0874A9B4)(CmdSpc:0x0, FRU:0x0, SnsKeySpc:0x800025)(Recovered Error, recovered data with negative head offset)".

  Fifteen records refer to the hard drive in slot #7 and one record refers to the hard drive in slot #10.
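
  For anyone who wants to tally such records per slot rather than count them by hand, here is a minimal Python sketch. It assumes the event text has been copied out of SMU as plain strings in exactly the format quoted above; the function names `parse_event` and `count_by_slot` are made up for illustration and are not part of any HP tool. The lookup tables cover only the sense codes mentioned in this thread (sense key 0x1 and the standard SCSI additional sense codes 0x17/0x01, 0x17/0x03 and 0x18/0x00), so anything else is reported as unknown.

```python
import re
from collections import Counter

# Lookup tables limited to the codes quoted in this thread (assumption:
# a real log may contain other values, which are reported as unknown).
SENSE_KEYS = {0x1: "Recovered Error"}
ASC_ASCQ = {
    (0x17, 0x01): "recovered data with retries",
    (0x17, 0x03): "recovered data with negative head offset",
    (0x18, 0x00): "recovered data with error correction applied",
}

# Matches "... slot: 7)(Key,Code,Qual:0x1,0x17,0x3)" as in the quoted record.
EVENT_RE = re.compile(
    r"slot:\s*(?P<slot>\d+)\)"
    r"\(Key,Code,Qual:(?P<key>0x[0-9A-Fa-f]+),"
    r"(?P<asc>0x[0-9A-Fa-f]+),(?P<ascq>0x[0-9A-Fa-f]+)\)"
)


def parse_event(message):
    """Return slot and decoded sense data, or None if the text does not match."""
    m = EVENT_RE.search(message)
    if m is None:
        return None
    key = int(m.group("key"), 16)
    asc = int(m.group("asc"), 16)
    ascq = int(m.group("ascq"), 16)
    return {
        "slot": int(m.group("slot")),
        "sense_key": SENSE_KEYS.get(key, f"unknown key {key:#x}"),
        "detail": ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ {asc:#x}/{ascq:#x}"),
    }


def count_by_slot(messages):
    """Count matching disk-error records per drive slot."""
    counts = Counter()
    for msg in messages:
        event = parse_event(msg)
        if event is not None:
            counts[event["slot"]] += 1
    return counts


# The record quoted in the post above, used as sample input.
SAMPLE = (
    "An error was detected by a disk drive. (disk: channel: 0, ID: 6, "
    "SN: TP3105138711, enclosure: 1, slot: 7)(Key,Code,Qual:0x1,0x17,0x3)"
    "(CDB:Rd 0874a980 0080)(Info:0x0874A9B4)(CmdSpc:0x0, FRU:0x0, "
    "SnsKeySpc:0x800025)(Recovered Error, recovered data with negative head offset)"
)

if __name__ == "__main__":
    print(parse_event(SAMPLE))
    print(count_by_slot([SAMPLE]))
```

  Feeding the function all the saved records would give a per-slot count, which makes it easier to see whether the errors concentrate on particular drives.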

  "Media scrub" is still in progress.

k_a_tumanov
Occasional Advisor

Re: MSA2324fc Vdisk fault

  After a few days of operation, the hard drive in slot #13 went to "Leftover" status. The drive was making loud crackling sounds.

  Over the next few days, the hard drives in slots #7 and #10 also went to "Leftover" status. Immediately before that, the log showed a number of "Informational" records for these drives, with Event 58 and messages "...read retries exhausted".

  We returned these three drives to the seller, who replaced them with new ones.

  All was fine for about a week, but in the last 2 days we have had 9 "Informational" records in the log for the drive in slot #10 (again!), with Event 58 and messages "...Recovered Error, recovered data with error correction applied", "...Recovered Error, recovered data with retries", "...Recovered Error, recovered data with negative head offset".

  Do these records indicate that the new drive in slot #10 is also bad, or is there a problem with the MSA itself?