MSA Storage

All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

 
SOLVED
JoeFLC
Occasional Contributor


Hi,

So I encountered a disastrous volume corruption problem yesterday with our new HPE MSA 2060, which I was just starting to roll out. I'm hoping someone might know why it happened and whether it can be prevented or recovered from in future.

Summary:
One of our iSCSI volumes suddenly stopped working in Windows. After a reboot, all 3 volumes appeared as RAW in Disk Management and could not be fixed with chkdsk. They later degraded to uninitialised. They've now been wiped clean. The MSA firmware was up to date.

Full version:
It started while everything was simply running. We had 3 iSCSI volumes presented over direct-connect Ethernet cables, each a 64TB NTFS volume on Windows Server 2022. I had connected a couple of people to an SMB share on one of them, but with read-only permissions. Suddenly one of the volumes couldn't be read in Windows anymore, although the other two seemed to be fine. The MSA was about 70% of the way through a scrub but reported no problems at all in its GUI or in my email notifications. Trying to interact with the drive in any way through the Windows GUI left those windows unresponsive and froze all work for staff. It also seemed to be preventing the server from restarting normally, so I did a desperate hard reboot.

When Windows came back up, all 3 volumes were showing as RAW in Disk Management. Googling it showed an abundance of threads about this happening after hard reboots with iSCSI. However, running chkdsk /f on each of the volumes failed immediately despite being a reliable fix in most of the threads I had read. Using a different computer as the iSCSI initiator made no difference at any point. The MSA 2060 said the data was still there on each volume (although it seemed to be losing a couple percent every time I checked back later).

I looked at running partition and file recovery programs, but they were clearly going to take forever for each volume with no guaranteed success, and the data was all backed up elsewhere anyway. However, interacting with the volumes seemed to make each of them degrade further, showing up as Not Initialized one by one. Also, when I deleted the least important volume and tried to create a new one, I couldn't initialize any of them either, which raised red flags that the MSA 2060 might be the problem.

I'm not sure what resolved it exactly, but power cycling the MSA and switching 10GbE ports at least allowed me to initialize new volumes. I was pretty exhausted and stressed by this point and just decided to create clean, reliable volumes so I could start the long process of copying data back over. In hindsight I should have slept on it so that HPE support could actually look at the corrupted volumes and make better suggestions about the cause and fix, but that's where I'm at.

I'm hoping somebody might have some helpful advice on what might have happened here and what measures can be taken in future. Thanks.

 

Appendix - Event Logs:

Below is a selection of events from Event Viewer as it all happened. I couldn't find any events of note before these all appeared at the same time:

At 16/02/2022 2:31:41
Volume Shadow Copy Service error: Unexpected error DeviceIoControl(\\?\Volume{2e40a1e3-f58c-4092-9e52-187e46be723c} - 0000000000000244,0x0053c008,0000010C4E0068B0,0,0000010C4E0078E0,4096,[0]).  hr = 0x80820001, The bootfile is too small to support persistent snapshots.

.

Operation:

   Processing EndPrepareSnapshots

Context:

   Execution Context: System Provider

 

And:
Volume H: (\Device\HarddiskVolume19) requires an Online Scan.  An Online Scan will automatically run as part of the next scheduled maintenance task.  Alternatively you may run "CHKDSK /SCAN" locally via the command line, or run "REPAIR-VOLUME <drive:> -SCAN" locally or remotely via PowerShell.

 

And:

Chkdsk was executed in verify mode on a volume snapshot. 

Checking file system on \Device\HarddiskVolume19

The shadow copy provider had an error. Check the System and Application event logs for more information.

A snapshot error occured while scanning this drive. You can try again, but if this problem persists, run an offline scan and fix.

 

And:
A corruption was discovered in the file system structure on volume H:.

The Master File Table (MFT) contains a corrupted file record.  The file reference number is 0x100000000040d.  The name of the file is "<unable to determine file name>".

 

And:
The file system structure on volume H: cannot be corrected.

Please run the chkdsk utility on the volume H:.

 

2:38:27

The file system structure on volume H: has now been repaired.

 

2:56:44

The file system structure on volume H: cannot be corrected.

Please run the chkdsk utility on the volume H:.

 

3:50:33

Hard reset

 

3:57:26

A corruption was discovered in the file system structure on volume G:.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.

That was the point at which none of the volumes would work anymore. There were plenty more NTFS errors after this, although for some reason only for the G: drive.

support_s
System Recommended

Query: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

System recommended content:

1. HPE MSA 1060/2060/2062 Storage Troubleshooting Guide

2. HPE MSA 1060/2060/2062 Installation Guide

 


support_s
System Recommended

Re: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

Hi Joe,

 

 

The latest firmware version, IN11P001, includes a fix for an issue in which volumes become inaccessible until the MSA controller is restarted. This issue is described in the advisory below:

https://support.hpe.com/hpesc/public/docDisplay?docId=a00119276en_us&docLocale=en_US

However, that issue is specific to VMware.

Your issue would probably require an HPE support case and in-depth log analysis to check for any problems on the MSA side.



JonPaul
HPE Pro

Re: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

@JoeFLC 
Please connect with HPE support. At first look, it seems as if another initiator is attaching to your LUNs and cleaning/clearing the data on them.
That initiator may in fact be your one system. If the volume is not brought into the system as a multipath device, you will see multiple instances of the same LUN, each of which you can use independently within Windows: on one instance your data resides, while on another it's a foreign disk and Windows will ask if you want to overwrite/format/initialize it.
Does the LUN show up as an MPIO LUN?
Do you see multiple instances of the disk device in Device Manager?
Is your iSCSI initiator IQN unique?
By default, when Windows sees a LUN it does not recognize in Disk Management, it basically asks whether you want it to blindly overwrite everything on that disk. Are there other admins attaching to the MSA system?
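One way to check the "multiple instances" question is to compare the unique identifier of every disk the host enumerates: without MPIO, each path to the same LUN shows up as a separate disk carrying the same identifier. The sketch below assumes you have already exported the disk list (disk number plus a serial or WWN-style ID) into Python; the `DiskInstance` type, its field names and the sample IDs are hypothetical, not an HPE or Windows API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskInstance:
    # Hypothetical record: one disk as enumerated by the host OS.
    number: int       # disk number shown in Disk Management
    unique_id: str    # identifier reported by the LUN (e.g. serial/WWN)


def find_duplicate_luns(disks):
    """Return {unique_id: [disk numbers]} for every LUN seen more than once.

    Any entry here means the same LUN is exposed as several independent
    disks, i.e. its paths are not being merged by multipath (MPIO).
    """
    numbers_by_id = defaultdict(list)
    for disk in disks:
        numbers_by_id[disk.unique_id].append(disk.number)
    return {
        uid: sorted(numbers)
        for uid, numbers in numbers_by_id.items()
        if len(numbers) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: LUN-A is visible twice (disks 1 and 3).
    inventory = [
        DiskInstance(0, "BOOT-DISK"),
        DiskInstance(1, "LUN-A"),
        DiskInstance(2, "LUN-B"),
        DiskInstance(3, "LUN-A"),
    ]
    print(find_duplicate_luns(inventory))  # {'LUN-A': [1, 3]}
```

Once the LUNs are claimed by MPIO, each LUN should appear as a single disk, so an empty result is what you would hope to see.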

I work for HPE
JoeFLC
Occasional Contributor

Re: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

@JonPaul I fully engaged support as of yesterday, but I'm waiting on a response at this time.

Thanks for your suggestions, they provide some useful insights. It definitely sounds like my inexperience with iSCSI might be the culprit here. Regrettably it's hard for me to diagnose much at this point since it was a non-production server and at the time I felt it was easier to wipe everything and start over, but knowing why iSCSI can lead to this sort of issue is very helpful to me.

The MSA is direct-connected to the host and I'm the only admin. I don't explicitly recall using the iSCSI initiator under a second account, but it's possible I may have done at some point while logging into the host with a more privileged account. I definitely do recall seeing multiple instances in Disk Management at one point. At the time I thought Windows was just misreporting or something. The IQN should be unique, but I've never been able to get Windows to automatically reconnect in the iSCSI initiator after a reboot (a common issue upon Googling, with no resolution that works for me so far). I have to wonder if my manual attempts to reconnect it every time caused me to somehow create multiple connections.

So, probably the key lesson for me is: Will setting up multipath reasonably protect me in future from the issue you have described?

On a separate note: it's worth mentioning that I went with the iSCSI version of the MSA without proper knowledge of what I was in for. It made sense on paper with my limited understanding at the time, but it has since proven extremely troublesome, especially since the MSA sits right next to the host server, which now has room for an external HD-SAS card.

Consequently, I would very much like to know whether I can purchase SAS modules and have myself or an HPE tech simply swap them in for the 10GbE modules on the MSA. Or do the MSA variants differ in a way that prevents that?

Thanks

JonPaul
HPE Pro
Solution

Re: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

@JoeFLC 
Sorry for the trouble you are having. iSCSI can be very simple, but it can also be more complex because the initiator has to log in to the target before it can see any LUNs.
I'll look for some information sources to help you along; nothing is coming up right away for me. There is a little information in the Best Practices guide, under "Connectivity Best Practices": https://www.hpe.com/psnow/doc/a00105260enw?jumpid=in_lit-psnow-red
From the iSCSI initiator software, make sure you are creating sessions from your Ethernet HBAs to the different target port IP addresses on the MSA. (You can create multiple sessions to the same target port, but that doesn't help.) When you look at the MPIO tab for the disk device, you should see optimized and unoptimized paths to the LUNs. If the LUNs are claimed by multipath, you will see only one device per LUN and the paths will be handled by MPIO. This avoids seeing the same device multiple times in Disk Management and the accidental damage that comes from working on different instances of the same disk.
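The session advice above can be expressed as a simple check: each session should go to a different MSA target port, and a second session to a port that already has one adds no extra path. This is a minimal sketch assuming you have listed your sessions as (host NIC IP, MSA target port IP) pairs; the addresses are made-up examples and the function names are hypothetical.

```python
from collections import Counter


def redundant_target_ports(sessions):
    """Return target port IPs that have more than one session.

    `sessions` is a list of (initiator_ip, target_port_ip) pairs. Extra
    sessions to the same target port do not add a new path for MPIO.
    """
    counts = Counter(target for _initiator, target in sessions)
    return sorted(target for target, n in counts.items() if n > 1)


def distinct_target_ports(sessions):
    """Number of different MSA target ports reached by at least one session."""
    return len({target for _initiator, target in sessions})


if __name__ == "__main__":
    # Hypothetical direct-connect layout: one host NIC per MSA controller
    # port, plus an accidental second login to the first port.
    sessions = [
        ("10.10.1.1", "10.10.1.10"),  # host NIC 1 -> controller A port
        ("10.10.2.1", "10.10.2.10"),  # host NIC 2 -> controller B port
        ("10.10.1.1", "10.10.1.10"),  # duplicate session, adds nothing
    ]
    print(distinct_target_ports(sessions))   # 2
    print(redundant_target_ports(sessions))  # ['10.10.1.10']
```

Counting distinct target ports rather than raw sessions mirrors the point that only sessions to different ports give MPIO separate paths to work with.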
A SAS MSA would be a great option when you have 4 or fewer hosts connected. Unfortunately we cannot change out the controllers, as this presents problems with support. In the past we have suggested working with your HPE sales team/reseller to see about a system swap to a SAS system.

I work for HPE
JoeFLC
Occasional Contributor

Re: All 3 iSCSI volumes on MSA 2060 suddenly RAW and unreadable

@JonPaul 
Ok, thanks very much for your help. I think you've very likely provided all the relevant information I needed to prevent this happening again and I'm happy to mark this one as answered.

I also suddenly remembered that I did in fact probably have another host set up in the MSA dashboard. It was leftover from when I was trying to troubleshoot the poor transfer speeds I was getting. I'm not sure how they could have connected at the same time but I can't be sure that they didn't. At any rate, it's another important lesson learned.
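A leftover host entry matters because NTFS is not a cluster-aware file system: if two hosts mount the same NTFS volume at the same time, each one caches and updates the file system metadata without knowing about the other, which can corrupt it. Below is a minimal sketch of a check for that, assuming the volume-to-host mappings from the MSA have been copied into (volume, host) pairs; the volume and host names are hypothetical.

```python
from collections import defaultdict


def volumes_mapped_to_multiple_hosts(mappings):
    """Return {volume: [hosts]} for volumes presented to more than one host.

    `mappings` is an iterable of (volume_name, host_name) pairs. For a
    plain NTFS volume, more than one host here is a corruption risk
    unless the hosts are coordinated by clustering software.
    """
    hosts_by_volume = defaultdict(set)
    for volume, host in mappings:
        hosts_by_volume[volume].add(host)
    return {
        volume: sorted(hosts)
        for volume, hosts in hosts_by_volume.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Hypothetical mappings: Vol1 is still mapped to an old test host.
    mappings = [
        ("Vol1", "FileServer"),
        ("Vol1", "OldTestHost"),
        ("Vol2", "FileServer"),
        ("Vol3", "FileServer"),
    ]
    print(volumes_mapped_to_multiple_hosts(mappings))
    # {'Vol1': ['FileServer', 'OldTestHost']}
```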

Thanks for the clarification about switching to SAS. I'll discuss it with my reseller.

Cheers