HPE SimpliVity

SSD failure causes SVTFS to fail

 
sergeisokolov
Advisor


Greetings!

Has anyone experienced an issue where a single SSD failure on an HPE380 Gen10 node caused an SVTFS (SimpliVity file system) failure, such that SVTFS wouldn't even start after a restart?

We've already had this 3 times in the last month, on 3 different nodes.

The first time we had this issue, SVTFS reported a fatal error that looked like some kind of corruption, so HPE support suggested removing the node from the Federation, resetting the OVC to defaults, and joining it back. Obviously that meant that all data on the node was lost.

The second time we were luckier and the node did not suffer data corruption, but we had to wait for the replacement disk to arrive and the RAID rebuild to complete. During that period SVTFS was non-operational on the node.

Now it has happened a third time. We are waiting for the replacement disk to arrive, and I hope we won't have to re-initialize the node.

Of course SSD failures are not uncommon, but a RAID-6 array is supposed to guarantee that a node can survive even a 2-disk failure. In our first case, at least, we effectively had RAID-0 level reliability.

 

dhooley
HPE Pro

Re: SSD failure causes SVTFS to fail

Hello,

We are aware of some issues in our firmware where a single disk failure can cause an SVTFS crash. The issue is with a third-party driver taking the LUN offline after multiple timeouts.

Please open a support ticket so that we can confirm the issue and provide a resolution. Typically, you will need to upgrade to 3.7.9 or above for a permanent resolution.

Hope this helps.

David

I work for SimpliVity HPE


sergeisokolov
Advisor

Re: SSD failure causes SVTFS to fail

David,

Thanks for the quick response!

Do the Release Notes mention this issue, or is it internal information?

I'm already in the process of upgrading to 3.7.9; this third failure is currently delaying it (I have some uncommitted upgrades for other hosts).

I raised a support case for each of those failures. Support's recommendation was "to be on the latest version possible", so it wasn't quite clear which version of OmniStack or firmware definitely fixes it.

I believe being aware of such a critical issue would be beneficial for everyone (for example, if it were posted in the Release Notes).
Hitting the same issue three times in a row shows it wasn't random, so it's already good news that 3.7.9+ has a permanent fix for it.

 

DaveOb
HPE Pro

Re: SSD failure causes SVTFS to fail

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00060528en_us&docLocale=en_US

OMNI-66677: Single drive failure caused data unavailability. Driver and firmware have both been updated and are available in the 3.7.9 release.
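For anyone with many hosts to check, here is a minimal sketch of flagging nodes that are still below the 3.7.9 release named in OMNI-66677. The host names and the `inventory` dictionary are hypothetical; in practice the versions would come from your own records. It only compares OmniStack version strings and cannot tell whether the updated firmware package has actually been applied.

```python
# Release that carries the updated driver and firmware, per OMNI-66677.
FIXED_IN = (3, 7, 9)

def parse_version(text):
    """Turn '3.7.10' into (3, 7, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))

def needs_fix(version):
    """True if the given OmniStack version predates the fixed release."""
    return parse_version(version) < FIXED_IN

# Hypothetical inventory: host name -> OmniStack version.
inventory = {"node-a": "3.7.8", "node-b": "3.7.9", "node-c": "3.7.10"}

exposed = sorted(host for host, ver in inventory.items() if needs_fix(ver))
print(exposed)
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would sort "3.7.10" before "3.7.9".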

 

 

 


I am an HPE employee
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
SimonHorwood
Advisor

Re: SSD failure causes SVTFS to fail

Hi

I've just experienced this issue with 3.7.8. The timeouts cause a 'no start' to be put in the startup config, so SVTFS doesn't start and the OVC becomes inactive. I guess this is to lessen the chance of data corruption.

Two weeks ago another SSD failed in another node, but the OVC remained active.

The replacement disk should arrive today.

Cheers

Simon

 

 

dhooley
HPE Pro

Re: SSD failure causes SVTFS to fail

@SimonHorwood I would recommend getting your system upgraded to 3.7.9 when possible to avoid a recurrence, and ensuring that you also patch with the shipped firmware package.

The issue itself pertains to LUN resets following a disk failure. If this LUN reset times out, the SVTFS service will crash. In previous versions, there is a "quirk" where this reset can time out.

So it is not guaranteed to occur every time, as you mention. Thanks.


SimonHorwood
Advisor

Re: SSD failure causes SVTFS to fail

Ouch!

Is it 3.7.9 that supports upgrades per cluster or was it 3.7.10?

Doing the firmware upgrades for 40-50 hosts is like a full-time job!

Cheers for the heads up!

 

DowS
HPE Pro

Re: SSD failure causes SVTFS to fail

Hi Simon,

 

It is 3.7.9 that allows upgrades at cluster level.

"Upgrade Manager now allows you upgrade HPE OmniStack at the cluster level. To use this feature, you must be upgrading from HPE OmniStack 3.7.8 to HPE OmniStack 3.7.9. To use this feature when upgrading from earlier versions of HPE OmniStack, you must first upgrade to HPE OmniStack 3.7.8 before upgrading to HPE OmniStack 3.7.9."

HPE OmniStack 3.7.9 for vSphere Release Notes
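The rule in that quote can be sketched as a small helper that lists the releases to install, in order, before a cluster-level upgrade to 3.7.9 is possible. The function name `upgrade_steps` is made up for illustration, and the sketch encodes only what the quoted passage says. Whether a given older release can go straight to 3.7.8 is an assumption here, so check the upgrade documentation for your starting version.

```python
def parse_version(text):
    """Turn '3.7.8' into (3, 7, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))

def upgrade_steps(current):
    """Releases to install, in order, to reach 3.7.9 at cluster level.

    Encodes only the quoted rule: cluster-level upgrade is offered for
    3.7.8 -> 3.7.9, so anything earlier goes to 3.7.8 first.
    ASSUMPTION: earlier releases can reach 3.7.8 in a single step.
    """
    version = parse_version(current)
    if version >= (3, 7, 9):
        return []                  # already at or past 3.7.9
    if version == (3, 7, 8):
        return ["3.7.9"]           # cluster-level upgrade applies directly
    return ["3.7.8", "3.7.9"]      # intermediate hop to 3.7.8 required

for start in ("3.7.7", "3.7.8", "3.7.9"):
    print(start, "->", upgrade_steps(start))
```

Only the final hop from 3.7.8 to 3.7.9 is described in the quote as a cluster-level upgrade; the sketch says nothing about how the earlier hop to 3.7.8 is performed.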

 


I am an HPE Employee


SimonHorwood
Advisor

Re: SSD failure causes SVTFS to fail

Thanks guys!

At least it will be a bit easier for one of my Federations!

Cheers

Simon