
Re: HASS disk failure

 
Jakes Louw
Trusted Contributor

HASS disk failure

Hi guys
Have any of you been getting any hardware failures on the Seagate ST39173WC?
The local HP office won't confirm or deny the cause, or even give the MTBF for this model. A little bird whispered in my ear and said I should try extending the PV timeout to 180 seconds (pvchange -t 180 pvname; the -t value is in seconds). At the moment, the timeout is at its default value.
I've had a string of 16 failures in the last 11 months.
Trying is the first step to failure - Homer Simpson
7 REPLIES
Stefan Farrelly
Honored Contributor

Re: HASS disk failure

We've got lots of HASS drives, including many ST39173WC, and we don't see any more failures than usual (they are rare). We don't adjust our PV timeout values either; the default is sufficient.

I think you need to look at what type of failures you are having. What error messages do you get when they fail? For example, are they all powerfails, SCSI errors, or something else?
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Jakes Louw
Trusted Contributor

Re: HASS disk failure

I'll have to check on the last one, but if my memory serves me correctly, we get POWERFAIL errors, the disks go to "NO H/W" in an ioscan, and they show as unavailable in a vgdisplay. Usually the HP engineer will then replace the disk, but we have played around with unseating the disk, rebooting, and then seating the disk again. Quite often the disk is visible and usable again once a vgsync is performed. That tells me the diagnostic firmware has flagged the disk due to excessive timeouts, or am I off base here?
Trying is the first step to failure - Homer Simpson
Alexander M. Ermes
Honored Contributor

Re: HASS disk failure

Hi there.
Have you tried checking the Jamaica box with the STM tools (cstm / xstm)?
That should give an overview of the real problems. We have used these boxes for some time now and have had very few problems.
Rgds
Alexander M. Ermes
.. and all these memories are going to vanish like tears in the rain! final words from Rutger Hauer in "Blade Runner"
Eugeny Brychkov
Honored Contributor

Re: HASS disk failure

I would check the disk firmware (it should not be too old) and the SCSI buses: cables and termination. You should not see any SCSI/disk error messages in syslog while the server is running. If there are messages, then something is wrong. If you have daisy-chained the HASS buses A and B, how long is your daisy-chain cable?
If there are a lot of disks on the SCSI bus (let's say 8 disks) and the bus is heavily loaded (for example, with root/boot/swap/database disks on it), then I would split the disks across different buses to spread the load. If that is not possible, do try increasing the PV timeout for the low-priority disks. Remember: ID 7 has the highest priority, then 6 down to 0, then 15 down to 8, which has the lowest priority.
Eugeny
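Editor's note: the SCSI ID priority order described above can be sketched in a few lines of Python. This is only an illustration of the ordering for a 16-ID wide bus; the function names are made up for this example and are not part of any HP-UX tool.

```python
def scsi_priority_order(wide=True):
    """Return SCSI IDs ordered from highest to lowest arbitration priority."""
    narrow = list(range(7, -1, -1))       # 7, 6, ..., 0
    if not wide:
        return narrow
    return narrow + list(range(15, 7, -1))  # then 15, 14, ..., 8


def priority_rank(scsi_id, wide=True):
    """Rank of an ID on the bus: 0 is highest priority (hypothetical helper)."""
    return scsi_priority_order(wide).index(scsi_id)


order = scsi_priority_order()
print("Highest to lowest priority:", order)
print("Rank of ID 8:", priority_rank(8))
```

Disks whose IDs sit near the end of this list lose arbitration most often on a busy bus, which is why they are the candidates for a longer PV timeout.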
Jakes Louw
Trusted Contributor

Re: HASS disk failure

Hi Eugeny

1) HP checked the firmware: they are happy
2) The disks are root/boot/swap, as you suggest, so there is high usage on occasion, hence my comment regarding timeouts
3) There are only 4 x HASS per SCSI card, so no heavy chaining
4) The cables are standard factory 5m or 10m units, installed by HP.
5) With only 2 x SCSI cards (one for primary, one for mirror), I cannot spread the load.
The parallel option for the mirroring should allow the fastest disk to update first, surely?
Trying is the first step to failure - Homer Simpson
A. Clay Stephenson
Acclaimed Contributor

Re: HASS disk failure

The symptom of a drive failing and then being fixed by unplugging and re-inserting it is a very common sign of a drive that is going to fail. I have over 100 of these drives and their failure rate is about typical. The default timeout is fine and should only be increased for real arrays, not JBODs. It's not unusual for me to replace one or two of these drives per month. If you have a small number of drives and they are consistently failing, then I would check two things: 1) Power supply voltages 2) Cooling, both the JBOD fans and the ambient temperature.
If it ain't broke, I can fix that.
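Editor's note: to put the numbers in this thread side by side, here is a rough sketch of the arithmetic. It assumes exactly 100 drives for the "over 100" fleet and treats "one or two per month" as a steady rate, so the percentages are upper-end estimates rather than measured figures. The size of the fleet with 16 failures in 11 months is not stated in the thread, so only an annualized count is computed for it.

```python
def failures_per_year(failures, months):
    """Scale a failure count observed over some months to a 12-month figure."""
    return failures / months * 12


def annual_failure_fraction(failures, months, drives):
    """Annualized failures divided by fleet size (drive count is an assumption)."""
    return failures_per_year(failures, months) / drives


# "One or two per month" across an assumed fleet of 100 drives.
low = annual_failure_fraction(1, 1, 100)
high = annual_failure_fraction(2, 1, 100)

# 16 failures in 11 months; fleet size unknown, so count only.
observed = failures_per_year(16, 11)

print(f"Assumed 100-drive fleet: {low:.0%} to {high:.0%} per year")
print(f"16 failures in 11 months is about {observed:.1f} per year")
```

Whether roughly 17 failures a year is typical or alarming depends entirely on how many drives are installed, which is why the drive count matters more than the raw failure count.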
Jakes Louw
Trusted Contributor

Re: HASS disk failure

Again, just a sign-off on this:

we started replacing the ST disks with IBM 9GB spindles (IBM DGHS09Y), and haven't had repeat failures on these fixes like we had on the Seagates.

Makes one think....
Trying is the first step to failure - Homer Simpson