Adam_88
Occasional Advisor

MSA1000 cluster disk failures

Multiple disk errors are reported in Event Viewer, and cluster applications fail (e.g. "Cluster resource 'Disk L:' failed").

How can I solve this problem?

The cluster includes 2 DL380G3 servers with SecurePath fiber connections to MSA1000 (fiber hub 2/3).

OS: Windows 2000 Advanced Server

See "config.xls" for system configuration
See attached event viewer error messages
Doug de Werd
HPE Pro

Re: MSA1000 cluster disk failures

Can you please re-attach the Event Viewer messages? The attachment only shows the spreadsheet.

Other than that, what versions of HBA drivers and SecurePath are you running?

Thanks,
Doug
I am an HPE employee
Adam_88
Occasional Advisor

Re: MSA1000 cluster disk failures

Thanks for your swift reply.

I have attached the Event Viewer log file. Take a look at the error messages regarding disks L, S, and Q.

I updated the HP drivers using the new HP Support Pack 6.2. This includes the SecurePath drivers. Previous versions of the Support Pack have a bug where the DL380 crashes (Blue Screen) on error code 0004.
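
To see which cluster disks fail most often, the Event Viewer log could be exported to text and tallied. The following is only a rough sketch: it assumes the exported lines look like the message quoted in the first post ("Cluster resource 'Disk L:' failed"), and the function name `count_disk_failures` and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches messages shaped like the one quoted in this thread:
#   Cluster resource 'Disk L:' failed
# Assumption: the exported log keeps this exact wording.
FAILED_DISK = re.compile(r"Cluster resource '(Disk [A-Z]:)' failed")


def count_disk_failures(log_text):
    """Return a Counter mapping each disk resource name to its failure count."""
    return Counter(m.group(1) for m in FAILED_DISK.finditer(log_text))


# Illustrative sample only, not taken from the attached log.
sample = (
    "Cluster resource 'Disk L:' failed\n"
    "Cluster resource 'Disk S:' failed\n"
    "Cluster resource 'Disk L:' failed\n"
)

for disk, count in count_disk_failures(sample).most_common():
    print(disk, count)
```

A disk that dominates the tally would be a reasonable place to start checking.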
Marino Meloni_1
Honored Contributor

Re: MSA1000 cluster disk failures

The problem seems to start with a loss of connectivity. A cluster usually has two Ethernet links: a private one (intracluster, or heartbeat) and a public one (through the public LAN). If both of these links are lost, you can run into trouble. There also appear to be problems writing data to some logical units.

Make sure you have the latest MSA firmware and the latest SP version (4.x).

Make sure you do not have viruses or software problems.

Check for LAN connectivity.

Check for disk integrity/consistency.
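
As a rough illustration of the LAN connectivity check, the private and public links of the other node could be probed from a script. This is a minimal sketch under stated assumptions: the names `tcp_probe` and `check_links` are hypothetical, TCP port 445 is only a guess at a port that answers on a Windows node, and the addresses in the usage comment are placeholders.

```python
import socket


def tcp_probe(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Port 445 is an assumption; use any port known to be open on the node.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_links(links, probe=tcp_probe):
    """Probe each named link; return (per-link status, True if all are down)."""
    status = {name: probe(host) for name, host in links.items()}
    all_down = not any(status.values())
    return status, all_down


# Usage (placeholder addresses, replace with the other node's real ones):
# status, all_down = check_links({"private": "10.0.0.2", "public": "192.168.1.2"})
# print(status, "both links lost" if all_down else "at least one link up")
```

The probe is passed in as a parameter so that a ping-based check, or any other test, can be substituted without changing `check_links`.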
Adam_88
Occasional Advisor

Re: MSA1000 cluster disk failures

I solved this problem by installing SP4 for Windows 2000. Microsoft has a bug in Win2K SP3 that causes the drive letters of the cluster disks to change. This caused the multiple "write delay" messages and caused SQL Server to fail. (see http://support.microsoft.com/default.aspx?scid=kb;en-us;815616 )

We encountered another serious problem that caused the ProLiant DL380G3 servers to crash (Blue Screen on 00004). This turned out to be an HP bug that was fixed in Support Pack 6.4A (see http://www.microsoft.com/downloads/details.aspx?FamilyID=1001aaf1-749f-49f4-8010-297bd6ca33a0&DisplayLang=en )