<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MSA2212fc strange failure in Disk Enclosures</title>
    <link>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567008#M33982</link>
    <description>Hello!&lt;BR /&gt;&lt;BR /&gt;The temperature in the data room was normal. Both cluster nodes are HP DL360 servers with good environmental control too. The customer has a number of other servers and data hardware in the same room, and there were no alarms.&lt;BR /&gt;&lt;BR /&gt;Power comes from two independent UPSes. Both are monitored and in good condition&lt;BR /&gt;&lt;BR /&gt;The HDDs did not actually fail; it only appeared so. After being manually powered off and on, the MSA started successfully, and all 12 drives came up and ran without errors&lt;BR /&gt;&lt;BR /&gt;Most probably the problem was a malfunction of some part of the MSA that is common to all the HDDs but is not monitored. But the MSA hardware has 2x redundancy... I do not believe that two or more circuits failed at the same time. There should be exactly one cause&lt;BR /&gt;&lt;BR /&gt;Regards, Ivan</description>
    <pubDate>Tue, 19 Jan 2010 21:38:25 GMT</pubDate>
    <dc:creator>Ivan Kuznetsov</dc:creator>
    <dc:date>2010-01-19T21:38:25Z</dc:date>
    <item>
      <title>MSA2212fc strange failure</title>
      <link>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567006#M33980</link>
      <description>Hello&lt;BR /&gt;&lt;BR /&gt;One of our clients has a two-node cluster (Oracle RAC under RHEL4) using an MSA2212fc disk array as shared storage/voting disk. The MSA has 2 controllers installed. Each controller is connected by an FC link to each node. We configured RAID10 across 10 HDDs, with 2 HDDs as global hot spares (12 dual-port SAS HDDs in total). &lt;BR /&gt;The cluster worked fine for ~1 year (24x7x365) but then failed once. Linux on both nodes showed that the MSA became inaccessible via both paths:&lt;BR /&gt;&lt;BR /&gt;Jan 10 06:15:50 ctms1 kernel: SCSI error : &amp;lt;0 0 1 1&amp;gt; return code = 0x20000&lt;BR /&gt;Jan 10 06:15:50 ctms1 kernel: end_request: I/O error, dev sdb, sector 3936599&lt;BR /&gt;Jan 10 06:15:50 ctms1 kernel: device-mapper: dm-multipath: Failing path 8:16.&lt;BR /&gt;Jan 10 06:15:50 ctms1 multipathd: 8:16: mark as failed&lt;BR /&gt;Jan 10 06:15:50 ctms1 multipathd: mpath3: Entering recovery mode: max_retries=18&lt;BR /&gt;Jan 10 06:15:50 ctms1 multipathd: mpath3: remaining active paths: 0&lt;BR /&gt;Jan 10 06:15:51 ctms1 kernel: SCSI error : &amp;lt;0 0 1 1&amp;gt; return code = 0x20000&lt;BR /&gt;Jan 10 06:15:51 ctms1 kernel: end_request: I/O error, dev sdb, sector 145185607&lt;BR /&gt;Jan 10 06:16:01 ctms1 kernel: SCSI error : &amp;lt;0 0 1 1&amp;gt; return code = 0x20000 &lt;BR /&gt;&lt;BR /&gt;The cluster nodes tried to reboot themselves, but both hung on startup. It was early morning on a holiday, the load was minimal, and there were no technical personnel on duty at the customer site. When the site administrator came in, he powered the hardware off and on, and the cluster started successfully. No data was lost, but the application was offline for some hours&lt;BR /&gt;&lt;BR /&gt;The customer has asked us to diagnose the problem and prevent such failures in the future. &lt;BR /&gt;&lt;BR /&gt;The MSA controller shows strange log records (see the attached file; the controller clock was not accurate, the difference is ~4 min). 
It looks as if all the HDDs failed simultaneously, but that is implausible. The array has two identical controllers, and all the drives are dual-ported&lt;BR /&gt;&lt;BR /&gt;Any ideas will help us greatly &lt;BR /&gt;&lt;BR /&gt;Regards, Ivan Kuznetsov&lt;BR /&gt;SOLVO ltd.</description>
      <pubDate>Tue, 19 Jan 2010 15:22:47 GMT</pubDate>
      <guid>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567006#M33980</guid>
      <dc:creator>Ivan Kuznetsov</dc:creator>
      <dc:date>2010-01-19T15:22:47Z</dc:date>
    </item>
    <item>
      <title>Re: MSA2212fc strange failure</title>
      <link>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567007#M33981</link>
      <description>Hello Ivan,&lt;BR /&gt;&lt;BR /&gt;For some reason I could not download your log files, so I have not read them.&lt;BR /&gt;&lt;BR /&gt;However, one thing that can cause errors on multiple disks at the same time is an environmental problem (a cooling problem in the customer data center).&lt;BR /&gt;&lt;BR /&gt;Hard disk drives are among the items most affected by this kind of problem, which may or may not result in data loss.&lt;BR /&gt;&lt;BR /&gt;Do you have any logs on the MSA or the servers that indicate a cooling problem (excessive heat)?&lt;BR /&gt;Do you have any information about problems in the customer data center?&lt;BR /&gt;</description>
      <pubDate>Tue, 19 Jan 2010 19:59:01 GMT</pubDate>
      <guid>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567007#M33981</guid>
      <dc:creator>Diego Salim de Oliveira</dc:creator>
      <dc:date>2010-01-19T19:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: MSA2212fc strange failure</title>
      <link>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567008#M33982</link>
      <description>Hello!&lt;BR /&gt;&lt;BR /&gt;The temperature in the data room was normal. Both cluster nodes are HP DL360 servers with good environmental control too. The customer has a number of other servers and data hardware in the same room, and there were no alarms.&lt;BR /&gt;&lt;BR /&gt;Power comes from two independent UPSes. Both are monitored and in good condition&lt;BR /&gt;&lt;BR /&gt;The HDDs did not actually fail; it only appeared so. After being manually powered off and on, the MSA started successfully, and all 12 drives came up and ran without errors&lt;BR /&gt;&lt;BR /&gt;Most probably the problem was a malfunction of some part of the MSA that is common to all the HDDs but is not monitored. But the MSA hardware has 2x redundancy... I do not believe that two or more circuits failed at the same time. There should be exactly one cause&lt;BR /&gt;&lt;BR /&gt;Regards, Ivan</description>
      <pubDate>Tue, 19 Jan 2010 21:38:25 GMT</pubDate>
      <guid>https://community.hpe.com/t5/disk-enclosures/msa2212fc-strange-failure/m-p/4567008#M33982</guid>
      <dc:creator>Ivan Kuznetsov</dc:creator>
      <dc:date>2010-01-19T21:38:25Z</dc:date>
    </item>
  </channel>
</rss>

