<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OpenVMS Cluster crash in Operating System - OpenVMS</title>
    <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317137#M15879</link>
    <description>Mark,&lt;BR /&gt;&lt;BR /&gt;they are in the same rack.&lt;BR /&gt;Nobody was working there, that's sure.&lt;BR /&gt;&lt;BR /&gt;Jan,&lt;BR /&gt;&lt;BR /&gt;we don't have console logging.&lt;BR /&gt;We have two embedded switches in the MSA1000 and two HBAs in each DS20.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 03 Dec 2008 14:19:13 GMT</pubDate>
    <dc:creator>dschwarz</dc:creator>
    <dc:date>2008-12-03T14:19:13Z</dc:date>
    <item>
      <title>OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317129#M15871</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;we have an OpenVMS Cluster, two DS20E running OpenVMS 7.3-2, patched to UPDATE-V1100 (not the latest, I know). The systems are connected to an MSA1000 active/standby.&lt;BR /&gt;&lt;BR /&gt;Yesterday both nodes crashed/rebooted at the same time. No memory dump, no errorlog entries.&lt;BR /&gt;&lt;BR /&gt;Both systems are connected to a UPS.&lt;BR /&gt;&lt;BR /&gt;These were the only systems that crashed at that time, all other systems in that room kept on running.&lt;BR /&gt;&lt;BR /&gt;What can cause such a problem ?&lt;BR /&gt;What can we do to find out why this happened ?&lt;BR /&gt;&lt;BR /&gt;Dieter</description>
      <pubDate>Wed, 03 Dec 2008 13:05:22 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317129#M15871</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-03T13:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317130#M15872</link>
      <description>Hello&lt;BR /&gt;&lt;BR /&gt;Are you sure they are correctly configured in order to take a dump ?&lt;BR /&gt;&lt;BR /&gt;Have you already had a valid dump ?&lt;BR /&gt;&lt;BR /&gt;Are you sure the UPS is ok ? Maybe it is just a power failure ?</description>
      <pubDate>Wed, 03 Dec 2008 13:12:00 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317130#M15872</guid>
      <dc:creator>labadie_1</dc:creator>
      <dc:date>2008-12-03T13:12:00Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317131#M15873</link>
      <description>dieter ,&lt;BR /&gt;&lt;BR /&gt; is the system disk on the msa1000 ? if so, is there anything in the msa1000 event log for that time ?&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 13:37:34 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317131#M15873</guid>
      <dc:creator>marsh_1</dc:creator>
      <dc:date>2008-12-03T13:37:34Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317132#M15874</link>
      <description>labadie,&lt;BR /&gt;&lt;BR /&gt;dumpstyle = 9&lt;BR /&gt;dumpbug = 1&lt;BR /&gt;savedump = 0&lt;BR /&gt;bugcheckfatal = 0&lt;BR /&gt;bugreboot = 1&lt;BR /&gt;&lt;BR /&gt;Yes, we have had a valid dump from earlier this year.&lt;BR /&gt;ANA/CRASH SYS$SYSTEM:SYSDUMP.DMP shows&lt;BR /&gt;...&lt;BR /&gt;Dump taken on 16-MAR-2008 09:56:18.03&lt;BR /&gt;...&lt;BR /&gt;So it's obvious that nothing was written yesterday.&lt;BR /&gt;&lt;BR /&gt;UPS is ok, other systems are connected to the same UPS. These systems did not crash.&lt;BR /&gt;&lt;BR /&gt;Power failure was our first idea, too. But we have no idea how this can happen to the cluster nodes without affecting any other system connected to the same UPS/power line.&lt;BR /&gt;&lt;BR /&gt;Dieter</description>
      <pubDate>Wed, 03 Dec 2008 13:46:15 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317132#M15874</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-03T13:46:15Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317133#M15875</link>
      <description>mark,&lt;BR /&gt;&lt;BR /&gt;we use separate system disks, both on the MSA1000.&lt;BR /&gt;&lt;BR /&gt;MSA1000 shows &amp;gt;120 days uptime.&lt;BR /&gt;&lt;BR /&gt;There are no events reported on the MSA1000.&lt;BR /&gt;&lt;BR /&gt;Dieter</description>
      <pubDate>Wed, 03 Dec 2008 13:49:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317133#M15875</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-03T13:49:13Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317134#M15876</link>
      <description>dieter,&lt;BR /&gt;&lt;BR /&gt;  are they in the same rack ? although they have redundant power supplies, the ds20 has only one ac input. was someone perhaps working in the rack ?</description>
      <pubDate>Wed, 03 Dec 2008 13:56:01 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317134#M15876</guid>
      <dc:creator>marsh_1</dc:creator>
      <dc:date>2008-12-03T13:56:01Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317135#M15877</link>
      <description>Dieter,&lt;BR /&gt;&lt;BR /&gt;did you have some direct logging of console output?&lt;BR /&gt;&lt;BR /&gt;We once had (AS2100's connected via HSZ50, so relevance questionable, but...) a hardware failure on the cabling (squeezed, and thereby semi-broken connection).&lt;BR /&gt;&lt;BR /&gt;And if the connection between system and disks is gone, how can ANYTHING get written to any disk?&lt;BR /&gt;On the console there was an error code (IIRC, error 660). Our field engineer was able to diagnose that as a flaky connection.&lt;BR /&gt;And yes, the system DID get back online, only to go down again the next day. It was from that second crash that we were able to trace the error.&lt;BR /&gt;Maybe, maybe not applicable in your case, but, fwiw.&lt;BR /&gt;&lt;BR /&gt;Proost.&lt;BR /&gt;&lt;BR /&gt;Have one on me.&lt;BR /&gt;&lt;BR /&gt;jpe</description>
      <pubDate>Wed, 03 Dec 2008 14:02:08 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317135#M15877</guid>
      <dc:creator>Jan van den Ende</dc:creator>
      <dc:date>2008-12-03T14:02:08Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317136#M15878</link>
      <description>dieter,&lt;BR /&gt; &lt;BR /&gt; any other common components in the fabric ? is it an embedded switch in the msa, or two separate switches / dual hbas in the ds20s ?&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 14:07:09 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317136#M15878</guid>
      <dc:creator>marsh_1</dc:creator>
      <dc:date>2008-12-03T14:07:09Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317137#M15879</link>
      <description>Mark,&lt;BR /&gt;&lt;BR /&gt;they are in the same rack.&lt;BR /&gt;Nobody was working there, that's sure.&lt;BR /&gt;&lt;BR /&gt;Jan,&lt;BR /&gt;&lt;BR /&gt;we don't have console logging.&lt;BR /&gt;We have two embedded switches in the MSA1000 and two HBAs in each DS20.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 14:19:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317137#M15879</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-03T14:19:13Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317138#M15880</link>
      <description>Dieter,&lt;BR /&gt;&lt;BR /&gt;the only chance left is to have a look in ERRLOG.SYS. You'll need DECevent to decode the errlog file, as ANAL/ERR/ELV will probably not be able to translate the errlog entries from a crash.&lt;BR /&gt;&lt;BR /&gt;What's the setting of the console environment variable AUTO_ACTION ?&lt;BR /&gt;&lt;BR /&gt;$ WRITE SYS$OUTPUT F$GETENV("AUTO_ACTION")&lt;BR /&gt;&lt;BR /&gt;Volker.</description>
      <pubDate>Wed, 03 Dec 2008 14:51:15 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317138#M15880</guid>
      <dc:creator>Volker Halle</dc:creator>
      <dc:date>2008-12-03T14:51:15Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317139#M15881</link>
      <description>Dieter,&lt;BR /&gt;&lt;BR /&gt;$ ANAL/ERR/ELV TRANSLATE/INCL=BUGCHECK/SINCE=...&lt;BR /&gt;&lt;BR /&gt;Does SYS$SYSTEM:SYS$ERRLOG.DMP exist and is it big enough ?&lt;BR /&gt;&lt;BR /&gt;Volker.</description>
      <pubDate>Wed, 03 Dec 2008 14:55:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317139#M15881</guid>
      <dc:creator>Volker Halle</dc:creator>
      <dc:date>2008-12-03T14:55:10Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317140#M15882</link>
      <description>dieter,&lt;BR /&gt;&lt;BR /&gt; how do you know they crashed and weren't rebooted by someone ?&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 15:03:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317140#M15882</guid>
      <dc:creator>marsh_1</dc:creator>
      <dc:date>2008-12-03T15:03:36Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317141#M15883</link>
      <description>Regarding the UPS coverage: both systems, the storage controllers, and the network infrastructure required for clustering?  (A network glitch or VLAN outage can crash a cluster.) &lt;BR /&gt;&lt;BR /&gt;As there's little evidence here of what transpired; if no dump and no error logs and no controller logs...  Or even if so...&lt;BR /&gt;&lt;BR /&gt;Set up logging to capture these events as described elsewhere and also capture via the console serial lines.  &lt;BR /&gt;&lt;BR /&gt;Ensure both boxes are set to RESTART/REBOOT, and not to HALT, nor to REBOOT -- this via the SRM console  AUTO_ACTION variable.&lt;BR /&gt;&lt;BR /&gt;Patch to current.  Both on the hosts and on the controllers.&lt;BR /&gt;&lt;BR /&gt;I'd be seriously tempted to tie the MSAs into the logging, as well as the UPS.&lt;BR /&gt;&lt;BR /&gt;Move down the racks, and up-rate the monitoring and the UPS and related configurations of the other boxes in a similar fashion.   Rack-mount boxes have a nasty habit of incurring unintentional and incremental changes, and this can lead to network switches that aren't covered by UPS, storage controllers that aren't, or any number of other subtle "adjustments" to the intended configuration.&lt;BR /&gt;&lt;BR /&gt;Then wait for the next one.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 15:31:52 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317141#M15883</guid>
      <dc:creator>Hoff</dc:creator>
      <dc:date>2008-12-03T15:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317142#M15884</link>
      <description>Mark, &lt;BR /&gt;&lt;BR /&gt;SYS$MANAGER:OPERATOR.LOG would show entries like this:&lt;BR /&gt;_BUCL02$OPA0:, BUCL02 shutdown was requested by the operator.&lt;BR /&gt;It doesn't&lt;BR /&gt;&lt;BR /&gt;DIAG/SIN.. would show entries like this:&lt;BR /&gt;Entry Type                       65. Volume Dismount&lt;BR /&gt;&lt;BR /&gt;SWI Minor class                   5. Volume dismount&lt;BR /&gt;It doesn't&lt;BR /&gt;&lt;BR /&gt;Volker,&lt;BR /&gt;AUTO_ACTION is RESTART&lt;BR /&gt;&lt;BR /&gt;Will ANAL/ERR/ELV..... give me more information than DIAG.... does?&lt;BR /&gt;I don't think so, and DIAG shows a time stamp entry at 23:50 and configuration information at 23:58.&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 15:38:07 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317142#M15884</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-03T15:38:07Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317143#M15885</link>
      <description>dieter,&lt;BR /&gt;&lt;BR /&gt;  i know, had to ask though, wouldn't be the first time somebody who shouldn't have was playing around and it was overlooked  ... :-)&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 03 Dec 2008 15:46:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317143#M15885</guid>
      <dc:creator>marsh_1</dc:creator>
      <dc:date>2008-12-03T15:46:10Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317144#M15886</link>
      <description>Dieter,&lt;BR /&gt;&lt;BR /&gt;then these 2 OpenVMS systems really did not 'crash'. They may just have 'booted' without a prior crash. And this most likely can only be caused by a hardware event/signal. Look for something common to those 2 machines, which could have affected both machines at the same time.&lt;BR /&gt;&lt;BR /&gt;Only console information would have been able to provide more info, if there was really anything to tell. If you see just an INIT message on a console of a running system, you still have to wonder about the underlying reason.&lt;BR /&gt;&lt;BR /&gt;Another piece of info to check would be the configuration entries logged at boot time. Maybe there are some status bits, which would tell more about the preceding events...&lt;BR /&gt;&lt;BR /&gt;Volker.</description>
      <pubDate>Wed, 03 Dec 2008 16:17:50 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317144#M15886</guid>
      <dc:creator>Volker Halle</dc:creator>
      <dc:date>2008-12-03T16:17:50Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317145#M15887</link>
      <description>Conclusion:&lt;BR /&gt;&lt;BR /&gt;We will try to capture as much information as possible (console logging, msa1000 logging,...)&lt;BR /&gt;and wait for the next time as Hoff wrote.&lt;BR /&gt;&lt;BR /&gt;We haven't seen anything like this in the last 6 years, so there is a chance to survive the next decade without seeing it again.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Dec 2008 07:49:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317145#M15887</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-04T07:49:42Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317146#M15888</link>
      <description>Dieter,&lt;BR /&gt;Do you have any CLUE listing files in SYS$ERRORLOG: ?&lt;BR /&gt;These should get written at system boot time.</description>
      <pubDate>Thu, 04 Dec 2008 11:19:04 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317146#M15888</guid>
      <dc:creator>Peter Elliott</dc:creator>
      <dc:date>2008-12-04T11:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317147#M15889</link>
      <description>Peter, &lt;BR /&gt;there are only some old CLUE$node_date_time.LIS files. They don't help.&lt;BR /&gt;CLUE$HISTORY.DAT does not contain any information related to the problem.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Dec 2008 11:36:16 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317147#M15889</guid>
      <dc:creator>dschwarz</dc:creator>
      <dc:date>2008-12-04T11:36:16Z</dc:date>
    </item>
    <item>
      <title>Re: OpenVMS Cluster crash</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317148#M15890</link>
      <description>Dieter,&lt;BR /&gt;&lt;BR /&gt;While you are checking things, please carefully check the grounding of the systems, in addition to the power.&lt;BR /&gt;&lt;BR /&gt;Grounding problems can cause all manner of failures, many of them seemingly mysterious. A grounding failure can be something as simple as a corroded connection.&lt;BR /&gt;&lt;BR /&gt;- Bob Gezelter, &lt;A href="http://www.rlgsc.com" target="_blank"&gt;http://www.rlgsc.com&lt;/A&gt;</description>
      <pubDate>Thu, 04 Dec 2008 12:13:59 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/openvms-cluster-crash/m-p/4317148#M15890</guid>
      <dc:creator>Robert Gezelter</dc:creator>
      <dc:date>2008-12-04T12:13:59Z</dc:date>
    </item>
  </channel>
</rss>

