Cluster member crash while doing backup

 
SOLVED
Vladimir Fabecic
Honored Contributor

Cluster member crash while doing backup

Hello
The first crash happened while backing up an ODS-5 disk (DISK$ORACLE10) to a save set on another disk.
The second crash happened while backing up that disk to tape.
The disk held a large number of files.
Before the crash the node had a high ENQ/DEQ rate (approx. 8000-9000).
After many files were deleted from that disk, the backup works OK.
The OS version is 8.2 (patched), the nodes are 2 x ES80, and the storage is an XP 10000.
See attached
In vino veritas, in VMS cluster
12 REPLIES
Vladimir Fabecic
Honored Contributor

Re: Cluster member crash while doing backup

Sorry, see attached file.
In vino veritas, in VMS cluster
labadie_1
Honored Contributor

Re: Cluster member crash while doing backup

I remember something similar on Alpha VMS 7.3-2, where shutting down an Rdb database that held a huge number of locks led to a crash.
We had a reproducer, and a patch had then been issued by HP.

I will dig in my files to get more details.
atul sardana
Frequent Advisor

Re: Cluster member crash while doing backup

20415AC1 02C1 SYSTEM SYSTEM RWSCS 5 8E43B840 8E90E000 3322

The line above is the output for the specific process that went into a resource wait, after which the system crashed.

Atul Sardana
I love VMS
atul sardana
Frequent Advisor

Re: Cluster member crash while doing backup

Please check $ show system for all processes during the backup.

In the first crash the reason is RWSCS (Resource Wait for System Communications Services).

If a process seems to be in RWSCS for a 'long time', it usually means it is not hung in this state but is getting into and out of RWSCS so quickly that you don't see the non-RWSCS state (just check for an increasing IO count or CPU cycles to confirm that the process is not hung).
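
The check described above can be sketched in a few lines of Python. This is only an illustration: the `Sample` class, the `is_making_progress` function and all the numbers are hypothetical, and the values would have to be copied by hand from the I/O and CPU columns of two successive `$ show system` displays.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a process (hypothetical structure for this sketch)."""
    state: str      # scheduler state shown for the process, e.g. "RWSCS"
    io_count: int   # cumulative I/O count
    cpu_ticks: int  # cumulative CPU time, in any consistent unit


def is_making_progress(earlier: Sample, later: Sample) -> bool:
    """Return True if either cumulative counter advanced between two samples."""
    return (later.io_count > earlier.io_count
            or later.cpu_ticks > earlier.cpu_ticks)


# Hypothetical values, not taken from the crash data in this thread.
first = Sample("RWSCS", io_count=4100, cpu_ticks=1500)
second = Sample("RWSCS", io_count=4168, cpu_ticks=1512)
third = Sample("RWSCS", io_count=4168, cpu_ticks=1512)

print(is_making_progress(first, second))  # counters advanced
print(is_making_progress(second, third))  # counters did not move
```

If the state reads RWSCS in both samples but the counters advanced, the process is cycling through RWSCS rather than hanging in it.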

Atul Sardana
I love VMS
Jur van der Burg
Respected Contributor

Re: Cluster member crash while doing backup

One crash is in SPISHR, part of Monitor. I recall that there was a patch for an issue in that image, but I would need to look at the dump to be sure.

The second crash is in SYS$PGADRIVER, part of the Fibre Channel drivers. Again, without a dump it's hard to tell if this is a known issue.

At least make sure that the latest patches for Monitor and SCSI are installed.

RWSCS has nothing to do with the crash; it's a completely normal process state.

Jur.
Vladimir Fabecic
Honored Contributor

Re: Cluster member crash while doing backup

Here is the output of PRODUCT SHOW HISTORY
$ product show history
----------------------------------- ----------- ----------- --------------------
PRODUCT                             KIT TYPE    OPERATION   DATE AND TIME
----------------------------------- ----------- ----------- --------------------
DEC AXPVMS TCPIP V5.5-11ECO2        Full LP     Install     06-MAR-2007 19:20:16
DEC AXPVMS TCPIP V5.5-11ECO1        Full LP     Remove      06-MAR-2007 19:20:16
DEC AXPVMS DWMOTIF_ECO01 V1.5       Patch       Install     06-MAR-2007 19:07:36
DEC AXPVMS VMS82A_ACRTL V2.0        Patch       Install     06-MAR-2007 18:50:05
DEC AXPVMS VMS82A_DDTM V2.0         Patch       Install     06-MAR-2007 18:50:05
DEC AXPVMS VMS82A_FIBRE_SCSI V3.0   Patch       Install     06-MAR-2007 18:50:05
DEC AXPVMS VMS82A_LAN V2.0          Patch       Install     06-MAR-2007 18:50:05
DEC AXPVMS VMS82A_UPDATE V6.0       Patch       Install     06-MAR-2007 18:41:09
DEC AXPVMS FASTVM122 V1.2-21        Full LP     Install     19-DEC-2006 05:25:33
IBM AXPVMS WMQCLIENT V5.30          Full LP     Install     19-DEC-2006 05:12:22
HP AXPVMS AVAIL_MAN_ANA V2.6-AA     Full LP     Install     19-DEC-2006 04:13:59
HP VMS AVAIL_MAN_COL V2.6-AV        Full LP     Install     19-DEC-2006 04:13:30
DEC AXPVMS JAVA142 V1.4-24P5        Full LP     Install     19-DEC-2006 02:38:28
DEC AXPVMS JAVA142 V1.4-24P2        Full LP     Remove      19-DEC-2006 02:38:28
DEC AXPVMS JAVA142 V1.4-24P2        Full LP     Install     19-DEC-2006 02:37:34
DEC AXPVMS JAVA142 V1.4-24          Full LP     Remove      19-DEC-2006 02:37:34
DEC AXPVMS JAVA142 V1.4-24          Full LP     Install     19-DEC-2006 02:36:40
DEC AXPVMS VMS82A_XFC V2.0          Patch       Install     19-DEC-2006 02:19:33
DEC AXPVMS VMS82A_UPDATE V5.0       Patch       Install     19-DEC-2006 02:18:39
DEC AXPVMS VMS82A_UPDATE V4.0       Patch       Install     19-DEC-2006 02:01:54
DEC AXPVMS TCPIP V5.5-11ECO1        Full LP     Install     11-OCT-2006 11:02:32
DEC AXPVMS TCPIP V5.5-11            Full LP     Remove      11-OCT-2006 11:02:32
DEC AXPVMS VMS82A_ACRTL V1.0        Patch       Install     11-OCT-2006 11:00:51
DEC AXPVMS VMS82A_BASRTL V2.0       Patch       Install     11-OCT-2006 10:59:15
DEC AXPVMS VMS82A_CLUSTER V1.0      Patch       Install     11-OCT-2006 10:58:48
DEC AXPVMS VMS82A_DEBUG V1.0        Patch       Install     11-OCT-2006 10:58:10
DEC AXPVMS VMS82A_FIBRE_SCSI V2.0   Patch       Install     11-OCT-2006 10:57:36
DEC AXPVMS VMS82A_INSTAL V1.0       Patch       Install     11-OCT-2006 10:57:03
DEC AXPVMS VMS82A_LMF V2.0          Patch       Install     11-OCT-2006 10:56:32
DEC AXPVMS VMS82A_LOADSS V2.0       Patch       Install     11-OCT-2006 10:56:06
DEC AXPVMS VMS82A_OPCOM V1.0        Patch       Install     11-OCT-2006 10:55:39
DEC AXPVMS VMS82A_QMAN V1.0         Patch       Install     11-OCT-2006 10:55:12
DEC AXPVMS VMS82A_TDF V2.0          Patch       Install     11-OCT-2006 10:54:41
DEC AXPVMS VMS82A_TZ V1.0           Patch       Install     11-OCT-2006 10:53:57
DEC AXPVMS VMS82A_SYS V5.0          Patch       Install     11-OCT-2006 10:52:39
DEC AXPVMS VMS82A_UPDATE V3.0       Patch       Install     11-OCT-2006 10:51:47
DEC AXPVMS VMS82A_PCSI V1.0         Patch       Install     11-OCT-2006 10:50:24
CPQ AXPVMS CDSA V2.1-331            Full LP     Install     11-OCT-2006 09:48:08
DEC AXPVMS DECNET_PHASE_IV V8.2     Full LP     Install     11-OCT-2006 09:48:08
DEC AXPVMS DWMOTIF V1.5             Full LP     Install     11-OCT-2006 09:48:08
DEC AXPVMS OPENVMS V8.2             Platform    Install     11-OCT-2006 09:48:08
DEC AXPVMS TCPIP V5.5-11            Full LP     Install     11-OCT-2006 09:48:08
DEC AXPVMS VMS V8.2                 Oper System Install     11-OCT-2006 09:48:08
HP AXPVMS AVAIL_MAN_BASE V8.2       Full LP     Install     11-OCT-2006 09:48:08
HP AXPVMS KERBEROS V2.1-72          Full LP     Install     11-OCT-2006 09:48:08
HP AXPVMS TDC_RT V2.1-69            Full LP     Install     11-OCT-2006 09:48:08
----------------------------------- ----------- ----------- --------------------

46 items found
In vino veritas, in VMS cluster
John Travell
Valued Contributor
Solution

Re: Cluster member crash while doing backup

OK, crash1 needs access to the listings to see what SYS$PGADRIVER was doing at the time.
Crash2 reason code was a 444, page read error. This means that an image tried to access a page that was not resident in memory, and when VMS tried to retrieve that page it failed. The non-resident page could have been either in the image file or in a pagefile.
In either case, the primary issue is that the relevant disk was not accessible at the time. Depending on where the respective disks are, this could be a side effect of the same problem that caused crash1: a problem in the access path to those disks.
If you have HP support, escalate both crashes. You may include my comments above if you wish.
JT:
Volker Halle
Honored Contributor

Re: Cluster member crash while doing backup

Vladimir,

please remember to post just the CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS file (as you've correctly done for the 2nd crash) when you ask for advice on a system crash. Posting all kinds of SDA output is typically no help at all. If more detailed information is needed, you will be asked for it specifically.

The INVEXCPTN crash at SYS$PGADRIVER+08C88 needs to be escalated to HP OpenVMS engineering, IF you are running the most recent SYS$PGADRIVER available for V8.2:

SYS$PGADRIVER linked 12-JAN-2007 19:19:41.66 from VMS82A_FIBRE_SCSI-V0300

As you didn't post the CLUE file for this crash, I can't tell which driver you were running at the time of the crash. Your product history indicates that you've installed that patch, but only AFTER the first crash in SYS$PGADRIVER!
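
The comparison between patch install time and crash time can be sketched as follows. This is a rough Python illustration, not a supported tool: `parse_vms_time` and `install_times` are hypothetical helper names, the two history lines are copied from the PRODUCT SHOW HISTORY output posted earlier, and the crash timestamp is an assumption (the date 5-MAR-2007 comes from the CLUE files mentioned in this thread, the time of day is a placeholder).

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_vms_time(text):
    """Parse 'DD-MMM-YYYY HH:MM:SS' without depending on the locale."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(part) for part in time_part.split(":"))
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second)


def install_times(history_text, product_name):
    """Return {version: install time} for one product in the history text."""
    found = {}
    for line in history_text.splitlines():
        fields = line.split()
        # Layout: producer base product version kit-type... operation date time
        if len(fields) < 8 or fields[2] != product_name:
            continue
        if fields[-3] != "Install":
            continue
        found[fields[3]] = parse_vms_time(fields[-2] + " " + fields[-1])
    return found


# Two lines copied from the PRODUCT SHOW HISTORY output in this thread.
HISTORY = """\
DEC AXPVMS VMS82A_FIBRE_SCSI V3.0 Patch Install 06-MAR-2007 18:50:05
DEC AXPVMS VMS82A_FIBRE_SCSI V2.0 Patch Install 11-OCT-2006 10:57:36
"""

# Assumption: date from the CLUE files of 5-MAR-2007, time of day is a placeholder.
crash_time = parse_vms_time("05-MAR-2007 23:59:59")

installs = install_times(HISTORY, "VMS82A_FIBRE_SCSI")
for version, when in sorted(installs.items()):
    relation = "before" if when < crash_time else "after"
    print(version, "installed", when.isoformat(sep=" "), relation, "the crash")
```

The same function also works on the full history text, because header and separator lines have fewer than eight whitespace-separated fields and are skipped.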

The 2nd crash occurred due to a page read error, as John has already indicated, and the failure was a SS$_VOLINV. You would need to find out which disk the data was to be read in from and what happened to that disk. Check the errorlog for disk errors preceding the crash!

Consider also providing the other 2 CLUE files (CETINA and DINARA from 5-MAR-2007) for more detailed information about those crashes.

Volker.
Vladimir Fabecic
Honored Contributor

Re: Cluster member crash while doing backup

Hello guys
I am very sorry I did not reply before.
I was very sick!
I did escalate a case to VMS support.
I will post the results when I get the response.
In vino veritas, in VMS cluster