Operating System - OpenVMS

VMS queues are stacked in one node

matousekz
Occasional Visitor


Hi,

I have OpenVMS AXP Version V7.3-2.
The problem is that the queue subsystem on one node suddenly stopped working; the command below gave no output for 5 minutes:

DV31:SYSTEM> sh que /all


Interrupt



I noticed another problem at the same time: it was not possible to connect to other nodes of the cluster from this particular node:

DV31:SYSTEM> seth SMDV33


%SYSTEM-F-SHUT, remote node no longer accepting connects


But from other nodes, there is no problem.

I suspected a problem in the intra-cluster communication, but all nodes were present in the "show cluster" output, and "monitor cluster" showed all nodes too.

After a reboot of the node everything worked correctly again, but the situation repeated a few days later.

Can somebody advise me what to check to better locate the real source of the problem?

I already checked ERRLOG.SYS and operator.log but haven't found anything.


Thanks Zbynek
7 REPLIES
Robert Gezelter
Honored Contributor

Re: VMS queues are stacked in one node

Zbynek,

My first recommendation would be to generate a crash dump from the malfunctioning node. I would then strongly recommend opening a support case.

There are many possibilities; without the evidence (the dump) it would just be speculation.

- Bob Gezelter, http://www.rlgsc.com
Volker Halle
Honored Contributor

Re: VMS queues are stacked in one node

Zbynek,

SHOW QUEUE and SET HOST use different protocols and communication mechanisms, so it may not be so easy to find the underlying problem.

$ SET HOST other_node exiting with %SYSTEM-F-SHUT seems to indicate, that DECnet on the local node is being shut down (or had not been completely started ?).

If the SHO QUE command was hung on SMDV31, but worked on the other nodes at that time, this would indicate a communication problem with JOB_CONTROL on the local node and/or a problem between JOB_CONTROL on the local node and the QUEUE_MANAGER process - wherever it was running.

As the node was still in the cluster and you were able to issue (some) commands, you can try to do some further debugging next time, before forcing a crash as the ultimate step:

$ SHOW SYS - any processes in unusual state ?

- JOB_CONTROL not in HIB ? If so, use SDA to find out what it's waiting for.

$ SHOW MEM/POOL/FULL - pool expanded ?

Volker.
Petr Spisek
Regular Advisor

Re: VMS queues are stacked in one node

Hi,
- check if you have the latest ECOs on this cluster
- watch your resources, esp. lock management
- how does your network infrastructure look ... no errors?

Petr
matousekz
Occasional Visitor

Re: VMS queues are stacked in one node

There are some errors at the LANCP level:

DV31:SYSTEM> mc lancp sh dev eia0 /count

Device Counters EIA0:
Value Counter
----- -------
1534323 Seconds since last zeroed
128580009215 Bytes received
150004825651 Bytes sent
520743329 Packets received
419676960 Packets sent
313196472 Multicast bytes received
88155738 Multicast bytes sent
2160053 Multicast packets received
657357 Multicast packets sent
674 Unrecognized unicast destination packets
766631 Unrecognized multicast destination packets
0 Unavailable station buffers
0 Unavailable user buffers
1375629 Alignment errors
1397245 Frame check errors (14-AUG-2007 04:45:55.23)
0 Frame size errors
0 Frame status errors
0 Frame length errors
0 Frame too long errors
0 Data overruns
0 Send data length errors
963162 Receive data length errors
0 Transmit underrun errors
0 Transmit failures
0 Carrier check failures
0 Station failures
0 Initially deferred packets sent
0 Single collision packets sent
0 Multiple collision packets sent
0 Excessive collisions
0 Late collisions
0 Collision detect check failures
1 Link up transitions (27-JUL-2007 10:07:33.34)
0 Link down transitions
27-JUL-2007 10:07:32 Time of last generic transmit error
None Time of last generic receive error

But there is nothing in the TCPIP output. I've checked the Cisco switches, but there are no errors there either. Strange.


The system was updated to the latest VMS patch level on 19 June:
----------------------------------- ----------- ----------- --------------------
PRODUCT                             KIT TYPE    OPERATION   DATE AND TIME
----------------------------------- ----------- ----------- --------------------
LCMG AXPVMS PYTHON V2.3-4B          Full LP     Install     20-JUN-2007 12:56:49
LCMG AXPVMS ZLIB V1.2-1B            Full LP     Install     20-JUN-2007 12:54:25
LCMG AXPVMS LIBBZ2 V1.0-2B          Full LP     Install     20-JUN-2007 12:54:16
DEC AXPVMS X25 V1.6-1               Full LP     Install     19-JUN-2007 21:49:02
DEC AXPVMS WANDD V1.6-1             Full LP     Install     19-JUN-2007 21:48:52
DEC AXPVMS VMS732_SYS V10.0         Patch       Install     19-JUN-2007 21:48:44
DEC AXPVMS VMS732_LOADSS V1.0       Patch       Install     19-JUN-2007 21:48:26
DEC AXPVMS VMS732_LMF V2.0          Patch       Install     19-JUN-2007 21:48:19
DEC AXPVMS VMS732_INSTAL V2.0       Patch       Install     19-JUN-2007 21:48:11
DEC AXPVMS VMS732_F11X V5.0         Patch       Install     19-JUN-2007 21:48:04
DEC AXPVMS VMS732_ACRTL V2.0        Patch       Install     19-JUN-2007 21:47:57
DEC AXPVMS TCPIP_ECO V5.4-155       Patch       Install     19-JUN-2007 21:44:52
DEC AXPVMS DNVOSIECO02 V7.3-2       Patch       Install     19-JUN-2007 21:43:50
DEC AXPVMS DFU V3.1                 Full LP     Install     19-JUN-2007 21:43:29
DEC AXPVMS DFU V2.6                 Full LP     Remove      19-JUN-2007 21:43:29
DEC AXPVMS VMS732_UPDATE V6.0       Patch       Install     19-JUN-2007 21:43:15
DEC AXPVMS VMS732_PCSI V3.0         Patch       Install     19-JUN-2007 21:40:43
CPQ AXPVMS CDSA V2.0-109            Full LP     Install     19-JUN-2007 20:11:57
DEC AXPVMS DECNET_OSI V7.3-2        Full LP     Install     19-JUN-2007 20:11:57
DEC AXPVMS DWMOTIF V1.3-1           Full LP     Install     19-JUN-2007 20:11:57
DEC AXPVMS OPENVMS V7.3-2           Platform    Install     19-JUN-2007 20:11:57
DEC AXPVMS TCPIP V5.4-15            Full LP     Install     19-JUN-2007 20:11:57
DEC AXPVMS VMS V7.3-2               Oper System Install     19-JUN-2007 20:11:57
HP AXPVMS KERBEROS V2.0-6           Full LP     Install     19-JUN-2007 20:11:57
DEC AXPVMS DECNET_OSI V7.2-1        Full LP     Remove      19-JUN-2007 20:11:57
DEC AXPVMS DNVOSIECO02 V7.2         Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS DNVOSIECO03 V7.2         Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS DNVOSIECO06 V7.2         Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS DWMOTIF V1.2-5           Full LP     Remove      19-JUN-2007 20:11:57
DEC AXPVMS OPENVMS V7.2-1           Platform    Remove      19-JUN-2007 20:11:57
DEC AXPVMS TCPIP V5.0-11            Full LP     Remove      19-JUN-2007 20:11:57
DEC AXPVMS TCPIP_ECO V5.1-151       Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS TCPIP_ECO V5.0-113       Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS TCPIP_ECO V5.0-111       Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS V7.2-1               Oper System Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS721_ACRTL V4.0        Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS721_ACRTL V2.0        Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS721_AMACRO V1.0       Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS721_AUDSRV V2.0       Patch       Remove      19-JUN-2007 20:11:57
DEC AXPVMS VMS721_BACKUP V2.0       Patch       Remove      19-JUN-2007 20:11:57

Volker Halle
Honored Contributor

Re: VMS queues are stacked in one node

Zbynek,

the errors shown on EIA0 seem to indicate, that there is a problem with the speed settings between your LAN interface and the network switch.

Consider setting both of them to AUTO-NEGOTIATE. This should work OK with V7.3-2.

You can do this at console level

>>> SET eia0_mode auto

or even with LANCP in the running system.

Volker.
Anton van Ruitenbeek
Trusted Contributor

Re: VMS queues are stacked in one node

Zbynek,

I would not consider setting EIA0 to auto, or following the other suggestions along those lines.
Always set these ports to hard values. If you want the fastest option, set them to full duplex/100. Make sure the switch supports this. If it is a managed switch, use its console to set the corresponding port to full/100. If both ports (OpenVMS and switch) are on auto, the funniest things can happen. Even after some days the ports can lose their handshake and the connection becomes shaky! Always hard-code the ports of the server. With a manageable switch you 'may' set it to auto, but hard-code the setting on the switch. Make sure you never use auto on both!

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
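
For readers weighing the two recommendations above, here is a minimal sketch of why a mixed configuration (one side auto, one side hard-set) tends to cause trouble. It assumes the usual Fast Ethernet behaviour in which a port left on auto-negotiate, facing a partner with hard-set values, can still detect the speed but falls back to half duplex. The function names are hypothetical, the model assumes both ports are capable of full duplex, and real hardware may behave differently.

```python
def resulting_duplex(local, remote):
    """Duplex mode the `local` port ends up using.

    Each argument is 'auto', 'full' or 'half'.
    Simplified model, assuming both ports support full duplex.
    """
    if local != "auto":
        return local      # a hard-set port simply uses its configured mode
    if remote == "auto":
        return "full"     # both sides negotiate and agree on the best common mode
    return "half"         # partner does not negotiate: assumed fallback to half duplex


def duplex_matches(node, switch):
    """True when both ends of the link end up in the same duplex mode."""
    return resulting_duplex(node, switch) == resulting_duplex(switch, node)


if __name__ == "__main__":
    for node, switch in [("auto", "auto"), ("full", "full"),
                         ("full", "auto"), ("auto", "full")]:
        state = "match" if duplex_matches(node, switch) else "MISMATCH"
        print(f"node={node:<4} switch={switch:<4} -> {state}")
```

Under this model, both ends on auto or both ends hard-set to the same values agree, while hard-setting full duplex on only one end leaves the other end at half duplex.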
Steven Schweda
Honored Contributor

Re: VMS queues are stacked in one node

> Consider to set both of them to AUTO-NEGOTIATE.

> Set these ports always hard values.

My network switches are always the cheapest
junk I can find, and I've _never_ had any
trouble using auto-negotiate with any of my
old-junk Alpha systems, with either EW or EI
interfaces.

These:

1375629 Alignment errors
1397245 Frame check errors (14-AUG-2007 04:45:55.23)

do seem to suggest a network problem of some
sort. These should be zero (or very close to
zero).

Before I changed anything, I'd _look_ at the
settings:

mcr lancp show devi /char

and whatever you need to do at the network
switch where it's connected.

And/or try something simple, like moving the
cable to a different switch port, or changing
the cable or the NI card.
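
To put the EIA0 counters quoted earlier in the thread in proportion, here is a small sketch that parses a few lines of the LANCP counter output and expresses each error counter as a fraction of the packets received. The helper names are hypothetical and the sample is only a subset of the posted counters; with these figures the frame check errors come to roughly 0.27 percent of received packets.

```python
# Subset of the "mc lancp sh dev eia0 /count" output posted in this thread.
SAMPLE = """\
520743329 Packets received
1375629 Alignment errors
1397245 Frame check errors (14-AUG-2007 04:45:55.23)
963162 Receive data length errors
None Time of last generic receive error
"""


def parse_counters(text):
    """Map counter name -> integer value, skipping non-numeric lines."""
    counters = {}
    for line in text.splitlines():
        value, _, name = line.strip().partition(" ")
        if not value.isdigit():
            continue                          # e.g. "None Time of last ..."
        name = name.split(" (")[0].strip()    # drop a trailing timestamp
        counters[name] = int(value)
    return counters


def error_rate(counters, name):
    """Fraction of received packets accounted for by the named counter."""
    return counters[name] / counters["Packets received"]


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    for name in ("Alignment errors", "Frame check errors",
                 "Receive data length errors"):
        print(f"{name}: {error_rate(counters, name):.4%} of packets received")
```

Running the same calculation on two counter snapshots taken some minutes apart would show whether the errors are still accumulating or date from a past event.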