Operating System - OpenVMS

Re: TCPIP address in VMS crashdump.

 
Terry Beales
Occasional Contributor

TCPIP address in VMS crashdump.

I am investigating a VMS crash that appears to have been caused by something on the network initiating over 8000 IP connections. This led to about 9 MB of nonpaged pool being consumed by BG device UCBs that no longer have any links to the IP addresses of the remote node.
I can find the IP address at @(@(ucb+1e4)+10)+20 when the device is still known to TCPIP, but in the UCBs queued for deletion the link at UCB+1E4 is now zero.
Does anyone know whether the socket structures that were once attached to these UCBs might still exist, and if so, how to find them?
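
For readers less used to SDA expressions, the pointer chain above can be spelled out as a small Python sketch. It is only an illustration: the dict-based memory model, the function names and the sample addresses are hypothetical, and the offsets 1E4, 10 and 20 are simply the ones quoted in the post, which are specific to that VMS and TCPIP version.

```python
# Hypothetical memory model: a dict mapping address -> 32-bit longword.
# Offsets 0x1E4, 0x10 and 0x20 are the ones quoted in the post; they are
# version-specific and have not been verified here.

def deref(mem, addr):
    """Return the longword stored at addr (0 if nothing is recorded)."""
    return mem.get(addr, 0)

def remote_ip_location(mem, ucb):
    """Evaluate @(@(ucb+1E4)+10)+20, or return None if a link is zero."""
    link = deref(mem, ucb + 0x1E4)
    if link == 0:
        return None  # UCB queued for deletion: the chain is broken
    inner = deref(mem, link + 0x10)
    if inner == 0:
        return None
    return inner + 0x20

# Made-up addresses, for illustration only.
live_ucb = 0x81000000
freed_ucb = 0x81002000
mem = {
    live_ucb + 0x1E4: 0x82000000,
    0x82000000 + 0x10: 0x83000000,
    freed_ucb + 0x1E4: 0,
}
print(hex(remote_ip_location(mem, live_ucb)))
print(remote_ip_location(mem, freed_ucb))
```

The second call shows the problem being asked about: once UCB+1E4 is zero, the chain gives no way back to the structure that held the address.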

John Travell, logged in to Terry Beales account.
Craig A Berry
Honored Contributor

Re: TCPIP address in VMS crashdump.

Just a desperate guess, but if anything resulted in a process termination before the crash, the Remote ID in accounting might be what you're looking for.
Volker Halle
Honored Contributor

Re: TCPIP address in VMS crashdump.

John,

no, I don't know. But I can also poke around in some system/dump data structures if you let me know your OpenVMS and TCPIP versions. Could you dump and post a little of the data structures you looked at to come up with that one-line SDA command?

There is the SDA> TCPIP TAG ALL command to identify all TCPIP data structures. Using this command, I was able to follow the INET_UCB -> SOCKET -> PCB structure and found the IP address at PCB+48 (TCPIP V5.3 ECO 3).

SDA> TCPIP SHOW DEV/FULL won't show them, right?

Volker.
Terry Beales
Occasional Contributor

Re: TCPIP address in VMS crashdump.

Craig.
Sorry, accounting has not been recording process terminations, so no evidence there. Good try, though.

Volker.
VMS 7.2-1, TCPIP 5.1 ECO 4
The crash was an INSF_NONPAGED: the lookaside list for the desired packet size was empty and the largest packet on the free list was too small. NPP was 30% expanded. What was consuming so much more NPP than normal? "show pool/sum" revealed over 10 MB of UCBs.
Writing the output of "show pool/head" to a file, reformatting the file into a procedure that does a "show device/address" for each UCB, and then sorting the results revealed well over 8000 BG device UCBs.
TCPIP only admits knowing about 23.
All of the remainder have UCB$L_ASTQFL pointing to IOC_STD$FREEUCB+20.
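
The reformatting step described above could be sketched roughly as follows in Python. This is not the procedure that was actually used. The layout of the sample listing is invented for illustration (a packet type token followed by 8-digit hex fields, the first being the packet address); the real "show pool/head" output will differ, so the parsing would need adjusting. SHOW DEVICE/ADDRESS= is the SDA qualifier referred to in the post.

```python
import re

# Assumption: each listing line names the packet type as its own token and
# the first 8-digit hex field on the line is the packet's start address.
HEX8 = re.compile(r"\b[0-9A-Fa-f]{8}\b")

def ucb_commands(listing_lines):
    """Build one SDA 'SHOW DEVICE/ADDRESS=' command per UCB packet found."""
    commands = []
    seen = set()
    for line in listing_lines:
        if "UCB" not in line.upper().split():
            continue  # not a UCB packet line
        match = HEX8.search(line)
        if match is None:
            continue  # no address field on this line
        addr = match.group(0).upper()
        if addr not in seen:
            seen.add(addr)
            commands.append("SHOW DEVICE/ADDRESS=" + addr)
    return commands

# Invented sample lines, not real SDA output.
sample = [
    "UCB        81234000   00000240",
    "IRP        81300000   000000C0",
    "UCB        81235000   00000240",
]
for command in ucb_commands(sample):
    print(command)
```

The generated lines could be written to a file and executed from SDA, with the resulting device names sorted afterwards to count the BG devices.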
In the UCBs for which TCPIP admits knowing an IP address, @(@(UCB+1E4)+10)+2C gives me the encoded IP address that TCPIP reports. It turns out this is at PCB+2C. I only have 36 PCBs, which do not match the 8000+ BG sockets.
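
The post does not say how the address is encoded. A minimal sketch, assuming it is an IPv4 address held in network byte order and displayed by SDA as a little-endian longword (so 127.0.0.1 would show as 0100007F), could decode it like this. The function name is hypothetical.

```python
import ipaddress
import struct

def decode_ip_longword(value):
    """Turn a longword as displayed on a little-endian machine into dotted form.

    Assumption: the four bytes in memory are in network byte order, so the
    lowest-addressed byte (the low byte of the longword) is the first octet.
    """
    memory_bytes = struct.pack("<L", value)  # bytes in ascending address order
    return str(ipaddress.IPv4Address(memory_bytes))

print(decode_ip_longword(0x0100007F))
print(decode_ip_longword(0x0A01A8C0))
```

If the decoded value does not match what TCPIP reports for a known connection, the byte-order assumption is wrong for that field.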
The remaining question amounts to this: does this version of TCPIP store the IP address of an incoming connect anywhere other than in a PCB? If not, it looks as though the IP address(es) of whatever caused the creation of those 8000+ spurious sockets (BG devices) are no longer available.
SDA> TCPIP TAG ALL does not help much; it fails to tag the one-time BG device UCBs that it has forgotten about.
While I know exactly why NPP was blown away, I cannot take that final step of identifying the perpetrator. Shame, but that's life...

John Travell.
Jan van den Ende
Honored Contributor

Re: TCPIP address in VMS crashdump.

John,
sorry, no explanation, but memories of something similar.

Also VMS 7.2-1, TCPIP 5.1, can't remember ECO.

_IF_ "we" somehow succeeded in attempting some threshold number of ftp connections simultaneously (from Citrix servers), the number of sockets on VMS exploded, and the Citrix server slowed down and, after some time, crashed.

We were able to demonstrate that those two phenomena were connected.

Upgrading to TCPIP 5.3 (straight to ECO-2, then current) solved it for us (both VMS & Citrix).

fwiw,

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: TCPIP address in VMS crashdump.

Sorry, forgot 1 thing:

Why to TCPIP 5.3?

This happened _DURING_ our rolling upgrade of a bunch of patches, when part of the cluster had already been done and part was waiting to reboot from the patched system disk. The troubled node was not yet updated.

It was our luck that we managed to push on before we were formally accused:
when the relation was recognised to exist, we were told to revert the patches.
Having just rolled forward, and not seeing the issue any more, it was left at "Any trouble again, you roll back", which did not occur. Pfewwww!!!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: TCPIP address in VMS crashdump.

John,

there seem to be some fields further down in the PCB which contain the local/remote port and local/remote IP address.

While the links from the socket back to the INET UCB etc. may have been cleared when those data structures were freed, the rest of the fields in those data structures certainly do still exist (unless you are unlucky enough that all those 8000 sockets and PCBs had actually been re-used by the time of the crash).

If you can confirm this with an existing IP connection and its PCB in your crash (and TCPIP version), it should be possible to find those deallocated PCBs via the local IP address through an appropriate SDA> SEARCH command.
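
The idea of locating freed PCBs by searching for the local IP address can be illustrated outside SDA with a short Python sketch. It assumes a region of pool has been obtained as raw bytes with a known base address, and that the address is stored as four bytes in network byte order on a longword boundary. The function name and sample data are hypothetical, and this is not a substitute for the SDA SEARCH command.

```python
import ipaddress

def find_ip(pool, base, ip, align=4):
    """Return the addresses in pool where the packed IPv4 address occurs.

    pool  -- raw bytes of the memory region (hypothetical extract)
    base  -- virtual address corresponding to pool[0]
    align -- only report hits on this byte boundary (1 reports every hit)
    """
    needle = ipaddress.IPv4Address(ip).packed  # 4 bytes, network byte order
    hits = []
    pos = pool.find(needle)
    while pos != -1:
        if pos % align == 0:
            hits.append(base + pos)
        pos = pool.find(needle, pos + 1)
    return hits

# Invented sample region: one aligned and one unaligned occurrence.
packed = ipaddress.IPv4Address("10.0.0.5").packed
pool = bytes(8) + packed + bytes(3) + packed + bytes(1)
print([hex(a) for a in find_ip(pool, 0x80000000, "10.0.0.5")])
```

Each hit is only a candidate. Subtracting the offset of the local address field within a PCB (confirmed first on a live connection, as suggested above) would give the start of a possible freed PCB to examine.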

Volker.
John Travell
Valued Contributor

Re: TCPIP address in VMS crashdump.

I am no longer on this site, and do not have access to the dump file any more.
While I can ask Terry B to run some commands for me, he is not exactly a CDA guru. That's why I was there.
Q: Has the PCB size changed in recent versions? I see 1024 bytes on V7.3-1. With no access to a V7.2-1 dump or symbol table files, I cannot easily verify whether it was the same then.
In any case, there were no large numbers of PCBs in the crash to be able to do much matching. Regrettably there was no network monitoring going on at the time, so the perpetrator will forever remain a mystery.

I must remind Terry to assign some points.

JT:
comarow
Trusted Contributor

Re: TCPIP address in VMS crashdump.

There is a new TCPIP ECO for the most recent version of TCPIP.


Why reinvent the wheel?