12-06-2005 06:30 PM
Strange things in DECNET+
Further info in attachment.
Does anyone have any idea what happened, and how to investigate after the reboot?
Wim
12-06-2005 06:51 PM
Re: Strange things in DECNET+
As you can see, maximum transport connections is defined as 500,
so the connection limit has been reached.
You can change this with sys$startup:net$configure or by editing the file SYS$SPECIFIC:[SYSMGR]NET$NSP_TRANSPORT_STARTUP.NCL;
As a guideline:
Select 1000 transport connections with a maximum window of 20 and maximum receive buffers of 20000.
Be aware that maximum window has an upper limit of 65535 (not absolutely sure; it may be less).
Hope that helps
Regards
Heinz
12-06-2005 07:22 PM
Re: Strange things in DECNET+
Forgot to mention that this is an AS1000 and that 500 is very high for this node.
Normally about 50 connections are open.
But I killed (almost) every process using connections and still 500 were in use. One process I killed freed about 40 connections, but they were taken again within 1 minute.
So, I guess there was some kind of attack over DECnet.
Wim
12-06-2005 11:37 PM
Re: Strange things in DECNET+
I would recommend NOT killing processes, but first displaying the list of active connections (and their originating and receiving processes) to a file.
Deleting processes is like washing down a crime scene: it destroys evidence of what is happening (or has happened).
The syntax for DECnet+ escapes me at the moment, but the Phase IV (NCP) syntax would be SHOW KNOWN LINKS.
- Bob Gezelter, http://www.rlgsc.com
12-07-2005 12:19 AM
Re: Strange things in DECNET+
I tried that, but the programs (ncl and net$mgmt) both hung. So, I tried to kill the processes one by one until I killed the one that held the connections.
In the meantime I found out that many other nodes logged "reject received" in the operator log file.
Wim
12-07-2005 09:54 AM
Re: Strange things in DECNET+
Is there anything in the operator log or security audit of failed connections or attempts?
Do you have any monitoring that might show what time the additional connections started?
Were any processes in a Resource Wait state? (DECnet primarily)
12-07-2005 06:06 PM
Re: Strange things in DECNET+
---------------
I Found the TNS1 process still active and tried to kill it. That restarted the
system.
MXM01/MGRWVW>stop/id=000000A0
---------------
But:
000000A0 TCPIP$INETACP HIB 10 691 0 00:00:13.57 217 144
0000012E AUDIT_CLIENT LEF 6 1469 0 00:00:06.41 323 176
0001B62F TCPIP$TNS1 HIB 6 120 0 00:00:00.27 532 32
So, you killed INETACP which I guess is hooked rather deeply in the kernel.
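A minimal sketch of a check that might have caught the mix-up, assuming the process listing is available as text. It reuses the three listing lines quoted above; the helper names (`parse_processes`, `pids_for`, `safe_to_stop`) are hypothetical, and the parsing assumes the first two whitespace-separated fields are the PID and a process name without embedded spaces.

```python
# Hypothetical helper: confirm which process a PID belongs to before stopping it.
# LISTING holds the three lines quoted in the post. Assumption: field 1 is the
# PID and field 2 is the process name (no spaces inside the name).

LISTING = """\
000000A0 TCPIP$INETACP HIB 10 691 0 00:00:13.57 217 144
0000012E AUDIT_CLIENT LEF 6 1469 0 00:00:06.41 323 176
0001B62F TCPIP$TNS1 HIB 6 120 0 00:00:00.27 532 32
"""


def parse_processes(listing):
    """Return a dict mapping PID (upper-case hex string) to process name."""
    processes = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pid, name = fields[0].upper(), fields[1]
        processes[pid] = name
    return processes


def pids_for(processes, wanted_name):
    """Return every PID whose process name matches wanted_name exactly."""
    return [pid for pid, name in processes.items() if name == wanted_name]


def safe_to_stop(processes, pid, expected_name):
    """True only if pid exists and belongs to the process we meant to stop."""
    return processes.get(pid.upper()) == expected_name


if __name__ == "__main__":
    procs = parse_processes(LISTING)
    print(pids_for(procs, "TCPIP$TNS1"))                  # PID(s) of the intended target
    print(safe_to_stop(procs, "000000A0", "TCPIP$TNS1"))  # the PID that was actually used
```

Comparing the PID against the expected process name before issuing the stop would flag 000000A0 as belonging to TCPIP$INETACP rather than TCPIP$TNS1.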
Edwin
12-07-2005 06:10 PM
Re: Strange things in DECNET+
The monitoring itself was stuck.
The process in RWxxx was a TPU session.
No audit alarm.
Nothing special in accounting.
No log files with other error messages (on client + server).
Because almost all DECnet-using processes were killed and 500 connections were still in use, I think it must be a DECnet bug. All nodes that were connected over DECnet still delivered messages to the node (they were accepted and found afterwards) but also received reject messages.
Wim
12-07-2005 06:12 PM
Re: Strange things in DECNET+
Very good. I made that mistake. But even that should not halt the system. Why was there no crash?
Wim
12-07-2005 09:01 PM
Re: Strange things in DECNET+
>> halted CPU 0
>> halt code = 2
>> kernel stack not valid halt
>> PC = ffffffff801551a4
Probably because the designers decided that after a kernel stack corruption it was too dangerous to perform even a dump. It could corrupt the disk if the dump code or parameters got weird.
Edwin
12-07-2005 09:54 PM
Re: Strange things in DECNET+
>> halted CPU 0
>> halt code = 2
>> kernel stack not valid halt
>> PC = ffffffff801551a4
>probably because the designers decided that after a kernel stack
>corruption it was too dangerous to perform even a dump. It could
>corrupt the disk if the dump code or parameters got weird.
Not so. If your console is set up correctly, i.e. AUTO_ACTION is set to RESTART, then VMS will restart for the explicit purpose of taking an appropriate bugcheck. In this case it would have been KRNLSTAKNV.
There are of course situations where even this restricted restart is not possible, and others where the bugcheck code cannot write the dumpfile, but the ability to preserve the evidence after a pathological halt has always been present in VMS.
JT:
12-07-2005 10:17 PM
Re: Strange things in DECNET+
Has anyone had bad experiences with AUTO_ACTION=RESTART, e.g. an automatic reboot failing?
Wim
12-08-2005 02:56 AM
Re: Strange things in DECNET+
We had to set all of ours to RESTART after a memory issue where the system did not create the crash dump.
12-08-2005 05:42 AM
Re: Strange things in DECNET+
I have no clue as to what happened, but
>> kernel stack not valid halt
should DEFINITELY be a reason to write a dump!
During the CrashDumpAnalysis course prior to the last Bootcamp, one whole chapter was dedicated to just that kind of dump.
But, you DO need the dumpfile... :-(
So, basically you now have two problems:
a- What happened to DECnet?
b- WHY is there no dumpfile?
Some help I am, eh?
Sorry.
Proost.
Have one on me.
jpe
12-08-2005 06:43 AM
Re: Strange things in DECNET+
I was surprised to discover that no dump was done. I don't quite understand why in this case you have to specify "restart" to get the dump. What is the logic?
In the meantime, I discovered that the problem began on 6 Dec at 0:05. I guess there was a collision between several DECnet things happening at the same time (T2T, ncl).
In any case, I will classify this problem as a "very rare bug" and hope I will never see it again. But if I see it, I will crash the system myself.
Wim
12-08-2005 07:53 AM
Re: Strange things in DECNET+
Since the bugcheck code is part of VMS it will not happen UNLESS the CONSOLE takes action to restart VMS for the purposes of taking a restart bugcheck.
AUTO_ACTION is NOT just for BOOT, it comes into play EVERY time an uncontrolled entry to console code occurs. Just about the ONLY thing that constitutes a controlled entry is the end of a shutdown (or a bugcheck), where VMS (and probably Unix) tells the console to expect a halt, and perhaps what other action to take as well (think power off, reboot).
Power on, kernel stack not valid, double error, halt instruction: all are considered uncontrolled console entries and cause the console to do whatever AUTO_ACTION dictates.
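The distinction between controlled and uncontrolled console entries can be sketched as a toy model. This is only an illustration of the rule as described here, not console firmware logic: the entry names, the `os_request` parameter and the function names are hypothetical, and the three AUTO_ACTION values (HALT, BOOT, RESTART) are assumed.

```python
# Toy model of the rule described above. All names are illustrative assumptions;
# nothing here is taken from actual console or VMS code.
from enum import Enum


class AutoAction(Enum):
    HALT = "halt"
    BOOT = "boot"
    RESTART = "restart"


# Controlled: the operating system told the console to expect the halt.
CONTROLLED = {"shutdown", "bugcheck"}
# Uncontrolled: the console was entered without warning.
UNCONTROLLED = {"power_on", "kernel_stack_not_valid", "double_error", "halt_instruction"}


def console_response(entry, auto_action, os_request="halt"):
    """Return what the console does after being entered for the given reason."""
    if entry in CONTROLLED:
        return os_request          # the OS already said what should happen
    if entry in UNCONTROLLED:
        return auto_action.value   # AUTO_ACTION decides
    raise ValueError(f"unknown console entry: {entry}")


def restart_bugcheck_attempted(entry, auto_action):
    """A restart bugcheck can only happen if the console restarts the OS."""
    return entry in UNCONTROLLED and console_response(entry, auto_action) == "restart"


if __name__ == "__main__":
    for setting in AutoAction:
        attempted = restart_bugcheck_attempted("kernel_stack_not_valid", setting)
        print(setting.name, attempted)
```

In this model a kernel stack not valid halt leads to a restart bugcheck only when AUTO_ACTION is RESTART, which matches the missing dump reported earlier in the thread.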
I DID once have a case where RESTART caused a problem, back in the days when V6.1 was current. A problem caused corruption of the system page table, which led to code winding down the stack. The KRNLSTAKNV triggered auto_action restart, the attempt to restart fell over the corrupted SPT, which led to another KRNLSTAKNV restart, which led to...
This problem was a bit of a bitch, it took us three weeks to fix it.
The issues around RESTART are mainly related to whether you want a cluster node to rejoin immediately or not. There are some sites where a failed machine is left failed until the next 'reboot opportunity', whenever that is. Turning off RESTART causes loss of the dump if a pathological halt occurs. For such a situation, a better solution may be to always stop at SYSBOOT and wait for a continue command.
JT: