08-07-2002 12:39 AM
Sudden server disconnects; network debugging strategies requested
I'm fully aware that I can hardly expect a visitor to this forum, who doesn't know our network infrastructure, network components, application interfaces, etc., to trace the cause of the problems I'm facing.
Nevertheless, maybe you can give me some general strategies or recipes to follow.
The symptoms are that clients who connect to one of our clustered DBMS are seemingly arbitrarily disconnected/kicked out.
The database affected (which is an Oracle instance) runs as a cluster (MC/SG) package which binds to the NIC lan2:6.
When I run lanadmin on ppa lan2 I can see no inbound or outbound errors, drops, collisions, or other packet discards.
The network guy on the client side says that his network components are working well, but that when he sends ICMP packets to the server (i.e. the IP of the package that disconnects) he receives a "source quench", which he says is proof to him that the server is definitely the cause.
Not knowing the network lingo, I looked up in an internet dictionary what is commonly understood by "source quench".
There I read that it is simply a request from the receiving side to the sender to send packets at a slower pace (which to me implies too heavy a load on the receiver). It also said that routers are not obliged to act on "source quench" requests.
Hm, I'm not able to discover any trouble with the NIC.
Apart from the lanadmin queries, which revealed no malfunction to me, a mere
"netstat -I lan2:6 -in"
reports 332008089 inbound and 326117719 outbound packets (since bringing the NIC up?).
This is an outbound to inbound ratio of some 98%.
I'm not sure if this ratio is meaningful at all.
I only realized that, unfortunately, the HP-UX netstat has no extra columns for errors and collisions like the Solaris or Linux versions do.
I rather suspect the application's servers, which are spawned through inetd, to be the culprit.
Unfortunately I have no means of access (logs, debugging mode, etc.) to the application to look for evidence, because the customer who introduced this application never gave me details about how these servers work.
All I can see in syslog.log is connections being established, because inetd was started with the "-l" flag.
Because the ports for these services are registered in /etc/services, I know them and can grep for them in a casual
"netstat -anf inet",
which at the moment gives me some 45 established sockets.
But how can I find out when and why a disconnection occurs?
Unfortunately inetd only logs new connections in syslog.log but not when a connection suddenly severs.
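One way to at least timestamp the disconnects is to take periodic snapshots of the established sockets and report the ones that disappear between two snapshots. What follows is only a rough sketch: the function names are made up for illustration, and it assumes netstat prints TCP lines with the local address, the remote address and the state as the last three columns.

```shell
# Hypothetical helper: reads netstat output on stdin and prints one
# "local remote" pair per ESTABLISHED TCP socket, sorted.
# Assumed line layout (last three columns matter):
#   tcp  0  0  10.1.1.5.1521  10.2.2.9.40123  ESTABLISHED
established() {
    awk '$1 ~ /^tcp/ && $NF == "ESTABLISHED" { print $(NF-2), $(NF-1) }' | sort -u
}

# Hypothetical helper: prints the connections listed in snapshot file $1
# that are no longer present in snapshot file $2.
vanished() {
    comm -23 "$1" "$2"
}

# Possible polling loop (interval and paths are placeholders):
#   netstat -anf inet | established > /tmp/est.prev
#   while sleep 10; do
#       netstat -anf inet | established > /tmp/est.now
#       vanished /tmp/est.prev /tmp/est.now | sed "s/^/$(date '+%H:%M:%S') closed: /"
#       mv /tmp/est.now /tmp/est.prev
#   done
```

This only tells you when a socket went away, to within the polling interval, not why. The timestamps can then be matched against a packet trace or against the users' reports.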
To get a better overview I installed some freeware network tools on the box (e.g. lsof, libpcap, nmap, ntop, tcpdump).
Unfortunately I have little experience in using these tools efficiently.
Can someone give me some hints on how to locate the source of the sudden disconnects?
Many thanks for your patience
Ralph
08-07-2002 12:54 AM
Re: Sudden server disconnects; network debugging strategies requested
The source quench problem can be easily sorted out.
From the command line:
ndd -set /dev/ip ip_send_source_quench 0
OR
Put:
TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_send_source_quench
NDD_VALUE[0]=0
in your /etc/rc.config.d/nddconf to set it on startup.
Here's some more info on source quench:
Document ID : DCE19981119001
Problem Description
Are these Source Quench Messages something that I need to worry about?
Solution
This problem has been identified and is addressed in SR 5003435396. This problem will be fixed in the 11.01 version of the HP-UX operating system. These messages can be safely ignored as they have absolutely no impact on the operating system (performance or otherwise). Alternatively, these messages can be prevented by disabling source quench. For more information see the sections below.
What is causing these messages?
At 11.0 the Streams transport layer now passes the ICMP echo request to any other process that has a socket open and bound to raw IP. The rpcd/dced daemon opens a raw socket to listen to ICMP messages. This raw socket is opened by the icmp_monitor routine of rpcd. The main function of this routine is to check for error messages from DCE servers registered in the endpoint database of the host, and it checks the socket every 5 minutes. It does not respond to or use the ICMP echo requests. However, the socket queue fills up during the 5-minute delay, causing the source quench message. The fix being implemented in 11.01 will be to increase the buffer size to 128k and shorten the wait interval from 5 minutes to 2 minutes, thereby flushing the queue of these unwanted messages before the queue fills up.
Why is it safe to ignore these messages or to turn them off?
A good discussion of this is in TCP/IP Illustrated, Volume 1 (by W. Richard Stevens), pages 160-162.
Here is a clip from page 161:
"Although RFC 1009 [Braden and Postel 1987] requires a router to
generate source quenches when it runs out of buffers, the new router
Requirements RFC [Almquist 1993] changes this and says that a router
must not originate source quench errors. The current feeling is to
deprecate the source quench error, since it consumes network bandwidth
and is an ineffective and unfair fix for congestion."
Also see RFC 1812, section 4.3.3.3, Source Quench (this is a good discussion).
As for other reasons for network disconnects, check this out; we get this type of problem more often than source quench problems.
http://searchnetworking.techtarget.com/tip/1,289483,sid7_gci802539,00.html
08-07-2002 01:37 AM
Re: Sudden server disconnects; network debugging strategies requested
Why not start with the nettl facility?
# netfmt -t 50 -f /var/adm/nettl.LOG00 > /tmp/nettl
Then check /tmp/nettl for error messages.
Is there any duplicate IP?
Also check syslog: are there any errors from ServiceGuard?
08-07-2002 02:30 AM
Re: Sudden server disconnects; network debugging strategies requested
First I would look around for core files. If you find any, use the 'file' command on them to work out whether they're from your suspect process.
Secondly, grab a copy of 'tusc' and see if you can get a system call trace of one of these processes during a connection failure (easier said than done, I know).
Regards,
Steve
08-07-2002 06:24 AM
Re: Sudden server disconnects; network debugging strategies requested
Assuming your network guy has a Cisco router, have him run an extended ping with a sweep range of sizes. This will cause it to send a long series of pings from the minimum size up to the maximum that Cisco supports. If he gets random failures then you might want to look at your NIC. Ours turned out to be sensitive to electromagnetic interference.
If you pass the extended ping with sweep range of sizes then it's not a network problem.
When looking at the tcpdump output search for " R " (R with a space in front and in back) which indicates a reset was sent. I expect you will see one when a connection drops unexpectedly. If you don't see any I would expect that the application hung up properly and then investigate why the application decided to say bye bye.
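The reset search can be done with a small filter instead of by eye. This is just a sketch: the function name is hypothetical, and it assumes the classic one-line tcpdump text format ("time src > dst: flags ..."), where the flags field is a combination of S, F, P, R and "."; newer tcpdump versions print flags differently and would need a different pattern.

```shell
# Hypothetical helper: reads tcpdump text output on stdin and prints
# "time sender -> receiver" for every segment with the RST flag set.
# Assumed line layout:
#   14:02:12.20 server.1521 > client.40123: R 100:100(0) win 0
resets() {
    awk '$3 == ">" && $5 ~ /^[SFPR.]+$/ && $5 ~ /R/ {
        sub(/:$/, "", $4)          # strip the colon after the receiver
        print $1, $2, "->", $4
    }'
}
```

It could be fed from a live capture, for example "tcpdump -l -n -i lan2 port 1521 | resets" (interface and port are placeholders here), or from a saved text trace. The sender shown in each output line is the side that issued the reset.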
You might also look at netstat -a right after a drop and see what state the connection is in.
Ron
08-07-2002 06:58 AM
Re: Sudden server disconnects; network debugging strategies requested
I tend to agree with deactivating source quench first. I have very good reason to, because I am well aware of several problems at HP customers with this feature of 11.x. In normal circumstances it wouldn't hinder the machine, but in some extreme cases (heavily loaded machines) I noticed that source quenching became disruptive, meaning it slowed communications down. What you could have here is perfectly normal network performance being stepped down due to the ICMP messages, and the end machine giving up on the connection because it times out after several negative replies or dropped packets.
Take that step first. If the problem persists, we need to see where the connection gets broken, client or server side, which will generally mean tracing. First check the syslog for any 'connection reset by peer' messages; they could still point to an end-client issue. If none are visible, start tracing the problem with whatever tool is at your disposal, but try to limit the tracing, as it can grow huge. I hope you can easily reproduce the problem and don't need to transfer 300 MB of data before it occurs. I'd go for a PC with netmon or something like that actively scanning the network until a user yells "disconnected", stop the tracing at that point, filter the trace down to only that traffic, and look at the last packet sequence. You would then know who broke the communication and why, e.g. bad packets, no reply, a timeout, or a retransmission failure; all are possible.
08-08-2002 12:13 AM
Re: Sudden server disconnects; network debugging strategies requested
Many thanks for your valuable suggestions.
Unfortunately I couldn't give any feedback yesterday, because the ITRC webserver only gave me a chance to assign points, and afterwards didn't serve my request for the reply form any more.
I think most of you reside in the USA.
So I wonder if you have similar trouble with your ITRC access.
Here in Berlin, Germany, I continuously have trouble accessing ITRC after about 12:00 CET, although I'm coming in via the European httpd dispatcher.
Now back to our network problem.
In parallel, I had a call yesterday with the HP support centre in Ratingen, Germany.
This was after I had read the very informative reply from Stefan, in which he suggested disabling the generation of source quenches by the network driver through ndd.
I also mentioned your suggestion to the support engineer, but he wasn't convinced that SQs should be disabled altogether.
Instead he suggested that I install subsystem patch PHSS_21614, which is said to increase the buffer size to 128 KB and thus considerably reduce the churning out of SQs.
He also gave me some hints on what to do to test the stability of the network connection.
Since, as I wrote, these servers are started by inetd, he also told me that there is an undocumented switch "-b" for inetd, which puts it into debugging mode and makes it log more verbosely to syslog.log.
So the next time a client encounters a disconnect I will restart inetd in this mode.
Stefan, the URL you supplied is great,
and I carefully read what was written there about the causes of duplex and speed mismatches.
With this in mind I checked the NIC settings of the server against the port settings of the switch that the server is plugged into.
Both were set to autonegotiation, full duplex, 100 Mbps.