Operating System - HP-UX

Network errors / authentication delays

 
Steve Lewis
Honored Contributor

Network errors / authentication delays

I have a bizarre problem that I cannot diagnose. The database (Informix) has started taking a long time to authenticate users, typically 10-30 seconds. This affects local users connecting through shared memory as well as remote users connecting over TCP/IP.
It happens on several machines, none of which have been changed themselves, but a network change was put in: a VLAN was defined for one server, and port spanning was enabled so that a Linux box could run snort against the traffic. The network people tell me that they are not snooping or pen testing and are not creating packets.

netstat -s has started flagging huge error numbers in certain places, compared with other machines on the subnet.

Logging on to the servers using telnet and ssh is fine, no delays there.

At the same time, some UNIX client-server processes on the boxes have started to go wrong. These communicate through UNIX domain sockets (i.e. files) and do not bind. These processes are getting errnos 2 (no such file) and 235 (socket is not connected).

If anyone can help with this output below and give me some help in diagnosing or drilling down then I would be very grateful.

tcp:
875535732 packets sent
2509258801 data packets (3425152922 bytes)
1990723 data packets (984694790 bytes) retransmitted
2684954798 ack-only packets (70404651 delayed)
11497 URG only packets
166545 window probe packets
22145002 window update packets
37393177 control packets
674107630 packets received
2620141777 acks (for 3446326145 bytes)
10730561 duplicate acks
0 acks for unsent data
1905166156 packets (1476689041 bytes) received in-sequence
88 completely duplicate packets (33565 bytes)
35906 packets with some dup, data (45425434 bytes duped)
460389 out of order packets (434148997 bytes)
468 packets (2232675199 bytes) of data after window
36101 window probes
16612861 window update packets
2682 packets received after close
786 segments discarded for bad checksum
1 bad TCP segment dropped due to state change
13594352 connection requests
8573855 connection accepts
22168207 connections established (including accepts)
22675633 connections closed (including 507795 drops)
500444 embryonic connections dropped
2591777834 segments updated rtt (of 2591777834 attempts)
597693 retransmit timeouts
47747 connections dropped by rexmit timeout
166545 persist timeouts
336856 keepalive timeouts
317643 keepalive probes sent
822 connections dropped by keepalive
0 connect requests dropped due to full queue
1508750 connect requests dropped due to no listener
udp:
0 incomplete headers
0 bad checksums
0 socket overflows
ip:
1083471845 total packets received
0 bad IP headers
2023613 fragments received
0 fragments dropped (dup or out of space)
10039 fragments dropped after timeout
0 packets forwarded
0 packets not forwardable
icmp:
656461 calls to generate an ICMP error message
62 ICMP messages dropped
Output histogram:
echo reply: 201606
destination unreachable: 444759
source quench: 0
routing redirect: 0
echo: 0
time exceeded: 10034
parameter problem: 0
time stamp: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
0 bad ICMP messages
Input histogram:
echo reply: 92536
destination unreachable: 109208
source quench: 72
routing redirect: 0
echo: 201606
time exceeded: 331
parameter problem: 0
time stamp request: 2
time stamp reply: 0
address mask request: 2
address mask reply: 0
201606 responses sent
igmp:
148286 messages received
0 messages received with too few bytes
0 messages received with bad checksum
0 membership queries received
0 membership queries received with incorrect fields(s)
88102 membership reports received
0 membership reports received with incorrect field(s)
88091 membership reports received for groups to which this host belongs
60229 membership reports sent



Steve Lewis
Honored Contributor

Re: Network errors / authentication delays

I should mention that IBM Informix support are stumped on this. The symptom on the database server is that the miscellaneous (MSC) VP is active for up to 30 seconds while a connection is being initiated. Normally you hardly ever see it. This VP specifically handles o/s calls which require a large stack. Since the user has no database connection at that point in time, I cannot get any database diagnostics out.
Steven E. Protter
Exalted Contributor

Re: Network errors / authentication delays

Shalom,

Appears to be trouble in the networking environment.

Has someone disabled ping in all or part of this server's network?

It also could be a problem with any physical part of the network including the network card.

On the software side, since it's mostly Informix, perhaps a disk error or other event has corrupted part of the software.

Is there a way to relink the binaries in Informix like in Oracle?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
RAC_1
Honored Contributor

Re: Network errors / authentication delays

Many things to check.

1. Any delays reaching DNS servers? (If you use them for name resolution.)
2. Network card settings on the system and on the switch/port side: duplex settings. Do they match? What about speed settings? Hard-coded or autoneg?
3. Any other specific messages in syslog.log, dmesg, Informix logs?
4. Is the server experiencing any kind of bottleneck (CPU, network, memory, disk)? Check with glance.
5. Is the delay specific to Informix operations, or does it affect anything else? When you run telnet, bdf or other commands, do you experience delays?
There is no substitute to HARDWORK
Bill Hassell
Honored Contributor

Re: Network errors / authentication delays

The 10-30 second delays and the stats above that show unreachable destinations strongly suggest that your DNS servers are not working. Now they may be fine, but for this HP-UX box, the first DNS server is not responding. Verify this with nslookup in both directions as in:

nslookup some_cpu
nslookup some_IP_addr

HP-UX security requires reverse DNS lookup. If the primary server has been changed, then 20-30 second delays for most network activities will be normal. Temporarily fix the problem by putting a working DNS entry first in /etc/resolv.conf. Then use nslookup to test the other DNS servers as in:

nslookup some_cpu dns_server2
nslookup some_IP_addr dns_server2

DNS is a critical resource for networking and must be equal or more reliable than the servers that use it. Note that the DNS servers might be working but the vlan configuration is blocking access from the HP-UX box.
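The forward and reverse checks above can also be timed from a script, which makes an intermittent stall easier to catch than by eye. The following is only a minimal sketch in Python: the names `timed_lookup` and `report` and the 2-second `slow_after` threshold are hypothetical choices for illustration, and it uses the system resolver through the `socket` module rather than nslookup.

```python
import socket
import time


def timed_lookup(resolver, target):
    """Call resolver(target) and return (elapsed_seconds, result, error)."""
    start = time.monotonic()
    try:
        result, error = resolver(target), None
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        result, error = None, exc
    return time.monotonic() - start, result, error


def report(hostname, ip_addr,
           forward=socket.gethostbyname,
           reverse=socket.gethostbyaddr,
           slow_after=2.0):
    """Time one forward and one reverse lookup.

    Returns a list of (direction, target, status, elapsed_seconds) tuples,
    where status is "ok", "SLOW" or "FAILED". slow_after is an arbitrary
    threshold chosen for this sketch.
    """
    rows = []
    for label, resolver, target in (("forward", forward, hostname),
                                    ("reverse", reverse, ip_addr)):
        elapsed, _result, error = timed_lookup(resolver, target)
        if error is not None:
            status = "FAILED"
        elif elapsed > slow_after:
            status = "SLOW"
        else:
            status = "ok"
        rows.append((label, target, status, elapsed))
    return rows
```

Calling `report("some_cpu", "some_IP_addr")` with a real hostname and address, repeatedly over a few minutes, should show whether either direction occasionally stalls or fails.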


Bill Hassell, sysadmin
Steve Lewis
Honored Contributor

Re: Network errors / authentication delays

ping works fine on all 3 networks (100, 1000-base-SX and token ring).

The cards are fine and so are the settings - remember that this started happening on 4 servers at the same time. Similarly on the database side, highly unlikely to affect 4 out of 6 servers.
You cannot re-link the binaries with informix.

It mysteriously stopped happening last night and I am not aware of any server changes. I spoke to the network people and they didn't make any changes last night.

As for DNS, nslookup for both IP and hostname comes back straight away, as does nsquery. We did some investigation on the local DNS server this morning and discovered that its LAN card was configured to use itself for DNS lookups, which looked suspicious, so we pointed it at the main DNS server for the company and restarted the service just in case.
The network settings are all fine. There are no login delays in the o/s which makes me think that maybe DNS isn't the issue.
There are no messages in the database or server syslogs.
There are no server bottlenecks; the problem just mysteriously cropped up every few minutes. Most commands are fine, apart from some client-server processes which time out when trying to connect via a UNIX domain socket. Our nfiles/maxfiles usage is quite low at just 25% of maximum.

I now suspect that there may have been an issue with an ODBC or JDBC driver and will start questioning the PC software people instead. They are the ones who complained first, so it could be something they did which broke it.



Jeff Dukes
New Member

Re: Network errors / authentication delays

I am on the phone with Informix as I type this. We are experiencing a very similar problem. I was wondering if you ever were able to resolve the issue. Both HP and Informix are struggling to come up with a solution. Any help would be greatly appreciated. Thanks.
Steve Lewis
Honored Contributor

Re: Network errors / authentication delays

Hi, I just noticed your post to my old problem.

Basically I had to eat my words. It WAS DNS! - probably.

Although nslookup worked for both hostname and IP in most cases, the PC/Windows admin people in our company had changed a whole bunch of IP addresses, and in particular the webserver that connected to the instance had a different IP from the one recorded in the DNS server.
The problem mysteriously went away after a couple of weeks, when they had fixed their Windows servers and added them to DNS both ways round.
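
For anyone chasing the same symptom, a check that each client address resolves "both ways round" can be scripted. This is a sketch only, not what was actually run on these servers: the function name `dns_mismatch` is hypothetical, and it relies on the Python standard library calls `socket.gethostbyaddr` and `socket.gethostbyname_ex`, which go through the system resolver.

```python
import socket


def dns_mismatch(ip_addr,
                 reverse=socket.gethostbyaddr,
                 forward=socket.gethostbyname_ex):
    """Return None if ip_addr resolves to a name that resolves back to ip_addr.

    Otherwise return a short description of what is missing or inconsistent.
    """
    try:
        name, _aliases, _addrs = reverse(ip_addr)
    except OSError as exc:
        return f"{ip_addr}: no reverse entry ({exc})"
    try:
        _canonical, _aliases, addrs = forward(name)
    except OSError as exc:
        return f"{ip_addr} -> {name}: no forward entry ({exc})"
    if ip_addr not in addrs:
        return f"{ip_addr} -> {name} -> {addrs}: forward and reverse disagree"
    return None


def check_clients(ip_addrs):
    """Return the mismatch descriptions for every address that fails the check."""
    problems = []
    for ip_addr in ip_addrs:
        problem = dns_mismatch(ip_addr)
        if problem is not None:
            problems.append(problem)
    return problems
```

Feeding `check_clients` the addresses of the machines that connect to the database instance (the webservers in this case) would list any whose forward and reverse entries are missing or disagree.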