Re: Up to 50% ping loss when ARP cache grows over ~100 items (HP-UX 11.31, RX2620-2)

 
SOLVED
Trusted Contributor

what results do you get for the following commands?

ndd -get /dev/arp arp_cache_report
ndd -get /dev/arp arp_cleanup_interval

what's in your nddconf file?

Advisor

Hi Donna,
Thank you for giving a hand...

# ndd -get /dev/arp arp_cleanup_interval
600000

# /etc/rc.config.d/nddconf
# As per PHNE_43814. Semih BATTAL 2014-11-19
TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_sack_enable
NDD_VALUE[0]=2
#
TRANSPORT_NAME[1]=ip
NDD_NAME[1]=ip_ire_gw_probe
NDD_VALUE[1]=0

# ndd -get /dev/arp arp_cache_report
ifname  proto addr       proto mask       hardware addr      flags
lan0    192.168.007.111  255.255.255.255  ec:1f:72:b7:d3:18
lan0    192.168.007.110  255.255.255.255  dc:85:de:ba:d4:2c
lan0    192.168.007.108  255.255.255.255  dc:85:de:ba:d4:29
lan0    192.168.011.164  255.255.255.255  00:8c:fa:61:c7:5c
lan0    192.168.011.042  255.255.255.255  00:25:86:e3:2f:07
lan0    192.168.011.041  255.255.255.255  00:25:86:e3:1f:66
lan0    192.168.011.110  255.255.255.255  10:bf:48:05:23:21
lan0    192.168.011.045  255.255.255.255  40:b0:34:29:82:5b
lan0    192.168.007.097  255.255.255.255  00:0c:43:ce:42:de
lan0    192.168.008.048  255.255.255.255  00:1e:68:1e:1a:02
lan0    192.168.011.113  255.255.255.255  00:8c:fa:61:ba:55
lan0    192.168.008.251  255.255.255.255  aa:aa:aa:00:cb:65
lan0    192.168.011.191  255.255.255.255  1c:87:2c:42:22:b4
lan0    192.168.007.112  255.255.255.255  c8:d5:fe:f1:ae:4d
lan0    192.168.011.194  255.255.255.255  1c:87:2c:41:ab:a7
lan0    192.168.008.001  255.255.255.255  00:17:08:59:b4:60
lan0    192.168.011.193  255.255.255.255  1c:87:2c:42:1e:7d
lan0    192.168.002.014  255.255.255.255  00:14:38:eb:4b:62  UNRESOLVED
lan0    192.168.010.006  255.255.255.255  00:80:92:6d:62:d2
lan0    192.168.011.070  255.255.255.255  00:1e:8c:df:60:6e
lan0    192.168.002.012  255.255.255.255  00:14:38:eb:4b:62  UNRESOLVED
lan0    192.168.002.013  255.255.255.255  00:14:38:eb:4b:62  UNRESOLVED
lan0    192.168.008.072  255.255.255.255  a0:2b:b8:1f:35:28
lan0    192.168.008.008  255.255.255.255  00:14:38:eb:4b:62  PERM PUBLISH LOCAL
lan0    192.168.011.010  255.255.255.255  c8:60:00:56:e6:4f
lan0    192.168.000.002  255.255.255.255  00:0b:86:6e:cb:54
lan0    192.168.015.010  255.255.255.255  ac:9b:f4:82:69:1c
lan0    192.168.011.014  255.255.255.255  88:51:fb:57:57:38
lan0    192.168.011.012  255.255.255.255  c8:60:00:56:e4:ac
lan0    192.168.011.083  255.255.255.255  00:0f:fe:f3:cb:2d
lan0    192.168.011.082  255.255.255.255  00:0f:fe:f2:73:88
lan0    192.168.011.022  255.255.255.255  00:1e:90:28:79:1c
lan0    192.168.007.088  255.255.255.255  00:0c:43:ce:44:41
lan0    192.168.011.026  255.255.255.255  00:24:d6:3b:43:60
lan0    192.168.008.026  255.255.255.255  00:22:64:2a:30:3c
lan0    192.168.011.024  255.255.255.255  00:16:e6:64:d0:e5
lan0    192.168.011.088  255.255.255.255  00:1e:33:d1:60:ba
lan0    192.168.011.159  255.255.255.255  d8:5d:4c:80:e3:e9
lan0    192.168.007.211  255.255.255.255  00:0c:43:ce:42:bf
lan0    192.168.011.028  255.255.255.255  10:bf:48:04:7b:9d
lan0    224.000.000.000  240.000.000.000  01:00:5e:00:00:00  PERM MAPPING
(The hosts with UNRESOLVED entries are powered down, but the server is still pinging them.)
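
A report like the one above is easier to track over time once it is reduced to a few numbers (total entries, how many are UNRESOLVED, and so on). The following is only a sketch: it assumes the saved ndd output is analysed with Python 3 on whatever machine is convenient, and the names `parse_arp_cache_report` and `summarize` are made up for illustration, not part of any HP-UX tool.

```python
from collections import Counter

def parse_arp_cache_report(text):
    """Return one dict per data row of an arp_cache_report listing."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows that look like: ifname, dotted address, dotted mask,
        # colon-separated hardware address, then optional flags. This skips
        # the header line, shell prompts and blank lines.
        if len(fields) < 4:
            continue
        if fields[1].count(".") != 3 or fields[3].count(":") != 5:
            continue
        entries.append({
            "ifname": fields[0],
            "addr": fields[1],
            "mask": fields[2],
            "hwaddr": fields[3],
            "flags": fields[4:],
        })
    return entries

def summarize(entries):
    """Count entries in total and per flag (UNRESOLVED, PERM, ...)."""
    flags = Counter(flag for entry in entries for flag in entry["flags"])
    return {
        "total": len(entries),
        "unresolved": flags["UNRESOLVED"],
        "flags": dict(flags),
    }

# Four rows copied from the report above, plus its header line.
SAMPLE = """\
ifname proto addr proto mask hardware addr flags
lan0 192.168.007.111 255.255.255.255 ec:1f:72:b7:d3:18
lan0 192.168.002.014 255.255.255.255 00:14:38:eb:4b:62 UNRESOLVED
lan0 192.168.008.008 255.255.255.255 00:14:38:eb:4b:62 PERM PUBLISH LOCAL
lan0 224.000.000.000 240.000.000.000 01:00:5e:00:00:00 PERM MAPPING
"""

if __name__ == "__main__":
    print(summarize(parse_arp_cache_report(SAMPLE)))
```

Because the parser splits on any whitespace, it does not matter whether the columns are separated by one space or padded for alignment.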

Trusted Contributor

do you know why arp_cleanup_interval was changed from 300000 (5 min) to 600000 (10 min)? i'm thinking that running the cleanup every 5 minutes should resolve your issue...

BUT before you make any changes, please do the following:

netstat -s > before.txt
<wait for 10 minutes>
netstat -s > after.txt

please attach "before" and "after" so i can see how your network is performing.
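
For anyone who wants to compare the two snapshots themselves, the counter deltas can be computed with a few lines of script. This is a rough sketch, not a full netstat parser: it assumes counters appear either as "<number> <description>" or as "<name>: <number>", that section headers are lines ending in a colon, and that Python 3 is available somewhere to analyse the saved files. The function names are hypothetical, and the sample numbers are invented purely to show the output shape.

```python
import re

SECTION = re.compile(r"^\s*([^\d:][^:]*):\s*$")        # e.g. "tcp:" or "Output histogram:"
COUNT_FIRST = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")     # e.g. "33372 packets sent"
COUNT_LAST = re.compile(r"^\s*([^:]+):\s*(\d+)\s*$")   # e.g. "echo reply: 25131"

def parse_netstat_s(text):
    """Map (section, counter description) to the counter's integer value."""
    counters = {}
    section = ""
    for line in text.splitlines():
        m = SECTION.match(line)
        if m:
            # Simplification: the most recent header line names the section.
            section = m.group(1).strip()
            continue
        m = COUNT_FIRST.match(line)
        if m:
            # Replace any further numbers in the description so the key is
            # identical in both snapshots.
            label = re.sub(r"\d+", "N", m.group(2))
            counters[(section, label)] = int(m.group(1))
            continue
        m = COUNT_LAST.match(line)
        if m:
            counters[(section, m.group(1).strip())] = int(m.group(2))
    return counters

def diff_counters(before, after):
    """Return after - before for every counter that changed."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }

# Invented numbers, only to illustrate the format.
BEFORE = """\
tcp:
    1000 packets sent
    1200 packets received
icmp:
    5 calls to generate an ICMP error message
    Output histogram:
    echo reply: 4
"""
AFTER = BEFORE.replace("1000", "1500").replace("1200", "1900").replace("reply: 4", "reply: 10")

if __name__ == "__main__":
    delta = diff_counters(parse_netstat_s(BEFORE), parse_netstat_s(AFTER))
    for (section, label), change in sorted(delta.items()):
        print(f"{section:20s} {label:45s} {change:+d}")
```

To use it on real data, read before.txt and after.txt into strings and pass them to `parse_netstat_s` in place of the samples.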

Advisor

Hi Donna,
While I wait for the "after" report...
I've tried a wide range of values for arp_cleanup_interval, from as low as 10 seconds right up to 10 minutes...
None of them made the problem any better or any worse.
The title of this thread is a bit misleading, isn't it? As I said earlier, I am probably wrong in making this association.
And, as I said earlier, "ifconfig" does other things besides clearing the ARP cache.
The size of the ARP cache seems to indicate the point in time at which things start to go wrong (counting from when the ifconfig command clears the ARP cache).
before.txt and after.txt are attached...
No, they are NOT... Only jpg, bmp etc. are accepted.
Should I cheat by changing the extension? Or include them here in the text?

Advisor

Hi again Donna,
See below how bad the problem is when the "ifconfig" script is not running...
Mon Jul 10 22:16:27 GMT-3 2017 : 400 packets transmitted, 335 received, 16% packet loss, time 6462ms
Mon Jul 10 22:16:46 GMT-3 2017 : 400 packets transmitted, 351 received, 12% packet loss, time 6612ms
Mon Jul 10 22:18:22 GMT-3 2017 : 400 packets transmitted, 397 received, 0% packet loss, time 6626ms
Mon Jul 10 22:19:41 GMT-3 2017 : 400 packets transmitted, 357 received, 10% packet loss, time 7181ms
Mon Jul 10 22:21:09 GMT-3 2017 : 400 packets transmitted, 395 received, 1% packet loss, time 6782ms
Mon Jul 10 22:23:05 GMT-3 2017 : 400 packets transmitted, 181 received, 54% packet loss, time 6364ms
Mon Jul 10 22:32:44 GMT-3 2017 : 400 packets transmitted, 291 received, 27% packet loss, time 6882ms
Mon Jul 10 22:36:16 GMT-3 2017 : 400 packets transmitted, 375 received, 6% packet loss, time 6746ms
Mon Jul 10 22:43:02 GMT-3 2017 : 400 packets transmitted, 303 received, 24% packet loss, time 6565ms
Mon Jul 10 22:43:22 GMT-3 2017 : 400 packets transmitted, 350 received, 12% packet loss, time 6876ms
Mon Jul 10 22:47:51 GMT-3 2017 : 400 packets transmitted, 367 received, 8% packet loss, time 6534ms
Mon Jul 10 22:48:01 GMT-3 2017 : 400 packets transmitted, 309 received, 22% packet loss, time 6757ms
Mon Jul 10 22:53:29 GMT-3 2017 : 400 packets transmitted, 374 received, 6% packet loss, time 6692ms
Mon Jul 10 22:56:05 GMT-3 2017 : 400 packets transmitted, 377 received, 5% packet loss, time 6846ms
Mon Jul 10 22:59:28 GMT-3 2017 : 400 packets transmitted, 227 received, 43% packet loss, time 6811ms
Mon Jul 10 23:00:56 GMT-3 2017 : 400 packets transmitted, 399 received, 0% packet loss, time 6683ms
Mon Jul 10 23:01:45 GMT-3 2017 : 400 packets transmitted, 372 received, 7% packet loss, time 6771ms
Mon Jul 10 23:02:05 GMT-3 2017 : 400 packets transmitted, 390 received, 2% packet loss, time 6951ms
Mon Jul 10 23:06:36 GMT-3 2017 : 400 packets transmitted, 387 received, 3% packet loss, time 6949ms
Mon Jul 10 23:07:53 GMT-3 2017 : 400 packets transmitted, 389 received, 2% packet loss, time 6602ms
Mon Jul 10 23:09:01 GMT-3 2017 : 400 packets transmitted, 119 received, 70% packet loss, time 6808ms
Mon Jul 10 23:09:11 GMT-3 2017 : 400 packets transmitted, 123 received, 69% packet loss, time 6593ms
Mon Jul 10 23:11:26 GMT-3 2017 : 400 packets transmitted, 285 received, 28% packet loss, time 6892ms
Mon Jul 10 23:11:36 GMT-3 2017 : 400 packets transmitted, 389 received, 2% packet loss, time 6777ms
Mon Jul 10 23:20:26 GMT-3 2017 : 400 packets transmitted, 287 received, 28% packet loss, time 6541ms
Mon Jul 10 23:21:53 GMT-3 2017 : 400 packets transmitted, 394 received, 1% packet loss, time 6738ms
Mon Jul 10 23:26:35 GMT-3 2017 : 400 packets transmitted, 373 received, 6% packet loss, time 6876ms
Mon Jul 10 23:27:42 GMT-3 2017 : 400 packets transmitted, 356 received, 11% packet loss, time 6357ms
Mon Jul 10 23:31:06 GMT-3 2017 : 400 packets transmitted, 359 received, 10% packet loss, time 6857ms
Mon Jul 10 23:31:15 GMT-3 2017 : 400 packets transmitted, 210 received, 47% packet loss, time 6883ms
Mon Jul 10 23:32:23 GMT-3 2017 : 400 packets transmitted, 390 received, 2% packet loss, time 6376ms
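
A log in this format can be boiled down to a few figures (overall loss, worst run, number of bad runs), which makes it easier to compare one test period with another. The sketch below assumes each line contains the "N packets transmitted, M received" summary exactly as shown above and that Python 3 is used for the analysis; the function names and the 10% threshold are illustrative choices, not anything taken from the thread.

```python
import re

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")

def parse_ping_log(text):
    """Return a list of (transmitted, received) pairs, one per log line."""
    samples = []
    for line in text.splitlines():
        m = SUMMARY.search(line)
        if m:
            samples.append((int(m.group(1)), int(m.group(2))))
    return samples

def loss_summary(samples, bad_threshold_pct=10.0):
    """Summarise packet loss over all runs in the log."""
    if not samples:
        raise ValueError("no ping summary lines found")
    sent = sum(s for s, _ in samples)
    received = sum(r for _, r in samples)
    # Recompute the loss from the raw counts; the percentage printed in the
    # log is a whole number and loses the fraction.
    per_run = [100.0 * (s - r) / s for s, r in samples if s]
    return {
        "runs": len(samples),
        "overall_loss_pct": 100.0 * (sent - received) / sent,
        "worst_run_pct": max(per_run),
        "bad_runs": sum(1 for p in per_run if p > bad_threshold_pct),
    }

# The first three lines of the log above.
SAMPLE = """\
Mon Jul 10 22:16:27 GMT-3 2017 : 400 packets transmitted, 335 received, 16% packet loss, time 6462ms
Mon Jul 10 22:16:46 GMT-3 2017 : 400 packets transmitted, 351 received, 12% packet loss, time 6612ms
Mon Jul 10 22:18:22 GMT-3 2017 : 400 packets transmitted, 397 received, 0% packet loss, time 6626ms
"""

if __name__ == "__main__":
    print(loss_summary(parse_ping_log(SAMPLE)))
```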

Advisor

I cheated... The file extensions should be changed to "txt"...

Trusted Contributor

this is strange... the following are the differences between your before and after files, where the elapsed time is ~10 minutes:

tcp:
33372 packets sent
36924 packets received

however

icmp:
25174 calls to generate an ICMP error message
0 ICMP messages dropped
Output histogram:
echo reply: 25131

you've got nearly as many pings as all your other network activity! more pinging is not better. (in the old days there was such a thing as 'the ping of death': an oversized ping packet that could crash the host receiving it...) maybe you can dial down your paranoia level and only ping (say) every 10-15 minutes with 10 bytes for 10 iterations?
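
To put the comparison in numbers, the deltas quoted above can be turned into a rate and a ratio. This is just a worked calculation, assuming the window was exactly 600 seconds (the post only says ~10 minutes); `ping_share` is a made-up helper name.

```python
def ping_share(echo_replies, tcp_packets, interval_seconds):
    """Return (echo replies per second, echo replies per TCP packet)."""
    return echo_replies / interval_seconds, echo_replies / tcp_packets

# Deltas quoted above: 25131 echo replies and 33372 TCP packets sent.
per_second, per_tcp_sent = ping_share(25131, 33372, 600)

if __name__ == "__main__":
    print(f"{per_second:.1f} echo replies per second")          # 41.9
    print(f"{per_tcp_sent:.2f} echo replies per TCP packet sent")  # 0.75
```

In other words, the server answered roughly 42 pings every second during the window, about three echo replies for every four TCP packets it sent.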

that's the only thing i'm seeing that may be an immediately addressable issue. i suspect there's a larger issue with your lan, and that if you were to open a call with the RC you'd get a better answer.

Advisor
Solution

Hi Donna,
Sorry, it was my fault...
I had forgotten to stop the "flood-ping" script before running the netstat commands.
I've run the same test again, and there were only 37 pings against 27186 TCP packets sent in the 10-minute period.
I only started running the "flood-ping" script when we began to see broken connections due to lost packets.
I also have to run the "ifconfig" script at the same time, to periodically "reset" the network stack (or clear the ARP cache?).
This is the only way that I can make the server "usable"...
Actually, these two scripts have been running continuously since I booted the system on 16th May 2017.
The server was shut down for a few hours due to UPS maintenance.
I remember installing the March 2017 QPKBASE and QPKAPPS bundles before shutting down the system.
Should I uninstall both bundles and see what happens?

Today, after everyone went home, I stopped our "global systems daemon" program.
This program checks about 180 "must-be-alive" systems/services such as our SMTP/POP3/HTTP/Oracle/etc. services, as well as IoT nodes, some sensor values, security cameras/recording equipment, personnel attendance readers and so on.
With this program stopped, the ARP cache grows very slowly; in fact, it took 4 hours to reach 100 MACs.
And, to my surprise, there was not a single lost packet in this 4-hour period.
As soon as the ARP count exceeded ~100, some packet loss started.
This problem is starting to look like it is directly related to the ARP cache size...