
Re: L2000 and RF

 
Michael Elleby III_1
Trusted Contributor

L2000 and RF

Hello-

I have been working on an issue for the last couple of days and I am looking for some suggestions.

I have 100-200 RF guns on a wireless network, that connect to an LClass running an Oracle application.

The problem that is occurring is that the telnet sessions that the guns are establishing keep getting dropped.

I have reviewed and changed inetd.conf (TCP_DELAY) and adjusted the TCP keepalive parameters to try to reduce the timeout periods, all to no avail.

Any ideas?

Thanx..

and in my hands, I'm holding a magic bunny...he he
Knowledge Is Power
14 REPLIES
Steven E. Protter
Exalted Contributor

Re: L2000 and RF

This suddenly just started happening?

What changed on the system?

What changed on the network?

We had some telnet issues over a WAN that we resolved by setting up a local DNS server. That made up for some bad IOS and a freaky power supply on the WAN router.

Telnet sessions talking to Oracle? Say it ain't so.

What is an RF gun btw? Like those scanners at home depot?

I'm obviously reaching, but you've gone over all the log files, including the Oracle alert logs, and found nothing helpful?

Look at listener.ora and tnsnames.ora, and use the tnsping function from outside the box (hopefully from the same place as these gun thingies) to see if connectivity is solid.

Could be Oracle dumping the connection, you know.

Look for dump files in the ORACLE_HOME directory.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

Don't worry, it gets worse:

- The only change on the system in the last week: an upgrade to the application on Tuesday. The problem started occurring on Thursday.

- A firewall was added between the server and the gateway to the wireless network (I can't even do a traceroute to it, but curiously, I can traceroute from my server to the access point for the guns).

- When a gun gets logged in, it automatically launches an executable that connects it to the Oracle DB.

- I haven't looked at Oracle logs as of yet.

Been trying to get the Connectivity folks to put a sniffer trace between the guns and the UNIX box to see what may be timing out the telnet session. Personally, I think it's the firewall.. Will look for dump files however..

Mike-
Knowledge Is Power
Steven E. Protter
Exalted Contributor

Re: L2000 and RF

Have the firewall admin people shut the firewall down for a 15-minute test.

That will give you more ammunition to aim at them.

I would look at the application as a secondary suspect, just because it was recently messed with.

SEP
Michael Tully
Honored Contributor

Re: L2000 and RF

The firewall will be the culprit. I would find it extremely difficult to believe that the Oracle application was responsible.

What breaks things like these ... changes of course. As suggested by SEP get them to drop the firewall and see what happens.

I gather the usual finger-pointing has been fun ...
Anyone for a Mutiny ?
Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

As this problem is ongoing, I've been able to run nettl and get some info, but can't find any info on an error message I receive in the trace file:


Ei 192.180.71.2.60754 > 192.180.71.1.5300: [DF] PA 65beeed3:65beef2f(5c) ack: d3cd5ff9 win: 8000 hacl-hb
Ei 192.180.71.1.5300 > 192.180.71.2.60754: [DF] A ack: 65beef2f win: 8000
Eb arp request for: 148.128.27.253 from: 172.22.23.103
Eb 172.22.23.117.138 > 172.22.31.255.138: udp c9 netbios_dgm
Eb Unknown ETHER Type: 0x886d
Eb Unknown ETHER Type: 0x886d
Eb Unknown ETHER Type: 0x886d
Eb Unknown ETHER Type: 0x886d
8Si Unknown ETHER Type: 0x167e
8Si Unknown ETHER Type: 0x167e
Eb Unknown ETHER Type: 0x886d
Eb Unknown ETHER Type: 0x886d
Ei 192.180.71.1.5300 > 192.180.71.2.60754: [DF] PA d3cd5ff9:d3cd6055(5c) ack: 65beef2f win: 8000 hacl-hb
Ei 192.180.71.2.60754 > 192.180.71.1.5300: [DF] PA 65beef2f:65beef8b(5c) ack: d3cd6055 win: 8000 hacl-hb
Eb arp request for: 172.22.17.67

Any takers?

Thanx in advance-

Mike-
Knowledge Is Power
Ron Kinner
Honored Contributor

Re: L2000 and RF

Not errors, just packets that your box doesn't understand:
0x886d = Intel Adapter Fault Tolerance heartbeats

0x167e = Service Guard
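
To check that those two types are the only "Unknown ETHER Type" entries in a longer trace, the formatted text can be tallied. This is a minimal sketch, assuming the trace was saved as plain text in the same shape as the lines pasted earlier in the thread; `tally_unknown_ethertypes` and `SAMPLE` are hypothetical names, and the labels are simply the two given above.

```python
import re
from collections import Counter

# Matches the "Unknown ETHER Type: 0x...." lines as pasted in the thread.
ETHER_RE = re.compile(r"Unknown ETHER Type:\s*(0x[0-9a-fA-F]+)")

# Labels taken from the reply above; anything else is reported as unidentified.
KNOWN = {
    "0x886d": "Intel Adapter Fault Tolerance heartbeat",
    "0x167e": "Service Guard",
}

def tally_unknown_ethertypes(trace_text):
    """Count each EtherType that the trace formatter flagged as unknown."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = ETHER_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

# A few lines in the same shape as the pasted trace (illustration only).
SAMPLE = """\
Eb Unknown ETHER Type: 0x886d
8Si Unknown ETHER Type: 0x167e
Eb Unknown ETHER Type: 0x886d
Eb arp request for: 172.22.17.67
"""

if __name__ == "__main__":
    for ethertype, count in sorted(tally_unknown_ethertypes(SAMPLE).items()):
        print(ethertype, count, KNOWN.get(ethertype, "unidentified"))
```

Any type that prints as "unidentified" would be worth a closer look; the two listed above are harmless to the telnet sessions.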

Any reason why you can't run tcpdump on the server to see what's going on?

TCP/IP doesn't really care whether you send packets or not. HPUX's keepalive usually only fires up after 2 hours of inactivity.

How often do the sessions drop and are they inactive at the time?

Often the telnet server has an idle timeout in it but since this is a recent change I would expect it's the firewall. Lots of them feel it is their right to terminate any connection which has stayed on too long regardless of whether there is activity on the circuit or not.
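
On the keepalive point above: the system-wide keepalive timer only matters for sockets that have asked for keepalive probing. The sketch below shows how a program opts in using the standard `SO_KEEPALIVE` socket option. It is an illustration only; whether the telnet daemon or the gun application on this box sets the option is not something the thread establishes, and `make_keepalive_socket` is a hypothetical helper name.

```python
import socket

def make_keepalive_socket():
    """Create a TCP socket with keepalive probing switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without SO_KEEPALIVE an idle connection sends nothing at all, so the
    # system-wide keepalive interval never applies to it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

if __name__ == "__main__":
    s = make_keepalive_socket()
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keepalive enabled:", enabled)
    s.close()
```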

Ron

Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

Ron-

Thanx for the info...

It is difficult to make the firewall people understand that it may be causing a problem with telnet traffic, because all they say is that traffic is flowing through without any issues. I asked them to look at the timeout xlate on the firewall (because it is a PIX), but haven't gotten any response as of yet..

The guns get logged off intermittently, and most of the time, the guns are logged in, but sitting idle.

Mike-
Knowledge Is Power
Ron Kinner
Honored Contributor

Re: L2000 and RF

Ask them to give you a printout of the PIX's output of
show timeout

I expect you will find the answer right there.

See the Timeout entry at:

http://www.cisco.com/univercd/cc/td/doc/product/iaabu/pix/pix_sw/v_63/cmdref/tz.htm#1026093

Ron
Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

Tell me what you think of the reply I received from the Firewall admin:

Michael,
That information is real time, so it's not possible. However, to explain better of the timeout issue:
When the gun makes a request to connect to the server, the firewall needs to create a translation slot (this involves IP address translation and connection state setup) for this specific connection in order to pass traffic through. If during the 1 hour period where the firewall sees no activities between the gun and the server, it will timeout that connection to free up resources for new connection requests.

Seems like a smokescreen... but then I'm not sure..

Mike
Knowledge Is Power
Ron Kinner
Honored Contributor

Re: L2000 and RF

Bogus about the realtime stuff. It's just a static config show command. Example from the URL I gave you:

show timeout
timeout xlate 3:00:00
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 rpc 0:10:00 h323 0:05:00
sip 0:30:00 sip_media 0:02:00
timeout uauth 0:05:00 absolute

I think they are just being paranoid which is a typical mindset for a PIX operator. Hopefully your PIX actually says inactivity where the example says absolute or it will only allow you one hour per connection.

Actually, what they go on to say is that it is their fault and they are not going to fix it. So we have to ensure that the gun and the server exchange some traffic before the hour is up. I expect you can fix this with tcp_keepalive_interval on HPUX. The default is 2 hours. Try setting it down to 30 minutes. (You are using 11.0 or better, aren't you?)

tcp_keepalive_interval:

Interval for sending keep-alive probes.

If any activity has occurred on the connection or if there is
any unacknowledged data when the time-out period expires, the
timer is simply restarted. If the remote system has crashed
and rebooted, it will presumably know nothing about this
connection, and it will issue an RST in response to the ACK.
Receipt of the RST will terminate the connection.

If the keepalive packet is not ACK'd by the remote TCP, the normal
retransmission time-out will eventually exceed threshold R2,
and the connection will be terminated.

With this keepalive behavior, a connection can time-out and
terminate without actually receiving an RST from the remote TCP.
[10000, 10*24*3600000] Default: 2 * 3600000 (2 hours)


ndd -set /dev/tcp tcp_keepalive_interval 1800000

If that works, then edit /etc/rc.config.d/nddconf to add:

TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_keepalive_interval
NDD_VALUE[0]=1800000

so it will persist after a reboot. Use the next higher index if you already have entries in nddconf.
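
The arithmetic behind the 30-minute suggestion can be checked against whatever the PIX actually reports. A minimal sketch, assuming the firewall output looks like the `show timeout` example quoted above and that the `timeout conn` value is the idle timer in question; the function names and `EXAMPLE` are hypothetical.

```python
import re

def hms_to_ms(text):
    """Convert an H:MM:SS string such as '1:00:00' to milliseconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000

def conn_timeout_ms(show_timeout_output):
    """Pull the 'timeout conn' value out of pasted 'show timeout' text."""
    match = re.search(r"timeout conn (\d+:\d\d:\d\d)", show_timeout_output)
    if match is None:
        raise ValueError("no 'timeout conn' entry found")
    return hms_to_ms(match.group(1))

def keepalive_beats_firewall(keepalive_ms, show_timeout_output):
    """True when a probe goes out before the firewall's idle timer expires."""
    return keepalive_ms < conn_timeout_ms(show_timeout_output)

# The example output quoted earlier in this post, not the real PIX config.
EXAMPLE = """\
timeout xlate 3:00:00
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 rpc 0:10:00 h323 0:05:00
sip 0:30:00 sip_media 0:02:00
timeout uauth 0:05:00 absolute
"""

if __name__ == "__main__":
    for label, value in (("default 2 h", 2 * 3600000), ("30 min", 1800000)):
        print(label, keepalive_beats_firewall(value, EXAMPLE))
```

With a one-hour `timeout conn`, the two-hour default fails the check and 1800000 ms passes it, which is the reason for picking a value comfortably under the firewall's timer.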

Ron

Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

Ron-

Actually, I set tcp_keepalive_interval to 5 minutes. It was one of the first things that I did to circumvent the firewall..

Mike-
Knowledge Is Power
Ron Kinner
Honored Contributor

Re: L2000 and RF

Make sure you have the latest ndd patches. Some of the ndd fixes don't really work as advertised without the patches.

Can you get them to change the inactivity timer? The default is 1 hour which is where it is apparently set but there is no reason it could not be set to 8 hours or whatever makes sense for your application.

Ron
Michael Elleby III_1
Trusted Contributor

Re: L2000 and RF

Ron,

I guess it would make sense to ask them to do that, but then, the guns actually drop even after a few minutes.. So I would think that the 1 hour inactivity timeout would be sufficient.

Mike
Knowledge Is Power
Ron Kinner
Honored Contributor

Re: L2000 and RF

"the guns actually drop even after a few minutes"

This is new information which changes everything. Doesn't seem like we can blame the firewall for that. Just for fun, run lanadmin
lan
display

on your HPUX and see if you are getting a lot of errors (on the second page of the display). It's unlikely, but we should rule it out. If you have more than one NIC, you will need to do ppa x to change to the correct one.

Check the switch for errors too (both on your connection and the firewall's) and ask the firewall guy to verify that his interfaces are clean and don't show any interface resets or carrier drops or other errors.

If there is anything else on the way back to the WAP, then check it too. What sort of box is the WAP? Does it have any management statistics which would indicate problems with either the wired or the wireless side of the world? Does it have any self-testing capabilities? Has anyone added a new radio or other wireless device to the area recently? Anything generating interference, like a bad electrical motor or a forklift that uses spark plugs? Could a gun have been dropped and gone rogue (transmitting any time it felt like and jamming the airwaves)?

If back on the HPUX you set up a constant ping to a gun does it still drop out? Does the ping show any problems?
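
If you want the constant ping to leave a record you can line up against the times the guns drop, something along these lines would do. It is only a sketch: `probe` is a hypothetical callable that returns True when the gun answers (for instance a wrapper around the system ping, not shown here because ping options differ between platforms), and `watch_reachability` is a made-up name.

```python
import time

def watch_reachability(probe, attempts, pause_seconds=1.0):
    """Call probe() repeatedly; return (attempt_index, timestamp) for each miss."""
    misses = []
    for attempt in range(attempts):
        if not probe():
            # Record when the reply went missing so it can be compared
            # with the time a telnet session was seen to drop.
            misses.append((attempt, time.time()))
        time.sleep(pause_seconds)
    return misses

if __name__ == "__main__":
    # Stand-in probe for illustration: pretend every third reply is lost.
    fake_replies = iter([True, True, False, True, True, False])
    for index, stamp in watch_reachability(lambda: next(fake_replies), 6, 0):
        print("miss on attempt", index, "at", time.ctime(stamp))
```

If the session drops while the misses list stays empty, the path itself is probably fine and the problem sits higher up than the network.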

Ron