Operating System - OpenVMS

How to identify process from UDP port

 
Mark Itzcovitz
Occasional Contributor


I have an application that occasionally seems to lose the timeout value for a select on a UDP port and then transmits continually. I can identify the port using TCPDUMP, but I haven't found a way to tie that back to the process (there will be many processes at any one time running the application).

 

Defining TCPIP$SOCKET_TRACE system-wide would work, but I'm concerned about the load this would put on a live system. It would have to be defined permanently because defining it once the rogue process is already running doesn't work.

 

Does anyone have any other ideas?

 

Thanks,

 

Mark

6 REPLIES
vman
Frequent Visitor

Re: How to identify process from UDP port

If it's continuous, MONITOR PROCESSES/TOPDIO (or /TOPBIO) should make the process readily apparent, shouldn't it?

Mark Itzcovitz
Occasional Contributor

Re: How to identify process from UDP port

Unfortunately not, so I'm told. The flow is that it sends a message, waits for a reply and then immediately sends another message instead of waiting five minutes. It manages about 20 to 30 round trips per second but this doesn't seem to be enough activity to identify the process.

Hoff
Honored Contributor

Re: How to identify process from UDP port

Given you're apparently working with a large and interconnected and gnarly and poorly-instrumented aggregation of application source code - we probably wouldn't be having this discussion otherwise - and given that local management likely doesn't want to "perturb" this "morass" through "changes" - they'll often use the euphemism "production" for this particular class of "longstanding application fragility"...

 

Monitoring (polling) I/O rates among the processes is the usual approach.  Often that means TCPIP> SHOW DEVICE, followed by lexical functions or system services, or a dedicated runaway-process monitoring tool.

 

If the application(s) stay connected to the port, then the SDA> and TCPIP> utilities (and associated polling) can usually help spot the culprits.

 

(I'd not expect security alarms to be particularly helpful here, whether for device access or otherwise.)

 

But the usual (best) fix for these cases is to start draining the "gnarly" "morass" with centralized I/O routines and with integrated monitoring and logging capabilities.  This is a longer-term fix, but it does tend to trend toward better stability when architected and implemented correctly.  Management usually likes the results, but they don't often like the length of time that can be involved, the necessity for changes and the associated risk, and related.  The means to this particular end can be financially and politically and schedule-ly unpopular.

 

Mark Itzcovitz
Occasional Contributor

Re: How to identify process from UDP port

Thanks for the response. You're somewhat off-base with "large and interconnected and gnarly and poorly-instrumented aggregation of application source code", although I'll admit to the poorly-instrumented bit. The code causing the problem is a small section that runs in its own thread and is responsible for renewing a lease on a licence. All it does is send a little message to the licence server (on Windows), receive and check the reply and then sleep for 5 minutes (using pthread_delay_np()). When it goes wrong, it appears to return from the pthread_delay_np() immediately.

 

Although it seems to put very little load on the VMS system when it goes wrong, it causes the licence server on Windows to max out its CPU, which affects other users trying to get a licence.

 

What I was hoping to find from this post was an easy way of tying the UDP port number (found using tcpdump) to the process ID, so I could give our support team at the customer site specific instructions on how to locate the process, whereupon they could take a view as to whether it was safe to kill it. Obviously, I want to solve the underlying problem, but that is secondary at the moment.

Hoff
Honored Contributor

Re: How to identify process from UDP port

Disable and preferably rip out the existing and buggy licensing code.

 

If your management proves obstinate here and won't let you entirely remove the licensing code, then move to LMF.  Or if not, then rip out the existing thread-based code and replace it, probably with a repeating AST design.  Or with an entirely different design.

 

Threaded code can be subtly non-portable, and threaded code is a common source of bugs.  There is a nice paper from HP Labs describing why library-based process threading including pthreads is problematic and tends to be platform-specific, at best:

 

http://www.hpl.hp.com/personal/Hans_Boehm/misc_slides/pldi05_threads.pdf

http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf

 

And threads specifically on VMS tend to be sensitive to patches and to linker options (e.g. upcalls), which are details that you generally don't want to inflict on your customers specifically for licensing code.

 

John Gillings
Honored Contributor

Re: How to identify process from UDP port

Mark,

   I'm not sure how to answer your original question, but can maybe help address your symptom.

 

 > appears to return from the pthread_delay_np() immediately.

 

Without knowing the implementation details of pthread_delay_np, I have to guess, but typically the reason for this kind of behaviour is using a $HIBER/$WAKE delay mechanism where there are stray "$WAKE"s floating around the system. I think there's a known issue with Oracle (or Oracle Rdb?) which can cause this type of symptom, but there are plenty of other possibilities.

 

If you can't find the root cause, a relatively simple way to fix your code or add diagnostics would be to calculate the time you expect to be woken, and do the delay call in a loop with a sanity check against the actual time you're woken. From there you can either repeat the delay or issue some diagnostics.
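
As a rough sketch of that loop (not the original application's code): work out the deadline, call the delay, then compare the clock against the deadline and either re-issue the delay for the time remaining or give up with a diagnostic. The names checked_delay, real_delay and broken_delay are hypothetical, and max_early is an illustrative cap on early wakeups. It assumes POSIX clock_gettime() with CLOCK_MONOTONIC and uses nanosleep() as a portable stand-in; on OpenVMS the wrapper would presumably call pthread_delay_np(), and CLOCK_REALTIME may be needed if a monotonic clock isn't available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and nanosleep() */
#endif
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Delay routine being checked; on OpenVMS this would wrap pthread_delay_np(). */
typedef int (*delay_fn)(const struct timespec *interval);

/* Portable stand-in for the real delay call (assumption: POSIX nanosleep). */
static int real_delay(const struct timespec *interval)
{
    return nanosleep(interval, NULL);
}

/* Simulates the reported fault: returns at once without sleeping. */
static int broken_delay(const struct timespec *interval)
{
    (void)interval;
    return 0;
}

/* Nonzero if time a is earlier than time b. */
static int ts_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

/*
 * Delay for `interval` (tv_nsec assumed < 1e9), checking the clock after
 * every wakeup. Returns the number of early wakeups that had to be
 * re-issued (0 when the first wakeup was on time), or -1 once more than
 * `max_early` early wakeups have been seen.
 */
static int checked_delay(delay_fn delay, const struct timespec *interval,
                         int max_early)
{
    struct timespec now, deadline, remaining = *interval;
    int early = 0;

    /* The time we expect to be woken. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += interval->tv_sec;
    deadline.tv_nsec += interval->tv_nsec;
    if (deadline.tv_nsec >= NSEC_PER_SEC) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= NSEC_PER_SEC;
    }

    for (;;) {
        delay(&remaining);
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (!ts_before(&now, &deadline))
            return early;               /* woken on or after the deadline */

        /* Woken early: report it and work out how long is left. */
        remaining.tv_sec = deadline.tv_sec - now.tv_sec;
        remaining.tv_nsec = deadline.tv_nsec - now.tv_nsec;
        if (remaining.tv_nsec < 0) {
            remaining.tv_sec -= 1;
            remaining.tv_nsec += NSEC_PER_SEC;
        }
        fprintf(stderr, "early wakeup %d: %ld.%09ld s still to wait\n",
                early + 1, (long)remaining.tv_sec, remaining.tv_nsec);
        if (++early > max_early)
            return -1;                  /* let the caller decide what to do */
    }
}
```

The cap is there so that a delay which keeps returning immediately doesn't turn into a busy loop; a caller seeing -1 could log the process ID and stop renewing rather than flood the licence server.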

A crucible of informative mistakes