Operating System - HP-UX
1830160 Members
4636 Online
109999 Solutions
New Discussion

RPC program loops using 100% cpu

 

RPC program loops using 100% cpu

We have developed an ONC RPC program that has been running happily for many years on Digital UNIX (OSF) and then Tru64. Recently we have ported it to HP-UX 11 and TI-RPC. Now we have noticed that occasionally it goes into a loop and the CPU usage shoots up to nearly 100%.

When this happens the truss command reveals that the program is calling poll() continuously, I think (although I?m not sure) that all this happens when a client fails to connect for some reason.

Has anyone seen anything like this before? As our code is just calling svc_run() we have very little control over it?s behaviour at the point it is going wrong.

Aside issue is that the version of lsof we have doesn?t tell us anything about the TCP sockets in use by the process because of the transport-independence layer I guess.
It just reports open files of type tcpm->ip which is a bit frustrating. But ultimately fixing lsof isn?t my main problem.
Hello World
13 REPLIES 13
Stephen Keane
Honored Contributor

Re: RPC program loops using 100% cpu

svc_run is called once and never returns. It usually sits there waiting for poll to return.

poll is a bit like select, it monitors file descriptors awaiting certain conditions. One of the arguments for poll is a timeout. If this is too low, then you will use a lot of CPU.

Do you have access to the source code?

Re: RPC program loops using 100% cpu

Yes I have the source code. As you state we're calling svc_run, and it's svc_run that calls poll (i.e. we're not calling it directly). But truss tells me that the timeout argument to poll is -1 which means it's event driven and not timing out as such. This works fine when it works (if you see what I mean). But when it gets in to this 'funny' state poll returns immediately and then runs again - and therefore hammers the CPU.

I can tell your are itching to see the source; here is the relavent section...

if ((nconf = getnetconfigent("tcp")) == NULL)
[ snip error handling code here ]

if ((fd = t_open("/dev/tcp", O_RDWR, (struct t_info *) NULL)) < 0)
[ snip error handling code here ]

if (tli_set_reuseaddr(fd) == -1)
[ snip error handling code here ]

if ((bind = (struct t_bind *)t_alloc(fd, T_BIND, T_ADDR)) == NULL)
[ snip error handling code here ]
bind->qlen = 8;
bind->addr.len = sizeof(srv_addr);
bind->addr.buf = (char *)&srv_addr;

srv_addr.sin_family = AF_INET;
srv_addr.sin_addr.s_addr = INADDR_ANY;
srv_addr.sin_port = RASRPC_PORT;

if ((transp = svc_tli_create(fd, nconf, bind, 0, 0)) == NULL)
[ snip error handling code here ]

if (!svc_reg(transp, RAS_PROG, RAS_VERS, ras_prog_1, nconf))
[ snip error handling code here ]

if (nconf)
freenetconfigent(nconf);

svc_run();
/* NOTREACHED */
exit(-1);


Hello World
A. Clay Stephenson
Acclaimed Contributor

Re: RPC program loops using 100% cpu

I've looked for patches and nothing jumps out at me. I suspect you are going to have to bite the bullet and call poll() directly and let it call svc_getreq_poll(). I don't think any other approach has a prayer of fixing this. As you've already found, once you begun executing svc_run there is no way to probe. I looked for any possible external variables that would affect svc_run()'s behavior but nothing turned up.
If it ain't broke, I can fix that.

Re: RPC program loops using 100% cpu

To help me debug the problem I have been following the example set out in http://www.zip.com.au/~dtucker/rc-debug/ and I'm thinking that something very similar is happening to our RPC server. But again that seems to point to a bug in the HP library code.

As you say maybe we need to re-code to call poll directly - but RPC meant to be easy isn't!?
Hello World
Stephen Keane
Honored Contributor

Re: RPC program loops using 100% cpu

In your truss output, what integer is poll returning and what is nfds set to?

Re: RPC program loops using 100% cpu

Thatâ s a good question. At the moment the Program is behaving itself and truss is returning: -

poll(0x60000000001589a0, 43, -1) [sleeping]

From memory I think last time it went haywire it was returning something like:-

poll(????, 41, -1) = 1
poll(????, 41, -1) = 1
poll(????, 41, -1) = 1
poll(????, 41, -1) = 1
[ad infinitum]

Iâ m sure that I have also seen it returning 6 when looping as above. Next time it hammers the CPU Iâ ll confirm the return
Hello World
rick jones
Honored Contributor

Re: RPC program loops using 100% cpu

when it is looping (definitely get a tusc trace then :) is it calling anything else besides poll()?

I suppose the usual song and dance about being up on the latest RPC and Transport patches would apply
there is no rest for the wicked yet the virtuous have no pillows

Re: RPC program loops using 100% cpu

I have managed to caputre some tusc trace at the poin the program goes wrong. As follows: -

poll(0x60000000001589a0, 48, -1) = 1
ioctl(298, I_XTI_RCV, 0x9ffffffffffff4e0) ERR#5 EIO
ioctl(298, I_PEEK, 0x9ffffffffffff4a0) = 1
ioctl(298, I_PEEK, 0x9ffffffffffff490) = 1
getmsg(298, 0x9ffffffffffff3f0, NULL, 0x9ffffffffffff3e0) = 0
close(298) = 0
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1
poll(0x60000000001589a0, 48, -1) = 1


Which seems to be on the shutdown or close of a client. The problem seems to be that the number of open files (sockets) hasn't gone down from 48 to 47 as I'd expect. And as it does when I try to repdrouce the problem on our development systems.


To answer your question. The program still works when it's looping on poll - so yes sometimes it calls other stuff. But it's calling poll so quickly and so often it appears to just be calling poll (and filling up the disk when I leave truss in trace mode).
Hello World
rick jones
Honored Contributor

Re: RPC program loops using 100% cpu

ok, now it is time to see more completely what is comeing back from that poll call - if your disc can take it :) - then add the -v option and tusc should decode the poll return data.
there is no rest for the wicked yet the virtuous have no pillows

Re: RPC program loops using 100% cpu

Adding the -v switch was a good idea! I've captured the following...

poll(0x60000000001589a0, 31, -1) .............................................. = 4
poll[0].fd: 3
poll[0].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[0].revents: 0
poll[1].fd: 256
poll[1].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[1].revents: 0
poll[2].fd: 257
poll[2].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[2].revents: 0
poll[3].fd: 258
poll[3].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[3].revents: 0
poll[4].fd: 259
poll[4].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[4].revents: 0
poll[5].fd: 260
poll[5].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[5].revents: 0
poll[6].fd: 261
poll[6].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[6].revents: 0
poll[7].fd: 262
poll[7].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[7].revents: 0
poll[8].fd: 263
poll[8].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[8].revents: 0
poll[9].fd: 264
poll[9].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[9].revents: 0
poll[10].fd: 265
poll[10].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[10].revents: 0
poll[11].fd: 266
poll[11].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[11].revents: 0
poll[12].fd: 267
poll[12].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[12].revents: 0
poll[13].fd: 268
poll[13].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[13].revents: 0
poll[14].fd: 272
poll[14].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[14].revents: 0
poll[15].fd: 274
poll[15].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[15].revents: 0
poll[16].fd: 275
poll[16].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[16].revents: 0
poll[17].fd: 276
poll[17].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[17].revents: 0
poll[18].fd: 277
poll[18].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[18].revents: 0
poll[19].fd: 278
poll[19].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[19].revents: 0
poll[20].fd: 281
poll[20].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[20].revents: 0
poll[21].fd: 282
poll[21].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[21].revents: 0
poll[22].fd: 286
poll[22].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[22].revents: 0
poll[23].fd: 287
poll[23].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[23].revents: 0
poll[24].fd: 288
poll[24].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[24].revents: POLLNVAL
poll[25].fd: 289
poll[25].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[25].revents: 0
poll[26].fd: 290
poll[26].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[26].revents: POLLNVAL
poll[27].fd: 291
poll[27].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[27].revents: 0
poll[28].fd: 292
poll[28].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[28].revents: POLLNVAL
poll[29].fd: 293
poll[29].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[29].revents: POLLNVAL
poll[30].fd: 294
poll[30].events: POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
poll[30].revents: 0
poll(0x60000000001589a0, 31, -1) .............................................. = 4



So it looks like the server currently has 4 connections returning POLLNVAL.
Hello World
rick jones
Honored Contributor

Re: RPC program loops using 100% cpu

sooo, if this is all happening with svc_run, and if file/socket descriptor management is all contained within the confines of the HP supplied RPC code, it looks as though you now have the ammunition for a proper defect submittal :)

otherwise, (I'm not fully ONC RPC aware) it points at a possible hole in the application where it isn't removing something it should?
there is no rest for the wicked yet the virtuous have no pillows

Re: RPC program loops using 100% cpu

how do I go about sumbiting a defect?
Hello World
rick jones
Honored Contributor

Re: RPC program loops using 100% cpu

You phone or otherwise contact the Response Centre.
there is no rest for the wicked yet the virtuous have no pillows