Operating System - HP-UX

diald process taking up 100% CPU

 
SOLVED
Gilbert Eu_1
Occasional Advisor

diald process taking up 100% CPU

Why would the diald process be taking up 100% CPU? Any ideas?

CPU TTY  PID USERNAME PRI NI   SIZE RES STATE    TIME  %WCPU  %CPU COMMAND
  7   ? 1462 root     241 20 44752K 72K run   4347:09 100.07 99.89 diald

I am running an old N4000 server with 8 CPUs.
Alex Lavrov.
Honored Contributor

Re: diald process taking up 100% CPU

Try running tusc against the process:

tusc -fp 1462

(http://hpux.connect.org.uk/hppd/hpux/Sysadmin/tusc-7.7/)


Did you try to restart it?

Alex.
I don't give a damn for a man that can only spell a word one way. (M. Twain)
Gilbert Eu_1
Occasional Advisor

Re: diald process taking up 100% CPU

No, I did not try to restart the process. I know restarting it would solve the problem, but why does this happen?

By the way, what does tusc do?
Alex Lavrov.
Honored Contributor

Re: diald process taking up 100% CPU

tusc traces the system calls a process makes, so it will show you what the process is doing right now. You can attach the output here and we'll try to figure out what is going on.

From the single line of top output alone, I doubt anyone can tell you much.
Gilbert Eu_1
Occasional Advisor

Re: diald process taking up 100% CPU

I've downloaded the file but am having problems unzipping it. It doesn't unzip as a depot file?
Alex Lavrov.
Honored Contributor

Re: diald process taking up 100% CPU

gunzip xxx.depot.gz
swinstall -s /path_to/xxx.depot \*
Gilbert Eu_1
Occasional Advisor

Re: diald process taking up 100% CPU

[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0

This line keeps repeating itself.
Alex Lavrov.
Honored Contributor

Re: diald process taking up 100% CPU

Can you attach the whole output? To know what it is trying to read with the read() syscall, I need to know what file descriptor 8 refers to. It should have been assigned by an open() syscall earlier.

Alex.
Gilbert Eu_1
Occasional Advisor

Re: diald process taking up 100% CPU

( Attached to process 1462 ("/opt/sanmgr/hostagent/sbin/diald") [32-bit] )
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... [running]
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0
[1462] read(8, 0x42b922e0, 8192) ................................................................................................................... = 0


I redirected it to a file and it reached 12 MB in a mere 2 seconds, so I had to interrupt it.
Alex Lavrov.
Honored Contributor
Solution

Re: diald process taking up 100% CPU

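It seems that it keeps reading from some file but gets nothing back. A read() that returns 0 means end-of-file (for a pipe or socket, that the other side has closed); a real failure would return -1 with an error. So the file or connection it expects has probably gone away or is corrupted, and the process keeps retrying instead of handling it.

I'm not sure it's possible to tell which file it is from the file descriptor number alone.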
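In the trace above every read(8, 0x42b922e0, 8192) returns 0, and under POSIX read(2) a return of 0 means end-of-file rather than "no data yet". A descriptor at EOF returns 0 immediately on every call, so a loop that doesn't treat 0 as a stop condition never blocks and burns a full CPU. diald's source isn't shown in this thread, so the following is not its actual code: it is a hypothetical Python sketch (the names spin_reads and read_until_eof are made up) contrasting that buggy pattern with the usual correct one, using a pipe whose write end has been closed.

```python
import os

def spin_reads(fd, max_reads):
    """Mimic the suspected bug: call read() in a loop without treating 0 as EOF.

    max_reads caps the loop so this demo terminates; a buggy daemon has no cap.
    """
    zero_reads = 0
    for _ in range(max_reads):
        if os.read(fd, 8192) == b"":  # read(2) returned 0: end-of-file
            zero_reads += 1           # ...but the loop simply tries again
    return zero_reads

def read_until_eof(fd):
    """Correct pattern: stop as soon as read() reports end-of-file."""
    chunks = []
    while True:
        data = os.read(fd, 8192)
        if not data:  # 0 bytes means EOF, not "try again"
            break
        chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"hello")
    os.close(w)                # the writer goes away, like a closed peer
    print(read_until_eof(r))   # b'hello', then stops at EOF
    print(spin_reads(r, 5))    # 5: every further read returns 0 at once
    os.close(r)
```

Each call in spin_reads returns instantly, which is why tusc produced megabytes of identical read() lines in seconds.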

You can kill diald and start it like this:

tusc -fn /opt/sanmgr/hostagent/sbin/diald > /tmp/tusc.out

Then look for open() syscalls to see which files are opened and which file descriptors they get. That may help you find what sends the process into the infinite loop.
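The search described above can be automated with a small hypothetical Python helper (files_for_fd is a made-up name) that scans a saved tusc log for open() calls that returned a given descriptor. The regular expression assumes tusc prints a successful call as `open("path", ...) .... = N`, in the same style as the read() lines in this thread; check it against your actual output. Descriptors can also come from socket(), pipe(), dup() and similar calls, and are reused after close(), so the last match before the looping starts is the one that matters.

```python
import re

# Assumed line shape: [1462] open("/some/file", O_RDONLY, 0) ...... = 8
# Failed calls (e.g. "ERR#2 ENOENT") have no "= N" and are skipped.
OPEN_RE = re.compile(r'\bopen\("([^"]*)".*=\s*(\d+)\s*$')

def files_for_fd(lines, fd):
    """Return the paths of every open() in the log that returned descriptor fd."""
    paths = []
    for line in lines:
        match = OPEN_RE.search(line)
        if match and int(match.group(2)) == fd:
            paths.append(match.group(1))
    return paths

if __name__ == "__main__":
    # Path taken from the tusc command used in this thread.
    with open("/tmp/tusc.out") as log:
        for path in files_for_fd(log, 8):
            print(path)
```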

Alex.
Gilbert Eu_1
Occasional Advisor

Re: diald process taking up 100% CPU

I did what you suggested and restarted the process. I looked through the output and did not find anything unusual.

Anyway, thanks for your helpful information. I've also attached the output.
Alex Lavrov.
Honored Contributor

Re: diald process taking up 100% CPU

Well, it's obviously some bug, so right after startup everything looks OK. You can leave it running under tusc, and when it hangs again and consumes 100% CPU, you'll be able to tell which file it's trying to read.

Of course, make sure the log file doesn't fill up your filesystem. If the problem shows up soon after startup, it's worth a shot; if it takes days or weeks, I don't think it's worth it, and you should just contact the program's authors.
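At the rate seen earlier in the thread (12 MB in 2 seconds, about 6 MB/s, or roughly 21 GB per hour), a trace left running while diald spins can fill a filesystem quickly. Below is a hypothetical Python check (the function name and both thresholds are illustrative assumptions) that a watcher could call periodically to decide when to stop tusc. On an HP-UX box without Python 3, `ls -l` on the log plus `bdf` on its filesystem gives the same information by hand.

```python
import os
import shutil

def should_stop_trace(log_path, max_log_bytes=500 * 1024**2, min_free_bytes=1024**3):
    """Return True when a trace log is too large or its filesystem is low on space.

    The default limits (500 MiB of log, 1 GiB free) are illustrative assumptions.
    """
    log_size = os.path.getsize(log_path)
    # Check free space on the filesystem that holds the log file.
    log_dir = os.path.dirname(os.path.abspath(log_path))
    free_bytes = shutil.disk_usage(log_dir).free
    return log_size > max_log_bytes or free_bytes < min_free_bytes

if __name__ == "__main__":
    path = "/tmp/tusc.out"  # the log file name used earlier in the thread
    if os.path.exists(path) and should_stop_trace(path):
        print("trace log too large or filesystem nearly full; stop tusc now")
```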
TwoProc
Honored Contributor

Re: diald process taking up 100% CPU

I've seen this very same behavior before. Just kill it and restart it. When we called HP about it, they gave the same advice and said (way back then) that a bugfix was coming. It hasn't recurred in a long time, so I figure our regular patch schedule must have picked up the fix. I used to see it pretty much monthly, but we haven't seen it do this for over a year now.
We are the people our parents warned us about --Jimmy Buffett