03-31-2008 01:34 AM
diald taking 100% CPU
I've seen quite a few occurrences of this in previous threads and got some useful hints on the subject, but I would like some help interpreting the information I gathered while debugging the issue.
I have Itanium-based (HP-UX 11.23) systems with an EMC Clariion connected.
On one of the units (we have 6 of them, identical in terms of SW/HW), diald stays quiet for a few hours and then starts ramping up CPU utilization.
I installed tusc 7.9 and restarted diald. Initially everything was ok but this morning the trace file was humongous. It was stuffed with:
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) .............................................. = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) ............................................ = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) ............................................ = 0
[./diald ][8185] close(9) .................................................................... = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) ......................................... ERR#9 EBADF
[./diald ][8185] close(9) .................................................................... ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) .............................................. = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) ............................................ = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) ............................................ = 0
[./diald ][8185] close(9) .................................................................... = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) ......................................... ERR#9 EBADF
[./diald ][8185] close(9) .................................................................... ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) .............................................. = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) ............................................ = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) ............................................ = 0
[./diald ][8185] close(9) .................................................................... = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) ......................................... ERR#9 EBADF
[./diald ][8185] close(9) .................................................................... ERR#9 EBADF
We have 4 FC HBAs (AB465) running B11.23.07 FibrChanl-01, and this specific one (/dev/fcd1) is not the one connected to the EMC disk array but to an MSL6030 LTO library via an FC switch.
There's a chance that this behavior is triggered by a backup running overnight over that channel; I could verify this by disabling the backup for a few days.
Still, I would like some help from you with the meaning of the tusc traces above.
Thanks !
Mike
03-31-2008 01:38 AM
Re: diald taking 100% CPU
I'd check the EMC logs for possible hardware problems.
Also /var/adm/syslog/syslog.log for clues to this issue.
I suspect it is the application itself that is using the CPU cycles. Perhaps remove and reinstall it, or copy the binary from a similar system that does not exhibit this behavior.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
03-31-2008 02:03 AM
Re: diald taking 100% CPU
I checked the EMC logs and syslog but found nothing obvious there.
The /dev/fcd1 device refers to the HBA connected to the backup LTO library, so I suspect the issue could be triggered by the nightly backup running over that connection. I've disabled the backup to monitor the situation and restarted diald with tracing.
Do you know the meaning of:
[./diald ][8185] close(9) .................................................................... ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) .............................................. = 9
I understand that EBADF means diald tries to close a descriptor that refers to no open file, but I'm not sure about the open line.
Thanks !
Mike
03-31-2008 03:34 AM
Re: diald taking 100% CPU
close(9) ...... = 0
ioctl(9, SIOC_GET_PLUN, 0x7fff9540) .. ERR#9 EBADF
close(9) ...... ERR#9 EBADF
>I understand the EBADF refers to a file which diald tries to close but descriptor refers to no open file
Basically there is some sloppy programming. The file is closed, then an ioctl and a close are done on the no-longer-open descriptor.
Then the whole sequence repeats from the open.
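The failure mode described above can be reproduced in a few lines. This is only a sketch and not diald's code: it assumes a POSIX system, uses os.devnull in place of /dev/fcd1, and uses os.fstat as a stand-in for the HP-UX-specific SIOC_GET_PLUN ioctl. The function name use_after_close is made up for the illustration.

```python
import errno
import os


def use_after_close(path=os.devnull):
    """Open path, close it, then touch the stale descriptor again.

    Returns the errno names raised by the two late calls.
    os.fstat stands in for the ioctl seen in the trace (assumption).
    """
    fd = os.open(path, os.O_RDONLY)
    os.close(fd)  # first close succeeds, like "close(9) = 0" in the trace

    results = []
    for late_call in (os.fstat, os.close):
        try:
            late_call(fd)  # descriptor is no longer open
            results.append("ok")
        except OSError as exc:
            results.append(errno.errorcode[exc.errno])
    return results


print(use_after_close())
```

In a single-threaded run both late calls should fail with EBADF, which matches the last two lines of each repeated group in the trace.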
03-31-2008 05:22 AM
Re: diald taking 100% CPU
I looked at the tusc taken when the process started behaving "wildly":
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) .... = 0
[./diald ][8185] close(9) ............................ = 0
[./diald ][8185] ioctl(8, SIOC_GET_PLUN, 0x7fff9540) . = 0
[./diald ][8185] stat("/dev/rmt/1mnb", 0x7fff9310) ... = 0
[./diald ][8185] stat("/tmp/devfcdtape0601c0", 0x7fff9380) = 0
[./diald ][8185] open("/tmp/devfcdtape0601c0", O_RDONLY|O_NDELAY, 01210) = 9
[./diald ][8185] close(10) ........................... = 0
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) ...... = 10
[./diald ][8185] ioctl(10, 0xc00c4601, 0x7fff9550) ... = 0
[./diald ][8185] ioctl(10, 0xc00c4604, 0x7fff9560) ... = 0
[./diald ][8185] close(10) ........................... = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) . ERR#16 EBUSY
[./diald ][8185] close(9) ............................ = 0
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) ...... = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) .... = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) .... = 0
[./diald ][8185] close(9) ............................ = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) . ERR#9 EBADF
[./diald ][8185] close(9) ............................ ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) ...... = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) .... = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) .... = 0
[./diald ][8185] close(9) ............................ = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) . ERR#9 EBADF
[./diald ][8185] close(9) ............................ ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) ...... = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) .... = 0
[./diald ][8185] ioctl(9, 0xc00c4604, 0x7fff9560) .... = 0
[./diald ][8185] close(9) ............................ = 0
[./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) . ERR#9 EBADF
[./diald ][8185] close(9) ............................ ERR#9 EBADF
[./diald ][8185] open("/dev/fcd1", O_RDONLY, 0) ...... = 9
[./diald ][8185] ioctl(9, 0xc00c4601, 0x7fff9550) .... = 0
I can see that an ioctl(9, SIOC_GET_PLUN, 0x7fff9540) gets an EBUSY response, and then diald closes descriptor 9.
The next open on /dev/fcd1 is assigned descriptor 9, but from this point on diald issues the SIOC_GET_PLUN ioctl on 9 after the descriptor has already been closed. I think there could be a bug in the code, maybe triggered only after that EBUSY condition occurs. Does that make sense?
Thanks !
Mike
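With a trace this large, it may help to find the stale-descriptor calls mechanically rather than by eye. The following is a rough sketch, assuming the tusc line layout shown in this thread; the regular expression and the function name find_stale_calls are illustrative and not part of tusc. It only flags descriptors that were seen being closed within the supplied lines, so descriptors opened before the trace starts are ignored.

```python
import re

# Matches tusc lines of the form seen in this thread, for example:
#   [./diald ][8185] close(9) ............ = 0
#   [./diald ][8185] ioctl(9, SIOC_GET_PLUN, 0x7fff9540) . ERR#9 EBADF
LINE = re.compile(
    r"\]\s*(?P<call>\w+)\((?P<args>.*)\)\s*\.*\s*"
    r"(?:=\s*(?P<ret>-?\d+)|ERR#\d+\s+(?P<err>\w+))\s*$"
)


def find_stale_calls(trace_lines):
    """Return (line_no, call, fd) for calls on a descriptor last seen closed."""
    closed = set()  # descriptors closed in this trace and not reopened since
    stale = []
    for line_no, line in enumerate(trace_lines, start=1):
        m = LINE.search(line)
        if m is None:
            continue
        call, args = m.group("call"), m.group("args")
        succeeded = m.group("ret") is not None
        if call == "open":
            if succeeded:
                closed.discard(int(m.group("ret")))  # descriptor is live again
            continue
        first_arg = args.split(",")[0].strip()
        if not first_arg.isdigit():
            continue  # e.g. stat("/path", ...) takes a path, not a descriptor
        fd = int(first_arg)
        if fd in closed:
            stale.append((line_no, call, fd))
        elif call == "close" and succeeded:
            closed.add(fd)
    return stale
```

Fed one of the six-line groups from the trace above, this should report the SIOC_GET_PLUN ioctl and the second close as calls on an already closed descriptor 9.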
03-31-2008 05:29 PM
Re: diald taking 100% CPU
Yes. But FD 9 was on /tmp/devfcdtape0601c0 when it got that EBUSY.
Unless this was supposed to be on FD 8, where it also did that ioctl(8, SIOC_GET_PLUN, 0x7fff9540).
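The point about which file FD 9 referred to at the time of the EBUSY can be checked by replaying the opens and closes from the trace. This is a minimal sketch under the same assumption about the tusc line layout; the patterns and the name paths_at_errors are hypothetical helpers, not tusc features.

```python
import re

OPEN = re.compile(r'open\("(?P<path>[^"]+)".*\)\s*\.*\s*=\s*(?P<fd>\d+)\s*$')
CLOSE = re.compile(r'close\((?P<fd>\d+)\)\s*\.*\s*=\s*0\s*$')
FAILED = re.compile(
    r'(?P<call>\w+)\((?P<fd>\d+)\b.*\)\s*\.*\s*ERR#\d+\s+(?P<err>\w+)\s*$'
)


def paths_at_errors(trace_lines):
    """For each failed call on a descriptor, report the path it was open on.

    The path is None when the descriptor was not open according to the trace.
    """
    open_fds = {}  # descriptor -> path from the most recent successful open
    report = []
    for line in trace_lines:
        if (m := OPEN.search(line)):
            open_fds[int(m["fd"])] = m["path"]
        elif (m := CLOSE.search(line)):
            open_fds.pop(int(m["fd"]), None)
        elif (m := FAILED.search(line)):
            fd = int(m["fd"])
            report.append((m["call"], fd, m["err"], open_fds.get(fd)))
    return report
```

Run over the excerpt posted earlier, the EBUSY entry should come back paired with /tmp/devfcdtape0601c0, while the later EBADF entries should come back with no path at all.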
04-02-2008 07:16 AM
Re: diald taking 100% CPU
We suspended the backup on the device connected to /dev/fcd1 and the issue has not appeared again, which so far makes me believe it is related to that device.
As for /tmp/devfcdtape0601c0, it's true that it was the one replying EBUSY, but I don't understand what it is for. Could it be created by diald itself?
Sorry for the naive question, but what is a device file doing in the /tmp directory? I checked and it really is a character device. Maybe diald needs it as some kind of temporary node?
I'll wait a few more days and then restart the backup to monitor that device a bit more closely.
Thanks !
Mike
04-02-2008 08:19 PM
Re: diald taking 100% CPU
Most likely; check the file's date.
>Sorry for the naive question but what does that device mean in /tmp directory?
I don't know either. I've seen /var/tmp/rdsk* files.
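Checking the date and type of such a node can be scripted. The sketch below assumes a Python interpreter is available on the host and that its os module exposes os.major and os.minor there; describe_node is a made-up helper name. Comparing the major/minor numbers of /tmp/devfcdtape0601c0 with those of /dev/rmt/1mnb (which the trace stats just before it) might show whether the two nodes point at the same device, but that link is a guess and not something the thread confirms.

```python
import os
import stat
import time


def describe_node(path):
    """Summarise a filesystem node: type, device numbers, modification time."""
    st = os.stat(path)
    is_dev = stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)
    return {
        "path": path,
        "char_device": stat.S_ISCHR(st.st_mode),
        # st_rdev is only meaningful for device special files
        "major": os.major(st.st_rdev) if is_dev else None,
        "minor": os.minor(st.st_rdev) if is_dev else None,
        "modified": time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)
        ),
    }


if __name__ == "__main__":
    # Paths taken from the trace in this thread; they may not exist elsewhere.
    for node in ("/tmp/devfcdtape0601c0", "/dev/rmt/1mnb"):
        if os.path.exists(node):
            print(describe_node(node))
```

A modification time that lines up with a diald restart would support the idea that diald creates the node itself.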