12-24-2008 01:31 AM
Re: "nulptr dereferences trap enabled"
If that close() hangs for some kernel-related reason, zombie children will not be collected, and the application can't be killed.
Then only crashinfo -l -s -v -t can help to find out what happened.
12-24-2008 04:40 AM
Re: "nulptr dereferences trap enabled"
admin 20479 1 0 09:20:20 pts/0 00:04 /usr/local/../server64
This is not a zombie.
>After normal shutdown (no kill involved):
-3 20479 1 0 09:20:20 pts/0 00:07 /usr/local/../server64
Hmm, why the -3? If this stays like this, this isn't a zombie, it is wedged. Are there other zombies that are children of this process?
>The tusc output shows what I expect in a process shutdown:
lwp_detached_exit()
exit(0) <--- last entry in the log
After the exit system call, it can only be the kernel that has messed up.
>I don't understand the comments about buggy application code related to waiting for SIGCHLD. The parent pid is init.
A zombie is a child of a process that has sloppy code that doesn't handle the death (SIGCHLD) of the child. Unless 20479 has zombie children, there are no zombies here. I suppose a zombie master could be created if the parent is hung on an NFS mount before it can handle SIGCHLD.
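To make the definition above concrete, here is a minimal sketch of a parent that does handle SIGCHLD. It is written in Python for brevity and is not taken from the application under discussion; the names `reaped`, `on_sigchld` and `spawn_child` are made up for the illustration. The handler calls waitpid() with WNOHANG in a loop, so every exited child is collected as soon as possible and none is left defunct.

```python
import os
import signal
import time

reaped = {}  # hypothetical bookkeeping: pid -> exit code of each collected child


def on_sigchld(signum, frame):
    # Several children can exit before the handler runs, so keep calling
    # waitpid() until nothing more is ready to be collected.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children at all
        if pid == 0:
            return  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)  # child was killed by a signal


def spawn_child(exit_code):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit at once; it stays a zombie until reaped
    return pid


if __name__ == "__main__":
    signal.signal(signal.SIGCHLD, on_sigchld)
    child = spawn_child(3)
    while child not in reaped:
        time.sleep(0.01)
    print("collected", child, "exit code", reaped[child])
```

A parent that never installs such a handler and never calls wait() is what I am calling a zombie master: its exited children stay in the process table until the parent itself goes away.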
>This is _very_ mature application code
This is meaningless if there is a kernel bug that causes a hang on the exit system call.
We need the output (before and after) of this ps command:
UNIX95=EXTENDED_PS ps -Hfu admin
(Make sure you select "Retain format" or attach a file.)
>Laurent: one of the possibility for zombies to not be collected is when the parent process is exiting,
In this case the application is already broken and is a zombie master. I.e. it should handle the SIGCHLD as soon as possible and not wait until the end.
>the process must close all its children procs.
I'm not sure what you mean here. This is not Windows; only files are closed. Orphaned processes are reparented to init.
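The reparenting can be observed with a small double-fork experiment. This is only a sketch, in Python, and the function name `orphan_parent_pids` is made up. An intermediate process creates a grandchild and exits immediately; the grandchild then polls getppid() until it changes and reports both values through a pipe. On a classic UNIX the new parent is init (pid 1); I am assuming nothing beyond "it changes", because some systems (Linux with a subreaper, for example) hand orphans to a different process.

```python
import os
import time


def orphan_parent_pids(timeout=5.0):
    """Return (pid of the original parent, pid the orphan sees afterwards)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Intermediate process: create the future orphan, then exit at once.
        os.close(read_fd)
        middle_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: poll until the kernel has given it a new parent.
            deadline = time.monotonic() + timeout
            while os.getppid() == middle_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, f"{middle_pid} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)
    # Original process: collect the intermediate child so it is not left defunct.
    os.close(write_fd)
    os.waitpid(middle, 0)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    old_parent, new_parent = (int(x) for x in data.split())
    return old_parent, new_parent


if __name__ == "__main__":
    old, new = orphan_parent_pids()
    print("orphan's parent changed from", old, "to", new)
```

Nothing in the exiting parent "closes" the child; the kernel simply gives it a new parent, and that parent is then responsible for collecting it.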
12-24-2008 05:09 AM
Re: "nulptr dereferences trap enabled"
I meant that the parent process closes all its file descriptors (sockets or files).
12-24-2008 05:17 AM
Re: "nulptr dereferences trap enabled"
Ah, perhaps using gpm before and after to look at the open files may help?
Or write a program to call pstat(2) to get the open files?
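I am not reproducing the pstat(2) interface here, since it is HP-UX specific and inspects other processes. As a portable stand-in, a process can list its own open descriptors by probing each number with fstat(), for example just before it calls exit. The sketch below is in Python; the function name `open_fds` and the limit of 1024 are assumptions made for the illustration.

```python
import os


def open_fds(limit=1024):
    """Return the descriptors below `limit` that are open in this process."""
    found = []
    for fd in range(limit):
        try:
            os.fstat(fd)  # raises OSError (EBADF) when fd is not open
        except OSError:
            continue
        found.append(fd)
    return found


if __name__ == "__main__":
    print("open descriptors:", open_fds())
```

Logging this list at shutdown would show whether any socket or file is still open when exit is reached.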
12-24-2008 08:01 AM
Re: "nulptr dereferences trap enabled"
There are no child procs. This is multi-threaded, not multi-process.
>Prior to server shutdown, ps output looks like this:
> admin 20479 1 0 09:20:20 pts/0 00:04 /usr/local/../server64
> This is not a zombie.
Of course not ... "prior to shutdown" is the key phrase. You kept pushing on the concept of our application failing to issue wait().
There is no wait ... no child process ... init is the only process here with small children to care for.
> why -3 ... wedged
Don't know, because I'm not driving the tests, so I don't know if the process remains like this. This is a very repeatable experiment that only a reboot can clear up.
Is this something I should track?
> After the exit system call, it can only be the kernel that has messed up.
Agreed.
>This is _very_ mature application code
>This is meaningless if there is a kernel bug that causes a hang on the exit system call.
Now we're all on the same page. That's the meaningful part ... this is an HPUX kernel bug.
> We need the output (before and after) of this ps command:
> UNIX95=EXTENDED_PS ps -Hfu admin
Precisely the output I requested and have shown in my previous post. There is no complicated process structure here.
>the process must close all file descriptors
I didn't count them all up, but all the file descriptors involved with connect/fcntl/accept/send/recv activity in the tusc output received a zero return code from shutdown(##, SHUT_RDWR) and close(##) requests.
You have a toy program that demonstrates such a failure?
Ooops, I just found reference to:
accept(9,....) = 11
With no close(9) request at shutdown.
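Rather than counting by eye, the pairing can be checked mechanically. The sketch below is in Python and rests on an assumption about the log format: each completed call appears on one line shaped like `name(args) = return`, as in the excerpts quoted in this thread, possibly with padding dots before the `=`. The function name `unclosed_fds` and the set of descriptor-creating calls are my own choices, not anything defined by tusc.

```python
import re

# Assumed line shape: name(args) ..... = return_value
CALL = re.compile(r"(\w+)\((.*)\)[\s.]*=\s*(-?\d+)")

# Calls whose return value is a new descriptor (an illustrative, incomplete set).
CREATORS = {"socket", "accept", "open", "dup"}


def unclosed_fds(lines):
    """Map each descriptor that was created but never closed to its log line."""
    still_open = {}
    for line in lines:
        match = CALL.search(line)
        if not match:
            continue  # e.g. exit(0), which never returns
        name, args, ret = match.group(1), match.group(2), int(match.group(3))
        if name in CREATORS and ret >= 0:
            still_open[ret] = line.strip()
        elif name == "close" and ret == 0:
            first_arg = args.split(",")[0].strip()
            if first_arg.isdigit():
                still_open.pop(int(first_arg), None)
    return still_open


if __name__ == "__main__":
    sample = [
        "socket(AF_INET, SOCK_STREAM, 0) = 9",
        "accept(9, 0x7f00, 0x7f10) = 11",
        "shutdown(11, SHUT_RDWR) = 0",
        "close(11) = 0",
        "exit(0)",
    ]
    for fd, line in sorted(unclosed_fds(sample).items()):
        print("never closed:", fd, "from", line)
```

The sample lines are invented to mirror the situation above: descriptor 9 is created and used by accept() but never closed, so it is the one reported.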
Is HPUX "fuser" robust enough, or should I be installing "lsof" ? How about post-exit "netstat" output?
> perhaps using gpm before and after to look at the open files may help?
"gpm" / glance is not part of my normal debugging toolbag. I can suggest it to the onsite HP team.
You can write a test program to generate such a condition? I've never seen unix behave in such a fashion.
"wedged" and "zombie" seem interchangeable to me. If the process can't be killed with_no_mercy ... it walks and feels like a zombie.
12-24-2008 09:13 AM
Re: "nulptr dereferences trap enabled"
crashinfo -l -s -v -t
is a good start (crashinfo is a WTEC/L3 support tool).
It will give the stack trace, and a TOC dump may well be necessary after that.
Ask HP support to elevate your call to L3, where it should have been elevated already.
12-24-2008 09:56 AM
Re: "nulptr dereferences trap enabled"
Thanks Laurent, will report when we get that.
12-24-2008 10:38 AM
Re: "nulptr dereferences trap enabled"
There is a version of it posted here:
http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1089090&admit=109447627+1230143352115+28353475
Could you explain what these flags do?
> crashinfo -l -s -v -t
Do I have to get the latest version from HP?
It's not clear to me whether this will diagnose zombie processes, or if it is just used for a panic crash or core file.
Note: we can't get a core file. kill -6 hangs the process without producing a core file. That's a huge stumbling block for remote debugging.
What do I tell the onsite team?
1. Escalate to L3
2. Get crashinfo
3. Run with these flags, and return output to L3.
Anything else?
12-24-2008 06:55 PM
Re: "nulptr dereferences trap enabled"
Then there are no zombies.
>Of course not, "prior to shutdown" is the key phrase.
>There is no complicated process structure here.
That's not what I meant. Since its parent is INIT, it can't be a zombie.
I asked because you incorrectly mentioned zombies, which require a more complicated process structure.
>With no close(9) request at shutdown.
This could be the problem. And having multiple threads could also be an issue.
>"wedged" and "zombie" seem interchangeable to me. If the process can't be killed it walks and feels like a zombie.
They are completely different. A zombie is a well defined term, a defunct process, due to sloppy application programming. A hung process is due to a bad design of UNIX (I/O hung) or a bug in the kernel.
And as I said, a zombie can be killed by killing the zombie master.
>It's not clear to me whether this will diagnose zombie processes
It will help in diagnosing your hung process.
12-24-2008 10:19 PM
Re: "nulptr dereferences trap enabled"
The process runs correctly, and the parent pid is init. Of course it's not a zombie?
Who said it was?
The process is then told to shut down.
exit() never completes; the process is hung and holds onto critical resources that prevent it from being restarted without rebooting the machine.
I really don't care what you call the second process state.
I call it "HP's problem".
> And as I said, a zombie can be killed by killing the zombie master.
Sure, stick to your story. Most people following this thread know that you cannot kill init.
>With no close(9) request at shutdown.
This could be the problem.
I have 4 tusc outputs. Only one showed an accept(9) system call that wasn't paired with a close(9). All 4 shutdowns show a clean exit(0), and the process remains hung.
If that's a problem, prove it with a toy program that exhibits the failure.
> And having multiple threads could also be an issue
Oh my ... a problem for modern software, or just HPUX. That's just plain silly.
>It's not clear to me whether this will diagnose zombie processes
>It will help in diagnosing your hung process.
And I'll just have to trust you on this, because the crashinfo binary (it's not a shell script I could hack) is not documented in a public place, and is only available through HP support.
Conclusion: it's HP's problem to solve.
I don't see how I can contribute anything more.