
Re: "nulptr dereferences trap enabled"

 
SOLVED
vmguy
Frequent Advisor

"nulptr dereferences trap enabled"

I have two nearly identical HPUX ia64 systems.

An identical library on each (confirmed with cksum) shows different output from chatr:

* "nulptr dereferences trap enabled"
* "nulptr references enabled"

On the first system, the application leaves a zombie process when it is shut down normally.

The second does not.

If these systems are identical, what kernel configuration setting is causing null pointer dereferences to be trapped?

How do I change that?
Is it hardware differences, or software?

Google is virtually silent on this.

HPUX ia64 B.11.23d
HP Case ID: 1603027519

59 REPLIES
Patrick Wallek
Honored Contributor

Re: "nulptr dereferences trap enabled"

You should start by checking the differences in patches between the 2 systems.

# show_patches

is a good place to start.
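
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have saved each machine's patch listing to text by some means. The function names and the sample listings are hypothetical. It only scans for tokens shaped like HP-UX patch IDs (PHCO_, PHKL_, PHNE_, PHSS_ followed by digits), so it does not depend on the exact output format of show_patches.

```python
import re

# HP-UX patch IDs look like PHSS_37947; the prefix names the patch category.
PATCH_ID = re.compile(r"\bPH(?:CO|KL|NE|SS)_\d+\b")


def patch_ids(listing):
    """Return the set of patch IDs mentioned anywhere in a text listing."""
    return set(PATCH_ID.findall(listing))


def compare_patches(listing_a, listing_b):
    """Return (only_in_a, only_in_b), each as a sorted list of patch IDs."""
    a, b = patch_ids(listing_a), patch_ids(listing_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Hypothetical sample listings, not real output from either machine.
    machine1 = "PHSS_11111 linker tools\nPHCO_22222 libc\n"
    machine2 = "PHSS_33333 linker tools\nPHCO_22222 libc\n"
    only1, only2 = compare_patches(machine1, machine2)
    print("only on machine1:", only1)
    print("only on machine2:", only2)
```

Patches present on both machines drop out, which leaves a short list to look up one by one.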
Dennis Handly
Acclaimed Contributor

Re: "nulptr dereferences trap enabled"

>An identical library on each (confirmed with cksum) shows different output from chatr:
>* "nulptr dereferences trap enabled"
>* "nulptr references enabled"

(The -z/-Z option isn't valid for shlibs. So this makes no difference, especially if the same cksum.)

As Patrick says, you have different linker/dld patches on the two systems. I specifically asked that the chatr(1) message be changed to be more understandable. I'm assuming the first one has the newer linker.

>causing null pointer dereferences to be trapped

This is a good thing(tm).
vmguy
Frequent Advisor

Re: "nulptr dereferences trap enabled"

Patrick:
> check the differences in patches

I've checked, but I don't have enough experience to know what the _significant_ differences are.

Attached is my own format of differences derived from inventory.xml from each machine.

"machine1" fails (crashes)
"machine2" does not.

Dennis:
> (The -z/-Z option isn't valid for shlibs. So this makes no difference, especially if the same cksum.)

Perhaps you missed my point. This is a 3rd party library, downloaded to each machine from the vendor, and chatr gives different results.

The reason for using "cksum" is to discover, during install, if the library had been altered. It was not altered.

The development guru for the product seems pretty certain that -z provides the functionality needed.

It's just that each HPUX has a different understanding of the library behaviour.

> you have different linker/dld patches

Can you explain why that makes a difference?
I didn't build this library on each machine.
Only chatr (and ultimately the execution stack) treats this binary library differently.

> I'm assuming the first one has the newer linker

The message is from chatr, not the linker.

>causing null pointer dereferences to be trapped ... is a good thing(tm).

Great engineering theory. Terrible production problem if it cannot be controlled. I don't have control over this library, but it causes the software that uses it to create zombies on shutdown.

-----
Note: we have used tusc to observe the shutdown system calls. Everything looks normal, except the process doesn't actually come down.

Many methods were tried to get more information on the issue.

kill pid ... produces a zombie
kill -6 pid ... never produces a core, just a zombie
kill -9 pid ... does not remove the process, nor can we get rid of the zombie.

------
I'm suspicious of the suggestion that this is merely a difference in message format.

One ignores null pointer references.
One causes a "trap" (some HPUX mechanism) when a null pointer is dereferenced.

All of this is during shutdown of the process. The software works correctly in all other respects.

What is the mechanism for trapping null pointer dereferences?
vmguy
Frequent Advisor

Re: "nulptr dereferences trap enabled"

Oooo, I learned something new in my google wanderings.

Google
[ "null pointer" trap -java site:hp.com ]

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1259127

"If I "chatr -z " (enable the "nulptr dereferences trap")on the exe"

I didn't know you could use chatr this way.

Should I be instructing the customer to use:

chatr -Z library.so
vmguy
Frequent Advisor

Re: "nulptr dereferences trap enabled"

There goes that idea. Dennis has already contributed to the thread.

> Are you sure you used chatr(1) on the executable and not some shared lib?

And now I understand the confusion about the -z ... one is used at compile time, the other at execution.

I don't know if the compiler builds in special code for handling null pointers, but the explanation provided (thanks Dennis) makes sense:

> In order to implement the -Z default, a special null pointer page is used by the kernel. All processes with -Z share it.

So perhaps I should be instructing the user to use:

chatr -z executable_name

so that this global page zero is not used?

Thanks for the valuable discussion so far.
Dennis Handly
Acclaimed Contributor

Re: "nulptr dereferences trap enabled"

>but I don't have enough experience to know what the _significant_ differences are.

I already told you, linker/dld patches.
You do have PHSS_37947. Machine2 seems to have this old patch PHSS_34353.

(But this is beside Patrick's point, the party line is if one fails, you make it like the other. :-)

>Perhaps you missed my point. This is a 3rd party library, downloaded to each machine from the vendor, and chatr gives different results.

I don't see how, I told you exactly what patches to check. And I mentioned that the different chatr results are not important for 2 reasons, only that there are differences.

>The reason for using "cksum" is to discover, during install, if the library had been altered.

Exactly and that goes to what I said. I was the proximal cause of the change to chatr, JAGag09149 in PHSS_34852.

>The development guru for the product seems pretty certain that -z provides the functionality needed.

I'm not sure how? -z detects sloppiness. It won't fix things.

>It's just that each HP-UX has a different understanding of the library behaviour.

Not really, just better wording.

>Can you explain why that makes a difference? Only chatr (and ultimately the execution stack) treats this binary library differently.

dld is the software that handles shlibs and process start/exit. So, dld has everything to do with the execution.

>The message is from chatr, not the linker.

The N inferences are that if chatr changes so does dld.

>but it causes the software that uses it to create zombies on shutdown.

You haven't explained how it fails and there may be NO connection with -z and the abort. There are plenty of ways dld can hose you over.

>Everything looks normal, except the process doesn't actually come down.

If you see no signals, then it is likely unrelated.

>kill -9 pid does not remove the process, nor can we get rid of the zombie.

You get rid of zombies by killing the zombie master.
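
To make the zombie behaviour concrete, here is a small Python sketch of the general Unix mechanism. It is not specific to HP-UX or to the application in this thread, and it assumes a platform where os.fork and os.waitid are available (Linux, for example). The function names are made up for the illustration. A zombie is a process that has already exited and only lingers in the process table until its parent collects its exit status, so kill -9 has nothing left to kill.

```python
import os
import signal


def in_process_table(pid):
    """Signal 0 delivers nothing; it only checks that the PID still exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to someone else
    return True


def zombie_demo():
    """Create a zombie, try SIGKILL on it, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so at this point the child is a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    os.kill(pid, signal.SIGKILL)          # accepted, but has no effect
    survived_kill = in_process_table(pid)

    _, status = os.waitpid(pid, 0)        # the parent reaps the zombie
    gone_after_wait = not in_process_table(pid)
    return survived_kill, gone_after_wait, os.WIFEXITED(status)


if __name__ == "__main__":
    print(zombie_demo())
```

If the parent never calls wait, the usual remedy is the one given above: terminate the parent, after which init adopts the zombie and reaps it.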

>I'm suspicious of the suggestion that this is merely a difference in message format.

Why?? That's exactly what JAGag09149 did. Also that difference means that dld is different.

>All of this is during shutdown of the process.

dld has had problems in this area.

>What is the mechanism for trapping null pointer dereferences?

Hardware R/W protection on page 0.
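
A hedged sketch of what that looks like from user space, written in Python with ctypes. It assumes a system where address 0 is not mapped into the process (the default on Linux, and the behaviour -z asks for on HP-UX according to this thread). The helper name is hypothetical. The read happens in a forked child so that the parent can report which signal the hardware fault was turned into.

```python
import ctypes
import os
import signal


def signal_from_reading(address):
    """Read a C string starting at `address` in a forked child.

    Returns the signal that killed the child, or None if the read succeeded.
    """
    pid = os.fork()
    if pid == 0:
        ctypes.string_at(address)  # raw memory read; faults on a protected page
        os._exit(0)                # reached only if the read did not trap
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return signal.Signals(os.WTERMSIG(status))
    return None


if __name__ == "__main__":
    buf = ctypes.create_string_buffer(b"ok")
    print("valid address:", signal_from_reading(ctypes.addressof(buf)))
    print("address 0:    ", signal_from_reading(0))
```

No compiler-generated check is involved: the load instruction itself faults, and the kernel delivers SIGSEGV to the process.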

>Should I be instructing the customer to use: chatr -Z library.so

I said that was useless, it is effective only on the executable.

>-z one is used at compile time, the other at execution.

No, one is used at link time, the other post link.

>I don't know if the compiler builds in special code for handling null pointers

Why? That's what the hardware is for.

>So perhaps I should be instructing the user to use: chatr -z executable_name

That may catch a problem earlier. But your major point is that everything is the same except the patches installed.
vmguy
Frequent Advisor

Re: "nulptr dereferences trap enabled"

Patrick: Thanks. I learned:

1. Somehow I seemed to have annoyed you with my ignorance of HPUX. I'm sorry.

2. Wasn't familiar with the "dld" acronym, now I am. You told me, I didn't understand.

3. Didn't know there was a zombie master on HPUX, now I do.

4. Understand now why chatr on the executable puts the process into that protected page 0 group.

"Null pointer reference" is slightly misleading; any pointer value less than the protected page size will be trapped.

5. Thanks for the PHSS_37947 patch reference. I understand now why I need that. I wasn't able to pick that out of the list of differences on my own.

-- Cheers. I'll report results when we have a resolution.
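
A small sketch of point 4, in Python with ctypes. Everything here is illustrative: the Record layout and function names are invented, and the run assumes a system where the lowest page of the address space is unmapped (Linux by default). Accessing a field through a null struct pointer touches the address equal to the field's offset, which is small but not zero, and it still lands in the protected page.

```python
import ctypes
import os
import signal


class Record(ctypes.Structure):
    """Hypothetical record type, used only to get realistic field offsets."""
    _fields_ = [
        ("id", ctypes.c_int64),
        ("flags", ctypes.c_int64),
        ("name", ctypes.c_char * 32),
    ]


def field_addresses_through_null():
    """Address touched by p->field when p is NULL: just the field offset."""
    return {name: getattr(Record, name).offset for name, _ in Record._fields_}


def traps(address):
    """True if reading at `address` in a forked child ends in SIGSEGV."""
    pid = os.fork()
    if pid == 0:
        ctypes.string_at(address)  # raw read starting at `address`
        os._exit(0)                # reached only if the read did not trap
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGSEGV


if __name__ == "__main__":
    for name, address in field_addresses_through_null().items():
        print(name, address, "traps" if traps(address) else "readable")
```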
Patrick Wallek
Honored Contributor
Solution

Re: "nulptr dereferences trap enabled"

vmguy,

You didn't annoy me. My ONLY response to this thread was the first one. Dennis Handly responded the rest of the time.

Please try to keep the people that respond straight.

No offense Dennis, but I really don't want to be you! :)
vmguy
Frequent Advisor

Re: "nulptr dereferences trap enabled"

> You get rid of zombies by killing the zombie master

"zombie master" is a term of your own creation to describe the "parent pid" of a zombie process?

Please alert us when you want to play games on a technical forum.

http://en.wikipedia.org/wiki/Zombie_Master

http://research.facetime.com/term_show.php?id=90