
Re: semop memory fault

 
LMURA
Occasional Advisor

semop memory fault

We are migrating from HP-UX 11.11 to HP-UX 11.23 on new HP Itanium Rx3600 servers. After recompiling, the program crashes with a "Memory fault" during a semop() call. The call does not return an error (e.g. EINVAL), so it is difficult to troubleshoot. The program is compiled as 32-bit. According to our debug output and ipcs -cs, the semget() and semctl() calls correctly obtain and release the semaphore. The semaphore kernel parameters are: semume=256, semmnu=16384, semmni=16388, semmns=32776. This program has been running in production for many years on HP PA-RISC systems.
Can anyone help?
13 REPLIES
Steven E. Protter
Exalted Contributor

Re: semop memory fault

Shalom,

Tough to help without seeing the code.

Some resources with FAQs and common problems:

http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,9890,00.html

http://www.hp.com/products1/evolution/9000/faqs.html

http://devresource.hp.com/drc/STK/docs/refs/srctransitions.jsp

For me, the last link is usually the best.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

It would help if you posted a stack trace or at least tusc output so that we could see the actual parameters. One thing that does occur to me is that struct sembuf * may reference a variable that has gone out of scope (or is too small for the nsops value passed). This is just the sort of code that might work by accident for many years and only show up when ported to new platforms or OS versions.
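
To illustrate the point about the struct sembuf array, here is a minimal sketch, not taken from the poster's code. The helper name sem_apply and the MAX_OPS bound are hypothetical. It keeps the array in scope for the whole semop() call and refuses an nsops value larger than the array:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

enum { MAX_OPS = 2 };   /* hypothetical upper bound for this sketch */

/* Hypothetical helper: add deltas[i] to semaphore i of the set semid. */
int sem_apply(int semid, const short *deltas, size_t nsops)
{
    struct sembuf ops[MAX_OPS];   /* stays in scope for the whole semop() call */
    size_t i;

    /* Never pass semop() an nsops larger than the array really holds. */
    assert(nsops <= MAX_OPS);

    for (i = 0; i < nsops; i++) {
        ops[i].sem_num = (unsigned short)i;
        ops[i].sem_op  = deltas[i];
        ops[i].sem_flg = IPC_NOWAIT;   /* fail with EAGAIN instead of blocking */
    }
    return semop(semid, ops, nsops);
}
```

If the real code passes a pointer to a single struct sembuf but an nsops greater than 1, or a pointer to a local of a function that has already returned, the kernel reads past valid memory, which is the kind of bug that can stay hidden for years.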
If it ain't broke, I can fix that.
LMURA
Occasional Advisor

Re: semop memory fault

Attached is a snapshot captured while running glance.
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

Glance is useful for many things but it's rather pointless for this. You need a stack trace from gdb or the tusc output from the section of code that executes the failing semop.
If it ain't broke, I can fix that.
LMURA
Occasional Advisor

Re: semop memory fault

The attached file contains tusc output from two programs and output from ipcs. The beginning of the file has the tusc output from cxmm400a. It then has the tusc output from cxmm100a. It then shows the output from ipcs which shows the shared memory and semaphores that are being created.

We have a script which calls the program cxmm400a. This is the program getting a memory fault. This program does a fork and calls the program cxmm100a. Once cxmm100a is up and running successfully, cxmm400a should shut down gracefully.

The program cxmm100a creates the shared memory and semaphores. During creation it used to hang on a semop call. We fixed this by passing the union structure in the semctl calls instead of a plain integer.
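
For readers following along, the fix described above might look roughly like the sketch below. This is an assumption about its shape, not the actual cxmm100a code. The union is named semun_arg (a hypothetical name) so that it cannot clash with headers that already declare union semun, and sem_set_value is a hypothetical helper:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Many systems expect the caller to declare this union for semctl(). */
union semun_arg {
    int val;                 /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT and IPC_SET */
    unsigned short *array;   /* array for GETALL and SETALL */
};

/* Hypothetical helper: set one semaphore of the set to a given value. */
int sem_set_value(int semid, int semnum, int value)
{
    union semun_arg arg;

    assert(value >= 0);      /* semaphore values are never negative */
    arg.val = value;
    /* Pass the union itself as the fourth argument, not a bare int. */
    return semctl(semid, semnum, SETVAL, arg);
}
```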

The cxmm100a program is now trying to reattach to the segment. It is executing a semop call and this is where the memory fault is occurring.
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

I must be blind but I don't see any semxxx() or shmxxx() system calls in any of your tusc output.
If it ain't broke, I can fix that.
LMURA
Occasional Advisor

Re: semop memory fault

We ran tusc without any options, so it doesn't appear to show the system calls. Can you suggest an option to specify for tusc?
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

Tusc IS displaying the system calls (e.g. open,lseek,read,write, ...) in the first section of each line; it's just not displaying the system calls you expect because it isn't yet executing that section of code. It looks as though your program is failing much earlier than you think it is.
If it ain't broke, I can fix that.
LMURA
Occasional Advisor

Re: semop memory fault

Attached is a more detailed trace of the pgm execution with the system calls.
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

At this point, I would go back and carefully check all of the arguments for type consistency, compare your declarations with the man pages, and make sure that you don't have any "homegrown" or "pet" header files that redefine standard types. I see a couple of things that might be of interest. When you are doing the shmget()'s that actually create the segments, how is the size actually determined? Are you using hard-coded values or are you using the sizeof() pseudo-function? The second thing is that using keys 1, 2, 3, ... is fraught with danger of collision with other applications, so the use of the ftok() function would be a wiser choice. One other question: Are all the applications which access the shared memory 32-bit applications? It's possible to mix 32-bit and 64-bit applications which access common shared memory, but extra caution is required.
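
As a rough sketch of the two suggestions above (sizeof() for the segment size and ftok() for the key), something along these lines could be used. The struct layout, the helper name create_segment, and the 0600 permissions are all hypothetical and not taken from the application:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical layout; define it in exactly one shared header. */
struct shared_area {
    long counter;
    char name[64];
};

/* Hypothetical helper: create (or open) a segment sized for shared_area. */
int create_segment(const char *path, int proj_id)
{
    key_t key;

    assert(path != NULL);
    key = ftok(path, proj_id);   /* key derived from an existing file */
    if (key == (key_t)-1)
        return -1;               /* path missing or not accessible */

    /* Let the compiler compute the size instead of hard-coding it. */
    return shmget(key, sizeof(struct shared_area), IPC_CREAT | 0600);
}
```

Every process that attaches must pass the same path and proj_id to ftok() and be built against the same struct definition; otherwise the sizes and offsets can disagree between programs.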

Something else that would be very useful would be to compile/link/make your application with the -g compiler option and let it crash and then do a stack trace using gdb. That's a bit better than tusc but your tusc goes a long way towards finding this.
If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor

Re: semop memory fault

Really, really check your arguments, because when you mentioned that changing from an integer argument fixed one problem, I began to suspect that your code is less than carefully written for portability. For example, in most cases size_t and int are fairly interchangeable, but note "in most cases". Even running your code through lint might instantly point out your problem. I also assume that each of your data structures is defined in exactly one header file.
If it ain't broke, I can fix that.
Dennis Handly
Acclaimed Contributor

Re: semop memory fault

If you are on IPF, you should compile with +wlint.

Your tusc output doesn't show any signal 11s. Did you run tusc with -fp (to follow forks)?

But as Clay says, you should use -g and gdb to just debug the abort.

LMURA
Occasional Advisor

Re: semop memory fault

Thanks for all the information and suggestions. We will try compiling with the -g option and use gdb to get a trace. We'll also try using lint.

All the code for the application was recompiled as 32-bit.

It was suggested that we scan the code with the HP-UX STK (Software Transition Kit) version 2.2.

We also noticed that the system header files for semaphores, /usr/include/sys/sem.h and /usr/include/sys/ipc.h, differ on the new server from those on our HP-UX 11.11 system.