Use Pthread[CMA-F-EXIT_THREAD]

 
JimsanTsai
Advisor

Use Pthread[CMA-F-EXIT_THREAD]

Dear Sirs,
I'm programming on OpenVMS 7.3-2 with Compaq C V6.5. I use pthreads and sockets to implement 16 clients that send and receive data (16 socket connections that are never closed).
After sending and receiving data for a while, I get the following error message:
----------------------------------------------
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
0 0000000000000000 0000000000000000
QUEUESYNCAGENT ? ?
PTHREAD$RTL 0 0000000000055E8C 000000007BD3FE8C
PTHREAD$RTL 0 0000000000042C90 000000007BD2CC90
0 0000000000000000 0000000000000000
PTHREAD$RTL ? ?
0 FFFFFFFF80275F14 FFFFFFFF80275F14
%CMA-F-EXIT_THREAD, current thread has been requested to exit
----------------------------------------------
This error generates two dump files, as follows:
1. QueueSyncAgent.DMP;1

%DEBUG-I-AIMGMISMATCH, the image file DISK$DATA1:[USER.JIMSANTSAI.MQ]QUEUESYNCAGENT.DSF;1 does not appear to match with the target (or dumpfile) image
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=0000001B
break on unhandled exception at 0 in THREAD 3
image name set base address end address

CMA$TIS_SHR no 000000007B95A000 000000007B96BFFF
DECC$SHR_EV56 no 000000007C03C000 000000007C0D1FFF
DPML$SHR no 000000007BCB8000 000000007BCFDFFF
LIBOTS no 000000007B63E000 000000007B645FFF
LIBRTL no 000000007B5EC000 000000007B63DFFF
PTHREAD$RTL no 000000007BCFE000 000000007BD69FFF
*QUEUESYNCAGENT yes 0000000000010000 00000000000503FF
TCPIP$ACCESS_SHR no 000000000094E000 00000000009EE9FF
TCPIP$IPC_SHR no 00000000008CC000 000000000094D1FF
UCX$IPC_SHR no 00000000009F0000 0000000000A711FF

total images: 10
Thread Name State Substate Policy Pri
------ ------------------------- --------------- ----------- ------------ ---
1 default thread blocked join 2 SCHED_OTHER 11
2 ready gbl $hiber SCHED_OTHER 11
3 running V3 SCHED_OTHER 11
4 ready gbl SCHED_OTHER 11
5 ready gbl $setast SCHED_OTHER 11
6 blocked $synch 64 SCHED_OTHER 11
7 running V2 SCHED_OTHER 11
8 ready gbl $hiber SCHED_OTHER 11
9 ready gbl $hiber SCHED_OTHER 11
10 ready gbl $hiber SCHED_OTHER 11
11 ready gbl $hiber SCHED_OTHER 11
12 ready gbl $hiber SCHED_OTHER 11
13 ready gbl $setast SCHED_OTHER 11
14 ready gbl ims 160006 SCHED_OTHER 11
15 ready gbl $setast SCHED_OTHER 11
16 ready gbl $setast SCHED_OTHER 11
17 ready gbl $setast SCHED_OTHER 11
module name routine name line rel PC abs PC
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
*QUEUESYNCAGENT QueueSyncThread ? ?
SHARE$PTHREAD$RTL_DATA0 0000000000025E8C 000000007BD3FE8C
SHARE$PTHREAD$RTL_DATA0 0000000000012C90 000000007BD2CC90
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
SHARE$PTHREAD$RTL_DATA0 ? ?
FFFFFFFF80275F14 FFFFFFFF80275F14
---------------------------------------------

2. QueueSyncAgent.DMP;1

%DEBUG-I-AIMGMISMATCH, the image file DISK$DATA1:[USER.JIMSANTSAI.MQ]QUEUESYNCAGENT.DSF;1 does not appear to match with the target (or dumpfile) image
%CMA-F-NOMSG, Message number 004081A4
break on unhandled exception preceding SHARE$PTHREAD$RTL_DATA3+76648 in THREAD 3
image name set base address end address

CMA$TIS_SHR no 000000007B95A000 000000007B96BFFF
DBGTBKMSG no 00000000081BA000 00000000081C79FF
DECC$MSG no 000000000819E000 00000000081A1FFF
DECC$SHR_EV56 no 000000007C03C000 000000007C0D1FFF
DPML$SHR no 000000007BCB8000 000000007BCFDFFF
LIBOTS no 000000007B63E000 000000007B645FFF
LIBRTL no 000000007B5EC000 000000007B63DFFF
PTHREAD$RTL no 000000007BCFE000 000000007BD69FFF
*QUEUESYNCAGENT yes 0000000000010000 00000000000503FF
SHRIMGMSG no 0000000008196000 000000000819C9FF
TCPIP$ACCESS_SHR no 000000000094E000 00000000009EE9FF
TCPIP$IPC_SHR no 00000000008CC000 000000000094D1FF
TCPIP$MSG no 00000000081A2000 00000000081B81FF
TRACE no 000000007B976000 000000007BA77FFF
UCX$IPC_SHR no 00000000009F0000 0000000000A711FF

total images: 15
Thread Name State Substate Policy Pri
------ ------------------------- --------------- ----------- ------------ ---
1 default thread blocked join 2 SCHED_OTHER 11
2 blocked $hiber SCHED_OTHER 11
3 running V2 SCHED_OTHER 11
4 blocked $hiber SCHED_OTHER 11
5 blocked $hiber SCHED_OTHER 11
6 blocked $hiber SCHED_OTHER 11
7 blocked $hiber SCHED_OTHER 11
8 blocked $hiber SCHED_OTHER 11
9 blocked $hiber SCHED_OTHER 11
10 blocked $hiber SCHED_OTHER 11
11 blocked $hiber SCHED_OTHER 11
12 blocked $hiber SCHED_OTHER 11
13 blocked $hiber SCHED_OTHER 11
14 blocked $hiber SCHED_OTHER 11
15 blocked $hiber SCHED_OTHER 11
16 blocked $hiber SCHED_OTHER 11
17 blocked $hiber SCHED_OTHER 11
module name routine name line rel PC abs PC
SHARE$PTHREAD$RTL_DATA0 0000000000012B68 000000007BD2CB68
SHARE$PTHREAD$RTL_DATA0 000000000003C698 000000007BD56698
FFFFFFFF80165084 FFFFFFFF80165084
FFFFFFFF8027521C FFFFFFFF8027521C
SHARE$TRACE_DATA1 00000000000002CC 000000007B9962CC
FFFFFFFF802761A0 FFFFFFFF802761A0
----- above condition handler called with exception 0000000C:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=0000001B
----- end of exception message
FFFFFFFF800A60BC FFFFFFFF800A60BC
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
*QUEUESYNCAGENT QueueSyncThread ? ?
SHARE$PTHREAD$RTL_DATA0 0000000000025E8C 000000007BD3FE8C
SHARE$PTHREAD$RTL_DATA0 0000000000012C90 000000007BD2CC90
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
SHARE$PTHREAD$RTL_DATA0 ? ?
FFFFFFFF80275F14 FFFFFFFF80275F14
---------------------------------------------
If you have any suggestions, thanks a lot.

11 REPLIES
JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Sirs,
When I change the LINK qualifier as follows:
LINK/THREADS_ENABLE => LINK/THREADS_ENABLE=UPCALLS
the problem no longer seems to occur. The server has two CPUs. I don't know the root cause. If you have any suggestions, please let me know. Thanks a lot.
Hoff
Honored Contributor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Your code appears to have one or more bugs; stuff on the stack appears to be getting zeroed. I'd guess that enabling the upcalls option merely moves things around in memory; that the bug is still latent here.

Some research into the trigger for the AIMGMISMATCH is warranted.

These things are not going to get resolved here; only through debugging. Through fenceposts / canaries. Through application-integrated logging. These and all the steps and tools and techniques that are necessary when maintaining large applications.

You would be wrong to assume that ACCVIOs in system routines are not your problem, too. It's quite possible and even rather common to trigger these reports through application errors.
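
The fencepost / canary idea mentioned above can be sketched roughly as follows. This is a minimal illustration only, not code from the application in question: the names `guarded_buf`, `guarded_init`, `guarded_ok`, `guarded_check`, the guard pattern, and the buffer size are all assumptions made for the example. A known value is placed on each side of a buffer and checked periodically; if either value has changed, something wrote outside the buffer.

```c
#include <assert.h>
#include <string.h>

#define GUARD_VALUE 0xDEADBEEFu   /* arbitrary pattern, chosen for this sketch */
#define BUF_SIZE    64            /* hypothetical buffer size */

/* A buffer fenced by a guard word on each side. */
typedef struct {
    unsigned int head_guard;
    char         data[BUF_SIZE];
    unsigned int tail_guard;
} guarded_buf;

/* Set both guards and clear the payload. */
void guarded_init(guarded_buf *b)
{
    b->head_guard = GUARD_VALUE;
    memset(b->data, 0, sizeof b->data);
    b->tail_guard = GUARD_VALUE;
}

/* Returns 1 while both guards are intact, 0 once either was overwritten. */
int guarded_ok(const guarded_buf *b)
{
    return b->head_guard == GUARD_VALUE && b->tail_guard == GUARD_VALUE;
}

/* Stop immediately, close to the damage, instead of failing hours later. */
void guarded_check(const guarded_buf *b)
{
    assert(guarded_ok(b));
}
```

Calling `guarded_check()` before and after each send, receive, and global-section access narrows down which step does the overwriting, at the cost of a few comparisons per loop.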

H.Becker
Honored Contributor

Re: Use Pthread[CMA-F-EXIT_THREAD]

>>>
LINK/THREADS_ENABLE => LINK/THREADS_ENABLE=UPCALLS
<<<

This actually disables multiple kernel threads.
JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Hoff & Becker,
Thanks for your advice. Let me explain my program logic:

Step 1. Initialize the pthread attribute.
nRetNumber=pthread_attr_init(&attr);

Step 2. Set the stack size attribute to 8192000 bytes.
nRetNumber=pthread_attr_setstacksize(&attr,DEF_PER_THREAD_STACK_SIZE);

Step 3. Create the thread.
nRetNumber=pthread_create(&QSyncpth, &attr, QueueSyncThread, (void*)pstthreadparam);
-----------------------------------------
QueueSyncThread logic:
3.1 Connect a socket to another server
3.2 Send data to the other server
3.3 Receive data from the other server
3.4 sleep(1)
Loop over 3.2 to 3.4
-----------------------------------------
Steps 1 to 3 are repeated in a loop to create 16 threads.

Step 4. Join the thread.
nRetNumber=pthread_join(QSyncpth,NULL);

For example, pstthreadparam carries a number from 1 to 16. When I link with LINK/THREADS_ENABLE=MULTIPLE_KERNEL_THREADS, every thread sees pstthreadparam equal to 16. After I changed it to UPCALLS, that never happened again.
I don't know the root cause, but the program has not crashed since 8/12 (when I switched to UPCALLS). Also, each thread maps the global sections frequently.
Any advice? Thanks.
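
One plausible reading of the symptom described above (every thread seeing 16) is that all 16 pthread_create calls receive a pointer to the same parameter block, which the creating loop keeps overwriting before the threads read it. Step 4 as written also joins only the last thread ID stored in QSyncpth. The following is a minimal sketch of giving each thread its own parameter slot and joining every thread, assuming that reading is correct. The names `worker_param`, `worker`, and `run_workers` are hypothetical and do not come from the original program.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 16

/* Hypothetical parameter block; the real one carries different fields. */
typedef struct {
    int id;        /* 1..NUM_THREADS, set by the creator */
    int seen_id;   /* written by the thread, read after the join */
} worker_param;

static void *worker(void *arg)
{
    worker_param *p = (worker_param *)arg;
    assert(p != NULL);
    p->seen_id = p->id;   /* each thread touches only its own slot */
    return NULL;
}

/* Starts NUM_THREADS workers, joins all of them, returns 0 on success. */
int run_workers(worker_param params[NUM_THREADS])
{
    pthread_t tids[NUM_THREADS];   /* one ID per thread, not one reused variable */
    int i, started = 0, rc = 0;

    for (i = 0; i < NUM_THREADS; i++) {
        params[i].id = i + 1;
        params[i].seen_id = 0;
        /* &params[i] stays valid and unchanged for the life of thread i */
        rc = pthread_create(&tids[i], NULL, worker, &params[i]);
        if (rc != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        int jrc = pthread_join(tids[i], NULL);
        if (rc == 0)
            rc = jrc;
    }
    return rc;
}
```

The caller passes an array that outlives the threads (for example a static array in main). If the original loop instead reuses one structure, the value each thread sees depends on scheduling, which would be consistent with the behavior changing between the two /THREADS_ENABLE settings.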
Hoff
Honored Contributor
Solution

Re: Use Pthread[CMA-F-EXIT_THREAD]

If you're not already familiar with the OpenVMS Debugger, you'll want to skim the debugger manual.

As for some previous articles related to the access violation (ACCVIO) error...

http://labs.hoffmanlabs.com/node/800
http://labs.hoffmanlabs.com/node/401
http://labs.hoffmanlabs.com/node/848

You've threads involved here, which makes the whole application and the debugging sequence here rather more complicated.

First step? Find the area of the code that is involved with the error, and work toward the error. Use (program) the debugger, or (add?) use integrated logging, or both.

It's exceedingly unlikely folks will be able to help you with this application debugging here in ITRC, except with fairly general suggestions.

JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Hoff,
Thanks for your advice. I'll study the references and try to find the bugs in my application.
JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Sirs,
I added the following code to catch signals:
-------------------------------------
for(i=1;i<=NSIG;i++)
{
    signal(i, abortTheProcessWhenAnExceptionOccurs);
}
-------------------------------------
and I got the following dump file:

DBG> sh call
module name routine name line rel PC abs PC
SHARE$PTHREAD$RTL_DATA0 0000000000012B68 000000007BD2CB68
SHARE$DECC$SHR_EV56_DATA1 00000000001B73F4 FFFFFFFF80B5F3F4
SHARE$DECC$SHR_EV56_DATA1 00000000001B70BC FFFFFFFF80B5F0BC
FFFFFFFF8017570C FFFFFFFF8017570C
FFFFFFFF8017570C FFFFFFFF8017570C
----- above condition handler called with exception 0000000C:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=0000001B
----- end of exception message
FFFFFFFF800A60BC FFFFFFFF800A60BC
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
*QUEUESYNCAGENT QueueSyncThread ? ?
SHARE$PTHREAD$RTL_DATA0 0000000000025E8C 000000007BD3FE8C
SHARE$PTHREAD$RTL_DATA0 0000000000012C90 000000007BD2CC90
0000000000000000 0000000000000000
----- the above looks like a null frame in the same scope as the frame below
SHARE$PTHREAD$RTL_DATA0 ? ?
FFFFFFFF80275F14 FFFFFFFF80275F14


The application runs for about 8 to 10 hours, then the exception (SIGBUS) occurs and one of the threads becomes unusable.

Here are my CC and LINK commands:
CC:
cc/nowarning/check SRC$DIR:shmqinc_l200 /obj=OBJ$DIR:shmqinc_l200.obj
cc/nowarning SRC$DIR:log4c.c /obj=OBJ$DIR:log4c.obj
cc/nowarning SRC$DIR:Readini.c /obj=OBJ$DIR:Readini.obj
cc/nowarning SRC$DIR:SocketAPI.c /obj=OBJ$DIR:SocketAPI.obj
cc/debug/nowarning/reentrancy=multithread/list/machine/check AUO$MQ$SRC$DIR:QueueSyncAgent /obj=OBJ$DIR:QueueSyncAgent.obj

LINK:
link/debug/threads_enable=UPCALLS/map/full/cross /exe=HOME$DIR:QueueSyncAgent.exe OBJ$DIR:QueueSyncAgent, OBJ$DIR:log4c, OBJ$DIR:shmqinc_l200, OBJ$DIR:SocketAPI, OBJ$DIR:Readini
Hoff
Honored Contributor

Re: Use Pthread[CMA-F-EXIT_THREAD]

I generally catch exceptions with the debugger; you can use SET BREAK /EXCEPTION for that. If the code is tossing exceptions around normally, then you can either program the debugger to ignore the "normal" ones, or you can insert a signal of SS$_DEBUG into the code in a handler such as yours.

Now that you've located the area of the code that is involved with the error, start looking at what happened. Look at variables. Look at the context. From here, look "upstream" in the code. Find some code upstream of the trigger, set a breakpoint there, and work forward.

Given the length of time involved in this application run before the error, I tend to use integrated logging here. Looking at key variables at run-time, and logging progress and details. This can sometimes help spot the trigger, or isolate the trigger. (I've had a few of these cases over the years.)
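
As a rough illustration of the integrated logging described above, here is a minimal sketch of a log routine that several threads can call safely. It is not taken from the application; the name `log_line`, the record layout, and the use of a per-thread number are assumptions for the example, and `vsnprintf` is assumed to be available in the C run-time library in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* One lock for the log stream so records from different threads never interleave. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writes "<epoch seconds> [thread N] message" to out.
   Returns the fprintf result (negative on error). */
int log_line(FILE *out, int thread_no, const char *fmt, ...)
{
    char msg[256];
    size_t len;
    va_list ap;
    int rc;

    assert(out != NULL && fmt != NULL);

    /* Format outside the lock; msg is local to the calling thread. */
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);

    /* Drop a trailing newline so each record stays on one line. */
    len = strlen(msg);
    if (len > 0 && msg[len - 1] == '\n')
        msg[len - 1] = '\0';

    pthread_mutex_lock(&log_lock);
    rc = fprintf(out, "%ld [thread %d] %s\n", (long)time(NULL), thread_no, msg);
    fflush(out);   /* flush so the last records survive a crash */
    pthread_mutex_unlock(&log_lock);
    return rc;
}
```

A call such as `log_line(logfile, 3, "sent %d bytes", n)` at each step of the send/receive loop leaves a trail showing what each thread was doing just before the failure.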

Depending on what the code is doing and how long this code is running, also look to implement some form of application checkpoint-restart, too.

These cases are not fun to debug. These are not easy to debug. And that's with the source code access, and with access to the debugger and tools. (This code looks to add interprocess communications into the mix, which adds more complexity, and more exposure to corner cases.) For us folks out here in ITRC-land, questions such as this are basically impossible to help you with in anything other than generalities.

Compiler warnings are your friends.

I would encourage switching from CC /NOWARNINGS (or from its rough analog CC /STANDARD=VAXC) to at least CC /WARNINGS, and for complex, hairy, or buggy code (or where portability is required) enabling added warnings with CC /WARNINGS=ENABLE=QUESTCODE during code debug and testing. Fix the warnings reported by the compiler, and don't suppress them. (Debugging code with warnings is arguably premature. Take the time to fix the warnings first.)

The SIGBUS error is the C representation of what OpenVMS calls an ACCVIO.

Or, if you're (really) stuck, talk to somebody else on the local engineering staff that has strong debug skills ("team programming" or whatever the kids call it now), or talk with your manager about this case and what to do. That might be training, it might involve refactoring the code, or it might involve bringing in somebody that can help with the debug here. But I'd start by fixing the warnings.
JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Hoff,
Thanks for your advice. I changed the CC command to "cc/WARNINGS=ENABLE=QUESTCODE" and fixed the warnings. After about 4.5 hours, the application caught a SIGILL (4) signal, with the following message:
--------------------------------------
Catch Signal=[4]
%CMA-F-EXIT_THREAD, current thread has been requested to exit
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
PTHREAD$RTL 0 0000000000042B68 000000007BD2CB68
DECC$SHR_EV56 0 00000000001B73F4 FFFFFFFF80B5F3F4
DECC$SHR_EV56 0 00000000001B70BC FFFFFFFF80B5F0BC
0 FFFFFFFF8017570C FFFFFFFF8017570C
0 FFFFFFFF8017570C FFFFFFFF8017570C
----- above condition handler called with exception 00000454:
%SYSTEM-F-ROPRAND, reserved operand fault at PC=FFFFFFFF8015B074, PS=0000001B
----- end of exception message
0 FFFFFFFF800A60BC FFFFFFFF800A60BC
0 0000000000000000 FFFFFFFF8015B074
QUEUESYNCAGENT QUEUESYNCAGENT QueueSyncThread
50718 00000000000015BC 00000000000215BC
PTHREAD$RTL 0 0000000000055E8C 000000007BD3FE8C
PTHREAD$RTL 0 0000000000042C90 000000007BD2CC90
0 0000000000000000 0000000000000000
PTHREAD$RTL ? ?
0 FFFFFFFF80275F14 FFFFFFFF80275F14

The source code of QueueSyncThread line 50718 is:
4 50718 memset(&szSendSocketBuf, DEFINE_NULL, sizeof(szSendSocketBuf));

Can I ignore the SIGILL signal?
Hoff
Honored Contributor

Re: Use Pthread[CMA-F-EXIT_THREAD]

What is the declaration of szSendSocketBuf here?

memset(&szSendSocketBuf, DEFINE_NULL, sizeof(szSendSocketBuf));

I can guess what DEFINE_NULL is, but do confirm that, too.

This memset statement may be an innocent bystander, too; the fault could be upstream.

I'd look to examine the values here, and see if I can find some code upstream to set watchpoints or to program the debugger to detect (the debugger supports conditionals) and break on the run-up to the error.

If the buffer is shared among threads, it's possible there's a collision. Interlocking among threads is required; it seems reasonable to expect untoward application behavior when a memset goes flying past when some other thread is busy reading the buffer, for instance.
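
If the buffer does turn out to be shared, the interlocking mentioned above could look roughly like this. It is a minimal sketch using a POSIX mutex; the names `shared_buf`, `shared_init`, `shared_set`, and `shared_get`, and the buffer length, are hypothetical and not part of the application under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define SHARED_LEN 128   /* hypothetical size for the sketch */

/* A buffer that travels together with the lock protecting it. */
typedef struct {
    pthread_mutex_t lock;
    char            data[SHARED_LEN];
} shared_buf;

/* Returns 0 on success, like pthread_mutex_init. */
int shared_init(shared_buf *b)
{
    memset(b->data, 0, sizeof b->data);
    return pthread_mutex_init(&b->lock, NULL);
}

/* Clears and refills the buffer; readers never see the half-cleared state. */
void shared_set(shared_buf *b, const char *text)
{
    pthread_mutex_lock(&b->lock);
    memset(b->data, 0, sizeof b->data);
    strncpy(b->data, text, sizeof b->data - 1);   /* last byte stays 0 */
    pthread_mutex_unlock(&b->lock);
}

/* Copies a consistent snapshot into out, always NUL-terminated. */
void shared_get(shared_buf *b, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    pthread_mutex_lock(&b->lock);
    strncpy(out, b->data, out_size - 1);
    out[out_size - 1] = '\0';
    pthread_mutex_unlock(&b->lock);
}
```

The point of the design is that the memset and the refill happen under the same lock the readers take, so no reader can observe the buffer between the two.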

I prefer to honor errors and signals (and compiler warnings). My general preference here (and unless I have specific reasons to the contrary) is to catch an (unexpected) error and to exit the application with a diagnostic. I've worked with more than a few cases where continuing after an error has shown a nasty habit of contributing to an error avalanche, or to triggering secondary and more subtle errors. Employing "catch and release" programming with errors is certainly possible, but requires great care.
JimsanTsai
Advisor

Re: Use Pthread[CMA-F-EXIT_THREAD]

Dear Hoff,
#define DEF_SEND_MSG_LENGTH 64000
char szSendSocketBuf[DEF_SEND_MSG_LENGTH+1]; /*Send Buffer(Socket)*/

stThreadParam is the only global variable in the application, and 16 QueueSyncThread threads run concurrently.
The thread structure is as follows:
----------------------------------------
typedef struct
{
    char szStage[10];
    char szGlobalSectionName[10];
}stThreadParam;

void* QueueSyncThread(void* pstthparam)
{
    char szSendSocketBuf[DEF_SEND_MSG_LENGTH+1];
    ...(other variables)

    /*get pstthparam value to local variable Process*/
    /*Read Ini Process*/
    /*Socket Connect to another Host*/

    /*Initial Variable(memset)*/

    while(1)
    {
        memset(&szSendSocketBuf, DEFINE_NULL, sizeof(szSendSocketBuf));
        ...(initial other variables)

        /*mapping global section1*/
        /*mapping global section2*/

        /*socket send*/
        /*socket recv*/

        /*mapping global section3*/
        /*mapping global section1*/
    }
    pthread_exit(NULL);
}
----------------------------------------
I don't use a mutex in the threads (all 16 threads do the same things, but map different global sections).
Are all the variables in QueueSyncThread shared among the threads?
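
As a general rule in C with POSIX threads, automatic (local) variables declared inside the thread routine live on each thread's own stack, so every thread gets its own copy. File-scope and static variables, and anything reached through a shared pointer (the thread argument, or a mapped global section), are shared. The sketch below illustrates the difference; the names `count_thread`, `run_demo`, and `shared_total` and the thread and step counts are assumptions made for the example, not part of QueueSyncThread.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N_THREADS 4
#define N_STEPS   1000

static int shared_total = 0;   /* file scope: one copy seen by every thread */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count_thread(void *arg)
{
    int *result = (int *)arg;
    char local_buf[32];        /* automatic: a separate copy on each thread's stack */
    int local_count = 0;       /* automatic: private to this thread */
    int i;

    assert(result != NULL);
    for (i = 0; i < N_STEPS; i++) {
        memset(local_buf, 0, sizeof local_buf);   /* private, so no lock needed */
        local_count++;
        pthread_mutex_lock(&total_lock);          /* shared, so lock it */
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    *result = local_count;     /* each thread writes only its own result slot */
    return NULL;
}

/* Runs N_THREADS counters; returns the shared total, or -1 on error. */
int run_demo(int results[N_THREADS])
{
    pthread_t tids[N_THREADS];
    int i, started = 0, failed = 0;

    shared_total = 0;
    for (i = 0; i < N_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, count_thread, &results[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return failed ? -1 : shared_total;
}
```

By this rule a buffer declared inside the thread routine, such as a local send buffer, is not shared between threads; the parameter block passed in through the argument pointer and the mapped global sections are.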