
Problem with application that use PPL (%PPL-W-SYSERROR)

 

Hi guys,
I have an application that uses PPL (the Parallel Processing Library, shared image PPLRTL.EXE).
This application runs on every node of our OpenVMS cluster (3 nodes in total).
It worked fine until today. This morning our sysadmins restarted one node, and the application failed to start on it. After that the next node went down and the same thing happened: the application couldn't start.
At the moment the application is running on only one node (that instance was started two weeks ago).
To isolate the problem I tried to reproduce it with two simple applications that communicate using PPL.
The first application (master) produced this log:
OMNITM $ r ppl_send1
%PPL-W-SYSERROR, system service error
-SYSTEM-W-VALNOTVALID, value block is not valid
%TRACE-W-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
PPLRTL 0 00000000000338FC 00000000000858FC
PPLRTL 0 0000000000033B20 0000000000085B20
PPLRTL 0 0000000000029D50 000000000007BD50
PPLRTL 0 0000000000029C40 000000000007BC40
PPLRTL 0 0000000000029AD0 000000000007BAD0
PPL_SEND1 PPL_SEND1 main 8027 0000000000000128 0000000000030128
PPL_SEND1 PPL_SEND1 __main 0 0000000000000068 0000000000030068
0 FFFFFFFF8029163C FFFFFFFF8029163C
%PPL-W-SYSERROR, system service error
-SYSTEM-E-DEADLOCK, deadlock detected
%TRACE-W-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
PPLRTL 0 00000000000338FC 00000000000858FC
PPLRTL 0 0000000000033FCC 0000000000085FCC
PPLRTL 0 000000000002A094 000000000007C094
PPLRTL 0 0000000000029DEC 000000000007BDEC
PPLRTL 0 0000000000029C40 000000000007BC40
PPLRTL 0 0000000000029AD0 000000000007BAD0
PPL_SEND1 PPL_SEND1 main 8027 0000000000000128 0000000000030128
PPL_SEND1 PPL_SEND1 __main 0 0000000000000068 0000000000030068
0 FFFFFFFF8029163C FFFFFFFF8029163C
%PPL-W-SYSERROR, system service error
-SYSTEM-E-DEADLOCK, deadlock detected
%TRACE-W-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
PPLRTL 0 00000000000338FC 00000000000858FC
PPLRTL 0 0000000000033FCC 0000000000085FCC
PPLRTL 0 00000000000314B8 00000000000834B8
PPLRTL 0 000000000002A834 000000000007C834
PPLRTL 0 000000000002A6E4 000000000007C6E4
PPLRTL 0 000000000002A200 000000000007C200
PPLRTL 0 0000000000029DEC 000000000007BDEC
PPLRTL 0 0000000000029C40 000000000007BC40
PPLRTL 0 0000000000029AD0 000000000007BAD0
PPL_SEND1 PPL_SEND1 main 8027 0000000000000128 0000000000030128
PPL_SEND1 PPL_SEND1 __main 0 0000000000000068 0000000000030068
0 FFFFFFFF8029163C FFFFFFFF8029163C
Interrupt

OMNITM $


Our sysadmins say that nothing has changed since the application was last started.
Please help me! Why does every application that uses PPL no longer work?

Sorry for the very long story.
Thanks in advance.
Sergejus
Ian Miller.
Honored Contributor

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

I notice the SYSTEM-W-VALNOTVALID and wonder if there is a cluster wide lock connected with your application or PPL which has a value block. I also note the SYSTEM-E-DEADLOCK errors which also suggest a lock problem.

Clearing all the locks may need a cluster wide application shutdown - an unpopular option I expect.
____________________
Purely Personal Opinion

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Thanks, Ian.
Sorry, I assigned only 5 points for your advice, because we can't stop the last server instance that is still running. It is a business-critical application, so we can't stop it now. Maybe we will do that at night.
Again, thank you very much for your reply!

Sergejus
Ian Miller.
Honored Contributor

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Sergejus, can you investigate the locks held by the currently running app?

ANAL/SYS

SHOW PROC/ID=xx/LOCK
____________________
Purely Personal Opinion

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Thanks once more, Ian.
Neither I nor our admins have used SDA before. We executed this command, but we can't interpret the results; it seems that everything is OK with the locks created by our process. Could you please tell us which lock states or other information we should look for in the SDA output?
Many thanks.
Sergejus
John Gillings
Honored Contributor

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Sergejus,

I agree with Ian. You have an instance of the application running on one node in the cluster that is holding locks in a state that is incompatible with starting another instance.

Note that the PPL library is no longer supported (and hasn't been for about 10 years!) Its functionality has been replaced by threads (whatever the marketing name is today, DECthreads, pthreads, etc...).

I've seen a few previous reports of this type of error for PPL applications. Unfortunately no one ever did any real analysis, they just rebooted nodes until the application started.

One thing you might want to try (long shot): the VALNOTVALID is probably being signalled from PPL$CREATE_APPLICATION. It's -W- (Warning), so you can continue. Try calling PPL$CREATE_APPLICATION a second time; it *might* succeed. Your code should probably have an exception handler to detect the signals.
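
As a rough illustration of the "retry on warning" idea, here is a minimal, portable C sketch. It relies on one documented fact: the low three bits of an OpenVMS condition value encode its severity (0 = warning, 1 = success, 2 = error, 3 = informational, 4 = severe). Everything else is an assumption made for the example: the DEMO_* constants are placeholders chosen only so their severity bits match the -W- and -E- in the log above, create_fn and the stub routines are hypothetical stand-ins for the real PPL call, and the sketch assumes the status comes back as a return value rather than being signalled.

```c
#include <assert.h>
#include <stdint.h>

/* Severity codes held in the low three bits of a condition value. */
enum severity {
    SEV_WARNING = 0,
    SEV_SUCCESS = 1,
    SEV_ERROR   = 2,
    SEV_INFO    = 3,
    SEV_SEVERE  = 4
};

/* ASSUMPTION: placeholder condition values for this demo only.
 * Real code would take the values from the system definitions. */
#define DEMO_NORMAL      0x0001u  /* severity bits = 1 (success) */
#define DEMO_VALNOTVALID 0x09F0u  /* severity bits = 0 (warning) */
#define DEMO_DEADLOCK    0x0E0Au  /* severity bits = 2 (error)   */

/* Extract the severity field from a condition value. */
static unsigned severity_of(uint32_t status)
{
    return status & 7u;
}

/* Hypothetical stand-in for the routine that creates the application. */
typedef uint32_t (*create_fn)(void *ctx);

/* Call create() up to max_attempts times. Retry only while the result is
 * a warning; success and real errors are returned immediately. */
static uint32_t create_with_retry(create_fn create, void *ctx, int max_attempts)
{
    uint32_t status = 0;

    assert(create != 0 && max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        status = create(ctx);
        if (severity_of(status) != SEV_WARNING)
            break;
    }
    return status;
}

/* Hypothetical stub: warning on the first call, success afterwards. */
static uint32_t stub_warn_once(void *ctx)
{
    int *calls = ctx;
    return (*calls)++ == 0 ? DEMO_VALNOTVALID : DEMO_NORMAL;
}

/* Hypothetical stub: always fails with an error-severity status. */
static uint32_t stub_deadlock(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return DEMO_DEADLOCK;
}
```

With stub_warn_once and a limit of two attempts, the helper makes two calls and returns the success value. With stub_deadlock it stops after the first call, because an error is not worth retrying blindly. Whether a second PPL$CREATE_APPLICATION call really succeeds in the situation described here remains a long shot.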

If you don't know how to interpret lock traces, I recommend you log a case with your local customer support centre and have a specialist help you. This is not something that can be dealt with in this type of forum.

If it's not possible to log a case then the quickest "solution" is probably a full cluster reboot (as VMS people, we don't often stoop to MS style solutions, and I hesitate to mention it as there are almost certainly alternatives, but in this case it's pretty much guaranteed to work).

Long term I'd be recommending you migrate your code from PPL to threads.
A crucible of informative mistakes
Ian Miller.
Honored Contributor

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

wouldn't a cluster wide shutdown of the application clear the locks?
____________________
Purely Personal Opinion
John Gillings
Honored Contributor
Solution

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Ian,

>wouldn't a cluster wide shutdown of the
>application clear the locks?

Yes, it should, assuming user mode locks. I'm fairly sure PPL is strictly user mode. But, if there are system locks, and shutting down the application doesn't help, then a cluster reboot will be required.
A crucible of informative mistakes

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Ian, John - thanks for your help.
Last night the last running instance of the application was stopped. After that the application was started on all nodes (3 nodes) without any problem. No node or cluster reboot was needed. So everything completed successfully.
Now I would like to say a few words about PPL. I know that it is no longer supported, but I find it very convenient for developing software that needs interprocess communication. I also use pthreads, but as I understand it, pthreads can't work across different processes; they operate within a single process. Of course I could use lib$insqhi/ti and global sections, but that is already implemented in PPL. So these are the main reasons why I am using PPL.

My best regards
Sergejus
Ian Miller.
Honored Contributor

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

Have a look at the VMS Programming Concepts Manual
http://h71000.www7.hp.com/doc/82FINAL/5841/5841PRO.HTML

It gives a good overview of the many wonderful things in VMS, including interprocess communication methods and data sharing.

I think not enough people read it.
____________________
Purely Personal Opinion

Re: Problem with application that use PPL (%PPL-W-SYSERROR)

I have discovered a solution to a problem.