Operating System - OpenVMS

Re: Error activating image ...

H.Becker
Honored Contributor

Re: Error activating image ...

> What is lib$fis?
An abbreviation for LIB$FIND_IMAGE_SYMBOL

>What do you mean with "install it /share=addr"? Install the image "DISPO_RETOUR" or install the shared library TPSHR?

The shareable image, TPSHR (and all shareable images it depends on, unless they are already installed that way, like DECC$SHR and friends). The idea here is to let INSTALL do all the relocation and fixup work at install time. Later, at activation time, there is no need to read the dynamic segment from the disk. This may be a workaround; it is not a solution. It may work, but then other errors may show up.

It looks like reading the dynamic segment from the disk at activation time is a problem. However, part of the image file has already been read by then: the ELF tables, which tell the image activator where in the file the dynamic segment is. That seems to work. And before relocations are applied, some other image segments have been mapped, but we don't know whether their contents are correct. Relocations are applied without much double-checking. But I may be on the wrong track anyway.
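
A rough sketch of the INSTALL workaround described above. The device and directory are taken from the error message quoted in this thread; the exact choice of qualifiers is an assumption, and the same step would have to be repeated for every shareable image TPSHR depends on that is not already installed this way.

```dcl
$ ! Sketch only. Run from a suitably privileged account (INSTALL needs CMKRNL).
$ ! /SHARED=ADDRESS_DATA asks INSTALL to assign addresses and resolve the
$ ! address data at install time instead of at activation time.
$ INSTALL ADD DSA0:[TPLIB]TPSHR.EXE /OPEN /SHARED=ADDRESS_DATA
$ ! Show how the image is now known to the system.
$ INSTALL LIST DSA0:[TPLIB]TPSHR.EXE /FULL
```

If the image is already installed, INSTALL REPLACE with the same qualifiers would be the step to use instead of INSTALL ADD.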

> There are two other shared libraries SHR.EXE and $TASHR.EXE with the same problem.

Any idea what these images have in common?
Are there more shareable images (without this problem)?

> n times the activation is good - but the next activation goes wrong.

So n+1 fails, and when you re-run the batch job the n+1 succeeds. Is there an n+2, n+3, ... when n+1 fails (but maybe the job can't continue when one program doesn't produce its output). And is n - or the time - always the same for all the observed failures?

> Nobody does any changes at 1:00 or 2:00 in the night.

But what else is going on in the system or on this disk DSA0 at the same time?

> Is there any hint in building shared libraries on IA64?

For the developer it is 99% the same as on Alpha. To me the problem doesn't look like a migration, compiler, linker or image activator problem.
H.Becker
Honored Contributor

Re: Error activating image ...

> The BADIMGOFF stuff was classically exec images; I don't know if that stuff has changed. (Hartmut would.) Are these images execlets or privileged? (The help here points to the potential use of the LOAD_SYS_IMAGES system parameter for cases of BADIMGOFF code involving Alpha. Might be stale help text, too; this is OpenVMS Alpha V8.3 text I'm looking at.)

On I64 the image activator and system loader share the code for applying relocations and fixups. So it is "normal" to have a "LOADER" message in this case.
Hoff
Honored Contributor

Re: Error activating image ...

> So n+1 fails, and when you re-run the batch job the n+1 succeeds. Is there an n+2, n+3, ... when n+1 fails (but maybe the job can't continue when one program doesn't produce its output). And is n - or the time - always the same for all the observed failures?

That behavior is often indicative of a leak. An I/O channel leak. A memory leak. A quota leak. Something. Is there (for instance) a whole pile of DCL I/O channels or logical names or symbols being built up within the context of the batch process?

Can you set up a second process that (for instance) issues SDA or DCL commands for SHOW PROCESS to watch and to log the batch process? (There's a quota-monitoring DCL tool around that would be a good starting point for this widget.) You're particularly looking for any "stuff" that's depleted over time.
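
A minimal sketch of such a watcher, assuming it runs in a second process and is handed the PID of the batch process as P1. The procedure name, log file name and one-minute interval are made up for illustration; the F$GETJPI items report the remaining (not the total) quota, so a leak would show up as a value that keeps shrinking.

```dcl
$ ! WATCH_QUOTA.COM (hypothetical name). P1 = PID of the batch process.
$ ! WORLD privilege is needed to look at a process owned by another user.
$ pid = P1
$ OPEN/WRITE log SYS$SCRATCH:WATCH_QUOTA.LOG
$ ! F$GETJPI signals an error once the process is gone; use that to stop.
$ ON WARNING THEN GOTO done
$loop:
$ WRITE log F$TIME(), -
    " FILCNT=", F$GETJPI(pid,"FILCNT"), -
    " BIOCNT=", F$GETJPI(pid,"BIOCNT"), -
    " DIOCNT=", F$GETJPI(pid,"DIOCNT"), -
    " BYTCNT=", F$GETJPI(pid,"BYTCNT"), -
    " ASTCNT=", F$GETJPI(pid,"ASTCNT"), -
    " ENQCNT=", F$GETJPI(pid,"ENQCNT"), -
    " PAGFILCNT=", F$GETJPI(pid,"PAGFILCNT")
$ WAIT 00:01:00
$ GOTO loop
$done:
$ CLOSE log
$ EXIT
```

It could be started with something like `$ @WATCH_QUOTA 2040011C` (the PID here is invented), and the log compared against the time of the next activation failure.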
John McL
Trusted Contributor

Re: Error activating image ...

In a search for clues ...

What other images are activated at the same time as the activation of DISPO_RETOUR?

In particular I'm wondering if there's some other shareable image that might be causing this problem. No offense, but if there's an in-house image among them then I'd be looking very closely at it. And are those VMS shareables up to date?

Also, have all the images (i.e. DISPO_RETOUR and the shareables) been recompiled and relinked on IA64 or have you used a translation tool? I've not used them but generally these tools seem fine for getting things running on IA64 but are not ideal. If you are using them (it?) maybe you've got some really odd code that wasn't handled correctly.

Talking of possibly odd stuff, is any of your code getting deep into VMS, e.g. by linking with the system symbol table?

I'm just casting around looking for any possible source of wrong data or data corruption.
John McL
Trusted Contributor

Re: Error activating image ...

Another long-shot or two ...

The fact that it only happens in the midnight job is intriguing. Could there be something on the system that might be "rolling over" around that time and interferes with your job? I'm thinking mainly of a system-wide logical name TPSHR (or DSA0) being redefined briefly during that rollover.
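
One way to check this would be to record how the names resolve right before the failing step in the batch procedure. This is only a sketch; whether a logical name TPSHR exists at all on that system is an assumption.

```dcl
$ ! Sketch: log the time and the current translations just before
$ ! the step that activates the image.
$ SHOW TIME
$ SHOW LOGICAL /FULL TPSHR
$ SHOW LOGICAL /FULL DSA0
$ WRITE SYS$OUTPUT "TPSHR translates to: ", F$TRNLNM("TPSHR")
$ DISPO_RETOUR SO_AUS_NN
```

If the batch log of a failing run shows a different translation than a good run, that would point at a redefinition around that time.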

Are you running a mixed architecture cluster and somehow accessing an Alpha image when it should be an IA64 image?
John Gillings
Honored Contributor

Re: Error activating image ...

>$ DISPO_RETOUR SO_AUS_NN
>%DCL-W-ACTIMAGE, error activating image TPSHR
>-CLI-E-IMGNAME, image file DSA0:[TPLIB]TPSHR.EXE;5
>-LOADER-E-BADIMGOFF, image offset not within any image section

Is this the complete message, or is it followed by a stack dump? If that's all there is, chances are this is an issue on initial activation. If there's a stack dump, the shareable image may be activated with LIB$FIND_IMAGE_SYMBOL.

It may be instructive to try to reproduce the error with SET WATCH/CLASS=MAJOR to trace the sequence of activations.
A crucible of informative mistakes
Pramod Kumar M
Advisor

Re: Error activating image ...

I wonder why activation is failing only in the case of $ DISPO_RETOUR SO_AUS_NN.

1. Are you running this command on the same system you run the other commands?

2. I hope you don't change the logicals in your script for different DISPO_RETOUR executions?

%DCL-W-ACTIMAGE, error activating image TPSHR indicates that there could be a problem with the shareable images on which TPSHR depends.

Use the following command

$anal/image/segm=all/out=TPSHR_EXE.ANL TPSHR.EXE

Then verify whether the fixups inside TPSHR.EXE are good.

Also, we need to understand whether all the images were recompiled and relinked while porting, or whether there are any translated images. Generally I believe these kinds of issues should not arise with recompiled, relinked images.

-Pramod.


Klaus Heim
Advisor

Re: Error activating image ...

Hoff>We use PASCAL (95-98%). Only a few functions are in C or MACRO. The problem has been reported to HP support, but we have had no help so far.
The shared library routines contain only user-mode functions (no execlets, nothing privileged).
How do we check the quotas? Why would it be a quota problem?

H.Becker>The sequence in the batch job is important. When we re-run it, the batch job completes. Neither n, nor the time, nor anything else is the same.

>But what else is going on in the system or on this disk DSA0 at the same time?
TPSHR.EXE is on DSA0 (system disk). The two other shared libraries are on DSA3. We don't know what is going on on the system at that time. Maybe some cleanup jobs.

>For the developer it is 99% the same as on Alpha.
Why is it not an image activator problem? The activation of a shared library fails (with a curious error message). Without any intervention the next activation is good. That's the point I don't understand.

John McL>All images and all shared libraries are recompiled - no translation tool at all. Odd stuff: okay, TPSHR is linked with the system symbol table. But it is possible to remove this. And the other shared libraries have no odd stuff (I hope).
We use an IA64 cluster with 2 nodes plus 1 quorum node - no mixed architecture.

John Gillings>That's the complete error message. No stack dump, no traceback info. Only the LOADER-E-xxx message and the shared library name are different.
How do I use the SET WATCH command?

Pramod Kumar M>The batch procedure runs on one system. All images are activated in the batch procedure. The logicals are not changed in the batch procedure. Ana/ima is always without error. Everything is recompiled and relinked - no translation tool. When we re-run without any correction everything is ok.

@All>The problem is not only with TPSHR; there are 2 other shared libraries with the same error. The image DISPO_RETOUR is not the only image with this problem. We have had a lot of images with this problem. All images run in batch mode. We got the problem 5 times: 4 times in the midnight jobs and 1 time during the day.
The images are also linked with 2 other shared libraries "without" activation error, but I don't see any similarity. TPSHR and SHR are common libraries. $TASHR contains database transactions. LVS$TASHR.EXE and FHSHR.EXE have so far not been reported by the image activator. These new libraries also contain database transactions or the global database "FileHandler" (FHSHR.EXE).
H.Becker
Honored Contributor

Re: Error activating image ...

> Why is it not an image activator problem? The activation of an shared library is not possible (with an curious error message). Without any intervention the next activation is good. Thats the point I didn't understand.

The image activator is used a lot. I never heard of any such problem. The image activator just reports - more or less - that the data read from the image file is corrupt. But the next time it reads the data it is OK. This doesn't look like a problem in the image activator. Additionally, this data is read-only.

>Ana/ima is always without error.

This is no surprise. The next activation is without any error. A static tool like analyze can't help here. Even if it shows no error, analyze/image is not a tool to verify the image for consistency. It only formats the contents. You may want to format all of the contents, so you need /segm=all /section=all. But a wrong offset would not be caught. An incorrect image relocation type may show.
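
Spelled out in full, the command to format all of the contents might look like the sketch below. The output file name is borrowed from the earlier post and the image location from the error message; as said above, a clean listing does not prove the image is consistent.

```dcl
$ ! Sketch: format every segment and every section of the image into a file.
$ ANALYZE/IMAGE /SEGMENTS=ALL /SECTIONS=ALL -
    /OUTPUT=TPSHR_EXE.ANL DSA0:[TPLIB]TPSHR.EXE
$ ! Browse the listing, e.g. for the relocation and fixup information.
$ TYPE/PAGE TPSHR_EXE.ANL
```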
Hein van den Heuvel
Honored Contributor

Re: Error activating image ...

I like John's suggestion of SET WATCH.
It may be 'noisy', but it could help in case the image activator reports the wrong image name. I believe that used to, and might still, happen when the image activator activates images further down. (Hartmut will correct me if need be :-)

You have to have CMKRNL privs to use it.
$ SET PROC/PRIV=CMKRNL
$ SET WATCH FILE/CLASS=MAJOR
$ ... run your suspect stuff ...
$ SET WATCH FILE/CLASS=NONE


Along the same lines, I might want to try to trace I/Os, either with the XFC or by tossing in the LD driver ( http://www.digiater.nl/lddriver.html )
Specifically check out LD CONNECT /REPLACE

Upon failure, stop activity as best you can and snarf a trace.

The image activator code just works. It works all day, every day, without 'surprise' failures, everywhere and on your site. Until proven otherwise we have to assume it is being presented with incorrect data to work with, or is not handling I/O errors correctly. A file trace or I/O trace _might_ help you find that.

Hartmut, yes, this should not be LIB$FIS, considering it is DCL that's complaining.

Hein