Operating System - OpenVMS

Klaus Heim
Advisor

Error activating image ...

We ported our application from AXP to IA64. In one of our batch procedures we got the following error message:

$ DISPO_RETOUR :== $KLS$SYSTEM:SIN$DISPO_RETOUR
$ DISPO_RETOUR SO_AUS_NN
%DCL-W-ACTIMAGE, error activating image TPSHR
-CLI-E-IMGNAME, image file DSA0:[TPLIB]TPSHR.EXE;5
-LOADER-E-BADIMGOFF, image offset not within any image section

This is a mystery to us, because the image DISPO_RETOUR runs with different input parameters without any error messages. And the shared library TPSHR is activated by nearly every image in the batch procedure, also without any error message.

There are also the following messages:

-LOADER-E-BAD_DYNTBL, image format error - illformed dynamic table
or
-LOADER-E-BADIMGRELA, unexpected rela type for image

The errors occur very rarely, and only in our midnight job, which then terminates abnormally.

34 REPLIES
Hoff
Honored Contributor

Re: Error activating image ...

Check for bogus logical names for the image; make sure you're invoking the image you think you're invoking. (Invoking a DIRECTORY /FILE of the image just before you invoke the image itself is probably a sufficient checksum to determine that.)

I'd also check constituent shareable images, just for completeness.

Check process quotas. When the process quotas are insufficient, the image activation process can be unable to generate decent error messages.

Check that you're current on patches. (There were some bugs in older I64 linkers.) If you're not on V8.3-1H1 with fairly recent UPDATE patches, get there. If you're on anything prior to V8.3, upgrade.

Make sure you're invoking the same image that you've installed, if you have the image installed. (Adding a version number onto an invocation will bypass the installed image processing, and you might be activating a different version of the image depending on how you've specified the invocation.)
H.Becker
Honored Contributor
Solution

Re: Error activating image ...

Hmm, I saw this report in another "forum" some time ago.

Essentially, at the time the image activator needs to relocate or fixup TPSHR, it finds an offset which is not in any image segment (or section if you prefer pre-I64 terminology) of the shareable images activated so far. This offset was read from the "dynamic table". Later, when reading the "dynamic table" the image activator encounters an illegal relocation type. So the image activator reports that something is wrong with or in the "dynamic table".

The dynamic table is part of the image file. It is usually read at activation time and brought into memory. If this "occurs very rarely", something else is going on. So this is very likely no problem in the image activator.
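
To make the two failure conditions described above more concrete, here is a minimal C sketch of that kind of validation. It is not the OpenVMS image activator's code: the `segment` and `reloc` structures, the type-code range, and the function names are all invented for illustration, and the real ELF relocation types differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped image segment. */
struct segment {
    uint64_t base;    /* first image offset covered by the segment */
    uint64_t length;  /* size of the segment in bytes */
};

/* Hypothetical relocation entry as it might be read from a dynamic table. */
struct reloc {
    uint64_t offset;  /* image offset the relocation applies to */
    unsigned type;    /* relocation type code */
};

/* Made-up range of valid type codes; the real ELF relocation types differ. */
enum { RELOC_TYPE_MIN = 1, RELOC_TYPE_MAX = 3 };

enum reloc_status { RELOC_OK, RELOC_BAD_OFFSET, RELOC_BAD_TYPE };

/* Return 1 if `offset` falls inside any of the `count` segments, else 0. */
static int offset_in_any_segment(const struct segment *segs, size_t count,
                                 uint64_t offset)
{
    for (size_t i = 0; i < count; i++) {
        if (offset >= segs[i].base && offset - segs[i].base < segs[i].length)
            return 1;
    }
    return 0;
}

/* Validate one entry: the offset is checked first, then the type code. */
static enum reloc_status check_reloc(const struct segment *segs, size_t count,
                                     const struct reloc *r)
{
    if (!offset_in_any_segment(segs, count, r->offset))
        return RELOC_BAD_OFFSET;
    if (r->type < RELOC_TYPE_MIN || r->type > RELOC_TYPE_MAX)
        return RELOC_BAD_TYPE;
    return RELOC_OK;
}
```

With segments at 0x10000 (length 0x2000) and 0x20000 (length 0x800), an entry at offset 0x30000 comes back as `RELOC_BAD_OFFSET`, which corresponds in spirit to BADIMGOFF, and an entry with an unknown type code comes back as `RELOC_BAD_TYPE`, which corresponds in spirit to BADIMGRELA. The point of the sketch is that both messages are symptoms of the table contents being wrong at the moment they were read.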

Running with different input parameters may only be related if the image is activated via lib$fis. But that's not the case, as you see the %DCL-W-ACTIMAGE.

Although these questions were already asked (and maybe answered in the meantime):
Is the image installed with /header? If it is, how?

As a workaround, I would try to install it /shared=addr. This may not be possible and/or may require more images to be installed that way.
P Muralidhar Kini
Honored Contributor

Re: Error activating image ...

Hi Klaus,

>> This is mystery to us, because the image DISPO_RETOUR runs with
>> different input parameters without any error messages
So you mean that you get the "DCL-W-ACTIMAGE" messages only when you pass
the parameter "SO_AUS_NN", and that the program works fine with other
input parameters?

What does the program do differently when given "SO_AUS_NN" as
the input parameter?

Regards,
Murali
Let There Be Rock - AC/DC
Hein van den Heuvel
Honored Contributor

Re: Error activating image ...

Hartmut>> The dynamic table is part of the image file. It is usually read at activation time and brought into memory.

That would be user writable memory right?


Klaus>> DISPO_RETOUR runs with different input parameters without any error messages.

My guess is that these input values cause some table/pointer to overflow and that a subsequent semi-random write happens to stomp on this table, setting it up for failure when it is (re)used.

This could be a tedious debug process, and you may need process dumps.
Hmmm, if you could figure out how to find that table, then you may be able to see a before and after image and spot what might have caused it.

It may be more productive to check that 'special value' to try to figure out what makes it special. Does it correspond with more 'objects'/'things'? Fewer? Zero? Does it invoke objects that have errors? Along those lines...

fwiw,
Hein



H.Becker
Honored Contributor

Re: Error activating image ...

> That would be user writable memory right?

Not really: there is no "user code" running at that time. This is at activation time, no command line input was processed, yet.

Hein van den Heuvel
Honored Contributor

Re: Error activating image ...

Hein> That would be user writable memory right?
Hartmut> Not really: there is no "user code" running at that time. This is at activation time, no command line input was processed, yet.

Right, but I had thought that the table would be used later on as well, but if it is only used for/during the fixups? Maybe LIB$FIS goes back to it? Dunno right now.

Klaus, is TPSHR linked with the images, or are the functions in TPSHR activated through LIB$FIND_IMAGE_SYMBOL?

fwiw,
Hein

H.Becker
Honored Contributor

Re: Error activating image ...

>Right, but I had thought that the table would be used later on as well, but if it is only used for/during the fixups? Maybe LIB$FIS goes back to it? Dunno right now.

Nope. The dynamic table or dynamic segment is only for the image activator. This segment is generated at link time. There is nothing in any object pointing to it. Nobody else knows where it is. (In case the image is not installed /header, the image activator deletes this process section after it is done with it.)

From my point of view, the DCL message excludes lib$fis here.

Klaus Heim
Advisor

Re: Error activating image ...

Hoff>System is running V8.3-1H1. The latest patches have not been installed. The image/shared library is not installed. We use a foreign command, so we activate the same image.

H.Becker>The errors are very rare, but they happen at night, so we got an emergency call.
What is lib$fis?
The image and the shared library are not installed.
What do you mean by "install it /share=addr"? Install the image "DISPO_RETOUR" or install the shared library TPSHR?

P Muralidhar Kini>No. The problem is not the parameter.

Hein>TPSHR is linked and not activated through LIB$FIND_IMAGE_SYMBOL.

@All>The problem is not limited to the shared library TPSHR. Two other shared libraries, SHR.EXE and $TASHR.EXE, show the same problem, and so far 5 different batch jobs have been affected. All the jobs are similar: they define a foreign command for each image and do their work by calling the different images with or without a parameter. The image DISPO_RETOUR is called many times, e.g. first with the parameter SO_AUS, second with SO_OV_RUS, third with SO_AUS_NN, and so on. The same image DISPO_RETOUR is activated many times in one batch job: n times the activation is good, but the next activation goes wrong. The image and the shared library are the same. Nobody makes any changes at 1:00 or 2:00 at night. When the problem occurs we "just" restart the batch job and everything is OK. In reality it is not "just" restarting, because first we must do some "undo" work, but we make no changes to the image or the shared library.

We had no problem on Alpha. We ported our system by simply recompiling and relinking. Are there any hints for building shared libraries on IA64?
Hoff
Honored Contributor

Re: Error activating image ...

Hoff>System is running V8.3-1H1. The latest patches have not been installed. The image/shared library is not installed. We use a foreign command, so we activate the same image.

H.Becker>The errors are very rare, but they happen at night, so we got an emergency call.
>What is lib$fis?

Shorthand for lib$find_image_symbol. Also often used to refer to the dynld stuff (dlopen, etc.) in the C RTL, which is based on the same mechanisms.

>The image and the shared library are not installed.
>What do you mean by "install it /share=addr"? Install the image "DISPO_RETOUR" or install the shared library TPSHR?

If it's not installed, then you can ignore the INSTALL command options here, including /SHARE=ADDR.

It can be inferred from the file names that different languages might be in play. Is the code using different VMS language settings or locales or different C settings in the different environments?

I'd suggest ringing up HP support. They'll probably want the most recent UPDATE kits and such loaded, of course.

And definitely check for marginal quotas, and other differences in the run-time environments; logical names and quotas can cause weird errors, and the C run-time environment is a forest of subtlety. And batch processes and interactive environments can also have quota differences.

The BADIMGOFF stuff was classically exec images; I don't know if that stuff has changed. (Hartmut would.) Are these images execlets or privileged? (The help here points to the potential use of the LOAD_SYS_IMAGES system parameter for cases of BADIMGOFF code involving Alpha. Might be stale help text, too; this is OpenVMS Alpha V8.3 text I'm looking at.)

H.Becker
Honored Contributor

Re: Error activating image ...

> What is lib$fis?
An abbreviation for LIB$FIND_IMAGE_SYMBOL

>What do you mean with "install it /share=addr"? Install the image "DISPO_RETOUR" or install the shared library TPSHR?

The shareable image, TPSHR (and all shareable images it depends on, when they aren't already installed that way like DECC$SHR and friends). The idea here is to let INSTALL do all the relocation and fixup work at install time. Later, at activation time, there is no need to read the dynamic segment from the disk. This may be a workaround; it is not a solution. This may work, but then there may be other errors. It looks like reading the dynamic segment from the disk at activation time is a problem. However, the image file was already read at that point, in particular the ELF tables which let the image activator know where in the file the dynamic segment is. That seems to work. And before relocations are applied, some other image segments were mapped. But here we don't know if their contents are correct. Relocations are applied without much double checking. But I may be on the wrong track, anyway.

> There are two other shared libraries SHR.EXE and $TASHR.EXE with the same problem.

Any idea what these images have in common?
Are there more shareable images (without this problem)?

> n times the activation is good - but the next activation goes wrong.

So n+1 fails, and when you re-run the batch job the n+1 succeeds. Is there an n+2, n+3, ... when n+1 fails (but maybe the job can't continue when one program doesn't produce its output). And is n - or the time - always the same for all the observed failures?

> Nobody does any changes at 1:00 or 2:00 in the night.

But what else is going on in the system or on this disk DSA0 at the same time?

> Is there any hint in building shared libraries on IA64?

For the developer it is 99% the same as on Alpha. To me the problem doesn't look like a migration, compiler, linker or image activator problem.
H.Becker
Honored Contributor

Re: Error activating image ...

> The BADIMGOFF stuff was classically exec images; I don't know if that stuff has changed. (Hartmut would.) Are these images execlets or privileged? (The help here points to the potential use of the LOAD_SYS_IMAGES system parameter for cases of BADIMGOFF code involving Alpha. Might be stale help text, too; this is OpenVMS Alpha V8.3 text I'm looking at.)

On I64 the image activator and system loader share the code for applying relocations and fixups. So it is "normal" to have a "LOADER" message in this case.
Hoff
Honored Contributor

Re: Error activating image ...

> So n+1 fails, and when you re-run the batch job the n+1 succeeds. Is there an n+2, n+3, ... when n+1 fails (but maybe the job can't continue when one program doesn't produce its output). And is n - or the time - always the same for all the observed failures?

That behavior is often indicative of a leak. An I/O channel leak. A memory leak. A quota leak. Something. Is there (for instance) a whole pile of DCL I/O channels or logical names or symbols being built up within the context of the batch process?

Can you set up a second process that (for instance) issues SDA or DCL commands for SHOW PROCESS to watch and to log the batch process? (There's a quota-monitoring DCL tool around that would be a good starting point for this widget.) You're particularly looking for any "stuff" that's depleted over time.
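
Once such samples have been logged, the "depleted over time" pattern can be checked mechanically. A minimal C sketch, assuming the remaining amount of one resource (for example, a count copied from successive SHOW PROCESS outputs) has already been collected into an array, oldest sample first; the function name and the idea of what counts as a leak are illustrative assumptions, not part of any existing tool.

```c
#include <stddef.h>

/*
 * Return 1 if the logged values never rise and end lower than they started,
 * which is the pattern a steady leak would produce. Return 0 otherwise.
 * `samples` holds the remaining amount of some resource, oldest first.
 */
static int looks_like_leak(const long *samples, size_t count)
{
    if (count < 2)
        return 0;       /* not enough data to see a trend */
    for (size_t i = 1; i < count; i++) {
        if (samples[i] > samples[i - 1])
            return 0;   /* the resource recovered, so not a steady drain */
    }
    return samples[count - 1] < samples[0];
}
```

A series that only ever shrinks, such as 100, 98, 98, 95, 90, is flagged; a flat series or one that recovers between activations is not. A real check would probably tolerate some noise, so treat this as a starting point only.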
John McL
Trusted Contributor

Re: Error activating image ...

In a search for clues ...

What other images are activated at the same time as the activation of DISPO_RETOUR?

In particular I'm wondering if there's some other shareable image that might be causing this problem. No offense, but if there's an in-house image among them then I'd be looking very closely at it. And are those VMS shareables up to date?

Also, have all the images (i.e. DISPO_RETOUR and the shareables) been recompiled and relinked on IA64 or have you used a translation tool? I've not used them but generally these tools seem fine for getting things running on IA64 but are not ideal. If you are using them (it?) maybe you've got some really odd code that wasn't handled correctly.

Talking of possibly odd stuff, is any of your code getting deep into VMS, e.g. by linking with the system symbol table?

I'm just casting around looking for any possible source of wrong data or data corruption.
John McL
Trusted Contributor

Re: Error activating image ...

Another long-shot or two ...

The fact that it only happens in the midnight job is intriguing. Could there be something on the system that might be "rolling over" around that time and interferes with your job? I'm thinking mainly of a system-wide logical name TPSHR (or DSA0) being redefined briefly during that rollover.

Are you running a mixed architecture cluster and somehow accessing an Alpha image when it should be an IA64 image?
John Gillings
Honored Contributor

Re: Error activating image ...

>$ DISPO_RETOUR SO_AUS_NN
>%DCL-W-ACTIMAGE, error activating image TPSHR
>-CLI-E-IMGNAME, image file DSA0:[TPLIB]TPSHR.EXE;5
>-LOADER-E-BADIMGOFF, image offset not within any image section

Is this the complete message, or is it followed by a stack dump? If that's all there is, chances are this is an issue on initial activation. If there's a stack dump, the shareable image may be activated with LIB$FIND_IMAGE_SYMBOL.

It may be instructive to try to reproduce the error with SET WATCH/CLASS=MAJOR to trace the sequence of activations.
A crucible of informative mistakes
Pramod Kumar M
Advisor

Re: Error activating image ...

I wonder why Activation is failing only in case of $ DISPO_RETOUR SO_AUS_NN.

1. Are you running this command on the same system you run the other commands?

2. I hope you don't change the logicals in your script for different DISPO_RETOUR executions?

%DCL-W-ACTIMAGE, error activating image TPSHR indicates that there could be a problem with the shareable images on which TPSHR depends.

Use the following command

$anal/image/segm=all/out=TPSHR_EXE.ANL TPSHR.EXE

Then verify whether the fixups inside TPSHR.EXE are good.

It is also necessary to know whether all the images were recompiled and relinked while porting, or whether there are any translated images. Generally, I believe these kinds of issues should not arise with recompiled, relinked images.

-Pramod.


Klaus Heim
Advisor

Re: Error activating image ...

Hoff>We use PASCAL (95-98%). Only a few functions are in C or MACRO. The problem has been reported to HP support, but we have had no help so far.
The shared library routines contain only user-mode functions (no execlets or privileged code).
How do we check the quotas? Why would this be a quota problem?

H.Becker>The sequence in the batch job is important. When we re-run the batch job, it completes. Neither n, nor the time, nor anything else is the same.

>But what else is going on in the system or on this disk DSA0 at the same time?
TPSHR.EXE is on DSA0 (system disk). The two other shared libraries are on DSA3. We don't know what is going on on the system at that time. Maybe some cleanup jobs.

>For the developer it is 99% the same as on Alpha.
Why is it not an image activator problem? The activation of a shared library is not possible (with a curious error message). Without any intervention the next activation is good. That's the point I don't understand.

John McL>All images and all shared libraries are recompiled; no translation tool at all. Odd stuff: okay, TPSHR is linked with the system symbol table, but it is possible to remove this. And the other shared libraries have no odd stuff (I hope).
We use an IA64 cluster with 2 nodes plus 1 quorum node; no mixed architecture.

John Gillings>That's the complete error message. No stack dump, no traceback info. Only the LOADER-E-xxx message and the shared library name are different.
How do we use the SET WATCH command?

Pramod Kumar M>The batch procedure runs on one system. All images are activated in the batch procedure. The logicals are not changed in the batch procedure. Ana/ima is always without error. Everything is recompiled and relinked; no translation tool. When we re-run without any correction everything is OK.

@All>The problem is not only with TPSHR; there are 2 other shared libraries with the same error. The image DISPO_RETOUR is not the only image with this problem. We have a lot of images with this problem. All images run in batch mode. We have had the problem 5 times: 4 times in the midnight jobs and 1 time during the day.
The images are linked with 2 other shared libraries that show no activation error, but I don't see a similarity. TPSHR and SHR are common libraries. $TASHR contains database transactions. LVS$TASHR.EXE and FHSHR.EXE have so far not been reported by image activation. These new libraries also contain database transactions or the global database "FileHandler" (FHSHR.EXE).
H.Becker
Honored Contributor

Re: Error activating image ...

> Why is it not an image activator problem? The activation of a shared library is not possible (with a curious error message). Without any intervention the next activation is good. That's the point I don't understand.

The image activator is used a lot. I never heard of any such problem. The image activator just reports - more or less - that the data read from the image file is corrupt. But the next time it reads the data it is OK. This doesn't look like a problem in the image activator. Additionally, this data is read-only.

>Ana/ima is always without error.

This is no surprise. The next activation is without any error. A static tool like analyze can't help here. Even if it shows no error, analyze/image is not a tool to verify the image for consistency. It only formats the contents. You may want to format all of the contents. So you need /segm=all /section=all. But a wrong offset would not be caught. An incorrect image relocation type may show.
Hein van den Heuvel
Honored Contributor

Re: Error activating image ...

I like John's suggestion of SET WATCH.
It may be 'noisy', but it could help in case the image activator reports the wrong image name. I believe that used to, and might still, happen when the image activator activates images further down. (Hartmut will correct me if need be :-)

You have to have CMKRNL privs to use it.
$ SET PROC/PRIV=CMKRNL
$ SET WATCH FILE/CLASS=MAJOR
$ ... run your suspect stuff ...
$ SET WATCH FILE/CLASS=NONE


Along the same lines, I might want to try to trace I/Os, either with the XFC or by tossing in the LD driver ( http://www.digiater.nl/lddriver.html )
Specifically check out LD CONNECT /REPLACE

Upon failure, stop activity as best you can so you can snarf a trace.

The image activator code just works. It works all day, every day, without 'surprise' failures, everywhere and on your site. Until proven otherwise we have to assume it is presented with incorrect data to work with or is not handling I/O errors correctly. A file trace, or I/O trace, _might_ help you find that.

Hartmut, yes this should not be LIB$FIS considering it is DCL that's complaining.

Hein
H.Becker
Honored Contributor

Re: Error activating image ...

> in case the image activator reports the wrong image name

As far as I can see, the error shows in relocation of a shareable image. At that time the reported image name is correct. An incorrect image name may be reported in processing after mapping and relocating all the images. Then the last mapped image name is printed for any error and that may not be correct.
labadie_1
Honored Contributor

Re: Error activating image ...

Hein

>>>You have to have CMKRNL privs to use it.

Unless this has changed, you need CMEXEC to use it, which is not as high a privilege as CMKRNL (though CMEXEC quickly gives you all the privileges).
Hein van den Heuvel
Honored Contributor

Re: Error activating image ...

Right, I guess it's 'just' CMEXEC that's needed, but that is still the same as 'all' in my book.

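__int64 all = -1;                        /* every privilege bit set */
int args[] = { 4, 1, (int)&all, 1, 0 };  /* arg count, enbflg, prvadr, prmflg, prvprv */
sys$cmexec (&sys$setprv, args);          /* call SYS$SETPRV from executive mode */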

Hein.
John McL
Trusted Contributor

Re: Error activating image ...

Could it be a physical memory fault and this job at midnight, with its specific job mix, just hits the wrong spot? Have you tried running your midnight batch job on the other machine?
John Gillings
Honored Contributor

Re: Error activating image ...

>you need CMEXEC to use it

Klaus,
Sorry I should have posted commands to turn it on and off... As Hein showed:

$ SET WATCH/CLASS=MAJOR FILE ! ON

$ SET WATCH/CLASS=NONE FILE ! OFF

For even more verbose output use /CLASS=ALL, but for this issue MAJOR should be more than sufficient.

You need CMEXEC or CMKRNL, but if there's an issue granting privilege to the account, another option is to install the image:

$ INSTALL SYS$SYSTEM:SETWATCH /PRIVILEGE=CMKRNL

(or CMEXEC). Not much risk, as all it can do is enable or disable watch.

One caveat.. although it's been around forever, this is NOT a supported utility. Rumour has it that it can crash systems (though I've been using it for more than 20 years, and never seen a proven case). The vulnerable time is allegedly process rundown, so make sure you SET WATCH/CLASS=NONE before logging off. (that warning has been around since VAX days, so it's entirely possible that any potential bugchecks have been fixed).
A crucible of informative mistakes