Operating System - OpenVMS

Memory dump agan.

SOLVED
Darijo
Frequent Advisor

Memory dump agan.

OK, so we got a "new" Alpha 1000, and the application that was working with the <DIGITAL 21164 PICMG SBC 5/500> configuration is now not working with it.
The whole installation process goes well, but the first time I boot the machine I get a memory dump when it tries to load that application.

And then the CPU halts and goes back to SRM.
Can I bypass the init scripts and start the application manually?

Problem/output:

-----------------------------------------------

-----------------Starting MCC Version 008

Job BOOTCHECK (queue SYS$BATCH, entry 3) started on SYS$BATCH

End of AEC_STARTUP at 16:20:50.28

Job STARTUP$1 (queue SYS$BATCH, entry 4) pending
pending status caused by queue busy state

**** OpenVMS (TM) Alpha Operating System V7.1-1H2 - BUGCHECK ****
** Bugcheck code = 000001CC: INVEXCEPTN, Exception while above ASTDEL
** Crash CPU: 00 Primary CPU: 00 Active CPUs: 00000001
** Current Process = BATCH_2
** Image Name = MZR1P1$DKB0:[SYS0.SYSCOMMON.][SYSEXE]SYSMAN.EXE
**** Starting compressed selective memory dump at 7-JUN-2008 16:21...
...........................................................
...Complete ****
------------------------------
35 REPLIES
Andy Bustamante
Honored Contributor

Re: Memory dump agan.

>>> Can I bypass init scripts and start application manually?

That depends. We don't know the application and what its environment should be.

An AlphaServer 1000 is a fairly dated system, and it appears you're running VMS 7.2-1H2. The first thing I would do is run "$ analyse/crash" and see if there's any useful data in the crash dump. Second, check the patches on the "old" 1000 against the "new" 1000. User account privileges?

If none of this provides a useful result, I would shut the 1000 down to SRM and run console diagnostics, memexer for example. New old hardware sometimes takes a bit of debugging.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Darijo
Frequent Advisor

Re: Memory dump agan.

>>>That depends. We don't know the application and what it's environment should be.

Actually there are more processes making that application...

>>>An Alphaserver 1000 is fairly dated system and it appears you're running VMS 7.2-1H2.

But shouldn't it depend on the processor's ISA???

>>>First thing I would start with is with "$ analyse/crash" and see if there's any useful data in the crash dump.

I can't access VMS because of that crash...

>>Second, check patches on the "old" 1000 against the "new" 1000. User account privileges?

This is the first time I have used an AlphaServer 1000 to replace that newer configuration.
The thing is that we have one special fibre optic adapter (ISA/EISA) which uses non-standard, vendor-specific protocols for communication with some machines. Therefore we cannot replace it with newer Alphas with a PCI system bus, and our supplier is asking an enormous amount of money for a <DIGITAL 21164 PICMG SBC 5/500>.
So we found a couple of old Alphas in storage to replace the <21164 PICMG SBC 5/500>.

Bojan Nemec
Honored Contributor

Re: Memory dump agan.

Darijo,

You can do a "conversational boot"

>>> boot -fl 0,1
SYSBOOT> SET STARTUP_P1 "MIN"
SYSBOOT> CONTINUE

You can add the device name at the end of the boot command.
This will boot VMS but will not execute systartup_vms.com.

When you are finished, you must set STARTUP_P1 back to "":

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET STARTUP_P1 ""
SYSGEN> WRITE CURRENT
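
To check what is stored before or after that change, SYSGEN can also display the parameter. A minimal sketch, using only the parameter name from the steps above:

```dcl
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW STARTUP_P1
SYSGEN> EXIT
```

If the value shown is blank, the next normal boot should run the full startup again.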

Bojan
Darijo
Frequent Advisor

Re: Memory dump agan.

thank you Bojan...I'll try this first thing in the morning when I get back to work.

Hoff
Honored Contributor
Solution

Re: Memory dump agan.

[[[[>>>That depends. We don't know the application and what it's environment should be.

Actually there are more processes making that application...]]]]

It's a batch job which is blowing up, which means it's something you're doing in or after startup.

[[[>>>An Alphaserver 1000 is fairly dated system and it appears you're running VMS 7.2-1H2.]]]]

Technically, the quite buggy V7.1-1H2 release. V7.1-2 was released as a way to roll up and install all of the ECO kits that existed for V7.1, V7.1-1H1 and V7.1-1H2, and PCSI was implemented as a way to better manage ECOs.

[[[But shouldn't it depend on the processor's ISA???]]]]

The instruction set is relevant, and so are the details of the system platform configuration; the devices and hardware that are (often uniquely) involved in the platform. There was far more to booting a new Alpha platform than which EV processor was used.

Here, the official support is "v6.2-1H3, or v7.1 or later", which means OpenVMS Alpha should work on this box.

[[[>>>First thing I would start with is with "$ analyse/crash" and see if there's any useful data in the crash dump.

I cant access VMS because of that crash...]]]

Sure you can. Swap the disk over and use the other box to analyze the crash.
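
A rough sketch of that, assuming the crashed system disk shows up as DKB100: on the working box (a hypothetical device name) and that the dump is the default SYSDUMP.DMP in root SYS0:

```dcl
$ ! DKB100: is a placeholder; substitute the device the moved disk appears as
$ MOUNT/OVERRIDE=IDENTIFICATION DKB100:
$ ANALYZE/CRASH_DUMP DKB100:[SYS0.SYSEXE]SYSDUMP.DMP
SDA> SHOW CRASH
SDA> SHOW STACK
SDA> EXIT
$ DISMOUNT DKB100:
```

SDA may warn about mismatched link times if the two boxes are not at the same version and patch level; the dump is usually still readable.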

>>Second, check patches on the "old" 1000 against the "new" 1000. User account privileges?

[[[[This is the first time I have used AlphaServer 1000 to replace that newer configuration.]]]]

You're technically using an AlphaServer 1000A 5/333 here, based on what I see listed in this thread -- there are some differences between the AlphaServer 1000 and AlphaServer 1000A boxes. Graphics support is better, IIRC.

[[[Thing is that we have one special fibre optic adapter(ISA/EISA) which uses none standard protocols/vendor specific protocols for communication with some machines. Therefor we cannot replace it with newer Alphas with PCI sys. bus and our supplier is asking for enormous amount of money for < DIGITAL 21164 PICMG SBC 5/500>.
So we found couple of old Alphas in the storage to replace it with <21164 PICMG SBC 5/500>]]]]

I'd be willing to bet that the device driver for that device is what is blowing up here, too. It's probably a SYSMAN I/O connect in that batch job, and if you wander up the stack, you'll find it's connecting the driver.

Do you have specifications and/or source code driver for the host view of the adapter? Or is it a complete buy-out? Debugging existing or writing a new driver can range from easy to ugly. Specs and/or samples make the process a whole lot easier.

Stephen Hoffman
HoffmanLabs LLC

Darijo
Frequent Advisor

Re: Memory dump agan.

---------------------------------------------
>>>I'd be willing to bet that the device driver for that device is what is blowing up here, too.
---------------------------------------------

I was thinking the same thing...

---------------------------------------------
>>>Do you have specifications and/or source code driver for the host view of the adapter?
---------------------------------------------

No, I don't...
The whole package comes on one CD (VMS + application kit) and everything is installed together.
Sadly, I'm not a VMS expert, nor do we have one here.
Too bad there is an ocean between us :)

Anyway, the vendor of that adapter is some German firm which doesn't provide any support for that HW. So I guess it was made specifically for the needs of these machines.

---------------------------------------------
>>>Debugging existing or writing a new driver can range from easy to ugly. Specs and/or samples make the process a whole lot easier.
---------------------------------------------

Ufff...I'm not paid that well ;)
...but would love to check things under the hood.

So what are my options?
I don't know if this makes any difference, but that adapter was in an ISA slot and is now in an EISA slot...and if I remember correctly, HW made for ISA is also compatible with 32-bit EISA...

Also, I did this:

---------------------------------------------
$ dir *.dmp*

Directory SYS$SYSROOT:[SYSEXE]

SYS$ERRLOG.DMP;1 SYSDUMP.DMP;1

Total of 2 files.
$ analyse/crash
_Dump File: SYSDUMP.DMP;1



OpenVMS (TM) Alpha system dump analyzer
...analyzing a compressed selective memory dump...

%SDA-W-SDALINKMISM, link time of SYS$BASE_IMAGE built into SDA$SHARE (19-OCT-1998 23:37) does not match link time of image in system dump (20-OCT-1998 11:36)
Dump taken on 7-JUN-2008 16:21:48.32
INVEXCEPTN, Exception while above ASTDEL

SDA>

---------------------------------------------










Darijo
Frequent Advisor

Re: Memory dump agan.

Well...if it helps:

SDA> show stack

Process Stacks (on CPU 00)
--------------------------
Current Operating Stack (KERNEL):
00000000.7FFA1C08 00000000.00001100 UCB$M_UNLOAD+00100
00000000.7FFA1C10 00000000.00000001
00000000.7FFA1C18 00000000.00000002
00000000.7FFA1C20 FFFFFFFF.FFE040B8
SP => 00000000.7FFA1C28 00000000.7FFA1DF8
00000000.7FFA1C30 00000000.7FFA1D48
00000000.7FFA1C38 00000000.000001C8
00000000.7FFA1C40 00000000.00000050
00000000.7FFA1C48 00000000.00000210 BUG$_MACHINECHK
00000000.7FFA1C50 00000000.002A86CC
00000000.7FFA1C58 00000000.00000000
00000000.7FFA1C60 00000000.00000009
00000000.7FFA1C68 FFFFFFFF.80C301F8 MMG$ALLOC_SVA_MAP
00000000.7FFA1C70 FFFFFFFF.00000250 BUG$_NETRCVPKT
00000000.7FFA1C78 00000000.00000001
CHF$IS_MCH_ARGS 00000000.7FFA1C80 00000000.0000002C
CHF$PH_MCH_FRAME 00000000.7FFA1C88 00000000.7FFA1E90


Press RETURN for more.
SDA>
John Travell
Valued Contributor

Re: Memory dump agan.

Darijo,
There is something you can do that will help us to help you. It is quite likely to confirm the suspicion that the crash occurs when SYSMAN configures the device driver for your fibre optic adapter.
At the SDA> prompt do:
SDA> set out sys$login:cluecrash.txt
SDA> clue crash
SDA> clue stack
SDA> clue register
SDA> clue config
SDA> set out tt:
SDA> Exit
then post cluecrash.txt as an attachment.

You may get suggestions to upgrade, but unless your vendor has shown that their driver works on later versions of VMS you may be stuck on V7.1-*.
While not a universal truth, if it works on V7.1-1H2, then generally it will also work on V7.1-2.

Question to everyone: Does anyone have any information on how the IO mapping is done on the particular SBC this fibre optic adapter works in as compared to the Alphaserver1000(a?). Could the driver be trying to access registers that are at different relative addresses in the 'new' machine ?
JT:
Darijo
Frequent Advisor

Re: Memory dump agan.

John,

I'm currently running VMS without any services/drivers...so I can't transfer via ftp to my laptop to send an attachment.

Also, I did everything like you told me to, but the output of 'SDA> clue register' can't be written to the file since that command doesn't exist on this version of VMS.
I looked in help for clue...but nothing.

I tried to start UCX manually to get networking working, but then I also get a dump...this time in some UCX$INET_ACP process???

--------------------------------------------------------------------
$ @UCX$STARTUP.COM;1
%JBC-E-JOBQUEDIS, system job queue manager is not running
%JBC-E-JOBQUEDIS, system job queue manager is not running
%UCX$PPP-I-INFO, Loading PPP Drivers and CallBack
%RUN-S-PROC_ID, identification of created process is 00000209
The Internet driver and ACP were successfully loaded.
%%%%%%%%%%% OPCOM 8-JUN-2008 12:23:06.78 %%%%%%%%%%%
Message from user INTERnet on MZR1P1
INTERnet Loaded

%UCX-I-SETLOCAL, Setting domain and/or local host
%UCX-I-SETPROTP, Setting protocol parameters
%UCX-I-STARTCOMM, Starting communication
%%%%%%%%%%% OPCOM 8-JUN-2008 12:23:07.42 %%%%%%%%%%%
Message from user INTERnet on MZR1P1
INTERnet Started

%UCX-I-DEFINTE, Defining interfaces

**** OpenVMS (TM) Alpha Operating System V7.1-1H2 - BUGCHECK ******* keyboard n.

** Bugcheck code = 000001CC: INVEXCEPTN, Exception while above ASTDEL
** Crash CPU: 00 Primary CPU: 00 Active CPUs: 00000001
** Current Process = UCX$INET_ACP
** Image Name =
**** Starting compressed selective memory dump at 8-JUN-2008 12:23...
.......................
...Complete ****

halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff8006df00
--------------------------------------------------------------------


Darijo
Frequent Advisor

Re: Memory dump agan.

I guess that I can't get some services running in minimum system startup...
Darijo
Frequent Advisor

Re: Memory dump agan.

OK, I finally managed to get the file...hope it helps.
Darijo
Frequent Advisor

Re: Memory dump agan.

Oh sorry, I sent the wrong DMP!
Here is the real one!
John Travell
Valued Contributor

Re: Memory dump agan.

I may be missing something, but I do not recognise SYS$KPDRIVER. Is that the driver for your special fibre optic adapter?
If so, Hoff's comment - "I'd be willing to bet that the device driver for that device is what is blowing up here, too. It's probably a SYSMAN I/O connect in that batch job, and if you wander up the stack, you'll find it's connecting the driver." is almost certainly spot on.
I will take another look later, as I have no doubt will others.

Sorry about SDA> CLUE REGISTER, I forgot that it first appeared somewhat later than V7.1-1H2.
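
On a release without CLUE REGISTER, the stock SDA commands should give roughly the same information. A sketch, with REGISTERS.TXT as a made-up output file name:

```dcl
SDA> SET OUTPUT SYS$LOGIN:REGISTERS.TXT
SDA> SHOW CRASH
SDA> SHOW STACK
SDA> SET OUTPUT TT:
SDA> EXIT
```

SHOW CRASH reports the bugcheck type along with the register contents of the crash CPU, which is the part CLUE REGISTER would have summarised.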
JT:
Darijo
Frequent Advisor

Re: Memory dump agan.

>>> I may be missing something, but I do not recognise SYS$KPDRIVER. Is that the driver for your special fibre optic adapter?

I guess so...
Wim Van den Wyngaert
Honored Contributor

Re: Memory dump agan.

I see that you have firmware 5.4 on the new 1000. What was it on the old one?

I think it could be 5.3. Maybe test with 5.3 and with newer/latest firmware?

Wim
Wim
Hoff
Honored Contributor

Re: Memory dump agan.

If this is this SYS$KPDRIVER device that's at fault...

Boot conversationally (minimally), wander over to the console, set default over to SYS$LOADABLE_IMAGES:, and RENAME that device driver.

Here's a similar boot sequence:

http://64.223.189.234/node/939

Here are the commands for use at the console prompt:

SET DEFAULT SYS$COMMON:[SYS$LDR]
RENAME SYS$KPDRIVER.EXE SYS$KPDRIVER_SAVE.EXE

This will cause the startup to log errors when the SYSMAN command and the autoconfiguration stuff occurs, but the startup should complete without a driver crash. This will catch cases where a batch job connects the device, or when automatic mechanisms are used; it'll prevent pretty much any attempt to reference the driver from working.

Once you get the box fully booted, you can get the network and such working and hunt down the specific command(s) used to connect the failing device driver.
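
One way to start that hunt is to search the startup procedures for the driver name. A sketch, assuming the connect lives in a command procedure under SYS$MANAGER or SYS$STARTUP (the vendor kit may well keep it somewhere else):

```dcl
$ ! Search strings are guesses: the driver name and the SYSMAN verb
$ SEARCH SYS$MANAGER:*.COM,SYS$STARTUP:*.COM "KPDRIVER","IO CONNECT"
```

Any procedure that matches is a candidate for the batch job that was running when the box crashed.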

I don't know off-hand if the system memory model differs between the Takara and Noritake boxes (AFAIK, those details are around), but both that and a case of a latent driver bug could be involved here. Or both, of course. And this could be a case where the driver recognizes a lack of support for the platform and punts -- though that usually uses a more obvious bugcheck code, or the device is forced off-line. Different folks will program driver error paths differently.

There's easily fodder here for several offline discussions, too, whether around the particular hardware widget or the network protocol or the device driver.

Stephen Hoffman
HoffmanLabs LLC

Darijo
Frequent Advisor

Re: Memory dump agan.

>>> see that you have firmware 5.4 on the new 1000. What was it on the old one ?
I think it could be 5.3. May be test with 5.3 and with newer/latest firmware ?


Ok, I'll see when I get back tomorrow to work.
Thank you!

Hoff,

OK, I will try that...
I will report results asap.
Thank you!
Darijo
Frequent Advisor

Re: Memory dump agan.

>>>Here are the commands for use at the console prompt:

SET DEFAULT SYS$COMMON:[SYS$LDR]
RENAME SYS$KPDRIVER.EXE SYS$KPDRIVER_SAVE.EXE
----------------------------------------------------
SYS$KPDRIVER.EXE doesn't exist in that directory....

Darijo
Frequent Advisor

Re: Memory dump agan.

Hoff,

OK, I found the file and did everything like you told me to...now the system boots and all applications (processes) are working, including the GUI via X11, but, as expected, there is no communication between the machines because the driver is not loaded.

Now...what are my options?
And what HW differs from that on the 5/500 Alpha and is causing the incompatibility with that optic adapter?
Darijo
Frequent Advisor

Re: Memory dump agan.

Also, I did '>>> sh config' on the machine with the working driver:

>>> sh config
Firmware
SRM Console: V5.7-11
PALcode: OpenVMS PALcode V1.21-8, Digital UNIX PALcode V1.23-9
System SROM: V1.2

---------------------------------------------


So the firmware is newer than the one on the Alpha 1000.

Here on CD I have ver. 5.6....
Darijo
Frequent Advisor

Re: Memory dump agan.

...and on Alpha 1000A:

>>>show config
Digital Equipment Corporation
AlphaServer 1000A 5/333

Firmware
SRM Console: V5.4-110
ARC Console: v5.68
PALcode: VMS PALcode V1.20-4, OSF PALcode V1.22-6
Serial Rom: V2.0
Hoff
Honored Contributor

Re: Memory dump agan.

The device driver here appears somewhere between rudely incompatible and entirely broken.

You may want to get somebody to take a detailed look at the contents of the crashdump, and at the device driver itself, assuming you're willing to pay for the reverse-engineering and also assuming reverse-engineering is permissible here.

Or you might decide to try to locate a system that's closer to Takara, or to locate another replacement Takara board.

Or you could choose to work with somebody and to acquire as much information for the hardware widget and for the host API and to then get somebody to create a new device driver for you.

None of these are particularly good nor cheap choices.

Hoff
Honored Contributor

Re: Memory dump agan.

ps: in my experience in writing device drivers for various hardware widgets over the years, the Alpha SRM firmware is very seldom involved.

Driver-related crashes -- and particularly entirely predictable crashes -- are usually bugs located somewhere between the hardware widget and its firmware, or (rather more commonly) somewhere in the device driver.

INVEXCEPTN crashes can be ACCVIO or most any other thing that would be a stackdump in user-mode code. A quick look at the stack and/or the dump file can usually spot a "normal" signal array. And yes, an INVEXCEPTN crash can also be triggered by "weirder" errors or data corruptions.

I happen to like repeatable driver crashes, personally. It's the weird and transient and more subtle ones -- where you DMA somewhere you should not have, or where you corrupt a byte or a flag or a register somewhere -- that can drive you batty.

Bill Hall
Honored Contributor

Re: Memory dump agan.

Darijo,

Did you run the ECU to configure the ISA fibre adapter in the replacement AlphaServer? I couldn't find any mention of it in your responses. It's been a very long time since I've installed ISA or EISA adapters in an Alpha, so I can't recall whether you can boot without configuring those adapters.

Bill
Bill Hall