Bugcheck code = 0000036C: PROCGONE, Process not in system

 
SOLVED
FernandoML
Advisor

Hi all,
I rebooted both nodes of a cluster just to verify they worked fine before moving them to a new data center.
After booting from SAN, I get the same error on both nodes:

**** OpenVMS Alpha Operating System V7.1 - BUGCHECK ****
** Bugcheck code = 0000036C: PROCGONE, Process not in system
** Crash CPU: 01 Primary CPU: 00 Active CPUs: 00000003
**** Starting compressed selective memory dump ........ COMPLETE
SYSTEM SHUTDOWN COMPLETE

Halted CPU 0

Halt code = 5
P00>>>

Can anyone help me troubleshoot this problem? There are no hardware errors on the disks, only on the tapes, and those have been there for months.

Thanks in advance,
Fernando.




24 REPLIES
Hoff
Honored Contributor
Solution

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

PROCGONE covers a gazillion different cases of failures early in the bootstrap. Load the ECO kits up to current, seriously consider an upgrade to the current release, then ring up HP. There's often a code left in register R0 that can sometimes help identify the trigger, and HP will have a list of the various R0 codes that can be seen on OpenVMS Alpha V7.1. There are a variety of FC patches for known FC SAN bugs; these bugs have cropped up at seemingly random times.

As much for grins as anything else, I'd try the same sequence with OpenVMS Alpha V8.3, too.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Hoff,
Thank you for your quick response.
We are thinking of reinstalling/upgrading from the original CD. Maybe this could repair any corrupt licensing-related file that makes it impossible to boot from the disks.

This system is new for us as part of a recent support contract that involves many other Intel systems, and we know it has never been updated. I'm afraid no HP support is "alive", but I will try to ring them.

This is an AlphaServer 800. How can we get the code left in register R0?

Thanks again.
Hoff
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

[[[We are thinking of reinstalling/upgrading from the original CD. Maybe this could repair any corrupt licensing-related file that makes it impossible to boot from the disks.]]]

Ah; OK. I'd (incorrectly) inferred this was an existing and known system that had started tipping over, and not a new-to-you system.

As for the corrupt file, that's not the approach I'd look for first. PROCGONE can be all over the map; fragmentation, problems accessing disks, volume label collisions in a cluster, all sorts of stuff.

And licensing failures don't typically overlap with PROCGONE; I've never seen that combination.

Do look at the configuration for the FC SAN here first, and most definitely do not use whatever random bits were found on the box if this is a new-to-you and fresh box. Load it fresh. With existing bits found on a system disk, who knows what might happen.

If it's an existing box that was managed and run for a specific task and you're now adopting support for the box, then re-installation probably isn't appropriate as a first step. Start with the FC SAN configuration and diagnosing the R0 and mayhap an AUTOGEN pass and do load the current ECO kits and work from there.

[[[This is an AlphaServer 800. How can we get the code left in register R0?]]]

Old gear. Ok.

The value in register R0 is usually displayed as part of the mass of characters spewed during the crash. It may well be stored in the crashdump, too; I've not confirmed the PROCGONE code ends up written there, as the value displayed during the crash is much more directly visible.
Duncan Morris
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Fernando,

Welcome to the ITRC OpenVMS forum!

Are you sure that this really is an AlphaServer 800?

Your crash report shows that there are 2 CPUs on this system, but the AS800 is a single-processor system!
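
For anyone reading along: the "Active CPUs" field in the crash banner is a hexadecimal bitmask in which, as far as I know, bit n set means CPU n is active. A minimal sketch of reading it (the helper name is made up for illustration):

```python
def active_cpu_ids(mask_hex: str) -> list[int]:
    """Return the CPU numbers whose bits are set in a hexadecimal bitmask string."""
    mask = int(mask_hex, 16)
    # Walk every bit position up to the highest set bit and keep the set ones.
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Value taken from the crash banner in the first post.
print(active_cpu_ids("00000003"))
```

Two bits set means two active processors, which is why the AlphaServer 800 claim looks wrong.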

"Crash CPU: 01 Primary CPU: 00 Active CPUs: 00000003"

You might try booting

>>> boot -flags 0,30000

and post the results in an attachment.
There may be a clue in the output.
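
A note on the notation, as a hedged sketch rather than a reference: in "boot -flags 0,30000" the first value is the system root and the second is a hexadecimal bitmask of boot flags. The snippet below only lists which bits are set. The descriptions in ASSUMED_FLAGS are recalled from the OpenVMS Alpha documentation and are an assumption here; check them against the manual for your release.

```python
# Assumed meanings of a few OpenVMS Alpha boot flag bits (hex values).
# These descriptions are from memory, not from this thread; verify before relying on them.
ASSUMED_FLAGS = {
    0x1: "conversational boot (stop in SYSBOOT)",
    0x10000: "display debug messages during boot",
    0x20000: "display user messages during boot",
}


def parse_boot_flags(arg: str) -> tuple[int, list[int]]:
    """Split 'root,flags' (both hexadecimal) into the root number and the set flag bits."""
    root_text, flags_text = arg.split(",")
    flags = int(flags_text, 16)
    bits = [1 << n for n in range(flags.bit_length()) if flags & (1 << n)]
    return int(root_text, 16), bits


root, bits = parse_boot_flags("0,30000")
for bit in bits:
    print(f"root {root}: flag {bit:#x} -> {ASSUMED_FLAGS.get(bit, 'not in this table')}")
```

So 30000 is simply the two flag bits 10000 and 20000 set together, booting from root 0.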


Has this system ever been booted from a SAN disk before? I seem to recall that fibre channel support only came in with VMS V7.2, so I am surprised to see you saying that you are booting from a SAN.

Duncan
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Yes, you noted already it's new for me. ;-(

I know MA8000 SAN systems; this is an older one, but the serial console management is very similar. When showing disks, units and connections, everything seems to be OK.

Controllers are HSZ50 model.

This system has always booted from shared storage (StorageWorks for both nodes, with SCSI connections), so it is DAS, not a SAN (sorry!)

Look at the attached phone pic. There is a code at the end of the crash, after "halt code = 5" as follows:
PC = ffffffff80083ee0

We tried boot -fl 0,1, but with the same results.

I cannot post the results now because I have no direct access to the console at this hour (23:00 here in Spain). Tomorrow I will follow your guidelines.

Fernando.


Hoff
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Ok; the JPG image shows the PROCGONE is occurring just after the second processor is launched, and shows two processors in the configuration.

That's already very odd, as the AlphaServer 800 that was mentioned earlier is a uniprocessor.

You're going to want to specifically identify the processor here, and more of the configuration involved here. (One cause of PROCGONE is an attempt to boot a processor on a release that lacks support for same, for instance.)

There are AlphaServer 8200 and AlphaServer 8400 class boxes; there are unfortunately a gazillion similar-named systems around.

With the Alpha SRM console, some combination of SHOW CONFIG and SHOW DEVICE or such (at the >>> prompt) usually elucidates sufficient identifying information.

FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Ok. I don't remember the output of SHOW CONFIG or SHOW DEVICE, but I will bring it here tomorrow.

Thanks again.
Jur van der Burg
Respected Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

You can get full console output (including R0) by setting bit 1 in the parameter DUMPSTYLE. So you can do a conversational boot (boot -flags 0,1) and set DUMPSTYLE to 11 (9 is the default). This will show you the contents of R0 after the crash.
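
The step from 9 to 11 is just bit 1 being set on top of the default. A small sketch of the arithmetic (the constant and function names are made up for illustration; only the meaning of bit 1 comes from the text above):

```python
FULL_CONSOLE_OUTPUT = 1 << 1  # bit 1, the bit described above


def with_full_console_output(dumpstyle: int) -> int:
    """Return the DUMPSTYLE value with bit 1 set, leaving all other bits unchanged."""
    return dumpstyle | FULL_CONSOLE_OUTPUT


# 9 is 0b1001; setting bit 1 gives 0b1011.
print(with_full_console_output(9))
```

Using OR rather than adding 2 means the result stays correct if bit 1 happens to be set already.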

Jur.

Willem Grooters
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Just for clarification:

What would happen if a uniprocessor system (AS800) booted from the system disk of a multi-CPU system (AS8x00)?

If multiple processors are expected according to the system parameter file(s), something may obviously go wrong when such a processor does not exist.

If that is true: are you booting from the right disk or system root?

Willem Grooters
OpenVMS Developer & System Manager
Heinz W Genhart
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Hi FernandoML

Does this cluster have multiple system disks?
Are all cluster nodes booting from the same system disk?

If you have more than one system disk, be sure that they have different labels; otherwise you will get this PROCGONE problem.

Regards

Geni
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Hi all,
The clustered systems are an AlphaServer 4000 and an AlphaServer 8400 (we also have an 800; sorry for the mistake).

Disks are on the StorageWorks (2 disks in RAID 1 for booting the system, and the rest for data).

The storage controller console gives warnings saying "cache battery is now sufficiently charged" and "Previous controller operation terminated by removal of program card" ???!


Register dump shows R0 as 00000000.004D8CFC
(see pic attached)

We are trying to repair the TZ887 autochanger and then recover from the customer's last system backup, done in 1999! My god!

Before the last reboot, we made a system backup onto the disks of the storage array.

Duncan Morris
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Fernando,

that translates to

exit %x004D8CFC
%IMGACT-F-NOTNATIVE, image is not an OpenVMS Alpha image
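
For the curious, the value can also be taken apart by hand. This is a minimal sketch assuming the standard OpenVMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27). It does not look up the message text; that is what the EXIT command above does. The function and table names are made up for illustration.

```python
# Severity codes of an OpenVMS condition value (low three bits).
SEVERITY_NAMES = {
    0: "W (warning)",
    1: "S (success)",
    2: "E (error)",
    3: "I (informational)",
    4: "F (fatal)",
}


def decode_condition(value: int) -> dict:
    """Split a 32-bit condition value into severity, message number and facility number."""
    severity = value & 0x7
    return {
        "severity": SEVERITY_NAMES.get(severity, "reserved"),
        "message_number": (value >> 3) & 0x1FFF,   # bits 3-15
        "facility_number": (value >> 16) & 0xFFF,  # bits 16-27
    }


# Low longword of the R0 value reported in the register dump.
print(decode_condition(0x004D8CFC))
```

The severity field comes out as fatal, which matches the -F- in the translated message.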

You definitely want to use -flags 0,30000 during the boot to identify the invalid image.

Hoff
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

[[[The clustered systems are an AlphaServer 4000 and an AlphaServer 8400 (we also have an 800; sorry for the mistake).]]]

Ok. So which specific model of AlphaServer box are we working with here? Some AlphaServer boxes will not bootstrap as far back as OpenVMS V7.1. With the Alpha SRM console, some combination of SHOW CONFIG and SHOW DEVICE or such (at the >>> prompt) will provide platform and configuration details on the box. You should see a specific system name and specific system model.

Verify the hardware path out to the disk is correct, and correctly configured. Here, you can use the installation directions that are available for the various widgets to confirm that your particular combination is correctly configured.

Verify that the ECOs are current for V7.1 or whatever release is involved here. V7.1 had a *gazillion* ECO kits; so many that this release effectively begat the V7.1-2 release and its roll-up of ECOs and of a whole new and massively improved way of dealing with and of installing ECO kits on OpenVMS.

Verify version support for whichever hardware is involved here: http://h71000.www7.hp.com/openvms/hw_supportchart.html

You can use the QuickSpecs or such to verify controller-level support.

Enabling startup diagnostics with a conversational bootstrap and setting STARTUP_P2 to P might help, as might enabling boot-time diagnostics with >>> boot -fl root,30000 or such.

Call HP to help decode register R0, if it's not the bad image header message nor something else involved here.

Here, if the disks are functional, I would not overwrite the contents. I'd use a known scratch disk (and preferably a local disk; not a FC SAN disk), install OpenVMS Alpha V8.3 on it, and see if I could sort out the configuration and boot issues from there.

Alternatively, you can call in some more formal and more experienced assistance for a direct look at the configuration and at the particular AlphaServer box. This could be HP support, or one of the various HP partners that specialize in OpenVMS.

FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Ok. HP is already working on it. Changing autoloaders will help us back up disk 0 before taking any action.

We shall boot -fl 0,30000 to identify the invalid image and will probably try to install OpenVMS on a local disk.

Tomorrow I will give you more information.
I really appreciate your support; it's incredible how this forum works.
Volker Halle
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Fernando,

the %IMGACT-F-NOTNATIVE reason code in R0 indicates that SYSINIT or one of its shareable images is not an OpenVMS Alpha image. You can boot from CD and check those images...

This type of crash has also been seen when such an image is 'too fragmented' for the early boot phase (e.g. DECC$SHR.EXE after installing ALPACRT03_071).

Volker.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Hi all,
When booting with -fl 0,30000, the images reported as not valid are these:
SYS$FILES_64.EXE
SYS$XFS_CLIENT.EXE
SYS$XFS_SERVER.EXE
SYS$LFS.EXE

We are now backing up one of the disks so that we can install a fresh VMS on it, get those images, and copy them to disk 0.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Bad news: those image files are neither on the new installation nor on the backup cartridges.

HP came and couldn't do anything except help repair the autochanger.
Duncan Morris
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Fernando,

you can ignore the errors with those files.

See

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=416077&prodTypeId=18964&prodSeriesId=416077&objectID=c00625969

Did you see any other problems?

Hoff's "Ask the Wizard" articles and Volker have both pointed to excessive fragmentation as possible issues.

Now that your autochanger is fixed, maybe you should try doing a full image backup and restore of the system disk.

Duncan
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Ok. We'll stop looking for those files and ignore the messages. No other errors seem to show up.

Recovering the 1999 tape onto one of the disks was unsuccessful, as the OS on it was an earlier version. We have installed a new OpenVMS 7.2 from CD on another disk (not on DCK0) and it works fine, but we need to recover lots of configuration and procedure files as well as lots of products.

Before shutting down the system, we made a system backup to one of the SW500 disks, as the changer was offline.

Tomorrow we'll try to recover the system from that copy on DCK105.

The last resort is to reinstall OpenVMS and start from the beginning, looking for products and licenses, but we are afraid of losing some of the products that were installed later.

Fernando.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

By the way, I understand you to mean that if we back up and restore the system disk, the fragmentation would disappear, and the problem with it. We'll do that before reinstalling. Thanks again.
Volker Halle
Honored Contributor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Fernando,

this type of bugcheck can NOT be solved by the information provided with a diagnostic boot (>>> b -fl 0,30000). The problem is detected by the image activator, which does not provide debug output during boot.

Please check all the images needed by SYSINIT:

Shareable Image List

0) "DECC$SHR"
1) "LIBRTL"
2) "LIBOTS"
3) "SYS$BASE_IMAGE"
4) "SYS$PUBLIC_VECTORS"

The most suspect image would be DECC$SHR. First try ANAL/IMAGE, then DUMP/HEAD/BLOCK=COUNT=0.

You can certainly boot OpenVMS from CD or another disk and analyze the dump:

$ ANAL/CRASH sys-disk:[sysn.sysexe]
SDA> exa exe$gl_state ! bit 3 is BOOSTATE$M_SYSINIT

Is this bit clear ? If so, SYSINIT has not been run.

Then check the image activator scratch area:

SDA> CLUE PROC/LAYOUT
...
Image Activator Scratch Area start end length
...

SDA> EXA start;length

Look for a file name

You said 'HP could not help'. And I say: YES, WE CAN !

Volker.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Thank you Volker.
I am absolutely sure "YES YOU CAN", as seen in this forum and elsewhere. I meant that this is a software problem, not a hardware one, and the customer does not have a software support contract.

But we have ITRC.

We'll follow your advice. Thanks again!
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Hi everybody,
We finally solved it by reinstalling the VMS cluster on both nodes.

Beforehand, we did a backup to save the installed products and configuration files.

Then we created sys0 and sys1 so that each node boots from DKC0 on the shared storage (SW500).

We configured and restored what we needed, and now at last everything is working fine.

I really want to thank you for your help, advice and support on ITRC.

Best Regards,
Fernando.
FernandoML
Advisor

Re: Bugcheck code = 0000036C: PROCGONE, Process not in system

Finally reinstalled the VMS cluster system images and restored the configuration files.

Thanks to all for your support.