

 
SOLVED
Camiel
Frequent Advisor

Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hello everyone,

Please notice: The following question concerns a software emulated ES40 I'm working on, not a real system!

When I boot OpenVMS 7.3-1 or 8.3 (I have no other flavors currently available), I get quite a bit into the boot process, when I get an INCON_SCHED bugcheck:

**** OpenVMS (TM) Alpha Operating System V7.3-1 - BUGCHECK ****
**** Accessing system disk via original boot path
** Bugcheck code = 0000038D: INCON_SCHED, Inconsistent scheduling state
** Crash CPU: 00 Primary CPU: 00
** Active/Available CPU Masks: 00000001/00000001
** Current Process = NULL
** Current PSB ID = 00000001

I've attached a complete capture of the screen output, from the moment I enter "boot dqa0 -flags 0,30000" to the point where I'm dropped back to the SRM prompt.

Could anyone tell me what kind of error could cause this bugcheck? The problem is obviously due to an incorrect emulation of some part of the system, but I'd like to pinpoint exactly what part.

I've looked into the possibility of obtaining the OpenVMS source listings, but that seems unobtainable for mere mortal hobbyists: you need a support contract and a paid license (rather than a hobbyist one), and I have neither (and couldn't afford them either). If anyone reading this does have access to the sources, I'd be very grateful if they could find out where the error comes from.

Thanks a lot in advance!

Camiel.
24 REPLIES
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

INCON_SCHED can happen in a number of places, all because of unexpected things. The console listings, however, do not add up. If I look at the pc and the module where you crashed (tm_support in process_management) I can't pinpoint the right place, although I do have the exact listings. Taking a good look at the crashdump on a real machine may give a clue.

Fwiw,

Jur.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hi Jur,

I'm going to see whether I can get it to write that crash dump. I'll see if I can use LD to create an image file that can contain the dump, and if I can get the dump file off that image. Can you give me any clues on how to analyze that dump file (I'm unfamiliar with that)? Would it have to be done on a system running the same OpenVMS version, or could I analyze a 7.3-1 dump file on a system running 8.3 as well?

Thanks for the suggestion!

Camiel Vanderhoeven

iamcamiel@gmail.com
http://www.sf.net/projects/es40
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

I'm trying to get OpenVMS to write that dump file, but it tells me that there's no dump file available. Should I create this file on the dump_dev disk? What should this file be called and how do I create it?

Thanks,

Camiel.
Hoff
Honored Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

WAG: a bug with the "hardware" implementation underneath the queue operations.



atul sardana
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Dear Camiel,

You can check whether the dump was created successfully with the command below:
$ analyze/crash_dump sys$system:sysdump.dmp

After this you will get the SDA> prompt for analysis.

Atul Sardana
I love VMS
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hoff,

I don't think that that's it... The queue implementation is using the actual SRM palcode, and it doesn't look as though this code uses any exotic opcodes that aren't used all over SRM- and APB-code...

Camiel.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hi Atul,

It appears as though nothing gets written to the disk, unless there already is a sysdump file present?

Once I get the file, I'll try your suggestion!

Thanks,

Camiel.
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

You need to create a dumpfile first: $ mc sysgen create dka0:[sys0.sysexe]sysdump.dmp/size=200000 for example. Let it crash, and analyze from a real system. You can just copy the dump around with $backup/ignore=nobackup. No need for LD.

Jur.
Hoff
Honored Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

This looks to be an instruction error, and the scheduler uses interlocked queues, and it may well be the first piece of code that really starts pounding on these constructs. That's the thought behind the WAG. If not, well...

Do you have an instruction-level test harness in place? Something that can be run on a real Alpha, and that can then be verified with results from the emulator?

The OpenVMS operating system itself isn't necessarily the most expedient way to test the instruction set and related behaviors, nor does OpenVMS use anywhere near the entire instruction set. It's exceedingly difficult to debug hardware this way, too.

I won't get into how much "fun" it can be to debug with real hardware with real hardware bugs latent.

You might seek information from the Trailing Edge folks or somebody else working on an emulator, if there's not already an Alpha instruction test suite available somewhere. (I'd tend to assume HP won't release its Alpha or VAX instruction test suites.)
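
To make the test-harness idea concrete, here is a minimal sketch in C of a table-driven check. Everything in it is hypothetical: `emu_addl` stands in for whatever routine the emulator uses to execute ADDL, and the expected values are what the architecture definition of ADDL (32-bit add, result sign-extended to 64 bits) implies. In a real harness they would be recorded on a real Alpha.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One test case: two operands and the result expected from real hardware. */
struct test_vector {
    uint64_t a;
    uint64_t b;
    uint64_t expected;
};

/* Hypothetical emulator routine for ADDL: add the low 32 bits of both
 * operands and sign-extend the 32-bit sum to 64 bits. The cast to int32_t
 * relies on the usual two's-complement conversion. */
static uint64_t emu_addl(uint64_t a, uint64_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (uint64_t)(int64_t)(int32_t)sum;
}

/* Run every vector through the emulated operation; return the failure count. */
static int run_vectors(const char *name,
                       uint64_t (*op)(uint64_t, uint64_t),
                       const struct test_vector *v, size_t n)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t got = op(v[i].a, v[i].b);
        if (got != v[i].expected) {
            printf("%s case %zu: got %016" PRIx64 ", expected %016" PRIx64 "\n",
                   name, i, got, v[i].expected);
            failures++;
        }
    }
    return failures;
}

/* Illustrative vectors; a real table would be captured on real hardware. */
static const struct test_vector addl_vectors[] = {
    { 1, 2, 3 },
    { 0x7FFFFFFFull, 1, 0xFFFFFFFF80000000ull }, /* 32-bit overflow, sign-extended */
    { 0xFFFFFFFFull, 1, 0 },                     /* carry out of bit 31 is dropped */
    { 0x100000000ull, 5, 5 },                    /* upper 32 bits of operands ignored */
};

/* Convenience wrapper: returns 0 when every ADDL vector passes. */
static int check_addl(void)
{
    assert(sizeof addl_vectors > 0);
    return run_vectors("ADDL", emu_addl, addl_vectors,
                       sizeof addl_vectors / sizeof addl_vectors[0]);
}
```

Calling `check_addl()` from a small `main` reports any mismatch; one table per instruction keeps the emulator and the reference results side by side.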
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hi Jur,

Thanks!

The disks I'm working with aren't real disks, but image files, so I need to use LD to access them from a "real" VMS system. I'm getting the hang of LD as we speak. I'll create the dumpfile using LD from a real VMS system, and then run the emulator again.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hi Hoff,

My emulator doesn't implement the Alpha instruction set fully yet (most floating point operations are not implemented; integer operations with overflow checking aren't implemented), but it will warn me as soon as an unrecognized opcode is used. When that happens, I add the instruction that is missing. I'm really grateful for the RISC nature of the Alpha processor, as the instructions tend to be simple, without a lot of side effects.
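
As a rough illustration of the "warn on unrecognized opcode" approach, the sketch below decodes the opcode (bits 31:26) and function (bits 11:5) fields of an Alpha operate-format instruction and reports anything it has no case for. The names `decode`, `encode_operate` and `DECODE_UNIMPLEMENTED` are made up for this example and are not taken from the emulator's source. The opcode and function codes (1C.30 CTPOP, 1C.32 CTLZ, 1C.33 CTTZ) are as I recall them from the Alpha architecture documentation and should be checked against it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum decode_result { DECODE_OK, DECODE_UNIMPLEMENTED };

/* Field extraction for the operate instruction format. */
static unsigned opcode_of(uint32_t ins)   { return (ins >> 26) & 0x3F; }
static unsigned function_of(uint32_t ins) { return (ins >> 5) & 0x7F; }

/* Helper for building test instructions (register fields left at zero). */
static uint32_t encode_operate(unsigned opcode, unsigned function)
{
    assert(opcode <= 0x3F && function <= 0x7F);
    return ((uint32_t)opcode << 26) | ((uint32_t)function << 5);
}

/* Hypothetical dispatcher: only the three count instructions are "known". */
static enum decode_result decode(uint32_t ins)
{
    if (opcode_of(ins) == 0x1C) {
        switch (function_of(ins)) {
        case 0x30: /* CTPOP */
        case 0x32: /* CTLZ  */
        case 0x33: /* CTTZ  */
            return DECODE_OK;
        default:
            break;
        }
    }
    fprintf(stderr, "unimplemented: opcode %02X function %02X\n",
            opcode_of(ins), function_of(ins));
    return DECODE_UNIMPLEMENTED;
}
```

A check like this only catches instructions that are missing; an instruction that is present but computes the wrong result passes through silently.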

I've been looking for, but haven't found, an instruction set verification tool for the Alpha processor. Whenever I doubt what my emulator is doing, I write a simple C program using asm to check the behaviour of a real Alpha processor (I have an AS 4100 that was thrown out as junk at work).

Camiel.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

I can't figure out how to do this. I created [SYS0.SYSEXE]SYSDUMP.DMP on the disk image I'm using to boot, and I set DUMP_DEV to the disk I'm booting from. However, according to the OpenVMS manuals, I should modify modparams.dat to include the location of the dump file, and then run autogen to make the location of the dump file known (is this put into the boot block or something?)

Obviously, since I can't boot OpenVMS from the disk image yet, I can't run autogen on it. I could run autogen on the real OpenVMS system, and specify "DUMPFILE_DEVICE = LDA1". That, however, updates the pointer on my real system disk, which is not what I want...

Anyone got a clue how to go about this?

Thanks,

Camiel.
Hoff
Honored Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

The Macro64 assembler is available on the Freeware.

If in your situation, I'd write a test suite if I could not find one available. Get the base instruction set working before trying to boot OpenVMS itself.

You're certainly far enough along that you can use the console to load the program as if it were APB, or you can probably instantiate it directly through the emulator.

Debugging these cases and these crashes is particularly difficult when the "hardware" is not entirely trustworthy.

OpenVMS itself also isn't a good test suite -- not just because it's difficult to debug these sorts of failures (as you're finding), but because OpenVMS itself doesn't necessarily use anywhere near all of the instruction set.

There are open-access OpenVMS systems available on the 'net, and you can log in and run user-mode tests on these systems.
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

No need to muck around with autogen, just create sysdump.dmp. That's all. If you boot from an LD created container file, that's ok. Just mount that container file on a real system after the crash, and you should be able to access the dump.

Jur.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Could it be that the problem is that the disk image I'm trying to boot from is a copy of the OpenVMS installation CD? Perhaps system dumps are disabled, because CDs normally aren't writeable?
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Aha, booting from the installation cd, so not from an installed vms version?

Do this: make sure there's a sysdump.dmp created on the container file in [sys0.sysmgr].

Then boot conversational (-fl 0,1) and set these system parameters:

SYSBOOT> SET WLKSYSDSK 0
SYSBOOT> SET DUMPSTYLE 9

Then continue, and if the system crashes it should write a valid dump. I just tried this with a container file of a V7.3-1 installation cd with an Alpha emulator (not this ES40 obviously!).

Jur.

Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Thanks Jur,

I read your reply a little bit too late (I can be a bit impatient), so I got the same result via a longer way...

I booted a real AS4100 from the CD-ROM, installed VMS 7.3-1 on a 2GB RZ28 I had lying around, booted 7.3-1 from the RZ28, ran SYSGEN to create the SYSDUMP.DMP file, and set the DUMPSTYLE parameter to 2. Then I booted Linux on the AS4100, and did a dd if=/dev/sdc of=disk0.img to create an image (I used Linux' dd rather than VMS' backup/image because I wanted the image to be a bit-exact replica of the known good disk), transferred the image to a different machine, booted VMS 8.3 on the AS4100, transferred the image back to the AS4100, and ran the emulator. It generated a dump last night, but due to other duties (work...) I haven't checked it out yet.

If I had only read your message earlier, I could have saved myself a lot of time!

Thanks for the tip! I'll keep it for future reference!
Ian Miller.
Honored Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Would not BACKUP/PHYSICAL to an LD have done the same as the unix dd if=/dev/sdc of=disk0.img ?

However I think a BACKUP/IMAGE would have been fine.
____________________
Purely Personal Opinion
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

You could have saved yourself more time by using a backup/image of a systemdisk to an LD device, and then booting from the container file.
No need for Linux and dd.

Jur.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Hello everyone,

It took me a while, but I finally managed to write a sysdump that analyze could read.

For those interested, I've put the system dump files in a file called sysdump.zip, that can be downloaded from the following page:

http://sourceforge.net/project/showfiles.php?group_id=187340

I very much appreciate any help you could give me!

Thanks,

Camiel.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

From what I can make out from ANALYZE (this is the first time I've ever used that utility), the crash occurred in SCH$FIND_NEXT_PROC_1CPU_C + 34. Perhaps that helps...
Jur van der Burg
Respected Contributor
Solution

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

I had a quick look at the dump. We crash in the scheduler because the comq where we want to find a pcb is empty. To cut a long story short, the problem most likely is in the implementation of the CTTZ instruction (Count Trailing Zeros). You claim to emulate an EV6, and for that cpu a specific path in the scheduler is taken. There's a bitfield indicating which priority queue is filled (00008000.00000000 in this case, which is prio 16, numbered from msb to lsb). The result of this instruction should be 2F (hex), but the actual result is 40 (hex), which throws the index into an array off, with the bugcheck as a result.
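
The arithmetic can be checked with a few lines of C. This is only a reference sketch: `cttz64` and `prio_from_bit` are names invented here, not anything from the VMS scheduler or the emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Reference count of trailing zero bits; returns 64 for a zero operand. */
static unsigned cttz64(uint64_t v)
{
    unsigned n = 0;
    if (v == 0)
        return 64;
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Priority when the bits are numbered from msb (0) to lsb (63). */
static unsigned prio_from_bit(unsigned bit)
{
    assert(bit < 64);
    return 63 - bit;
}
```

For the bitfield 00008000.00000000 only bit 47 is set, so `cttz64(0x0000800000000000)` gives 47 (2F hex) and `prio_from_bit(47)` gives 16. A result of 40 hex (64 decimal) is what a correct CTTZ returns only for an all-zero operand.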

In my previous life I would have swapped the hardware :-).

Next time you don't need to post sys$errlog.dmp as it does not contain what you might think.

Jur.
Jur van der Burg
Respected Contributor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

And here's your problem (CTLZ has this problem too):

#define DO_CTTZ \
temp_64 = 0;\
temp_64_2 = V_2;\
for (i=0;i<64;i++)\
if ((temp_64>>i)&1)\ <<<<<<<<<<<
break;\
else \
temp_64++;\
r[REG_3] = temp_64;
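
The marked line tests temp_64 (the counter) instead of temp_64_2 (the operand), so the loop never breaks and the result is always 64 (40 hex). A possible fix, written here as plain functions rather than macros, could look like the sketch below. The function names are hypothetical, and the actual change made in the emulator may differ.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from the macro above, reproduced as a function: the bit test
 * looks at the counter, not at the operand. */
static unsigned cttz_buggy(uint64_t operand)
{
    uint64_t temp_64 = 0;
    uint64_t temp_64_2 = operand;   /* copied but never examined */
    int i;
    (void)temp_64_2;
    for (i = 0; i < 64; i++) {
        if ((temp_64 >> i) & 1)
            break;
        else
            temp_64++;
    }
    return (unsigned)temp_64;
}

/* Count trailing zeros: scan the operand upward from bit 0. */
static unsigned cttz_fixed(uint64_t operand)
{
    unsigned count = 0;
    int i;
    for (i = 0; i < 64; i++) {
        if ((operand >> i) & 1)
            break;
        count++;
    }
    return count;
}

/* Count leading zeros: scan the operand downward from bit 63. */
static unsigned ctlz_fixed(uint64_t operand)
{
    unsigned count = 0;
    int i;
    for (i = 63; i >= 0; i--) {
        if ((operand >> i) & 1)
            break;
        count++;
    }
    assert(count <= 64);
    return count;
}
```

With the operand from the dump, 00008000.00000000, `cttz_buggy` returns 64 while `cttz_fixed` returns 47 and `ctlz_fixed` returns 16.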

Jur.
Camiel
Frequent Advisor

Re: Bugcheck 0000038D (INCON_SCHED) during boot on emulated ES40

Here's a big thank you to everyone who spent their valuable time trying to help me, and especially to Jur van der Burg! You have completely solved the problem.

For those interested, I put the fixed source-code (version 0.11, includes some more changes) up for download at http://sourceforge.net/project/showfiles.php?group_id=187340