Operating System - OpenVMS

ACCVIO register stack dump

SOLVED
pacab
Advisor

ACCVIO register stack dump

Greeting to all,
I have spent countless hours trying to figure out this access violation issue and have had enough. So I am seeking this forum's expertise to shed some light.

I have attached a file with the stack dump the application issues and a snippet of the C++ code.

This is a legacy C++ application that was moved during a hardware and OpenVMS upgrade from V6.2 to V7.3-2. Each time our business users use the printing function more than once, they get kicked out of the application and get the stack dump. If they print once, go back to the main menu (which is 2 screen levels out) and select an option to switch to a different entity, which calls a new screen, they also get kicked out with a stack dump. I have searched and read documents and manuals with no luck... PLEASE help!

Thank you all in advance for your time and feedback...Abel
16 REPLIES
Willem Grooters
Honored Contributor

Re: ACCVIO register stack dump

I have seen this before and spent quite some time finding out that the problem was in a called routine that used an undocumented feature. It was buried deep, so it took a lot of time to locate the source.

If possible, I'd rebuild the application, including the (non-system) libraries used. Read the release notes - all of them - between 6.2 and 7.3-2, to see if any changes in the code are required.

Do you get a traceback? This may shed some light on where the error occurs.
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

this looks like a stack overflow problem. The failing virtual address 7AE2C000 points into P1 space, where the user stack resides.

Can you start the image with RUN/DUMP, or set the process with SET PROC/DUMP/ID=xxx after the image has been started? It should then write an image dump (SYS$LOGIN:image_name.DMP) when an improperly handled condition occurs. You can look at the dump with ANAL/PROC or even ANAL/CRASH to find out what's going on.

To rule out that it's just insufficient PGFLQUOTA preventing automatic stack expansion, try increasing the pagefile quota for the user.

Volker.
pacab
Advisor

Re: ACCVIO register stack dump

Willem,
I will check the release notes to see if anything jumps out. Rebuilding the application would be a bit of an undertaking; at this time we are just trying to keep the application stable. But you are right on the money when you say some of this code needs to be cleaned up.

Volker, I will run this with /DUMP and take a closer look. I had actually already increased PGFLQUOTA to 700000, and it helped to allow more than 2-3 prints, but it still produces the same ACCVIO error.

Any other suggestions?...

Thank you for all of your comments, time and feedback.

Abel
Ian Miller.
Honored Contributor

Re: ACCVIO register stack dump

I saw
"increased PGFLQUOTA to 700000 and it helped to allow more than 2-3 prints,"

and wondered if you tried increasing it any more?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

if increasing PGFLQUOTA helps 'a little bit', increasing it even more may be a temporary workaround, but it also confirms that the underlying problem is a STACK consumption problem, where some piece of code seems to allocate STACK space and not return it, or to allocate STACK space based on some variable which does not get reset and gets bigger from call to call...

You should also be able to 'see' this, if running SHOW PROC/CONT against such a process and having the user issue a couple of prints. Virtual Pages used should increase.

You could examine the stack in the running system with SDA and see it growing from print call to print call, but a process dump would certainly be better.

Volker.
pacab
Advisor

Re: ACCVIO register stack dump

I increased the PGFLQUOTA to 900000 and am not sure it helped. I monitored the process and created a dump of the error. Attached is the output from both SET PROC/DUMP and SHO PROC/CONT.

I too feel this is a stack size issue; I am just not sure where it is or what parameters to change to prevent this problem from happening.

I also included the printing code used in this application that is giving me nightmares... Abel
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

before the final failure, the SP was at 7AE2A0C0 (same value as at the start of the application). The failing VA is 7AE2C000, in the Alpha page immediately after the one containing the SP, i.e. towards the base of the stack. Stacks GROW towards LOWER virtual addresses, but the failing VA is at a HIGHER address.

With the following data, you can confirm, that it really is a STACK UNDERFLOW, which explains why setting PGFLQUOTA higher does NOT help much.
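The page arithmetic behind this observation can be checked with a few lines of C. This is only a sketch: it assumes 8 KB (0x2000-byte) pages, which matches the one-page gap between 7AE2A000 and 7AE2C000 described above, and the names `ALPHA_PAGE_SIZE`, `page_base` and `page_distance` are made up for illustration.

```c
#include <stdint.h>

/* Assumption: 8 KB (0x2000-byte) pages, as implied by the addresses above. */
#define ALPHA_PAGE_SIZE 0x2000u

/* Base address of the page that contains va. */
uint32_t page_base(uint32_t va)
{
    return va & ~(ALPHA_PAGE_SIZE - 1u);
}

/* Signed distance, in pages, from the page holding 'from' to the page
   holding 'to'. A positive result means 'to' is at a HIGHER address. */
int32_t page_distance(uint32_t from, uint32_t to)
{
    return (int32_t)(page_base(to) / ALPHA_PAGE_SIZE)
         - (int32_t)(page_base(from) / ALPHA_PAGE_SIZE);
}
```

With the SP value 7AE2A0C0 and the failing VA 7AE2C000, `page_base` gives 7AE2A000 for the SP and `page_distance` gives +1, i.e. the fault is one page above the page holding the SP, not below it.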

At the DBG prompt, enter SDA, then
SDA> SHOW PROC/PAGE 7AE2A0C0;4000

Look at the Read Writ columns for pages 7AE2A000 (page of USER stack) and 7AE2C000 (page causing ACCVIO).

If this doesn't work in the dump, try it against that process in the running system.

You can check the USER stack start address with:

SDA> EXA CTL$GQ_STACK+18 ! bottom of USER stack

Then let's have a look at the instruction stream leading to the ACCVIO:

DBG> EXA/INS @PC-40:@PC

Volker.
pacab
Advisor

Re: ACCVIO register stack dump

Volker,
Thank you for all your input and time...

I have attached the output of the SDA commands you referred me to. Can you please advise on the results? Also, are there any HP VMS docs you can refer me to so I can learn some of these administrative tools?

Thank you once again...Abel
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

there is the OpenVMS Debugger Manual and the System Analysis Tools Manual (for SDA), but the problem is, that while those books will extensively describe all available commands, they don't tell you the troubleshooting techniques and internals of OpenVMS, which you need in this case to troubleshoot the problem from a process (or system) dump. But I'm here to help you ;-)

The failing instruction is:

LDQ_U R24,(R17)

LDQ_U (load quadword unaligned) tries to load the quadword containing the virtual address pointed to by R17 (=7AE2C000) into R24 and fails with ACCVIO accessing the page pointed to by R17 - not in memory!

Looking at the instructions preceding the failing one (with the Alpha architecture manual at hand to look up what those instructions really do), one can see that the code seems to be searching memory byte by byte (R17 pointing to the current byte, R6 containing some byte counter) for a byte value 0x2C (that's a comma ',' in ASCII code!):

ADDL R6,#X01,R6
LDA R17,#X0001(R17)
XOR R20,#X2C,R20
BEQ R20,#X000011
LDQ_U R22,(R17)
EXTBL R22,R17,R22
XOR R22,#X2C,R22 <<< R22 = 0x2C ?
ADDL R6,#X01,R6 <<< incr R6
LDA R17,#X0001(R17) <<< incr R17
BEQ R22,#X00000B <<< branch if R22=0x2C
LDQ_U R24,(R17) <<< ACCVIO here

In your dump, R6 should be something like the 'String length' or number of bytes already checked (R6 is 00002B5F in your register dump attached to your first note).

So the 'string' may have started at 7AE2C000-2B5F ! Try SDA> EXA 7AE2C000-2B5F;100 and see if you could identify the data shown...
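
The address in that SDA command is a plain subtraction, sketched below. The helper name `scan_start` is hypothetical, and the result is only a candidate start address: it assumes the byte counter (R6) and the pointer (R17) were advanced in lockstep from the beginning of the scan.

```c
#include <stdint.h>

/* Candidate start address of the scanned data: the faulting address minus
   the number of bytes already checked. Assumes counter and pointer were
   incremented together from the start of the scan. */
uint32_t scan_start(uint32_t fault_va, uint32_t bytes_checked)
{
    return fault_va - bytes_checked;
}
```

With the values from this thread, `scan_start(0x7AE2C000, 0x2B5F)` yields 7AE294A1, so `EXA 7AE2C000-2B5F;100` examines memory starting at that address.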

You might also want to check your C code for a loop looking for a ',' in a string. Make sure there ALWAYS IS a comma, and that the loop is also TERMINATED at the end of the string by checking for the terminating 0 in the case of an ASCIZ string.

Volker.
Volker Halle
Honored Contributor
Solution

Re: ACCVIO register stack dump

Abel,

I'm not a C programmer, so I had to write a little example program - see attachment ;-)

But the following lines of C code in your 1st attachment look suspicious to me:

buf[sizeof(buf)-1]='\0';
for (i=0; (buf[i]!=','); i++);
buf[i] = '\0';

What's gonna happen, if there is no ',' in the string ??????
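
One way to make that loop safe is sketched below. The function name `split_at_comma` is made up for illustration, and how the real application should react when no comma is present (skip the record, report an error, ...) is not known from this thread, so the sketch only reports the condition to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Terminate 'buf' at the first comma.
   Returns 1 if a comma was found and replaced by '\0'.
   Returns 0 if there is no comma; apart from the forced terminator in the
   last byte, the buffer is then left unchanged. */
int split_at_comma(char *buf, size_t bufsize)
{
    char *comma;

    if (buf == NULL || bufsize == 0)
        return 0;

    buf[bufsize - 1] = '\0';      /* same guard as in the original code */
    comma = strchr(buf, ',');     /* strchr stops at the terminating '\0' */
    if (comma == NULL)
        return 0;

    *comma = '\0';
    return 1;
}
```

Because `strchr` stops at the terminating zero, a record without a comma now gives a return value of 0 instead of a scan that runs past the end of the buffer.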

Volker.
pacab
Advisor

Re: ACCVIO register stack dump

Volker,
I checked the code you pointed out in the for loop. The record being checked looks like "user,printer_queue". I looked at the file that holds the aforementioned record layout, and all records appear to have a "," between the user and queue name.

I know the problem ("bug") has to be in the printing code, because if I do not invoke that functionality the process does not fail. Do you have any more ideas? I truly appreciate all your help. Do you have any docs with tips and troubleshooting techniques you can share?

Thx - Abel
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

please check the buffer contents in the process dump:

SDA> EXA 7AE2C000-2B5F;100

Just looking at the code and the records you think it will read may be fine, but you have the process dump, in which you can check the REAL DATA.

To find out the source code statement on which the ACCVIO is happening, you'll need the linker map and the module listing, including machine code, from exactly the version of the image that is currently running.

But if you think you know the module/routine the failure is happening in, you can separately compile this module with /LIS/MACHINE_CODE and try to find the instruction stream by comparing the machine code listing with the instruction stream from the process dump.

I don't have any books or documentation describing this kind of troubleshooting technique; it's just experience and knowledge obtained during 25 years of OpenVMS support at Digital/Compaq/HP. I've shared this information in the past with my former colleagues internally and I'm now doing it in this forum...

Volker.
Volker Halle
Honored Contributor

Re: ACCVIO register stack dump

Abel,

were you able to further diagnose and solve this problem ? If so, please provide feedback or report the diagnosis steps, which led to the solution.

Thanks,

Volker.
pacab
Advisor

Re: ACCVIO register stack dump

Thank you to all for your input, it was very helpful. In particular Volker, thank you for your support, time and guidance.

I was able to successfully correct the ACCVIO error I was encountering.

There were several issues:
1) There is a patch we had to install to deal with a known LINKER problem on v7.3 (as we moved this app from v6.2 recently)

2) Improper cleanup, usage and handling of SMG$ utilities, which worked fine under v6.2 (not sure how or why).

I was able to figure out the exact location of these issues by using the SDA and Debugger tools (see previous postings by Volker) in conjunction with a listing of the compiled routine produced with "CXX/lis/machine_code routinename" (great tool!...thank you Volker!).

When you run the application in debug mode ("run/debug", which also requires compiling and linking with the debug qualifier), the debugger stops when it encounters an error so you can review it. It tells you the function in which the error occurred and the line number within the routine, which you can then look up in the listing file produced by "/lis" to view the actual line of code. BTW, the file created by the "/lis" option will be named "routine_name.lis".

Once again, thank you to all for your feedback.

Abel
pacab
Advisor

Re: ACCVIO register stack dump

One last thing... The manuals are a great resource, but they certainly do not replace the experience and knowledge of this forum.

Abel
pacab
Advisor

Re: ACCVIO register stack dump

Thank you to all...