Re: ACMS servers get corrupted/hang

Anand Bhupal Kushapp · ‎08-14-2010

Hi,

I am not sure if this is the right forum, but if you could advise on my issue it would be a great help.

Recently we took over a VMS application that had a known, unsolved error.

In this application, ACMS runs on a VAX system and RDB runs on an Alpha system. Users connect to the VAX machines (they use TTWIN to connect) and access a menu. For any read or update transaction done by a user, ACMS connects to the Alpha (since RDB is there) and displays the result on the user's screen.

The issue is that the ACMS servers often hang or get corrupted; application error messages are displayed on the user's screen and the users cannot continue their work.

The workaround is to replace the ACMS servers ( acms/replace server .. ).

Is there any way to detect faulty ACMS servers, so that we can reset them before users log a major case with us?

Regards,
Anand
+61451544443

Hein van den Heuvel · ‎08-15-2010

>> I am not sure if this is the right forum
Anand, Welcome to the ITRC OpenVMS forums.
This is a good place to start, although eventually this may turn out to be more appropriate for some Oracle (DB) forum.

>> Recently we took over a VMS application that had a known, unsolved error.

If it is known, then how about trying to fix it, instead of mopping up after damages done?

>> In this application, ACMS runs on a VAX system and RDB runs on an Alpha system.

Why not run ACMS on the Alpha as well, or on an Alpha emulator? One reason I heard not too long ago was that TDMS was not available on Alpha way back, but that is long since fixed!
How is the VAX - Alpha RDB connection set up?
OpenVMS versions? RDB version(s)? Vax and alpha models?

>> The issue is that the ACMS servers often hang or get corrupted; application error messages are displayed on the user's screen and the users cannot continue their work.

So what is the error message?
Can it be reproduced with ACMS/DEBUG?
Anything in the SWLUP or ACMS Audit log?
How about running the server with the DUMP attribute and perhaps capturing some useful state/status that way?

>> Workaround is replace the ACMS servers ( acms/replace server .. )

How primitive !

>> Is there any way to detect faulty ACMS servers,

Check logs for telltale signs.
Install with dump? Check quotas?

I suppose you could try to use the ACMS System Interface to call a server task at regular intervals and use its return status to see if it is still 'alive'.
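A rough sketch of that heartbeat idea, written in Python only to show the logic (the same thing could be coded in DCL or whatever is available on the system). It assumes some probe command exists that calls a harmless server task and exits with status 0 on success; that command, the `probe` function and the `Watchdog` class are hypothetical names, not part of ACMS.

```python
import subprocess


def probe(command, timeout_s):
    """Run one probe command; return 'ok', 'error', or 'hung'.

    `command` is a hypothetical wrapper that calls a trivial server task.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # No answer within the time limit: treat the server as hung.
        return "hung"
    return "ok" if result.returncode == 0 else "error"


class Watchdog:
    """Flag a server after several consecutive failed probes."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, status):
        """Record one probe result; return True if the server should be flagged."""
        if status == "ok":
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.max_failures
```

Requiring two or more consecutive failures before flagging is a design choice to avoid replacing a server that was merely slow once; the right threshold and probe interval would have to be tuned on the real system.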

>> so that we can reset them before user will log a major case to us.

Just fix the darn problem ?!
That's why they pay you the big bucks, no?

Regards,
Hein
(ACMS dabbler as early as EFT back in 1983, as late as two weeks ago, and more next week!)

Richard J Maher · ‎08-16-2010

Hi Anand,

I too think you might be better off trying to work out what's hanging and seeing if you can stop it.

As well as what Hein said with ACMSATR and SWLUP, you could look at the active user stall messages in RMU/SHOW STAT and any locks in RMU/SHOW LOCKS.

Does the server hang LEF or CPU spin or something else?

Cheers Richard Maher

abrsvc · ‎08-16-2010

Anand,

As others have stated, additional information is required in order for us to help further. Are you using TDMS as well? There were issues with early versions that would cause hangs or the appearance of hangs when TDMS was involved.

Dan

Anand Bhupal Kushapp · ‎08-16-2010

Hi Hein, Richard, Dan and Peter,

Thank you very much for your help.

This application has been running for over a decade, and our customer does not want to spend money on any development. So they are only interested in preventive action.

We are using VMS V7.3, and the RDB version is 7.07.

We are getting on average 1 major a month, where a few users are unable to do their work for around 5 - 10 minutes.
After they have raised the major, we reset the ACMS servers.

More than 70% of our majors have been caused by database blockers.
The database is on the Alpha system. Users connect to the VAX system and log in to the menu.
Once they log in to the VAX, a process id is assigned to them. If they want to connect to the database, for instance for an inquiry or update, their process id is in turn connected to an ACMS server process id (for instance, for an order update they connect to the ACMS update server process id). The ACMS process id will in turn connect to the Alpha system, where RDB will create a process id and attach it to ACMS. This is how the link is established.

Again, each ACMS server process can have multiple user processes connected to it, and the limits are defined in the ACMS application. Usually this is not an issue since we have multiple VAX nodes, and users get connected automatically to whichever VAX node has free SPs.

When there is a lock in the database, the connection link between the RDB and ACMS processes is disrupted. RDB will release the lock by default, but the ACMS server process will be left hanging. A server reset fixes the issue.

Usually the locks are short-lived and we cannot identify them in the RDB monitor utility ( M I A ). Also, the lock history ( M I B ) is not much help.

So recently we have written a job which runs the command below every 5 minutes (our DBA confirmed it will not cause a performance issue) to detect any blockers.

rmu/show lock/mode=blocking

Blockers are of short duration, lasting from a few seconds to 1 minute. If a blocker exists for more than 1 minute, it might cause an issue in ACMS.
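To illustrate the "more than 1 minute" rule, here is a minimal sketch in Python of how successive samples could be compared. It assumes the job has already extracted the blocking process ids from each run of the command into a list; the RMU output format is not parsed here, and `BlockerTracker` and the pids in the usage note are hypothetical.

```python
from datetime import datetime, timedelta


class BlockerTracker:
    """Remember when each blocking process id was first seen."""

    def __init__(self, threshold=timedelta(minutes=1)):
        self.threshold = threshold
        self.first_seen = {}  # pid -> time of the first sample that showed it

    def update(self, now, blocking_pids):
        """Feed one sample; return pids that have blocked for >= threshold."""
        current = set(blocking_pids)
        # Forget blockers that have cleared since the previous sample.
        self.first_seen = {
            pid: seen for pid, seen in self.first_seen.items() if pid in current
        }
        # Start the clock for blockers seen for the first time.
        for pid in current:
            self.first_seen.setdefault(pid, now)
        return sorted(
            pid for pid, seen in self.first_seen.items()
            if now - seen >= self.threshold
        )
```

A blocker's age can only be measured between samples, so with a 5-minute job a blocker lasting 1 to 4 minutes may appear in one sample only or be missed entirely; catching the 1-minute case would need a shorter sampling interval, subject to the DBA's agreement.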

For example, assume a user reports an ACMS issue at 10:30 and we have two blockers found around 10.
Users complained that when they tried to do an order update the screen froze (values did not populate on the menu), which indicates a database lock.
So we want to go back and say we have detected two blockers. Those blockers are Alpha process ids (RDBSERVER processes).
From NCL, using the session control port, we can find the corresponding VAX process id. This VAX process id is the ACMS process id. But from the ACMS process id we are not able to find the user process id. I know that the ATR writes only the user process id rather than ACMS process details. SWLUP logs show only ACMS process details and not user process details.

I tried using analyze/system but could not find the user process details, and acms/show users does not help either.

Basically, in the root cause analysis we want to state that the lock was caused by that particular user. So we are stuck trying to identify the actual user process.
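The correlation we are attempting can be sketched as follows, in Python purely for illustration. It assumes the blocker job writes (timestamp, Alpha pid) records and that the Alpha-to-VAX mapping has already been collected from NCL into a table; `blockers_near`, `trace`, the 45-minute window and all pids are hypothetical. The last step, from ACMS server process to user process, is exactly the part we cannot fill in, so it is left empty.

```python
from datetime import datetime, timedelta


def blockers_near(incident, samples, window=timedelta(minutes=45)):
    """Return (timestamp, alpha_pid) samples logged in the window before the incident."""
    start = incident - window
    return sorted(s for s in samples if start <= s[0] <= incident)


def trace(alpha_pid, alpha_to_vax):
    """Follow the chain as far as it is known today.

    alpha_to_vax: mapping of RDBSERVER pid -> ACMS server pid, gathered from NCL.
    """
    return {
        "alpha_pid": alpha_pid,
        "acms_server_pid": alpha_to_vax.get(alpha_pid),
        "user_pid": None,  # unknown: no source found yet for this step
    }
```

For the example above, calling `blockers_near` with an incident time of 10:30 would return the two samples logged around 10, and `trace` would then resolve each of them to an ACMS server process but no further.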

Can you please help.

Regarding identifying locks, we have proposed that the customer upgrade to RDB 7.1 or higher, since those versions have tools to identify the locks accurately.
Can you please help us justify the above statement?

abrsvc · ‎08-17-2010

Without knowing the exact application, we can only speak in generalities. What you have here is a design issue where there are race conditions with locking requests. I ran across a similar problem with an application that had coded ASTs to handle conditions but blocked AST delivery. A similar condition could exist here. It sounds like you have two threads requesting exclusive access to the database and neither gives in to the other. It sounds like RDB gives up, but the ACMS application does not. I think at this point that an examination of the application is required in order to find this. Others here may have some ideas though.

Can you check the application in the "stuck" area of code for lock requests that do not have timeouts? Perhaps a simple change to include a timer would solve this.
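As a generic illustration of what I mean by a timer (this is not ACMS or RDB code, just the pattern, shown in Python with a hypothetical `update_with_timeout` helper): the caller waits for the lock only for a bounded time and reports a failure instead of hanging forever.

```python
import threading


def update_with_timeout(lock, work, timeout_s=5.0):
    """Run work() under the lock, but give up if the lock is not free in time."""
    if not lock.acquire(timeout=timeout_s):
        # Report the failure to the caller instead of waiting indefinitely.
        return "timeout"
    try:
        work()
        return "done"
    finally:
        lock.release()
```

In the real application the equivalent would be whatever wait or timeout option the lock request supports, with the server returning an error to the user when the time limit expires.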

Dan
