
Ty Roberts
Frequent Advisor

Some Unknown Resource Issue

Looking for some help on this very strange problem. We have an overloaded A-Class development box that is running several Oracle databases as well as an installation of JD Edwards. Everyone will agree that this machine is very overloaded. There are certain Edwards jobs that run on any machine but this one, including other A-Class machines; on this box the job just hangs. We ran a test today where we shut down everything on the box except Edwards and its associated Oracle database. Once we did this, the job ran fine.

Even when everything is running on this machine we do not seem to be reaching any memory / open file / semaphore limits. Does anyone have any ideas of how / what we could monitor in terms of system resources that would help us to determine what limit we are hitting if any? We have been using Glance / PerfView but we do not see anything that is topping out.

Any input / help on this issue would be GREATLY appreciated!

Thanks in advance..
18 REPLIES
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

Just FYI: the Edwards code, OS, and OS patch level are the same on all of these machines.
A. Clay Stephenson
Acclaimed Contributor

Re: Some Unknown Resource Issue

Don't overlook the possibility that the problem has nothing to do with resource limits. Instead, two of your applications may be using common lock names (e.g. the same ftok() keys for semaphores, message queues, or shared memory), or they may be contending for a file lock on a common file via fcntl() or lockf(). I would check the configurations very carefully.
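
To make the file-lock case concrete, here is a minimal Python sketch of one process holding a lockf() lock on a common file while a second process is refused. Nothing here comes from JD Edwards or Oracle: the helper names try_lock and blocked_in_other_process and the temporary file are hypothetical, and Python's fcntl.lockf stands in for the C calls.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on path.

    Returns the open file object (which holds the lock), or None if
    another process already holds a conflicting lock.
    """
    f = open(path, "a")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except OSError:
        f.close()
        return None


def blocked_in_other_process(path):
    """Fork a child that attempts the same lock; True if it was refused."""
    pid = os.fork()
    if pid == 0:
        # Child: record locks are per-process and are not inherited
        # across fork(), so this is a genuinely separate attempt.
        refused = try_lock(path) is None
        os._exit(0 if refused else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()   # hypothetical "common file"
    os.close(fd)
    holder = try_lock(path)         # "application A" takes the lock
    print("second process blocked:", blocked_in_other_process(path))
    holder.close()                  # closing the file releases the lock
    print("second process blocked:", blocked_in_other_process(path))
    os.remove(path)
```

A real application that uses a blocking lock request instead of the non-blocking one shown here would simply wait, which from the outside looks like a hang.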
If it ain't broke, I can fix that.
Roger Baptiste
Honored Contributor

Re: Some Unknown Resource Issue

hi,

What do you mean by "the job just hangs"? Does it have a log file where you can check the point at which it stops making progress?
Another check would be to get the process PID and drill down into it in glance/gpm. See what files it has opened, see what system calls it is making, and most importantly check its wait states. If it is waiting on a lock or a stream, then the issue is not a system resource but something to do with the application or elsewhere.

HTH
raj
Take it easy.
harry d brown jr
Honored Contributor

Re: Some Unknown Resource Issue


Let me get this straight:

(1) multiple oracle databases
(2) development machine
(3) JD Edwards stuff (heard of it)
(4) job hangs


OK, now how many of these can you answer:

(1) What kind of DISK are you using???
(2) What kind of IO cards ??
(3) Upgraded firmware??
(4) Developers compiling??
(5) Amount of memory?
(6) Number of CPU's
(7) Actual model of A-class
(8) OS level
(9) number of users
(10) number of processes
(11) file system layout
(12) any zombies
(13) 64bit?
(14) threaded applications?

live free or die
harry
Live Free or Die
Philip Chan_1
Respected Contributor

Re: Some Unknown Resource Issue

A job could be bound by CPU or disk I/O, or be waiting on things such as terminal input or network calls to return. First you must identify which is your bounding factor! Glance is a good tool for this: it starts with a global view of CPU, memory, disk and network utilization, and from there you can look into further details under Report -> Process List. Locate your job in the process list, then check the "stop reason" column (I thought it was called "bound-by" before).

Rgds,
Philip
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

Ok.. going to try to give as much info as I can without being at work (at home now). Will append additional info tomorrow in the AM.

When I say the job hangs, I mean it stops on a standard JD Edwards cache call. We used the debugger to find this out. The log files report no errors. Like I said, it almost just stops processing.

To answer you Harry:
OK, now how many of these can you answer:

(1) What kind of DISK are you using??? Attached to a VA 7100 Disk Array
(2) What kind of IO cards ?? Tachyon TL/TS Fibre Cards
(3) Upgraded firmware?? Not sure if it is the exact latest firmware, but it has been updated recently
(4) Developers compiling?? Yes, it is a heavily used dev box.
(5) Amount of memory? 6GB
(6) Number of CPU's? 2
(7) Actual model of A-class? 9000/800/A500-44
(8) OS level? 11i
(9) number of users? Usually only 3-5 direct users
(10) number of processes? Not close to any limit
(11) file system layout?? Not sure what you mean.. vxfs?
(12) any zombies? Nope
(13) 64bit? Yes
(14) threaded applications? Yes, it is threaded

I do not remember exactly what it is waiting on in Glance. I think it was System. But we have run this process many times while watching Glance. According to Glance we do not hit any system limits (System Tables graph). CPU, memory, and network are not bottlenecks. When we drill down on the PID, it is not making any system calls, since it stops dead at the JD Edwards system cache call.

What makes this so weird is that if we run the exact same process just after shutting down the extra apps and databases, it runs perfectly. Nothing else is different besides the other apps running. Let me stress that even while the other apps are running there is plenty of memory, CPU time is available, disks are not overly busy, and nfile and all the other kernel limits shown in Glance are fine.

What is odd is that NO alarms are going off in Glance, and no matter how we drill down we do not see anything that would be holding it back. We have looked at this process in Glance tirelessly and still have come up with nothing.

As soon as it makes this JD Edwards call with the other apps running, the process just stops dead in its tracks..

Thanks for everyone's input.
Bill Hassell
Honored Contributor

Re: Some Unknown Resource Issue

Unfortunately, it sounds like the Edwards program is hitting a limit and, rather than logging the problem, looping around waiting for a fix. A lot of middleware and database engines are written in a way that makes problems very hard to diagnose. I've seen far too many programs try to allocate shared memory, fail, wait a few seconds, and try again (forever).
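
For contrast with the silent retry-forever pattern, here is a minimal Python sketch of what a diagnosable retry loop could look like. It is an illustration only, not JDE code: acquire_with_retry and the acquire callable it wraps are hypothetical names, and OSError stands in for whatever failure a real allocation call would report.

```python
import logging
import time

log = logging.getLogger("alloc")


def acquire_with_retry(acquire, attempts=5, delay=0.0):
    """Call acquire() until it succeeds, logging every failure.

    Gives up with an error after `attempts` tries instead of looping
    forever without telling anyone.
    """
    for n in range(1, attempts + 1):
        try:
            return acquire()
        except OSError as exc:
            log.warning("attempt %d/%d failed: %s", n, attempts, exc)
            time.sleep(delay)
    raise RuntimeError(f"resource still unavailable after {attempts} attempts")
```

With logging like this, the hang would show up as a stream of warnings naming the failing call, rather than a process that appears to do nothing.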

If the Edwards program is only a 32-bit program, memory mapping is definitely a problem (get shminfo from: ftp://contrib:9unsupp8@hprc.external.hp.com/sysadmin/programs/shminfo/). Adding more RAM beyond 4 GB will not help 32-bit programs. A 32-bit address space is extremely limiting for programs with large memory requirements.
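
The 4 GB figure follows directly from the pointer width, as this small Python calculation shows. The second half rests on an assumption that is not stated anywhere in this thread: the commonly documented HP-UX 32-bit layout of four 1 GiB quadrants, with shared memory confined to quadrants 3 and 4. Treat that part as a rough guide and check it against what shminfo reports on your box.

```python
GIB = 2**30


def address_space_bytes(pointer_bits):
    """Total bytes a process can address with pointers of this width."""
    return 2**pointer_bits


total = address_space_bytes(32)
print(total // GIB, "GiB addressable by a 32-bit process")

# ASSUMPTION (not from this thread): four 1 GiB quadrants, shared
# memory in quadrants 3 and 4, with 0.25 GiB of quadrant 4 reserved.
quadrant = total // 4
shared_window = 2 * quadrant - GIB // 4
print(shared_window / GIB, "GiB shared-memory window under that assumption")
```

If that layout applies here, the 32-bit programs on the machine would be competing for the same window no matter how much RAM is installed, which would at least be consistent with the job running once the other applications are shut down.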

It might also be a thread issue. Rather than report on thread allocation and termination issues, the program may be silent, leaving everyone to guess.

I'd ask JDE for an instrumented version so you can actually troubleshoot the issue.


Bill Hassell, sysadmin
Philip Chan_1
Respected Contributor

Re: Some Unknown Resource Issue


Do you think the problem could be related to application-specific proprietary locking mechanisms? Better consult your software vendor on this.

Rgds,
Philip
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

Bill, thanks for the advice. JDE is indeed a 32-bit app. We already have shminfo, but I am not sure what I am looking for when I run it. Can you point me in the right direction for that??

It may have something to do with the app's locking mechanisms. But what could be an explanation for the process running fine when most of the other applications on the machine are not running??? That is what is stumping us, and is forcing us to go searching around for an answer rather than just blame it on the app like I would like to :-)
Paula J Frazer-Campbell
Honored Contributor

Re: Some Unknown Resource Issue

Hi

Do you have TUSC on this box?

http://ftp.cerias.purdue.edu/pub/tools/unix/netutils/tcpdump-hpport/


If not, install it and use it to monitor the hung Edwards job - it should give you a pointer as to where the job is stuck/hanging/waiting.


HTH

Paula



If you can spell SysAdmin then you is one - anon
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

Ok.. this may help a bit. I just had them run the job and I looked at it in detail through Glance. The thing that jumped out at me was a line in the "Process Resources" screen for this PID. The "Vir Faults" value was fluctuating greatly and seemed to be very, very high. I saw numbers in excess of 2264744, and traps were up around "2018". I have been looking up info on these values since I really do not understand them, but maybe some of you gurus can help. Is this a bad thing??

While this was going on the following was happening.
* Process was waiting on Priority
* Heavily using CPU
* Was not making ANY system Calls
* Was not updating ANY file Pointers

Ran shminfo and also ipcs -a and did not see any glaring differences between before the process was running and while it was hung.
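
Comparing two such listings by eye is easy to get wrong, so here is a minimal Python sketch that captures a command's output before and during the hang and reports only the lines that changed. The helper names snapshot and changed_lines are hypothetical; the command to run (for example ipcs -a) is passed in by the caller.

```python
import difflib
import subprocess


def snapshot(cmd):
    """Run cmd (a list such as ["ipcs", "-a"]) and return its output text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def changed_lines(before, after):
    """Return lines removed ("- ...") or added ("+ ...") between two snapshots."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Typical use would be before = snapshot(["ipcs", "-a"]) with the job idle, after = snapshot(["ipcs", "-a"]) once it has hung, then print the result of changed_lines(before, after). Columns that change on every run, such as timestamps, will show up as noise.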

Any other hints for me??

Thanks!
harry d brown jr
Honored Contributor

Re: Some Unknown Resource Issue

Is it possible to get a 64bit version, native to this OS from JD Edwards??

live free or die
harry
Live Free or Die
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

No, I do not think so.. we are one of only a few companies that run JDE on HP-UX. They mostly run on AS/400's. So I am afraid we are stuck with this 32-bit version...
harry d brown jr
Honored Contributor

Re: Some Unknown Resource Issue

And I thought my company was the only one that ran on software converted to run on hpux (usually being the only client for years). I feel your pain!

Can you move this app to its own A-class??


live free or die
harry
Live Free or Die
Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

That is what we are pushing for right now. We have several A-Classes that run this app in production, and when we switch them over to run this DEV job they work fine. We are just being pushed by the Up and Ups to find out why this is.. Do not ask me why.. but it is getting frustrating.. I wish the excuse "Cause it don't" would be good enough.. :-)
Philip Chan_1
Respected Contributor

Re: Some Unknown Resource Issue


* Process was waiting on Priority

Have you tried increasing the priority of the process?

* Heavily using CPU
* Was not making ANY system Calls

Heavy CPU use with few or no system calls implies that the CPU time is being spent INSIDE the process (in user code, not in the kernel). Perhaps there is some kind of application-specific semaphore structure, and the program loops to re-check the condition it is waiting for rather than going to sleep to save CPU time.
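
The difference between the two waiting styles can be measured directly. This is a minimal Python sketch, not a model of JDE's actual locking: spin_wait polls a flag in a tight loop, sleep_wait blocks until it is set, and cpu_used_while_waiting (all three names are hypothetical) reports the CPU seconds the process burned while waiting.

```python
import threading
import time


def cpu_used_while_waiting(wait_fn, delay=0.2):
    """Return CPU seconds consumed while wait_fn waits for a flag
    that another thread sets after `delay` seconds."""
    flag = threading.Event()
    timer = threading.Timer(delay, flag.set)
    start = time.process_time()
    timer.start()
    wait_fn(flag)
    timer.join()
    return time.process_time() - start


def spin_wait(flag):
    # Polls in user space: burns CPU and never blocks in the kernel.
    while not flag.is_set():
        pass


def sleep_wait(flag):
    # Blocks until the flag is set: uses almost no CPU while waiting.
    flag.wait()


if __name__ == "__main__":
    print("spin :", cpu_used_while_waiting(spin_wait))
    print("sleep:", cpu_used_while_waiting(sleep_wait))
```

A spinning waiter shows up in a monitor as a busy process that makes no progress, which matches the symptoms listed above.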

Have you contacted the software vendor about your problem?




Ty Roberts
Frequent Advisor

Re: Some Unknown Resource Issue

No, we have not contacted the program vendor because the process works fine on other boxes, and even on this box when we shut down all other apps but JD Edwards (the app itself) and its Oracle DB. So it seems like it is running into a conflict with one of the other apps. Since it is causing a lot of Virtual Faults, it seems like it is making a lot of memory calls, but how can I find what it is conflicting with among the other running apps, and how?
Philip Chan_1
Respected Contributor

Re: Some Unknown Resource Issue


Knowing how the internals work would help a lot in explaining this kind of strange behaviour. The software maker has the best knowledge of their programs and how they behave; that's why I suggest you go after the vendor. Also, they might have had cases similar to yours reported before.

If you have a support contract with the vendor, then why not give them a try? (I suppose this won't cost you extra money)

Rgds,
Philip