Oracle OEM being clobbered by HP EMS/STM

Coolmar
Esteemed Contributor

We have an rp7420 with two nPars, each with two vPars. Each vPar runs STM A.47 and EMS 1.04.00.02, and the system also runs Oracle 10g OEM. Our problem is that when the ia64_corehw process is running (it has been upgraded to the Dec 04 release) along with its defunct child process, the OEM processes (emagent and java) spiral out of control: they start using 100% CPU right away and stay there until OEM is shut down. When EMS is shut down, OEM is fine. Any ideas?
2 REPLIES
Andrew Merritt_2
Honored Contributor

Re: Oracle OEM being clobbered by HP EMS/STM

Sounds odd; not sure what the connection between the OnlineDiags and Oracle would be, unless it's contention for a system resource. Have you tried selectively shutting down parts of the OnlineDiags to get an idea of which process might be the culprit? What are you doing to shut down EMS? Is there any change in the resource usage of the OnlineDiags when you start the OEM package?

One thing I recall is that Oracle may have a conflict in the ports it is configured to use. You could use 'lsof' to find which ports Oracle is using and compare them with the OnlineDiags ports (the OnlineDiags use a system-allocated port in the range 49152 to 65535; if Oracle is configured to use a port in this range, there could be a conflict).
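
A minimal sketch of that port check, assuming lsof is installed and prints its usual column layout, where the protocol (TCP or UDP) is immediately followed by the endpoint name. The function name ports_in_dynamic_range is made up for illustration. It reads lsof output on standard input and prints the command, PID and local port of every socket whose local port falls in 49152 to 65535.

```shell
# ports_in_dynamic_range: hypothetical helper, not part of lsof or OnlineDiags.
# Reads "lsof -i -P -n" output on stdin and prints "COMMAND PID PORT" for
# every socket whose LOCAL port is in the range 49152-65535.
ports_in_dynamic_range() {
    awk '{
        name = ""
        # Assumption: the endpoint is the field right after TCP or UDP.
        for (i = 1; i < NF; i++)
            if ($i == "TCP" || $i == "UDP") name = $(i + 1)
        if (name == "") next
        sub(/->.*/, "", name)          # drop the remote side of a connection
        n = split(name, parts, ":")    # the port is the text after the last colon
        port = parts[n]
        if (port ~ /^[0-9]+$/ && port + 0 >= 49152 && port + 0 <= 65535)
            print $1, $2, port
    }'
}
```

Typical use would be `lsof -i -P -n | ports_in_dynamic_range`, run once with EMS up and once with OEM up, then comparing the two lists. The -P and -n options stop lsof from translating port numbers and addresses into names, which keeps the numeric comparison simple.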

The ia64_corehw process should not be causing any problems, and its defunct child by definition isn't using any resources.

I'd concentrate on trying to diagnose exactly what the Oracle processes are spending their time doing, e.g. with tusc, glance or whatever tools you have. Once you know that, you might have an idea how the OnlineDiags might be involved.
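
Before attaching tusc or glance, it can help to confirm which processes are actually burning CPU. The sketch below is one way to rank them. It assumes the HP-UX convention of setting UNIX95 so that ps accepts the -o option, and the function name top_cpu is made up for illustration. It only shows who is busy, not what they are doing, so it is a starting point for the tools mentioned above and not a replacement for them.

```shell
# top_cpu [N]: hypothetical helper. Reads "ps -o pcpu,pid,args" style output
# (one header line, then one process per line) on stdin and prints the N
# busiest processes, highest %CPU first. N defaults to 5.
top_cpu() {
    sed 1d |                 # drop the header line
    sort -k1,1nr |           # numeric sort on the %CPU column, descending
    head -n "${1:-5}"
}
```

On HP-UX this might be fed with `UNIX95= ps -e -o pcpu,pid,args | top_cpu 10`. Running it with EMS stopped and again with EMS started should show whether emagent, java or ia64_corehw is the one climbing.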

Andrew
Andrew Merritt_2
Honored Contributor

Re: Oracle OEM being clobbered by HP EMS/STM

An update on this: a problem has been found with ia64_corehw on some vPar systems (the problem is actually in the vPar monitor code). It is currently under investigation by HP.

The symptom is that ia64_corehw is in a fairly tight loop, reading the same record repeatedly and using 100% CPU on one processor. I'm not sure from your description, however, whether this is the same problem you are seeing. Certainly the defunct process is not a factor in this (and in fact if the defunct process is present, then this problem is not being seen, since it's that child process that does the looping).

Restarting the ia64_corehw monitor doesn't clear the problem.

If you are seeing this, then currently the only work-around is to disable the ia64_corehw monitor when this occurs.

1. Login as user root.

2. Run monconfig:
# /etc/opt/resmon/lbin/monconfig

3. Select:
(K)ill (disable) monitoring
(Q)uit

4. Move the executable for the daemon to directory org. We can
restore the monitor if necessary by moving the executable back.

# cd /usr/sbin/stm/uut/bin/tools/monitor
# mkdir org
# mv ia64_corehw org

5. Move the dictionary entry to the directory org.
We can restore the monitor if necessary by moving the file back.

# cd /etc/opt/resmon/dictionary
# mkdir org
# mv ia64_corehw.dict org

6. Remove the .hwa file. The file will be recreated automatically
if all monitor files are copied back and the monitor is restarted.

# cd /var/stm/data/tools/monitor
# rm ia64_corehw.hwa

7. Move the startup, configuration and psm files to the directory org.

# cd /var/stm/config/tools/monitor
# mkdir org
# mv *ia64_corehw.* org

8. Run monconfig again:
# /etc/opt/resmon/lbin/monconfig
Select
(E)nable Monitoring
(Q)uit

9. Verify that the monitor is not running any more:
# ps -ef | grep ia64_corehw
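
Steps 4 to 7 and the check in step 9 can be sketched as shell functions. This is an illustration under assumptions, not an HP-supplied script: the function names and the optional ROOT prefix are made up (ROOT lets the moves be rehearsed in a scratch directory, and is empty on a real system), and monconfig still has to be run by hand before and after, as in steps 2, 3 and 8.

```shell
MON=ia64_corehw
BIN_DIR=/usr/sbin/stm/uut/bin/tools/monitor
DICT_DIR=/etc/opt/resmon/dictionary
DATA_DIR=/var/stm/data/tools/monitor
CFG_DIR=/var/stm/config/tools/monitor

# park_corehw [ROOT]: steps 4-7. ROOT is a hypothetical path prefix for
# dry runs in a scratch tree; leave it empty on the real system.
park_corehw() {
    root=${1:-}
    for dir in "$BIN_DIR" "$DICT_DIR" "$CFG_DIR"; do
        mkdir -p "$root$dir/org" || return 1
    done
    mv "$root$BIN_DIR/$MON" "$root$BIN_DIR/org/" || return 1             # step 4
    mv "$root$DICT_DIR/$MON.dict" "$root$DICT_DIR/org/" || return 1      # step 5
    rm -f "$root$DATA_DIR/$MON.hwa"                                      # step 6
    for f in "$root$CFG_DIR"/*"$MON".*; do                               # step 7
        if [ -f "$f" ]; then mv "$f" "$root$CFG_DIR/org/"; fi
    done
    return 0
}

# restore_corehw [ROOT]: move everything parked in the org directories back.
# The .hwa file is not restored; per step 6 it is recreated automatically.
restore_corehw() {
    root=${1:-}
    for dir in "$BIN_DIR" "$DICT_DIR" "$CFG_DIR"; do
        for f in "$root$dir"/org/*"$MON"*; do
            if [ -f "$f" ]; then mv "$f" "$root$dir/"; fi
        done
    done
    return 0
}

# corehw_running: step 9. The [i] bracket keeps grep from matching its own
# command line, so success really means the monitor is still running.
corehw_running() {
    ps -ef | grep "[i]a64_corehw" > /dev/null
}
```

On a real system this would be `park_corehw` between the two monconfig runs, then `corehw_running || echo "monitor stopped"` afterwards. To bring the monitor back later, disable monitoring again, run `restore_corehw`, and re-enable monitoring.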

Andrew