
ora_pmon and crsd.bin

 
John Jimenez
Super Advisor

ora_pmon and crsd.bin

Early this morning users started having issues. On the system side, the only abnormal thing I could see was that ora_pmon was using 99% of one CPU on server#1:

oracle 4115 1 237 Jan 19 ? 3722:48 ora_pmon_MXPROD2

The Oracle admin fixed it; I think he restarted the listener. I was still concerned about ora_pmon, but we had never seen it go this high, so we thought we would deal with it off hours rather than cause more issues during working hours.

But now, an hour later, users are getting errors again and crsd.bin is hogging all 4 CPUs on server#2. Zombie processes are also increasing.

Has anyone ever seen this happen? Do you think these two processes are related?
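
One way to tell whether the zombie count is really growing is to sample `ps` output at intervals and count the processes in state Z. The following is only a minimal Python sketch: it assumes output in the `ps -el` layout, where the column headed `S` holds the process state. The helper name `count_zombies` and the sample text are made up for illustration.

```python
def count_zombies(ps_el_output: str) -> int:
    """Count processes whose state column (S) is 'Z' in `ps -el` style output."""
    lines = ps_el_output.strip().splitlines()
    if not lines:
        return 0
    # Locate the state column from the header row (assumed to contain "S").
    state_idx = lines[0].split().index("S")
    count = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > state_idx and fields[state_idx] == "Z":
            count += 1
    return count


# Illustrative sample only; these are not real values from the servers above.
SAMPLE = """\
  F S   UID   PID  PPID   C PRI NI    ADDR   SZ WCHAN TTY     TIME COMD
  1 R   102  4115     1 237 152 20 a1b2c3d  900     - ?    3722:48 ora_pmon_MXPROD2
  1 Z   102  5120  4115   0 178 20       0    0     - ?       0:00 <defunct>
  1 Z   102  5121  4115   0 178 20       0    0     - ?       0:00 <defunct>
"""

print(count_zombies(SAMPLE))
```

Running this from cron and logging the number with a timestamp would show whether the zombies appear at the same time as the CPU spikes.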
Hustle Makes things happen
14 REPLIES
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

I am rebooting server1, which still had ora_pmon using 100%. Once it comes up I am going to reboot server2, where crsd.bin is taking 350% (we have 4 CPUs). Both of these servers are running HP-UX 11.23.
Hustle Makes things happen
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

Something else new that I have not seen before: when I rebooted, I got 3 swap reservation failures.

System shutdown time has arrived
reboot: CAUTION: some process(es) wouldn't die
Deferred swap reservation failure pid: 27556
Deferred swap reservation failure pid: 27556
Deferred swap reservation failure pid: 28522

sync'ing disks (0 buffers to flush):
0 buffers not flushed
0 buffers still dirty
Hustle Makes things happen
Robin T. Slotten
Trusted Contributor

Re: ora_pmon and crsd.bin

We have a similar issue.
We have been battling very high CPU loads caused by crs_stat.bin, and crsd.bin is usually near the top of the list for CPU utilization as well. We have been incurring system panics at an increasing rate for several months. Last week we installed an additional processor in each of our systems, which relieved the panics (caused by Oracle calling for a TOC) to a degree. We believe the problem is load related, but have not been able to determine the cause so far. The DBA has opened a number of TARs with Oracle without any results. I understand we will have an Oracle consultant on site within a week to see if (s)he can determine the cause of the system crashes. While I cannot offer any solutions, I can tell you that we have experienced very high loading by CRS processes.

I'll update if I find anything else.
Rob...
IF you do it more than twice, write a script.
skt_skt
Honored Contributor

Re: ora_pmon and crsd.bin


We are using Oracle 10g (10.2.0.2) with a CRS cluster on HP-UX 11.11, but this much CPU consumption by those Oracle processes is not normal in our environment.

Also, Oracle has provided a TOC patch (PHKL) which helps create a crash dump in case of a TOC.

Robin T. Slotten
Trusted Contributor

Re: ora_pmon and crsd.bin

We are running 10g (10.2.0.2)/CRS on 11.23. You don't happen to have the PHKL number that I can cross-reference, do you?

Rob... (I'd give you points for that if I could.)
IF you do it more than twice, write a script.
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

Yes, if you have that patch number it would be appreciated. And if I have the patch, where do I look for the crash dump?
In past years I have only run flat-file databases on HP-UX. Almost two years ago we got a new app on Oracle 10g with RAC on two RP7420s, each running HP-UX 11.23 with 16 GB of RAM and 8 GB of swap. Patch bundles are as of 6-2007. We seem to have this crsd.bin issue off and on. We were okay for about 6 months, but have now had this problem for the second time in two months.
Hustle Makes things happen
skt_skt
Honored Contributor

Re: ora_pmon and crsd.bin

The TOC patch for HP-UX 11.11 is PHKL_36700.
Robin T. Slotten
Trusted Contributor

Re: ora_pmon and crsd.bin


The equivalent patch for 11.23 is PHKL_34941. The bad news for us is that we already have that patch installed. But thanks.
Rob...
IF you do it more than twice, write a script.
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

I have that one too. This morning everyone was logged into server1 and no one into server2, and crsd.bin went crazy on server1. I had to reboot both of them again this morning.

PHKL_34941 Oracle Clusterware, Setboot for PCI Express
Hustle Makes things happen
Robin T. Slotten
Trusted Contributor

Re: ora_pmon and crsd.bin

I forgot to respond to your crash question.

/etc/rc.config.d/savecrash

# SAVECRASH: Set to 0 to disable saving system crash dumps.
SAVECRASH=1

# SAVECRASH_DIR:Directory name for system crash dumps. Note: the filesystem
# in which this directory is located should have as much free
# space as your system has RAM.
# Default directory:
# SAVECRASH_DIR=/var/adm/crash

--------
If you have less space than RAM, you can sometimes pull some information out of the dump anyway.
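
To check ahead of time whether the dump directory has as much free space as the system has RAM, something like the following Python sketch could be used. It is only an illustration: `dump_space_shortfall` is a made-up helper, and the RAM size has to be supplied by the caller because the sketch does not try to query it from the system.

```python
import shutil


def dump_space_shortfall(crash_dir: str, ram_bytes: int) -> int:
    """Return how many bytes of free space are missing for a dump as large
    as RAM on the filesystem holding crash_dir (0 means there is enough room)."""
    free = shutil.disk_usage(crash_dir).free
    return max(0, ram_bytes - free)


# Example call (assumed values): the default SAVECRASH_DIR and 16 GB of RAM.
# shortfall = dump_space_shortfall("/var/adm/crash", 16 * 1024**3)
# print("short by", shortfall, "bytes" if shortfall else "bytes (enough room)")
```

A result of 0 means the filesystem can hold a dump the size of RAM; any other value is the number of bytes that would have to be freed or added.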

You can do some self analysis using Q4. Look in the what.out.
------
ITRC DOCUMENT ID: OZBEKBRC00000611

USING Q4 TO ANALYZE SYSTEM DUMP FILES
(For HPUX 10.10-11.23 systems)

http://www12.itrc.hp.com/service/cki/docDisplay.do?docLocale=en&docId=emr_na-c01021636-8

Rob...
IF you do it more than twice, write a script.
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

Thanks for the info in Feb, Robin. I did not see it come in last month. I will assign some points.

Today we had ora_pmon issues again. The Oracle admin said that some logs filled up. Scripts that run during Data Protector backups normally clean them up, but there have been lots of changes today, so it looks like some logs filled up.
Hustle Makes things happen
Yogeeraj_1
Honored Contributor

Re: ora_pmon and crsd.bin

> but there have been lots of changes today, so it looks like some logs filled up.

If you want to find out the interval during which this occurred, you can run the following query:
=========================================
prompt *****************************************************
prompt *** Redolog Switch Rate by Date and Hour ****
prompt *****************************************************
set heading on;
column day format a3
col Total for 99G990;
col h00 for 999;
col h01 for 999;
col h02 for 999;
col h03 for 999;
col h04 for 999;
col h05 for 999;
col h06 for 999;
col h07 for 999;
col h08 for 999;
col h09 for 999;
col h10 for 999;
col h11 for 999;
col h12 for 999;
col h13 for 999;
col h14 for 999;
col h15 for 999;
col h16 for 999;
col h17 for 999;
col h18 for 999;
col h19 for 999;
col h20 for 999;
col h21 for 999;
col h22 for 999;
col h23 for 999;


break on report
compute max of "Total" on report
compute max of "h00" on report
compute max of "h01" on report
compute max of "h02" on report
compute max of "h03" on report
compute max of "h04" on report
compute max of "h05" on report
compute max of "h06" on report
compute max of "h07" on report
compute max of "h08" on report
compute max of "h09" on report
compute max of "h10" on report
compute max of "h11" on report
compute max of "h12" on report
compute max of "h13" on report
compute max of "h14" on report
compute max of "h15" on report
compute max of "h16" on report
compute max of "h17" on report
compute max of "h18" on report
compute max of "h19" on report
compute max of "h20" on report
compute max of "h21" on report
compute max of "h22" on report
compute max of "h23" on report


SELECT trunc(first_time) "Date",
to_char(first_time, 'Dy') "Day",
count(1) as "Total",
SUM(decode(to_char(first_time, 'hh24'),'00',1,0)) as "h00",
SUM(decode(to_char(first_time, 'hh24'),'01',1,0)) as "h01",
SUM(decode(to_char(first_time, 'hh24'),'02',1,0)) as "h02",
SUM(decode(to_char(first_time, 'hh24'),'03',1,0)) as "h03",
SUM(decode(to_char(first_time, 'hh24'),'04',1,0)) as "h04",
SUM(decode(to_char(first_time, 'hh24'),'05',1,0)) as "h05",
SUM(decode(to_char(first_time, 'hh24'),'06',1,0)) as "h06",
SUM(decode(to_char(first_time, 'hh24'),'07',1,0)) as "h07",
SUM(decode(to_char(first_time, 'hh24'),'08',1,0)) as "h08",
SUM(decode(to_char(first_time, 'hh24'),'09',1,0)) as "h09",
SUM(decode(to_char(first_time, 'hh24'),'10',1,0)) as "h10",
SUM(decode(to_char(first_time, 'hh24'),'11',1,0)) as "h11",
SUM(decode(to_char(first_time, 'hh24'),'12',1,0)) as "h12",
SUM(decode(to_char(first_time, 'hh24'),'13',1,0)) as "h13",
SUM(decode(to_char(first_time, 'hh24'),'14',1,0)) as "h14",
SUM(decode(to_char(first_time, 'hh24'),'15',1,0)) as "h15",
SUM(decode(to_char(first_time, 'hh24'),'16',1,0)) as "h16",
SUM(decode(to_char(first_time, 'hh24'),'17',1,0)) as "h17",
SUM(decode(to_char(first_time, 'hh24'),'18',1,0)) as "h18",
SUM(decode(to_char(first_time, 'hh24'),'19',1,0)) as "h19",
SUM(decode(to_char(first_time, 'hh24'),'20',1,0)) as "h20",
SUM(decode(to_char(first_time, 'hh24'),'21',1,0)) as "h21",
SUM(decode(to_char(first_time, 'hh24'),'22',1,0)) as "h22",
SUM(decode(to_char(first_time, 'hh24'),'23',1,0)) as "h23"
FROM V$log_history
group by trunc(first_time), to_char(first_time, 'Dy')
Order by 1;

clear breaks

set heading off;

=========================================
And from there you can investigate further.
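
For anyone who would rather post-process the timestamps outside SQL*Plus, the same bucketing idea (count log switches per date and per hour, then look for the busy cells) can be sketched in Python. This is not part of the query above, only an illustration of the technique: the function names are hypothetical and the timestamps are made-up stand-ins for `first_time` values from V$log_history.

```python
from collections import Counter, defaultdict
from datetime import datetime


def switches_by_day_and_hour(first_times):
    """Map each date to a Counter of hour -> number of log switches."""
    table = defaultdict(Counter)
    for ts in first_times:
        table[ts.date()][ts.hour] += 1
    return table


def busiest_hours(table, top=3):
    """Return up to `top` (date, hour, count) tuples, highest count first."""
    flat = [(day, hour, n)
            for day, hours in table.items()
            for hour, n in hours.items()]
    return sorted(flat, key=lambda row: row[2], reverse=True)[:top]


# Made-up sample data standing in for first_time values.
sample = [
    datetime(2008, 3, 10, 9, 5),
    datetime(2008, 3, 10, 9, 40),
    datetime(2008, 3, 10, 9, 55),
    datetime(2008, 3, 10, 14, 2),
    datetime(2008, 3, 11, 1, 30),
]

table = switches_by_day_and_hour(sample)
for day, hour, n in busiest_hours(table):
    print(day, f"h{hour:02d}", n)
```

The hours with unusually high counts are the intervals worth comparing against the times the users reported errors.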

hope this helps!

kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Robin T. Slotten
Trusted Contributor

Re: ora_pmon and crsd.bin

UPDATE on our issues: we added another CPU to each of the 2 nodes, and the load decreased from about 80% ambient (spiking to 100% for 15-20 minutes at a time) down to about 30-40%. Our crashes have mostly stopped. My DBA found an issue with too many logs accumulating in /u10/app/oracle/product/10.2/crs/log//client
Apparently, as the number of logs increases, Oracle scans through them for some reason I didn't grasp. At some point crsd.bin cannot keep up, the load increases drastically, and eventually one of the nodes panics.
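
A simple way to keep an eye on that directory is to count the files in it and list the ones older than some threshold. The Python sketch below only reports and deletes nothing. The helper name `old_log_files` and the 7-day threshold in the example are assumptions for illustration, not an Oracle recommendation, and the directory has to be passed in by the caller.

```python
import os
import time


def old_log_files(log_dir: str, max_age_days: float):
    """Return (total_file_count, sorted list of paths older than max_age_days)
    for the regular files directly inside log_dir."""
    cutoff = time.time() - max_age_days * 86400
    total = 0
    stale = []
    with os.scandir(log_dir) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files are counted.
            if not entry.is_file(follow_symlinks=False):
                continue
            total += 1
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return total, sorted(stale)


# Example call (assumed threshold of 7 days; pass your own client log directory):
# total, stale = old_log_files(client_log_dir, 7)
# print(total, "files,", len(stale), "older than 7 days")
```

Whether the old files can be removed safely is a question for the DBA or Oracle support; the sketch is only meant to show how quickly the directory is growing.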
Rob...
IF you do it more than twice, write a script.
John Jimenez
Super Advisor

Re: ora_pmon and crsd.bin

Thanks for the script, Yogeeraj. I gather the interface the Oracle admin uses for monitoring stopped working a week ago, and he has not figured out how to make it work yet. That is why the logs filled up. I will see if I can find time today to run this script.
Robin,
Thank you for this information. We have 4 CPUs on both of our RP7420s. CPU is never an issue (usually 20%-25% at most), unless of course these Oracle processes go crazy. But it sounds like cleaning up logs has been one of the issues the Oracle admin has encountered.
Hustle Makes things happen