08-09-2012 11:51 PM
Service guard Oracle dataguard hang monitoring.
Hi
The database admin has asked me to turn off Oracle hang monitoring, because it detects Oracle in a hung state when we run our OS backups with Tivoli.
I have set the hang monitor to alert only, checking at 60 seconds. However, the listener monitor then caused a failover, as shown in the errors below. Can I set timeouts and alert-only behaviour for the listener as well?
I have included the services from my package config file.
The problem seems to correlate with the backup reaching /orabin. Which files should be excluded from the backups?
Currently the exclusions are:
exclude.fs "/oradata*"
exclude.dir "/orarecovery/*"
exclude.dir "/orabin/diag/*"
exclude.dir "/orabin/product/agent11g/*"
Aug 8 20:13:25 - Node "alsop1" The database hang check script is not responding. Killing the process
/opt/cmcluster/toolkit/oracle/toolkit.sh[3]: 25563 Killed
Aug 8 20:13:25 - Node "alsop1" ERROR: Database hang detected. There is a possiblity that the database could be hung.
08/08/12-20:13:25 SGAlert message sent to: unix_monitor@ialch.co.za
Aug 8 20:13:33 - Node "alsop1" Oracle Listener listener_smsp failure detected.
Aug 8 20:13:33 - Node "alsop1" Oracle Listener listener_smsp failed
Aug 8 20:13:36 - Node "alsop1" All listeners have failed
Aug 8 20:13:36 root@alsop1 master_control_script.sh[25780]: ###### Halting package smspdg_DB ######
Aug 8 20:13:36 root@alsop1 service.sh[25791]: Halting service oracle_service_smsp
Aug 8 20:13:36 root@alsop1 service.sh[25791]: Halting service oracle_listener_service_smsp
For the sake of completeness, here are the services from the package config file:
service_name oracle_service_ftpz
service_cmd "$SGCONF/scripts/ecmt/oracle/tkit_module.sh oracle_monitor"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
service_name oracle_listener_service_ftpz
service_cmd "$SGCONF/scripts/ecmt/oracle/tkit_module.sh oracle_monitor_listener"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
service_name oracle_hang_service_ftpz
service_cmd "$SGCONF/scripts/ecmt/oracle/tkit_module.sh oracle_hang_monitor 300 alert"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
service_name dataguard_service_ftpz
service_cmd "$SGCONF/scripts/tkit/dataguard/tkit_module.sh dataguard_monitor"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
08-10-2012 03:48 AM
Re: Service guard Oracle dataguard hang monitoring.
Looking at your errors, I'm going to assume you have posted the first relevant entry in the log file and there is nothing else interesting on or around this time in your log file:
Aug 8 20:13:25 - Node "alsop1" The database hang check script is not responding. Killing the process
/opt/cmcluster/toolkit/oracle/toolkit.sh[3]: 25563 Killed
This tells us that a sqlplus call of "SELECT STATUS FROM V$INSTANCE" has been hung for 5 minutes - can your DBA explain why that might happen during an OS backup?
However, I don't think that caused your failure, as it appears you have that set to alert only rather than to initiate a failover...
Aug 8 20:13:33 - Node "alsop1" Oracle Listener listener_smsp failure detected.
Aug 8 20:13:33 - Node "alsop1" Oracle Listener listener_smsp failed
Aug 8 20:13:36 - Node "alsop1" All listeners have failed
This is what is causing Serviceguard to halt the package. Again, this looks like a call to "lsnrctl status listener_smsp" has returned a non-zero value... why would that happen during an OS backup?
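One way to find out when the listener check starts failing is to run the same command on a schedule during the backup window and log its exit status. The following is only a rough sketch: the function name `probe_once`, the 10-second interval and the log path in the comment are made up for illustration and are not part of the Serviceguard toolkit.

```shell
#!/bin/sh
# Hypothetical helper: run a command once and print one line with the
# start time, end time and exit status, so slow or failing calls can be
# matched against the backup window afterwards.
probe_once() {
    started=$(date '+%Y-%m-%d %H:%M:%S')
    "$@" >/dev/null 2>&1
    rc=$?
    finished=$(date '+%Y-%m-%d %H:%M:%S')
    echo "start=$started end=$finished rc=$rc cmd=$*"
    return "$rc"
}

# Example loop (assumed interval and log path, adjust to taste):
#   while :; do
#       probe_once lsnrctl status listener_smsp >> /var/tmp/listener_probe.log
#       sleep 10
#   done
```

A large gap between `start` and `end`, or a non-zero `rc`, at the time the backup reaches /orabin would point at the same condition the listener monitor is reacting to.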
So the big question is, what is Tivoli doing during an OS backup to cause Oracle to stop responding? All seems a bit odd to me.
Of course, this being a community support forum, it could be that you don't actually care about solving what is really going on here and just want to get rid of the error and move on to the next issue in your queue ;o) - If that's the case, you should consider having the Tivoli backup create a maintenance flag before it starts and delete it after the backup is finished. This is pretty easy to do... I assume Tivoli has some sort of capability to insert a pre- and post-backup script? If so, just have it touch a file called "oracle.debug" in the package's directory before the backup, and remove it after the backup. While the file <package dir>/oracle.debug exists, Serviceguard won't monitor the database.
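As a minimal sketch of that idea, the wrapper below creates the flag, runs whatever command it is given, and removes the flag again even if the command fails. The function name `run_with_flag` and the example paths in the comment are hypothetical; how Tivoli actually calls pre- and post-backup commands is not covered here.

```shell
#!/bin/sh
# Hypothetical wrapper: hold <package dir>/oracle.debug in place while a
# command runs, then remove it and pass the command's exit status back.
run_with_flag() {
    pkg_dir=$1
    shift
    flag="$pkg_dir/oracle.debug"
    touch "$flag" || return 1   # could not create the flag: do not run
    "$@"
    rc=$?
    rm -f "$flag"               # always resume monitoring afterwards
    return "$rc"
}

# Example (assumed package directory and backup command):
#   run_with_flag /etc/cmcluster/smspdg_DB /path/to/backup_command
```

Bear in mind that while the flag exists the database is not monitored, so a genuine failure during the backup window would not trigger a failover.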
But me, I'd want to understand what is going on...
I am an HPE Employee
08-10-2012 04:05 AM
Re: Service guard Oracle dataguard hang monitoring.
Hi
I would definitely like to solve the problem.
However, the DBA cannot tell me why the database is hanging.
According to him, SELECT STATUS FROM V$INSTANCE shows OPEN. But at some stage it must fail to return this for the hang to be detected.
Tivoli is a real pain to work with and I cannot understand it at all. But that is all the customer has.
It definitely seems to happen when it is backing up /orabin, but exactly which file I cannot tell. The timeouts were 60 seconds; I have changed them to 300 seconds as of today.
08-10-2012 04:24 AM
Re: Service guard Oracle dataguard hang monitoring.
08-13-2012 11:09 PM
Re: Service guard Oracle dataguard hang monitoring.
OK, the listener is monitored through halistener.mon, which is actually just a call to the command
lsnrctl status listener_id
which gives me this output:
LSNRCTL for HPUX: Version 11.2.0.2.0 - Production on 14-AUG-2012 08:08:33
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=alsop1.ialch.co.za)(PORT=1531))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=smsp.ialch.co.za)))
STATUS of the LISTENER
------------------------
Alias listener_smsp
Version TNSLSNR for HPUX: Version 11.2.0.2.0 - Production
Start Date 08-AUG-2012 20:31:56
Uptime 5 days 11 hr. 36 min. 36 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /orabin/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /orabin/diag/tnslsnr/alsop1/listener_smsp/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=alsop1.ialch.co.za)(PORT=1531)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1531)))
Services Summary...
Service "smsp.ialch.co.za" has 2 instance(s).
Instance "smsp", status UNKNOWN, has 1 handler(s) for this service...
Instance "smsp", status READY, has 1 handler(s) for this service...
Service "smspXDB.ialch.co.za" has 1 instance(s).
Instance "smsp", status READY, has 1 handler(s) for this service...
The command completed successfully
What does the UNKNOWN status mean? Is it normal?
08-14-2012 01:56 AM
Re: Service guard Oracle dataguard hang monitoring.
At the time of the failure I had the error below. I have to state now that I am not that familiar with Oracle yet. There were no network issues in my syslog at that time, and the network admin says he did not see anything either.
What was running at the time was the Tivoli backup. But I cannot say which file it was backing up, as it does not record a time stamp for each file backed up.
08-AUG-2012 20:13:30 * <unknown connect data> * (ADDRESS=(PROTOCOL=tcp)(HOST=::1)(PORT=33917)) * status * <unknown sid> * 12525.
The Oracle description of the error is below.
ORA-12525: TNS:listener has not received client's request in time allowed
Cause: The listener disconnected the client because the client failed to provide the necessary connect information within the allowed time interval. This may be a result of network or system delays; or this may indicate that a malicious client is trying to cause a Denial of Service attack on the listener.
Action: If the error occurred because of a slow network or system, reconfigure INBOUND_CONNECT_TIMEOUT to a larger value. If a malicious client is suspected, use the address in listener.log to identify the source and restrict access. Turn on tracing for more information.
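To see how often this happens and at what times, the listener log can be searched for entries that end in error code 12525. This is a rough sketch which assumes a plain-text listener log with entries in the same " * "-separated format as the line quoted above; the function name `list_12525` is made up, and the log file location is left as an argument because it is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print listener log entries whose last field is
# error code 12525 (optionally followed by a full stop).
list_12525() {
    grep -E '\* 12525\.?[[:space:]]*$' "$1"
}

# Example (the log path is an assumption, pass your own):
#   list_12525 /path/to/listener.log
```

If the time stamps of the matching entries all fall inside the backup window, that would support system delay rather than a network problem as the cause.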