10-09-2008 11:47 AM
Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
I've been having a rough time with a two-node Oracle 10gR2 (10.2.0.4) RAC (no SGeRAC) running on HP-UX 11.11.
The problem is intermittent and does not seem to be related to the database load, but it is very frequent (we even disabled the second instance). One (and only one) of the instances aborts with "IPC timeout" errors, shown below:
IPC Send timeout detected. Receiver ospid 9342
Tue Sep 23 19:30:46 2008
Errors in file /dbs/trace/snoffprd/bdump/snoffprd2_lms1_9342.trc:
Tue Sep 23 19:30:48 2008
Trace dumping is performing id=[cdmp_20080923193048]
Tue Sep 23 19:30:48 2008
Waiting for clusterware split-brain resolution
We have two databases on the same cluster; only one of them suffers from this problem.
Oracle suggested we change from a crossover setup to a gigabit switch, mentioning that crossover interconnects are not supported. The case is still open and they've sent the problem to their development team.
It does not seem to be a physical media problem, since we have two NICs, both of which we tested. We also changed the MTU from 9000 to 1500 and back again, without success.
Have any of you seen anything like this? This has been happening since we migrated the second instance into the cluster. The first one has been running smoothly for several weeks, since it went into production.
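For anyone who wants to line these events up against other data (CPU samples, network counters), a minimal Python sketch for pulling them out of an alert log might look like the following. It assumes the layout seen in the excerpt above, where a timestamp line such as "Tue Sep 23 19:30:48 2008" precedes the messages it applies to. The function and field names are made up for illustration and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the alert log excerpt above.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
IPC_RE = re.compile(r"IPC Send timeout detected\.\s*Receiver ospid (\d+)")
SPLIT_BRAIN = "Waiting for clusterware split-brain resolution"


def parse_timestamp(line):
    """Return a datetime if the line is a bare timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def scan_alert_log(lines):
    """Collect IPC send timeouts and split-brain waits with the last seen timestamp.

    Hypothetical helper: 'time' is None when no timestamp line preceded the message.
    """
    events = []
    current_ts = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        match = IPC_RE.search(line)
        if match:
            events.append({"kind": "ipc_send_timeout",
                           "time": current_ts,
                           "ospid": int(match.group(1))})
        elif SPLIT_BRAIN in line:
            events.append({"kind": "split_brain_wait",
                           "time": current_ts,
                           "ospid": None})
    return events
```

Calling `scan_alert_log(open(path))` on an alert log (the path is whatever your bdump location is) returns a list of dictionaries that can then be compared against monitoring data from both nodes.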
10-09-2008 11:56 AM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
Resolve the "supported" issue first; then you can look at this problem if it doesn't go away once you are on a "supported configuration".
HTH
Duncan
I am an HPE Employee
10-09-2008 12:01 PM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
Also, we've been running 9i RAC on the very same setup for several years. There is no sense in blaming the crossover setup, since it really doesn't seem to be a physical media problem. Anyway, we're aware crossover might have issues with autonegotiation, so we followed Oracle's advice.
Oracle mentored the whole migration process and none of their personnel complained about the crossover setup.
10-09-2008 10:35 PM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
10-09-2008 11:40 PM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
I won't comment on Oracle not mentioning the crossover cable during your migration.
So what does CPU utilisation look like at the point in time when you have the issue? Were the systems heavily loaded? (You need to look at both.) You're particularly looking for a lot of sys mode utilisation.
The problem as I see it with a pure Oracle cluster stack on any platform apart from Linux is that the clusterware operates completely in user space, and as such the subsystems that handle "hung node detection" are always going to be more flaky than those in a product like Serviceguard, which has access to kernel routines for this sort of stuff. The CRS processes need to get CPU time within certain boundaries to ensure that they can respond to heartbeats etc. from other nodes. This means they usually run at a real-time priority, which can mean they effectively end up tied to just one processor. If that processor is busy doing something in kernel space, then you end up with these sorts of issues. On Linux, Oracle are able to introduce a kernel module (the hangcheck timer) to resolve this, and Serviceguard is able to do something similar on HP-UX.
So what to do... Obviously you need to continue to pursue this with Oracle support, as these sorts of issues are very complex (I doubt you'll get a fix on these forums as we're usually looking at the internals of Oracle CRS and the HP-UX kernel). In the meantime I would look at bringing myself as up to date as possible on OS patches, particularly kernel patches, as anything that fixes kernel issues which cause large amounts of SYS CPU time (e.g. spinlock contention) could resolve the issue.
Apart from that - the suggestion of more CPU/memory is of course a good one as that will reduce the chance of the event happening.
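To make the heartbeat argument above concrete, here is a toy Python sketch of timeout-based hung-node detection. It is not how Oracle CRS or Serviceguard is implemented; the function name, the beat times and the timeout are all made-up illustration values. It only shows that a process which is kept off the CPU for longer than the timeout looks, from the outside, exactly like a hung node.

```python
def find_heartbeat_gaps(beat_times, timeout):
    """Return (previous_beat, next_beat) pairs spaced further apart than timeout.

    beat_times: sorted times (seconds) at which a peer's heartbeat arrived.
    timeout: longest silence tolerated before the peer is treated as hung.
    Both are hypothetical inputs, not values taken from any real cluster.
    """
    gaps = []
    for prev, nxt in zip(beat_times, beat_times[1:]):
        if nxt - prev > timeout:
            gaps.append((prev, nxt))
    return gaps


# A sender that normally beats once a second but was starved of CPU
# between t=3 and t=9 trips a 5-second timeout even though it never died.
starved = find_heartbeat_gaps([0, 1, 2, 3, 9, 10], timeout=5)
```

Here `starved` contains the single gap between t=3 and t=9. The node was alive the whole time, but a monitor that only sees heartbeats cannot tell the difference.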
HTH
Duncan
I am an HPE Employee
10-10-2008 04:00 AM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
Thank you for your answers.
We have observed no clear correlation between the split brains and the systems' load. Although this is a rather busy cluster, one of the machines (running 2 databases) is 70% idle on average and the other (on which we disabled the second instance) is now 95% idle.
The CRS processes are running with the default nice value (20). We do not know clearly what the effect of lowering it would be, since we were told that changing the nice value of Oracle processes is not recommended (although not why, for certain classes of processes such as (ora|asm)_lms*).
We're in contact with Oracle, and I will update this thread with any progress we make.
Thank you again.
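Since average idle figures can hide short bursts of sys-mode CPU, one way to check the load at the moment of each event is to look only at samples taken close to it. The Python sketch below assumes you already have per-interval CPU samples as (timestamp, sys percentage) pairs from whatever monitoring tool is in use. The function name, the 30-second window and the sample layout are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def peak_sys_near(event_time, samples, window_seconds=30):
    """Return the highest sys-mode CPU % sampled within the window around an event.

    samples: iterable of (datetime, sys_percent) pairs (hypothetical layout).
    Returns None when no sample falls inside the window.
    """
    window = timedelta(seconds=window_seconds)
    nearby = [sys_pct for ts, sys_pct in samples
              if abs(ts - event_time) <= window]
    return max(nearby) if nearby else None
```

Running this for every timeout event, on both nodes, gives a per-event peak that can be compared against the long-run average rather than relying on the average alone.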
10-10-2008 04:07 AM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
10-10-2008 04:18 AM
Re: Oracle 10gR2 RAC + HP-UX 11.11: IPC Timeout
We have 2 databases on both machines (4 instances). The two databases are rather balanced in terms of load. Only one of them crashes; the other keeps running smoothly. This is not a cluster-wide split brain.