ProLiant Servers (ML,DL,SL)

Kris Vanden Berghe
New Member

DL380 G4 RHEL 4 + Oracle RAC random freeze

Hello,

I have a two-node Oracle RAC 10.2 cluster running on two DL380 G4 servers.
Since we went into production 2 years ago, we have had random freezes on both servers.

One system just hangs all of a sudden: the console is frozen, but the network layer (ping and TCP ports) still works. The other node notices there's a problem and tries to take over the crashed node's IP, but of course fails. So my failover fails and half of my DB connections hang (the TCP port of the Oracle listener is still open but never responds).
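
The "port is open but never responds" symptom can be told apart from a refused connection with a small probe. This is only a minimal sketch: the function name `probe`, the state names and the one-byte placeholder payload are assumptions for illustration, and the payload is not a real Oracle Net (TNS) packet, so a healthy listener may simply close the connection instead of replying.

```python
import socket


def probe(host, port, payload=b"\n", timeout=5.0):
    """Classify a TCP service (hypothetical helper, not an Oracle tool).

    Returns one of:
      "no_connect"     - connection refused, unreachable, or handshake timed out
      "open_no_reply"  - handshake succeeded, but nothing came back in time
      "closed_by_peer" - the peer closed the connection without sending data
      "responding"     - the peer sent at least one byte back
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no_connect"
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendall(payload)   # placeholder payload, not a TNS packet
            data = sock.recv(1)     # wait for any sign of life from the process
        except socket.timeout:
            return "open_no_reply"  # kernel accepted, application silent
        except OSError:
            return "closed_by_peer"
        return "responding" if data else "closed_by_peer"
```

For example, `probe("racnode1", 1521)` could be run from the surviving node; `racnode1` is a made-up hostname and 1521 is assumed to be the listener port. A result of `"open_no_reply"` matches the state described above, where the kernel still completes the TCP handshake but the process behind the port never answers.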

On the crashed server I see that the LEDs of its local disks are constantly lit. The only thing I can do is a hard reset of the server. Nothing is found in any log: everything looks fine, and the next line is my reboot.

It happens on both nodes at different times. It can happen 3 times a week, and then sometimes we don't have any problem for over 6 months. I noticed that when I clean up the database, the freezes disappear for a longer period.
We have already upgraded to the latest firmware and drivers.

Setup:
2x HP ProLiant DL380 G4 w/ 4GB RAM
RHEL 4 Update 7
Oracle RAC 10.2.0.2 (2 nodes)
EVA4400

For the past 2 months we have been running on an EVA4400; before that it was an MSA1000. The freezes are still there, so I think I can exclude the shared storage.

I contacted HP, Red Hat and Oracle about the crashes, but they just point to each other.

I found a similar thread with exactly the same problem but no answer:
http://forums11.itrc.hp.com/service/forums/bizsupport/questionanswer.do?threadId=831929

I installed OSWatcher from Oracle, but it doesn't show anything special right before the freeze. CPU, memory, swap and I/O are normal or even low at that time, no long-running processes, ...

I don't have a clue in which direction to look now.
Does anybody have any ideas on what to do next?

Thanks,
Kris
Jean-Yves Picard
Trusted Contributor

Re: DL380 G4 RHEL 4 + Oracle RAC random freeze

hello

just a thought: have you checked this document:
http://www.redhat.com/f/pdf/rhel/Oracle-10-g-recommendations-v1_2.pdf ?

Jean-Yves