New Installation Red Hat

Dave La Mar
Honored Contributor

Let me start by saying Linux is completely new to our shop, and I confess to having little knowledge of its administration. Yes, I realize training is a must.
What we have is a four node cluster, Oracle RAC.
What we see is periodic reboots of each node. On any given occurrence all nodes reboot at the same time, but the occurrences fall on different days and at different times of day.
The log files are not helpful in determining the issue. On Friday and Saturday night we saw reboots at 21:16 and 20:10 respectively. One would think this is a result of "something" cron'd, but I do not see any related cron entries for these times.
Our DBA group has opened an issue with Oracle, but I'm hoping someone out there has encountered this and can point me in the right direction.

Regards,

-dl
"I'm not dumb. I just have a command of thoroughly useless information."
12 REPLIES
Ivan Ferreira
Honored Contributor

Re: New Installation Red Hat

>>> These reboots occur at the same time on each node but on different days, at different times of the day.

I don't fully understand this part, but node reboots on Oracle RAC can be caused by Oracle OCR. You should find messages like "Rebooting for cluster integrity".

This may be an Oracle bug, or it could be a problem with your node interconnects, a timeout setting, or access to the shared OCR/voting disk.
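
A quick way to check every collected log for that message is a small scan like the one below. This is only a sketch: the marker text is taken from the message quoted above and its exact wording may differ between Oracle releases, and the function name and the log paths passed on the command line are my own assumptions.

```python
import re
import sys

# Assumed marker text, copied from the message quoted in this post;
# the exact wording in a given Oracle release may differ.
PATTERN = re.compile(r"rebooting for cluster integrity", re.IGNORECASE)


def find_integrity_reboots(lines):
    """Return (line_number, text) pairs for lines containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    # Log file paths are supplied by the caller; none are assumed here.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_integrity_reboots(handle):
                print(f"{path}:{number}: {text}")
```

Run it against /var/log/messages and whatever cluster logs you have gathered from each node. If nothing matches, that does not rule Clusterware out, since the message may be worded differently or may not have been flushed to disk before the reset.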
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Dave La Mar
Honored Contributor

Re: New Installation Red Hat

Yes, Ivan, I agree it could be something to do with the voting disks. These are disks from our EVA 5000, but the difficulty is finding some information on the reason. The log files are not noting any eye catchers.
As for the reboot times: on any given day, all four nodes seem to reboot at the same, apparently random, time.
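
To confirm that the nodes really do go down together rather than just "about the same time", the reboot times from each node can be compared with something like the sketch below. It assumes you have already collected each node's reboot timestamps by hand; the function name, the node names, and the five-minute tolerance are illustrative assumptions, not anything from our setup.

```python
from datetime import timedelta


def common_reboots(reboots_by_node, tolerance=timedelta(minutes=5)):
    """Return the first node's reboot times that every other node matches.

    reboots_by_node maps a node name to a list of datetime objects.
    A time "matches" if the other node has a reboot within `tolerance`.
    """
    nodes = list(reboots_by_node)
    if not nodes:
        return []
    first, others = nodes[0], nodes[1:]
    return [
        when
        for when in reboots_by_node[first]
        if all(
            any(abs(when - other) <= tolerance for other in reboots_by_node[node])
            for node in others
        )
    ]
```

If every reboot on the first node shows up in the result, the trigger is most likely something all four nodes share (storage, interconnect, or the cluster software) rather than a fault on one machine.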

Thanks for any insight you may share.

-dl
"I'm not dumb. I just have a command of thoroughly useless information."
Dave La Mar
Honored Contributor

Re: New Installation Red Hat

I should have noted, as well, that the hardware is DL380R05.

-dl
"I'm not dumb. I just have a command of thoroughly useless information."
skt_skt
Honored Contributor

Re: New Installation Red Hat

What is the Oracle RAC version? If nothing is reported in the messages file (which is typical of a hardware-related issue), then what do the alert logs for the instances report? Do you see any strange entries in /var/log/messages?

Are you able to see the system statistics (memory, swap, CPU, load average, etc.) around the problem window? Was the cluster stable earlier, or is it a brand new setup?
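
To look at just the problem window, the log lines from the minutes leading up to a reboot can be pulled out with a sketch like this. It assumes the traditional syslog timestamp at the start of each line ("Jan 18 21:16:03 ..."), which carries no year, so the year is borrowed from the reboot time you pass in. The function name and the ten-minute default are my own choices.

```python
from datetime import datetime, timedelta


def lines_before(lines, reboot_at, minutes=10):
    """Yield syslog lines stamped within `minutes` before `reboot_at`.

    Assumes each line starts with a 15-character "Mon DD HH:MM:SS" stamp.
    """
    start = reboot_at - timedelta(minutes=minutes)
    for line in lines:
        try:
            # Syslog stamps omit the year, so take it from reboot_at.
            stamp = datetime.strptime(
                f"{reboot_at.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # line has no leading timestamp; skip it
        if start <= stamp <= reboot_at:
            yield line.rstrip("\n")
```

For example, passing the open /var/log/messages file and a datetime for the 21:16 reboot would list what each node logged in the ten minutes before it went down. The same filter can be pointed at any other log that uses this timestamp layout.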
Don Vanco - Linux Ninja
Regular Advisor

Re: New Installation Red Hat

Dave - given that you're new to Linux, are you _sure_ you're looking at the right logs? Between the logs that the system has and the logs that Oracle adds I am amazed to see that you can find no compelling data.....

Also - have you looked at the logs on the EVA?

Given the apparent "time based" reboots - are there other processes that access the cluster at those times? There may be an external trigger....

Good luck -
DV
Brian M Welch
Frequent Advisor

Re: New Installation Red Hat

I've been working on this issue with Dave, and I can attest to the lack of clues in our Linux log files. ( /var/log/messages, and the like.)

We've packaged all of the log files, including the Oracle Cluster logs, and it appears we've stumped the folks at Metalink as well. Last I heard, they were looking at the OCFS and RAW device files for clues.

I'd come across some articles about log file rotation and syslog.d issues in RHEL RAC clusters. Anyone have any insight on this?
Rob Leadbeater
Honored Contributor

Re: New Installation Red Hat

Hi Dave,

I would probably focus on the shared storage... If all of the nodes reboot at the same time, that suggests to me a problem on the EVA and/or the SAN switches...

That said, you haven't really given us much to go on in terms of the hardware/software configuration...

Cheers,

Rob
Dave La Mar
Honored Contributor

Re: New Installation Red Hat

Rob -
We just had HP Support look at the EVA logs, but nothing was found. Yes, there is a switch involved between the machines and the EVA switch.
Our plan was to attack the switch next.
Thanks for the input.

Regards,

-dl
"I'm not dumb. I just have a command of thoroughly useless information."
Rob Leadbeater
Honored Contributor

Re: New Installation Red Hat

Hi Dave,

I was hinting that you might get more help, if you were to detail the setup more completely...

How are things connected together ?
What versions are you running ?

Cheers,

Rob
Brian M Welch
Frequent Advisor

Re: New Installation Red Hat

We are running Red Hat Enterprise Linux 4.5 with a 64-bit SMP kernel [2.6.9-55.0.0.2.EL SMP]. As Dave mentioned before, there are four ProLiant DL380R05 boxes in the cluster. The interconnect switch is an HP ProCurve. We are using QLogic HBAs, which plug directly into the Brocade SAN switch.

Additionally, the Oracle version is 10.2.0.3.
Robert_Boyd
Respected Contributor

Re: New Installation Red Hat

Are you running the HP System Management tools on the members of the cluster? Is ASR enabled? There is a known problem that won't be fixed until SmartStart 8.0 is available. Check whether you're getting anything in the iLO logs on any of these servers.

There is another article available that discusses this problem.

http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1201035290165+28353475&threadId=1135440

Robert
Master you were right about 1 thing -- the negotiations were SHORT!
Brian M Welch
Frequent Advisor

Re: New Installation Red Hat

Robert,

Thanks for your reply. We've looked in the iLO logs and /var/log/messages, and haven't seen anything that would indicate that we are having issues with ASR.