
kernel messages in /var/log/message

 
kelvinlnx
Advisor

I have an Oracle server running on Red Hat Linux that suddenly went down. Can someone please tell me what these messages extracted from /var/log/messages are telling me? Thanks.
4 REPLIES
Jeeshan
Honored Contributor

Re: kernel messages in /var/log/message

Check all the Oracle-related kernel parameters, and also stop any unnecessary services in Linux (e.g. iptables, smartd, rhnd).
a warrior never quits
kelvinlnx
Advisor

Re: kernel messages in /var/log/message

I have already checked the kernel parameters, and they are correct. This server has been running for quite some time. I'm just wondering:

1) Why is there a segfault for vgdisplay when there was no problem before? Everything works fine after the reboot.
2) Three seconds after the "SysRq: Show Memory" message, reads and writes start to have problems. (I didn't invoke the SysRq. Is there a process that would invoke it?)
3) Why do I get the cpu .. hot/cold messages?
4) Is it possible that the problem was caused by the segfault (Q1) or the SysRq (Q2)?
Prasu
Frequent Advisor

Re: kernel messages in /var/log/message

Please confirm whether the attached messages are from before or after the reboot.
If they are from before the reboot, I can see the following:

1. Oracle cluster monitoring (clsomon) failed with fatal status 12 at Dec 1 00:05:03
2. CRS failed and a reboot was initiated at Dec 1 00:05:04
3. After that, the cluster didn't come up fully ... and was waiting on dependencies at Dec 1 00:05:10
4. syslogd restarted at Dec 1 00:15:42

We can assume that the root cause of the system reboot is on the Oracle cluster side.
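To rebuild a timeline like the one above from the log yourself, something along these lines could help. It is only a minimal sketch: the keyword list is taken from the terms mentioned in this thread (segfault, SysRq, clsomon, syslogd), the function name find_events is made up for illustration, and the path /var/log/messages is the usual Red Hat default, so adjust as needed.

```python
import sys

# Keywords taken from this thread; extend the list for your own case.
KEYWORDS = ["segfault", "SysRq", "clsomon", "syslogd"]


def find_events(lines, keywords):
    """Return (keyword, line) pairs, in file order, for every line that
    contains one of the keywords (case-insensitive, first match wins)."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits.append((kw, line.rstrip("\n")))
                break
    return hits


if __name__ == "__main__":
    # Assumption: the log lives at /var/log/messages unless a path is given.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for kw, line in find_events(fh, KEYWORDS):
            print(f"[{kw}] {line}")
```

Because the matches are printed in file order, the syslog timestamps at the start of each line give you the sequence of events directly.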

So please check

System kernel parameter values (verify them with the DBA)
System processor / memory utilization before the reboot (clsomon may have been consuming more resources)
Emcpower driver and filesystem status at the time of the reboot


Regards
Prasu
kelvinlnx
Advisor

Re: kernel messages in /var/log/message

Hi Prasu,
The log was before reboot. Thanks for the reply.
This is the primary node of a 2-node Oracle RAC cluster. And you were right: when the Oracle cluster detected the problem, it rebooted this machine at 00:05, and the machine came back up at 00:15 (after the reboot). As for the kernel parameters, they are already correct according to the DBA. Here are the settings:
128G Physical RAM
128G Swap

fs.file-max = 262144
kernel.msgmax = 16384
kernel.msgmnb = 32768
kernel.shmall = 33554432
kernel.shmmax = 137438953472
kernel.shmmni = 4096
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 262144
fs.aio-max-nr = 1048576
kernel.sem=250 32000 128 128
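As a quick sanity check on the shared memory values above: kernel.shmall is counted in pages while kernel.shmmax is in bytes, so the two can be compared once shmall is multiplied by the page size. The sketch below assumes a 4 KiB page size (confirm with `getconf PAGE_SIZE`); the helper names parse_sysctl and shm_summary are just for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; confirm with `getconf PAGE_SIZE`

# Subset of the settings posted above, in sysctl.conf syntax.
SETTINGS = """
kernel.shmall = 33554432
kernel.shmmax = 137438953472
kernel.shmmni = 4096
kernel.sem=250 32000 128 128
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def shm_summary(settings, page_size=PAGE_SIZE):
    """Return (shmall, shmmax) both expressed in GiB."""
    gib = 1024 ** 3
    shmall_bytes = int(settings["kernel.shmall"]) * page_size  # pages -> bytes
    shmmax_bytes = int(settings["kernel.shmmax"])              # already bytes
    return shmall_bytes / gib, shmmax_bytes / gib


if __name__ == "__main__":
    shmall_gib, shmmax_gib = shm_summary(parse_sysctl(SETTINGS))
    print(f"shmall = {shmall_gib:.0f} GiB total, shmmax = {shmmax_gib:.0f} GiB per segment")
```

Under that page-size assumption, both values work out to 128 GiB, the same as the physical RAM in this box.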

Unfortunately, I can't tell what the memory utilization was before this happened. Is there any reason why the SysRq would be invoked by itself? Does it normally show the cpu hot/cold messages when invoked? And what about vgdisplay segfaulting?
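Since the memory utilization was not recorded this time, one option going forward is to log a periodic snapshot from /proc/meminfo (for example from cron), so there is something to look at if it happens again. This is only a rough sketch: the function names are made up, and "used" here is simply MemTotal minus MemFree, Buffers and Cached, which is one common approximation rather than the only definition.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def used_percent(info):
    """Approximate used memory as a percentage of MemTotal.

    Assumption: free + buffers + page cache are treated as available.
    """
    total = info["MemTotal"]
    available = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} mem_used={used_percent(info):.1f}% "
          f"swap_free_kB={info.get('SwapFree', 0)}")
```

Appending that one line to a file every minute or so would at least show whether memory or swap was running out in the minutes before the next incident.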