System Administration

OS / kernel / memory corruption?

SOLVED
Matt Shaffer_1
Regular Advisor

Running RHEL 3 U4 on an HP ML350 G4 with mirrored hard drives, 1 GB of memory, and 1 CPU. After a reboot Friday night we've been seeing strange things on the server. For example, when I run ls -la in one particular directory I get I/O errors, and some of the files have a timestamp of 1970.

Also, I can create a file in this directory and I can rsync to it, but not all files get rsynced.

I see errors when I run dmesg and there are errors in /var/log/messages.1. I've attached the I/O errors and the errors from messages.1.
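
To pin down exactly which entries are affected, the symptoms described above (I/O errors on listing, 1970 timestamps) can be enumerated with a short script. This is only a minimal sketch, assuming a Python 3 interpreter is available on the box or in a rescue environment; the function name `scan_directory` and the cutoff constant are made up for illustration.

```python
import os
import sys

# Modification times before 1 Jan 1971 (UTC) are treated as "1970" timestamps.
# The cutoff is an arbitrary choice for this sketch.
EPOCH_YEAR_END = 365 * 24 * 3600


def scan_directory(path):
    """Return (unreadable, epoch_dated) for the entries directly inside path."""
    unreadable = []   # (name, error text) for entries that cannot be stat'ed
    epoch_dated = []  # names whose modification time falls before the cutoff
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        try:
            info = os.lstat(full)  # lstat: do not follow symlinks
        except OSError as exc:
            # An EIO here is the same failure ls -la reports as an I/O error.
            unreadable.append((name, exc.strerror))
            continue
        if info.st_mtime < EPOCH_YEAR_END:
            epoch_dated.append(name)
    return unreadable, epoch_dated


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    bad, old = scan_directory(target)
    for name, reason in bad:
        print("UNREADABLE", name, reason)
    for name in old:
        print("EPOCH-DATED", name)
```

Run it against the suspect directory and keep the output; the list is useful later for checking which files a restore actually needs to replace.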


7 REPLIES
Ivan Ferreira
Honored Contributor
Solution

Re: OS / kernel / memory corruption?

The I/O errors are the part to worry about. Is the mirror hardware or software? What is the current status of the mirrors? I have seen hardware RAID (not RAID 0) with a failed disk produce filesystem corruption. This could be a problem with the module or the controller itself.

You should take a backup of what you can, check the mirror status, and run fsck.
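
For the "backup of what you can" step, tools like rsync or tar are the usual choice, but they may stop or skip silently when they hit unreadable files. As a rough illustration of the idea only, the sketch below copies every file it can read and records the ones it cannot. It assumes Python 3 and a destination on a different, healthy disk; `salvage_tree` is a hypothetical helper name, not part of any existing tool.

```python
import os
import shutil


def salvage_tree(src, dst):
    """Copy every readable file under src into dst; return a list of failures."""
    failed = []  # (path, error text) for anything that could not be read

    def on_walk_error(exc):
        # Called by os.walk when a directory itself cannot be listed.
        failed.append((exc.filename, exc.strerror))

    for root, dirs, files in os.walk(src, onerror=on_walk_error):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                # copy2 also tries to preserve timestamps and permissions.
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError as exc:
                # Record the failure and carry on with the remaining files.
                failed.append((source, exc.strerror))
    return failed
```

The returned list shows what the backup is missing, which is worth knowing before fsck changes anything on the damaged filesystem.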
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Matt Shaffer_1
Regular Advisor

Re: OS / kernel / memory corruption?

The mirror is at the hardware level. We have a Smart Array 641 card installed.
Matt Shaffer_1
Regular Advisor

Re: OS / kernel / memory corruption?

Mirror looks good.
macosta
Trusted Contributor

Re: OS / kernel / memory corruption?

I agree with Ivan - this smells of filesystem corruption.

Also, on a box with 8 CPUs, why are all disabled except CPU0 and CPU4? According to the doc you attached, this is the case.
Matt Shaffer_1
Regular Advisor

Re: OS / kernel / memory corruption?

This server only has 1 CPU.
macosta
Trusted Contributor

Re: OS / kernel / memory corruption?

Matt, are you sure you posted the right messages file then? I see a system startup, no obvious filesystem errors, and this:

May 17 00:23:21 7758 kernel: 8 CPUs total
May 17 00:23:21 7758 kernel: Local APIC address fee00000
...
May 17 00:23:21 7758 kernel: Processors: 2

This indicates 8 physical processors, of which 2 are enabled and available to the OS.
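
One quick way to check whether a messages file came from the expected machine is to pull the hostname field and the CPU counts out of the kernel boot lines. The sketch below is based only on the three lines quoted above; the regular expressions and the `summarize_boot_log` name are assumptions for illustration, and other kernel versions may word these lines differently.

```python
import re

# Patterns modelled on the quoted boot lines; treat them as assumptions.
CPU_TOTAL = re.compile(r"kernel: (\d+) CPUs total")
PROCESSORS = re.compile(r"kernel: Processors: (\d+)")


def summarize_boot_log(lines):
    """Collect hostnames and CPU counts from syslog-style kernel lines."""
    summary = {"hosts": set(), "cpus_total": None, "processors": None}
    for line in lines:
        fields = line.split()
        # Syslog layout: month, day, time, hostname, then the tag ("kernel:").
        if len(fields) >= 5 and fields[4] == "kernel:":
            summary["hosts"].add(fields[3])
        match = CPU_TOTAL.search(line)
        if match:
            summary["cpus_total"] = int(match.group(1))
        match = PROCESSORS.search(line)
        if match:
            summary["processors"] = int(match.group(1))
    return summary


if __name__ == "__main__":
    with open("/var/log/messages.1", errors="replace") as handle:
        print(summarize_boot_log(handle))
```

If the hostname or the counts do not match the server in question, the attached file is probably from a different machine.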

In the end, backup and either fsck or start with a fresh filesystem and restore.
Matt Shaffer_1
Regular Advisor

Re: OS / kernel / memory corruption?

The mirror was corrupt. Had to restore using HP DPX OBDR.