SGX 11.18/XDC fs keeps going read-only!

Unix Team_6
Advisor

Hi all,

FYI, we've had another problem on our big Linux Serviceguard Oracle DB cluster. One of the 8 packages (DBs) on it started going read-only the day after it went live.

We have an HP ProLiant running Red Hat 4.7 (64-bit) with an EMC SAN and EMC PowerPath, and we have a volume group and lvols on this SAN. Since we moved an Oracle database onto it yesterday, the lvol holding the DB archive logs has gone read-only twice. These are the errors from messages/dmesg:

FIRST ERROR:
journal_bmap: journal block not found at offset 1036 on dm-27
Aborting journal on device dm-27.
__journal_remove_journal_head: freeing b_committed_data
ext3_abort called.
EXT3-fs error (device dm-27): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

SECOND ERROR:
EXT3-fs warning (device dm-27): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device dm-27): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on dm-27, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
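To spot which device hit an aborted journal without reading the whole log by eye, something like the following could be run over the kernel log text. This is a minimal sketch, assuming the log is available as a string (for example the output of dmesg or a copy of the messages file); the function name scan_ext3_events and the category labels are hypothetical, and the patterns simply match the message text quoted above.

```python
import re
from collections import defaultdict

# Regexes built from the ext3 kernel messages quoted in this post.
# The category labels are illustrative names, not kernel terminology.
DEVICE_PATTERNS = {
    "journal_abort": re.compile(r"Aborting journal on device ([^\s.]+)"),
    "fs_error": re.compile(r"EXT3-fs error \(device ([^)\s]+)\)"),
    "needs_fsck": re.compile(
        r"EXT3-fs warning \(device ([^)\s]+)\): .*need of filesystem check"),
}
REMOUNT_RO = "Remounting filesystem read-only"


def scan_ext3_events(log_text):
    """Map each device to the set of ext3 error events seen in log_text.

    The "Remounting filesystem read-only" line names no device, so it is
    attributed to the most recent device matched by one of the patterns.
    """
    events = defaultdict(set)
    last_device = None
    for line in log_text.splitlines():
        for category, pattern in DEVICE_PATTERNS.items():
            match = pattern.search(line)
            if match:
                last_device = match.group(1)
                events[last_device].add(category)
        if REMOUNT_RO in line and last_device is not None:
            events[last_device].add("remount_ro")
    return dict(events)
```

Fed the FIRST ERROR block above, this would report dm-27 with journal_abort, fs_error and remount_ro; the dm-N name still has to be mapped back to the lvol by hand.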

After a lot of discussion with Red Hat, it appears the problem is that our version of PowerPath is incompatible with our kernel version. We run Red Hat AS 4.7 (64-bit) and PowerPath 4.5.3-003. The solution is to upgrade to PowerPath 5.

We had to fix it before we knew about the PowerPath upgrade, so instead we replaced the SAN LUNs with new ones, and we've had no more read-only filesystems. So it seems PowerPath may have been erroneously picking up a SAN issue and passing the error on to the Red Hat kernel, despite no PowerPath errors or logs indicating such (all 6 other packages are fine and have never experienced this issue). The Red Hat kernel then has some idiotic AI (artificial intelligence) that changes the filesystem to read-only to protect it, when in fact this causes the problem by taking our DB down!
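Since the switch to read-only left no PowerPath log entry, one way to catch it early is to poll the mount table. This is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, fs type, comma-separated options, dump, pass); the function names and the example device and mount-point names are hypothetical.

```python
def find_readonly_mounts(mounts_text, fstypes=("ext3",)):
    """Return (device, mount_point) pairs of the given fs types mounted ro.

    mounts_text is the contents of /proc/mounts: whitespace-separated
    fields of device, mount point, fs type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # The options field starts with "ro" or "rw"; check it as a whole token.
        if fstype in fstypes and "ro" in options.split(","):
            result.append((device, mount_point))
    return result


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table as text."""
    with open(path) as f:
        return f.read()
```

A cron job or package monitor could call find_readonly_mounts(read_proc_mounts()) and alert when the result is non-empty, rather than waiting for the database to fail on its next archive-log write.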

Red Hat had the same bug in their older kernels, which was fixed by upgrading to 4.6, but now the bug also exists in their filesystem driver. Let's hope they will remove it, or allow us to disable it, one day too.
Steven E. Protter
Exalted Contributor


Shalom,

Some problems have recently been found in the RHEL 4.7 kernel that might be relevant to this issue.

The root of the problem may be the filesystem. ext3 is not designed for clustered operations.

You might find that going with GFS will also prevent this problem.

You have been harmed by a bug that RH fixed and then re-introduced.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Unix Team_6
Advisor


Thanks Steve, very interesting. Red Hat denied this and blamed it on PowerPath.