
RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

Alzhy
Honored Contributor

RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

We have a big Linux server that has the same OS, same I/O config, same backend, and same recent upgrade to 11g. It has been "hanging" exactly at times of peak I/O, when the most evident running process was RMAN (to LTO4 drives in NetBackup).

The system becomes unresponsive. It remains on the network but is inaccessible, and the console is flooded with messages:

INFO: task processname:5064 blocked for more than 120 seconds.
INFO: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
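
For anyone trying to see which processes trigger these warnings most often, here is a minimal Python 3 sketch that tallies them from saved console or syslog output. It assumes the lines follow the format quoted above; the function name `count_hung_tasks` is made up for illustration.

```python
import re
from collections import Counter

# Pattern for the hung-task warning as quoted in this thread.
# The task name is matched non-greedily up to the ":<pid> blocked" part.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Return a Counter mapping task name -> number of hung-task warnings."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

Feeding it the lines of a saved log (for example `count_hung_tasks(open(path))`, with `path` pointing at whatever file holds the captured messages) and calling `.most_common()` on the result lists the most frequently blocked task names first.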

Has anyone experienced this issue? We're clueless and all vendors are engaged.

Red Hat came back with a suggestion to change our I/O scheduler (elevator) from the default CFQ to deadline. Our backend SAN storage is an XP12K array.

I thought the I/O scheduler was just a suggested setting that depends on the array used and the load. Undoubtedly we should be using the deadline scheduler for the DB LUNs, but I don't believe the choice of scheduler should hang a Linux system.
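
For reference, the scheduler in use for each block device can be read from `/sys/block/<dev>/queue/scheduler`, where the active one is shown in square brackets. Below is a minimal Python 3 sketch for inspecting it; the helper names and the device name `sda` in the usage note are illustrative assumptions, not taken from our setup.

```python
def parse_scheduler_line(text):
    """Split the contents of a queue/scheduler file into (active, available).

    The kernel lists all available schedulers on one line and wraps the
    active one in square brackets, e.g. "noop anticipatory deadline [cfq]".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for one block device."""
    path = "{}/{}/queue/scheduler".format(sysfs_root, device)
    with open(path) as handle:
        return parse_scheduler_line(handle.read())
```

Calling `read_scheduler("sda")` on a host returns something like `("cfq", ["noop", "anticipatory", "deadline", "cfq"])`. Writing a scheduler name into the same file as root switches that device at runtime, which is how a per-LUN change to deadline can be made without a reboot.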

I am still poring through several Bugzillas that seem to match the kernel messages.

TIA for any ideas, comments, leads, etc.


Hakuna Matata.
7 REPLIES
TwoProc
Honored Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

I have heard that there are certain patches that Oracle recommends for running on RHEL 5.5 to keep I/O hangs from occurring, but I've only heard from certain consultants that they do indeed exist. It came up in a meeting this morning, and I've asked the consultant for the list. If I get more info, I'll post it here.
We are the people our parents warned us about --Jimmy Buffett
Alzhy
Honored Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

Thanks TP.. will appreciate it a LOT.

Right now, there are several theories:

- Boot disk (SAS 15k rpm disk) firmware
- The default I/O elevator (CFQ) caused it, so we changed the DB disks' elevators to deadline

Our DBAs have stopped running RMAN, and we're about to reach the same time window in which the hang hit us on each of the past 2 days.



Hakuna Matata.
Randy Jones_3
Trusted Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

> Our backend SAN storage...

YMMV depending on the intelligence of your SAN controller, but I've never used anything other than "elevator=noop" for anything resident on our SAN.

We're using Oracle 10g and 11g on 32GB and 64GB RHEL5 systems. In our SAN the controller is caching and queuing based on its knowledge of actual data placement, so beyond basic bunching of adjacent requests I believe it's counterproductive to ask Linux to try to optimize I/O for some virtual device that looks nothing like how the data is placed in reality. I want the requests down the channel ASAP so the SAN can get to work on them.
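
Since `elevator=` is a kernel boot parameter, one quick way to confirm what a host was actually booted with is to look at `/proc/cmdline`. A minimal Python 3 sketch follows; the function name is made up for illustration, and returning the last occurrence when the parameter is repeated is an assumption about precedence.

```python
def elevator_from_cmdline(cmdline):
    """Return the value of the elevator= boot parameter, or None if absent.

    If the parameter appears more than once, the last occurrence is
    returned (assumed here to be the one that takes effect).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("elevator="):
            value = token.split("=", 1)[1]
    return value
```

On a live system this would be called as `elevator_from_cmdline(open("/proc/cmdline").read())`. A result of `None` means no scheduler was requested at boot, so the kernel default applies unless it was changed later per device.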
Florian Heigl (new acc)
Honored Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

There is a good presentation about the various schedulers from one of the recent Red Hat Summits; search for it.

I didn't see that specific error but experienced similar issues, i.e. a single "imp" rendering the server completely useless, and the same with md_resync on a smaller box. I can only say that ionice is your friend, and that not everything you can assume will never happen on a Unix server holds true on Linux.
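
As a rough illustration of the ionice suggestion, here is a minimal Python 3 sketch that builds and runs the command to drop an already running process into the idle I/O class. The helper names are hypothetical. One caveat: I/O priority classes are honored by the CFQ scheduler, so on devices switched to deadline or noop this is not expected to have an effect.

```python
import subprocess

# ionice scheduling classes: 1 = realtime, 2 = best-effort, 3 = idle
IDLE_CLASS = 3


def ionice_idle_command(pid):
    """Build the ionice argument list that moves a PID to the idle class."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["ionice", "-c", str(IDLE_CLASS), "-p", str(pid)]


def demote_to_idle(pid):
    """Run ionice against a live process; raises if the command fails."""
    subprocess.run(ionice_idle_command(pid), check=True)
```

For the PID from the console messages earlier in the thread, `ionice_idle_command(5064)` yields the equivalent of running `ionice -c 3 -p 5064` by hand.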
yesterday I stood at the edge. Today I'm one step ahead.
TwoProc
Honored Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

Alzhy, my favorite way to "tune" high I/O (after actually tuning the problematic statements that show up in statspack/AWR reports) is with memory. If you're not already maxed out on RAM, why not add more and configure the server and the database to take advantage of the resources?

I know that this isn't really the problem you're having, it's just a suggestion - but in general, I'd try to stay away from having big I/O problems in the first place...

We are the people our parents warned us about --Jimmy Buffett
Alzhy
Honored Contributor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

Thanks so far Migz.

After implementing the I/O scheduler change to "deadline" (which I doubted was really the crux of the matter), we again had a hang on our RHEL 5.6 system (not 5.5 after all; kernel 2.6.18-194.26.1). Unfortunately, we were not able to force a crash to capture a dump image, as we did not have sysrq turned on.
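
For next time, whether sysrq is enabled can be checked ahead of an incident by reading `/proc/sys/kernel/sysrq`. A minimal Python 3 sketch of interpreting that value is below; the function name and the three labels it returns are my own, and the bitmask case only applies on kernels that support it.

```python
def sysrq_state(text):
    """Classify the contents of /proc/sys/kernel/sysrq.

    0 means sysrq is disabled, 1 means all functions are enabled, and any
    other value is treated as a bitmask enabling only some functions.
    """
    value = int(text.strip())
    if value == 0:
        return "disabled"
    if value == 1:
        return "enabled"
    return "partial"
```

On a live host this would be `sysrq_state(open("/proc/sys/kernel/sysrq").read())`. With sysrq enabled and a crash dump mechanism configured, writing `c` to `/proc/sysrq-trigger` is the usual way to force a crash for analysis.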

So the issue is puzzling.

We have now been advised to go to the 2.6.18-194.32.1 kernel. We did so, but moved the DB onto a different server (same model/specs) with the updated kernel. So far so good.

On the old server, we will try to replicate the issue by running iozone and Swingbench or RHEL's stress suite.

Hakuna Matata.
Mike_Swift
Advisor

Re: RHEL 5.5 Oracle 11G Host Hangs during Heavy I/O - RMAN

Alzhy,

Was this ever resolved? We have a very similar issue, the only difference being HDS storage.

Thanks,
Mike.