
Is Oracle 11G such a Filehandle "hog"?

 
Alzhy
Honored Contributor

Is Oracle 11G such a Filehandle "hog"?

We recently upgraded our RHEL 5 environments (RHEL 5.5/5.6) to Oracle 11G. One of the noted requirements was that fs.file-max be AT LEAST 6815744 (6.8 million file handles!).

Our DBAs had nofile set to 64K in their profile and in limits.conf. This worked under Oracle 10G, and we never came close to 64K even at peak usage.
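
For anyone who wants to watch this on their own box: the kernel exposes system-wide handle usage in /proc/sys/fs/file-nr (allocated, unused, maximum, where the maximum is fs.file-max). A minimal sketch, nothing Oracle-specific:

```python
#!/usr/bin/env python
# Minimal sketch: compare system-wide file handle usage against fs.file-max
# by reading /proc/sys/fs/file-nr (allocated, unused, maximum).

def file_handle_usage():
    with open("/proc/sys/fs/file-nr") as f:
        allocated, unused, maximum = (int(x) for x in f.read().split())
    in_use = allocated - unused
    return in_use, maximum

if __name__ == "__main__":
    in_use, maximum = file_handle_usage()
    print("file handles in use: %d of %d (%.1f%%)"
          % (in_use, maximum, 100.0 * in_use / maximum))
```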

After our 11G upgrade, we noticed that one server (our biggest DB and user base) would occasionally hang cold. At times it would simply resume activity, and at times we were forced to reset it. There was never any indication that the OS was starved for available file handles, but we suspect Oracle was "spinning out of control", likely due to a lack of file handles. Why? On another recent 11G upgrade, where I happened to have limits.conf set to give oracle 256K max files, file handle usage has been in the 180Ks during peak hours, whereas during the 10G era it barely reached 64K as well!

So, is 11G really such a file handle hog, given that its install requirements suggest a 6.8 million file handle limit? And could a lack of file handles hang a system yet leave no trace that it was starved for them?

Unfortunately we're no longer able to replicate the issue and we were unable to capture a crash image during those hangs.

TIA for any insights and thoughts on the matter.

Hakuna Matata.
7 REPLIES
Alzhy
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

Yes it seems to BE.
Hakuna Matata.
TwoProc
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

The only way I could see this is if the server was busy spawning shadow processes and each was trying to read data files. However, you would have seen a huge increase in the number of processes, an increase in swap, and certainly, at some point, errors in the Oracle alert log about not enough file handles. I would also expect, if the system ran this wild, that you'd see errors from fork() calls. However, nothing you've said seems to indicate this, so I don't think this is your problem. The worst-case number of file handles is pretty simple to figure out: number of data files X number of guest connections (without connection pooling, naturally) + server connections.

So, if you have 100 data files and 350 guests, you'd have roughly 35,000 or so MAX file handles that could be called into use. Of course, that won't happen unless your users continuously hit and use all tables at once, which they won't. It can add up quickly though.
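
In sketch form, with the same illustrative numbers (the 50 server connections is just a placeholder, not a measured figure):

```python
# Back-of-the-envelope worst case, using the numbers from the example above
# (100 data files, 350 dedicated connections, plus a hypothetical 50 handles
# for background/server processes).
data_files = 100
guest_connections = 350
server_connections = 50          # illustrative only

worst_case = data_files * guest_connections + server_connections
print("worst-case file handles: %d" % worst_case)   # 35050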

However, we run a large site, and I've never needed more than 350K file handles at peak loads with lots of users and lots of concurrent jobs running. 6.8 million is A LOT.
We are the people our parents warned us about --Jimmy Buffett
Alzhy
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

Thanks John.

If you look at the Oracle 11G system requirements, it does ask for the kernel limit on max file handles to be 6.8 million. So there must be a reason.

Collating open files on the same DB instances pre- and post-11G upgrade, there do seem to be more file handles opened by 11G.
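
For the pre/post comparison, I used something along these lines (a rough sketch; it matches processes by command line, which will also pick up listeners and agents, and it needs root or the oracle user to read other processes' fd directories):

```python
#!/usr/bin/env python
# Minimal sketch: total open file descriptors across processes whose command
# line contains a given string (e.g. "oracle" or an ORACLE_SID).
import os

def open_fds(match="oracle"):
    total = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % pid) as f:
                cmdline = f.read()
            if match in cmdline:
                total += len(os.listdir("/proc/%s/fd" % pid))
        except (IOError, OSError):
            pass   # process exited or permission denied
    return total

if __name__ == "__main__":
    print("open descriptors for 'oracle' processes: %d" % open_fds())
```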

BTW, our DB does not even deal with data files on a file system, as it is ASM based.
Hakuna Matata.
Matti_Kurkela
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

Note that network sockets count as file handles too.

When the process is hitting a file handle limit, writing to a log file would be troublesome: unless the log file is already open, opening it for writing an error message would require... a file handle.
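
A trivial way to see that effect in isolation (this only lowers the limit for a throwaway process, and the /tmp/alert.log path is purely illustrative):

```python
#!/usr/bin/env python
# Small demonstration: once a process has exhausted its descriptor limit,
# it cannot even open a log file to report the problem.
import resource

resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))   # tiny soft/hard limit

held = []
try:
    while True:                      # burn descriptors until the limit hits
        held.append(open("/dev/null"))
except IOError as e:
    print("ran out of descriptors: %s" % e)
    try:
        open("/tmp/alert.log", "a")  # "just log the error" -- also fails
    except IOError as e2:
        print("could not even open the log file: %s" % e2)
finally:
    for f in held:
        f.close()
```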

MK
TwoProc
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

Matti - Oracle alert log files remain open while the database is up; in fact, if it can't open the alert log file, it won't come up.

Alzhy, I can't see why it needs so many file handles if the DB server is ASM based. It makes no sense; those file handles are clearly tied to the file system in use. I wonder if they recommend this many, as they do in lots of their documentation, just as a matter of course, regardless of whether or not you're using cooked file systems. They've done more foolish and crazy things to me in the past. In fact, 6.8 million sounds to me like a misprint.

HOWEVER, you did mention before that when your server was freezing up, it was during a backup? And of course, if you're using RMAN to back up the server, then you'll need additional file handles to get that done: number of files per channel X number of agents X number of channels, right?
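
Just as a sketch of that estimate; every number below is hypothetical and would need to come from your actual RMAN configuration:

```python
# Rough sketch of the RMAN handle estimate above; all figures are made up.
files_per_channel = 8      # e.g. FILESPERSET
channels = 4
agents = 2                 # e.g. media-manager agents per channel

rman_handles = files_per_channel * agents * channels
print("extra handles during backup: ~%d" % rman_handles)   # ~64
```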

Maybe that's it, you were close to 65K (though I can't see that either using ASM), and you went over for the backup?
We are the people our parents warned us about --Jimmy Buffett
Raynald Boucher
Super Advisor

Re: Is Oracle 11G such a Filehandle "hog"?

We have encountered something similar but it had nothing to do with file handles.

The problem was solved by increasing the number of allowed concurrent processes in the Oracle parameters.

http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/create005.htm#i1014287

We attributed the additional processes to all the monitoring performed by Oracle Enterprise Manager Grid Control.

If the problem persists, you may investigate the "session_max_open_files" parameter. This is linked to the MAX_OPEN_FILES parameter of the OS.

http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/initparams219.htm

Let us know how it goes.

RayB
Alzhy
Honored Contributor

Re: Is Oracle 11G such a Filehandle "hog"?

Thanks Matti, TP, Ray.

The DB environment was moved to a new server with a newer kernel and the hangs followed.

We and the vendors are still clueless.

What we've done so far is remove the ulimits from Oracle's profile and set Oracle's session limit for nofile in limits.conf to a very large value (greater than the 64K).
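
One thing worth doing after a limits.conf change is confirming that a fresh oracle login actually inherits the new nofile limit (pam_limits has to be applying it). A minimal check, run as the oracle user:

```python
#!/usr/bin/env python
# Minimal check: print the nofile limits seen by the current process,
# to compare against the value configured in limits.conf.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("nofile soft limit: %d, hard limit: %d" % (soft, hard))
```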

We've also heeded the advice to implement huge pages.
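
For anyone following the same advice: whether the SGA actually ended up in huge pages shows in /proc/meminfo (HugePages_Total vs HugePages_Free). A small sketch to pull those lines out:

```python
#!/usr/bin/env python
# Minimal sketch: report huge page counters from /proc/meminfo.
def hugepage_info():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key.startswith("HugePages") or key == "Hugepagesize":
                info[key] = rest.strip()
    return info

if __name__ == "__main__":
    for key, value in sorted(hugepage_info().items()):
        print("%s: %s" % (key, value))
```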

Everything is good so far.

In the past we thought it was RMAN hanging the system, but then there were episodes during our online hours when the system would just hang and all of our stats-gathering tools and logs would yield no clues. It is unfortunate, too, that we were not set up for crash dump capture, which could have helped diagnose the malaise.

On the old server, I've been able to stress the system with IOZONE and ORION and make it hang, but I am not sure whether that hung state is the same as what we experience under DB load. I was able to capture a vmcore, and Red Hat is analysing it in case the issue is in the I/O stack or somewhere else.
Hakuna Matata.