Greg OBarr
Regular Advisor

How do I get LSOF to see open files as they relate to the NFILE kernel parameter?

OK, I know that the fix for a full file table is to increase the nfile and nflocks kernel parameters, but there's more.

This is an L2000 running HP-UX 11.00 with a 64-bit kernel and 64-bit Oracle. It's been running for 2 years without any problems like this. The day before this happened, I ran an RMAN backup with about 10 channels writing the backup data to disk. It slowed the system down dramatically and filled the filesystem I was writing the backup to, and RMAN exited when it could no longer write output. I think this is what caused the file table to fill; maybe RMAN didn't release some file descriptors because of the abnormal exit.

To monitor the system, I downloaded lsof to see which files are being used by which processes. When I run lsof with no arguments, it lists 14777 open files, but my "nfile" parameter is set to 8202, so the two can't be counting the same thing. How do I get lsof to show the open files that count against nfile? To rephrase: "sar -v 1 1" shows that I currently have 6808/8202 file table entries in use (the file-sz column), so how can I get lsof to show me those 6808 files? I am also concerned that some Oracle procedure may be leaking file descriptors, which is why I want to do this.
Bill Hassell
Honored Contributor
Solution


Many, many of those file descriptors may be sockets and not real files. However, you can see the in-use file descriptors with sar -v 1: the file-sz column shows in-use / maximum. There is no possibility of the kernel opening more descriptors than nfile allows. nfile sets the size of a table that sits permanently in RAM, so there is no way to go beyond its limits.
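A minimal Python sketch of reading the file-sz column described here, assuming the HP-UX 11.00 "sar -v" data-line layout (time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov) that appears in the sar output posted later in this thread. The function name is hypothetical; it only splits the in-use/maximum pair and reports how full the file table is.

```python
def file_table_usage(sar_line):
    """Return (in_use, maximum, percent_full) from a 'sar -v' data line.

    Assumes the HP-UX 11.00 column order:
    time text-sz ov proc-sz ov inod-sz ov file-sz ov
    so file-sz is the 8th whitespace-separated field (index 7).
    """
    fields = sar_line.split()
    in_use, maximum = (int(n) for n in fields[7].split("/"))
    return in_use, maximum, 100.0 * in_use / maximum


# Data line taken from the sar output posted in this thread.
line = "15:42:05 N/A N/A 347/1620 0 1891/8192 0 4543/8202 0"
used, size, pct = file_table_usage(line)
print(f"file table: {used}/{size} ({pct:.1f}% full)")  # file table: 4543/8202 (55.4% full)
```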


Bill Hassell, sysadmin
Greg OBarr
Regular Advisor


Thanks, Bill.

So, this means that nfile is only talking about "real" files that reside on a disk somewhere. The LSOF man page tells me that files of the type "REG" are regular files, so I do "lsof|grep REG|wc" and I get:

# lsof|grep REG|wc
7484 67647 724617

But "sar -v 1" tells me:

# sar -v 1

HP-UX cadb01a B.11.00 A 9000/800 03/14/03

15:42:04 text-sz  ov   proc-sz  ov  inod-sz    ov  file-sz    ov
15:42:05 N/A      N/A  347/1620 0   1891/8192  0   4543/8202  0
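One reason the "lsof|grep REG" count (7484) is so far above the file-sz figure is that grep matches "REG" anywhere on the line, so it picks up every VREG entry, including the "mem" (memory-mapped library) lines shown below. A minimal Python sketch that tallies lsof lines by the FD and TYPE columns instead, assuming lsof's default column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME). Lines with an empty column shift the fields, so treat the result as approximate. Apart from the libxti.2 entry, the sample lines are hypothetical.

```python
from collections import Counter


def tally_fd_and_type(lsof_text):
    """Count lsof lines by (FD kind, TYPE), skipping the header line.

    FD kind is 'descriptor' when the FD column starts with a digit (e.g. 9u),
    otherwise the FD name itself (mem, txt, cwd, rtd, ...).
    Assumes default columns: COMMAND PID USER FD TYPE ...
    """
    counts = Counter()
    for line in lsof_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue
        fd, ftype = fields[3], fields[4]
        kind = "descriptor" if fd[:1].isdigit() else fd
        counts[(kind, ftype)] += 1
    return counts


# Hypothetical sample (only the libxti.2 line comes from this thread).
sample = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
oraclePRO 22687 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22687 oracle txt VREG 64,0x7 5120000 4711 /u01/app/oracle/bin/oracle
oraclePRO 22687 oracle 9u VREG 64,0x7 1048576 2050 /u01/oradata/PRO/system01.dbf
oraclePRO 22687 oracle 1w VCHR 3,0x2 0t0 75 /dev/null
"""
for key, n in sorted(tally_fd_and_type(sample).items()):
    print(key, n)
```

In this sample, "grep REG" would count three lines, but only one of them is an actual open descriptor on a regular file.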

It appears that every Oracle user has the DBF and IDX files open according to the schema they're in. I can understand that. But it also appears that every Oracle user has several library files open, such as:

oraclePRO 22687 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22710 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22719 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22743 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22801 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22937 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22978 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22986 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 23014 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 23055 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2

The above is only a small section of all the users who have the same file open. The "FD" column shows "mem", meaning this is a memory-mapped file, so can I exclude these as open "real" files since they're not on the disk? I would also assume that /dev/null is not a "real" file. That being the case, I can use:

# lsof|grep oracle|grep -v mem|grep -v null|wc
4153 37535 415514

When I filter out the memory mapped files and the /dev/null, I get 4153 files open by oracle, and now "sar -v 1" shows 4394/8202 in the file-sz column. This is much closer to the mark and the OS and other apps could easily have another 200 files open.
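The grep pipeline above also matches any line with "oracle" in the command name or path, not just processes owned by oracle, and "grep -v mem" would drop a line whose path happened to contain "mem". A minimal column-aware Python sketch of the same count, assuming lsof's default column order; the function name and all sample lines except libxti.2 are hypothetical. Note that descriptors inherited across fork() share one kernel file table entry, so even this count can run higher than file-sz.

```python
def count_user_descriptors(lsof_text, user="oracle"):
    """Column-aware version of: lsof | grep oracle | grep -v mem | grep -v null | wc

    Counts lines whose USER column equals `user` and whose FD column is a
    numeric descriptor (3u, 12r, ...), skipping /dev/null. FD names such as
    mem, txt, cwd and rtd are not open descriptors and are not counted.
    Assumes default columns: COMMAND PID USER FD TYPE ... NAME
    """
    count = 0
    for line in lsof_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5 or fields[2] != user:
            continue
        fd, name = fields[3], fields[-1]
        if fd[:1].isdigit() and name != "/dev/null":
            count += 1
    return count


# Hypothetical sample: the last line is owned by root but its path contains "oracle".
sample = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
oraclePRO 22687 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
oraclePRO 22687 oracle 9u VREG 64,0x7 1048576 2050 /u01/oradata/PRO/system01.dbf
oraclePRO 22687 oracle 1w VCHR 3,0x2 0t0 75 /dev/null
tail 3101 root 3r VREG 64,0x7 20480 3377 /u01/app/oracle/admin/PRO/bdump/alert_PRO.log
"""
print(count_user_descriptors(sample))  # 1
```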

Does this sound right to you?

What seems to have been happening is that each time we add a datafile to a tablespace, every user who connects to a schema that accesses that tablespace uses a file descriptor for each of the .dbf and .idx files that make up the tablespace. So, for a tablespace of 10 .dbf files, 50 users accessing it would take up 50*10=500 file descriptors. We were over the 7100/8202 mark for a while today in normal operation, with no backups or anything else out of the ordinary running.
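The growth described here is a simple product of sessions and datafiles, so it is easy to project before adding files. A minimal Python sketch, assuming (as the post suggests) that every connected server process opens each datafile of every tablespace it touches exactly once; the function name, tablespace names, and session counts are hypothetical inputs you would fill in from your own database.

```python
def datafile_descriptors(tablespaces):
    """Estimate file table entries used by Oracle server processes for datafiles.

    tablespaces maps a tablespace name to (datafiles, sessions), where sessions
    is the number of connected server processes that touch it. Assumes each
    such process opens every datafile in the tablespace once.
    """
    return sum(files * sessions for files, sessions in tablespaces.values())


# The example from the post: 10 datafiles, 50 users.
print(datafile_descriptors({"DATA": (10, 50)}))  # 500
# Adding one more datafile costs one entry per session.
print(datafile_descriptors({"DATA": (11, 50)}))  # 550
```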

I think it's time to bump up the nfile and nflocks. Anything else I should increase at the same time?
Michael Steele_2
Honored Contributor


You're using lsof incorrectly.

To see all the open files for a particular process ID:

# lsof -p <PID>

To see all the open files associated with a particular command:

# lsof -c midaemon

By user name or UID:

# lsof -u <login>
# lsof -u <UID>

Processes using a particular socket:

# lsof -i tcp:23
# lsof -i udp:123
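When several of these selections are given together, lsof ORs them by default; adding -a makes it AND them, which is usually what you want for something like "oracle's TCP sockets". A minimal Python sketch that assembles such a command line; the function name and the tcp:1521 example (the usual Oracle listener port) are illustrative assumptions, and only lsof's own -a, -p, -c, -u and -i options are used.

```python
import shlex


def lsof_args(pid=None, command=None, user=None, inet=None, and_selections=False):
    """Build an lsof argument list from the selection options shown above.

    lsof ORs multiple selections by default; -a makes it AND them.
    """
    args = ["lsof"]
    if and_selections:
        args.append("-a")
    if pid is not None:
        args += ["-p", str(pid)]
    if command is not None:
        args += ["-c", command]
    if user is not None:
        args += ["-u", str(user)]
    if inet is not None:
        args += ["-i", inet]
    return args


cmd = lsof_args(user="oracle", inet="tcp:1521", and_selections=True)
print(shlex.join(cmd))  # lsof -a -u oracle -i tcp:1521
```

On a host with lsof installed, passing the list to subprocess.run(cmd, capture_output=True, text=True) would execute it.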

Also, nfile defines the maximum number of files that can be open at any one time, system-wide.

Every process uses at least three file descriptors (standard input, standard output, and standard error).
Bill Hassell
Honored Contributor


nfile covers all open files, even when the same file is opened multiple times (i.e., each open gets its own file table entry, with its own offset for reads and writes). nfile should be set to 15,000 to 20,000, and since the system is growing, you might as well bump nproc up to 2,000. This will avoid an unnecessary reboot later. Having nfile or nproc twice as large as needed costs nothing except some kernel memory reserved for the table. Many applications and database programs do not handle table limits gracefully at all, and some fail to even report the problem before crashing.
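The point that each open counts separately, and the reason lsof's per-process listing overshoots file-sz, can both be seen with a small Python experiment on any POSIX system: two separate open() calls on the same file get independent offsets (two file table entries), while dup() gives a second descriptor that shares the first one's entry and offset. fork() shares entries the same way dup() does, so a child's inherited descriptors appear in lsof under each process but occupy a single file table slot. This is an illustrative sketch run against a temporary file, not an HP-UX-specific tool.

```python
import os
import tempfile

# Write a small file to experiment on.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

fd_a = os.open(path, os.O_RDONLY)  # first open(): its own file table entry
fd_b = os.open(path, os.O_RDONLY)  # second open(): another entry, own offset
fd_c = os.dup(fd_a)                # dup(): new descriptor, SAME entry as fd_a

os.read(fd_a, 4)                   # advances the offset shared by fd_a and fd_c

offsets = {
    "fd_a": os.lseek(fd_a, 0, os.SEEK_CUR),
    "fd_b": os.lseek(fd_b, 0, os.SEEK_CUR),
    "fd_c": os.lseek(fd_c, 0, os.SEEK_CUR),
}
print(offsets)  # {'fd_a': 4, 'fd_b': 0, 'fd_c': 4}

for fd in (fd_a, fd_b, fd_c):
    os.close(fd)
os.unlink(path)
```

Three descriptors are open here, but only two file table entries are in use.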


Bill Hassell, sysadmin
Greg OBarr
Regular Advisor


Thanks everyone. It should be noted that "lsof -u oracle" still gives a much higher number of files open than is possible. Must be because some are sockets and memory mapped files, not files on disk.

lsof -u oracle|wc
10883 98774 1001016 (but nfile is set to 8192)

Bill, thanks for the suggestions on settings. I will make the following changes:

Param     Current Value  New Value
nfile     8,192          20,000
nflocks   800            2,400
nproc     1,620          2,500

There are some others that concern me too:

ninode    8,192
maxuprc   600
maxusers  128

I'm going to look for a table of recommended settings for my system model, configuration, and its function as a database server. Does anyone know of one?
Keely Jackson
Trusted Contributor


 
Greg OBarr
Regular Advisor


Thanks Keely,
Even though this is for 9i and I'm using 8i, I think they should be about the same in terms of kernel parameter settings. Some of these are a LOT higher than what I had them set to.
Bill Hassell
Honored Contributor


> There are some others that concern me too:

> ninode 8,192
> maxuprc 600
> maxusers 128

ninode is way too high. It is an inode cache just for HFS filesystems. All standard HP-UX systems use VxFS except for /stand, so ninode=1000 is fine and will save some kernel memory. Get rid of any formula there and set it to a fixed value.

maxuprc depends on how you use your system. It limits the maximum number of processes that a single user can run at the same time. If you have dozens of processes running as user oracle, this value must be set larger than the number you actually need.

maxusers is not a kernel parameter in itself; it is a macro value used by the parameters that have a formula rather than a fixed value. The idea is that raising it adjusts the related parameters at the same time. As long as you run an average system, it works OK (but no one runs an average system...)


Bill Hassell, sysadmin
Daniel M. Gonzales
Frequent Advisor


Greg-

Here's the closest we have been able to come...

=============================================
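# Output file named by day-of-year and time (the \% escapes let this run from crontab, where % is special)
OUTPUT=/tmp/lsof.`date +\%j.\%X`

# Keep lines showing a 0x........ address, drop duplicates of field 6, save them and append the count
lsof +ff | grep " 0x........ " | sort -uk 6,6 | tee "$OUTPUT" | wc -l >> "$OUTPUT"
# Append the kernel's own file-sz figure for comparison
sar -v 1 >> "$OUTPUT"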
=============================================

This lists the open files, counts them, and compares the count with the sar output. Results vary with how busy the system is; we have occasionally matched it exactly, though rarely. Good luck.
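The script above appears to work because descriptors inherited across fork() or copied with dup() point at the same kernel file structure, so counting distinct file-structure addresses comes much closer to sar's file-sz than counting lsof lines. A minimal Python sketch of the same idea; it assumes, as the script's "sort -uk 6,6" implies, that the FILE-ADDR column lands in the sixth field of "lsof +ff" output (check this against your own lsof header), and the function name and sample lines are hypothetical.

```python
def count_file_entries(lsof_ff_text, addr_field=5):
    """Count distinct kernel file-structure addresses in 'lsof +ff' output.

    addr_field is 0-based, so 5 corresponds to 'sort -k 6,6'. Lines whose
    field does not start with 0x (the header, entries with no file-structure
    address) are skipped, much like the grep step in the script.
    """
    seen = set()
    for line in lsof_ff_text.splitlines():
        fields = line.split()
        if len(fields) > addr_field and fields[addr_field].startswith("0x"):
            seen.add(fields[addr_field])
    return len(seen)


# Hypothetical sample: two processes share one file structure (inherited across fork).
sample = """COMMAND PID USER FD TYPE FILE-ADDR DEVICE SIZE/OFF NODE NAME
oraclePRO 22687 oracle 9u VREG 0x4a1b2c30 64,0x7 1048576 2050 /u01/oradata/PRO/system01.dbf
oraclePRO 22710 oracle 9u VREG 0x4a1b2c30 64,0x7 1048576 2050 /u01/oradata/PRO/system01.dbf
oraclePRO 22710 oracle 10u VREG 0x4a1b2d00 64,0x7 1048576 2051 /u01/oradata/PRO/users01.dbf
oraclePRO 22687 oracle mem VREG 64,0x7 119416 13834 /usr/lib/pa20_64/libxti.2
"""
print(count_file_entries(sample))  # 2
```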