Re: File table Full

 
Kitty_6
Advisor

File table Full

I am currently running HP-UX 11.0 with Oracle 80520. For some reason, after Feb 3 we noticed the CPU spiking to 98% and staying there. I traced the SQL query causing the problem and notified the programmers. They added an index for the query, and now I'm seeing the following error messages when connecting to the server:
Please wait...checking for disk quotas
crt0: ERROR couldn't open /usr/lib/dld.sl errno:000000023
/etc/profile[37]: cannot make pipe
.profile[11]: cannot make pipe
$

When I finally get in, I stop the database and restart the server. I checked syslog and dmesg and see "file table full". I searched several forums and all suggest increasing the nfile kernel parameter.
For now -
maxuprc = 500
maxusers=300
nfile=4769
ninode=2888
nproc=2420
The system will come to a halt again and I will have to reboot. Any assistance would be appreciated.
Sridhar Bhaskarla
Honored Contributor
Solution

Re: File table Full

Hi,

If it reports in syslog, then you need to increase the 'nfile' limit.

Run sar -v 2 10 and observe the file-sz column, which is shown as used/limit. If the used value is reaching the limit (the denominator), then you will need to increase nfile.
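As a rough aid for reading that column, here is a minimal sketch that turns each sar -v sample into a used/limit figure and a percentage. It assumes file-sz is the 8th whitespace-separated field of each sample line (time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov); the function name file_table_usage is made up for illustration.

```shell
# Print time, used/limit and percent utilisation of the file table for
# every sample line of `sar -v` output read from stdin.
# Assumption: file-sz is field 8; header and banner lines are skipped
# because field 8 does not look like "number/number" there.
file_table_usage() {
    awk '$8 ~ /^[0-9]+\/[0-9]+$/ {
        split($8, f, "/")
        printf "%s %d/%d %.1f%%\n", $1, f[1], f[2], 100 * f[1] / f[2]
    }'
}

# On the live system this would be used as:
#   sar -v 2 10 | file_table_usage
```

A percentage that keeps climbing towards 100 between samples is the pattern to watch for.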

Also check other values in sar output. Do not use formulae as they will increase parameters like ninode that you do not need. This is the procedure I follow to increase the parameters.

#cd /stand/build
#cp /stand/system /stand/system.(date)
#/usr/lbin/sysadm/system_prep -s /stand/system
#vi /stand/system
(increase your nfile value to about 30% more than what you have now)
#mk_kernel -o /stand/vmunix
#kmupdate
#cd /
#shutdown -r now
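For the "30% more" step, a small sketch of the arithmetic, assuming the current nfile is the 4769 quoted in the original post. The helper name grow_nfile is hypothetical, and the result is only a starting point to check against sar -v.

```shell
# Suggest a new nfile value a given percentage above the current one.
# Integer arithmetic only, so the result is rounded down.
grow_nfile() {
    current=$1
    percent=${2:-30}    # default to the 30% suggested above
    echo $(( current * (100 + percent) / 100 ))
}

grow_nfile 4769    # prints 6199
```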

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Michael Steele_2
Honored Contributor

Re: File table Full

This is HP's procedure:

A) sar -u 5 5
If %idle is low or 0 then CPU bottleneck

B) sar -u 5 5
If %wio > 15 then disk/tape bottleneck

C) sar -d 5 5
For most disks:
if > 50% busy then disk bottleneck
For small percentage of disks:
if > 20% busy then disk bottleneck

D) sar -d 5 5
if C) above true and avwait > avserv then:
I/O bottleneck

if C) above true and avwait < avserv then:
memory bottleneck

E) vmstat 5 5
if po > 0 then paging.

F) swapinfo -tam
if total 85% or greater then add more swap

G) sar -b 5 5
if < 100% then caching problem.
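Steps A and B can be applied mechanically to sar -u output. The following is only a sketch: it assumes the column order time, %usr, %sys, %wio, %idle, and it treats %idle of 10 or less as "low", which is my own threshold since the procedure above does not give a number. The function name classify_sar_u is made up.

```shell
# Label each sample line of `sar -u` output read from stdin.
#   cpu : %idle is at or below the threshold (step A)
#   io  : %wio is above 15 (step B)
# Assumption: columns are time %usr %sys %wio %idle.
# Assumption: idle threshold defaults to 10 (not from the procedure).
classify_sar_u() {
    awk -v idle_low="${1:-10}" '
        NF == 5 && $2 ~ /^[0-9]+$/ {
            verdict = ""
            if ($5 + 0 <= idle_low + 0) verdict = "cpu"
            if ($4 + 0 > 15) verdict = (verdict == "" ? "io" : verdict ",io")
            if (verdict == "") verdict = "ok"
            print $1, verdict
        }'
}

# On the live system: sar -u 5 5 | classify_sar_u
```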

Good Luck.


Support Fatherhood - Stop Family Law
T G Manikandan
Honored Contributor

Re: File table Full

The nfile parameter should still be increased.
It depends upon the number of open files on the system.

Also make sure that your npty value is reasonable.

You can use sar -v to check the file-sz and its limits.

To temporarily resolve the problem try stopping processes which are not of priority at the moment on the system.


Thanks
Kitty_6
Advisor

Re: File table Full

This is what I see when I run...
# sar -v 2 10

HP-UX sk1 B.11.00 D 9000/859 03/05/03

22:24:41 text-sz ov proc-sz ov inod-sz ov file-sz ov
22:24:43 N/A N/A 108/2420 0 590/2888 0 765/4779 0
22:24:45 N/A N/A 108/2420 0 590/2888 0 765/4779 0
22:24:47 N/A N/A 108/2420 0 590/2888 0 765/4779 0
22:24:49 N/A N/A 108/2420 0 590/2888 0 765/4779 0
22:24:51 N/A N/A 108/2420 0 589/2888 0 765/4779 0
22:24:53 N/A N/A 108/2420 0 589/2888 0 765/4779 0
22:24:55 N/A N/A 108/2420 0 589/2888 0 765/4779 0
22:24:57 N/A N/A 108/2420 0 589/2888 0 765/4779 0
22:24:59 N/A N/A 108/2420 0 589/2888 0 765/4779 0
22:25:01 N/A N/A 111/2420 0 622/2888 0 769/4779 0
Bear with me please...I'm really new at this. I guess what you're saying is watch 2888, and make sure the numerator is not equal to or higher than the denominator?
Kitty_6
Advisor

Re: File table Full

npty is set to 60 and I have 1 GIG of memory
T G Manikandan
Honored Contributor

Re: File table Full

What is the value of npty on the system?
How much memory do you have on the system?
Kitty_6
Advisor

Re: File table Full

I meant to say watch 4779..well at this point, with no one on the system, it would be hard to watch the numbers grow??...when running sar
Sridhar Bhaskarla
Honored Contributor

Re: File table Full

You mentioned that you saw filetable full messages in syslog and your sar -v did not show any overflows.

Then your maxfiles and maxfiles_lim may not be sufficient.

Do

kmtune -l -q maxfiles
kmtune -l -q maxfiles_lim

If maxfiles is too low, then you will have to increase it a bit. You can increase the parameter maxfiles on the fly.

kmtune -s maxfiles=512
kmtune -u

Run 'kmtune -l -q maxfiles' again to make sure the parameter is updated.

-Sri



Kitty_6
Advisor

Re: File table Full

Here is what it is right now...How do I know what is considered to be low?

# kmtune -l -q maxfiles
Parameter: maxfiles
Value: 60
Default: 60
Minimum: 0
Module: -


# kmtune -l -q maxfiles_lim
Parameter: maxfiles_lim
Value: 1024
Default: 1024
Minimum: 0
Module: -
Sridhar Bhaskarla
Honored Contributor

Re: File table Full

maxfiles = maximum number of files a process can open. 60 is the default value and it may not be sufficient.

However, since the errors come from the profile scripts, I am not sure that maxfiles is the problem either.

Then comes maxuprc, the maximum number of processes a user can run. To verify whether you are hitting it, do the following:

1. Log in as another user. It should let you through without giving any error.
2. While you are logged in as the other user, do the following.

$ps -ef|grep 'my_user' |wc -l

my_user is the login name you were using when you got the 'cannot make pipe' message.

Exit back to the root session. Run

#kmtune -l -q maxuprc

Compare the number you got with ps and the one you got above. If they are close, then it is time to increase the maxuprc parameter.

This is also a dynamically changeable parameter.
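A slightly stricter way to do the comparison, as a sketch: count only lines whose first column (UID) equals the login name, so the grep process itself is not counted, then compare against the limit. The helper names count_user_procs and near_maxuprc and the 10% margin are my own assumptions, not anything HP-UX provides.

```shell
# Count processes owned by one user from `ps -ef` output on stdin.
# Matching on the UID column avoids counting the grep process or other
# users' commands that merely mention the name.
count_user_procs() {
    awk -v user="$1" '$1 == user { n++ } END { print n + 0 }'
}

# Say whether a process count is within a margin (default 10%) of maxuprc.
near_maxuprc() {
    count=$1
    limit=$2
    margin=${3:-10}    # assumption: "close" means within 10% of the limit
    if [ $(( count * 100 )) -ge $(( limit * (100 - margin) )) ]; then
        echo "close to limit: $count/$limit"
    else
        echo "ok: $count/$limit"
    fi
}

# On the live system, with my_user and the maxuprc value from kmtune:
#   near_maxuprc "$(ps -ef | count_user_procs my_user)" 500
```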

-Sri
Kitty_6
Advisor

Re: File table Full

Earlier when I attempted to telnet in as myself or another user...I received the same message.

Earlier Logged IN:
/>ps -ef|grep oracle|wc -l
53

Logged in as Root:
# ps -ef|grep root|wc -l
64

Kitty_6
Advisor

Re: File table Full

sorry I forgot to add...

# kmtune -l -q maxuprc
Parameter: maxuprc
Value: 500
Default: 75
Minimum: -
Module: -

T G Manikandan
Honored Contributor

Re: File table Full

Kitty,

I would also recommend checking
#dmesg
and the /var/adm/syslog/syslog.log file.

There should be a lot of messages regarding this.

This should help.

Post them here too.
Sridhar Bhaskarla
Honored Contributor

Re: File table Full

Hi,

Did you reboot the box after you encountered the problem? I am not seeing anything now that is indicative of an overflow.

If you rebooted the box, then you will need to keep an eye on sar -v and make sure the file-sz value does not hit the nfile limit.

-Sri
PS: You do not need to assign 10 pts unless your problem is solved. I know you are appreciative of our response but the purpose of the point system is to identify the correct answer and rest come later. In future if someone searches the forum for the same problem, he/she will get confused the way points were assigned. In fact, none of our earlier messages solved your problem and they are all carrying 10 pts each. Since most of the messages were only diagnostics, you could award 3 pts until you get the solution.
T G Manikandan
Honored Contributor

Re: File table Full

Also, what is the value of ninode on the system?
Just do a kmtune -l for the values and paste it here.

Thanks
Kitty_6
Advisor

Re: File table Full

sorry about the points...I will remember next time. dmesg and syslog both have file table full errors. The system was rebooted today after users could not get in. Since then, I watch top and if I see a process (usually a process from the Oracle web server) consuming CPU at 90% or higher for over 10 minutes, I kill it and things are back to normal for a while.


Well here is the output

Parameter: ninode
Value: ((NPROC+16+MAXUSERS)+32+(2*NPTY))
Default: ((NPROC+16+MAXUSERS)+32+(2*NPTY))
Minimum: -
Module: -
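To see what that formula works out to, here is the arithmetic with the values already posted in this thread (nproc=2420, maxusers=300, npty=60). This is just shell arithmetic, not a kmtune feature.

```shell
# Evaluate the default ninode formula reported by kmtune:
#   ((NPROC+16+MAXUSERS)+32+(2*NPTY))
nproc=2420      # from the original post
maxusers=300    # from the original post
npty=60         # reported earlier in the thread
ninode=$(( (nproc + 16 + maxusers) + 32 + (2 * npty) ))
echo "$ninode"  # prints 2888
```

That matches the 2888 limit shown in the inod-sz column of sar -v.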

Kitty_6
Advisor

Re: File table Full

Well ..I've been working on this issue since 5am this morning and it's after 12am here. I'm logging off for now..and should be dialed back in by 5am. thanks again..any suggestions/resolutions would be greatly appreciated. Once again thanks for the quick responses.
Sridhar Bhaskarla
Honored Contributor

Re: File table Full

That explains it. Since you rebooted the box, the counters got cleared and that's the reason why you are not seeing any overflows in your sar -v.

Keep monitoring the file-sz column in sar -v to see if the numerator is gradually increasing.

Try enabling sar from cron. Look at the "sadc" and "sar1" man pages on setting up sar. You will need to have space in /var/adm/sa to enable sar. It will help you dig into the stats from before a reboot.

Since the messages are clear, you will need to increase 'nfile' kernel parameter when you get the downtime.

-Sri

Kitty_6
Advisor

Re: File table Full

nfile is now set to 4769. What should I increase it to? Is there a formula to go by?
Kitty_6
Advisor

Re: File table Full

Thanks - I set up a cronjob to run the following command every 5 minutes.
sar -v 2 5 > /tmp/savesar.`date -u +'%m%d%y%H%M%S'`
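With a pile of snapshot files like these, a sketch such as the following can pick out the highest file-sz value seen and where it occurred. It assumes file-sz is the 8th whitespace-separated field, as in the sar -v output posted in this thread; the function name peak_file_sz is made up.

```shell
# Print the highest file-sz "used" value across saved `sar -v` snapshots,
# with the sample time and the file it came from.
# Assumption: file-sz is field 8 of each sample line.
peak_file_sz() {
    awk '$8 ~ /^[0-9]+\/[0-9]+$/ {
        split($8, f, "/")
        if (f[1] + 0 > max) {
            max = f[1] + 0; limit = f[2]; when = $1; where = FILENAME
        }
    }
    END {
        if (where != "") printf "%d/%d at %s in %s\n", max, limit, when, where
    }' "$@"
}

# Example: peak_file_sz /tmp/savesar.*
```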

Kitty_6
Advisor

Re: File table Full

I've been monitoring the system and noticed that once CPU hits 98% we start having problems. I let it sit at 98% for about 20 minutes and then killed the process (an Oracle process). I checked the output from sar and below is what I see -
HP-UX sk1 B.11.00 D 9000/859 03/06/03

11:00:00 text-sz ov proc-sz ov inod-sz ov file-sz ov
11:00:02 N/A N/A 262/2420 0 1233/2888 0 2101/4779 0
11:00:04 N/A N/A 262/2420 0 1230/2888 0 2109/4779 0
11:00:06 N/A N/A 260/2420 0 1226/2888 0 2099/4779 0
11:00:08 N/A N/A 262/2420 0 1221/2888 0 2073/4779 0
11:00:10 N/A N/A 256/2420 0 1216/2888 0 2040/4779 0


Early this morning- it appeared as -
HP-UX sk1 B.11.00 D 9000/859 03/06/03

07:10:00 text-sz ov proc-sz ov inod-sz ov file-sz ov
07:10:02 N/A N/A 113/2420 0 1087/2888 0 782/4779 0
07:10:04 N/A N/A 113/2420 0 1082/2888 0 782/4779 0
07:10:06 N/A N/A 113/2420 0 1082/2888 0 782/4779 0
07:10:08 N/A N/A 113/2420 0 1082/2888 0 782/4779 0
07:10:10 N/A N/A 113/2420 0 1082/2888 0 782/4779 0

I had to kill the process, or else the system would eventually hang. Does this appear to be an issue related to the nfile kernel parameter? Or should I let CPU stay at 98% and capture sar when the system begins to hang?