Thanos Agelatos
Occasional Contributor

HP-UX Open Files problem

Hi there. We have two systems, believed identical: one development, one production. The development system does not show the open files problem, whereas the production one does. Below are each system's kernel parameters (from /stand/system):

DEVEL * Tunable parameters
STRMSGSZ 65535
dnlc_hash_locks 512
max_thread_proc 3000
maxdsiz 2063835136
maxdsiz_64bit 0X80000000
maxfiles 2048
maxfiles_lim 2048
maxssiz 0X8000000
maxssiz_64bit 0X8000000
maxswapchunks 16384
maxtsiz 0X4000000
maxtsiz_64bit 0X40000000
maxuprc ((NPROC*9)/10)
maxusers 512
msgmni (NPROC)
msgseg 32767
msgtql (NPROC)
ncallout (((NPROC*7)/4)+16)
ncsize ((8*NPROC+2048)+VX_NCSIZE)
nfile (15*NPROC+8192)
nflocks 6000
ninode (8*NPROC+2048)
nproc 4096
nstrpty 60
semmni 4096
semmns (SEMMNI*2)
semmnu (NPROC-4)
shmmax (0X80000000-0X4000000)
shmmni 512
vps_ceiling 64

PROD * Tunable parameters
STRMSGSZ 65535
dbc_max_pct 15
dnlc_hash_locks 512
max_thread_proc 256
maxdsiz 2063835136
maxdsiz_64bit 0X80000000
maxfiles 8192
maxfiles_lim 10000
maxssiz 0X8000000
maxssiz_64bit 1073741824
maxswapchunks 16384
maxtsiz 0X4000000
maxtsiz_64bit 0X40000000
maxuprc ((NPROC*9)/10)
maxusers 512
msgmni (NPROC)
msgseg 32767
msgtql (NPROC)
nflocks 17744
ninode (2*((NPROC+16+MAXUSERS)+32+(2*NPTY)))
nproc 8192
nstrpty 60
semmni 4096
semmns (SEMMNI*2)
semmnu (NPROC-4)
shmmax 0X20000000
shmmni 512
timezone (-60)
vps_ceiling 64

The first system (top, development) does not experience the problem. In the end we found the open file leak (this is a J2EE Java web application on Oracle Application Server) and fixed it. What still puzzles us is that on the production system the problem appeared with as few as 520 open files (including 150 DB connections), whereas on the development system we had more than 650 files open (leaked) and still no problems. To list the JVM processes' open files we used the "lsof" utility, which we compiled on HP-UX.
Among the kernel parameters, we have raised NINODE, NPROC, NFILE, NFLOCKS, MAXFILES and MAXFILES_LIM. Although production has MAXFILES_LIM = 10000, it still dies, while development, with only 2048 (see above), does not.

Any ideas? Have we misjudged something in the production configuration that we need to rectify soon? Why does the development server, which seems less generously configured, still manage to handle more?

Thank you
John Poff
Honored Contributor

Re: HP-UX Open Files problem

Hi,

What is the setting for nfile on the production system? I don't see that value listed.

The 'lsof' utility should have given you a good idea of what the differences were. Did it show if the processes on the production system had more files open per process than on the development system?

JP
Ramkumar Devanathan
Honored Contributor

Re: HP-UX Open Files problem

Hi,

Some general pointers:

- Are both these machines at the same patch levels?

- Are there any other applications running on the machine that has the "open files" problem?

- What is lsof's output? That would help to analyze the problem.

Use this script, which I posted earlier in the forums, to count the open files per command name (the first lsof column):

lsof | awk '{procct[$1]++;procname[$1]=$1;}END{for (i in procname) {printf ("%s, %d\n",procname[i],procct[i]);}}' | tr -s " " | sort -t" " -n -r -k 2,2
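
The one-liner above groups by command name, so several JVMs would be added together. When hunting a leak in one process, a per-PID count can be more useful. A minimal sketch, assuming the default lsof column layout (COMMAND, PID, USER, FD, ...) with a single header line; the function name count_open_by_pid is made up for illustration:

```shell
# Count open files per process from lsof output read on stdin.
# Assumes the default lsof columns: COMMAND PID USER FD ...
count_open_by_pid() {
    awk 'NR > 1 { n[$2 " " $1]++ }              # skip the header; key is "PID COMMAND"
         END    { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2,2n                         # highest count first, then by PID
}
```

Typical use would be `lsof | count_open_by_pid | head`, which prints lines of the form "count PID COMMAND".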

After the Oracle JVM-based app, check which other process has too many files open on the system, and see whether that one has a memory leak anywhere too.

FWIW.

- ramd.
HPE Software Rocks!
Michael Steele_2
Honored Contributor

Re: HP-UX Open Files problem

I don't see the same kernel parameters or values on both systems, 'dbc_max_pct 15' for example only appears in the bottom list.

What model servers, for both please?

This sounds more like a patching issue also, which can be corrected by cloning dev onto prod. Run some patch counts on both:

swlist -l fileset | wc

Use sar to measure usage of open files, inodes and processes:

sar -v 5 5
Support Fatherhood - Stop Family Law
Tim Sanko
Trusted Contributor

Re: HP-UX Open Files problem

First of all, do you have more people on production than on development?

The real limit may be a calculated (formula-derived) parameter, with the failure merely showing up as a file problem.

I would suggest running sar and checking which of your metrics is really killing you.

Our /stand/system resembles this
* Tunable parameters

STRMSGSZ 65535
bufpages 0
dbc_max_pct 10
fs_async 1
maxdsiz 0X20000000
maxdsiz_64bit 0X20000000
maxfiles 2048
maxfiles_lim 2048
maxssiz 0X2000000
maxssiz_64bit 0X2000000
maxswapchunks 12288
maxtsiz 0X20000000
maxuprc ((NPROC*8)/10)
maxusers 1200
maxvgs 256
msgmap (MSGTQL+2)
msgmax 32768
msgmnb 65535
msgmni (NPROC)
msgseg (MSGTQL*4)
msgssz 128
msgtql (NPROC)
nfile (15*NPROC+2048)
nflocks (NPROC)
ninode (8*NPROC+2048)
nproc (((10*MAXUSERS)/3)+128)
nstrpty 60
nstrtel (MAXUSERS)
nswapdev 25
semmni (NPROC*2)
semmns (SEMMNI*2)
semmnu (NPROC-4)
semume 64
semvmx 32768
shmmax 0X40000000
shmmni 512
shmseg 32
swapmem_on 0
timeslice 1
unlockable_mem (MAXUSERS*10)

What I am suspecting is that you are actually bumping your head somewhere else.

Like I said, this is in all probability a parallel problem...
Steven E. Protter
Exalted Contributor

Re: HP-UX Open Files problem

nflocks is what strikes me first. It's set pretty high and I'm wondering why.

Usually we set new kernel parms on the test box and make sure they work right before letting them loose in production.

If this is a kernel issue at all, it might be useful to bring production back to where the test machine is, where practical, and see if the problem recurs. If not, there's your answer.

You can also bump the test box up to where production is and see if the problem mysteriously appears. That would prove it's a kernel problem.

Then it's a matter of scaling back the test machine one parameter (with its dependent parameters) at a time until the problem goes away.

It's painstaking and difficult, but I think you'd like to know which kernel parameter is actually causing the problem, right?

Good Luck.

SEP
Thanos Agelatos
Occasional Contributor

Re: HP-UX Open Files problem

Hi all

Thanks for your prompt replies - I'll take the questions you asked one by one:

NFILE on production system:
14344 (based on a formula, which I believe is the default):
(16*(NPROC+16+MAXUSERS)/10+32+2*(NPTY+NSTRPTY+NSTRTEL))

NFILE on development:
69632 (15*NPROC+8192)
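
The two nfile figures can be checked by plugging the posted values into the formulae. A small sketch using shell arithmetic: NPROC, MAXUSERS and NSTRPTY come from the /stand/system listings, while NPTY=60 and NSTRTEL=60 are assumptions (they are not shown in the thread, but they reproduce the 14344 figure). The function names are illustrative only.

```shell
# Evaluate the two nfile formulae quoted above with POSIX shell arithmetic.
nfile_prod() {   # args: NPROC MAXUSERS NPTY NSTRPTY NSTRTEL
    echo $(( 16 * ($1 + 16 + $2) / 10 + 32 + 2 * ($3 + $4 + $5) ))
}

nfile_devel() {  # args: NPROC
    echo $(( 15 * $1 + 8192 ))
}

# PROD:  nfile_prod 8192 512 60 60 60   (NPTY and NSTRTEL of 60 are assumed)
# DEVEL: nfile_devel 4096
```

With those inputs the results are 14344 for PROD and 69632 for DEVEL, matching the numbers above.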

dbc_max_pct on PROD (missing from the list) is 15 again. I don't know why it didn't show.

sar -v 5 5 on PROD:
root@appsrv2#sar -v 5 5

HP-UX appsrv2 B.11.11 U 9000/800 04/22/03

09:03:11  text-sz   ov   proc-sz  ov     inod-sz  ov     file-sz  ov
09:03:16      N/A  N/A  137/8192   0   768/17744   0  1853/14354   0
09:03:21      N/A  N/A  137/8192   0   769/17744   0  1855/14354   0
09:03:26      N/A  N/A  137/8192   0   769/17744   0  1856/14354   0

sar -v 5 5 on DEVEL:
# sar -v 5 5

HP-UX io B.11.11 U 9000/800 04/22/03

11:07:35  text-sz   ov   proc-sz  ov     inod-sz  ov     file-sz  ov
11:07:40      N/A  N/A  217/4096   0  1158/34816   0  1536/69642   0
11:07:45      N/A  N/A  217/4096   0  1158/34816   0  1536/69642   0
11:07:50      N/A  N/A  217/4096   0  1158/34816   0  1536/69642   0

about the patchlists:
"swlist -l fileset | wc "
on DEVEL: 1348
on PROD : 1319
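
The line counts only say that the two systems differ, not where. A sketch for listing the filesets found on one system but not the other, assuming the output of `swlist -l fileset` was saved to a file on each host, that header and product lines start with '#', and that the fileset name is the first field of every other line. The function name fileset_only_in and the file names are hypothetical.

```shell
# Print fileset names that appear in LISTING_A but not in LISTING_B.
# Only the first field (the name) is compared; revisions are ignored.
fileset_only_in() {   # usage: fileset_only_in LISTING_A LISTING_B
    awk '/^#/ || NF == 0 { next }                      # skip comments and blank lines
         FILENAME == ARGV[1] { other[$1] = 1; next }   # first file read is LISTING_B
         !($1 in other) && !printed[$1]++ { print $1 }' "$2" "$1"
}
```

For example, `fileset_only_in devel.lst prod.lst` would show what development has that production lacks; swapping the arguments shows the reverse.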

machine specs:
DEVEL: rp2405, 2cpu, 2GB ram
PROD : rp5405, 2cpu, 4GB ram

The machines are different, but both have been patched with the GOLD patchset of December 2002 and with the Java-Out-Of-The-Box patchset. Apparently the latter was initially installed on development only; just by adding it to production we gained some 10-15% in performance. Another reason the kernel parameters are out of sync is that I have been on the production site for the last 3 weeks, and we were "sort of" applying changes directly to the servers there (production consists of 3 servers of the same model). Still, the 3 prod servers hit the mark MUCH SOONER than the development one, which also has 3 DB _instances_ (Oracle 9i) running on it.

I hope this information helps. Using the lsof utility I found the leak in the application (a servlet serving gif/jpg images was not closing its files). Our question remains: why did the prod systems hit the mark sooner (apart from traffic, which was obviously higher on production)? What other parameters do we need to tune or check on the production systems? Running Java-Out-Of-The-Box has already tuned the threads etc. (or so it says). Any other pointers?

Finally, thanks again to all of you for the replies. This is my first post in these forums; I have Solaris and Linux experience but had never touched the HP-UX beasts before...

Thanos



Thomas Schler_1
Trusted Contributor

Re: HP-UX Open Files problem

Thanos,

In principle, if you run one development system and one production system, and your production system is critical, both should have the same hardware and the same software installation and configuration.

In your case the hardware is not the same. That does not seem critical, since your production system is the better-equipped of the two. Nevertheless, if possible, upgrade the development system's hardware in the future so that it matches the production environment.

Your software installations may be the same on both systems (I don't know), despite the small difference in installed filesets.

Your kernel configuration, at least, is not the same. For example, max_thread_proc is set to 3000 on your development system but to 256 on production, which seems much too small. I did not check the rest of your kernel parameter list, but you should. Don't try to work out how each individual parameter could influence your specific problem. Just make sure you have the *same* configuration on both systems, and then check whether the problem still exists.
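
One way to find every mismatch without reading the lists by eye is to compare the two tunable sections mechanically. A rough sketch, assuming each system's tunables were saved to a file of "name value" lines as pasted earlier in the thread, with values containing no spaces; the function name tunable_diff and the file names are hypothetical.

```shell
# Print tunables that differ between two files of "name value" lines.
# Output columns: name, value in first file, value in second file ("-" = absent).
tunable_diff() {   # usage: tunable_diff DEVEL_FILE PROD_FILE
    awk '/^\*/ || NF < 2 { next }                 # skip "* comment" lines and short lines
         FILENAME == ARGV[1] { a[$1] = $2; next } # first file
         { b[$1] = $2 }                           # second file
         END {
             for (k in a) {
                 if (!(k in b)) { print k, a[k], "-" }
                 else if (a[k] != b[k]) { print k, a[k], b[k] }
             }
             for (k in b) {
                 if (!(k in a)) { print k, "-", b[k] }
             }
         }' "$1" "$2" | LC_ALL=C sort
}
```

Run as `tunable_diff devel.sys prod.sys`, this would print a line such as "max_thread_proc 3000 256" for each parameter that differs.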
no users -- no problems
Michael Steele_2
Honored Contributor

Re: HP-UX Open Files problem

I'd say both of your servers are in danger of grinding to a halt from system memory fragmentation. Certainly performance will degrade as you get farther from a reboot. Consider these numbers for your production server:

sar -v 5 5 on PROD:
root@appsrv2#sar -v 5 5

HP-UX appsrv2 B.11.11 U 9000/800 04/22/03

09:03:11  text-sz   ov   proc-sz  ov     inod-sz  ov     file-sz  ov
09:03:16      N/A  N/A  137/8192   0   768/17744   0  1853/14354   0
09:03:21      N/A  N/A  137/8192   0   769/17744   0  1855/14354   0
09:03:26      N/A  N/A  137/8192   0   769/17744   0  1856/14354   0

137/8192 = 1.67% utilized for nproc

768/17744 = 4.3% utilized for ninode

1853/14354 = 12.9% utilized for nfile

If these numbers are indicative of your peak usage times during the day then this is a very lightly used server and you should adjust the above kernel parameters until 25% to 50% utilized numbers are indicated.
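
The same percentages can be computed for every sample instead of by hand. A minimal sketch, assuming the `sar -v` column order shown above (time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov); the function name sar_v_util is made up for illustration.

```shell
# Turn the "used/limit" columns of `sar -v` sample lines into percentages.
sar_v_util() {
    awk '$4 ~ "/" {                                # data lines only; the header has no "/"
             split($4, p, "/"); split($6, i, "/"); split($8, f, "/")
             printf "%s proc %.2f%% inode %.2f%% file %.2f%%\n", $1,
                    100 * p[1] / p[2], 100 * i[1] / i[2], 100 * f[1] / f[2]
         }'
}
```

It would be used as `sar -v 5 5 | sar_v_util`, ideally sampled at peak hours so the ratios reflect real load.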

Your dev server belongs to the A class while your prod server belongs to an L class and they will require different configuring. You can't clone dev onto prod with ignite for instance. Consider this link about the HW differences:

http://www.hp.com/products1/servers/compare_pa-risc.html

Finally, your open file problem: from your description in your second posting, it sounds like a rogue runaway process, which is a cleanup issue. These are often hard to identify, but if you're using shared memory then ipcs and ipcrm will help, as the owner and PID are indicated.

ipcs -ma

ipcrm -m shmid (or: ipcrm -M shmkey)

Although this won't kill a rogue process that's gone to init (* reboot *) it will help you evaluate and release your shared memory.
Support Fatherhood - Stop Family Law