04-21-2003 05:55 AM
HP-UX Open Files problem
DEVEL * Tunable parameters
STRMSGSZ 65535
dnlc_hash_locks 512
max_thread_proc 3000
maxdsiz 2063835136
maxdsiz_64bit 0X80000000
maxfiles 2048
maxfiles_lim 2048
maxssiz 0X8000000
maxssiz_64bit 0X8000000
maxswapchunks 16384
maxtsiz 0X4000000
maxtsiz_64bit 0X40000000
maxuprc ((NPROC*9)/10)
maxusers 512
msgmni (NPROC)
msgseg 32767
msgtql (NPROC)
ncallout (((NPROC*7)/4)+16)
ncsize ((8*NPROC+2048)+VX_NCSIZE)
nfile (15*NPROC+8192)
nflocks 6000
ninode (8*NPROC+2048)
nproc 4096
nstrpty 60
semmni 4096
semmns (SEMMNI*2)
semmnu (NPROC-4)
shmmax (0X80000000-0X4000000)
shmmni 512
vps_ceiling 64
PROD * Tunable parameters
STRMSGSZ 65535
dbc_max_pct 15
dnlc_hash_locks 512
max_thread_proc 256
maxdsiz 2063835136
maxdsiz_64bit 0X80000000
maxfiles 8192
maxfiles_lim 10000
maxssiz 0X8000000
maxssiz_64bit 1073741824
maxswapchunks 16384
maxtsiz 0X4000000
maxtsiz_64bit 0X40000000
maxuprc ((NPROC*9)/10)
maxusers 512
msgmni (NPROC)
msgseg 32767
msgtql (NPROC)
nflocks 17744
ninode (2*((NPROC+16+MAXUSERS)+32+(2*NPTY)))
nproc 8192
nstrpty 60
semmni 4096
semmns (SEMMNI*2)
semmnu (NPROC-4)
shmmax 0X20000000
shmmni 512
timezone (-60)
vps_ceiling 64
The first system listed above (development) does not experience the problem. We eventually found the open file leak (this is a J2EE Java web application running on Oracle Application Server) and fixed it. What puzzles us, however, is that on the production system the problem appeared with as few as 520 open files (including 150 DB connections), whereas on the development system we had more than 650 files open (leaked) and still saw no problems. We used the "lsof" utility, which we compiled on HP-UX, to list the open files of the JVM processes.
Among the kernel parameters, we have raised NINODE, NPROC, NFILE, NFLOCKS, MAXFILES and MAXFILES_LIM. Although production has MAXFILES_LIM = 10000, it still dies, while development, where the limit is only 2048 (see above), does not.
Any ideas, anyone? Have we misjudged something in the production configuration that we may need to rectify soon? Why does the development server, which appears less generously configured, still manage to handle more?
Thank you
04-21-2003 06:05 AM
Re: HP-UX Open Files problem
What is the setting for nfile on the production system? I don't see that value listed.
The 'lsof' utility should have given you a good idea of what the differences were. Did it show if the processes on the production system had more files open per process than on the development system?
JP
04-21-2003 06:07 AM
Re: HP-UX Open Files problem
Some general pointers:
- Are both these machines at the same patch levels?
- Are there any other applications running on the machine that has the "open files" problem?
- What is lsof's output? That would help to analyze the problem.
Use this script, which I posted earlier in the forums, to find the total number of open files per command name:
lsof | awk '{procct[$1]++;procname[$1]=$1;}END{for (i in procname) {printf ("%s, %d\n",procname[i],procct[i]);}}' | tr -s " " | sort -t" " -n -r -k 2,2
Check which process has the most files open on the system after the Oracle JVM-based app, and see whether that one has a leak anywhere too.
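The one-liner above groups by the first lsof column, which is the command name, so several JVMs would be merged into a single "java" line. A minimal sketch of a per-process variant follows, assuming the usual lsof column order (COMMAND first, PID second). The function name count_open_by_pid is made up for illustration.

```shell
# Count open files per process from lsof output read on stdin.
# Assumes the default lsof layout: column 1 = COMMAND, column 2 = PID.
count_open_by_pid() {
    awk '
        NR > 1 { n[$2 " " $1]++ }              # skip the header line
        END    { for (k in n) print n[k], k }  # "count pid command"
    ' | sort -k1,1nr                           # busiest process first
}

# Typical use on a live system:
#   lsof | count_open_by_pid | head
```

Each output line has the form "count pid command", so a leaking JVM can be tracked by PID between two runs.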
FWIW.
- ramd.
04-21-2003 06:08 AM
Re: HP-UX Open Files problem
Which server models are these, please?
This also sounds like it could be a patching issue, which can be corrected by cloning dev onto prod. Run some patch counts on both:
swlist -l fileset | wc
Use sar to measure usage of open files, inodes and processes:
sar -v 5 5
04-21-2003 06:18 AM
Re: HP-UX Open Files problem
The real limit may be a calculated (formula-based) parameter that is only showing up as a file problem.
I would suggest running sar and checking which of your metrics is really killing you.
Our /stand/system resembles this:
* Tunable parameters
STRMSGSZ 65535
bufpages 0
dbc_max_pct 10
fs_async 1
maxdsiz 0X20000000
maxdsiz_64bit 0X20000000
maxfiles 2048
maxfiles_lim 2048
maxssiz 0X2000000
maxssiz_64bit 0X2000000
maxswapchunks 12288
maxtsiz 0X20000000
maxuprc ((NPROC*8)/10)
maxusers 1200
maxvgs 256
msgmap (MSGTQL+2)
msgmax 32768
msgmnb 65535
msgmni (NPROC)
msgseg (MSGTQL*4)
msgssz 128
msgtql (NPROC)
nfile (15*NPROC+2048)
nflocks (NPROC)
ninode (8*NPROC+2048)
nproc (((10*MAXUSERS)/3)+128)
nstrpty 60
nstrtel (MAXUSERS)
nswapdev 25
semmni (NPROC*2)
semmns (SEMMNI*2)
semmnu (NPROC-4)
semume 64
semvmx 32768
shmmax 0X40000000
shmmni 512
shmseg 32
swapmem_on 0
timeslice 1
unlockable_mem (MAXUSERS*10)
What I suspect is that you are actually bumping your head somewhere else.
As I said, this is in all probability a parallel problem...
04-21-2003 08:36 AM
Re: HP-UX Open Files problem
Usually we set new kernel parameters on the test box and make sure they work properly before turning them loose in production.
If this is a kernel issue at all, it might be useful to bring production back to where the test machine is, where practical, and see if the problem recurs. If not, there's your answer.
You can also bump the test box up to where production is and see if the problem mysteriously appears. That would prove it's a kernel problem.
Then it's a matter of scaling back the test machine one parameter at a time (along with any dependent parameters) until the problem goes away.
It's painstaking and difficult, but I think you'd like to know which kernel parameter is actually causing the problem, right?
Good Luck.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
04-21-2003 11:08 PM
Re: HP-UX Open Files problem
Thanks for your prompt replies. I'll take the questions you asked one by one:
NFILE on production system:
14344 (based on a formula; I believe it's the default):
(16*(NPROC+16+MAXUSERS)/10+32+2*(NPTY+NSTRPTY+NSTRTEL))
NFILE on development:
69632 (15*NPROC+8192)
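As a cross-check, the two formulas can be evaluated with shell integer arithmetic. This is only a sketch: NPROC, MAXUSERS and NSTRPTY come from the tunable lists in the first post, while NPTY and NSTRTEL are not listed anywhere in the thread, so the value 60 used for each is an assumption (it happens to reproduce the 14344 quoted above).

```shell
# Values from the PROD and DEVEL tunable lists in the first post.
PROD_NPROC=8192
DEVEL_NPROC=4096
MAXUSERS=512
NSTRPTY=60
# ASSUMPTION: NPTY and NSTRTEL are not shown in the thread; 60 is a guess.
NPTY=60
NSTRTEL=60

# PROD nfile: 16*(NPROC+16+MAXUSERS)/10+32+2*(NPTY+NSTRPTY+NSTRTEL)
prod_nfile=$(( 16 * (PROD_NPROC + 16 + MAXUSERS) / 10 + 32 + 2 * (NPTY + NSTRPTY + NSTRTEL) ))

# DEVEL nfile: 15*NPROC+8192
devel_nfile=$(( 15 * DEVEL_NPROC + 8192 ))

echo "PROD  nfile = $prod_nfile"    # PROD  nfile = 14344
echo "DEVEL nfile = $devel_nfile"   # DEVEL nfile = 69632
```

With these inputs the development system-wide file table is nearly five times the size of the production one, even though production has the higher per-process limits (maxfiles, maxfiles_lim).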
dbc_max_pct on PROD (missing from the list) is also 15. I don't know why it didn't show.
sar -v 5 5 on PROD:
root@appsrv2# sar -v 5 5
HP-UX appsrv2 B.11.11 U 9000/800    04/22/03
09:03:11  text-sz   ov   proc-sz  ov     inod-sz  ov      file-sz  ov
09:03:16      N/A  N/A  137/8192   0   768/17744   0   1853/14354   0
09:03:21      N/A  N/A  137/8192   0   769/17744   0   1855/14354   0
09:03:26      N/A  N/A  137/8192   0   769/17744   0   1856/14354   0
sar -v 5 5 on DEVEL:
# sar -v 5 5
HP-UX io B.11.11 U 9000/800    04/22/03
11:07:35  text-sz   ov   proc-sz  ov     inod-sz  ov      file-sz  ov
11:07:40      N/A  N/A  217/4096   0  1158/34816   0   1536/69642   0
11:07:45      N/A  N/A  217/4096   0  1158/34816   0   1536/69642   0
11:07:50      N/A  N/A  217/4096   0  1158/34816   0   1536/69642   0
about the patchlists:
"swlist -l fileset | wc "
on DEVEL: 1348
on PROD : 1319
machine specs:
DEVEL: rp2405, 2cpu, 2GB ram
PROD : rp5405, 2cpu, 4GB ram
The machines are different, but both have been patched with the GOLD patchset of December 2002 and with the Java-Out-Of-The-Box patchset. Apparently the latter was initially installed on development but not on production; just by adding it to production we gained a 10-15% performance increase. One other reason the kernel parameters are out of sync is that I have been at the production site for the last 3 weeks, and we were "sort of" applying changes directly to the servers there (production consists of 3 servers of the same model). Still, the 3 prod servers hit the mark MUCH SOONER than the development one, which also has 3 DB _instances_ (Oracle 9i) running on it.
Hope this information is more helpful. I found the leak in the application using the lsof utility: a servlet serving gif/jpg files was not closing them. Our question still remains: why did the prod systems hit the mark sooner (apart from the traffic, which was obviously higher on production)? What other parameters do we need to tune or check on the production systems? Running Java-Out-Of-The-Box has already tuned the threads etc. (or so it says). Any other pointers?
Finally, I want to thank all of you again for the replies. This is my first post in these forums; I have Solaris and Linux experience but had never before touched the HP-UX beasts...
Thanos
04-22-2003 01:53 AM
Re: HP-UX Open Files problem
In principle, if you run one development system and one production system, and your production system is critical, both should have the same hardware and the same software installation and configuration.
In your case the hardware is not the same. That does not seem critical, since your production system is the more powerful of the two. Nevertheless, if possible, upgrade your development system's hardware in the future so that it matches the production environment.
Your software installations may be the same on both systems (I don't know), apart from the small difference in installed filesets.
Your kernel configuration, at least, is not the same. For example, max_thread_proc is set to 3000 on your development system but to 256 on your production system, which seems much too small. I did not check the rest of your kernel parameter list, but you should. Don't try to work out how each individual parameter could influence your specific problem; just make sure you have the *same* configuration on both systems. Then check whether the problem still exists.
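One way to check the whole list rather than a single parameter is to diff the two tunable listings mechanically. The sketch below assumes each host's tunables have been saved to a text file in the "name value" form shown in the first post; the function name diff_tunables and the file names are hypothetical.

```shell
# Print every tunable whose value differs between two "name value" lists,
# or that appears in only one of them ("-" marks a missing entry).
# Comment lines starting with "*" and lines without a value are skipped.
diff_tunables() {   # usage: diff_tunables DEVEL_FILE PROD_FILE
    awk '
        /^\*/ || NF < 2 { next }
        FNR == NR { a[$1] = $2; next }   # first file: remember values
                  { b[$1] = $2 }         # second file
        END {
            for (k in a) {
                if (!(k in b))         print k, a[k], "-"
                else if (a[k] != b[k]) print k, a[k], b[k]
            }
            for (k in b) {
                if (!(k in a))         print k, "-", b[k]
            }
        }
    ' "$1" "$2" | sort
}

# Example (hypothetical file names):
#   diff_tunables devel.tunables prod.tunables
```

Each output line reads "name devel-value prod-value". Formula values are compared as text, so two formulas that evaluate to the same number still show up as different.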
04-22-2003 06:55 AM
Re: HP-UX Open Files problem
sar -v 5 5 on PROD:
root@appsrv2# sar -v 5 5
HP-UX appsrv2 B.11.11 U 9000/800    04/22/03
09:03:11  text-sz   ov   proc-sz  ov     inod-sz  ov      file-sz  ov
09:03:16      N/A  N/A  137/8192   0   768/17744   0   1853/14354   0
09:03:21      N/A  N/A  137/8192   0   769/17744   0   1855/14354   0
09:03:26      N/A  N/A  137/8192   0   769/17744   0   1856/14354   0
137/8192 = 1.67% utilized for nproc
768/17744 = 4.3% utilized for ninode
1853/14354 = 12.9% utilized for nfile
If these numbers are indicative of your peak usage during the day, then this is a very lightly used server, and you should adjust the above kernel parameters until utilization is in the 25% to 50% range.
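The percentages above can be recomputed for every sample with a small filter. This is a sketch that assumes the column order shown in the sar output quoted in this thread (time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov); the function name sar_v_util is made up for illustration.

```shell
# Read "sar -v" output on stdin and print table utilization per sample.
# Assumes proc-sz is field 4, inod-sz field 6 and file-sz field 8,
# each in "used/limit" form, as in the output quoted above.
sar_v_util() {
    awk '
        function pct(f,   p) { split(f, p, "/"); return 100 * p[1] / p[2] }
        $4 ~ /^[0-9]+\/[0-9]+$/ {        # only the numeric sample lines
            printf "%s nproc %.2f%% ninode %.2f%% nfile %.2f%%\n",
                   $1, pct($4), pct($6), pct($8)
        }
    '
}

# Typical use on a live system:
#   sar -v 5 5 | sar_v_util
```

For the first PROD sample this prints "09:03:16 nproc 1.67% ninode 4.33% nfile 12.91%", matching the figures above. Running it at peak load shows how close each table really gets to its limit.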
Your dev server is an A-class machine while your prod server is an L-class, and they require different configurations. You can't clone dev onto prod with Ignite, for instance. See this link about the hardware differences:
http://www.hp.com/products1/servers/compare_pa-risc.html
Finally, your open file problem: from the description in your second posting, it sounds like a rogue runaway process, which is a cleanup issue. These are often hard to identify, but if you're using shared memory then ipcs and ipcrm will help, as the owner and PID are indicated.
ipcs -ma
ipcrm -m shmid (or ipcrm -M shmkey)
Although this won't kill a rogue process that has been inherited by init (that takes a reboot), it will help you evaluate and release your shared memory.