Operating System - HP-UX

Re: Sun iplanet web server crash and hang hpux 11i

 
Jason Whitener_2
New Member

Sun iplanet web server crash and hang hpux 11i

Our web server is experiencing restarts, whereby our users are being dropped from sessions, but the service reactivates a second later and they are able to log in again.

Occasionally (once a week or so), the server hangs up completely, and usually the process must be killed and then started back up.

The errors look like this:

/mnt/eva/luminis/products/ws/https-cp/logs => grep -i "crash" errors.*
errors.050204:[04/Feb/2005:07:55:39] catastrophe (20311): Server crash detected (signal SIGBUS)
errors.050204:[04/Feb/2005:07:55:39] info (20311): Crash occurred in NSAPI SAF NSServletService
errors.050204:[04/Feb/2005:07:55:43] catastrophe (21640): Server crash detected (signal SIGSEGV)
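To check whether the crashes cluster on particular days or signals, the grep output above can be tallied. This is only a minimal sketch in Python: the regular expression is inferred from the three lines shown and may need adjusting for other message variants, and `tally_crashes` is a hypothetical helper name, not part of any iPlanet tooling.

```python
import re
from collections import Counter

# Pattern inferred from the "catastrophe" lines shown above, e.g.
# errors.050204:[04/Feb/2005:07:55:39] catastrophe (20311): Server crash detected (signal SIGBUS)
CRASH_RE = re.compile(
    r"\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"catastrophe\s+\(\s*(?P<pid>\d+)\):\s+"
    r"Server crash detected \(signal (?P<signal>SIG\w+)\)"
)


def tally_crashes(lines):
    """Count crash lines per (day, signal) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group("day"), match.group("signal"))] += 1
    return counts


# The three lines from the grep output above.
sample = [
    "errors.050204:[04/Feb/2005:07:55:39] catastrophe (20311): Server crash detected (signal SIGBUS)",
    "errors.050204:[04/Feb/2005:07:55:39] info (20311): Crash occurred in NSAPI SAF NSServletService",
    "errors.050204:[04/Feb/2005:07:55:43] catastrophe (21640): Server crash detected (signal SIGSEGV)",
]

for (day, signal), count in sorted(tally_crashes(sample).items()):
    print(day, signal, count)
```

Feeding it the full output of the grep over all errors.* files would show whether the SIGBUS and SIGSEGV crashes tend to arrive in pairs, as they do in this excerpt.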

We are running HPUX 11i, Sun iplanet web server 6.0 sp9, and

java version "1.3.1.13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1.13-040210-15:41)
Java HotSpot(TM) Server VM (build 1.3.1 1.3.1.13-_10_feb_2004_17_15 PA2.0, mixed mode)

We also get these throughout the day:

[07/Feb/2005:12:05:17] failure ( 6902): Error accepting connection (Insufficient resources)

We have concluded that our Java heap is running out of room in some cases, even though it is currently set at 2.4 GB. While we can raise that number, we have first asked our vendor to look at ways to reduce the memory use of the application running under the iPlanet web server. We suspect that the application has some sort of memory leak.

What I don't understand is why the web service can't handle a full heap without crashing.

In addition, there is no particular user load that causes these session-dropping instant restarts or the hard crashes. For instance, we see the same restart and crash behavior at odd times, like Sunday morning (a very low use day for us).

That is what leads us to believe that throwing more heap at the problem isn't going to resolve anything.

Are there any tips to help improve the stability of sun's webserver on HPUX that you all have encountered?

Or have any of you encountered the specific SIGBUS type errors and been able to resolve them?

thanks,
Jason Whitener
harry d brown jr
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i


What is the patch level of your HP-ux box?

Can you post your kernel parameters: kmtune

Can you give a basic description of your box? (model, memory, disks, IO cards, ...)

How many users?

Is this an internal or External web server?

If external, is it protected by firewalls? Is it a bastion host?

What other services are activated? ftp, NFS, telnet, r-cmds, ...


live free or die
harry d brown jr
Live Free or Die
Kent Ostby
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i

After the catastrophe line, is there another error line that shows which FUNCTION the crash occurred in?


"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
rick jones
Honored Contributor
Solution

Re: Sun iplanet web server crash and hang hpux 11i

SIGBUS and SIGSEGV in the NSAPI stuff suggest that maybe some of the add-on NSAPI code is buggy.

The error accepting connection message is one you can probably safely ignore. On HP-UX 11 and later, accept() can return ENOBUFS when the remote client terminates the connection before the server gets around to calling accept(). You _might_ see those go away if you disable the early connection indication in ndd (tcp_early_conn_ind).
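To illustrate why that message is harmless, here is a rough Python sketch of an accept loop that treats this kind of failure as "the client already went away" and simply moves on. It is not how iPlanet itself is written; `is_transient_accept_error` and `accept_next` are hypothetical names, and including ECONNABORTED alongside the no-buffer-space error is my own assumption based on how other platforms report the same situation.

```python
import errno

# errno values treated here as "the client went away before accept() ran".
# ENOBUFS is the case discussed above; ECONNABORTED is an assumed equivalent
# seen on other platforms.
TRANSIENT_ACCEPT_ERRNOS = {errno.ENOBUFS, errno.ECONNABORTED}


def is_transient_accept_error(exc):
    """Return True if exc is an accept() failure that can safely be skipped."""
    return isinstance(exc, OSError) and exc.errno in TRANSIENT_ACCEPT_ERRNOS


def accept_next(listener, log=print):
    """Return the next (conn, addr) pair, skipping transient accept() errors.

    `listener` is anything with a socket-style accept() method.
    Any other OSError is re-raised to the caller.
    """
    while True:
        try:
            return listener.accept()
        except OSError as exc:
            if not is_transient_accept_error(exc):
                raise
            log(f"ignoring transient accept() error: {exc}")
```

A server that logs and continues like this loses nothing, because the connection that triggered the error was already closed by the client.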

As for tips to improve the stability of Sun's webserver.... well, there are always Apache or Zeus :) Actually, Zeus may be able to run your existing NSAPI modules as it offers NSAPI support.
there is no rest for the wicked yet the virtuous have no pillows
Jason Whitener_2
New Member

Re: Sun iplanet web server crash and hang hpux 11i

The crash is occurring in the "NSAPI SAF NSServletService".

Here's the Kmtune output:
---------------------------------
Parameter: acctresume(Current:)=4
Parameter: acctsuspend(Current:)=2
Parameter: aio_listio_max(Current:)=256
Parameter: aio_max_ops(Current:)=2048
Parameter: aio_physmem_pct(Current:)=10
Parameter: aio_prio_delta_max(Current:)=20
Parameter: allocate_fs_swapmap(Current:)=0
Parameter: alwaysdump(Current:)=1
Parameter: bcvmap_size_factor(Current:)=2
Parameter: bootspinlocks(Current:)=-
Parameter: bufcache_hash_locks(Current:)=128
Parameter: bufpages(Current:)=103020
Parameter: chanq_hash_locks(Current:)=256
Parameter: core_addshmem_read(Current:)=0
Parameter: core_addshmem_write(Current:)=0
Parameter: create_fastlinks(Current:)=0
Parameter: dbc_max_pct(Current:)=50
Parameter: dbc_min_pct(Current:)=5
Parameter: default_disk_ir(Current:)=0
Parameter: desfree(Current:)=-
Parameter: disksort_seconds(Current:)=0
Parameter: dmp_rootdev_is_vol(Current:)=0
Parameter: dmp_swapdev_is_vol(Current:)=0
Parameter: dnlc_hash_locks(Current:)=512
Parameter: dontdump(Current:)=0
Parameter: dskless_node(Current:)=-
Parameter: dst(Current:)=1
Parameter: effective_maxpid(Current:)=-
Parameter: eisa_io_estimate(Current:)=-
Parameter: enable_idds(Current:)=1
Parameter: eqmemsize(Current:)=15
Parameter: executable_stack(Current:)=1
Parameter: fcp_large_config(Current:)=0
Parameter: file_pad(Current:)=-
Parameter: fs_async(Current:)=0
Parameter: ftable_hash_locks(Current:)=64
Parameter: hdlpreg_hash_locks(Current:)=128
Parameter: hfs_max_ra_blocks(Current:)=8
Parameter: hfs_max_revra_blocks(Current:)=8
Parameter: hfs_ra_per_disk(Current:)=64
Parameter: hfs_revra_per_disk(Current:)=64
Parameter: hp_hfs_mtra_enabled(Current:)=1
Parameter: hpux_aes_override(Current:)=-
Parameter: initmodmax(Current:)=50
Parameter: io_ports_hash_locks(Current:)=64
Parameter: iomemsize(Current:)=-
Parameter: ksi_alloc_max(Current:)=64160
Parameter: ksi_send_max(Current:)=32
Parameter: lotsfree(Current:)=-
Parameter: max_async_ports(Current:)=50
Parameter: max_fcp_reqs(Current:)=512
Parameter: max_mem_window(Current:)=0
Parameter: max_thread_proc(Current:)=10000
Parameter: maxdsiz(Current:)=4080218931
Parameter: maxdsiz_64bit(Current:)=3221225472
Parameter: maxfiles(Current:)=8192
Parameter: maxfiles_lim(Current:)=10240
Parameter: maxqueuetime(Current:)=-
Parameter: maxssiz(Current:)=268435456
Parameter: maxssiz_64bit(Current:)=0x40000000
Parameter: maxswapchunks(Current:)=16384
Parameter: maxtsiz(Current:)=0x4000000
Parameter: maxtsiz_64bit(Current:)=0x40000000
Parameter: maxuprc(Current:)=7218
Parameter: maxusers(Current:)=1000
Parameter: maxvgs(Current:)=10
Parameter: mesg(Current:)=1
Parameter: minfree(Current:)=-
Parameter: modstrmax(Current:)=500
Parameter: msgmap(Current:)=4098
Parameter: msgmax(Current:)=8192
Parameter: msgmnb(Current:)=16384
Parameter: msgmni(Current:)=4096
Parameter: msgseg(Current:)=32767
Parameter: msgssz(Current:)=8
Parameter: msgtql(Current:)=4096
Parameter: nbuf(Current:)=51510
Parameter: ncallout(Current:)=20016
Parameter: ncdnode(Current:)=150
Parameter: nclist(Current:)=16100
Parameter: ncsize(Current:)=71328
Parameter: ndilbuffers(Current:)=30
Parameter: netisr_priority(Current:)=-
Parameter: netmemmax(Current:)=-
Parameter: nfile(Current:)=122348
Parameter: nflocks(Current:)=8192
Parameter: nhtbl_scale(Current:)=0
Parameter: ninode(Current:)=66208
Parameter: nkthread(Current:)=20000
Parameter: nni(Current:)=-
Parameter: no_lvm_disks(Current:)=0
Parameter: nproc(Current:)=8020
Parameter: npty(Current:)=60
Parameter: NSTRBLKSCHED(Current:)=-
Parameter: NSTREVENT(Current:)=50
Parameter: nstrpty(Current:)=60
Parameter: NSTRPUSH(Current:)=16
Parameter: NSTRSCHED(Current:)=0
Parameter: nstrtel(Current:)=60
Parameter: nswapdev(Current:)=10
Parameter: nswapfs(Current:)=10
Parameter: nsysmap(Current:)=16040
Parameter: nsysmap64(Current:)=16040
Parameter: o_sync_is_o_dsync(Current:)=0
Parameter: page_text_to_local(Current:)=-
Parameter: pfdat_hash_locks(Current:)=128
Parameter: public_shlibs(Current:)=1
Parameter: region_hash_locks(Current:)=128
Parameter: remote_nfs_swap(Current:)=0
Parameter: rtsched_numpri(Current:)=32
Parameter: scroll_lines(Current:)=100
Parameter: scsi_max_qdepth(Current:)=8
Parameter: scsi_maxphys(Current:)=1048576
Parameter: select_enh(Current:)=0
Parameter: sema(Current:)=1
Parameter: semaem(Current:)=16384
Parameter: semmap(Current:)=258
Parameter: semmni(Current:)=256
Parameter: semmns(Current:)=512
Parameter: semmnu(Current:)=8016
Parameter: semmsl(Current:)=2048
Parameter: semume(Current:)=10
Parameter: semvmx(Current:)=32767
Parameter: sendfile_max(Current:)=0
Parameter: shmem(Current:)=1
Parameter: shmmax(Current:)=0x40000000
Parameter: shmmni(Current:)=200
Parameter: shmseg(Current:)=120
Parameter: st_ats_enabled(Current:)=0
Parameter: st_fail_overruns(Current:)=0
Parameter: st_large_recs(Current:)=0
Parameter: st_san_safe(Current:)=0
Parameter: STRCTLSZ(Current:)=1024
Parameter: streampipes(Current:)=0
Parameter: STRMSGSZ(Current:)=65535
Parameter: swapmem_on(Current:)=0
Parameter: swchunk(Current:)=2048
Parameter: swsp_trc_flags(Current:)=-
Parameter: sysv_hash_locks(Current:)=128
Parameter: tcphashsz(Current:)=0
Parameter: timeslice(Current:)=10
Parameter: timezone(Current:)=420
Parameter: unlockable_mem(Current:)=0
Parameter: vas_hash_locks(Current:)=128
Parameter: vnode_cd_hash_locks(Current:)=128
Parameter: vnode_hash_locks(Current:)=128
Parameter: vol_checkpt_default(Current:)=10240
Parameter: vol_dcm_replay_size(Current:)=262144
Parameter: vol_default_iodelay(Current:)=50
Parameter: vol_fmr_logsz(Current:)=4
Parameter: vol_max_bchain(Current:)=32
Parameter: vol_max_nconfigs(Current:)=20
Parameter: vol_max_nlogs(Current:)=20
Parameter: vol_max_nmpool_sz(Current:)=4194304
Parameter: vol_max_prm_dgs(Current:)=1024
Parameter: vol_max_rdback_sz(Current:)=4194304
Parameter: vol_max_vol(Current:)=8388608
Parameter: vol_maxio(Current:)=256
Parameter: vol_maxioctl(Current:)=32768
Parameter: vol_maxkiocount(Current:)=2048
Parameter: vol_maxparallelio(Current:)=256
Parameter: vol_maxspecialio(Current:)=256
Parameter: vol_maxstablebufsize(Current:)=256
Parameter: vol_min_lowmem_sz(Current:)=524288
Parameter: vol_mvr_maxround(Current:)=256
Parameter: vol_nm_hb_timeout(Current:)=10
Parameter: vol_subdisk_num(Current:)=4096
Parameter: vol_vvr_transport(Current:)=1
Parameter: vol_vvr_use_nat(Current:)=0
Parameter: volcvm_cluster_size(Current:)=16
Parameter: volcvm_smartsync(Current:)=1
Parameter: voldrl_max_drtregs(Current:)=2048
Parameter: voldrl_min_regionsz(Current:)=512
Parameter: voliomem_chunk_size(Current:)=65536
Parameter: voliomem_maxpool_sz(Current:)=4194304
Parameter: voliot_errbuf_dflt(Current:)=16384
Parameter: voliot_iobuf_default(Current:)=8192
Parameter: voliot_iobuf_limit(Current:)=131072
Parameter: voliot_iobuf_max(Current:)=65536
Parameter: voliot_max_open(Current:)=32
Parameter: volraid_rsrtransmax(Current:)=1
Parameter: vps_ceiling(Current:)=64
Parameter: vps_chatr_ceiling(Current:)=1048576
Parameter: vps_pagesize(Current:)=4
Parameter: vx_bc_bufhwm(Current:)=0
Parameter: vx_fancyra_enable(Current:)=0
Parameter: vx_maxlink(Current:)=32767
Parameter: vx_ncsize(Current:)=1024
Parameter: vx_ninode(Current:)=0
Parameter: vxfs_max_ra_kbytes(Current:)=1024
Parameter: vxfs_ra_per_disk(Current:)=1024
Parameter: vxtask_max_monitors(Current:)=32
--------------------------------
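For anyone who wants to compare a listing like this against another box, here is a minimal Python sketch that turns it into a dictionary. It assumes only the "Parameter: name(Current:)=value" layout shown above; `parse_tunables` is a hypothetical helper, and treating "-" as "no value reported" is my reading of the listing rather than documented behavior.

```python
import re

# One line of the listing above: "Parameter: <name>(Current:)=<value>"
LINE_RE = re.compile(r"^Parameter:\s*(?P<name>\w+)\(Current:\)=(?P<value>\S+)$")


def parse_tunables(text):
    """Map each parameter name to an int, or to None when the value is '-'."""
    params = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip separators and blank lines
        raw = match.group("value")
        # int(..., 0) accepts both decimal (4080218931) and hex (0x40000000).
        params[match.group("name")] = None if raw == "-" else int(raw, 0)
    return params


# A few lines copied from the listing above.
sample = """\
---------------------------------
Parameter: dbc_max_pct(Current:)=50
Parameter: desfree(Current:)=-
Parameter: maxdsiz(Current:)=4080218931
Parameter: maxssiz_64bit(Current:)=0x40000000
"""

for name, value in parse_tunables(sample).items():
    print(name, value)
```

With two such dictionaries, a simple comparison of keys and values shows which tunables differ between systems.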

I'll get the patch state in a bit.



rick jones
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i

What might be more useful would be a stack trace :) Does a core file get left behind somewhere that you could load up into a debugger?
there is no rest for the wicked yet the virtuous have no pillows
Jason Whitener_2
New Member

Re: Sun iplanet web server crash and hang hpux 11i

Yes, our vendor is working with Sun at the moment analyzing core dumps, but I thought I'd post here to see if anything obvious stood out.

The problem has been going on for many months now without a resolution in sight.

harry d brown jr
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i

You show

dbc_max_pct=50
dbc_min_pct=5

Usually you set these lower, like 10 & 5.


bufpages=103020
nbuf=51510

Usually bufpages and nbuf should be set to 0. Were these settings suggested by the vendor(s)?

live free or die
harry d brown jr
Live Free or Die
Peyman Javaheri
Frequent Advisor

Re: Sun iplanet web server crash and hang hpux 11i

Could you please describe the reasoning behind why bufpages and nbuf should be set up that way?

peyman;
TwoProc
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i


Question: What I don't understand, is why the web service can't handle a full heap without crashing?

I've run into this very same question with a large Java app also. The answer I got back from developers, etc. was that the Java engine itself needs some of that room for its own work, so if you fill it, there's no room for Java itself to put stuff in the heap. You need to leave headroom. I did get Java a bit bigger than what you're running at, but it doesn't get much bigger!

In the end, the real solution that we had was to get development to rewrite the program to reduce the amount of heap necessary to run. As simple an answer and as complicated a solution as that.
We are the people our parents warned us about --Jimmy Buffett
rick jones
Honored Contributor

Re: Sun iplanet web server crash and hang hpux 11i

toss-up question - is iPlanet (or whatever Sun calls it now) available as a 64-bit application for HP-UX?
there is no rest for the wicked yet the virtuous have no pillows
Peyman Javaheri
Frequent Advisor

Re: Sun iplanet web server crash and hang hpux 11i

No, iPlanet 6.0 SP9 is 32-bit.
Jason Whitener_2
New Member

Re: Sun iplanet web server crash and hang hpux 11i

"You show

dbc_max_pct=50
dbc_min_pct=5

Usually you set these lower, like 10 & 5.


bufpages=103020
nbuf=51510

Usually bufpages and nbuf should be set to 0. Were these settings suggested by the vendor(s)?

live free or die
harry d brown jr"

I had to ask why this was done. Our sys admin explained that when bufpages and nbuf are not zero, dbc_max_pct and dbc_min_pct are ignored. The resulting amount, rather than being a percent, is bufpages x nbuf, or, if I did my math right, about 600 megs for our system.
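As a sanity check on that arithmetic, here is a small Python sketch. It assumes that bufpages counts 4096-byte pages and that the fixed cache size is simply bufpages times the page size; that is my understanding of the tunable, so please verify it against the documentation for your release. Under that assumption the figure comes out nearer 400 MB than 600 MB, so the number quoted above may have been computed differently.

```python
# Assumption: bufpages is expressed in 4096-byte pages.
PAGE_BYTES = 4096


def static_buffer_cache_bytes(bufpages, page_bytes=PAGE_BYTES):
    """Size in bytes of a fixed buffer cache of `bufpages` pages."""
    return bufpages * page_bytes


# bufpages value taken from the kmtune output posted earlier in the thread.
size = static_buffer_cache_bytes(103020)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```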
Jason Whitener_2
New Member

Re: Sun iplanet web server crash and hang hpux 11i

"After the catastrophe line, is there another error line that shows which FUNCTION the crash occurred in ?"

Sorry, missed what you were referring to. No, there are no lines below NSServletService referring to a specific function. I find that odd. Perhaps the NSServletService itself is experiencing a problem.