Operating System - HP-UX

Panic reboot on 9000/785/C360 (HP-UX 10.20)

Vlad Kluew
Occasional Contributor


Hi Everybody,

We have a number of 9000/785/C360s running HP-UX 10.20, and we have noticed that they occasionally reboot spontaneously. All machines are configured identically. Each machine has an FDDI LAN card in addition to the built-in LAN interface. The built-in interface is not up all the time, but that does not matter. The C360 reboots happen at times of excessive network traffic: some kind of flood of ICMP and/or SNMP datagrams. Here is what the crash dump analysis showed:
--------------------------------------------------------------------------------------------
System Name: HP-UX
Node Name: xxxxxxx01
Release: 10.20
Version: A
Model: 9000/785/C360
Machine ID: 2014383000
Processors: 1
Architecture: PA-RISC 2.0
CPU is a: PCXU/PA-8000
Physical Mem: 256.00 MBytes

The system had been up for 9.55 days (82528525 ticks).
Load averages: 3.66 2.14 2.08.

System went down at: Mon Sep 22 06:14:51 2003

+--------------------------------------+
| Message Buffer |
+--------------------------------------+

vuseg=173d000
inet_clts:ok inet_cots:ok nfs_init3 added vfs type nfs3 at slot 5
NOTICE: cachefs_link(): File system was registered at index 6.
NOTICE: autofs_link(): File system was registered at index 7.
8 ccio
8/0 GSCtoPCI
BCX ATTACH entered vendor id 0x105d dev id 0x2339
8/0/1/0 img
8/0/3/0 ifi
8/0/19/0 c720
8/0/19/0.6 tgt
8/0/19/0.6.0 sdisk
8/0/19/0.7 tgt
8/0/19/0.7.0 sctl
8/0/20/0 btlan3
8/0/63 asio0
8/16 bus_adapter
8/16/4 asio0
8/16/5 c720
8/16/5.7 tgt
8/16/5.7.0 sctl
8/16/0 CentIf
8/16/1 audio
8/16/7 ps2
8/16/10 fdc
8/16/10.1 pflop
10 ccio
10/0 GSCtoPCI
32 processor
49 memory

System Console is on the Built-In Serial Interface
btlan3: Initializing 10/100BASE-TX card at 8/0/20/0....
Unable to allocate all equivalent entries (0)
Networking memory for fragment reassembly is restricted to 20393984 bytes
Logical volume 64, 0x3 configured as ROOT
Logical volume 64, 0x9 configured as SWAP
Logical volume 64, 0x9 configured as DUMP
Swap device table: (start & size given in 512-byte blocks)
entry 0 - major is 64, minor is 0x9; start = 0, size = 1572864
Dump device table: (start & size given in 1-Kbyte blocks)
entry 0 - major is 31, minor is 0x6000; start = 629599, size = 262145
Starting the STREAMS daemons.
B2352B HP-UX (B.10.20) #1: Sun Jun 9 08:03:38 PDT 1996

Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 262144 Kbytes, lockable: 177876 Kbytes, available: 190236 Kbytes

interrupt type 15, pcsq.pcoq = 0.3e5ab0, isr.ior = 0.51
savestate ptr = 0x42ca90, savestate return ptr = 0x3e5a0c
B2352B HP-UX (B.10.20) #1: Sun Jun 9 08:03:38 PDT 1996
panic: (display==0xb800, flags==0x0) Data page fault

PC-Offset Stack Trace (read across, most recent is 1st):
0x002467bc 0x0021ee00 0x0021f8d0 0x00229300 0x003e5ab0 0x003ddecc
0x003dcc68 0x003e7018 0x003e5ed0 0x003ddd40 0x003d9b74 0x003e7730
0x001e3988 0x00050c4c 0x00128600 0x001115e8 0x000d9f64 0x000d8d0c
0x0004e808
End Of Stack

NOT sync'ing disks (on the ICS) (0 buffers to flush):
0 buffers not flushed
90 buffers still dirty


+--------------------------------------+
| Processor activity |
+--------------------------------------+
Processor 0 started it by panic'ing. Here is the stack trace:
stack trace for event 0
crash event was a panic
The Save State registers for this level are:

r0 /r1 /r2 0x0 0x1 0x246790
r3 /r4 /r5 0x2 0xf 0x31000
r6 /r7 /r8 0x42ca90 0x51 0x4a9810
r9 /r10/r11 0xe0 0x48c8e8 0x40
r12/r13/r14 0x40 0x3d 0x1aca000
r15/r16/r17 0x1aca000 0x1 0x1ac87c0
r18/r19/r20 0x60c040 0x3fe 0x0
r21/r22/r23 0x726e2070 0x450668 0xa
r24/r25/r26 0x0 0xa 0x3148c
r27/r28/r29 0x48e0e8 0x7 0x2
r30/r31/r32 0x42d048 0x3e
sr0 /sr1 /sr2 0x0 0x7437400 0x0
sr3 /sr4 /sr5 0x0 0x0 0x2793000
sr6 /sr7 /sr8 0x1e00800 0x0
LEVEL FUNC ARG0 ARG1 ARG2 ARG3
lev 0) panic+0x10 n/a n/a n/a n/a
lev 1) report_trap_or_int_and_panic+0xe8 0x2 0xf 0x42ca90 n/a
lev 2) interrupt+0x458 n/a 0x42ca90 n/a n/a
lev 3) $ihndlr_rtn+0x0 n/a n/a n/a n/a
lev 4) IPh8_2_1_bdm_rcv_buf_alloc+0x218 0x1ace700 0x1000 n/a n/a
lev 5) IPh8_2_1_motofsi_postreads+0x5c 0x1aca000 n/a n/a n/a
lev 6) IPh8_2_1_motofsi_rcv_done_cb+0x1f8 0x1aca000 0x1ac87c0 n/a n/a
lev 7) IPh8_2_1_ipcommon_cache_sync+0x130 0x1ac9200 0x1a0c580 0x1b55003 0x3d
lev 8) IPh8_2_1_bdm_rcv_buf_sync+0x1c0 0x1ace700 0x1ac87c0 n/a 0x10001
lev 9) IPh8_2_1_motofsi_rcv_done+0x7e8 0x1aca000 n/a n/a n/a
lev 10) IPh8_2_1_motofsi_isr_pci+0x84 0x1aca000 n/a n/a n/a
lev 11) IPh8_2_1_ipcommon_isr_func+0x38 0x1742a00 0x1ac9200 0x0 0x0
lev 12) dino_isr+0x230 n/a 0x42c030 n/a n/a
lev 13) inttr_emulate_save_fpu+0xf0 n/a n/a n/a n/a
lev 14) soo_select+0x28 0x121d320 0x1 n/a n/a
lev 15) selscan+0x128 0x7ffe6a70 0x7ffe6a7c 0x7f n/a
lev 16) select+0x6bc n/a n/a n/a n/a
lev 17) syscall+0x75c n/a n/a n/a n/a
lev 18) $syscallrtn+0x0 n/a n/a n/a n/a

Processor 0: servicing interrupt
-------------
can not find unwind or stub descriptor for
pc==0x0`0053ffcc
-------------


Attention, immediate reporting for WSIO disks switched on!!

We have logged 1 request to deallocate memory pages.
Please check the diagnostic logs, it is most likely that a memory board
needs to be exchanged.

--------------------------------------
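As a cross-check on question 3, the symbolic frames in the "Processor activity" section can be paired with the raw "PC-Offset Stack Trace". A minimal Python sketch, assuming both lists are in the same order (most recent first, 19 entries each), that looks up the faulting PC 0x3e5ab0 taken from the pcsq.pcoq value in the message buffer. The helper names (parse_frames, parse_pcs, locate) are hypothetical and not part of any HP tool; the frame arguments are left out for brevity.

```python
import re

# Raw "PC-Offset Stack Trace" from the dump above (most recent first).
PC_TRACE = """
0x002467bc 0x0021ee00 0x0021f8d0 0x00229300 0x003e5ab0 0x003ddecc
0x003dcc68 0x003e7018 0x003e5ed0 0x003ddd40 0x003d9b74 0x003e7730
0x001e3988 0x00050c4c 0x00128600 0x001115e8 0x000d9f64 0x000d8d0c
0x0004e808
"""

# Symbolic frames from the "Processor activity" section (arguments omitted).
FRAMES = """
lev 0) panic+0x10
lev 1) report_trap_or_int_and_panic+0xe8
lev 2) interrupt+0x458
lev 3) $ihndlr_rtn+0x0
lev 4) IPh8_2_1_bdm_rcv_buf_alloc+0x218
lev 5) IPh8_2_1_motofsi_postreads+0x5c
lev 6) IPh8_2_1_motofsi_rcv_done_cb+0x1f8
lev 7) IPh8_2_1_ipcommon_cache_sync+0x130
lev 8) IPh8_2_1_bdm_rcv_buf_sync+0x1c0
lev 9) IPh8_2_1_motofsi_rcv_done+0x7e8
lev 10) IPh8_2_1_motofsi_isr_pci+0x84
lev 11) IPh8_2_1_ipcommon_isr_func+0x38
lev 12) dino_isr+0x230
lev 13) inttr_emulate_save_fpu+0xf0
lev 14) soo_select+0x28
lev 15) selscan+0x128
lev 16) select+0x6bc
lev 17) syscall+0x75c
lev 18) $syscallrtn+0x0
"""

# "lev N) function+0xOFFSET": capture level, function name and hex offset.
FRAME_RE = re.compile(r"lev\s+(\d+)\)\s+([^\s+]+)\+0x([0-9a-f]+)")


def parse_frames(text):
    """Return a list of (level, function, offset) tuples."""
    return [(int(lv), fn, int(off, 16)) for lv, fn, off in FRAME_RE.findall(text)]


def parse_pcs(text):
    """Return the hex PC values as integers, in their original order."""
    return [int(tok, 16) for tok in text.split()]


def locate(pc, frames, pcs):
    """Find the frame whose PC equals `pc`.

    Returns (level, function, offset, function_start) or None.
    Assumes frames[i] and pcs[i] describe the same stack level.
    """
    if len(frames) != len(pcs):
        raise ValueError("frame count and PC count differ")
    for (level, func, off), frame_pc in zip(frames, pcs):
        if frame_pc == pc:
            return level, func, off, frame_pc - off
    return None


if __name__ == "__main__":
    hit = locate(0x3E5AB0, parse_frames(FRAMES), parse_pcs(PC_TRACE))
    if hit is None:
        print("faulting PC not found in the trace")
    else:
        level, func, off, start = hit
        print(f"lev {level}: {func}+{off:#x}, function starts at {start:#x}")
```

Run as-is, it prints "lev 4: IPh8_2_1_bdm_rcv_buf_alloc+0x218, function starts at 0x3e5898". Read this way, the fault occurred in the IPh8_2_1_ receive-buffer code called from dino_isr (levels 4 to 12), while the interrupted process context (levels 13 to 18) was only sitting in select(). That is an interpretation of the trace, not something the dump states outright.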
I checked WSIO and memory and found some problems with them, but I still have a few questions:

1) What is the functional description of the IPh8_2_1_xxx functions?
2) Could the "Data page fault" be caused by a missing patch? If so, which patches should I check?
3) Could you please share any ideas about what happened in this particular crash?

Any help would be highly appreciated.

Thanks,
Vlad

P.S. This thread has been moved from Workstations - Itanium-Based, hp9000, Visualize to HP-UX > sysadmin - HP Forums moderator

3 REPLIES
Eugeny Brychkov
Honored Contributor

Re: Panic reboot on 9000/785/C360 (HP-UX 10.20)

Vlad,
are all these C360s rebooting/panicking the same way? If so, I would not expect all of them to have bad memory. In any case, I advise you to get into PDC and check the PDT (page deallocation table) and PIM (processor internal memory) in the service menu, and the memory information in the information menu. Collect all these outputs and attach them to your next reply.
Eugeny
Vlad Kluew
Occasional Contributor

Re: Panic reboot on 9000/785/C360 (HP-UX 10.20)

One part is common to all the crashes, the stack trace for event 0:

lev 4) IPh8_2_1_bdm_rcv_buf_alloc+0x218 0x1ace700 0x1000 n/a n/a
lev 5) IPh8_2_1_motofsi_postreads+0x5c 0x1aca000 n/a n/a n/a
lev 6) IPh8_2_1_motofsi_rcv_done_cb+0x1f8 0x1aca000 0x1ac87c0 n/a n/a
lev 7) IPh8_2_1_ipcommon_cache_sync+0x130 0x1ac9200 0x1a0c580 0x1b55003 0x3d
lev 8) IPh8_2_1_bdm_rcv_buf_sync+0x1c0 0x1ace700 0x1ac87c0 n/a 0x10001
lev 9) IPh8_2_1_motofsi_rcv_done+0x7e8 0x1aca000 n/a n/a n/a
lev 10) IPh8_2_1_motofsi_isr_pci+0x84 0x1aca000 n/a n/a n/a
lev 11) IPh8_2_1_ipcommon_isr_func+0x38 0x1742a00 0x1ac9200 0x0 0x0

That's why I'm looking for information about this group of functions. The machines reboot only after a "bombardment" of ICMP/SNMP packets; otherwise there is no problem at all, and the C360s stay up without any reboots. That's why I suspect a patch issue. Thanks for the information, though; I'll definitely do that.
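Comparing several dumps by eye is error-prone. A minimal Python sketch, assuming each crash dump has been saved as a text file containing "lev N) func+0xOFF" lines like the ones above, that lists the function names present in every trace. The script and the file names in the usage line are hypothetical.

```python
import re
import sys

# Capture only the function name from "lev N) function+0xOFFSET ..." lines.
FRAME_RE = re.compile(r"lev\s+\d+\)\s+([^\s+]+)\+0x[0-9a-f]+")


def frame_names(dump_text):
    """Function names from the stack trace, most recent frame first."""
    return FRAME_RE.findall(dump_text)


def common_frames(traces):
    """Names present in every trace, kept in the order of the first trace."""
    if not traces:
        return []
    shared = set(traces[0]).intersection(*traces[1:])
    return [name for name in traces[0] if name in shared]


if __name__ == "__main__":
    # Usage (hypothetical file names): python common_frames.py crash1.txt crash2.txt
    traces = []
    for path in sys.argv[1:]:
        with open(path) as f:
            traces.append(frame_names(f.read()))
    for name in common_frames(traces):
        print(name)
```

With no arguments it prints nothing. Keeping the order of the first trace makes the output read like a stack (most recent frame first), so a shared run such as the IPh8_2_1_ frames stands out from the generic panic and syscall frames.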

Vlad
Bill Hassell
Honored Contributor

Re: Panic reboot on 9000/785/C360 (HP-UX 10.20)

The panic (system crash) is caused by a data page fault; essentially, the kernel made a mistake in referencing a memory address, such as using an odd address for an index or pointer. This is always a patch issue. Although 10.20 is completely obsolete, you can (and should) download the last set of patch bundles. There are three: the HardWare Enablement (HWE), the General Release (QGR) and DIAGNOSTICS, which includes the online monitoring. For HWE, see

http://www.software.hp.com/SUPPORT_PLUS/hwcr.html

For GR, see

http://www.software.hp.com/SUPPORT_PLUS/gr.html

I would pick a single machine and add the HWE bundle first (it's smaller), then the GR patches, and finally DIAGNOSTICS. This will fix hundreds of problems, many of which have existed all along but which you either did not notice or have been working around.
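The data page fault can also be compared with what the dump itself records. The message-buffer line "interrupt type 15, pcsq.pcoq = 0.3e5ab0, isr.ior = 0.51" gives the interruption number, the faulting instruction address (space.offset) and, in ISR.IOR, the data address being referenced; on PA-RISC, interruption 15 is the data TLB miss (data page) fault, matching the "Data page fault" panic string. A minimal Python sketch that just parses that line; the decode function and the "first page suggests a near-NULL pointer" heuristic are assumptions for illustration, not output of any HP analysis tool.

```python
import re

# Line copied from the message buffer of the crash dump in this thread.
LINE = "interrupt type 15, pcsq.pcoq = 0.3e5ab0, isr.ior = 0.51"

# "physical page size = 4096 bytes" per the Memory Information in the dump.
PAGE_SIZE = 4096

LINE_RE = re.compile(
    r"interrupt type (\d+), pcsq\.pcoq = ([0-9a-f]+)\.([0-9a-f]+), "
    r"isr\.ior = ([0-9a-f]+)\.([0-9a-f]+)"
)


def decode(line):
    """Split the interrupt line into type, faulting PC and faulting data address."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("unrecognised interrupt line")
    pc_space, pc_off, data_space, data_off = (int(g, 16) for g in m.groups()[1:])
    return {
        "type": int(m.group(1)),  # interruption number, decimal in the message
        "pc": (pc_space, pc_off),  # pcsq.pcoq: space and offset of the instruction
        "fault_addr": (data_space, data_off),  # isr.ior: space and offset of the data
    }


if __name__ == "__main__":
    info = decode(LINE)
    pc_space, pc_off = info["pc"]
    space, off = info["fault_addr"]
    print(f"type {info['type']}, PC {pc_space:#x}.{pc_off:#x}, data {space:#x}.{off:#x}")
    if off < PAGE_SIZE:
        # Heuristic only: a tiny offset in the first page often means a
        # NULL (or uninitialised) pointer plus a small structure offset.
        print("data address is in the first page: possible near-NULL pointer")
```

For this dump the data address is 0.0x51, inside the first 4096-byte page, which is consistent with code following a NULL or uninitialised pointer and adding a small structure offset. That reading is an interpretation; confirming it would need the driver source or a disassembly of IPh8_2_1_bdm_rcv_buf_alloc.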


Bill Hassell, sysadmin