New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

mvpel
Trusted Contributor

With the outstanding assistance of WTEC S.T., and a crash dump from last November on one of my Superdomes, a new HP-UX 11.11 kernel bug has been identified.

Panic string: do_remap_large_page2_0: freeing unaligned large page

do_remap_large_page2_0+0x674
pdremap+0x554
hdl_user_protect+0x540
vmprotectpageout+0x48
nfspageout+0x234
unmapvnode+0xf8
dispreg+0x140
exit+0x578
psig+0x304
trap+0x6f4
thandler+0xd20

The problem occurs when a large-page memory region, used for executables and shared libraries, becomes dirty. In our case, the cause may have been a writable mmap() of the executable, though we haven't pinned that down yet.

The dirty large page then needs to be paged out (nfspageout(), above, though it can happen with non-NFS too), and the large page winds up needing two IO operations to write back to disk.

But the data structure involved (vfspage_t) only has room for one of the two IO records, and the second one overwrites the "pgout_start" and "pgout_end" variables in the structure.

Then, when the remap (do_remap_large_page2_0) starts, the corrupted "pgout_start" is probably not aligned to the beginning of a superpage, so the system panics. They were able to replicate the behavior in the patch lab fairly easily.
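
To make the failure mode concrete, here is a minimal user-space sketch of the idea, not the actual kernel code. The struct, the function names, and the page sizes used with them are all hypothetical; the real vfspage_t layout is not shown anywhere in this thread. It only illustrates how a single start/end pair gets overwritten by a second I/O record, and how an alignment check would then reject the new start offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the page-out bookkeeping described above.
 * It has room for exactly one start/end pair, like the post says of
 * vfspage_t, but the field layout here is an assumption. */
struct pageout_record {
    uint64_t pgout_start;  /* first byte offset of the write-back */
    uint64_t pgout_end;    /* last byte offset of the write-back */
};

/* Record one I/O. There is only one slot, so a second call silently
 * overwrites whatever the first call stored. */
static void record_io(struct pageout_record *rec, uint64_t start, uint64_t end)
{
    rec->pgout_start = start;
    rec->pgout_end = end;
}

/* True if addr sits on a large-page boundary.
 * large_page_size must be a non-zero power of two. */
static bool is_large_page_aligned(uint64_t addr, uint64_t large_page_size)
{
    assert(large_page_size != 0 &&
           (large_page_size & (large_page_size - 1)) == 0);
    return (addr & (large_page_size - 1)) == 0;
}
```

For example, with an illustrative 64 KB large page written back as two 32 KB I/Os, record_io(&rec, 0, 32767) leaves an aligned start, but the follow-up record_io(&rec, 32768, 65535) replaces it with 32768, which is_large_page_aligned() rejects. That is the kind of state the panic check would trip over.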

As it turned out, it's the same basic problem as was fixed in 11.23 PHKL_34546 (JAGaf65980).

The tentative date for the UNOF patch is March 10, and the PHKL is slated for April 15.

The workaround suggested by S.T. is to ensure that the executable or shared library is not writable (chmod 555), which would prevent a PROT_WRITE mmap() from succeeding and thus prevent the problematic large page from becoming dirty and requiring a page-out.
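
If you want to audit binaries and libraries for this, a small helper along these lines could do it. This is only a sketch using standard POSIX stat() and chmod(); the function names are mine, not anything from HP. Keep in mind that a shared writable mapping needs a descriptor opened for writing, which mode 555 denies to ordinary users, but root bypasses the permission check, so this narrows the exposure rather than eliminating it.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: report whether any write permission bit
 * (owner, group, or other) is set on path.
 * Returns 1 if writable by someone, 0 if not, -1 if stat() fails. */
static int has_write_bits(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & (S_IWUSR | S_IWGRP | S_IWOTH)) ? 1 : 0;
}

/* Hypothetical helper: apply the suggested workaround, equivalent to
 * "chmod 555 path". Returns 0 on success, -1 on failure. */
static int make_read_execute_only(const char *path)
{
    return chmod(path, S_IRUSR | S_IXUSR |
                       S_IRGRP | S_IXGRP |
                       S_IROTH | S_IXOTH);
}
```

Calling has_write_bits() on each executable and shared library that gets large pages, and make_read_execute_only() on any that return 1, amounts to the same thing as running chmod 555 by hand.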
6 REPLIES
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

The unofficial patch has been released today, and at this point it is planned to supersede both the PHKL_40651 and PHKL_36133 trees.

---
Creation Date: 11/03/15
Post Date: 11/03/15
Hardware Platforms - OS Releases:
s700: 11.11
s800: 11.11

Filesets:
OS-Core.CORE-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP
ProgSupport.C-INC,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP

Automatic Reboot?: Yes
Critical: Yes
UNOF_shared_lgpg: PANIC
PHKL_40651: ABORT

Symptoms:
UNOF_shared_lgpg:
( QX:QXCR1001107143 )
System panics with the following stack trace and panic string.
panic+0x6c
do_remap_large_page2_0+0x5e8
pdremap+0x24c
hdl_user_protect+0x494
vm_protect_pageout+0x48
vx_do_pageout+0x218
vx_pageout+0xf8
flush_vnode+0x68
vx_flush_vnode+0x3c
vx_unmap+0x168
unmapvnode+0x88
do_munmap+0x368
foreach_pregion+0xf4
munmap+0x74
syscall+0x738
syscallinit+0x55c

Defect Description:
UNOF_shared_lgpg:
( QX:QXCR1001107143 )
unable to handle large pages in page out path

Resolution:
Corrected page out path to handle large pages.
---

The planned release date for the PHKL is still April 15, as far as I know.
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

The release date has been bumped to April 30.
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

The new patch was released on April 28:

Patch Name:
PHKL_42072

Patch Description:
s700_800 11.11 Cumulative VM, Psets, Preemption, PRM, MRG

Creation Date: 11/04/21

Post Date: 11/04/28

Hardware Platforms - OS Releases:
s700: 11.11
s800: 11.11

Symptoms:
PHKL_42072:
( QX:QXCR1001107143 )
For the files mmaped as MAP_SHLIB, large pages are not correctly being handled in page out path. Because of this system may panic.
The stack trace for the panic may look as follows:
0) panic+0x6c
1) do_remap_large_page2_0+0x5e8
2) pdremap+0x24c
3) hdl_user_protect+0x494
4) vm_protect_pageout+0x48
5) vx_do_pageout+0x1d8
6) vx_pageout+0xf8
7) flush_vnode+0x68
8) vx_flush_vnode+0x3c
9) vx_unmap+0x168
10) unmapvnode+0x88
11) do_munmap+0x128
12) foreach_pregion+0xf4
13) munmap+0x74
14) syscall+0x204
15) syscallinit+0x55c

Defect Description:
PHKL_42072:
( QX:QXCR1001107143 )
For files mmaped as "MAP_SHLIB", certain assumptions about large pages in page out code lead to panic.

Resolution:
The large pages for "MAP_SHLIB" are correctly handled in page out code to prevent this panic.
---

Enjoy!
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

Looks like the extra two-week delay in the release of this patch was to eliminate the supersession of PHKL_40651 - it now only supersedes PHKL_33372 from June 2, 2005.
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

Oops... there's apparently a missing supersession in the text file - PHKL_42072 actually supersedes PHKL_36133 (which superseded PHKL_33372), according to the INDEX file and the patch family tree tool.

...
supersedes PHKL_32619.CORE2-KRN,fr=*
supersedes PHKL_33372.CORE2-KRN,fr=*
supersedes PHKL_36133.CORE2-KRN,fr=*
mvpel
Trusted Contributor

Re: New 11.11 kernel bug behind do_remap_large_page2_0: freeing unaligned large page

PHKL_42072