
HP-UX 11i v3 - Fuser Hang or Very Slow ISSUE

 
huangzhen00
Trainee Contributor


Resolving the HP-UX 11i v3 fuser hang or very slow issue:

 

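Symptoms:

  1. During a Saturday failover drill, unmounting a file system was very slow, typically taking about 3 minutes. Listing the processes on the mount point with fuser -cu was just as slow, also taking about 3 minutes. The matching HP knowledge-base article follows.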

    HP-UX 11i v3 - Fuser Hang or Very Slow

Issue

Fuser takes a very long time to complete on all files across all file systems. The end user reported that only a reboot would resolve the issue, and that it would eventually recur. The end user also saw CPU utilization spike to 90% while the fuser command was running.

The end user originally said that fuser hung and had to be terminated. tusc was run on the command with the following syntax, and the output file was analyzed.

tusc -o /var/tmp/tusc.out -p -e -E -T "" fuser <file>

Solution

In the tusc file that the end user sent, focus on the lines where pstat(PSTAT_PROC_VM, ...) is called; these are where the problem shows up.

First, here is what one of these lines looks like:

1292431568.288187 [24910] pstat(PSTAT_PROC_VM, 0x7fffba40, 104, 0, 3) ... = 1     

Where: 104 is the size

0 is the PID

3 is the number of process regions.

Now one of the entries seems out of place:

1292431731.546178 [24910] pstat(PSTAT_PROC_VM, 0x7fffba40, 104, 4012, 65686)= 0     

PID 4012 has 65686 process regions open.

The end user looked up this PID, and it turned out to be an Enterprise Manager Agent for Oracle (process name = emagent). The DBA shut down the agent, and fuser then worked fine.
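Reading the trace by eye, as above, can be mechanized. The following is a minimal sketch, not part of the original article: pstat_vm_pairs is a hypothetical helper name, and it assumes the pstat(PSTAT_PROC_VM, buffer, size, PID, regions) layout of the two sample lines, with or without the timestamp and [pid] prefix.

```shell
# Sketch: print "PID REGIONS" for every pstat(PSTAT_PROC_VM, ...) call found
# in a tusc trace read from stdin. pstat_vm_pairs is a hypothetical name.
pstat_vm_pairs() {
    # Capture the 4th and 5th arguments. Everything before "pstat(" is
    # discarded, so a timestamp or [pid] prefix does not matter, and lines
    # for other system calls are skipped.
    sed -n 's/.*pstat(PSTAT_PROC_VM, [^,]*, [^,]*, \([0-9][0-9]*\), \([0-9][0-9]*\)).*/\1 \2/p'
}
```

Piping the result through sort -k2,2n | tail -n 1 leaves the pair with the largest region value; for the two sample lines above that is PID 4012 with 65686.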

Keywords : fuser hung, slow oracle, server, enterprise, agent, tusc

Here is a sample script that can be used to find which process is using the most pregions:

 

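  2. We phoned the RC (Response Center) for help, and they provided a script that lists which of the current processes holds the most pregions: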

 

#!/usr/bin/sh

# For use on 11iv3 only.
#
# This script uses tusc to determine how many pregions a process has.
# For example, in the following tusc trace:
# pstat(PSTAT_PROC_VM, 0x7fffb620, 104, 17649, 6000) ....... = 1
#                                         ^      ^
#                                         |      |
#                                    PID -       |
#                                                |
#                                    pregion ----
# The script sorts the output and displays it in an easy to read format.
# Keep in mind, if fuser takes 10 minutes to return, so will this script.
#

TUSC=/usr/local/bin/tusc
OUT=/var/tmp/tusc.fuser

${TUSC} -o ${OUT} -s pstat fuser /stand

PROC=$(awk '{ print $4 $5 }' ${OUT} | sort -nk 2,2 -t , | tail -1)
PID=$(echo $PROC | cut -d , -f 1)
NUM=$(echo $PROC | cut -d , -f 2 | sed 's/)//')

echo "PID with most pregions is ${PID}"
echo "PID ${PID} has ${NUM} pregions"
echo ""

ps -p ${PID}

Here is an example of the output:

root:clunk:/tmp> ./preg
/stand:
PID with most pregions is 2408
PID 2408 has 1146 pregions

   PID TTY       TIME COMMAND
  2408 ?        135:54 mxdomainmgr

 

  3. Output of the script on our system at the time:

 

   EDOCDB01[/usr/contrib/bin/tools]#sh /tmp/fuser3.sh

/stand: 

PID with most pregions is 10531

PID 10531 has 35026 pregions

    PID TTY       TIME COMMAND

 10531 ?        3557:38 db2sysc

 

  4. The output shows that db2sysc (PID 10531) holds the most pregions, 35026, and this is what makes fuser slow.
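The RC's script reports only the single largest process, so it can help to check whether other processes also hold many pregions. The following is a minimal sketch, assuming the same trace format as the script's output file (/var/tmp/tusc.fuser); the function name top_pregions and the default of five entries are illustrative choices, not part of the RC's script.

```shell
# Sketch: rank PIDs by the largest pregion value seen for each PID in a tusc
# trace read from stdin. top_pregions is a hypothetical name.
# $1: how many PIDs to show (defaults to 5).
top_pregions() {
    awk '
        /pstat\(PSTAT_PROC_VM,/ {
            # Keep only the argument list between "pstat(" and the first ")".
            args = $0
            sub(/^.*pstat\(/, "", args)
            sub(/\).*$/, "", args)
            if (split(args, a, /, */) < 5) next
            pid = a[4] + 0
            idx = a[5] + 0
            # Remember the largest value seen for each PID.
            if (!(pid in max) || idx > max[pid]) max[pid] = idx
        }
        END { for (pid in max) print max[pid], pid }
    ' | sort -rn | head -n "${1:-5}"
}
```

Each output line is "PREGIONS PID", largest first. Usage would be along the lines of top_pregions 10 < /var/tmp/tusc.fuser, after which ps -p can be run against each PID of interest.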

 

5. Finally, the RC sent us a patch that they said fixes this issue:

  

 

Symptoms:

       PHCO_43253:

       ( QX:QXCR1001225006 )

       fuser(1m) is very slow if one or more process has

   thousands of memory mapped regions

 

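6. Patch analysis: according to the patch description, fuser can query processes in two ways. The first scans each process's mmapped regions, so fuser spends a long time walking the process map regions, which makes both fuser and umount slow. The second uses the lighter interfaces, which do not scan the process map regions, so fuser runs quickly even when those regions exist.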

 

Defect Description:

       PHCO_43253:

       ( QX:QXCR1001225006 )

       fuser(1m) uses pstat_getproc(2) and pstat_getprocvm(2)

       interfaces to query the information related to

       processes and their mmaped regions. pstat(2) also provides

       lighter versions,

                  pstat(PSTAT_PROC_LITE,..) and

                  pstat(PSTAT_PROC_VM_LITE, ..)

       for performance benefits. fuser(1m) is not using

       the lighter interfaces.

 

 

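  7. Solution:

     Install patch PHCO_43253. It switches fuser from the default interfaces, which walk the process map regions, to the lighter interfaces, so fuser queries processes more efficiently.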

 

       Resolution:

       fuser(1m) has been modified to use the lighter pstat(2)

       interfaces for better performance.
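Before repeating the failover drill, it is worth confirming that the patch is actually present. The following is a minimal sketch: has_patch is a hypothetical helper that searches swlist output on stdin for a patch ID. The exact layout of swlist -l patch output is not shown in this post, so the helper only looks for the ID as a fixed substring.

```shell
# Sketch: succeed (exit 0) if the given patch ID appears anywhere in the
# swlist output read from stdin. has_patch is a hypothetical name.
# $1: patch ID, for example PHCO_43253.
has_patch() {
    grep -q -F "$1"
}

# Intended use on the HP-UX host (not executed here):
#   swlist -l patch | has_patch PHCO_43253 && echo "PHCO_43253 is installed"
#   time fuser -cu /stand    # compare with the roughly 3 minutes seen before
```

Taking the listing on stdin keeps the helper independent of how the patch list is obtained, for example from a saved swlist report.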

 

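Case summary: by default, fuser uses the access method that scans mmapped regions. When umount is slow, rebooting the operating system does release the map regions held by the offending process (restarting only the application that owns the process is enough; in this case, DB2). The best fix, however, is to change how fuser queries processes: with the lighter versions of the interfaces, fuser does not scan the map regions, and the problem goes away.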

 

2 replies
652817
Part-time Consultant

Re: HP-UX 11i v3 - Fuser Hang or Very Slow ISSUE

tusc is impressive.

652817
Part-time Consultant

Re: HP-UX 11i v3 - Fuser Hang or Very Slow ISSUE

tusc is impressive.