
EMC Process doesn't die. HP-UX 11.00

 
likid0
Honored Contributor


Hi,

 

We have an HP-UX 11.00 server on a V2600 system. I have a process that is not dying, so I can't deactivate the VG and I have to reboot the server, which is not nice with this old hardware.

 

 

> vgchange -a n /dev/vgp_bcv
vgchange: Couldn't deactivate volume group "/dev/vgp_bcv":
Device busy
> fuser /dev/vgp_bcv
/dev/vgp_bcv: 4833o
> ps -ef | grep 4833
root 4833 1 0 Oct 23 ? 234442:10 /opt/ecc/exec/MHR520/mhragent
root 25627 6226 1 08:27:41 pts/4 0:00 grep 4833
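
A minimal sketch of the same check with the usage-code suffix stripped. The function name pids_from_fuser is made up for illustration; it only removes the file name and the trailing letter (the "o" in "4833o") from fuser-style text, so it can be tried on the line above without touching the real device.

```shell
# Extract bare PIDs from fuser-style output such as "/dev/vgp_bcv: 4833o".
# The trailing letter is a usage code, not part of the PID.
pids_from_fuser() {
    # Drop the "name: " prefix, put one token per line,
    # strip every non-digit character, and discard empty lines.
    sed 's/^[^:]*: *//' | tr ' ' '\n' | sed 's/[^0-9]//g' | grep -v '^$'
}

# Example using the line from the transcript above.
echo "/dev/vgp_bcv: 4833o" | pids_from_fuser
```

On the server it might be used as `fuser /dev/vgp_bcv 2>&1 | pids_from_fuser`, followed by `ps -fp <pid>` for each result, which also avoids the grep line matching itself.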

 

I tried stopping it with the init.d script (eccmad, under /sbin/init.d), but that didn't work.

 

Neither kill nor kill -9 worked. In the lsof output the process has an FD, 26, that looks really bad, with no type or device:

 

> lsof -p 4833
COMMAND   PID USER   FD   TYPE      DEVICE  SIZE/OFF  NODE NAME
mhragent 4833 root  cwd   VDIR 64,0x120003      2048  1092 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root  txt   VREG 64,0x120003  18931712  1137 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root  mem   VREG     64,0x14     49152  3867 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG 64,0x120003   4603904  1136 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root  mem   VREG     64,0x14    139264  3742 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    413696  3748 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    176128  3762 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    323584  3766 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14     53248  3754 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    176128  3738 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    151552  3718 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    135168  3714 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14   1335296  3730 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    102400  3724 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14     53248  3758 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x14    241664  3734 /opt (/dev/vg00/lvol5)
mhragent 4833 root  mem   VREG     64,0x16     20480   115 /usr (/dev/vg00/lvol7)
mhragent 4833 root  mem   VREG     64,0x16     40960 20473 /usr (/dev/vg00/lvol7)
mhragent 4833 root  mem   VREG     64,0x16     15912 12152 /usr/lib/tztab
mhragent 4833 root  mem   VREG     64,0x16    147456 22971 /usr (/dev/vg00/lvol7)
mhragent 4833 root  mem   VREG     64,0x16    143360  4505 /usr (/dev/vg00/lvol7)
mhragent 4833 root  mem   VREG     64,0x16    282624 22576 /usr/lib/libm.2
mhragent 4833 root  mem   VREG     64,0x16     24576 20786 /usr/lib/libdld.2
mhragent 4833 root  mem   VREG     64,0x16   1572864 18871 /usr/lib/libc.2
mhragent 4833 root  mem   VREG     64,0x16    258048 20784 /usr/lib/dld.sl
mhragent 4833 root  mem   VREG     64,0x17       532 15915 /var/spool/pwgr/status
mhragent 4833 root    0r  VCHR       3,0x2       0t0 13570 /dev/null
mhragent 4833 root    2w  VCHR       3,0x2 0x9422491 13570 /dev/null
mhragent 4833 root    4w  PIPE  0xb2121c08         0  2113
mhragent 4833 root    6uw VREG     64,0x15         0    29 /tmp (/dev/vg00/lvol6)
mhragent 4833 root    8w  VREG 64,0x120003   1718514  1026 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   10r  VDIR     64,0x1f      4096 11727 /dev/vgprodmiscdmx_bcv
mhragent 4833 root   12u  unix  0xb2138800       0t0       /var/spool/sockets/pwgr/client4833
mhragent 4833 root   14r  VDIR     64,0x1f      4096 11727 /dev/vgprodmiscdmx_bcv
mhragent 4833 root   16u  inet  0xb2543468       0t0   TCP *:11021 (LISTEN)
mhragent 4833 root   18w  VREG 64,0x120003   3015509  1027 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   20u  VREG 64,0x120003     15356  1000 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   22w  VREG 64,0x120003         0   984 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   24u  VREG 64,0x120003     15356  1037 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   26r                                   0x5a4f1b98 file struct, ty=0, op=0xb1c4c0  ---> this one here!
mhragent 4833 root   28w  VREG 64,0x120003         0  1058 /opt/ecc (/dev/vgOracle/lvecc)
mhragent 4833 root   30r  VBLK     64,0x1f       0t0     0 / (/dev/vg00/lvol3)
 

I imagine that's why the process can't die: it can't close that file descriptor. Do you know how an FD can end up like this?

 

mhragent 4833 root   26r                                   0x5a4f1b98 file struct, ty=0, op=0xb1c4c0
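
Rows like this can be picked out of a saved lsof listing mechanically. A small sketch, assuming the "file struct, ty=" text in the NAME column is what marks them; the function name unknown_fds is hypothetical.

```shell
# Print the FD column of lsof rows whose NAME is a raw "file struct" reference.
unknown_fds() {
    # Column 4 is FD in the lsof layout shown in this post.
    awk '/file struct, ty=/ { print $4 }'
}

# Example using two rows copied from the lsof output above.
printf '%s\n' \
  'mhragent 4833 root   24u  VREG 64,0x120003     15356  1037 /opt/ecc (/dev/vgOracle/lvecc)' \
  'mhragent 4833 root   26r                                   0x5a4f1b98 file struct, ty=0, op=0xb1c4c0' |
  unknown_fds
```

On the server this might be fed from `lsof -p 4833 | unknown_fds`.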

 

I also tried taking a look at the process with tusc, but it errors out:

 

/> tusc -f -p 4833
tusc: retrying attach to process 4833 ("/opt/ecc/exec/MHR520/mhragent"): Interrupted system call

 

 

 

A little more info on the process:

 

> file /opt/ecc/exec/MHR520/mhragent

/opt/ecc/exec/MHR520/mhragent:    PA-RISC1.1 shared executable dynamically linked
> what /opt/ecc/exec/MHR520/mhragent
/opt/ecc/exec/MHR520/mhragent:
    $Revision: 92453-07 linker linker crt0.o B.11.33 020617 $
    HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP32) $Revision: 75.04 $
     $Id: emcpvapi_defs.h,v 1.55 2003/09/23 20:41:57 srackley Exp $
     $Id: emcpvapi.h,v 1.97.2.1 2003/10/10 15:38:26 zjiang Exp $
     $Id: emcpvapi_defs.h,v 1.55 2003/09/23 20:41:57 srackley Exp $
     $Id: emcpvapi.h,v 1.97.2.1 2003/10/10 15:38:26 zjiang Exp $
     $Id: emcpvapi_defs.h,v 1.55 2003/09/23 20:41:57 srackley Exp $
     $Id: emcpvapi.h,v 1.97.2.1 2003/10/10 15:38:26 zjiang Exp $
     $Id: emcpvapi_defs.h,v 1.55 2003/09/23 20:41:57 srackley Exp $

/> ps -elf | grep -i mhragent
  1 R     root  4833     1  0 152 20         b2229500 22762                -  Oct 23  ?        234442:10 /opt/ecc/exec/MHR520/mhragent
  1 S     root  8403 26001  1 154 20        11f31c800   24         ae56fb2e 09:04:02 pts/7     0:00 grep -i mhragent
  1 I     root 22698     0  0 182 20                0    0                - 04:49:28 ?         0:00 /opt/ecc/exec/MHR520/mhragent

 

 

Any idea what can cause this? Is there a way to kill the process? Any idea what mhragent does?

 

Thanks

Windows?, no thanks
6 REPLIES
Dennis Handly
Acclaimed Contributor

Re: EMC Process doesn't die. HP-UX 11.00

>a kill or kill -9 signal didn't work, is there a way to kill the process?

 

If SIGKILL doesn't work, there is nothing you can do but reboot.  It's blocked on I/O.

 

>I imagine that's why the process can't die, it can't close that file descriptor

 

This may be unrelated.

Kris_Knigga
Trusted Contributor

Re: EMC Process doesn't die. HP-UX 11.00

If memory serves, it's an EMC ControlCenter agent.

 

Version 5.20 sounds old; I think the last time I used ECC we were using version 6+ (over a year ago).  You might look into upgrading.  You (or your SAN folks) should be able to push the upgrade from ECC.


Kris Knigga
likid0
Honored Contributor

Re: EMC Process doesn't die. HP-UX 11.00

Yes, as you say, there is no way of ending the process.

 

Just curious what you make out of this:

 

mhragent 4833 root   26r                                   0x5a4f1b98 file struct, ty=0, op=0xb1c4c0  ---> this one here!

 

Have you seen it before in the output of lsof?

 

Also this looks bad:

 

> ps -ef | grep -i ecc
    root  4833     1  0  Oct 23  ?        234442:10 /opt/ecc/exec/MHR520/mhragent
    root 22698     0  0 04:49:28 ?         0:00 /opt/ecc/exec/MHR520/mhragent

 

 

Could it be that 22698 is a child of 4833, and 4833 is waiting for its child processes to exit?

 

A PPID of 0 for a userland process?

 


Windows?, no thanks
Dennis Handly
Acclaimed Contributor

Re: EMC Process doesn't die. HP-UX 11.00

>Just curious what you make out of this:

 >mhragent 4833 root   26r                         0x5a4f1b98 file struct, ty=0, op=0xb1c4c0

 

Not sure; I would have to look at the lsof source.

The man page doesn't mention spaces for TYPE and DEVICE.  It does mention a kernel reference address for FIFOs.

 

>Also this looks bad:
    root  4833     1  0  Oct 23  ?        234442:10 /opt/ecc/exec/MHR520/mhragent
    root 22698    0  0 04:49:28 ?                0:00 /opt/ecc/exec/MHR520/mhragent

 

 >Could it be that 22698 is a child of 4833, and 4833 is waiting for its child processes to exit?

 >ppid of 0 for a userland process?

 

No, PID 0 is the swapper process.  Or the PPID of a bunch of kernel processes.

If you're thinking of a zombie, they have a name of -3 and their PPID still points to their parent.
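
To list any other processes in that state, the PPID column of ps -ef can be filtered. A sketch, assuming the usual column order (UID, PID, PPID, ...); the name ppid0_pids is made up for illustration.

```shell
# Print the PID of every ps -ef row whose PPID is 0, skipping PID 0 itself.
ppid0_pids() {
    # Column 2 is PID and column 3 is PPID; the header row never matches "0".
    awk '$3 == "0" && $2 != "0" { print $2 }'
}

# Example using the two rows quoted above.
printf '%s\n' \
  '    root  4833     1  0  Oct 23  ?        234442:10 /opt/ecc/exec/MHR520/mhragent' \
  '    root 22698     0  0 04:49:28 ?         0:00 /opt/ecc/exec/MHR520/mhragent' |
  ppid0_pids
```

On a live system `ps -ef | ppid0_pids` also lists the genuine kernel processes, so the interesting entries are the ones whose command is a userland binary.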

rariasn
Honored Contributor

Re: EMC Process doesn't die. HP-UX 11.00

Hi:

 

# ps -ef | grep -i ecc |grep -v grep

 

    root  8009     1  0  Mar  1  ?        185:47 /usr/ecc/exec/MHR600/mhragent
    root  7809     1  0  Mar  1  ?        27:07 /usr/ecc/exec/mstragent
    root  7880  7809  0  Mar  1  ?         6:02 /usr/ecc/exec/mstragent -s

 

# /sbin/init.d/eccmad stopall

 

 **********************************************************************
 *          Sending STOPALL to EMC ControlCenter ALL Agents           *
 *--------------------------------------------------------------------*
 *    ALL Sub-Agents will be stopped with the Master Agent.           *
 *    Run with 'stop' to stop only Master Agent.                      *
 **********************************************************************

EMC ControlCenter Master Agent stopping ...
EMC ControlCenter Master Agent stopped.
EMC ControlCenter Sub Agents stopping ...

EMC ControlCenter Sub Agents stopped.

 

# ps -ef | grep -i ecc |grep -v grep | awk '{print $2}' | xargs kill -9
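
A narrower variant of the PID selection, as a sketch: grep -i ecc also matches any unrelated line that happens to contain "ecc", so this version matches only commands under a given path prefix. The function name ecc_pids and the prefix argument are illustrative and not part of ControlCenter; the grep row in the sample input is invented for the example.

```shell
# Print the PID of every ps -ef row that has a field starting with the given path prefix.
ecc_pids() {
    # $1: path prefix of the agent executables (an assumption; adjust per host).
    # STIME can be one field ("04:49:28") or two ("Mar  1"), so scan all fields
    # from the third onward instead of relying on a fixed column number.
    awk -v prefix="$1" '{
        for (i = 3; i <= NF; i++)
            if (index($i, prefix) == 1) { print $2; next }
    }'
}

# Example: two agent rows from the listing above plus a made-up grep row.
printf '%s\n' \
  '    root  8009     1  0  Mar  1  ?        185:47 /usr/ecc/exec/MHR600/mhragent' \
  '    root  7880  7809  0  Mar  1  ?         6:02 /usr/ecc/exec/mstragent -s' \
  '    root  9001  8000  0 09:04:02 pts/7     0:00 grep -i ecc' |
  ecc_pids /usr/ecc/exec/
```

The resulting PIDs could then be passed to kill, trying a plain kill (SIGTERM) before kill -9 so the agents get a chance to clean up.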

 

rgs,

 

 

likid0
Honored Contributor

Re: EMC Process doesn't die. HP-UX 11.00

Yes, Dennis, that's my doubt: how can a userland process such as mhragent end up with a PPID of 0 (swapper), when only kernel processes have a PPID of 0?
Windows?, no thanks