strange behavior of ES80

Palima
Occasional Advisor

Hello Gentles

My system is behaving strangely. Here is the system information:

The system is an ES80 with 4 drawers and eight 1 GHz CPUs. Memory is 24 GB: 16 GB installed on two drawers and 8 GB on the other two. It is the primary node of a cluster connected through Memory Channel. The OS is Tru64 5.1B. I started working in this environment only about a month ago.

The problem occurs at night, though not at a specific time. I first noticed it when I tried to run the top command: it printed "Memory fault" and returned to the prompt. Sometimes it gives this output and other times it works normally. This lasts for about 30 minutes, then everything gets back to normal.

Also, during the problem I saw strange output from the ps command, as follows:
# ps -ef

UID PID PPID C STIME TTY TIME CMD
root 524288 0 30.7 Mar 02 ?? 4-19:31:09 [kernel idle]
root 524289 524288 0.0 Mar 02 ?? 8:58.33 /sbin/init -a
root 524290 524288 0.0 Mar 02 ?? 0:02.99 [kproc_creator_da]
root 524291 524290 0.0 Mar 02 ?? 0:00.00 [icssvr_nomem_dae]
root 524292 524290 0.0 Mar 02 ?? 0:00.00 [icssvr_throttle_]
root 524293 524290 0.0 Mar 02 ?? 0:00.42 [icssvr_daemon_fr]
root 524294 524290 0.0 Mar 02 ?? 0:00.00 [icssvr_daemon_fr]

All TTY fields are ??, and when executing ps -ef -o pcpu,pmem,comm I get the following output:

# ps -ef -o pcpu,pmem,comm | more

%CPU %MEM COMMAND
?.? 4.3 kernel idle
?.? 0.0 init
?.? 0.0 kproc_creator_da
?.? 0.0 icssvr_nomem_dae
?.? 0.0 icssvr_throttle_
?.? 0.0 icssvr_daemon_fr
?.? 0.0 icssvr_daemon_fr
?.? 0.0 icssvr_nanny
?.? 0.0 icscli_throttle_
?.? 0.0 CFS daemon
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 icssvr_daemon_pe
?.? 0.0 vold
?.? 0.0 kloadsrv
?.? 0.0 hotswapd
?.? 0.0 esmd
?.? 0.0 update
?.? 0.0 evmd
?.? 0.0 evmlogger
?.? 0.0 evmchmgr
?.? 0.0 niffd
?.? 0.0 syslogd
?.? 0.0 binlogd
?.? 0.0 icssvr_daemon_pe
?.? 0.0 aliasd
?.? 0.0 aliasd_niff
?.? 0.0 gated

The CPU usage field also contains ?.? for every process. The worst part is that during the problem no cron jobs run; they simply do not start for that period.
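To catch the same symptom without watching the terminal, the ps output could be checked mechanically. This is only a minimal Python sketch: the function name `unreadable_cpu_rows` is hypothetical, the sample rows are copied from the listing above, and it assumes the three-column `pcpu,pmem,comm` layout shown there.

```python
def unreadable_cpu_rows(ps_output):
    """Return the COMMAND names whose %CPU field is not a number."""
    bad = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the "%CPU %MEM COMMAND" header
        fields = line.split(None, 2)  # the command itself may contain spaces
        if len(fields) < 3:
            continue
        pcpu, _pmem, comm = fields
        try:
            float(pcpu)
        except ValueError:
            # "?.?" (or anything else non-numeric) lands here
            bad.append(comm)
    return bad


# Rows copied from the ps listing in this post.
SAMPLE = """\
%CPU %MEM COMMAND
?.? 4.3 kernel idle
?.? 0.0 init
?.? 0.0 vold
"""

print(unreadable_cpu_rows(SAMPLE))
```

A non-empty result from a periodic check would mark the time window in which ps could not read the CPU figures.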

Do you have any idea?

Thanks for help
4 REPLIES
Martin Moore
HPE Pro

Re: strange behavior of ES80

What patch kits are installed? Can you post the output of "dupatch -track -type kit" ?

Martin
I work for HP

Palima
Occasional Advisor

Re: strange behavior of ES80

Here is the output


Patches installed on the system came from following software kits:
------------------------------------------------------------------

- T64KIT0019505-V51BB22-E-20030802 OSF540
- T64KIT0019662-V51BB22-E-20030818 OSF540
- T64KIT0025601-V51BB26-E-20050513 OSF540
- T64KIT0026246-V51BB26-E-20050819 OSF540
- T64KIT0026447-V51BB26-ES-20050914 OSF540
- T64KIT1000237-V51BB26-E-20051219 OSF540
- T64KIT1001138-V51BB27-E-20070228 OSF540
- T64KIT1001143-V51BB27-ES-20070305 OSF540
- T64KIT1001176-V51BB27-E-20070328 OSF540
- T64KIT1001178-V51BB27-E-20070330 OSF540
- T64KIT1001187-V51BB27-E-20070404 OSF540
- T64KIT1001188-V51BB27-ES-20070404 OSF540
- T64KIT1001259-V51BB27-E-20070717 OSF540
- T64KIT1001268-V51BB27-ES-20070806 OSF540
- T64KIT1001279-V51BB27-E-20070817 OSF540
- T64KIT1001398-V51BB27-ES-20071207 OSF540
- T64KIT1001449-V51BB27-E-20080304 OSF540
- T64KIT1001450-V51BB27-E-20080305 OSF540
- T64KIT1001460-V51BB27-ES-20080310 OSF540
- T64KIT1001509-V51BB27-E-20080611 OSF540
- T64V51BB22-C0018300-19155-ES-20030702 OSF540
- T64V51BB22-C0019200-19212-E-20030710 OSF540
- T64V51BB22AS0002-20030415 OSF540
- T64V51BB22AS0002-20030415 TCR540
- T64V51BB26AS0005-20050502 OSF540
- T64V51BB26AS0005-20050502 TCR540
- T64V51BB27AS0006-20061208 OSF540
- T64V51BB27AS0006-20061208 TCR540
- TCRKIT1001339-V51BB27-E-20071008 TCR540
- TCV51BB22-C0002701-19121-E-20030630 TCR540



I also want to mention something about the kernel variables on my system. The proc subsystem attributes are:
proc:
max_per_proc_address_space = 8589934592
max_proc_per_user = 1024
max_threads_per_user = 4096
maxusers = 8192
per_proc_address_space = 8589934592
per_proc_data_size = 1073741824

I know that some of these values are recommended to be at least as large as physical memory (24 GB), but I don't know whether that could be the problem.
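For scale, the proc values can be converted to GiB and set against the installed memory. This is a minimal Python sketch, assuming the values are in bytes and that "24 GB" means 24 GiB; the dictionary only restates numbers from the listing above, and the helper name `limits_in_gib` is hypothetical.

```python
# Values copied from the proc listing above (assumed to be in bytes).
PROC_LIMITS = {
    "max_per_proc_address_space": 8589934592,
    "per_proc_address_space": 8589934592,
    "per_proc_data_size": 1073741824,
}

GIB = 1024 ** 3
INSTALLED_GIB = 24  # assumption: the 24 GB in the post means 24 GiB


def limits_in_gib(limits):
    """Return each limit converted from bytes to GiB."""
    return {name: value / GIB for name, value in limits.items()}


for name, gib in limits_in_gib(PROC_LIMITS).items():
    share = gib / INSTALLED_GIB
    print(f"{name}: {gib:g} GiB ({share:.0%} of installed memory)")
```

Both address-space limits come out at 8 GiB and the data-size limit at 1 GiB. These are per-process ceilings, so whether they matter depends on how much memory any single process needs.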

Thanks, Martin, for the fast attention.





Martin Moore
HPE Pro

Re: strange behavior of ES80

It sounds like the system is temporarily running low on memory, or at least on memory available for certain operations. This could possibly explain all the symptoms you report: top failing to start because it can't allocate memory for its data structures; ps returning empty fields because it couldn't allocate data structures for reading info from the kernel; cron jobs failing to start because there's insufficient memory to fork/exec a new job. Let me emphasize that this is speculation after the fact, so the above is not a certainty, but I've seen similar things happen before.
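One way to bracket the window in which new processes cannot be started is a small probe that tries to spawn a trivial child at intervals and logs the result. The following is only a sketch, assuming a Python interpreter is available on the host (not a given on Tru64 5.1B); the names `spawn_probe`, `log_line`, and `monitor` are hypothetical.

```python
import subprocess
import sys
import time


def spawn_probe():
    """Try to start a trivial child process; return (ok, detail)."""
    try:
        subprocess.run([sys.executable, "-c", "pass"], check=True, timeout=30)
        return True, "ok"
    except (OSError, MemoryError, subprocess.SubprocessError) as exc:
        # OSError covers fork/exec failures such as ENOMEM or EAGAIN.
        return False, repr(exc)


def log_line(ok, detail, now=None):
    """Format one timestamped result line (now=None means the current time)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} spawn={'ok' if ok else 'FAILED'} {detail}"


def monitor(interval_s=60, rounds=None):
    """Probe every interval_s seconds; rounds=None means run until stopped."""
    done = 0
    while rounds is None or done < rounds:
        print(log_line(*spawn_probe()), flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Calling `monitor()` from a long-running session started before nightfall, with output redirected to a file, would leave a timestamped trail. Since the probe is already running, it does not depend on cron being able to start it, and a gap in the log would itself be informative.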

Do you run "collect"? If not, enabling it may help identify what's happening in the time leading up to the problem.

Martin
I work for HP

Palima
Occasional Advisor

Re: strange behavior of ES80

Martin,

I had the same suspicion, so I ran the collect command the night after the problem last occurred; I don't have collect output from during the problem itself. However, I am having difficulty analyzing the output. I ran it with the -s cmp option, so to analyze the memory section I have a lot of columns and I don't know which ones to focus on for this problem. I think the relevant one is the "Free" field, is that right? I've read that it is the free memory in megabytes, but I always find it very small relative to my memory size, even when I run collect at noon. I also have 8 rows of values, and I think each row relates to one CPU… oh my God.

Could you help me regarding the output analysis?

Thanks