
a nice enigma!

 
Stefan Farrelly
Honored Contributor

Re: a nice enigma!


Hi Tim,

Aha, so they do have different numbers of filesets (patches + software) installed. The only way to ensure the software installs are identical is to start by ensuring the same number of installed filesets. Just curious: how many filesets apart were they?
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Tim D Fulford
Honored Contributor

Re: a nice enigma!

sn1c --> 1361
sn2b --> 1734

I have checked "patches", which is probably more important, and there are many differences. I'm not wholly convinced by the patch theory, but I will dig a bit deeper.

On a slightly different tack, I looked at another cluster running similar (but different version) software and found that, despite having run for some 7-8 weeks, the priorities are 20.

My current favorite theory is the background process, as ALL the processes fathered by pmd have a nice value of 24, even the ones with a priority of 0 (zero)...

Any more thoughts, anyone? Generosity is my middle name....

Tim
-
Mladen Despic
Honored Contributor

Re: a nice enigma!

Tim,

From your 'top' and 'glance' samples, pmd is only active on the 2nd node. It may be interesting to know which processes on the 1st node are consuming CPU "nicely".

Also, if the patches on the two nodes are different, that *may* be the cause. Have you also checked with 'swlist -l fileset -a state' if all patches are configured?
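
A minimal sketch of that check, assuming each fileset line of the 'swlist -l fileset -a state' output ends with its state as the last field and that header/product lines start with "#". The function name not_configured and the sample patch names are made up for illustration.

```shell
#!/bin/sh
# not_configured: read `swlist -l fileset -a state` output on stdin and print
# "<fileset> <state>" for every fileset whose state is not "configured".
# Lines starting with "#" (headers, product names) and blank lines are skipped.
# Assumption: the state is the last whitespace-separated field on the line.
not_configured() {
    awk '/^[ \t]*#/ || NF == 0 { next }
         $NF != "configured"   { print $1, $NF }'
}

# On a live system: swlist -l fileset -a state | not_configured
# Made-up sample input (hypothetical patch and fileset names):
not_configured <<'EOF'
# PHXX_00001
  PHXX_00001.FILESET-A    configured
# PHXX_00002
  PHXX_00002.FILESET-B    installed
EOF
```

Run on both nodes, an empty result on each would mean every fileset is in the "configured" state.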

Mladen
Mark van Hassel
Respected Contributor
Solution

Re: a nice enigma!

Hi Tim,

Nice values don't change over time. They are set when a process starts or, indeed, inherited from the parent.
The thing that does change is priority (see top). When time-shared processes run, they lose priority, and they regain it as they wait their turn to run. A process's nice value is used as a factor in calculating how fast the process regains priority.
Priority queues:
-32 to -1   : Real time (POSIX)
0 to 127    : HP-UX real time (rtprio)
128 to 251  : Time share processes
252 to 255  : Swapped processes
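
As a quick lookup, the ranges above can be sketched as a small shell function. The ranges are copied from this post as-is, not checked against HP-UX documentation, and the name classify_pri is made up for illustration.

```shell
#!/bin/sh
# classify_pri: map a numeric priority (as shown by top or ps -l) to the
# queue names listed above. Ranges are taken from the post at face value.
classify_pri() {
    pri=$1
    if [ "$pri" -ge -32 ] && [ "$pri" -le -1 ]; then
        echo "POSIX real time"
    elif [ "$pri" -ge 0 ] && [ "$pri" -le 127 ]; then
        echo "HP-UX real time (rtprio)"
    elif [ "$pri" -ge 128 ] && [ "$pri" -le 251 ]; then
        echo "time share"
    elif [ "$pri" -ge 252 ] && [ "$pri" -le 255 ]; then
        echo "swapped"
    else
        echo "unknown"
    fi
}

classify_pri 154   # a typical time-share priority
classify_pri 0     # the priority-0 processes mentioned earlier in the thread
```

By this table, a process showing priority 0 is in the rtprio range, so its nice value plays no part in how it is scheduled.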

HtH,

Mark
The surest sign that life exists elsewhere in the universe is that none of it has tried to contact us
Mladen Despic
Honored Contributor

Re: a nice enigma!

Tim,

You can also check the differences between the files /var/adm/sw/swagent.log on the two nodes.
Another useful check may be the output from 'kmtune'. Any differences may point you further in terms of how the two nodes are different.
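
One way to do such a comparison, sketched under the assumption that the output from each node has been saved to a file first. The function name only_in_one and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# only_in_one: print the lines that appear in exactly one of two files.
# Lines found only in the first file are prefixed "< ", lines found only
# in the second are prefixed "> ". Both inputs are sorted first because
# comm requires sorted input.
only_in_one() {
    a_sorted="${TMPDIR:-/tmp}/only_in_one.$$.a"
    b_sorted="${TMPDIR:-/tmp}/only_in_one.$$.b"
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    comm -23 "$a_sorted" "$b_sorted" | sed 's/^/< /'
    comm -13 "$a_sorted" "$b_sorted" | sed 's/^/> /'
    rm -f "$a_sorted" "$b_sorted"
}

# Hypothetical usage, after running `kmtune > /tmp/kmtune.<node>` on each node
# and copying both files to one machine:
#   only_in_one /tmp/kmtune.sn1c /tmp/kmtune.sn2b
```

The same function works on saved 'swlist' listings, which would show exactly which filesets account for the difference in counts.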

As for the CPU utilization, can you list the top 2 or 3 processes that consume the most CPU on each system?

Mladen
Paula J Frazer-Campbell
Honored Contributor

Re: a nice enigma!

Tim

For the usr/sys and nice figures to match, two identical machines running the same jobs would have to have exactly the same processes at the same point of execution at the same time.

Even this is unlikely, as the hardware throughput of the devices (CPU, memory, etc.), whilst rated the same, is not identical.


So even if you have processes niced exactly the same on both machines, the nice figures from top or glance will never match.

Paula
If you can spell SysAdmin then you is one - anon
Tim D Fulford
Honored Contributor

Re: a nice enigma!

pmd is running on both nodes; if not (believe me) we would be in deeeeep do-do's. I do appreciate that pmd does not run continuously; it does very little (spawns, re-spawns, starts, halts and monitors its children). It may well not show much in the first (immediate) Glance sample, but you will see that it has consumed some 91.3 seconds of CPU since 11 May.

As far as the configured state of the software goes, everything is "configured". There are a few items in the "installed" state, but I can explain these; nothing is "partial" or "corrupt".

regards

Tim
-
Tim D Fulford
Honored Contributor

Re: a nice enigma!

Mladen - I do not need to run top to tell you that it is fsdlexe | errord; they are ALWAYS top-of-the-pops. If not, we're doing no work.

I have, however, done the following:
# ps -el | awk '$8=="24"{print $0}'

This shows that ALL the processes started by pmd have a nice value of 24. As I believe nice is an inherited value, I think this is damning evidence that someone either re-niced pmd or started the application as a background process.
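
To see at a glance which parent the niced processes hang off, the same idea can be sketched as a small filter. It assumes the 'ps -el' column layout F S UID PID PPID C PRI NI ... (so NI is field 8 and PPID is field 5, as in the one-liner above). The function name niced_by_parent and the sample data, including the worker names, are made up for illustration.

```shell
#!/bin/sh
# niced_by_parent: read `ps -el` style lines on stdin and print, per parent
# PID, how many processes have a nice value (field 8) other than the
# default 20. The first line is assumed to be the ps header and is skipped.
niced_by_parent() {
    awk 'NR > 1 && $8 != 20 { count[$5]++ }
         END { for (ppid in count) print ppid, count[ppid] }' | sort -n
}

# On a live system: ps -el | niced_by_parent
# Made-up sample input (not taken from sn1c or sn2b):
niced_by_parent <<'EOF'
  F S  UID   PID  PPID  C PRI NI     ADDR   SZ    WCHAN TTY   TIME COMD
  1 S  100  2001     1  0 154 24  4a1b2c0  120  7ffe6000 ?    1:31 pmd
  1 S  100  2010  2001  0 154 24  4a1b3d0  340  7ffe6000 ?    0:12 worker_a
  1 R  100  2011  2001  0   0 24  4a1b4e0  340        - ?     5:40 worker_b
  1 S    0   900     1  0 154 20  4a1b5f0   80  7ffe6000 ?    0:01 cron
EOF
```

Each output line is a parent PID followed by the number of its children running at a non-default nice value. If nearly all of them share one parent, that parent is where the nice value was most likely introduced.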

Paula - I'm not sure what you are saying.
a) No two machines are alike, therefore you would not expect to see usr/nice the same. I partially agree, but I would not expect to see the pattern in the nice.txt file, which is totally reversed.
b) The machines are different, so the nice values will be different. I disagree: I would expect to see a nice value of 20 across the board, as it is the same software/binaries (with some minor exceptions).

I'm still figuring that someone started the application in the background or re-niced pmd.

Regards

Tim
-
Paula J Frazer-Campbell
Honored Contributor

Re: a nice enigma!

Tim

How about nicing the process to what it should be and monitoring it?


I use a script that picks up certain logins and nices them down, as their routines can cause load problems on my main server.

I am sure that you can modify it to monitor the nice value of your process.

--------------------------------------------
#!/bin/ksh
# Automatically nice down the ftpbbs universe routines
######################################################
# PJFC 2001
######################################################
# Get parent pids
######################################################
q=`who -u | grep ftpbbs `
p=`who -u | grep ftpbbs | awk '{print $7, $15 }'`
######################################################
# Separate each pid to a string
######################################################
a=`echo $p | awk '{print $1}'`
b=`echo $p | awk '{print $2}'`
######################################################
# Pick up pid of universe process and nice value
######################################################
y=`ps -efl | grep $a | grep -v grep | grep -v sh | grep root | grep uv | awk '{print $4}'` # PID
z=`ps -efl | grep $a | grep -v grep | grep -v sh | grep root | grep uv | awk '{print $8}'` # Nice value
######################################################
# Check nice value (quoted so an empty result does not break the test)
if [ "$z" = 20 ]
######################################################
# If nice value = 20 then a restart has occurred so nice it down
######################################################
then
renice -n 19 $y
fi
######################################################
# Do it all again for other ftpbbs login
######################################################
w=`ps -efl | grep $b | grep -v grep | grep -v sh | grep root | grep uv | awk '{print $4}'`
x=`ps -efl | grep $b | grep -v grep | grep -v sh | grep root | grep uv | awk '{print $8}'`
######################################################
# Check nice value (quoted so an empty result does not break the test)
if [ "$x" = 20 ]
######################################################
# If nice value = 20 then a restart has occurred so nice it down
######################################################
then
renice -n 19 $w
fi
echo "Renice ran "
# Exit 0 so callers such as cron do not see a normal run as a failure
exit 0

---------------------------------------------

HTH

Paula
If you can spell SysAdmin then you is one - anon
John Bolene
Honored Contributor

Re: a nice enigma!

Processes do not just get niced for no reason.

They have to be niced from the command line when they start, or changed by someone after they have started running.

Nicing is a people thing; HPUX does not just nice processes because it feels like it.
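
A small sketch of how a nice value set on the command line carries down to descendants. It assumes a `nice` that prints the caller's current nice value when run with no command (GNU coreutils does this; not verified for HP-UX, where `ps -l` would show the same thing in the NI column). The variable names are made up for illustration.

```shell
#!/bin/sh
# Assumption: `nice` with no command prints the current nice value
# (GNU coreutils behaviour; not verified for HP-UX).

# Nice value of this shell.
parent=$(nice)

# A child started with an explicit increment of 4.
child=$(nice -n 4 sh -c 'nice')

# A grandchild: started by the niced child with no increment of its own,
# so it simply inherits whatever its parent was given.
grandchild=$(nice -n 4 sh -c 'sh -c nice')

echo "parent=$parent child=$child grandchild=$grandchild"
```

The child and grandchild report the same value, offset from the parent by the increment, which is the pattern seen under pmd: one niced ancestor and every descendant at the same nice value.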
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com