03-30-2005 01:55 AM
Performance problem on a GS1280
I have some trouble with a GS1280 (32 CPUs, 64 GB RAM) that is heavily loaded and shows performance problems. This computer is used to run large parallel applications that use from 1 GB to 2 GB per process. I have observed that the system (Tru64 V5.1) often switches the application processes from one processor to another, which is not good at all for performance. I would expect each process to be "nearly bound" to one processor during the whole execution time (or at least not to switch too often), in order to avoid the time spent moving data in cache and memory (am I clear?).
So, should I change something in the sysconfigtab file to avoid such behavior?
The round_robin_switch_rate is 25 and sched-min-idle is not defined. I ran vmstat 1
and the output is in the attachment.
For instance, even when the computer is not fully loaded (10 CPUs free, say), I do not get the same elapsed time if I run the same parallel application twice. The difference can be on the order of 20-30%.
The parallel applications I use are very demanding on memory bandwidth but not so much on communication between processors.
If someone can help me, or at least tell me what to look at in order to solve the problem, that would be great.
Regards
Florent Boucher
03-30-2005 02:34 AM
Re: Performance problem on a GS1280
03-30-2005 03:34 AM
Re: Performance problem on a GS1280
At the moment there are 4 different applications running on the cluster, and all of them use MPI. I have attached the output from top and also from the ps command with specific options.
In the present case, the ps output shows that CPU #10 is not used and CPU #12 is used twice. That just means that the system has switched processes from one processor to another. However, from my point of view there is no reason to do this, and it is not at all efficient when the processes use a lot of memory. Is there a way to reduce this switching from one processor to another in order to keep optimum cache and memory performance?
03-30-2005 03:45 AM
Re: Performance problem on a GS1280
vm:
replicate_user_text
vm_bigpg_enabled - the neat big pages feature, which seems to fit your case.
(see man sys_attrs_vm)
generic:
sched_distance
(man sys_attrs_generic)
2) Try running sys_check and healthcheck - they can give some ideas.
3) There is a small tool by Hein van den Heuvel which can show how memory is allocated for a particular process.
See attachment.
4) Run the xmesh utility.
03-30-2005 04:14 AM
Re: Performance problem on a GS1280
My first step would be to investigate using 'runon -r'.
This tells the system to run a command on a selected RAD, which stops the movement.
You can specify multiple RADs, and of course you would select adjacent ones.
Hein.
03-30-2005 04:28 AM
Re: Performance problem on a GS1280
03-30-2005 04:46 AM
Re: Performance problem on a GS1280
If you have "a GS1280 (32CPU 64Go RAM)" as you indicate, then you will normally have 32 single-CPU RADs. It is possible to set up the system with 16 double-CPU RADs, but that is rarely done/justified.
When I was toying with an application needing about 3 CPUs per instance, I used a script to launch it, and the script parameters looked like:
database_start_prefix = runon -r 0 -r 2 -r 4
central_start_prefix = runon -r 6 -r 7
01_start_prefix = runon -r 1 -r 3 -r 5
02_start_prefix = runon -r 8 -r 10 -r 12
03_start_prefix = runon -r 9 -r 11 -r 13
04_start_prefix = runon -r 16 -r 18 -r 20
05_start_prefix = runon -r 17 -r 19 -r 21
06_start_prefix = runon -r 24 -r 26 -r 28
07_start_prefix = runon -r 25 -r 27 -r 29
08_start_prefix = runon -r 14 -r 15 -r 22
09_start_prefix = runon -r 23 -r 30 -r 31
If you double-check that, then you'll see near-adjacent CPUs being used.
Unlike PSETS, runon -r is NOT exclusive.
So a single RAD can be assigned to multiple application chunks.
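Launch prefixes like the ones above can be generated and sanity-checked instead of typed by hand. The following is a minimal Python sketch, assuming 32 single-CPU RADs numbered 0-31 as described in this post. The helper names `runon_prefix` and `rad_usage` are made up for illustration, and only the `runon -r N` form shown above is used.

```python
from collections import Counter

# RAD groups copied from the script parameters quoted in the post.
RAD_GROUPS = {
    "database": [0, 2, 4],
    "central": [6, 7],
    "01": [1, 3, 5],
    "02": [8, 10, 12],
    "03": [9, 11, 13],
    "04": [16, 18, 20],
    "05": [17, 19, 21],
    "06": [24, 26, 28],
    "07": [25, 27, 29],
    "08": [14, 15, 22],
    "09": [23, 30, 31],
}


def runon_prefix(rads):
    """Build a 'runon -r A -r B ...' prefix for the given RAD numbers."""
    return "runon " + " ".join(f"-r {rad}" for rad in rads)


def rad_usage(groups):
    """Count how many groups each RAD belongs to (runon -r is not exclusive)."""
    return Counter(rad for rads in groups.values() for rad in rads)


if __name__ == "__main__":
    for name, rads in RAD_GROUPS.items():
        print(f"{name}_start_prefix = {runon_prefix(rads)}")
    usage = rad_usage(RAD_GROUPS)
    # Assumption: the machine has RADs 0..31.
    print("unused RADs:", [rad for rad in range(32) if usage[rad] == 0])
    print("shared RADs:", [rad for rad, n in sorted(usage.items()) if n > 1])
```

Because `runon -r` is not exclusive, a RAD listed in two groups is not an error. The usage count only makes such sharing visible so it can be a deliberate choice.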
Unfortunately I know nothing about MPI, so I'll have to refrain from comment. If you cannot split the application into large chunks per system, there may be no hope. But if you have a choice between one 'solution' using a clump of 32 threads and a subdivision into 4 - 8 clumps of 8 - 4 CPUs, then this may lead to happiness.
Check out "vmstat -P", and of course 'man numa_intro'.
Hein.
03-30-2005 07:18 AM
Re: Performance problem on a GS1280
I have used the small program you sent me.
At the moment there are 32 processes running on the computer. Using xmesh, I can see a very large transfer between CPU #10 and CPU #12. When I use the ps command with the options given in the attachment, CPU #10 is not shown as running (but xmesh shows that it is), and top shows 32 processes running with a load close to 100% for each process. Furthermore, CPU #12 seems to have two processes, which is quite strange. It seems to me that ps reports the CPU number as the one that has the most pages allocated for the process. The program you sent me makes it clear that two processes have most of their memory on the same CPU (#12). Do you know how this can happen? And why is the system not able to move the memory of one process to CPU #10, which is not used?
We use an LSF scheduling policy that suspends certain jobs when higher-priority jobs want to run and then restarts them when some CPUs are available. Could that be the problem?
Concerning the large page memory, can you give me more details on how to manage it?
Regards
Florent
03-30-2005 07:32 AM
Re: Performance problem on a GS1280
I have now seen another process whose memory is spread across two processors (#21 and #23)! So xmesh shows a lot of transfer between these two. Do you think this is expected behavior?
03-30-2005 11:59 PM
Re: Performance problem on a GS1280
Could CPU 10 be the home of process 644579, which seems to have started the 7 castepexe_mpi.exe worker processes? If so, would it not be the source for 'cow' (copy-on-write) pages and so on?
>> It seems to me that ps is reporting the CPU number as the one that has the maximum page allocated for this process.
NO. ps reports whatever CPU the process is running on. But the Tru64 scheduler tries its utmost to keep the CPU and memory together. Your observations confirm that the scheduler/swapper is doing a good job!
Processes have a 'home rad' and a 'current rad'.
The system 'reluctantly' moves processes away from home. It is the idle thread on idle CPUs which pulls in / moves over processes if another RAD is seen as being overloaded.
Is the ps command not a case of the measuring influencing the measurement? When it runs on a CPU, nothing else runs on that CPU.
Looking back at your original vmstat, I would really think your system is doing a fine job. There may be a few % more here or there, but in general what you have shown looks pretty good.
Have you gotten a chance to experiment with runon? You could use that to take an 8-thread job and make sure it stays in a 4-hop zone, and such. These jobs run for a while, do they not? You could also use runon to force a child process to stay on a selected CPU after the fact.
Like the vm/rad program?
Hein.
04-01-2005 03:39 AM
Re: Performance problem on a GS1280
>>Do you know how it can happen ? - No...
>>And why the system is not able to switch the memory of one process to the CPU #10 that is not used ? - As far as I know, Tru64 does its best to schedule a process close to its memory. But I have never heard of Tru64 relocating a process's memory to another RAD...
>>Concerning the large page memory, can you give me more details about the way to manage ?
1) # man sys_attrs_vm, read everything around vm_bigpg_*
2) Run the Kernel Tuner, section vm, set vm_bigpg_enabled=1
and reboot.
Your application seems to be memory-intensive, i.e. a good candidate for the feature. Please tell us if you get performance benefits from big pages.
I've just enabled the feature and got results (see attachment, it's an Oracle dbwriter process).
But this will not resolve the 'Foreign RAD' problem.
3) >>have seen an other process that share now the memory on two processors (#21 and #23) ! So xmesh is showing lot of transfert between this two. Do you think this is an expected behavior ? - Yes, definitely.
4) If you want to pin a process to its memory, then go for either 'runon' or sched_distance (but sched_distance<=1 can harm performance).
5) Sorry, I am not an HP person, I am just selling lipsticks for Avon :-)
04-02-2005 06:19 AM
Re: Performance problem on a GS1280
04-04-2005 03:26 AM
Re: Performance problem on a GS1280
I have not had time yet to test the vm_bigpg option. For this, I have to reboot the system, and I should send a notification to the users. I think I will make this change in the middle of the week. In the meantime, I would like to come back to the difference between "home rad" and "current rad". On our system, jobs are often submitted for many hours (days). So, with a scheduler policy that can suspend one job to start another, it seems possible that two (or even more) heavy jobs have the same "home rad". Am I right? Of course, Unix will try to have a different "current rad" for every very demanding process.
Is there a way to optimize how the "home rads" are distributed? One can imagine that Unix could, in my case, move the "home rad" of process 688877 to RAD #10 in order to avoid the large transfer between processors #12 and #10?
Concerning runon, it is impossible to use with MPI jobs, so I do not think I will keep this solution.
Regards
Florent
04-05-2005 03:32 AM
Re: Performance problem on a GS1280
04-17-2005 11:26 PM
Re: Performance problem on a GS1280
for system and 1% for user if the free memory of that RAD is too low. I also observed that 2 different jobs get memory from the same RAD. Maybe you have a similar problem, but not fully evolved. Unfortunately I don't see the vmstat -R or the ps output mentioned in this thread.
Please have a short look at
http://www.uni-magdeburg.de/urzs/marvel/vmbug3.html
to see what I am talking about.
04-17-2005 11:27 PM
Re: Performance problem on a GS1280
04-18-2005 01:09 AM
Re: Performance problem on a GS1280
It seems to me that we have exactly the same problem. For the moment, there is no news at all from HP support. I have attached the first output from vmstat -R 5 and the information about memory allocation for the two processes that have the problem.
I hope somebody will give us a "good" answer to solve the problem.
Regards
Florent
04-18-2005 01:43 AM
Re: Performance problem on a GS1280
The tool for analyzing VM allocation was posted by Alexey. You can find it at the beginning of the thread.
I have attached it again.
Regards
Florent
04-18-2005 02:01 AM
Re: Performance problem on a GS1280
I am happy that I am not alone. Thanks for the attachment. I just overlooked the paperclip symbol on the replies.
I will try out the program together with my test program tomorrow on the empty machine. Today it's too late for long experiments.
Best regards,
Joerg.
04-20-2005 12:55 AM
Re: Performance problem on a GS1280
tests. I did some bad things. First, I called date, vmstat and ps from the program using the system() call. As I remember, that is not very clever, because usually fork + exec is called, and that means the big multi-GB process is (virtually) doubled for a short time.
I saw that date, ps, etc. took a long time instead of responding quickly. So the outcome of my tests is not optimal.
I will try to give more details on the mentioned page and in another forum thread (subject: slow down (swapping) on a GS1280 with lot of free memory). As you can understand, it's not my job to use our expensive machine as a test machine and reboot it every day.
At first I saw the system become very slow when the free pages of one RAD fell below 3000, down to 10, which was usually the case at the 16th GB (with and without swap).
Today swap was growing very slowly, and speed was not as bad as some days ago, which could be a result of the other users (I did not reboot before the new tests).
04-20-2005 01:27 AM
Re: Performance problem on a GS1280
The program was originally posted in:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=644238
I found it again using Google: "+tru64 +rad +gs1280 +site:itrc.hp.com"
IMHO this currently is (unfortunately) the best way to search the ITRC forum:
Google: +
Regards,
Hein.
05-11-2005 02:35 AM
Re: Performance problem on a GS1280
If you read the comments on my thread, please keep in mind that I don't use big pages, which probably makes things more complicated.
If you look at your vmstat -R output you see that RAD0.free=491, RAD5.free=500 and RAD6.free=3299. At least the values for RAD0 and RAD5 are too low and trigger paging (high acti and pin/pout greater than 0). That causes the performance loss.
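A quick way to flag this condition is to compare per-RAD free-page counts against a threshold. The following is a minimal Python sketch. The counts are the ones quoted above, but the threshold of 3000 pages is only an assumption borrowed from the slowdown reported earlier in this thread, and `low_rads` is a made-up helper name. Parsing real `vmstat -R` output is deliberately left out.

```python
def low_rads(free_pages, threshold=3000):
    """Return the RAD numbers whose free-page count is below the threshold."""
    return sorted(rad for rad, free in free_pages.items() if free < threshold)


# Free pages per RAD as quoted from the vmstat -R output in this thread.
free_pages = {0: 491, 5: 500, 6: 3299}

if __name__ == "__main__":
    # The threshold is an assumption, not a documented Tru64 limit.
    print("RADs short of free pages:", low_rads(free_pages))
```

With the quoted values, RAD0 and RAD5 fall under the assumed threshold while RAD6 does not, which matches the reading given above.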
I guess that the following stupid things happened:
First, RAD0 has some memory in use for all the unnecessary daemons, maybe a running Java, etc.
When you start MPI, the first job will probably be started on RAD0, consuming the rest of its memory and stealing memory from its neighbour, which gives you only a small performance loss. Maybe this process then goes waiting for other MPI threads. The next (or a later) MPI thread is also started on RAD0, because RAD0 is idling and does not know that the new thread also needs memory, which is not available locally. No problem, it takes it from the neighbours again. And so on. Now the management of the stolen memory also needs memory (wired memory), and RAD0.free becomes low enough to trigger paging. The same happens on other RADs.
So you have RSS
If the system started stealing pages from other RADs before the memory got so low, everything would be fine, though not perfect.
I think you get the optimum speed if you are able to tell each MPI thread where to start. With a good MPI implementation I would expect the system/library to do that for you. In that case each process would consume local memory and never have to steal pages from neighbours. But if the page stealing worked fine, you would only have the data flow over the processor links, which is also very fast.
Do you use an MPI library delivered by HP?
Try to check where each job is started (CPU + RAD), and if some jobs are started on the same processor, ask HP what the hell the designer of the HP MPI implementation was thinking when he adapted it to HP's NUMA.
I would not be surprised if the MPI package has no adaptations to the GS1280 NUMA technology *sigh*.
Probably they think that the load manager will do the job of the MPI manager.
But you could probably use a trick to outwit that balancing: add a CPU-consuming function to each MPI thread. For example, calculate pi for 10 seconds, giving the load balancer enough time to put each job on another RAD, and only after that start to consume memory.
Maybe that fails too, but it's an easy test.
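The warm-up trick could look roughly like this. It is a minimal Python sketch of the idea only: a real MPI worker would do this in its own language before its first large allocation, the function names `warm_up` and `start_worker` are made up, and whether the pages really end up local depends on the Tru64 scheduler, not on this code.

```python
import time


def warm_up(seconds):
    """Burn CPU for about `seconds` by summing the Leibniz series for pi."""
    deadline = time.monotonic() + seconds
    total, sign, k = 0.0, 1.0, 0
    while True:
        total += sign / (2 * k + 1)
        sign = -sign
        k += 1
        if time.monotonic() >= deadline:
            break
    return 4.0 * total


def start_worker(warm_up_seconds, nbytes):
    """Spin first so the load balancer can spread the workers, then allocate."""
    estimate = warm_up(warm_up_seconds)
    buffer = bytearray(nbytes)  # memory is requested only after the warm-up
    return estimate, buffer


if __name__ == "__main__":
    # 10 seconds is the figure suggested above; the buffer size is arbitrary.
    est, buf = start_worker(10.0, 1024 * 1024)
    print(f"pi ~ {est:.6f}, allocated {len(buf)} bytes")
```

The pi estimate is a by-product. The point is only that each worker looks busy to the load balancer before it starts asking for memory.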