03-30-2005 01:55 AM
Performance problem on a GS1280
I am having trouble with a GS1280 (32 CPUs, 64 GB RAM) that is heavily loaded and shows performance problems. This machine is used to run large parallel applications that use 1 GB to 2 GB per process. I have observed that the system (Tru64 V5.1) often switches the applications from one processor to another, which is not good at all for performance. I would expect each process to be "nearly bound" to one processor during the whole execution time (or at least not to switch too often), in order to avoid the time spent moving data in cache and memory (am I clear?).
So, should I change something in the sysconfigtab file to avoid such behavior?
The round_robin_switch_rate is 25 and sched-min-idle is not defined. I ran vmstat 1
and the output is in the attachment.
For instance, even when the machine is not fully loaded (10 CPUs free, say), I do not get the same elapsed time if I run the same parallel application twice. The difference can be on the order of 20-30%.
The parallel applications I use are very demanding on memory bandwidth but not very demanding on communication between processors.
If someone can help me, or at least tell me what to look at in order to solve the problem, that would be great.
Regards
Florent Boucher
03-30-2005 02:34 AM
Re: Performance problem on a GS1280
03-30-2005 03:34 AM
Re: Performance problem on a GS1280
At the moment there are 4 different applications running on the cluster, and all of them use MPI. I have attached the output from top and also from the ps command with specific options.
In the present case, the ps output shows that CPU #10 is not used and CPU #12 is used twice. That just means that the system has switched processes from one processor to another. However, from my point of view there is no reason to do this, and it is not at all efficient when the processes use a lot of memory. Is there a way to reduce this switching from one processor to another, in order to keep optimum cache and memory performance?
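For what it's worth, the "CPU #10 not used, CPU #12 used twice" observation can be checked mechanically. Below is a minimal Python sketch, assuming the (pid, cpu) pairs have already been extracted from the ps output. The ps options are in the attachment and are not reproduced here, and the PIDs and the 16-CPU sample are made up purely for illustration.

```python
from collections import Counter

def cpu_occupancy(samples, n_cpus):
    """Given (pid, cpu) pairs, return (idle_cpus, shared_cpus).

    idle_cpus   -- CPUs with no process reported on them
    shared_cpus -- {cpu: process_count} for CPUs reported more than once
    """
    counts = Counter(cpu for _pid, cpu in samples)
    idle = [cpu for cpu in range(n_cpus) if counts[cpu] == 0]
    shared = {cpu: n for cpu, n in sorted(counts.items()) if n > 1}
    return idle, shared

# Hypothetical data: one process per CPU on a 16-CPU box, except that the
# process expected on CPU 10 is reported on CPU 12 instead.
samples = [(1000 + cpu, cpu) for cpu in range(16) if cpu != 10] + [(2000, 12)]

idle, shared = cpu_occupancy(samples, 16)
print("idle CPUs:", idle)
print("CPUs with more than one process:", shared)
```

Taking several snapshots a few seconds apart and comparing them would show whether a placement is stable or just a momentary state.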
03-30-2005 03:45 AM
Re: Performance problem on a GS1280
vm:
replicate_user_text
vm_bigpg_enabled - neat big pages feature, seems to fit your case.
(see man sys_attrs_vm)
generic:
sched_distance
(man sys_attrs_generic)
2) Try running sys_check and healthcheck - they can give some ideas.
3) There is a small tool by Hein van den Heuvel which can show how memory is allocated for a particular process.
See attachment.
4) Run the xmesh utility.
03-30-2005 04:14 AM
Re: Performance problem on a GS1280
My first step would be to investigate using 'runon -r'.
This tells the system to run a command on a selected RAD, and stops movement that way.
You can specify multiple RADs, and of course you would select adjacent ones.
Hein.
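As a rough illustration of the runon -r idea, here is a small Python sketch that only builds the command line and does not execute anything. It assumes the repeated "-r N" form that appears in the launch-script parameters in this thread; the application name ./my_app and its arguments are hypothetical.

```python
import shlex

def runon_prefix(rads):
    """Build the 'runon -r N -r M ...' prefix as an argument list."""
    if not rads:
        raise ValueError("at least one RAD is required")
    argv = ["runon"]
    for rad in rads:
        argv += ["-r", str(int(rad))]
    return argv

def launch_command(rads, command):
    """Prepend the runon prefix to an existing command (list of strings)."""
    return runon_prefix(rads) + list(command)

# Hypothetical application and arguments; RADs 0, 2 and 4 chosen as an example.
argv = launch_command([0, 2, 4], ["./my_app", "--input", "case.dat"])
print(shlex.join(argv))
```

Keeping the command as a list until the last moment avoids quoting problems if it is later handed to subprocess.run on the target system.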
03-30-2005 04:28 AM
Re: Performance problem on a GS1280
03-30-2005 04:46 AM
Re: Performance problem on a GS1280
If you have a GS1280 with 32 CPUs and 64 GB of RAM, as you indicate, then you will normally have 32 single-CPU RADs. It is possible to set up the system with 16 double-CPU RADs, but that is rarely done/justified.
When I was toying with an application needing about 3 CPUs per instance, I used a script to launch it, and the script parameters looked like:
database_start_prefix = runon -r 0 -r 2 -r 4
central_start_prefix = runon -r 6 -r 7
01_start_prefix = runon -r 1 -r 3 -r 5
02_start_prefix = runon -r 8 -r 10 -r 12
03_start_prefix = runon -r 9 -r 11 -r 13
04_start_prefix = runon -r 16 -r 18 -r 20
05_start_prefix = runon -r 17 -r 19 -r 21
06_start_prefix = runon -r 24 -r 26 -r 28
07_start_prefix = runon -r 25 -r 27 -r 29
08_start_prefix = runon -r 14 -r 15 -r 22
09_start_prefix = runon -r 23 -r 30 -r 31
If you double-check that, you'll see near-adjacent CPUs being used.
Unlike PSETS, runon -r is NOT exclusive.
So a single RAD can be assigned to multiple application chunks.
Unfortunately I know nothing about MPI, so I'll have to refrain from commenting. If you cannot split the application into large chunks per system there may be no hope. But if you have a choice between one 'solution' using a clump of 32 threads, and a subdivision into 4 - 8 clumps of 8 - 4 CPUs, then this may lead to happiness.
Check out "vmstat -P", and of course 'man numa_intro'.
Hein.
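Since runon -r is not exclusive, it can be handy to check a set of prefixes for RADs that are left out or assigned twice. A minimal Python sketch, assuming the "name = runon -r N ..." parameter format shown above and a machine with 32 single-CPU RADs numbered 0 to 31; the helper names are made up for illustration.

```python
import re
from collections import defaultdict

PREFIXES = """\
database_start_prefix = runon -r 0 -r 2 -r 4
central_start_prefix = runon -r 6 -r 7
01_start_prefix = runon -r 1 -r 3 -r 5
02_start_prefix = runon -r 8 -r 10 -r 12
03_start_prefix = runon -r 9 -r 11 -r 13
04_start_prefix = runon -r 16 -r 18 -r 20
05_start_prefix = runon -r 17 -r 19 -r 21
06_start_prefix = runon -r 24 -r 26 -r 28
07_start_prefix = runon -r 25 -r 27 -r 29
08_start_prefix = runon -r 14 -r 15 -r 22
09_start_prefix = runon -r 23 -r 30 -r 31
"""

def parse_prefixes(text):
    """Map each parameter name to the list of RAD numbers in its runon prefix."""
    groups = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, command = (part.strip() for part in line.split("=", 1))
        groups[name] = [int(n) for n in re.findall(r"-r\s+(\d+)", command)]
    return groups

def rad_usage(groups):
    """Map each RAD number to the names of the groups that use it."""
    usage = defaultdict(list)
    for name, rads in groups.items():
        for rad in rads:
            usage[rad].append(name)
    return dict(usage)

groups = parse_prefixes(PREFIXES)
usage = rad_usage(groups)
unused = [rad for rad in range(32) if rad not in usage]
shared = {rad: names for rad, names in usage.items() if len(names) > 1}
print("unused RADs:", unused)
print("RADs assigned to more than one group:", shared)
```

For the parameters above, both reports come back empty: every RAD from 0 to 31 is assigned to exactly one group.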
03-30-2005 07:18 AM
Re: Performance problem on a GS1280
I have used the small program you sent me.
At the moment I have 32 processes running on the machine. Using xmesh, I can see a very large transfer between CPU #10 and CPU #12. When I use the ps command with the options given in the attachment, CPU #10 is not shown as running anything (but xmesh shows that it is running), and top shows 32 processes running with a load close to 100% for each process. Furthermore, CPU #12 seems to have two processes, which is quite strange. It seems to me that ps is reporting the CPU number as the one that has the maximum page allocated for this process. I have used the program you sent me, and it is clear that two processes have their maximum memory usage on the same CPU (#12). Do you know how this can happen? And why is the system not able to move the memory of one process to CPU #10, which is not used?
We use an LSF scheduling policy that suspends certain jobs when higher-priority jobs want to run, and then restarts them when some CPUs are available. Could that be the problem?
Concerning large-page memory, can you give me more details about how to manage it?
Regards
Florent
03-30-2005 07:32 AM
Re: Performance problem on a GS1280
I have now seen another process whose memory is shared across two processors (#21 and #23)! So xmesh is showing a lot of transfer between these two. Do you think this is expected behavior?
03-30-2005 11:59 PM
Re: Performance problem on a GS1280
Could CPU 10 be the home of process 644579, which seems to have started the 7 castepexe_mpi.exe worker processes? If so, would it not be the source for 'cow' (copy-on-write) pages and so on?
>> It seems to me that ps is reporting the CPU number as the one that has the maximum page allocated for this process.
NO. ps reports whatever CPU the process is running on. But the Tru64 scheduler tries its utmost to keep the CPU and memory together. Your observations confirm that the scheduler/swapper is doing a good job!
Processes have a 'home rad' and a 'current rad'.
The system 'reluctantly' moves processes away from home. It is the idle thread on idle CPUs which pulls in / moves over processes if another RAD is seen as being overloaded.
Is the ps command not a case of the measuring influencing the measurement? When it runs on a CPU, nothing else runs on that CPU.
Looking back at your original vmstat, I would really think your system is doing a fine job. There may be a few % more here or there, but in general what you have shown looks pretty good.
Have you gotten a chance to experiment with runon? You could use that to take an 8-thread job and make sure it stays in a 4-hop zone, and such. These jobs run for a while, do they not? You could also use runon to force a child process to stay on a selected CPU after the fact.
Like the vm/rad program?
Hein.