11-24-2009 09:29 PM
Performance slower on RX8640 than Blade BL860c and rx2620
We have two RX8640 servers, each with three fully populated cell boards (24 CPUs/192 GB in one nPar), running vPars and HP-UX 11.23.
Oracle RAC 10g was running at 60% CPU load.
To increase capacity, HP added a fourth cell board to each RX8640 (taken from a working dev RX8640, same model number).
The servers booted fine, but when users started to use the database, all CPUs hit 100%, and I/O dropped to almost zero. We had to
delete the cell board from the nPar to restore normal operation. Has anybody else had this problem?
As these are production servers, it is hard to reproduce the error, so we created a 64-bit C test program that adds the
contents of two large arrays and uses 1 GB of memory. Average run times are as follows:
BL870c 10 seconds
BL860c 12 seconds
rx2620 12.2 sec
rx8640 with once cell board 17.2 sec
rx8640 with once cell board 23.5 sec
rx8640 with once cell board 28.5 sec
Do you think we missed something when adding the cell board (parmodify -p 0 -a 3:base:y:ri)? Is this a memory
interleaving problem?
This is the code. I am enclosing the executable. Can anybody else benchmark their servers, please?
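/* memtest.c: add two large arrays ten times to exercise memory access */
#define BIG 100000000
/* Three global arrays of BIG ints each, zero-initialized at load time. */
int a[BIG],b[BIG],c[BIG];
int main()
{
    int i,j;
    for(j=0;j<10;++j)          /* ten passes over the arrays */
        for(i=0;i<BIG;++i)     /* loop header was garbled in the post; bound assumed to be BIG */
            a[i]=b[i]+c[i];
    return 0;
}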
# cc +DD64 -o memtest memtest.c
# time ./memtest
Isaac Loven
11-24-2009 10:21 PM
Check the firmware revision of the MP. Log in to the MP and run the sysrev command.
Rgds-Kranti
11-24-2009 10:26 PM
Hope this helps!
Regards
Torsten.
__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.
__________________________________________________
No support by private messages. Please ask the forum!
If you feel this was helpful please click the KUDOS! thumb below!
11-25-2009 12:28 AM
Yes, most likely. Though "all CPUs hit 100%, and I/O dropped to almost zero" seems rather extreme.
Those blade and non-cell-based systems can run rings around a cell-based system if you don't get interleaving right.
11-25-2009 03:52 AM
Your program has no threads, and I assume you run it so that only a single instance executes at a time (multiple runs for the average, but executed sequentially, not in massive parallel). [I'd also comment that your program uses 100,000,000 * 3 64-bit integers, so you're really looking at 24 * 100,000,000 bytes or 2.235Gb (not 1Gb).]
So -- what you have is the following scenarios.
Assume a 1 to 4 cell nPartition.
The cost of accessing memory in the same cell as you are running is X.
The cost of accessing memory in an adjacent cell is Y (where X < Y).
For a single cell rx8640 -- all accesses are X.
When you add a cell, if all memory is configured to be interleaved, your private object has a 50% chance of getting a cache line in the same cell and a 50% chance of getting a remote cell (assuming balanced ILV, each cell contributes equal cache lines). Hence your application accesses at .5*X + .5*Y (which since Y is strictly greater than X is greater than X).
When you add _another_ cell, your accesses become .33*X + .66*Y [approx, 1/3 and 2/3 really].
And when you add the fourth cell, your accesses become .25*X + .75*Y.
As you can tell -- you approach Y instead of X (and if you went to 8 cells this actually gets more interesting, in that usually some cells cost more to reach than others). That is why you see your run times climbing with additional cells (that's how I interpret your "with once cell board" lines, since otherwise I'm not sure what you're saying).
Now if your application were moving processor context such that your accesses were also across the machine, you'd have a better chance of any given access being local -- this is what ILV is meant for, objects shared across the entire platform.
What you would want for your application to perform well here is Cell Local Memory (you'd configure each cell to give only 75% [or 50%, etc.] of its memory to the interleave). Then all accesses in your program would stay at X regardless of how many cells were in the system (assuming sufficient CLM in the cell the program is executing in, of course).
With 64Gb per cell (and a need for 2.235Gb for your program) configuring each cell to have 1/8th of memory cell local would probably be enough [assuming little else running to steal your CLM in the given execution context] to see more performant behavior. Since Oracle does work with a large shared memory set which is typically accessed from everywhere in the partition -- you certainly want to leave significant ILV configured, but you may want to consider reducing ILV. At a bare minimum you'd want to have enough for the SGA, your SYS memory load (since this is v2) and a reasonable extra for things like binaries, shared libraries, etc. Having some CLM available will help the process-private data accesses.
11-25-2009 06:19 AM
So yes, 1Gb -- or close enough. Kind of irrelevant, but worth precluding a post where you have to tell me I screwed up.
11-25-2009 06:23 AM
Hope this helps!
Regards
Torsten.
11-25-2009 08:42 AM
Well then, the big thing to check would be that your cpu and memory align reasonably in the vPar.
Most especially, if you have a vPar with only a cell's worth of processors (or a sub-cell, or something really close to a cell) and the I/O hubs are off the same cell -- it would be worth your while to configure CLM so that the vPar has only a little ILV and almost all of its memory as CLM. Effectively, if you want performance, you want a vPar on a multi-cell nPar to span as few cells as possible.
And with the caveat that this isn't official "This is supported" doctrine -- I configure IPF vPars with no ILV all the time. I swear every time folks ask this there's some firmware or vPar reason to keep some ILV around -- but in my opinion it is worth it if you have a vPar running in a single or sub-cell to be 100% local to that cell. And hence, it is worth a try to see if you can configure a vPar that way (since if the vPar won't load, you can just add some ILV [keep the nPar with some] and reload the vPar).
Any vPars you have which require more than a cell or two of resources (say you use 3 vPars, 2 fit in less than a cell apiece [maybe the same cell] and the other requires the other 3 cells), you can plan your nPar for it in a way that's good for the vPars but not the usual pattern for nPar mode. (In this case, you could configure the nPar with only 3 cells contributing to the ILV by 50%, and the 4th cell 100% CLM and place the two sub-cell vPars in the 4th cell with the multi-cell vPar getting all the ILV and resources of the other 3 cells).
01-12-2010 08:01 PM
So we arranged to replicate the problem at DR (identical hardware, adding a 4th cell board to the rx8640). With 3 cell boards, Oracle ran fine. With 4 cell boards, the RAC slowed enormously and the CPUs hit 100%.
After much testing we found that the huge load occurs during the "sqlplus as /" connection, even before any SQL is run (we had to run 100 in parallel to see the problem).
Finally Oracle identified Bug 9205576: CONNECTION TAKES MORE TIME WITH PRE_PAGE_SGA=TRUE IN 4 CELL COMPARED TO 3 CELLS. The SGA was fully scanned on each sqlplus connection. The problem is solved by changing pre_page_sga to false.
Thanks to all for taking the time to respond to my question.