- Re: Loading an Array (Fortran 90 on Alpha systems)
11-10-2009 09:57 AM
This is a question of curiosity.
On an Alpha ES40 with OpenVMS V8.3, is there a difference in performance if using
a) DO II = 1, MAX_VALUE
      IVALUE(II) = JVALUE(II)
   ENDDO
Or...
b) IVALUE(:) = JVALUE(:)
I often need to load an array of hundreds of thousands of elements. I sometimes see cases where the latter approach visibly runs faster (clock time), but I am not able to reproduce that result consistently.
Test evaluations are on Alpha ES40s with 2x 667 MHz EV67 CPUs. I generally compile with the following options:
/CHECK=(BO,UN,OV)
/ALIGN=(COMM=NATU,RECO=PACK)
/OPT=(TUNE=EV67,LEVEL=5)
/REAL=64/EXT
/WARN=(NOALIGN,DECL,NOUSAGE)
/FLOAT=G
Or is it simply a matter of expressing the same thing two ways, both resulting in the same underlying instructions? And are my occasional apparently faster executions of method (b) really a matter of caching, timing relative to other processes, etc.?
Thank you in advance.
-Howard-
11-10-2009 10:19 AM
Solution: As for your question, look at the generated machine code and see what instruction streams are generated for each.
Look at what the process is doing, too. Is it incurring paging overhead, for instance?
On OpenVMS I64 (and to a rather lesser extent on OpenVMS Alpha), look around for alignment faults, too. Alignment faults can really hit OpenVMS I64 application performance hard.
There are also cases when inlining code is actually slower than size-optimized code, as longer code sequences can knock the code out of L1 or sometimes even L2 processor cache.
And depending on how much you want to look at the low-level activities of your code, have a look at DCPI:
http://h71000.www7.hp.com/openvms/products/dcpi/dcpid.html
Traditional variable-based loading and RMS record-based coding practices do tend to have lower performance. Where permissible by the application and within the available algorithms, I generally look to load whatever I can from a cache or a database or whatever using one or a few big wads of data.
Alternatives to implementing your own copies can include double-mapping data (I've posted source code that double-maps between 32- and 64-bit space, for instance), or (when you really have to move big blocks) using the system block move primitives within OpenVMS.
11-10-2009 10:41 AM
That is quite a bit of info to digest, so it may take me a little time to take it all in. Allow me to respond at this time with the following:
"look at the generated machine code" I understand and agree that such would likely shed light on my question. However, I cannot say that I have ever done that, and welcome info on methods to do so.
I am all too familiar with the alignment fault concerns on Itanium, and understand that they can also pop up on Alpha. I will certainly check whether there is any difference between the two methods.
"cases when inlining code is actually slower than size-optimized code" Very interesting....very.
Thank you for the link to DCPI. I had not heard of that and will look into it....if not for just this question, for future projects.
In this particular case, the loads are occurring from locations in the VM footprint, not from a file or anywhere on disk. As I recall, faulting is similar for both instances...but will confirm. In fact, as I read this info from you, I feel that I could step through in debug and watch the process activity from another session while stepping. (I may have asked my question prematurely...I may already have what is needed to answer it.)
Appreciate the info.
-H-
11-10-2009 11:30 AM
When you're doing production development, keep the compiler listings and the maps around.
Maintaining these listings and map files is helpful for tracking application crashes and is (also) useful for cases such as this. The former is discussed directly here:
http://labs.hoffmanlabs.com/node/800
The generated instruction streams are also shown in the compiler listings, which is what you want to see here, though for a different reason than crash analysis.
The best way to go faster is to avoid copying the data altogether where you can (possibly through double-mapping or through the use of indirection and pointers), to work with and to transfer the bigger wads of data as a unit (usually), to look at and implement data caching where that helps, to use system block transfers, and only as a last resort read individual file records or otherwise copy smaller wads of data around. The classic OpenVMS programming techniques are comparatively slow.
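As a rough illustration of the "indirection and pointers" option in Fortran 90 terms, here is a minimal sketch. It assumes the application can tolerate IVALUE being an alias of JVALUE rather than an independent copy; the program name and array size are made up for the example.

```fortran
! Sketch only: avoids the copy by pointing IVALUE at JVALUE's storage.
! MAX_VALUE = 500000 is an assumed size, not taken from the original code.
PROGRAM NOCOPY
  IMPLICIT NONE
  INTEGER, PARAMETER :: MAX_VALUE = 500000
  INTEGER, TARGET    :: JVALUE(MAX_VALUE)
  INTEGER, POINTER   :: IVALUE(:)
  INTEGER :: II

  ! Fill the source array.
  DO II = 1, MAX_VALUE
     JVALUE(II) = II
  ENDDO

  ! Pointer assignment: no elements are moved, IVALUE now refers
  ! to the same storage as JVALUE.
  IVALUE => JVALUE
  PRINT *, 'Last element seen through IVALUE: ', IVALUE(MAX_VALUE)

  ! Caveat: the two names alias, so a store through one is visible
  ! through the other.
  IVALUE(1) = -1
  PRINT *, 'JVALUE(1) after storing through IVALUE: ', JVALUE(1)
END PROGRAM NOCOPY
```

This only helps when the code does not need two independent copies of the data; if JVALUE is later overwritten while IVALUE must keep the old values, a real copy is still required.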
Using the source code debugger generally doesn't help all that much with questions involving performance monitoring; that tool is very useful in getting the code running stably and reliably. Then come tools such as the PCS and FLT tools "within" ANALYZE /SYSTEM and the DECset PCA tool and (at the lowest levels) DCPI; tools that specifically target performance monitoring and application profiling.
11-10-2009 11:45 AM
http://labs.hoffmanlabs.com/node/407
11-10-2009 11:46 AM
RE: "'look at the generated machine code' I understand and agree that such would likely shed light on my question. However, I cannot say that I have ever done that, and welcome info on methods to do so."
Perhaps you know the following already.
To get the compiler to tell you what machine code it is generating:
$ fortran /list[=listing_file]/machine_code
Try with a small program that does nothing but load the array, and then at least reference the copied array (otherwise the compiler may optimize the code so it does not even load the array).
Then compile with different optimization levels and use the /list and /machine_code qualifiers in the compile command.
Then look at the listing.
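A small test program along those lines might look like the sketch below. The program name and the array size are assumptions chosen for illustration; the PRINT statements are there only so the copied array is referenced and the copies cannot be optimized away.

```fortran
! Minimal test case for inspecting the generated code.
! MAX_VALUE = 500000 is an assumed size for illustration.
PROGRAM COPYTEST
  IMPLICIT NONE
  INTEGER, PARAMETER :: MAX_VALUE = 500000
  INTEGER :: IVALUE(MAX_VALUE), JVALUE(MAX_VALUE)
  INTEGER :: II

  ! Fill the source array so the copy has real data to move.
  DO II = 1, MAX_VALUE
     JVALUE(II) = II
  ENDDO

  ! Method (a): explicit loop.
  DO II = 1, MAX_VALUE
     IVALUE(II) = JVALUE(II)
  ENDDO
  ! Reference the result so the loop is not discarded.
  PRINT *, 'Loop copy check:  ', IVALUE(1) + IVALUE(MAX_VALUE)

  ! Method (b): whole-array assignment.
  IVALUE(:) = 0
  IVALUE(:) = JVALUE(:)
  PRINT *, 'Array copy check: ', IVALUE(1) + IVALUE(MAX_VALUE)
END PROGRAM COPYTEST
```

Compiling this with the command shown above puts both copies in one listing, so the instruction sequences for (a) and (b) can be compared side by side.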
If IVALUE and JVALUE have the same dimensions, you may be able to copy the whole array with a block move; I would expect that to be near optimal (assuming aligned arrays). If MAX_VALUE can be less than the declared size of the arrays, a block move should still work, given that both arrays are one-dimensional.
If you have never looked at the Alpha instruction set, it is quite a bit different from the VAX (in some respects it is more like the PDP-8 than the VAX).
You can download the Alpha architecture manual in pdf format (use Google). It has complete descriptions of the instruction set, but I wouldn't consider it to be tutorial in nature. It's more of a technical specification than a user's guide.
Actually, doing this exercise is a reasonably good way to learn the AXP instructions. Sometimes, the compilers use non-standard mnemonics for the machine instructions, so you may need to look at the Hex opcodes (at left) to be able to map what the instruction is called in the Architecture manual. They also make the machine code listing more human friendly by using the variable names instead of register names. The comments on the right have the actual registers used.
Also, the f90 and f77 compilers are quite different in the code they generate.
Jon
11-10-2009 01:45 PM
>is there a difference in performance?
That's an easy question to answer - time it yourself. Look at CPU time, Clock time and pagefaults. It might help to page in both arrays first, then perform your timing runs.
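A timing run of that kind could be sketched as follows. This is only an outline: MAX_VALUE, NREPS and the program name are assumed values, SYSTEM_CLOCK is a Fortran 90 intrinsic, and CPU_TIME is a Fortran 95 intrinsic, so the CPU_TIME calls assume the compiler accepts Fortran 95. Page fault counts are not measured here and would have to be watched separately.

```fortran
! Timing sketch; MAX_VALUE and NREPS are assumed values for illustration.
PROGRAM COPYTIME
  IMPLICIT NONE
  INTEGER, PARAMETER :: MAX_VALUE = 500000
  INTEGER, PARAMETER :: NREPS = 100
  INTEGER :: IVALUE(MAX_VALUE), JVALUE(MAX_VALUE)
  INTEGER :: II, IREP, T0, T1, RATE, CHECK
  REAL :: C0, C1

  ! Touch both arrays first so initial page faults are not charged
  ! to whichever method happens to run first.
  DO II = 1, MAX_VALUE
     JVALUE(II) = II
     IVALUE(II) = 0
  ENDDO

  ! Method (a): explicit loop, repeated NREPS times.
  CHECK = 0
  CALL SYSTEM_CLOCK(T0, RATE)
  CALL CPU_TIME(C0)
  DO IREP = 1, NREPS
     JVALUE(1) = IREP          ! change the source on every repetition
     DO II = 1, MAX_VALUE
        IVALUE(II) = JVALUE(II)
     ENDDO
     CHECK = CHECK + IVALUE(1) ! reference the result of each copy
  ENDDO
  CALL CPU_TIME(C1)
  CALL SYSTEM_CLOCK(T1)
  PRINT *, 'Loop : clock s =', REAL(T1 - T0) / REAL(RATE), &
           ' cpu s =', C1 - C0, ' check =', CHECK

  ! Method (b): whole-array assignment, same repetition count.
  CHECK = 0
  CALL SYSTEM_CLOCK(T0)
  CALL CPU_TIME(C0)
  DO IREP = 1, NREPS
     JVALUE(1) = IREP
     IVALUE(:) = JVALUE(:)
     CHECK = CHECK + IVALUE(1)
  ENDDO
  CALL CPU_TIME(C1)
  CALL SYSTEM_CLOCK(T1)
  PRINT *, 'Array: clock s =', REAL(T1 - T0) / REAL(RATE), &
           ' cpu s =', C1 - C0, ' check =', CHECK
END PROGRAM COPYTIME
```

Repeating each copy many times and running the program more than once helps separate a real difference from cache effects and interference from other processes. The sketch ignores the possibility of the system clock count wrapping during a run.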
In theory, the "whole array" copy gives the compiler the option to do a block memory transfer, possibly avoiding the overhead of the loop (increment, test, branch). For small MAX_VALUEs the compiler may unroll a loop, but probably won't coalesce the elements.
If you want to force a block move, try using one of the library routines:
CALL OTS$MOVE3(MAX_VALUE*ValueSize,JVALUE,IVALUE)
On the other hand, if they're large enough, the arrays may need to be moved in and out of memory, in which case the dominant cost to the operation will probably be pagefaults. Plenty of memory and plenty of working set might help.
Note that multi dimensional arrays are a whole different story. If you use nested loops, make sure you use the correct order!
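To illustrate the loop-order point: Fortran stores arrays in column-major order, so the first subscript should normally vary fastest. The sketch below uses assumed array sizes and names; an optimizing compiler may interchange the loops itself, so the difference is not guaranteed to show up at every optimization level.

```fortran
! Loop-order sketch; NROWS and NCOLS are assumed sizes for illustration.
PROGRAM LOOPORDER
  IMPLICIT NONE
  INTEGER, PARAMETER :: NROWS = 1000, NCOLS = 1000
  INTEGER :: A(NROWS, NCOLS), B(NROWS, NCOLS)
  INTEGER :: I, J

  B = 1

  ! Preferred order: the inner loop runs over the first subscript,
  ! so consecutive iterations touch adjacent memory locations.
  DO J = 1, NCOLS
     DO I = 1, NROWS
        A(I, J) = B(I, J)
     ENDDO
  ENDDO
  PRINT *, 'Column-order copy, sum = ', SUM(A)

  ! Usually slower: the inner loop runs over the second subscript,
  ! so consecutive iterations are NROWS elements apart in memory.
  A = 0
  DO I = 1, NROWS
     DO J = 1, NCOLS
        A(I, J) = B(I, J)
     ENDDO
  ENDDO
  PRINT *, 'Row-order copy, sum    = ', SUM(A)
END PROGRAM LOOPORDER
```

A whole-array assignment such as A = B sidesteps the question, since the compiler chooses the traversal order.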
11-11-2009 12:53 AM
That said: My experience on an XP1000 with an EV67 processor is that in most cases the F90 array operations are much faster than the equivalent do-loops.
Jouk
11-11-2009 06:27 AM
At the expense of seeming out of touch and dating myself... In the dark ages, array initialization was very sensitive to the order of initialization. I suspect this was architecture-specific (VAX), but I haven't deliberately tested it myself, since I have usually just ported working HLL code directly to newer platforms. If you're checking code, compare the slow case to the fast one and see whether one is initializing rows first and the other columns first. On VAX the difference was rather dramatic.
bob
11-12-2009 07:01 AM
Some of the responses re-sparked stuff in my head. The rest contained information that is new to me; this has been a learning experience. Very rewarding to say the least.
I will be playing with this again at some time shortly. (As I said, this was simply a question of curiosity.) I will keep this thread open for a couple days so that I can post the results...in case any of you are as curious.
Thanks again
-H-