Using profile-guided optimization with shared libraries and multiple executables

 
Alan Lehotsky
Occasional Advisor

I have a complex application with a large number of executables sharing several shared libraries.

The root executable will exec(2) other executables, which will then call into the shared libraries.

Do I need to build EACH executable main program with +Oprofile=collect or does building just the root executable with profile collection enabled cause all subsequent callers of the shared library to collect profiling information within the shared library?
Dennis Handly
Acclaimed Contributor

Re: Using profile-guided optimization with shared libraries and multiple executables

You need to compile everything with +Oprofile=collect to instrument everything (that's important for performance).

Then, after collecting your flow.data file(s), you recompile with +Oprofile=use.
Alan Lehotsky
Occasional Advisor

Re: Using profile-guided optimization with shared libraries and multiple executables

BTW, I compiled TWO of the executables with +Oprofile=collect and ran my training data.

When it finished, there were a bunch of flow.data.<pid> and flow.data.lock files left in addition to a 9MB flow.data file.

Attempting to compile the shared library with +Oprofile=use and that flow.data file resulted in aCC errors complaining that flow.data was locked.

And, I am only trying to optimize the heavily travelled execution paths in my shared library, so I don't care if I don't have profiling information from EVERY executable that uses the shared library.
Dennis Handly
Acclaimed Contributor

Re: Using profile-guided optimization with shared libraries and multiple executables

What version of aC++ are you using? The latest is A.06.20.

>When it finished, there were a bunch of flow.data.<pid> and flow.data.lock files left in addition to a 9MB flow.data file.

What's in those flow.data.<pid> files? They may be created because you fork and exec the same process.

>that flow.data file resulted in aCC errors complaining that flow.data was locked.

If the application is finished, you can remove the .lock files.

>I am only trying to optimize the heavily traveled execution paths in my shared library, so I don't care if I don't have profiling information from EVERY executable

Ok, that should work.
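
Before removing anything, it may help to see which lock files were actually left behind. The following is a minimal POSIX sh sketch, assuming the lock files sit next to flow.data and end in `.lock` as described in this thread. The function name `list_flow_locks` is made up for illustration and is not part of any HP tool.

```shell
#!/bin/sh
# list_flow_locks DIR: print every flow.data*.lock file found in DIR.
# Returns 0 if at least one lock file exists, 1 if none do.
# (Hypothetical helper; the *.lock naming is taken from this thread.)
list_flow_locks() {
    dir=${1:-.}
    found=1
    for f in "$dir"/flow.data*.lock; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -f "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return $found
}

# Only remove the locks once every process of the application has exited:
#   list_flow_locks . && rm -f ./flow.data*.lock
```

Listing first and deleting second keeps the destructive step explicit, which matters if a straggling process might still be writing its data.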
Alan Lehotsky
Occasional Advisor

Re: Using profile-guided optimization with shared libraries and multiple executables

Using A.06.15.

We definitely fork our executables. This is a parallel processing application. The flow.data.<pid> files are large (600 KB+), and the flow.data.<pid>.lock files are all of the form

hp8 <pid>

where <pid> is the process ID of a process that's gone. There's also a flow.data.log file; it's full of complaints about ffw not being able to write to the flow.data file because it's locked.

Should I delete the temp files, append them to the flow.data file or what?
Dennis Handly
Acclaimed Contributor
Solution

Re: Using profile-guided optimization with shared libraries and multiple executables

>The flow.data.<pid> files are large (600kb+)

Then these probably have useful data in them.

>the flow.data.<pid>.lock files are all of the form

I think you can remove these.

>There's also a flow.data.log file; it's full of complaints about ffw not being able to write to the flow.data file because it's locked.

Oh boy. Perhaps the log will tell you which are valid? And for which executable.

>Should I delete the temp files, append them to the flow.data file or what?

You probably need to merge them with the flow.data file, see fdm(1).
You probably want to create a new combined file:
fdm -o flow.data_BIG flow.data flow.data.<pid> ...

And use flow.data_BIG when recompiling:
+Oprofile=use:flow.data_BIG
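
With many per-process side files, typing the fdm command by hand gets tedious. Here is a rough POSIX sh sketch that only prints the merge command so it can be reviewed before running. It assumes the side files are named flow.data.<something> and that `.lock` and `.log` files should be skipped. The function name `fdm_merge_command` is hypothetical, and the only fdm usage it relies on is the `-o` form shown above.

```shell
#!/bin/sh
# fdm_merge_command OUT DIR: print (do not run) an fdm command line that
# merges DIR/flow.data with every DIR/flow.data.<suffix> side file into OUT.
# Lock files (*.lock) and log files (*.log) are left out.
# Assumes file names contain no whitespace.
fdm_merge_command() {
    out=$1
    dir=${2:-.}
    cmd="fdm -o $out $dir/flow.data"
    for f in "$dir"/flow.data.*; do
        case $f in
            *.lock|*.log) continue ;;
        esac
        # Skip the literal pattern if the glob matched nothing.
        [ -f "$f" ] && cmd="$cmd $f"
    done
    echo "$cmd"
}
```

For example, `fdm_merge_command flow.data_BIG .` prints the command for the current directory. Once it looks right, run it and rebuild with +Oprofile=use:flow.data_BIG.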
Nathaniel McIntosh
New Member

Re: Using profile-guided optimization with shared libraries and multiple executables

Collecting flow.data files for multiprocess/multithreaded applications with shared libraries is tricky, no question about it. Here are some suggestions on how you might be able to manage the process more effectively.

The first thing that you can do to make your life easier is to take advantage of a feature that we call flow path qualifiers. When a +Oprofile=collect application finishes execution, it looks at the setting of the environment variable FLOW_DATA. If FLOW_DATA is set to a file name or path name, the +Oprofile=collect runtime will try to write the accumulated data to that file or path. You can also tack the following additional qualifiers onto your FLOW_DATA setting:

Suffix         Effect
------         ------
,per-process   qualifies flow files with executable name
,unique        qualifies flow files with process ID

Here is an example that should illustrate:

% /opt/ansic/bin/cc himom.c +Oprofile=collect -o first.exe
% cp first.exe second.exe
% setenv FLOW_DATA "myflow,per-process"
% ./first.exe
hi mom!
% ./second.exe
hi mom!
% ls -ltr myflow*
-rw-rw-r-- 1 me 0 May 13 13:48 myflow,per-process.err
-rw-rw-r-- 1 me 1700 May 13 13:48 myflow.first.exe
-rw-rw-r-- 1 me 1924 May 13 13:48 myflow.first.exe.log
-rw-rw-r-- 1 me 1700 May 13 13:48 myflow.second.exe
-rw-rw-r-- 1 me 2396 May 13 13:48 myflow.log
-rw-rw-r-- 1 me 1931 May 13 13:48 myflow.second.exe.log
% setenv FLOW_DATA "anotherflow,per-process,unique"
% ./first.exe
hi mom!
% ./first.exe
hi mom!
% ./second.exe
hi mom!
% ls -ltr anotherflow*
-rw-rw-r-- 1 me 0 May 13 13:50 anotherflow,per-process,unique.err
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.first.exe-22030
-rw-rw-r-- 1 me 1946 May 13 13:50 anotherflow.first.exe-22030.log
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.first.exe-22036
-rw-rw-r-- 1 me 1946 May 13 13:50 anotherflow.first.exe-22036.log
-rw-rw-r-- 1 me 3594 May 13 13:50 anotherflow.log
-rw-rw-r-- 1 me 1700 May 13 13:50 anotherflow.second.exe-22042
-rw-rw-r-- 1 me 1953 May 13 13:50 anotherflow.second.exe-22042.log
%

As Dennis suggests, you can merge the resulting flow files together after the fact, picking and choosing the things you are interested in. For example, if I know that "first.exe" is performance-critical and "second.exe" non-performance-critical, then I can select only the "first.exe" flow files for merging into my final destination.
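
The selection step can be scripted. Below is a small POSIX sh sketch that lists only the flow data files belonging to one executable, based on the file names in the listings above (`BASE.EXE` for ",per-process" and `BASE.EXE-PID` when ",unique" is added). The function name `flows_for_exe` is invented for this example.

```shell
#!/bin/sh
# flows_for_exe BASE EXE: print the flow data files written for executable
# EXE when FLOW_DATA was "BASE,per-process" or "BASE,per-process,unique".
# The matching .log files are left out.
# (Hypothetical helper; naming pattern inferred from the listings above.)
flows_for_exe() {
    base=$1
    exe=$2
    for f in "$base.$exe" "$base.$exe"-*; do
        case $f in
            *.log) continue ;;
        esac
        # Skip names that do not exist (including an unmatched glob).
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}
```

The output can then be handed to fdm, for instance `fdm -o flow.data_first $(flows_for_exe anotherflow first.exe)`, where flow.data_first is just a placeholder name for the merged file.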

With regard to locked flow files: the tools that create flow files will never intentionally leave a flow file locked; if you wind up with flow.data.lock files after the run is complete, then it means that something went wrong somewhere along the line (some process somewhere got interrupted or encountered an error during a flow.data update).

Let me know if this helps. There are other more arcane things you can try if this doesn't do the trick.



Alan Lehotsky
Occasional Advisor

Re: Using profile-guided optimization with shared libraries and multiple executables

Thanks. That worked like a charm; and merging all the flow data boosted my PGO performance to 22% faster than without profiling.
Alan Lehotsky
Occasional Advisor

Re: Using profile-guided optimization with shared libraries and multiple executables

It would be a good thing to extend the documentation in the user's guide to cover profiling shared libraries.
Dennis Handly
Acclaimed Contributor

Re: Using profile-guided optimization with shared libraries and multiple executables

>That worked like a charm; and merging all the flow data boosted my PGO performance to 22% faster than without profiling.

Have you looked into -ipo?

>It would be a good thing to extend the documentation in the user's guide to cover profiling shared libraries.

Which document was this?
aC++ Online Help:
http://docs.hp.com/en/14487/options.htm#optprofilebasedoptopt
Optimizing Itanium-based applications:
http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801/?ciid=c208dd324de02110dd324de02110275d6e10RCRD
Linker: Profile-Based Optimization:
http://docs.hp.com/en/14640/OnlineHelp/applicationperformance.htm#OPTPBO