Operating System - HP-UX

64 bits migration = loss of performance on HPUX 11.00

 
Laurent Laperrousaz
Regular Advisor


Hi,
For benchmarking purposes, I compared my application compiled in two ways:
32 bits:
/opt/aCC/bin/aCC +O2 +Olibcalls +Oregionsched +Odataprefetch +Oentrysched +Oprocelim +nrv -I/home/TANGO/STLport/stlport -D__unix__ -D__hpux__ -Wl,+s -Wl,+blib +z +Z -mt -AA -ext +DA2.0 +W829,921,652

64 bits:
/opt/aCC/bin/aCC +O2 +Olibcalls +Oregionsched +Odataprefetch +Oentrysched +Oprocelim +nrv -I/home/TANGO/STLport64/stlport -D__unix__ -D__hpux__ -Wl,+s -Wl,+blib +z +Z -mt -AA -ext +DA2.0W +W829,921,652

The result is that the 64-bit version is about 10% slower than the 32-bit version.

Did I miss something in the way I compile?
Does anybody have an explanation?

Thank you!

Laurent
H.Merijn Brand (procura
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

Most 64bit apps will take longer to restart, because they have more to initialize.

Unless you need the larger address space, or you have to link to 64bit libraries, I can see next to no reasons to use 64bit apps.

Good reasons to use 64bit apps:
- Your app needs more than 2 GB of address space
- Your app needs 64-bit integers
- You have to link to 64-bit libs (Oracle/64)

If all you want is a performance gain, this is not the path to walk.

Enjoy, have FUN! H.Merijn
John Bolene
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

64-bit code can be slower, depending on what you are doing.

Remember that it is now fetching twice as much data for each instruction load from memory. The instruction then has to be decoded, and the execution time of each instruction is close to the same: some are faster, some are slower.

64-bit is much better if you have to access more memory or more data than can be addressed with 32 bits.

It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
Laurent Laperrousaz
Regular Advisor

Re: 64 bits migration = loss of performance on HPUX 11.00

The reason I tested a 64-bit build is that I am looking for ways to remedy the dramatic slowness of my application on HP-UX.
It is 10 times slower than on Linux and AIX. (It is a multi-process, multi-threaded application; inter-process communication is handled via IP sockets.)
So I'm trying anything that could boost the application!

...
A. Clay Stephenson
Acclaimed Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

Your 10% loss is about typical. Most UNIX processes are not CPU bound but rather I/O bound, so the differences between 32-bit and 64-bit performance are barely noticeable. You should not think of 64-bit as a performance booster but rather as a resource enhancer. It's only in the case where an application can make use of extremely large data structures (e.g. caches) that 64-bit code outperforms 32-bit code.

As for why your performance is an order of magnitude slower on HP-UX, I suspect it's a difference in the socket implementation. I suggest that you use Glance to examine the process and see which system call(s) are causing the bottleneck(s).
If it ain't broke, I can fix that.
H.Merijn Brand (procura
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

truss and tusc may be handy there too.

http://hpux.connect.org.uk/hppd/hpux/Sysadmin/tusc-7.3/

Enjoy, have FUN! H.Merijn
Laurent Laperrousaz
Regular Advisor

Re: 64 bits migration = loss of performance on HPUX 11.00

I also suspect sockets, and I am working on FIFO and/or semaphore plus shared memory communication.

I use Glance to monitor the bottleneck. The most frequently called system call is sched_yield (multi-threaded version).

I also had to change the STL implementation (I now use STLport), which reduced CPU consumption by 30%.

I also tried to compile in PBO mode, but I never managed to generate the profile needed for the final compile step. And I don't think that could improve performance by more than 10%!

Do you agree?

Olav Baadsvik
Esteemed Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00



Hi,

Performance problems could also be a question of patches.
Do you have the latest patches installed on the system?
Several patches related to threads fix problems with performance.

Regards
Olav
Steven E. Protter
Exalted Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

I would follow everything above, make sure I've got the performance patches in, and....

Collect some performance data looking for bottlenecks... script attached.

and ...

If your application uses random number generation, this software might help, though it's really a security item.

random number generator
http://www.software.hp.com/cgi-bin/swdepot_parser.cgi/cgi/displayProductInfo.pl?productNumber=KRNG11I


If your application uses scp/sftp/ssh Secure Shell 3.5 might help.

Secure Shell: a replacement for rcp ftp and telnet that encrypts passwords

http://www.software.hp.com/cgi-bin/swdepot_parser.cgi/cgi/displayProductInfo.pl?productNumber=T1471AA

There is a known problem with the depot for 11.00; if you need that, contact support for a usable depot.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
A. Clay Stephenson
Acclaimed Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

I've had one other thought - check the 'timeslice' parameter. It's possible that someone has set it to a very small value (< 5) and that can lead to exactly the situation that you are seeing - especially if it's set to 1. Set it to 10 and leave it there.
Check for patches but I would much rather get some data about where the bottlenecks lie before skewing the data too much.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

As mentioned, 64bit code is not going to be faster than 32bit code, so if you do not need the addressing space, stay with 32bit. However, there have been a number of issues with threading, especially in a multi-processor system. Make sure your machine is up to date on all patches. The SupportPlus bundle of hardware (HWE) and software (QPK) patches should both be installed. Get the patch bundles from:

http://www.software.hp.com/SUPPORT_PLUS/


Bill Hassell, sysadmin
Mike Stroyan
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

It seems really suspicious that the most common system call is sched_yield. I would expect that most system calls would be blocking calls. A frequent use of sched_yield would indicate that some threads are doing busy waiting, using only sched_yield instead of a proper blocking call. That is a dangerous path, leading to thread starvation and excessive load on the entire system. You should consider where and when you are calling sched_yield. Perhaps you could use a debugger with a breakpoint on sched_yield if you are not familiar with all the application/library code.
For another view of where the time goes, you could use the prospect program profiler. It is very strong at showing where CPU time is spent, but it will also show which system calls are responsible for most of the blocked time. It is available from http://h21007.www2.hp.com/dspp/tech/tech_TechSoftwareDetailPage_IDX/1,1703,3282,00.html
Laurent Laperrousaz
Regular Advisor

Re: 64 bits migration = loss of performance on HPUX 11.00

Thanks to everyone who took time to answer.

- 32 bits versus 64 bits:
I have definitely stopped thinking about a 64-bit migration, even with Oracle, because Oracle provides a 32-bit version of the OCI9 libraries.

- patches: We applied the latest patches, especially those about multi-threading.

- timeslice: I knew about this item; it has been checked and it's OK (10).

- sched_yield: we use a lot of rwlocks and mutexes, and I think this is the reason why we have so many sched_yield calls.

- I will try the prospect program profiler and let you know about the results.

Another system call that is called very often is gettimeofday, because the application keeps track of the time spent in every step of the transactions, but it seems that this call is not costly...

To be continued!
H.Merijn Brand (procura
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

The time spent in a process (thread) is less costly to obtain by using times():

times(2)                                                    times(2)

NAME
     times - get process and child process times

SYNOPSIS
     #include <sys/times.h>

     clock_t times(struct tms *buffer);

DESCRIPTION
     times() fills the structure pointed to by buffer with time-accounting
     information. The structure defined in <sys/times.h> is as follows:

          struct tms {
              clock_t tms_utime;   /* user time */
              clock_t tms_stime;   /* system time */
              clock_t tms_cutime;  /* user time, children */
              clock_t tms_cstime;  /* system time, children */
          };

     This information comes from the calling process and each of its
     terminated child processes for which it has executed a wait(),
     wait3(), or waitpid(). The times are in units of 1/CLK_TCK seconds,
     where CLK_TCK is processor dependent. The value of CLK_TCK can be
     queried using the sysconf() function (see sysconf(2)).

Enjoy, have FUN! H.Merijn
Stuart Abramson_2
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

I just read this on the "dutchworks.nl" HP-UX email group:

Make sure you have PHCO_26960 installed
and set the following environment variable: PTHREAD_DISABLE_HANDOFF=ON
(as described in the patch text).
Make sure it is in /etc/profile, as Java scripts work with posix shells.

This patch has a significant, positive impact on the performance of threaded applications.

Laurent Laperrousaz
Regular Advisor

Re: 64 bits migration = loss of performance on HPUX 11.00

I have the PHCO_26960 patch installed. I even tried to use the calls suggested in the description of the fix, but the patch does not actually contain the thread.h with the new prototypes... I declared them in my own file, but the entries are missing from libpthread.

Anyway, I set PTHREAD_DISABLE_HANDOFF=ON
and it had no effect at all!...

rick jones
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

Depending on the application and what else is going-on, PBO could boost by 10% or more. IIRC, it requires a "clean" termination of the application to get the flow.data file written.

There are some writeups on improving BIND performance:

ftp://ftp.cup.hp.com/dist/networking/briefs/

While they are not directly applicable, the methodologies may apply. In this day and age, the references to puma and its profiles should be replaced with something like prospect - http://www.hp.com/go/prospect

High sched_yield() could indeed be poor thread synchronization design; it could also mean some non-trivial mutex contention. By default the pthread_mutex_lock call will "spin" for a short time while trying to acquire a lock (the spinning being cheaper than a full sleep/wakeup path). If the thread is making several sched_yield() calls in a row just prior to calling ksleep() in a tusc trace (remember to use the -l option to display lwpids !-), or you notice something like a 5 or 7 to one ratio between ksleep and sched_yield in glance, it suggests lock contention. You will then later see some other thread calling kwakeup() with similar arguments to the ksleep() call. One of those is the address of the mutex, which you could use to track down the mutexes with contention. One of these days we'll have to get a mutex lock contention tool out there to the world at large...

IIRC, default for the compiler is to compile for the architecture (eg PA2.0) on which the compile takes place. The "best" way to ask for 64-bit these days is to use +DD64 - that way you will have one less thing to remember when you migrate to IPF.

Another possibly useful tool would be pi (ftp://ftp.cup.hp.com/dist/networking/tools/) which can be used to display things like tlbmiss rates and cache miss rates and the like. If you see more than a fraction of a percent in handling tlbmisses then you may want to chatr your binary for a larger page size (see the chatr manpage)

You will probably also notice an increase in cache misses when you go to 64-bit from 32-bit, reflecting the increased size of pointers, and so the use of a larger number of cache lines. Some of that can be mitigated if you make sure your structures are laid out well for 64-bit and don't leave unused holes. Typically, that means putting the pointers and longs (8-byte quantities in 64-bit) first, then the ints, then the shorts, etc. If you put, say, the ints first, and only have three ints in the struct, followed by a pointer, there will be a four-byte "hole" between the third int and the pointer: longs and pointers in LP64 have to be on an eight-byte boundary, and IIRC, structs get aligned on the most restrictive alignment of their members.

As for sockets performance, you can check that independent of your application by using netperf - http://www.netperf.org/ and compare across your platforms.

Also keep in mind to compare the "raw" horsepower of all the systems you are using. If you don't do much floating point, you might normalize your results to SPECint2000. PA2.0 systems can run the gamut from ancient and slow 160 MHz boxes to current generation 875 MHz systems.

And as always, those suggestions to be up on the latest patches, especially the ones talking about thread performance, are goodness.
there is no rest for the wicked yet the virtuous have no pillows
harry d brown jr
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

Laurent,

as a side note:

You said "communication is handled via IP sockets".

Are you by any chance running BIND (aka DNS, aka named) on your HP-UX box, or do you have your resolv.conf file pointing to DNS name servers?

live free or die
harry
Laurent Laperrousaz
Regular Advisor

Re: 64 bits migration = loss of performance on HPUX 11.00

I enjoy reading your answers,
even if they sometimes remind me of some "already known statements". But magic only exists in films and fairy tales...

Anyway!
to Harry:
connections are permanent on my application and we essentially use localhost and defined ports (added in /etc/services). No DNS access is necessary.

The real issue is the amazing difference of CPU usage between HPUX implementation and others like AIX 4.3 and Linux...

to Rick:

Because you are from HP, maybe you can explain to me why I could not use the PTHREAD_DISABLE_HANDOFF facility or the associated calls even though PHCO_26960 is installed on our boxes?
rick jones
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

I'm afraid I'm not familiar with that PTHREAD_DISABLE_HANDOFF feature.

As for CPU util differences, don't forget to account for relative raw horsepower when comparing with other system/OS combinations.

Those tools mentioned earlier may help find some simple bottleneck that could be improved.
there is no rest for the wicked yet the virtuous have no pillows
Bill Hassell
Honored Contributor

Re: 64 bits migration = loss of performance on HPUX 11.00

Just a note about Oracle: one of THE most common questions in these forums is about Oracle not being able to get enough SGA (shared memory), and the reason is always: 32 bits. So any high-volume production database like Oracle should always be 64-bit with plenty of RAM (more than 4 GB per instance) to be able to perform at the expected level. A 32-bit version of Oracle will barely get 950 megs (or if appropriately linked: 1750 megs, barely) and every instance will reduce the amount of available RAM from there. 64-bit HP-UX can handle dozens of GB, but a 32-bit application is stuck with a very limited addressing space.


Bill Hassell, sysadmin