Operating System - HP-UX

Re: Unexpected Performance Differences

 
Pete Devlin
Valued Contributor

Unexpected Performance Differences

Can anyone help - I can't get to the bottom of this:

I have an application (Oracle database, Manugistics processes) that we developed on a 2-way K380 (240MHz) with 1GB RAM, connected by F/W SCSI to an HP HVD10 surestore disk array and running HP-UX 11.00 32-bit.

This moved into production on a 6-way K580 (240MHz) with 8GB RAM, connected to an EMC sym 4 array and running HP-UX 11.00 64-bit.

The database and application are 32-bit.

There is a batch process that does a read in from the database followed by a big number crunch followed by a write back to disk.

The whole process takes 1hr 30min on the K380 and around 2hr on the K580.

The process is single threaded, so not all the CPUs are being used for this, but we had expected at least the same performance.

Immediately we thought it must be I/O, and it does appear that the K580 spends longer reading from disk, while the number crunching takes a similar amount of time (as you would expect on one thread).

However we don't see any I/O wait - it's almost like the machine idles along getting the data when it feels like.

I have run sar, vmstat, sarcheck, iostat, glance etc, but I cannot see anything unusual except:

1. The process spends nearly all its time in the run queue - when it's waiting, it's waiting on pipe.

2. ninode and ncsize are set high (10000/5000) - does this cause problems?

3. I/O rates are not very high.

4. Other 32-bit apps run faster on the 64 bit system.


Any help or suggestions are very much appreciated - it's got me stumped.

Cheers.
15 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: Unexpected Performance Differences

Hi Pete:

Without seeing some data from Glance this is tough but I have two thoughts:

1) Is streampipes set to 1 on the production box and 0 on the test box?

2) Reduce the number of CPUs to 2 and try it. This is not as crazy as it sounds if the TLBs have to be cleared much more often.

My 2 cents, Clay
If it ain't broke, I can fix that.
harry d brown jr
Honored Contributor

Re: Unexpected Performance Differences

Actually, if the app is single threaded, then only ONE cpu will ever be used at any given time.

You need to use GlancePlus/MeasureWare and take a look at the context switching and possibly memory swapping. You should also recompile your apps to run faster in the 64-bit world. Do you have any "strange" messages in syslog or dmesg?

Could you post your other kernel parameters?
Live Free or Die
Alan Riggs
Honored Contributor

Re: Unexpected Performance Differences

I would take a hard look at the way data is laid out across your EMC array. I suspect you are getting poor performance due to head travel and/or disk latency.
Pete Devlin
Valued Contributor

Re: Unexpected Performance Differences

Guys,

Thanks for the replies, in answer to your questions:

Alan - I've had EMC looking at this. There are some issues with data layout: while we have the usual rules applied in terms of database layout, other machines are attached to the EMC as well, and some poor decisions have been made that put busy files from more than one box on one physical. EMC don't think there's 'enough' of this to explain the differences, but we will address it anyway.

Harry - although single threaded, there are usually at least two processes associated with the batch process running, for example
1. A database connection
2. A number cruncher
So these sit on separate CPUs (applies to both systems)

The system does not show any memory pressure - no paging appears in vmstat, and buffer reads are 99%+, writes 90%+.

Clay - Streampipes is zero on both - is this OK?
It's not a kernel parameter I'm familiar with. Glance shows the process in the run queue nearly all the time, but when waiting it's always pipe, never PRI or whatever.

Can the transaction buffer rate/times be tweaked in any way? I also thought that the rate of the sync process could be an influence, but I may be wide of the mark?

I have attached kernel parameters from each system, K380.txt and K580.txt


I was also thinking that I could knock together a program to eliminate I/O issues by just doing some instructions in memory, and benchmark both systems, then do the same with some big I/O read/write. Does anyone have any handy benchmarking tools like this?
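A CPU-only test along these lines could be as simple as the sketch below: a shell loop doing integer arithmetic with no disk or pipe activity, run under timex on both machines. The function name cpu_loop, the ITER variable and the default iteration count are made up for illustration. A shell loop also measures the shell interpreter rather than the real cruncher, so treat it only as a rough like-for-like CPU check.

```shell
#!/usr/bin/sh
# CPU-only micro-benchmark sketch: integer arithmetic, no disk or pipe I/O.
# cpu_loop and ITER are hypothetical names, not part of any HP-UX tool.
cpu_loop() {
    n=$1
    i=0
    sum=0
    while [ "$i" -lt "$n" ]; do
        # keep intermediate values small so 32-bit shell arithmetic cannot overflow
        sum=$(( (sum * 31 + i) % 1000003 ))
        i=$(( i + 1 ))
    done
    # print the checksum so the loop result is actually used
    echo "$sum"
}

# Arbitrary default; raise it until the run lasts long enough to time reliably.
ITER=${ITER:-100000}
cpu_loop "$ITER"
```

Saved as, say, cpu_bench.sh, it would be run as `timex sh cpu_bench.sh` on each box and the user times compared. A compiled loop would probably be closer to what the number cruncher does.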

Thanks again...
Pete Devlin
Valued Contributor

Re: Unexpected Performance Differences

Sorry - messed up the attachments. The previous post's attachment is the K580 kernel; this one is the K380.

Whooops!


A. Clay Stephenson
Acclaimed Contributor

Re: Unexpected Performance Differences

Hi Pete:

No, streampipes set to 0 is good. I was hoping that the production box was set to 1; that would have accounted for slower pipes.

As for simple I/O tests:

First let's do a write test. This should preferably be done to raw devices. You may need to make a /dev/zero device node to supply an unlimited chunk of ASCII NULs. Do an ls -l /dev/zero; if you have one - great; otherwise:
mknod /dev/zero c 3 0x000003 ( or simply 0x03)
chmod 444 /dev/zero

timex dd if=/dev/zero bs=8k count=12400 of=/dev/vgxx/rlvolxx

to transfer roughly 100MB (12400 x 8K) in 8K chunks

The read test is similar
timex dd if=/dev/vgxx/rlvolxx bs=8k count=12400 of=/dev/null
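To compare the two arrays, the "real" time that timex reports for each dd run can be turned into a transfer rate. The helper below is only a sketch: the name mb_per_sec and the 20-second figure are placeholders, not measurements from either system.

```shell
# mb_per_sec BS_KB COUNT SECONDS
# Prints throughput in MB/s, taking 1 MB as 1024 KB.
# mb_per_sec is a hypothetical helper name.
mb_per_sec() {
    awk -v bs="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", bs * n / 1024 / t }'
}

# 12400 blocks of 8 KB is 99200 KB, a little under 97 MB.
# The 20 seconds here is a made-up elapsed time for illustration.
mb_per_sec 8 12400 20
```

Swap in the real elapsed time from each run in place of the 20-second placeholder, and use the same block size and count on both machines so the figures are comparable.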

By the way, only 1 listing was attached.

Clay
If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor

Re: Unexpected Performance Differences

One more thought, since I haven't seen both lists of tunables. Check timeslice - it should be 10 on both boxes.
If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor

Re: Unexpected Performance Differences

Hi Pete:

Never mind, I saw both attachments and timeslice is ok. If this is really a single threaded application and one is a number cruncher then only one processor should be pegged. Is that what you see?

You really need to install the Trial Version of Glance to nail this down.
If it ain't broke, I can fix that.
Volker Borowski
Honored Contributor

Re: Unexpected Performance Differences

Pete,

You said you "moved" the database into production.
Are you sure this went without errors?
If it was done by export/import, check whether indexes were not recreated or optimizer statistics were not recalculated.

If the new box is doing a table scan due to a missing index or uncalculated database statistics, this would make sense.

If the "read from database" is not too complex, it might be worth checking the "access path" on both boxes to see if the database takes a different approach on the new box. Check out the Oracle docs on how to do an "EXPLAIN PLAN".

Another thing:
Did you actually adjust the Oracle parameters in the init.ora for the new box? Do a
SELECT * from V$PARAMETER;

on both running databases and compare, especially:

db_block_buffers
shared_pool_size
sort_area_size

A customer of mine had a typo in a migration like this before. Instead of giving the database buffer 1GB of memory, it just had 100MB.
3 people checked it, and nobody realized.
When you hunt for this, you tend to see what you want to see :-)
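Since checking by eye is where a typo like that slips through, a mechanical comparison may help. This is a minimal sketch, assuming the V$PARAMETER output from each database has been spooled to a text file with one "name value" pair per line and no spaces inside values. The file names and the param_diff helper are hypothetical.

```shell
# param_diff FILE_A FILE_B
# Prints every parameter whose value differs between the two dumps.
# Assumes one "name value" pair per line and no spaces inside values.
param_diff() {
    a="${TMPDIR:-/tmp}/pd_a.$$"
    b="${TMPDIR:-/tmp}/pd_b.$$"
    # join needs both inputs sorted on the join field in the same collation
    LC_ALL=C sort -k 1,1 "$1" > "$a"
    LC_ALL=C sort -k 1,1 "$2" > "$b"
    # pair up lines by parameter name, keep rows where the two values differ
    LC_ALL=C join "$a" "$b" |
        awk '$2 != $3 { print $1 ": " $2 " vs " $3 }'
    rm -f "$a" "$b"
}

# Hypothetical usage:
#   param_diff k380_params.txt k580_params.txt
```

Parameters that appear in only one of the two files are silently dropped by join, so it is worth comparing the line counts of the two dumps as well.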

Good luck
Volker
Stefan Farrelly
Honored Contributor

Re: Unexpected Performance Differences


We've seen this exact same performance drop when moving our Oracle apps from 11/32-bit to 11/64-bit (on the same hardware). Basically it's a problem with running applications designed for 32-bit on a 64-bit machine. There has been a lot of talk about this before, here and on the HPADM lists. Basically you can expect anywhere from a 5-10% performance drop (we've experienced more on some apps!). There are a couple of solutions:

1. Recompile the application to 64-bit. We managed this on one app, and performance jumped up markedly. Fixed that one.

2. Move the app to a faster machine. We've moved some to L's and N's (so from 240MHz to 440MHz) and performance is now much better - as you would expect. The extra CPU grunt overrides the performance loss from running a 32-bit app on 64-bit.

3. Only run your app on 11/32-bit. We've tried this on some apps (reinstalling a new server from 11/64 to 11/32) and the performance problems went away.

It's very, very difficult to quantify why this is so with some apps. I can see you've been trying from all the above replies, but I've not yet seen someone work it out precisely (from past discussions).

I'm from Palmerston North, New Zealand, but somehow ended up in London...
Pete Devlin
Valued Contributor

Re: Unexpected Performance Differences

Thanks for the replies:

Stefan - This is my big concern, that it's a problem with running 32-bit apps on a 64-bit OS. I would have thought that user programs would not have been affected too much, but that system ones may be. Even so, it definitely looks like it may be a factor. I did stick this on an L500 attached to HP surestore running 64-bit 11.00 and it ran about twice as fast (thanks to the processors). Unfortunately the L class is 'spoken for'!

Volker - Thanks for the Oracle tips - we have checked and re-checked the Oracle settings, but it is worth another look. We have used Precise Software's DBtuner product on this as well - it suggested some minor SQL improvements that saved about 1 minute on the overall process - but every little helps. I have passed your stuff on to the DBAs to check.

Clay - Thanks for the I/O test ideas - I have run these and guess what - the I/O rates are much higher on the HP disks, and they complete a large read about 30% faster than the EMC ones. The EMC consultant is here today, so I will ask some questions. If this is the explanation then there are some questions to be asked of them, because they have been trying to prove it off the EMC for the last week or so.
The strange thing is I'm not seeing any I/O waits in glance; it's almost as if the data requests are not being made fast enough.


I wasn't very clear about the process, here goes:

Stage 1.
Two processes running - one a DB connection, the other a memory sort/change. These run concurrently, each pretty much nailed to a different processor.

Stage 2.
Two processes running again, one a DB connection which sits doing nothing until the other process (the cruncher) finishes. The cruncher is nailed to one CPU pretty much start to finish and is hitting 100% of that CPU all the time (almost), then the DB process writes some stuff to disk.

I have been running glance interactively during these runs, any tips on collecting some of the good stuff it shows?

Cheers...
A. Clay Stephenson
Acclaimed Contributor

Re: Unexpected Performance Differences

Hi Pete:

I've had one more thought. There are simply too many variables here. Why not upgrade your K380 to 64-bit and measure? You could do a make_tape_recovery of your 32-bit OS so that it is easy to revert to the old one with Ignite/UX.
This would be really easy if you have a spare drive or two on which to load vg00; all the other volume groups could be used by either OS.
If it ain't broke, I can fix that.
Pete Devlin
Valued Contributor

Re: Unexpected Performance Differences

Clay - Thanks once again for your thoughts. I think you're right: there are too many factors involved and I'm never getting to compare apples with apples. Unfortunately, due to this issue I've had to move the K380 into production, but I do have another K380 that I can use - I'll run some tests with this at 32-bit before the upgrade to make sure it's the same - I don't need any more variables!

By the way EMC are sure it's not an I/O problem, and I can see their point - during the read part of the process there is no I/O bottleneck on the K580, while I see some CPU wio time on the K380.

I'm considering lighting a fire under the K580 to see if that makes it run faster!

Thanks once again for the input - I will keep you posted on the outcome.
Ted M Johnson_1
Frequent Advisor

Re: Unexpected Performance Differences

How and where is the EMC connected to the 580, meaning which slot holds the I/O card(s)? Can you attach an ioscan of both the 3xx and the 5xx box?

-ted
Eugen Cocalea
Respected Contributor

Re: Unexpected Performance Differences

Hi,

Why not give up the advantages of 64-bit for the moment and build both kernels (32-bit and 64-bit) with exactly the same parameters, such as the segment sizes and so on? Then you will see whether the performance problem is in the hardware. It will give you an almost exact figure for the hardware performance.

E.
To Live Is To Learn