Operating System - HP-UX

L3000 vs N4000 performance issue

We have an L3000 (dual 550 MHz CPUs, 2.0 GB RAM, VA7100 array attached) and an N4000 (single 750 MHz CPU, 1.5 GB RAM, external disk array, RAID 1). An Oracle customer tracking system was moved over to the L3000 recently. Users are saying that it is a lot slower now than on the N-Class: a report that took 10 seconds now takes 2-3 minutes. I am taking CPU and memory utilization stats via Glance now.
Does anyone have any advice on what I should look for? What parameters should I tune?
Is a dual 550 better than a single 750? I thought the L3000 was a better model than the N4000 (it is only 1.5 years old), and it also has more memory, 2 GB vs 1.5 GB.
Any other advice is much appreciated.
Thanks to all who contribute!

I am also posting the real-time measurements that were suggested:

sar -d 5 5
sar -u 5 5
sar -v 5 5
swapinfo -tam
vmstat 5 5


sar -d 5 5
HP-UX B.11.00 A 9000/800 06/03/03

13:05:20 device %busy avque r+w/s blks/s avwait avserv
13:05:25 c1t2d0 2.99 0.50 4 28 5.20 6.92
c2t2d0 1.80 0.50 4 24 5.29 6.04
c4t0d1 3.19 0.50 4 64 2.79 11.12
13:05:30 c1t2d0 0.80 0.50 1 10 3.04 8.47
c2t2d0 0.60 0.50 1 8 4.47 10.71
c4t0d1 4.00 0.50 4 67 3.83 11.62
13:05:35 c1t2d0 1.20 0.50 2 16 3.56 6.52
c2t2d0 1.00 0.50 2 16 3.57 6.16
c4t0d1 1.60 0.50 2 29 1.78 9.84
13:05:40 c1t2d0 2.20 0.50 4 22 3.99 7.06
c2t2d0 1.60 0.50 3 20 4.10 5.59
c4t0d1 3.81 0.50 4 61 4.04 11.40
13:05:45 c1t2d0 2.60 0.50 5 40 4.42 6.03
c2t2d0 2.40 0.50 5 39 4.41 6.33
c4t0d1 2.20 0.50 3 48 1.92 9.86

Average c1t2d0 1.96 0.50 3 23 4.32 6.75
Average c2t2d0 1.48 0.50 3 21 4.44 6.30
Average c4t0d1 2.96 0.50 3 54 3.07 10.95

sar -u 5 5
HP-UX B.11.00 A 9000/800 06/03/03

13:09:46 %usr %sys %wio %idle
13:09:51 1 0 3 96
13:09:56 0 0 2 98
13:10:01 3 1 3 93
13:10:06 2 0 5 93
13:10:11 1 0 2 97

Average 1 0 3 95

sar -v 5 5
HP-UX B.11.00 A 9000/800 06/03/03

13:10:33 text-sz ov proc-sz ov inod-sz ov file-sz ov
13:10:38 N/A N/A 193/664 0 1991/5000 0 1561/5010 0
13:10:43 N/A N/A 193/664 0 2007/5000 0 1563/5010 0
13:10:48 N/A N/A 192/664 0 1998/5000 0 1557/5010 0
13:10:53 N/A N/A 192/664 0 1991/5000 0 1557/5010 0
13:10:58 N/A N/A 192/664 0 1885/5000 0 1555/5010 0






swapinfo -tam

Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 1024 0 1024 0% 0 - 1 /dev/vg00/lvol2
reserve - 1024 -1024
memory 1537 1106 431 72%
total 2561 2130 431 83% - 0 -



vmstat

procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
3 0 0 309892 93737 12 4 0 0 4 0 0 565 561 191 1 1 98








Unix is great, when it works
Mark Greene_1
Honored Contributor

Re: L3000 vs N4000 performance issue

What are the kernel parameters dbc_max_pct and dbc_min_pct set to? You'll want them around 20 and 5, respectively.
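As a rough sizing sketch of why this matters (the 2 GB figure is the L3000's RAM from the original post; 20 is the suggested ceiling, not a measured value), the dynamic buffer cache can grow to dbc_max_pct of physical memory:

```shell
# Back-of-the-envelope: maximum size of the dynamic buffer cache.
MEM_MB=2048        # L3000 physical memory
DBC_MAX_PCT=20     # suggested ceiling for dbc_max_pct
echo "buffer cache may grow to $((MEM_MB * DBC_MAX_PCT / 100)) MB"
```

With the HP-UX 11.0 default of 50, the same box could lose about 1 GB of RAM to the filesystem cache, which competes directly with the Oracle SGA.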

HTH
mark
the future will be a lot like now, only later
Michael Steele_2
Honored Contributor

Re: L3000 vs N4000 performance issue

This is a lightly used machine with only one problem that I can see, not enough swap.

swapinfo -tam
total 2561 2130 431 83% - 0 -

Add more swap. Here is the procedure:

lvcreate -L #### -n swap -C y -r n /dev/vg##

NOTE: -L is in MB

swapon -f -p 1 /dev/vg##/swap

'swapon' will not complain if maxswapchunks is adequate. If it doesn't like maxswapchunks then it will tell you.

maxswapchunks >= total swap (KB) / swchunk

sysdef | grep -i maxswapchunks

sysdef | grep -i swchunk

/etc/fstab
/dev/vg##/swap ... swap pri=1 0 1
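The maxswapchunks check above can be sketched as arithmetic (a sketch only - the sizes are examples, so substitute your own swapinfo numbers; swchunk is assumed at its default of 2048 one-KB blocks, i.e. 2 MB per chunk):

```shell
# How many swap chunks would the total device swap need?
SWCHUNK=2048            # default chunk size in 1 KB blocks (2 MB/chunk)
CURRENT_SWAP_MB=1024    # dev swap from swapinfo -tam
NEW_SWAP_MB=2048        # example size for the new swap lvol
TOTAL_KB=$(( (CURRENT_SWAP_MB + NEW_SWAP_MB) * 1024 ))
echo "chunks needed: $(( TOTAL_KB / SWCHUNK ))"   # must be <= maxswapchunks
```

In this example 1536 chunks are needed, comfortably under the kmtune value of 4096 posted later in the thread.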

Continue to monitor over time with these commands and note any dramatic changes.
Support Fatherhood - Stop Family Law
A. Clay Stephenson
Acclaimed Contributor

Re: L3000 vs N4000 performance issue

The sar output doesn't help too much at the moment because the box is too lightly loaded. Sar isn't that great a tool anyway. I would load Glance (at least the trial version). You can then instantly spot the bottlenecks.

I would post the kmtune output.
If it ain't broke, I can fix that.

Re: L3000 vs N4000 performance issue

* Tunable parameters

STRMSGSZ 65535
bufpages 0
dbc_max_pct 10
maxdsiz 0X40000000
maxfiles 2048
maxfiles_lim 2048
maxswapchunks 4096
maxuprc ((NPROC*9)/10)
maxusers 200
maxvgs 80
msgmap (MSGTQL+2)
msgmax 32768
msgmnb 65535
msgmni (NPROC)
msgseg (MSGTQL*4)
msgssz 128
msgtql (NPROC*10)
nfile 5000
nflocks (NPROC)
ninode 5000
nproc ((MAXUSERS*3)+64)
nstrpty 60
nstrtel (MAXUSERS)
nswapdev 25
semmni (NPROC*5)
semmns (SEMMNI*2)
semmnu (NPROC-4)
semume 64
semvmx 32768
shmmax 0X40000000
shmmni 512
shmseg 32
timeslice 10
unlockable_mem (MAXUSERS*10)
Unix is great, when it works
Michael Steele_2
Honored Contributor

Re: L3000 vs N4000 performance issue

What is dbc_min_pct?

kmtune -q dbc_min_pct

For 2GB systems I have 5 for dbc_min_pct.
Support Fatherhood - Stop Family Law
A. Clay Stephenson
Acclaimed Contributor

Re: L3000 vs N4000 performance issue

Because nothing jumps out at me at the tunables level, I feel almost certain that you have an Oracle problem - especially given the difference in performance (if these numbers are real).

Disparities of this magnitude between machines that should differ by at most a factor of 2 almost certainly have to be in the software.

How was the Oracle instance moved? I would do identical queries on the two boxes and run EXPLAIN PLAN on each. I strongly suspect that you are missing indices on the new box.
If it ain't broke, I can fix that.
Mark Greene_1
Honored Contributor

Re: L3000 vs N4000 performance issue

Does the L3000 have the same number of controllers attached to the VA7100 array as the N box did? And are they of the same or similar bandwidth?

Are you seeing any unusual or excessive errors in the syslog?

Does netstat -s return an unusual number of errors?

mark
the future will be a lot like now, only later

Re: L3000 vs N4000 performance issue

There are no errors in the syslog. There are no errors in the netstat output. The L3000 has a Tachyon FCA fibre controller connected to the VA7100 array. The N4000 has an HP SCSI adapter connected to a 3rd-party external disk array via its controller box.
Unix is great, when it works

Jeff Schussele
Honored Contributor

Re: L3000 vs N4000 performance issue

Hi Brian,

Well if the L is fibre & the N is copper, that alone could account for the perf diff.
Fibre smokes copper EVERY time. You probably will NEVER reach the L perf until you start using fibre on the N.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: L3000 vs N4000 performance issue

Thanks Jeff. But the performance issue is on the L.
The Oracle application was moved on the weekend. A timesheet query that used to take 1-2 seconds now takes 5-10
seconds, and saving a record now takes 10-15 seconds, whereas on the other server (the N-Class)
it was virtually instantaneous.
Unix is great, when it works
Jeff Schussele
Honored Contributor

Re: L3000 vs N4000 performance issue

Oops - sorry.
I think I confused this with an earlier post.

You should probably focus on disk I/O performance nonetheless.
Use Glance for overall I/O stats, and consider using fcmsutil /dev/tdX stat
(where X is the td instance)
to see if you're having fibre comm trouble as well.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Angus Crome
Honored Contributor

Re: L3000 vs N4000 performance issue

How did you do the transfer? If it was a one for one swap, I would recommend a make_tape_recovery of the N-Class to the L-Class. This would effectively have made them identical.

The kernel params should be set identically if this is the only application on the machine. Unless you get really tune-happy, the physical hardware is functionally the same between an L and an N. The differences are SCSI vs. fibre (fibre should smoke the SCSI) and the processors. I would think that under heavy load the two processors should out-perform the single one, but for a single query test, the 750 should return info faster.

If you don't find any kernel differences (use sysdef to get a quick look at both machines), then I would look at the data layout on the VA. Striping issues, mirroring issues, blown drive in a RAID-5 (if it supports RAID-5).
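One quick way to act on the kernel-comparison suggestion is to capture the tunables from each box to a file and diff them. This is a sketch only - the file names and the two sample values below are made up for illustration:

```shell
# On each server, capture tunables to a file, e.g.:
#   kmtune > /tmp/$(hostname).tunables
# then copy both files to one box. Simulated captures for illustration:
cat > /tmp/n4000.tunables <<'EOF'
dbc_max_pct 50
ninode 5000
EOF
cat > /tmp/l3000.tunables <<'EOF'
dbc_max_pct 10
ninode 5000
EOF
# Any line that prints is set differently on the two boxes.
diff /tmp/n4000.tunables /tmp/l3000.tunables || true  # diff exits 1 when files differ
```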

Someone above mentioned possibly missing indices. This could severely curtail your performance.

Hope something here helps shine some light.
There are 10 types of people in the world, those who understand binary and those who don't - Author Unknown
Mark Greene_1
Honored Contributor

Re: L3000 vs N4000 performance issue

How was the data laid out on the VA7100? What RAID types and how many drives?

Does fcmsutil /dev/td0 nsstat have numbers other than zeroes for the error checks?

mark
the future will be a lot like now, only later

Re: L3000 vs N4000 performance issue

A thought on mincache=direct.
After checking the user forums and talking to a colleague about the N-Class and another Unix server (she confirmed this with HP as well): mincache=direct should be set in Oracle on
datafiles and indices, and not set on archive logs, redo logs, app, oraexport and backup, provided the block sizes match, which they do on the L-Class (the slow server).
We might want to try this on the L-Class.
Note: The N-Class has no mincache=direct setting on any of the Oracle mount points.
Comments please?
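For reference, those options would go in /etc/fstab on the VxFS (OnlineJFS) filesystems holding the datafiles and indices. A sketch only - the volume and mount point names here are made up:

```
/dev/vgora/lv_oradata /u01/oradata vxfs rw,delaylog,nodatainlog,mincache=direct,convosync=direct 0 2
```

A remount (or reboot) is needed for the change to take effect, and mincache=direct/convosync=direct require the OnlineJFS license.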

Unix is great, when it works
Jeff Schussele
Honored Contributor

Re: L3000 vs N4000 performance issue

Hi Brian,

Depends on...

1) How the Oracle SGA is set up - If Oracle has a fairly large SGA that will do its own buffering, then YES, use mincache/convosync=direct

AND

2) The OS version - On 11.0 it seemed to help IF the SGA was doing Oracle buffering. On an up-to-date patched 11.0 AND on 11i the difference is much less noticeable.

Consult with your DBA. If the SGA is small then I'd let the OS use the buffer cache. Caches *always* beat direct writes by a mile.

My 2 cents,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: L3000 vs N4000 performance issue

The data is using AutoRAID, 8 drives with 1 hot spare. The fcmsutil output did not show any error messages.
Unix is great, when it works
A. Clay Stephenson
Acclaimed Contributor

Re: L3000 vs N4000 performance issue

Okay, use rw,mincache=direct,convosync=direct,nodatainlog,delaylog for datafiles and indices

and use
rw,delaylog,datainlog

for everything else Oracle.

About the biggest improvement I've seen was about 1.2x (on 10.20); under 11.0 about 1.1x
- this ain't gonna fix you.

Note that you have taken a 12x-18x performance hit. That's almost certainly got to be software - SQL code - lack of critical indices.

A performance hit of about 5x - 7x MIGHT be seen going from RAID 0 to RAID 5 but even that disparity is too small to account for your problems.

If I assume absolutely terrible tuning I'll give that a 2x hit, and if we multiply that by a 7x RAID 0 to RAID 5 hit, we are now in the 14x realm - which is about where your problems lie. I give this scenario a very low probability and thus I come back to SQL.


I would do this test and maybe we can get the hardware/tuning side of this out of the picture.

timex dd if=/dev/zero bs=64k count=1000 of=/u01/dummy

(You can play with the bs and the size of the file but the idea is to get an idea of how fast these operations are done. You can also try using raw device nodes or convosync=direct,mincache=direct but they are going to be slower.)
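A minimal sketch of that write test, sized at 64 MB. The scratch path here is an example - point it at a filesystem on the VA7100 such as /u01 for a meaningful number, and use timex instead of time on HP-UX:

```shell
# Sequential-write test: 1000 x 64 KB blocks = 65,536,000 bytes.
time dd if=/dev/zero bs=64k count=1000 of=/tmp/dd.test
wc -c < /tmp/dd.test      # 65536000
```

Divide the byte count by the elapsed seconds for a rough MB/s figure, run the identical command on both servers, and remove /tmp/dd.test afterwards.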

If these transfer rates are far below what is expected then look to the hardware but otherwise start looking at Oracle.

You really, really need to get Glance on this box and look at things while they are bad.

If it ain't broke, I can fix that.

Re: L3000 vs N4000 performance issue

Thanks Clay. I have Glance on this box. We did monitoring via the VA7100 Performance indicators and concluded that it was not an I/O problem.
Unix is great, when it works