Performance problem
01-21-2007 10:57 AM
I am validating our software products for HP-UX on an rx2620 system with built-in SCSI hard drives. Oracle (used by some of our products) appears unusually slow on the default VxFS, compared to what I would expect based on my experience with other platforms.
I am currently focusing on a single item in our test suite, an insert-intensive test (almost only inserts). After each insert there is a commit (this may not be optimal, but it is part of the test case), which induces a fairly large amount of redo log writer (LGWR) activity. Still, on previously tested platforms, the ratio of log writer time to statement execution time has been about 1/4. On our HP-UX test installation, this ratio is more than 5/4. (While the LGWR may do some asynchronous background writing between commits, on a commit request it always writes synchronously, causing the application to spend a lot of time waiting for the commit to return.)
I started to research the cause of the slowness and found one possible culprit:
VxFS buffering causes an unusually high overhead.
It appears that there are options to VxFS for turning off buffering/caching at the OS level. However, these options appear to be available only with OnLineJFS, which is not installed on our test machine. Before rushing to obtain a copy of OnLineJFS and taking my chances with the directio mount options, I would like to find out whether this is really the right move or whether I should look for other causes.
I am not sure if they are relevant, but here are some sar outputs:
-bash-3.00$ sar -b 1 10000
HP-UX apollo B.11.23 U ia64 01/22/07
00:35:04 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
00:29:05 0 307 100 277 258 0 0 0
00:29:06 0 259 100 255 264 3 0 0
00:29:07 0 264 100 279 264 0 0 0
00:29:08 0 282 100 275 283 3 0 0
00:29:09 0 282 100 279 275 0 0 0
00:29:10 0 283 100 282 285 1 0 0
00:29:11 0 273 100 275 274 0 0 0
-bash-3.00$ sar -u 1 1000
HP-UX apollo B.11.23 U ia64 01/22/07
00:35:19 %usr %sys %wio %idle
00:35:20 18 2 39 42
00:35:21 17 1 43 38
00:35:22 17 1 41 41
00:35:23 19 4 35 43
00:35:24 13 3 38 45
00:35:25 18 4 37 42
Based on this information, does it look like using the directio options of OnLineJFS will result in a 400% (!) increase in write performance?
Is there any other diagnostic data that could help me narrow down the possible causes of this problem?
Thank you in advance.
Peter
01-21-2007 11:43 AM
Solution
A simple test for this is to use 'dd if=/dev/zero of=$rawdisk bs=512 count=1000'.
If this takes less than 2 seconds, then write-back caching is enabled, and the disk/controller returned success before the data actually made it to the disk. Unless there is serious cache protection, this is not appropriate for transactional systems.
As you indicate, the log writer has to go to the disk for each commit, and in this artificial case it will only do the one insert.
Batch jobs typically commit several inserts at a time, so the overhead of the log writer is less significant.
Typical multi-user usage also allows the log writer to effectively commit multiple writes per IO, making it more efficient.
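To illustrate that batching point, the commit frequency of an insert loop can be varied from sqlplus. This is only a sketch: the credentials, table name, and row counts below are hypothetical, not from the original test suite.

```shell
# Hypothetical batched-commit variant of the insert test: commit every
# 100 rows instead of after every row, so LGWR waits once per batch.
sqlplus -s scott/tiger <<'EOF'
BEGIN
  FOR i IN 1 .. 10000 LOOP
    INSERT INTO t_ins_test (id) VALUES (i);
    IF MOD(i, 100) = 0 THEN
      COMMIT;  -- one synchronous redo write per 100 inserts
    END IF;
  END LOOP;
  COMMIT;      -- flush the remainder
END;
/
EOF
```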
Single-insert commit log write IOs are small: less than the 8KB file system buffer. Some suggest that the first time a buffer is touched, this may cause the system to read a buffer's worth, merge in the change, and write the changed buffer.
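The dd check above can be wrapped in a small timing script. A sketch only: the default target is a scratch file, which the file system buffers and will therefore always look fast; point TARGET at a raw device (the LV name below is a placeholder) to measure the disk itself.

```shell
#!/bin/sh
# Timed 512-byte write test (a sketch of the dd check described above).
# Against a buffered file this finishes in milliseconds regardless of the
# disk; against a raw device with write caching off, each write waits on
# the platter. Example raw target: /dev/vg00/rlvoltest (placeholder name).
TARGET=${1:-/tmp/wb_test.$$}

time dd if=/dev/zero of="$TARGET" bs=512 count=1000

# Remove only scratch files, never a device node.
case "$TARGET" in
  /tmp/*) rm -f "$TARGET" ;;
esac
```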
Why not take the file system out of the equation by (temporarily) going to a raw device for the redo log? You can do this just for the redo, on the fly: add two groups on raw devices, switch logs twice, drop the original groups, and try again.
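The on-the-fly switch described above would look roughly like this from sqlplus; the raw device paths, sizes, and group numbers are placeholders, a sketch only.

```shell
# Sketch: move redo onto raw logical volumes (paths/sizes are placeholders).
sqlplus -s / as sysdba <<'EOF'
ALTER DATABASE ADD LOGFILE GROUP 4 ('/dev/vg00/rredo1') SIZE 64M;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/dev/vg00/rredo2') SIZE 64M;
ALTER SYSTEM SWITCH LOGFILE;   -- switch until the old groups are INACTIVE
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
EOF
```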
You may also be able to switch on the write-back cache for your local disk by jiggling the 'mode pages' for the disk. I believe this setting is the default on Sun systems. Sorry, no detailed info handy just now.
Good luck,
Hein van den Heuvel
HvdH Performance Consulting
01-21-2007 11:54 AM
Re: Performance problem
Check out the man page (1m) for scsictl
Option: -m immediate_report
Hein.
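For reference, querying and setting that flag would look something like this. The device path is a placeholder, and the exact option mnemonic should be confirmed against scsictl(1M) on your system.

```shell
# Placeholder device path; list your disks with ioscan -fnC disk.
scsictl -m ir /dev/rdsk/c2t1d0     # query immediate_report
scsictl -m ir=1 /dev/rdsk/c2t1d0   # enable immediate reporting
```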
01-21-2007 12:03 PM
Re: Performance problem
-bash-3.00$ time dd if=/dev/zero of=/opt/oracle/test_raw bs=512 count=1000
1000+0 records in
1000+0 records out
real 0m0.060s
user 0m0.000s
sys 0m0.010s
/opt/oracle is a directory on the file system where the Oracle data + control files are also located.
Did I do something wrong? Shouldn't "count" be much greater?
Thanks
Peter
01-21-2007 12:30 PM
Re: Performance problem
Wrong test. That was using a file system.
Double-check with iostat; you will not see many actual disk IOs for that test.
You would have to step back from the mount point (/opt ?) to the lvol to the vg.
Is there any space (just a few PEs will do!) left in the volume group, so that you can create a fresh LV on it and then use the rvol for the test?
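Assuming vg00 has a few free PEs, the raw-device rerun of the dd test might look like this; the LV name and size are placeholders.

```shell
# Create a small throwaway LV, time raw writes through its rvol, clean up.
lvcreate -L 64 -n lvoltest vg00                        # 64 MB scratch LV
time dd if=/dev/zero of=/dev/vg00/rlvoltest bs=512 count=1000
lvremove -f /dev/vg00/lvoltest
```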
(Btw, I think I had my terms mixed up. Write-back = write through to disk, respond when the data is actually on disk.
Write-behind = give an immediate response when the data is in the cache, without waiting for it to hit the disk.)
Hein.
01-21-2007 12:55 PM
Re: Performance problem
You guessed right: we have only tested on SATA/ATA disks so far, which have write caching turned on by default (and we never touched it). After disabling write caching, the test executes much slower on those machines as well.
I checked the disk on the HP-UX system with scsictl and it shows that immediate_report is off.
Thank you, again!
Peter