12-03-2004 02:38 AM
Very high load peaks on a DL740 with RH AS 3
I'm experiencing very high load peaks (even 40-50!) that last only a few seconds.
During these peaks, user CPU is usually low but system CPU is ~80-99%.
The system is a DL740 with 8 CPUs and 32 GB of RAM running Red Hat Advanced Server 3.0 (kernel 2.4.21-9.0.3.ELhugemem), connected by a 2 Gbps SAN to an EVA3000 disk array.
It is an Oracle-only machine; no other significant processes are running.
I tuned it using the usual parameters recommended by Oracle (I can attach the list of kernel parameters if needed).
There seems to be no exceptional I/O or paging/swap activity during the peaks, but I'm not sure (I'm using dstat - http://dag.wieers.com/home-made/dstat/ - to monitor several metrics at the same time).
How can I identify what the system is doing when the load is high?
Any help will be appreciated.

12-03-2004 02:47 AM
Re: Very high load peaks on a DL740 with RH AS 3
top
ps -ef |more
sar -d 1 100
vmstat

12-03-2004 02:50 AM
Re: Very high load peaks on a DL740 with RH AS 3
I'm using dstat because it summarizes the results from all these utilities, so I can look at load, CPU, I/O, interrupts, context switches, paging and swapping at the same moment.
But I'm unable to understand what the system is doing during the peaks.
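Since the peaks last only a few seconds, one option is to let a script take the snapshot instead of trying to catch it by hand. The following is a minimal sketch, not something posted in the thread: the function names, the threshold of 10 and the log path are all hypothetical choices. It assumes the first field of /proc/loadavg is the 1-minute load average and that ps accepts the procps-style `-eo` column list.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): record what is running
# at the moment the load average crosses a threshold.

# load_exceeds THRESHOLD [FILE]
# Succeeds when the 1-minute load average (first field of FILE,
# default /proc/loadavg) is greater than THRESHOLD.
load_exceeds() {
    awk -v t="$1" '$1 + 0 > t + 0 { found = 1 } END { exit !found }' \
        "${2:-/proc/loadavg}"
}

# snapshot_processes
# Prints a timestamp, the ps header, and every process in state R (runnable)
# or D (uninterruptible sleep); these are the states that count towards
# the Linux load average.
snapshot_processes() {
    date
    ps -eo state,pid,user,wchan,args | awk 'NR == 1 || $1 ~ /^[RD]/'
}

# Example loop, commented out so that sourcing this file has no side effects:
# while sleep 1; do
#     load_exceeds 10 && snapshot_processes >> /tmp/load-peaks.log
# done
```

The wchan column shows the kernel function a sleeping process is waiting in, which may give a first hint before reaching for a profiler.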

12-04-2004 06:57 PM
Re: Very high load peaks on a DL740 with RH AS 3
- Do you have any cron jobs?
- Do you have any scheduled tasks in Oracle?

12-05-2004 08:20 AM
Re: Very high load peaks on a DL740 with RH AS 3

12-05-2004 09:17 AM
Re: Very high load peaks on a DL740 with RH AS 3
-----------------------------------------------
The explanation is somewhat involved...
If you trace cma*d you'll find that it doesn't do anything but open the device, ioctl, close. Admittedly rather more times than should be necessary but that's just incidental bad design.
You'll find the delay - and system time consumption - seems to happen on the close. From here you need a fairly good working knowledge of the Linux kernel...
Ok? still with me then?
Run oprofile for a while and you'll find the cpu time is being consumed by invalidate_bdev. Which is interesting :-).
Invalidate_bdev is called from kill_bdev. Kill_bdev is called from the block device release code. Release is what happens on last close. Now the monitoring daemon is opening the unpartitioned disk device which it is pretty certain nothing else has open. (Off hand I'm not sure if even having an fs on the device counts as it being open. There are subtle differences and I *think* I'm right in saying that block device access and fs access is considered different at this level. Don't quote me or blame me!)
So, each close triggers invalidate_bdev. Why is this so bad? Well, the idea is that when the last close happens on a device you need to flush any cached data because, with much PC HW, you can't be sure when the media gets changed. Invalidate_bdev isn't *meant* to be called often. It works by scanning through the entire list of cached data for block devices to find and drop data related to the device being closed. So it sucks system time and the amount is proportional to the amount of cached (from any device) data you have.
WORKAROUND:
All you need to do is to make sure that each time the cma*d daemon closes the device it isn't the *last* close - i.e. some other process has the device open. The other process doesn't even need to *do* anything. Try something along the lines of:
sh -c 'kill -STOP $$' < /dev/cciss/c0d0 > /dev/null 2>&1 &
Hope that's all clear! (As mud... :-) )
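The one-liner works because the stopped shell keeps the device open on its stdin without ever reading from it. A possible way to wrap it so that the holder can be found and removed again later is sketched below; the function names are hypothetical and the device path is whatever array device the monitoring daemon polls.

```shell
#!/bin/sh
# Hypothetical wrappers around the workaround above (names are illustrative).

# hold_open DEVICE
# Starts a background shell with DEVICE as stdin; the shell immediately
# stops itself, so it holds the device open without doing any I/O.
# Prints the PID of the holder.
hold_open() {
    sh -c 'kill -STOP $$' < "$1" > /dev/null 2>&1 &
    echo "$!"
}

# release_holder PID
# Kills the stopped holder; after this, a close by another process may
# again be the last close on the device.
release_holder() {
    kill -KILL "$1"
}

# Example (commented out; /dev/cciss/c0d0 is the device from the post above):
# holder=$(hold_open /dev/cciss/c0d0)
# ... later ...
# release_holder "$holder"
```

The holder does not survive a reboot, so it would have to be started again from a boot script before the monitoring daemon.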
(HP: As well as blind debugging I do Linux & OSS consultancy. I happen to know the answer to this one as it came up at a major investment bank...)

12-05-2004 07:48 PM
Re: Very high load peaks on a DL740 with RH AS 3
I know that blind debugging is not easy, but I'm sure there are a lot of smart people on these forums who can give me some advice.
Vitaly: I have no cron jobs or Oracle scheduled tasks.
Mike: I'm beginning to hate the Linux VM! It caches a lot of file data and fills 32 GB even if you just run 'ls'! It is the 'if you bought RAM, use it' policy, but I don't like it.
I don't run any HP utilities. Can you give me some details?

12-05-2004 10:08 PM
Re: Very high load peaks on a DL740 with RH AS 3
What is your backup policy? Do the peaks happen around your backup time?
Are you running RMAN? RMAN is a very CPU-consuming task.
Is it possible for you to start top right at the moment of a peak, to see which task is consuming the most?
regards,
Xyko

12-05-2004 10:16 PM
Re: Very high load peaks on a DL740 with RH AS 3
I'm not ruling out that Oracle is to blame: sometimes I see 10-15 oracle processes running for a few seconds, bringing the load to 8-15, and then disappearing.
But is it normal for such a workload to drive a big system like this to these peaks?

12-05-2004 10:18 PM
Re: Very high load peaks on a DL740 with RH AS 3
Catch the process that eats the CPU/RAM.
Which driver do you use for the storage?

12-05-2004 10:22 PM
Re: Very high load peaks on a DL740 with RH AS 3
Sometimes the processes run for only a few seconds.
I use driver 7.00.03 (from HP); I'm planning to upgrade both the kernel and the driver to the latest versions (2.4.21-20 and 7.01.01, respectively).

12-05-2004 10:30 PM
Re: Very high load peaks on a DL740 with RH AS 3
You have a situation that needs deep inspection. Some time ago I suggested acct for another problem, and I think it may help you too.
Please look at my last reply in http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=713322
Hope it helps.
Regards,
Xyko

12-05-2004 11:52 PM
Re: Very high load peaks on a DL740 with RH AS 3
Then you need to know what's going on. Install oprofile (get it from oprofile.sf.net and build it if necessary). Run it for a while and then use opreport to examine the suspect processes and see what bits of the kernel are getting hammered.
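As a rough sketch of what such a profiling run might look like with the legacy opcontrol interface (the one shipped with oprofile releases before 1.0; later releases replaced it with operf): the function name, the vmlinux path and the 60-second default below are assumptions for illustration, and the kernel image must be the uncompressed vmlinux that matches the running kernel.

```shell
#!/bin/sh
# Hypothetical wrapper around the legacy oprofile tools (opcontrol/opreport).

# profile_kernel VMLINUX [SECONDS]
# Points oprofile at the uncompressed kernel image, collects samples for
# SECONDS (default 60), flushes them, prints a per-symbol report and
# shuts the profiler down. Stops at the first command that fails.
profile_kernel() {
    vmlinux=$1
    seconds=${2:-60}
    opcontrol --vmlinux="$vmlinux" &&
    opcontrol --start &&
    sleep "$seconds" &&
    opcontrol --dump &&
    opreport -l &&
    opcontrol --shutdown
}

# Example (commented out; needs root, and the path is a placeholder):
# profile_kernel /path/to/vmlinux 120 > /tmp/oprofile-report.txt
```

The run only helps if it overlaps a load peak, so in practice it would be left running long enough to catch one.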
I still suspect a linear scan of buffer heads for some reason. I'm not that familiar with Oracle set up but you should be mounting filesystems it uses with the noatime option and you should have adjusted the bdflush sysctl values to spread I/O out rather than trying to do it in bursts?
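To see which filesystems are currently mounted without noatime, something along these lines could be used. It is a sketch with a hypothetical function name, assuming the usual /proc/mounts layout (device, mount point, type, options, ...) and treating only entries whose device starts with "/" as disk-backed.

```shell
#!/bin/sh
# Hypothetical helper: list disk-backed mounts that still update atime.

# mounts_without_noatime [FILE]
# Reads FILE (default /proc/mounts) and prints the mount point of every
# entry whose device path starts with "/" and whose option list does not
# contain "noatime".
mounts_without_noatime() {
    awk '$1 ~ /^\// && ("," $4 ",") !~ /,noatime,/ { print $2 }' \
        "${1:-/proc/mounts}"
}

# Example (commented out):
# mounts_without_noatime
```

A filesystem it reports can normally be switched without unmounting via `mount -o remount,noatime <mountpoint>`, with the option also added to /etc/fstab so that it persists across reboots.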

12-06-2004 12:35 AM
Re: Very high load peaks on a DL740 with RH AS 3
Now I'm running acct (installed by default in RH) and I'll take a look at oprofile too.
I'm now seeing a lot of Oracle processes in D state, for only a few seconds, bumping the load up to 15; what is D state?
Anyway, I'm currently using the following (default) values for bdflush:
50 500 0 0 500 3000 80 50 0
but I also tried:
60 2000 0 0 500 3000 87 50 0
Do you really suggest disabling atime updates for the Oracle filesystems? Can you point me to an 'official' reference for this?
I'm beginning to think that perhaps a 32-bit architecture is very inefficient with a lot of RAM :-(

12-06-2004 05:48 AM
Re: Very high load peaks on a DL740 with RH AS 3
D state is uninterruptible sleep. 99% of the time that means waiting for disk I/O of some description.
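A quick way to watch how many processes are in D state at a given moment is to tally the state column of ps. This is a minimal sketch with a hypothetical function name; it assumes a procps-style ps where `-o state=` prints the one-letter state without a header line.

```shell
#!/bin/sh
# Hypothetical helper: tally process states read from stdin.

# count_states
# Reads lines whose first field begins with a ps state letter and prints
# one "LETTER COUNT" line per state, sorted by letter.
count_states() {
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example (commented out): sample once per second during a peak.
# while sleep 1; do date; ps -eo state= | count_states; done
```

A sudden jump in the D count that lines up with the load peaks would point towards processes blocked in the kernel rather than plain CPU contention.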
Practically nothing uses atime so it's generally the first to go on loaded filesystems. Most fs' put the inode table at one end of the disk, away from the data so updating atime tends to encourage head movement for no reason.
And, yeah, large amounts of memory on a 32bit system is not great for performance. For one thing the address extension from the cpu side is something of a hack. For another, if you don't have 64bit PCI with a 64bit PCI controller and drivers that know about 64bit capable PCI, *every* I/O will involve copying to/from bounce buffers in the lowest 4GB...

12-08-2004 07:56 PM
Re: Very high load peaks on a DL740 with RH AS 3
It was a known issue with my kernel and Oracle version.
Oracle doc 262004.1:
- Much higher system time both while running and during connect/disconnect.
The problem gets worse as more users concurrently connect and disconnect.
The high system time can cause system instability depending on what's running on the machine. If you are facing this issue you should:
1. Update to RHEL 3 U3
2. export DISABLE_MAP_LOCK=1 (set this so that oracle and the listener inherit it)
3. Install the patch 3596858 (available for 9.2.0.4 & 9.2.0.5, fixed in 10g)
Also:
bug 3570979:
PERFORMANCE PROBLEM (HIGH SYS TIME) WHEN USING REMAP_FILE_PAGES() ON RAMFS
Bye
Domenico Viggiani

12-08-2004 09:55 PM
Re: Very high load peaks on a DL740 with RH AS 3
And thanks for posting the solution.
Regards,
Xyko