08-11-2010 10:19 AM
System CPU skyrockets until oracle shutdown needed.
08-11-2010 10:37 AM
Re: System CPU skyrockets until oracle shutdown needed.
What was the actual CPU usage at the time?
What did 'swapinfo -tam' show?
How was the storage behaving? Any problems with lots of WIO for processes?
Were there any large reports running? Had anything changed, other than the new DB version? Are all indexes valid? Perhaps some job was trying to do a full table scan of a very large DB table?
Lots of questions and few answers at this point.
08-11-2010 12:12 PM
Re: System CPU skyrockets until oracle shutdown needed.
08-11-2010 12:53 PM
Re: System CPU skyrockets until oracle shutdown needed.
08-11-2010 01:18 PM
Re: System CPU skyrockets until oracle shutdown needed.
When you went to 10g, was anything key newly created or increased in size? New pools? More space in the buffer cache? A change to the auto PGA aggregate target? Was the number of I/O slaves and DB writers kept consistent? What about redo log interleaving? Redo log buffer size?
sort_area_size for each connection (pga param)?
08-11-2010 02:12 PM
Re: System CPU skyrockets until oracle shutdown needed.
It seems like a lock issue of some sort. I have had lots of application locks before, but those cause concurrent processes to pile up: the load increases, but not so dramatically. I am attaching a graph of system load, etc. for the morning, covering the 5 hours leading up to the problem.
08-11-2010 03:10 PM
Re: System CPU skyrockets until oracle shutdown needed.
The reason stated was that the Oracle DB can only keep 7 segments active at the same time. After that you become queued up, and the system starts picking which segments it can address and which ones it can't.
I like to make sure that my shmmax is at least 1/5 of the SGA size to keep me out of trouble, adding a bit of fudge to stay away from 1/7 if things grow a bit over time (and they always do somehow).
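To put rough numbers on that rule of thumb, here is a minimal sketch of the arithmetic only. The function name and the sample sizes are made up for illustration, and it assumes the SGA is simply carved into segments of at most shmmax bytes, which may not match how Oracle actually lays things out on a given system.

```python
def segments_needed(sga_bytes: int, shmmax_bytes: int) -> int:
    """Smallest number of segments of at most shmmax_bytes that can hold sga_bytes.

    Pure ceiling division; an illustration of the sizing rule, not Oracle's allocator.
    """
    if sga_bytes <= 0 or shmmax_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-sga_bytes // shmmax_bytes)  # integer ceiling without float rounding


GB = 1024 ** 3
sga = 10 * GB  # hypothetical SGA size

# shmmax at exactly 1/5 of the SGA gives 5 segments; a smaller shmmax gives more.
for shmmax in (sga // 5, 1 * GB):
    print(shmmax // GB, "GB shmmax ->", segments_needed(sga, shmmax), "segments")
```

If the SGA grows even slightly past five times shmmax, the count goes to six, which is why a bit of fudge on top of 1/5 is worth having.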
Re: statspack. You don't need a snapshot from before the upgrade; you need snapshots while things are good, as they decay, and when everything is bad. After you bring the server back up, you can run your reports and see what's killing you. You can also see how much cpu time you used, how much disk I/O time, etc. The report will also point out the top running sql statements and how much cpu, I/O, parse time, executions, etc. each one took. By comparing over time, you can see if something is trending. If the top sql remains the top sql (and it should), but the waits on cpu or the waits on I/O are going up, it gives you something to look for.
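As a rough illustration of "comparing over time", the sketch below takes per-statement figures copied by hand from two report intervals (one good, one bad) and ranks statements by how much their cpu per execution grew. The data structure, function name, and statement labels are all hypothetical; nothing here reads statspack tables or reports directly.

```python
from typing import Dict, List, NamedTuple, Tuple


class SqlStat(NamedTuple):
    executions: int     # executions during the report interval
    cpu_seconds: float  # cpu time during the report interval


def compare_intervals(
    good: Dict[str, SqlStat], bad: Dict[str, SqlStat]
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Return (regressions, newcomers).

    regressions: (statement, bad cpu-per-exec / good cpu-per-exec), worst first,
                 for statements present in both intervals.
    newcomers:   statements that appear only in the bad interval.
    """
    regressions = []
    for sql_id, b in bad.items():
        g = good.get(sql_id)
        if g is None or g.executions == 0 or b.executions == 0:
            continue
        good_cost = g.cpu_seconds / g.executions
        bad_cost = b.cpu_seconds / b.executions
        if good_cost > 0:
            regressions.append((sql_id, bad_cost / good_cost))
    regressions.sort(key=lambda item: item[1], reverse=True)
    newcomers = sorted(set(bad) - set(good))
    return regressions, newcomers


# Made-up figures for illustration only.
good_hour = {"stmt_a": SqlStat(1024, 8.0), "stmt_b": SqlStat(512, 64.0)}
bad_hour = {
    "stmt_a": SqlStat(1024, 64.0),
    "stmt_b": SqlStat(512, 64.0),
    "stmt_c": SqlStat(16, 4.0),
}
print(compare_intervals(good_hour, bad_hour))
```

A statement whose ratio climbs while its execution count stays flat is a candidate for a changed plan; a newcomer in the bad interval is worth a look for the same reason.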
Also watch your buffer cache hit ratios and your shared pool misses.
If the sql statements change totally per hour, then maybe these newer sql statements have been "detuned" with a bad execution plan from the 10g upgrade. I saw quite a bit of this very thing, especially in the Oracle workflows, which were so critical that it tended to slow a lot of things down. It required hinting the code pieces to use indexes the way I wanted them to. In some cases, where tables changed much more rapidly than stats could be gathered on them (complete fills, then almost complete deletion of rows), we just switched them to rule based via hinting. In other cases we created a new index that helped things along, and the optimizer chose it. The old plan, while not perfect, ran fine without the new index; the new plans ran poorly, but all we had to do was create an index that the optimizer liked.
The number of times this happened was small (10 maybe), but some of them were very critical.
What about lock waits? If you're seeing those, it could be that just a few sql statements that need tuning keep some key blocks so busy that everything else has to wait, and the queue builds up.
I've seen cases where someone creates a new index to help "tune" a problem, and they end up massively slowing everything down. Reason: they created an index that needs to be updated but left initrans at the default of 2, while 15 updates at a time were trying to hit portions of that table. Raising the initrans of the new index, and reviewing the initrans of the related tables and their indexes, freed up the enqueued lock waits.
I've also seen things like this that point not to ONE thing, but to the need to tune lots of things.
Since you're queueing up, I'd start by reviewing statspack for sql code that is running longer and is in the top list of waits on cpu. After that, go for the top sql statements by number of executions, because 1 statement running very quickly but millions of times per hour can eat up a lot of cpu. I've seen this when folks who are used to object oriented programming push code down to the lowest level possible, many times re-running code over and over that has already been fully resolved, or is resolvable with the same query at a much higher level in the code. For instance, getting, let's say, the inventory_organization_id of a collection of data for each node of data, even though that value is relevant and fully defined at the whole collection level. Now suppose the thing suddenly runs a little slower. It was never noticed before, but each re-execution of this called piece of code creates a whole new cursor instance, does a whole new fetch, and returns the result on a stack, each and every time, for a million nodes. Suddenly your little piece of code that used to cost "1" or "2" now costs "only 8", but it and a small pile of its poorly performing friends are collectively killing the server.
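The per-node lookup pattern described above can be sketched with an in-memory SQLite database, purely to count statements executed. The table names and layout are invented for the example; only inventory_organization_id comes from the discussion, and none of this is Oracle-specific.

```python
import sqlite3


def build_db(node_count: int = 1000) -> sqlite3.Connection:
    """Create a toy schema: one collection (hypothetical tables) with many nodes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collections ("
        "collection_id INTEGER PRIMARY KEY, inventory_organization_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, collection_id INTEGER)"
    )
    conn.execute("INSERT INTO collections VALUES (1, 42)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, 1)", [(i,) for i in range(node_count)]
    )
    return conn


def node_ids_for(conn, collection_id):
    rows = conn.execute(
        "SELECT node_id FROM nodes WHERE collection_id = ?", (collection_id,)
    )
    return [row[0] for row in rows]


def org_id_per_node(conn, collection_id):
    """Anti-pattern: re-resolve the collection-level value once per node."""
    queries = 1  # the node list itself
    result = {}
    for node_id in node_ids_for(conn, collection_id):
        row = conn.execute(
            "SELECT c.inventory_organization_id "
            "FROM nodes n JOIN collections c ON c.collection_id = n.collection_id "
            "WHERE n.node_id = ?",
            (node_id,),
        ).fetchone()
        queries += 1
        result[node_id] = row[0]
    return result, queries


def org_id_once(conn, collection_id):
    """Resolve the value once at the collection level and reuse it."""
    org_id = conn.execute(
        "SELECT inventory_organization_id FROM collections WHERE collection_id = ?",
        (collection_id,),
    ).fetchone()[0]
    nodes = node_ids_for(conn, collection_id)
    return {node_id: org_id for node_id in nodes}, 2  # two statements in total


conn = build_db()
slow, slow_queries = org_id_per_node(conn, 1)
fast, fast_queries = org_id_once(conn, 1)
print(slow == fast, slow_queries, fast_queries)
```

Both functions return the same mapping; the difference is the number of statements executed, which is the figure to look for when sorting the top sql by executions.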
Also, look for long-running (real plain ol' clock time) queries, and see if they now a) have bad execution plans and/or b) are sorting on disk instead of in memory.
As for context switching, you can see it in glance and perfview, or with "sar -w".
You can also determine context switching on a long running process with a piece from the following web page: http://christianbilien.wordpress.com/2007/04/
An Excerpt: (uses the os pid number of the Oracle shadow process as ospid, assuming a client/server setup and not a local or bequeathed connection).
----------------------------------
Oradebug helps diagnose the LGWR forced and voluntary switches:
SQL> oradebug setospid 2022
Oracle pid: 6, Unix process pid: 2022, image: oracle@mymachine12 (LGWR)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug procstat
Statement processed.
SQL> oradebug tracefile_name
/oracle/product/10.2.0/admin/MYDB/bdump/mydb_lgwr_2022.trc
SQL> !more /oracle/product/10.2.0/admin/MYDB/bdump/mydb_lgwr_2022.trc
...
Voluntary context switches = 272
Involuntary context switches = 167
----------------------------
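If you want to collect those two counters from several trace files and compare them over time, a small parser along these lines could do it. This is only a sketch: it assumes the counter lines look exactly like the two shown in the excerpt above, and the function name is made up.

```python
import re

# Matches lines such as "Voluntary context switches = 272" (format assumed
# from the excerpt; other Oracle versions may print it differently).
_SWITCH_LINE = re.compile(
    r"^\s*(Voluntary|Involuntary) context switches\s*=\s*(\d+)\s*$",
    re.MULTILINE,
)


def context_switches(trace_text: str) -> dict:
    """Return {'voluntary': n, 'involuntary': m} for the counters found.

    If a counter appears more than once, the last occurrence wins.
    """
    return {
        kind.lower(): int(count)
        for kind, count in _SWITCH_LINE.findall(trace_text)
    }


sample = """\
Voluntary context switches = 272
Involuntary context switches = 167
"""
print(context_switches(sample))
```

Taking two dumps a known interval apart and subtracting the counts gives a switch rate for that one process, which is easier to reason about than a single cumulative number.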
Hope some of this helps, let us know what y'all are seeing.
08-24-2010 08:39 AM
Re: System CPU skyrockets until oracle shutdown needed.
08-30-2010 06:01 AM