pstat_getprocvm unusual behaviour
06-01-2006 04:01 AM
I encountered a very strange issue today:
after an Oracle 9.2.0.7 incident (a lot of Oracle processes eating 100% CPU time in SYS mode) and after stopping/cleaning them, we see the following unusual behaviour:
the lsof utility hangs forever when instructed to show all system processes, but lsof -u ^oracleuser returns almost immediately.
After some research we found that lsof spends most of its time in the pstat_getprocvm syscall, so we compiled a small program which calls that function; timex shows about six times more time spent in sys mode for an Oracle process compared with other processes, and the behaviour is quite systematic.
Has anyone seen something similar? Any hints about this?
The OS is HP-UX 11iv2 with September 2005 patch bundle installed.
06-01-2006 04:56 AM
If it is consistent - then it sounds more like you've got contention getting process-wide VM locks... likely because that same process is doing a lot of virtual address space modification requests at the same time (mmap, brk, sbrk, malloc, fork, etc.). For example, if Oracle was eating 100% CPU because it was stuck in a shmat/shmdt loop, you'd see this sort of behavior.
If it isn't consistent (i.e. you always seem to get "stuck" on a particular region type or most especially on a constant offset where that offset is in Shared address space and mapped to all Oracle processes) then you're likely getting contention on sub-object locking because there are I/O operations in flight which are either being issued like crazy or timing out/retrying a lot.
If in fact no single call to pstat_getprocvm() is taking all that long -- your total time per process is what's 6 times larger -- I have to ask whether the Oracle processes have 6 times as many process regions (you get 1 region per call... 5x more regions == 5x more calls if you want to visit them all).
Can you post your program that you used and the timing data generated?
06-01-2006 07:40 AM
Re: pstat_getprocvm unusual behaviour
I have investigated a little bit more and I have found that the system call takes a large amount of time only when executed against the shared memory segment of the instance (we have configured shmmax large enough to accommodate the Oracle SGA in only one shared memory segment, based on Oracle recommendations). The behaviour is consistent with one of the possibilities you mentioned. Moreover, it does not depend on the number of memory regions the process has (for a python program with more than 100 regions, the small program which uses pstat_getprocvm returns much more quickly than for an oracle process with 40 regions).
Now some numbers: the SGA has almost 16GB of RAM, the number of oracle processes is somewhere between 1000 and 1700, and the system running the workload is an SD partition with 24 CPUs/32GB total RAM, of which 8GB is configured as CLM. The system consumes almost 4GB of RAM.
Any recommendation will be highly appreciated.
06-01-2006 08:13 PM
Re: pstat_getprocvm unusual behaviour
06-02-2006 01:14 AM
Re: pstat_getprocvm unusual behaviour
Being only on the SGA limits the possibilities - you aren't having locking issues (the locks acquired in this path don't care which virtual memory object in the process is being looked at; if you had contention here, you'd see contention on non-SGA objects within the Oracle process as well).
Note that /dev/kmem-utilizing tools take completely different paths -- they don't worry about locks at all (you can get garbage running on a live system sometimes because of this) and they're not always reporting the same things. pstat has to be a good kernel citizen; kmeminfo is reading the raw data and doesn't have to be as polite.
Also (see below), I don't think kmeminfo generates per-page statistics for process scans, hence it won't hit what I believe is your scaling problem here. If you have vpsinfo, I would expect it to take longer to process your Oracle processes than others on the system, for much the same reason pstat does.
In any event -- the only path that makes any sense for your slowdown is the generation of the page size statistics. That path is large page aware (thank goodness), so this implies to me that your 16GB SGA is backed by small page sizes. The statistics generation is perforce a linearly scaling algorithm -- it costs time in proportion to the number of unique pages present in the object. Large page usage means fewer pages, so you'd get a faster time. All the smaller virtual objects on your system (and I think it is safe to assume you don't have all that many more 16GB objects) have correspondingly far fewer pages, hence take less time.
You've got a 16GB SGA -- is your Oracle binary chatr'ed with a large data size hint? What are your vps_* tunables set to -- especially vps_chatr_ceiling? You really should try for 4GB large pages with a 16GB SGA for performance (it reduces TLB miss rates)... you may not get them, especially if you start Oracle when the system has little free memory to begin with -- or if Oracle is using IPC_MEM_FIRST_TOUCH to get CLM within the SGA and your CLM doesn't have 4GB left in single pages -- but you definitely want as large pages as possible.
If you are configured for large pages, are getting large pages and still see this slowdown, then I would expect the SGA to have holes [where the system hasn't yet created memory because Oracle never referenced that virtual address]. Untranslated virtual pages are equivalent to 4KB pages in the scanning method (I don't want to delve into a discussion of alternate scanning methods and the tradeoffs made here -- it would get way too internal in nature). I have to confess that I don't expect this to be the case, since Oracle usually locks the SGA in memory (either for async I/O purposes if you've configured for async, or just for performance - which I thought was the default).
In summary:
What's the page size information you get back from pstat for the SGA, your chatr settings on Oracle and your vps_* tunable settings?
If you have vpsinfo, that output would be handy as well.
06-02-2006 02:01 AM
Re: pstat_getprocvm unusual behaviour
I have attached below the requested information. Thank you very much for all the insights you gave me.
The page size hint for the shared memory segment is 1024MB (as seen from gpm).
The distribution of different size pages is as follows:
4KB: 2
16KB: 1
64KB: 0
256KB: 184
1MB: 791
4MB: 167
16MB: 11
64MB: 8
256MB: 2
1GB: 13
# chatr /oravapp/product/9.2.0.1/bin/oracle
/oravapp/product/9.2.0.1/bin/oracle:
64-bit ELF executable
shared library dynamic path search:
LD_LIBRARY_PATH enabled first
SHLIB_PATH enabled second
embedded path enabled third /oravapp/product/9.2.0.1/rdbms/lib/:/oravapp/product/9.2.0.1/lib/:/usr/lib/pa20_64:/opt/langtools/lib/pa20_64:
shared library list:
libodm9.sl
libskgxn9.sl
libjox9.sl
libcl.2
librt.2
libpthread.1
libnss_dns.1
libdl.1
libm.2
libc.2
shared library binding:
deferred
global hash table disabled
global hash table size 1103
shared library mapped private disabled
shared library segment merging disabled
shared vtable support disabled
explicit unloading disabled
segments:
index type address flags size
6 text 4000000000000000 z-r-c- 64M
7 data 8000000100000000 ---m-- L (largest possible)
executable from stack: D (default)
static branch prediction enabled
kernel assisted branch prediction enabled
lazy swap allocation for dynamic segments disabled
nulptr references disabled
# kctune -q vps_ceiling
Tunable Value Expression
vps_ceiling 64 64
06-02-2006 02:03 AM
Re: pstat_getprocvm unusual behaviour
I forgot to mention vps_chatr_ceiling:
# kctune -q vps_chatr_ceiling
Tunable Value Expression
vps_chatr_ceiling 1048576 Default