Re: OS failure message in Oracle

John Jimenez · ‎11-14-2007

Can anyone interpret what this means? It is a memory issue reported by Oracle. I cannot find any problems in any of my HP-UX 11.11 logs, but Oracle says it is an OS issue, so I am trying to resolve it.

Errors in file /opt/oracle/admin/MXPROD/udump/mxprod1_ora_655.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:socket failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: sskgxpcre1
Wed Nov 14 09:43:54 2007

Hustle Makes things happen

Michael Mike Reaser · ‎11-14-2007

The cynic in me wants to point out that Oracle will **ALWAYS** claim it's an OS issue. Of course, can they tell you *what* that "OS issue" might be?

No. All that matters is that they have now proclaimed that it's **NOT** an "Oracle issue".

There's no place like 127.0.0.1

HP-Server-Literate since 1979

Murat SULUHAN · ‎11-14-2007

Hi John

ORA-00603: ORACLE server session terminated by fatal error

ORA-00603 is an internal Oracle error, so you should consult Oracle Support. They may give you a specific or general patch.

Best Regards
Murat

Murat Suluhan

A. Clay Stephenson · ‎11-14-2007

If the error number is 11 (EAGAIN) then that suggests that the system-wide number of processes limit (NPROC) or the per-user number of processes (MAXUPRC) has been reached. Examine those tunables and see if they need increasing. (I'm leaning towards maxuprc.)

The bad news is that '11' could also be a signal number. Signal 11 is SIGSEGV, a segmentation violation, which means that a process was trying to use memory it wasn't supposed to. That almost always means bad code, which could be in Oracle itself or in a shared library.

However, because I see 'Resource temporarily unavailable', that strongly suggests that this '11' is indeed EAGAIN. Running the process under tusc would allow you to see exactly which system call is failing and thus know which tunable to set.
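
To make the two readings of a bare '11' concrete, here is a minimal Python sketch (not from the thread; the function name `interpret_status` is hypothetical). It looks a number up both as an errno and as a signal on whatever system runs it. The numeric values are platform-specific, so the result only describes HP-UX when it is run there.

```python
import errno
import os
import signal


def interpret_status(code):
    """Return both readings of a bare numeric status: as an errno and as a signal."""
    # errno.errorcode maps the local platform's errno numbers to symbolic names.
    errno_name = errno.errorcode.get(code, "unknown")
    try:
        signal_name = signal.Signals(code).name
    except ValueError:
        # The number is not a signal on this platform.
        signal_name = "unknown"
    return {
        "errno_name": errno_name,
        "errno_message": os.strerror(code),
        "signal_name": signal_name,
    }


if __name__ == "__main__":
    # 11 is the status reported in ORA-27300 above.
    print(interpret_status(11))
```

If the message text printed for the errno reading matches the one in ORA-27301, that supports the EAGAIN interpretation over the signal one.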

If it ain't broke, I can fix that.

John Jimenez · ‎11-14-2007

He is having listener issues

Here is the long story. Last week I created two 50 GB LUNs on the SAN for the Oracle admin to use (we have two rp7420s running RAC). He ran into problems adding them to data002; it just hung. In the past I have noticed that they usually created a new data or fra group when I gave them disk space, rather than adding it to an existing one, so I think the failure had an Oracle procedural cause. He canceled the operation, and that left a zombie process on the OS. An hour later everyone locked up. We rebooted the server with the zombie, but ran into a second issue: two months ago, when I created two other 50 GB LUNs, which they used for data2, they did not put those two groups in the startup script, so they did not mount. He fixed it, but to make sure, we rebooted both servers one at a time last night. This morning at 9:00 the listener went down and people were not able to connect again. One thing that has always confused me is that the listener uses port 1521, but I never had this in the /etc/services file.

Hustle Makes things happen

John Jimenez · ‎11-14-2007

Current info from Glance. One thing that is weird on this system is that everyone comes in from the Windows application under the same login; if I do a "who" I get one entry for MAXUX, but there are actually 150 people using it.
Here is the Glance output. Nothing seems bottlenecked system-wide, but maybe, since it thinks there is only one user, some other kernel parameter is maxed out.

    SYSTEM TABLES REPORT    Users= 5

    System Table                  Available       Used   Utilization   High(%)
    --------------------------------------------------------------------------
    Proc Table         (nproc)         4200        736            18        18
    File Table         (nfile)        65536       7077            11        11
    Shared Mem Table   (shmmni)         512         10             2         2
    Message Table      (msgmni)        4200          2             0         0
    Semaphore Table    (semmni)        4096         25             1         1
    File Locks         (nflocks)       4200          7             0         0
    Pseudo Terminals   (npty)            60          0             0         0
    Buffer Headers     (nbuf)            na     215109            na        na
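
As a quick cross-check of the report above, this minimal Python sketch recomputes utilization as used/available from the figures Glance printed and lists any table at or above a threshold. The 80% threshold and the helper names are assumptions for illustration only. Note that the report covers system-wide tables, not per-user limits such as maxuprc.

```python
# Figures copied from the Glance SYSTEM TABLES REPORT above: (tunable, available, used).
# nbuf is omitted because Glance reports "na" for its size.
GLANCE_TABLES = [
    ("nproc", 4200, 736),
    ("nfile", 65536, 7077),
    ("shmmni", 512, 10),
    ("msgmni", 4200, 2),
    ("semmni", 4096, 25),
    ("nflocks", 4200, 7),
    ("npty", 60, 0),
]


def utilization_pct(available, used):
    """Percentage of a table in use, rounded to a whole number."""
    return round(100 * used / available)


def tables_over(tables, threshold_pct):
    """Return the tunables whose utilization is at or above threshold_pct."""
    return [
        name
        for name, available, used in tables
        if utilization_pct(available, used) >= threshold_pct
    ]


if __name__ == "__main__":
    for name, available, used in GLANCE_TABLES:
        print(f"{name:8s} {utilization_pct(available, used):3d}%")
    # 80 is an arbitrary illustrative threshold, not an HP recommendation.
    print("at or above 80%:", tables_over(GLANCE_TABLES, 80))
```

With these figures no system-wide table is anywhere near its limit, which is consistent with the "nothing seems bottlenecked" reading and points toward a resource this report does not show.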

Hustle Makes things happen

John Jimenez · ‎11-14-2007

Just to correct the "long story": last week I created the LUNs; yesterday is when the Oracle admin tried to add them and we started having issues.

Hustle Makes things happen

John Jimenez · ‎11-14-2007

Murat, funny you should mention patches. Last month my OS patches were 12 months behind, but they are now caught up. The Oracle patches are way behind and are going to be scheduled soon.

Hustle Makes things happen

John Jimenez · ‎11-14-2007

FYI swapinfo

    # swapinfo -tam
                  Mb      Mb      Mb   PCT  START/      Mb
    TYPE       AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
    dev         8192       4    8188    0%       0       -    1  /dev/vg00/lvol2
    reserve        -    8188   -8188
    memory     16357   13741    2616   84%
    total      24549   21933    2616   89%       -       0    -

Hustle Makes things happen

John Jimenez · ‎11-14-2007

Another correction, sorry: the 2 servers are on 11.23, not 11.11.
Clay, I think you were on to something with NPROC, because everyone comes in as user MAXMC. NPROC = 4200 and maxuprc = 3700, but Glance shows this to be at only 18%. If you have any other ideas, can you let me know? Thank you.

Hustle Makes things happen

John Jimenez · ‎11-14-2007

Clay,
The Oracle admin found this. It looks like this might be the issue. I currently have MAX_ASYNC_PORTS set to 200.

Symptoms
The following messages appeared in the alert log file, and thereafter the instance crashed.

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:socket failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: sskgxpcre1
Cause
This error means that a socket failure happened at the OS level while trying to communicate with the remote node.

This issue can occur if MAX_ASYNC_PORTS is set too low, as it is required to create a socket for use of the private interconnect when needed.

MAX_ASYNC_PORTS specifies the system-wide maximum number of ports to the asynchronous disk I/O driver that processes can have open at any given time. The default value for this parameter is 50.

http://docs.hp.com/en/939/KCParms/KCparam.MaxAsyncPorts.html

Solution
1. Set MAX_ASYNC_PORTS to a high enough value.
2. Deactivate async I/O at the OS level.

Hustle Makes things happen

Volker Borowski · ‎11-14-2007

Hi,

Well, you can start and stop the listener without having the database up and running, and the port is set in listener.ora no matter what is in /etc/services.
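
A minimal Python sketch of that distinction, with hypothetical helper names: one function checks whether anything answers on a TCP port, the other checks whether the local services database has a name for it. The two are independent, since a program binds a port by number and the services file only maps names to numbers.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening.
        return False


def service_name_for(port):
    """Return the services-database name for a TCP port, or None if it has none."""
    try:
        return socket.getservbyport(port, "tcp")
    except OSError:
        return None
```

For example, `port_accepts_connections("localhost", 1521)` would report whether something is answering on the listener port mentioned in this thread, regardless of what `service_name_for(1521)` returns.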

And normally you can start and stop the database with or without a listener being up and running.

If your listener died at 09:00 and your database died at 09:43:54, this suggests there is a memory leak somewhere.
So if you are lucky, the Oracle patches may help. (BTW, from which version to which version do you want to patch?)

How is the DBA trying to use the space you provide? Extending a tablespace with a new file? On a filesystem or RAW?

Do you have some alert.log entries from around the time the extension starts and where it "hangs" that you can attach?

Good hunting
Volker

John Jimenez · ‎11-14-2007

Thanks for the info Volker.

I feel all three issues we had were for different reasons. The database issue is resolved, but now we are having trouble keeping the listener up.
- Addressing the space? I do not know Oracle, but in the past the DBA would create a new name: FRA000 then FRA001, or DATA0000 then DATA001 then DATA002. But yesterday he tried to add it on to DATA0002. Because we had issues, he will create a new DATA0003 instead, but because of the issues he will probably do this next week.
- Raw or file system? Everything having to do with RAC is using RAW, including the new LUNs. I change the permissions on one of the paths in /dev/rdsk and then the Oracle DBA sets it up.
- alert.log stuff? That's just it: I do not see any issues on the OS. The error that I posted above was from the Oracle logs.

But I think this morning's issue might have had to do with MAX_ASYNC, because maybe there was garbage left over from yesterday's issue. I looked at some threads but have not been able to measure MAX_ASYNC. I opened a new thread on MAX_ASYNC in the HP admin forum. I have not gotten an answer, so I just opened a work order with HP support too.

Hustle Makes things happen

John Jimenez · ‎11-14-2007

shmmax is full on both systems.
shmmax 5368709120 / 5368709120

Hustle Makes things happen

Ariel Cary · ‎11-15-2007

Hi John,

I think your reply on 'Nov 14, 2007 18:50:09 GMT' can lead to an answer.

Short answer: increase your swap space.

Longer one:
Clearly, there is a lack of some OS resource(s):

ORA-27300: OS system dependent operation:socket failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: sskgxpcre1

When a process in HP-UX is created or requests more memory, an amount of swap equal to the memory allocated is reserved in the swap area (so that it is guaranteed to be available when actual swapping happens); this is HP-UX's behavior. If there is no swap space left to reserve, Oracle will fail to create processes. That can affect user sessions, the listener process, or any other process that wishes to allocate more memory.

The swapinfo output shows that all 8 GB of the swap you have configured on your 16 GB physical memory box is reserved. Thus, you must have at least 16 GB of swap, even if you never actually use all of it.
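
The arithmetic behind that reading can be sketched in a few lines of Python. This is an illustration only: the data rows are copied from the swapinfo output posted earlier in the thread, the function names are hypothetical, and the calculation assumes device swap only (no filesystem swap), as in that output.

```python
# Data rows from the "swapinfo -tam" output posted earlier (values in MB).
SWAPINFO_ROWS = """\
dev 8192 4 8188 0% 0 - 1 /dev/vg00/lvol2
reserve - 8188 -8188
memory 16357 13741 2616 84%
total 24549 21933 2616 89% - 0 -
"""


def _mb(field):
    """Convert a swapinfo column to an integer, treating '-' as zero."""
    return 0 if field == "-" else int(field)


def parse_swapinfo(text):
    """Sum the AVAIL, USED and FREE columns (in MB) per row type."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("dev", "reserve", "memory", "total"):
            continue
        row = totals.setdefault(fields[0], {"avail": 0, "used": 0, "free": 0})
        row["avail"] += _mb(fields[1])
        row["used"] += _mb(fields[2])
        row["free"] += _mb(fields[3])
    return totals


def summarize(totals):
    """Report how much device swap is still neither used nor reserved."""
    dev = totals["dev"]
    reserved = totals.get("reserve", {"used": 0})["used"]
    return {
        "device_swap_mb": dev["avail"],
        "device_unreserved_mb": dev["avail"] - dev["used"] - reserved,
        "total_free_mb": totals["total"]["free"],
    }


if __name__ == "__main__":
    print(summarize(parse_swapinfo(SWAPINFO_ROWS)))
```

With the posted figures, 4 MB used plus 8188 MB reserved accounts for the whole 8192 MB device, leaving nothing unreserved there. The 2616 MB still free in the total line matches the free figure on the memory row.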

HTH,

-Ariel

Yogeeraj_1 · ‎11-15-2007

hi,

What version of Oracle are you running?

In Oracle 10g, you have Enterprise Manager Database Control, which provides some online performance diagnostics as well as graphs of host performance, instance disk I/O, and instance throughput.

You can get some indications there too.

e.g. Stats concerning the host performance:
Current Memory Page Scan Rate(pages/s) 0.3
Current Swap Utilization (%) 0.66

Paging Activity
Active Pages 74248522
Inactive Dirty Pages 51588930
Inactive Clean Pages 0
Pages Paged-out (per second) 276.52
Pages Paged-in (per second) 5883.95
Pages Scanned by Page Stealing Daemon (per second) 21.28

please check
kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)

John Jimenez · ‎11-16-2007

Yes,

we have 10g. I will get with the Oracle Admin and see what he sees and I will post it.

Hustle Makes things happen
