Re: SGLX 12.80 - not resistant to file system overflow /tmp

yilmazaydin · 03-29-2023

Hello, we are experiencing unplanned package stops whenever the /tmp file system fills up.
The Oracle database monitor failed to complete its checks:

Mar 29 15:57:50 root@sglx_node1 tkit_module.sh[6549]: Retrying 3 more time(s) before giving up.
/opt/cmcluster/oracletoolkit/hagetdbstatus.sh: line 33: cannot create temp file for here-document: No space left on device
Mar 29 15:57:54 root@sglx_node1 tkit_module.sh[6549]: Retrying 2 more time(s) before giving up.
/opt/cmcluster/oracletoolkit/halistener.mon: line 50: cannot create temp file for here-document: No space left on device
Mar 29 15:57:57 root@sglx_node1 tkit_module.sh[6544]: Oracle Listener unisvfe failure detected.
Mar 29 15:57:57 root@sglx_node1 tkit_module.sh[6544]: Oracle Listener unisvfe failed
/opt/cmcluster/oracletoolkit/hagetdbstatus.sh: line 33: cannot create temp file for here-document: No space left on device
Mar 29 15:57:58 root@sglx_node1 tkit_module.sh[6549]: Retrying 1 more time(s) before giving up.
Mar 29 15:58:00 root@sglx_node1 tkit_module.sh[6544]: All listeners have failed

I checked the hagetdbstatus.sh script. It uses the following construction:

/usr/local/cmcluster/oracletoolkit/hagetdbstatus.sh: if [[ -f /tmp/ora_error_${SID_NAME}.txt ]] ; then
/usr/local/cmcluster/oracletoolkit/hagetdbstatus.sh: cat /tmp/ora_error_${SID_NAME}.txt

Is this a bug or a feature of the SGLX product?

As I understand it, bash creates its temporary files (including the ones for here-documents) in /tmp by default, or in the directory specified by the TMPDIR variable. In either case, if that directory overflows, we face the same problem: the package stops.
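A minimal sketch of the point above, assuming bash on the cluster node. The helper names heredoc_tmpdir and tmpdir_has_space are hypothetical and not part of Serviceguard or the Oracle toolkit. The first approximates the directory bash uses for here-document temp files (TMPDIR when it names a writable directory, otherwise /tmp); the second checks that this directory still has free space. A full /tmp is still a writable directory, so bash does not fall back to another location; it fails with "No space left on device", as in the log above. Depending on the bash version (5.1 and later may pass small here-documents through a pipe instead of a temp file), this is only an approximation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names, not part of Serviceguard or
# the Oracle toolkit).

# Approximate the directory bash picks for here-document temp files:
# $TMPDIR when it names a writable directory, otherwise /tmp.
heredoc_tmpdir() {
    if [[ -n ${TMPDIR:-} && -d $TMPDIR && -w $TMPDIR ]]; then
        printf '%s\n' "$TMPDIR"
    else
        printf '%s\n' /tmp
    fi
}

# Succeed only if that directory has at least $1 KiB available (default 1024).
tmpdir_has_space() {
    local min_kb=${1:-1024} dir avail_kb
    dir=$(heredoc_tmpdir)
    # df -Pk prints a header plus one data line; field 4 is "Available" in 1K blocks.
    avail_kb=$(df -Pk -- "$dir" 2>/dev/null | awk 'NR==2 { print $4 }')
    [[ $avail_kb =~ ^[0-9]+$ ]] && (( avail_kb >= min_kb ))
}
```

For example, a pre-check such as `tmpdir_has_space 1024 || logger -t tmpcheck "low space in $(heredoc_tmpdir)"` could log a warning before a monitor run; the 1024 KiB threshold is an arbitrary example value.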

YA

Mike_Chisholm · 03-30-2023

I would position this as expected behavior. Serviceguard's primary role is to provide high availability to packaged applications. This means that if the node currently running the application is experiencing a problem of some sort, Serviceguard should fail the package over to one of the other adoptive nodes. So while the monitor may not be explicitly designed to detect and handle a full /tmp file system, I would not say the outcome (failure of the monitor service and failover of the database) is a completely undesirable one from an HA perspective. A full /tmp file system can certainly destabilize a Linux operating system, leading to problems across many subsystems. Even if it does not affect Oracle directly, it can affect many other operating system processes, so in my mind this is a situation where a failover is probably desirable.

If /tmp is filling up repeatedly, that should of course be fixed, either by growing it or by figuring out why it keeps happening and stopping whatever is filling it up.
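One way to catch a filling /tmp before it takes a package down is a periodic check, sketched here under stated assumptions: GNU coreutils df, du, sort and head are available, and the function names fs_usage_pct and check_tmp as well as the 90% default limit are hypothetical choices, not HPE tooling. When the limit is reached, it prints a warning and the ten largest entries directly under the directory, which can help identify what keeps filling it up.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring helpers (illustrative names, not HPE tooling).

# Print the usage percentage (digits only) of the filesystem holding $1.
fs_usage_pct() {
    local dir=${1:-/tmp}
    # In df -P output, field 5 of the data line is "Capacity", e.g. "87%".
    df -P -- "$dir" 2>/dev/null | awk 'NR==2 { sub(/%$/, "", $5); print $5 }'
}

# Exit status: 0 = below limit, 1 = at/over limit, 2 = usage unreadable.
check_tmp() {
    local dir=${1:-/tmp} limit=${2:-90} pct
    pct=$(fs_usage_pct "$dir")
    [[ $pct =~ ^[0-9]+$ ]] || return 2
    if (( pct >= limit )); then
        printf 'WARNING: %s is %s%% full (limit %s%%)\n' "$dir" "$pct" "$limit" >&2
        # List the ten largest entries directly under the directory, in KiB,
        # staying on the same filesystem (-x), to find what is filling it.
        du -xsk -- "$dir"/* 2>/dev/null | sort -rn | head -n 10 >&2
        return 1
    fi
    return 0
}
```

It could be run from cron, for example as `check_tmp /tmp 85`, with its exit status (1 = over the limit, 2 = usage could not be read) wired into whatever alerting is already in place.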

I work for HPE.

yilmazaydin · 03-30-2023

Hi @Mike_Chisholm.

Thank you for the balanced answer. I agree that any problem that can negatively affect the cluster node can also negatively affect the managed application.

YA.
