Operating System - HP-UX

Re: Not so High Availabilty with SAP Clustered App?

 
Ralph Grothe
Honored Contributor

Not so High Availabilty with SAP Clustered App?

Hello,

Today an HP CE had to replace a defective CPU on the primary node of a 2-node active/standby SG cluster which has an SAP CI as its sole package.
This cluster was once set up together with the official SAP helper scripts from the SG Extensions for SAP.

# B3935DA A.11.14 MC / Service Guard
B3935DA.ServiceGuard A.11.14 Service Guard
B3935DA.ATS-CORE A.11.14 Service Guard Advanced Tape Services
B3935DA.Package-Manager A.11.14 HP Package-Manager
B3935DA.Cluster-Monitor A.11.14 HP Cluster Monitor
# B7885BA B.03.07 MC/Service Guard Extension for SAP
B7885BA.SG-Ext-SAP-R3 B.03.07 MC/ServiceGuard Extension for SAP


Because of this, and because of the marketing alliance HP and SAP/Oracle had formed, I have so far been misled into believing that clustering SAP on MC/SG would result in true HA.
That's why I assured our SAP DBAs that if we took the primary node out of the cluster (with a cmhaltnode -f on it), the package would fail over automatically without most logged-in users even noticing, or at worst with them only being blocked for the duration of the package failover (approx. 5-10 minutes).
I also said this because I believed that the TCP connections would be sustained anyway, and that any DB or SAP handles would be retained once the failover had completed.
This was obviously not the case,
and instead all users got kicked out of SAP.
What a shame.
Doesn't SAP persist any state of open connections so that users are logged in again automatically after a failover?

Now they were beating me up about it, and even worse, their creeping suspicion was nurtured that in the end they don't have a real HA application, despite all the good money they spent on the extra standby hardware, licenses etc.

I would be grateful for any supporting arguments from you that I could raise in my defence.

Rgds.
Ralph
Madness, thy name is system administration
Patrick Wallek
Honored Contributor

Re: Not so High Availabilty with SAP Clustered App?

Depending on the type of connection you have to the virtual IP, it could work either way.

I have a couple of SG clusters that require users to telnet to the virtual IP to run the app. Now if the package were to fail over to the other node, then the users will definitely get bumped off and would have to log in again. The secondary node knows nothing about the telnet connections the users had to the primary node.

So in the case of telnet, ssh, a sqlplus or ODBC database connection or just about any other type of stateful connection I would expect users to get disconnected and have to log in again.

Now if it were something like an http connection for a browser based app, then users may just have to do a browser refresh or something similar.

I don't know SAP so I'm not sure how exactly users connect.
Carsten Krege
Honored Contributor

Re: Not so High Availabilty with SAP Clustered App?

If your users connect directly to the CI and the CI package fails over to another node then (of course) the TCP connections are broken and have to be reestablished.

If your users connect through an application server (AS) instead and the CI moves to another node, you can configure your AS to reconnect automatically to the CI (SAP's Transaction Reset feature) and the users only have to wait for the CI to start up again (because the TCP connections of the clients to the AS are not affected, only the TCP connections from the AS to the CI are).
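
To illustrate the general idea of such an automatic reconnect (this is not SAP's implementation, just a minimal Python sketch under stated assumptions), a middle tier can simply retry the connection to the package's relocatable address until the package is up again. The function name `reconnect`, its parameters and the `flaky_connect` stand-in are hypothetical and only serve the demonstration:

```python
import time


def reconnect(connect, attempts=10, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect is any callable that opens the connection and raises OSError
    (for example ConnectionRefusedError) while the target is unavailable.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next try
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Hypothetical stand-in for a package that needs a while to start on the
# takeover node: the first three connection attempts are refused.
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] <= 3:
        raise ConnectionRefusedError("package still starting")
    return "connected"


result = reconnect(flaky_connect, attempts=10, delay=0.0)
print(result, "after", calls["n"], "attempts")  # connected after 4 attempts
```

In a real middle tier the callable would open a socket to the relocatable IP of the package, and the number of attempts times the delay would have to cover the expected failover time.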

The current setup is still HA in some way. It protects you from SAP becoming completely unavailable in case of a h/w failure etc. After a failover to the other node, all TCP connections to the package need to be reestablished though - this is just normal.

Carsten
-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
Ralph Grothe
Honored Contributor

Re: Not so High Availabilty with SAP Clustered App?

Hello Patrick,

Shortly after I had posted my question I started realizing that my assumption was a bit silly.
As you pointed out, the takeover node has no notion of the sockets that the failed one had open.
Therefore the TCP connections would be severed anyway once the failover has completed: as soon as a client sends another segment on its old connection, the takeover node, which has no record of that connection, answers with an RST.
This scenario is described as "Half-Open Connection Discovery" in the TCP RFC.
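
As a toy model of why the takeover node cannot continue the old connections (plain Python, no real networking; the class `Node`, the client address and the return strings are made up for illustration), one can think of each node as keeping its own private table of established connections:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Established connections known to this node, keyed by (client_ip, client_port).
    connections: set = field(default_factory=set)

    def accept(self, client):
        """Record a connection that was established with this node."""
        self.connections.add(client)

    def receive_segment(self, client):
        """A segment for an unknown connection is answered with a reset."""
        return "ACK" if client in self.connections else "RST"


primary = Node("primary")
standby = Node("standby")
client = ("10.0.0.42", 51234)  # hypothetical client address and port

primary.accept(client)
print(primary.receive_segment(client))  # ACK
# After the failover the relocatable IP points to the standby node,
# whose connection table is empty:
print(standby.receive_segment(client))  # RST
```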

As you correctly indicated, a stateless protocol such as HTTP (where, at least with protocol version 1.0, every HEAD, GET or POST request for every item of a page gets a new 3-way handshake) would not be affected by a failover.
In that respect, would it be legitimate to state that only TCP applications with such a protocol pattern are truly highly availabilitable (ouch, forgive this grammatical tort ;-)?

Nevertheless, I wonder whether some clever programmer couldn't write a client server application that somehow could manage to maintain protocol state?
I also believe this to be "easier" for clustered applications that demand their own custom clients (like I'm convinced SAP does).
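
A minimal sketch of what I have in mind, in Python and purely as a simulation without real sockets: the session state lives in a store that both nodes can reach (assumed here to sit on shared storage), the client only remembers a token, and after a lost connection it reconnects and presents the token again. All class and method names are hypothetical:

```python
import uuid


class SharedSessionStore:
    """Stands in for session state kept where both nodes can reach it."""

    def __init__(self):
        self.sessions = {}


class ServerInstance:
    """One running instance of the application on one node."""

    def __init__(self, store):
        self.store = store
        self.alive = True
        self.open_connections = set()  # private to this instance

    def connect(self, conn_id):
        if not self.alive:
            raise ConnectionError("node is down")
        self.open_connections.add(conn_id)

    def _check(self, conn_id):
        if not self.alive or conn_id not in self.open_connections:
            raise ConnectionError("connection reset")

    def login(self, conn_id, user):
        self._check(conn_id)
        token = uuid.uuid4().hex
        self.store.sessions[token] = {"user": user, "counter": 0}
        return token

    def increment(self, conn_id, token):
        self._check(conn_id)
        session = self.store.sessions[token]
        session["counter"] += 1
        return session["counter"]


class Cluster:
    def __init__(self):
        self.store = SharedSessionStore()
        self.active = ServerInstance(self.store)

    def fail_over(self):
        """The old instance dies; a fresh one starts with the same store."""
        self.active.alive = False
        self.active = ServerInstance(self.store)


class ResumingClient:
    def __init__(self, cluster, user):
        self.cluster = cluster
        self.conn_id = 0
        self._connect()
        self.token = self.server.login(self.conn_id, user)

    def _connect(self):
        self.conn_id += 1
        self.server = self.cluster.active
        self.server.connect(self.conn_id)

    def increment(self):
        try:
            return self.server.increment(self.conn_id, self.token)
        except ConnectionError:
            self._connect()  # the transport is gone, the token is still valid
            return self.server.increment(self.conn_id, self.token)


cluster = Cluster()
client = ResumingClient(cluster, "ralph")
first = client.increment()
cluster.fail_over()
second = client.increment()
print(first, second)  # 1 2
```

The sketch glosses over the hard part: a request that was in flight at the moment of the failover may or may not have been applied, so a blind retry is only safe for operations that can be repeated without harm.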
Madness, thy name is system administration
Ralph Grothe
Honored Contributor

Re: Not so High Availabilty with SAP Clustered App?

Hello Carsten,

interesting point.
So there already seems to exist a work-around for that problem?
I am afraid that I don't know enough about SAP tools.
Does the application server that takes care of re-establishing lost connections usually reside as a package in the same cluster,
or is it something separate like, say, a cluster quorum server?
I only know that in our current SAP cluster some split between CI and AS is already taking place: on a normal cluster start, when both nodes are available, the SAP Extra Functions start the AS on the standby node.

I think I have to revisit the SAP Extra Functions scripts and have a closer look at the comments there.
As so often, I have only taken over the administration of this cluster,
and I am not sure whether those who originally deployed it were fully aware of all the options.

Madness, thy name is system administration
Carsten Krege
Honored Contributor
Solution

Re: Not so High Availabilty with SAP Clustered App?

Well, nowadays most customers are moving away from having clients connect to the CI directly and let them go through the AS instead. The CI is then kept at minimal size and only hosts the critical processes.

This is also the concept that SAP adopted when it introduced the System Central Services instances.

Once you have this setup, you can get the AS configured to reconnect automatically to the CI's relocatable IP once the CI has failed over. This is just SAP config stuff.

It does not matter where the AS are located: in the cluster or outside, configured as SG packages or just started by invoking startsap. The reconnect is an SAP feature independent of SGeSAP.

Carsten
-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG