H/24 support

 
FERRARI MARCO
Advisor

Hi everyone.
I will soon be moving to a 24-hour, 365-day (h24/365) production environment running HP-UX 11.11 and Serviceguard.
At present we are allowed a few hours at night to perform maintenance and to switch packages between the nodes.
I am asking whether anyone knows of a course (or a manual or book) that specifically covers the problems of administering always-on HP-UX clusters.
Best regards,
Marco
9 REPLIES
spex
Honored Contributor

Re: H/24 support

Hi Marco,

See this page (especially the document list at the bottom):

http://h20293.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B8679BA:B3936-90026

PCS
rariasn
Honored Contributor

Re: H/24 support

Hi Ferrari Marco,

http://www.docs.hp.com/en/ha.html

rgs,
FERRARI MARCO
Advisor

Re: H/24 support

Evidently I wasn't clear: my internal customer says it is impossible to stop their application at any time. Furthermore, have you ever installed patches that require a reboot? How can you do that without stopping for a few minutes? I know my question seems weird, but I am told not to stop, ever.
Marco
FERRARI MARCO
Advisor

Re: H/24 support

Moreover, what if the colleagues who cover the night shifts don't know enough? How many people should know the root passwords? What if nobody understands those problems except the people in the upper technical sector, who cannot hire anyone? A manual would help. My own rules:
rule 1: always have a free node with MC/SG
rule 2: contract a fixed number of minutes of un-agreed (unplanned) application stop per year
rule 3: contract a fixed number of "phone callable" status
rule 4: shout when incompetent people are sold as competent (very good for your career, in general! It's a lose-lose situation!)
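To make rule 2 concrete, here is a minimal sketch of how a contracted availability figure translates into minutes of stop per year. The function names and the 99.99% figure are illustrative assumptions, not values from any actual contract.

```python
# Hypothetical helper names; a 365-day year is assumed (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of application stop per year allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def remaining_budget_minutes(availability_percent: float, outages_minutes) -> float:
    """Budget left after subtracting the outages already recorded this year."""
    return downtime_budget_minutes(availability_percent) - sum(outages_minutes)


if __name__ == "__main__":
    # Example: a 99.99% target allows a little under an hour of stop per year.
    print(round(downtime_budget_minutes(99.99), 2))
    # Example: two package switches of 3 and 5 minutes already used.
    print(round(remaining_budget_minutes(99.99, [3, 5]), 2))
```

A figure like this gives the executives a number to sign, which is easier to defend than "never stop".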
Marco
Steven E. Protter
Exalted Contributor
Solution

Re: H/24 support

Shalom,

With Serviceguard properly installed and configured, you set up all the applications to run on either node.

When patching is required on one node, use the command line to fail all packages over to the other node of the cluster.

Now you have a node that can be rebooted as many times as needed to get the patches installed.
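The sequence above can be sketched as a dry-run planner. It only builds the list of Serviceguard commands for review and executes nothing. The node names, package names and the function name are hypothetical, and the exact options should be checked against the man pages for your Serviceguard release. Note that moving a package means halting it and starting it on the other node, so the application is still briefly unavailable during the switch.

```python
from typing import Dict, List


def rolling_patch_plan(node_to_patch: str, standby_node: str,
                       packages: Dict[str, str]) -> List[str]:
    """Return the ordered steps for patching one node; nothing is executed.

    `packages` maps each package name to the node it currently runs on.
    """
    if node_to_patch == standby_node:
        raise ValueError("standby node must differ from the node being patched")

    plan = ["cmviewcl -v"]  # check cluster and package state first
    for pkg, current_node in sorted(packages.items()):
        if current_node != node_to_patch:
            continue  # already running elsewhere, leave it alone
        plan.append(f"cmhaltpkg {pkg}")                   # stop the package
        plan.append(f"cmrunpkg -n {standby_node} {pkg}")  # start it on the other node
        plan.append(f"cmmodpkg -e {pkg}")                 # re-enable package switching
    plan.append(f"cmhaltnode {node_to_patch}")            # take the node out of the cluster
    plan.append(f"<install patches on {node_to_patch}, reboot as needed>")
    plan.append(f"cmrunnode {node_to_patch}")             # rejoin the cluster
    plan.append("cmviewcl -v")                            # confirm the final state
    return plan


if __name__ == "__main__":
    # Hypothetical two-node cluster with one package on each node.
    for step in rolling_patch_plan("nodeA", "nodeB",
                                   {"pkg1": "nodeA", "pkg2": "nodeB"}):
        print(step)
```

Printing the plan before the maintenance window also gives the night-shift staff a written procedure to follow.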

Note that Serviceguard is a high-availability environment, not a fault-tolerant one. This means there are still single points of failure that can cause problems. If you can't afford any downtime, you need a fault-tolerant environment at multiple locations.

I would look into the SG Metro or Continental Cluster products for this type of situation.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jaime Bolanos Rojas.
Honored Contributor

Re: H/24 support

Marco,

I do not think there is a specific manual for that. Experience is the most important factor: having Serviceguard well configured and having all the redundant hardware that you need, server-wise, network-wise and datacenter-infrastructure-wise.
Having Serviceguard running does not do much if there is a blackout in the middle of the night and the UPS cannot handle a 5-hour outage. You need a power generator behind that UPS, redundancy in your routers and switches, dedicated lines, redundant power supplies, network cards, memory, and so on; the list goes on and on.

There will always be a time when the tech on the night shift does not really know what he or she is doing (one of those who will panic on seeing a red line on one of the drives in the Virtual Array and call the CEO at 3:00 in the morning; I had one of those techs before).
Having procedures documented and having those techs use sudo to get root privileges is a good idea. Having a well-prepared tech on call, so that the junior tech can call him in an emergency when something out of whack happens, is also a good idea.
Having a contract with HP to support hardware and software is a really good idea too.

Well, I could keep on talking about this, but like I said before, experience will do.

Regards,

Jaime.
Work hard when the need comes out.
FERRARI MARCO
Advisor

Re: H/24 support

Thanks, especially to Jaime. Actually we've had a 100% uptime since April 2005, thanks to SG and, even more than that, thanks to the very low number of hardware failure events.
We now perform patch installation (and/or node switching at the SG package level) between 1am and 7am. Therefore, I am worried by the shrinking of this free (but uncomfortable and, until now, unpaid) window. Sudo is not an option for the unpredictable, and we have managed to have zero problems on the predictable. We're less naive than it seemed to the people who recommended SG in the earlier answers. Root password sharing is the biggest problem in such an environment. I am assumed to be available from 9 to midnight, but that's wishful thinking.
A book would persuade executives more than the words of a sysadmin, who is mocked by their typical question: "Why are you making all these difficulties if nothing has happened so far?" In a way, you are a prisoner of your good past behaviour. At the same time, you'll be blamed when something goes wrong.
Responsibility without power, not uncommon I believe.
Marco

PS:
I am going to close the thread soon if nobody brings their own 'rules' coming from EXPERIENCE, as someone suggested.
Jaime Bolanos Rojas.
Honored Contributor

Re: H/24 support

Marco,

I was looking to see if this book was still for sale, and it is. It talks about HA in an easy-to-read way. I do not know if you already have it. It is not going to teach you configuration or anything like that in detail, but it will give you some very good tips:

Clusters for High Availability: A Primer of HP Solutions, 2/e

Regards,

Jaime.
Work hard when the need comes out.
FERRARI MARCO
Advisor

Re: H/24 support

Thanks everyone.
I believe it is still a situation open to many individual choices, a trade-off between security and response to failures.
I'd rather not read an answer like "Read the Manuals" on the forums, as if it were worth the time spent typing it. We can read the forum, therefore we can read the manuals.
Marco