Operating System - HP-UX

Re: Pausing PA-RISC and Itanium servers?

 
Patrick Wallek
Honored Contributor

>>My suggestion makes it less painful than what it is doing now.

No, not necessarily. By shutting things down cleanly, they are guaranteed that NOTHING will be running and that processes are terminated cleanly.

Your suggestions could lead to some abnormal terminations (disabling the NIC) or just disallow new connections (stopping the listener) -- neither of which is what they really want.

Hot backup mode is by far the best tool for this job.
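
For anyone who wants to see the shape of it, here is a rough Python sketch of bracketing a storage snapshot with hot backup mode. It assumes Oracle 10g or later running in ARCHIVELOG mode, where the database-wide `ALTER DATABASE BEGIN BACKUP` / `END BACKUP` statements are available. The `execute` and `take_snapshot` callables are hypothetical placeholders for whatever SQL client and array tooling is actually in use; nothing here comes from the original poster's environment.

```python
from contextlib import contextmanager


@contextmanager
def hot_backup(execute):
    """Put the database in hot backup mode for the duration of the block.

    `execute` is any callable that runs a single SQL statement
    (hypothetical wiring, e.g. a database cursor's execute method).
    """
    execute("ALTER DATABASE BEGIN BACKUP")
    try:
        yield
    finally:
        # Always leave backup mode, even if the snapshot step fails.
        execute("ALTER DATABASE END BACKUP")
        # Archive the current redo log so the copy can be recovered later.
        execute("ALTER SYSTEM ARCHIVE LOG CURRENT")


def snapshot_with_hot_backup(execute, take_snapshot):
    """Take a storage snapshot while the database stays open."""
    with hot_backup(execute):
        take_snapshot()  # hypothetical hook for the SAN snapshot command
```

The `try`/`finally` is the point of the sketch: the database should not be left in backup mode if the snapshot step fails.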
Larry Finnegan
Occasional Advisor

ITRC ate my post. :'/ The middle tier application server is pretty busy and there are lots of files changing on it constantly. It's not just a simple web server, but a set of server processes (akin to the Oracle listener) along with a proprietary legacy database and hundreds of thousands of files (0 to 2k in size) which are frequently updated. In addition, there are usually special proprietary reports being run on the system as well. So the I/O on that box really could hit heavily while snapshots are being made unless there is a way to stop I/O safely. This application server is the primary reason why downtime is so long each day. It takes about five minutes to stop and five to seven minutes to start back up again.

Oracle hot backup mode sounds like a possibility and I'll look into it with our middle tier application server vendor to make sure it wouldn't hose any reports (they run pretty much 24x7).

The other thing that makes our situation a bit out of the ordinary is that since our application services very random searches that are never the same twice (public library back-end) most of the standard DB behavior does not apply. So we can't count on predictable DB queries resulting in predictable levels of I/O. That applies to both the proprietary legacy DB and the Oracle DB (not sure if or how it would affect hot backup mode).

Our original procedure on our old VA7410 was to stop all remote services (seven other servers not in the SAN picture) that rely on the middle tier application server, stop the application server, stop the second instance of the application server (for a different use), stop the Oracle DB, stop the listener, kill any remaining processes for the application server user, umount all the SAN hosted file systems, snapshot them (about 30 LUNs now), mount the file systems, start up the Oracle listener and DB, start up the application server, start up all the dependent services on remote servers and we're done. In total this process takes about 15 minutes on a good night.
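
To make the sequence above easier to reason about, here is a minimal Python sketch that runs the steps in the stated order and records how long each one takes. The step labels just restate the procedure; the `actions` mapping and the `run_window` helper are hypothetical, and the real stop/start commands would have to be supplied for each step.

```python
import time

# Step labels follow the procedure described above; they are labels only.
STOP_STEPS = [
    "stop remote services on dependent servers",
    "stop application server (primary instance)",
    "stop application server (second instance)",
    "stop Oracle DB",
    "stop Oracle listener",
    "kill remaining application-user processes",
    "umount SAN file systems",
]
SNAPSHOT_STEP = "snapshot LUNs"
START_STEPS = [
    "mount SAN file systems",
    "start Oracle listener and DB",
    "start application server",
    "start remote services on dependent servers",
]


def run_window(actions, clock=time.monotonic):
    """Run every step in order and return a list of (step, seconds).

    `actions` maps each step label to a zero-argument callable
    (hypothetical; supply the site's own commands here).
    """
    timings = []
    for step in STOP_STEPS + [SNAPSHOT_STEP] + START_STEPS:
        started = clock()
        actions[step]()
        timings.append((step, clock() - started))
    return timings


def slowest(timings, count=3):
    """Return the `count` steps that consumed the most time."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:count]
```

Running `slowest(run_window(actions))` after a few nights would show whether the application server stop and start really account for most of the 15 minutes.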

Hot backup mode might help a bit, but it seems the application server is the big time hit. The vendor has no recommendation for how to speed things up in this regard, which is why I'm looking for an OS-based solution. However... it's starting to sound like there might not be one.

A final thought: In the past we had an experience where the OS on the application server lost communication with the SAN. It appeared that the OS partially hung waiting for the SAN to come back. Would there be any possibility of unpresenting the LUNs at a time when the application server is doing less than usual (very small windows), taking the snapshot, and then presenting them back to the OS with only a minor hiccup? Unpresenting would certainly cut off I/O. Frankly the idea sounds ghastly to me, but I'm really desperate to shorten this downtime so I'm looking at all possibilities. Can HP-UX gracefully recover from this situation?
Mel Burslan
Honored Contributor

If I am understanding you correctly, you have an application server which changes its disk content quite frequently as well as a database server which does the same. If this is the case, your application architecture is due for an overhaul or, at the desperate end, a redesign.

In 3-tiered application systems, web and app server contents should not change that often. Or even if they change, they should be disposable, not requiring a backup, other than a mundane daily or weekly backup of the OS and app server binaries.

If the data stored on the app server is being relied on in the process of application serving, this function needs to be migrated out of the app server and placed on the database server. In general, app servers don't/shouldn't need SAN connections as their storage needs are very limited.

Having said that, your database end can definitely use the hot backup method for consistent backups. The app server, on the other hand, is a totally different story. Momentarily un-presenting the SAN volumes, taking a backup (snapshot) and presenting them again might seem like a good idea, but if there are open files at the time of un-presenting, your snap will be quite meaningless, and should you need to restore to this snap point in the future, you will have some data loss. Even if you can live with this, performing this action may jeopardize the sanity of your application, as the SAN latencies and the time it takes to snap the volumes may exceed a certain threshold and trigger an application panic or, in the worst case, an operating system panic, costing you dearly.

In unix, anything is possible and everything has a price and risk factor. Unfortunately, what price to pay or which risks to take can only be answered by someone like you, who knows the environment inside-out. I personally would not take the risk you are looking for approval on, and would live with 15 minutes of down time each night for a sane backup. And if this is a production system, the business should agree to this down time or fund a project to re-engineer the application.

Hope this helps
________________________________
UNIX because I majored in cryptology...
Tingli
Esteemed Contributor

Another thing to consider for the database backup is Oracle RMAN.
Steven E. Protter
Exalted Contributor

Shalom,

The OS is designed to run for months without interruption.

You need to rethink your configuration on the SAN to permit a more dynamic situation without need for breaks.

EVA-4000 snapshots can be used to get your backups done with nearly zero downtime.

You don't need to stop the server; you need to reconfigure the SAN and possibly educate yourself on its capabilities.

SEP
Rita C Workman
Honored Contributor

I'm not an HP storage person....so here goes:

Oracle should be done using hot backup mode; this will make it easier for you and keep it running to avoid downtime.
Your app server that has a lot of I/O and file activity...well....have you thought of trying to set up something with an additional mirror?
You would have to shut down the app server, attach the new mirror disk and sync it up (lvsync), then split the extra mirror disk off (lvsplit). [Restart your app server.] Mount your mirror disk on some "other" server and make your backup.
The first sync of the disk should take the longest; after that I was thinking you could use lvmerge so it only has to resync whatever the bitmaps show as out of sync (so less time).
It would be a bit of work to set up, but once done - it's done. Check and see if HP has a product to do this (because it's what I do using EMC tools).
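
A rough dry-run sketch of that cycle in Python, for illustration only. It just builds the command lines and does not execute anything. The volume path is made up, the helper names are hypothetical, and the `lvsplit -s` suffix option and the `lvmerge` argument order (split copy first, original second) are my reading of the HP-UX man pages, so treat them as assumptions and check them on the target system before use.

```python
def split_plan(lv_path, suffix="b"):
    """Commands to run while the app server is down (dry run only).

    lv_path is a hypothetical mirrored logical volume,
    e.g. /dev/vgapp/lvol1.
    """
    return [
        # Make sure the mirror copy is fully in sync before detaching it.
        ["lvsync", lv_path],
        # Detach one mirror copy as a separate volume named lv_path + suffix.
        ["lvsplit", "-s", suffix, lv_path],
    ]


def merge_plan(lv_path, suffix="b"):
    """Commands to run after the backup of the split copy has finished.

    Assumed argument order: the split copy is overwritten from the
    original, and only the changed extents need to be resynced.
    """
    return [["lvmerge", lv_path + suffix, lv_path]]


def render(plan):
    """Turn a plan into printable shell lines for review."""
    return [" ".join(command) for command in plan]
```

Printing `render(split_plan(...))` and `render(merge_plan(...))` first gives something to review with whoever owns the storage before any of it is run for real.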

Just a thought,
Rita
Mel Burslan
Honored Contributor

Rita and Steven,

I believe his problem is not the time it takes to perform the snaps or backups. His problem is the time required to quiesce his application and its connections so that he can take a clean backup. And unfortunately, I cannot see a way to alleviate that kind of pain even with the best storage system in the world. I think his application infrastructure needs to be revamped to distribute the load so that quiescing the app does not take that long.
Rita C Workman
Honored Contributor

Mel,

That may well be the case. We too have some "applications" here where the app server piece is what takes the longest to stop/start (around 10 min). And frankly there is no way around it. It is strictly a vendor app thing that we cannot control. What we did is make that piece the only issue. By syncing a mirror and then splitting it, the resync is incremental only and takes us a few minutes. Add that to the 10 min. it takes to stop/start our app server, and we keep the downtime of the app server to generally less than 15 minutes.
For us this was do-able with an acceptable downtime until they change vendors and software.

He may have no way to address the app software stop/start either...

/rcw
Larry Finnegan
Occasional Advisor

Thanks to everyone for the replies. It looks like we'll just have to live with the app server. Since the vendor designs and owns it, we have no way to get them to revamp anything about it. It's about 30 years old and the migration from the legacy DB to Oracle is not yet complete. However, even with that move, I don't expect this application server to stop using local files that are key to operation. It's not my place to judge their design. So, 15 minutes seems to be our best case scenario. Rita hit it right on the head when she said that we don't really have a choice. Thanks again, some really helpful answers and ideas.