Pausing PA-RISC and Itanium servers?

Larry Finnegan
Occasional Advisor

Pausing PA-RISC and Itanium servers?

I need to be able to make snapshots on our EVA 4000 SAN of LUNs that host striped LVM logical volumes. (The striping is used to load balance across the dual HBAs and fiber paths back to the SAN switch.) My goal is to do it with minimal downtime to our services. Currently, I have to shut down the application server (PA-RISC) and the Oracle database server (Itanium). The application server can take five minutes to stop and between five and seven minutes to start up again. This results in (accounting for other processing) a 15-minute downtime each night.

I am thinking that if I can pause the OS on each system to prevent any I/O, I could make reliable snapshots with almost no downtime. Is this possible?
18 REPLIES
Mel Burslan
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

I am assuming all you want is to take a consistent snapshot of your database. Is that right? App servers usually do not have files that change frequently, so backing them up, if necessary, should not be a problem.

On the other hand, for Oracle, you need to talk to the DBAs and have them provide you with a script or set of commands to put the database into so-called "hot backup mode". Make sure these commands run just before your backup starts and are reverted right after the backup completes.

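While the database is in hot backup mode, users can still access the database. Updates still go to the datafiles, but Oracle freezes the checkpoint in the datafile headers and writes extra redo (a full image of each block the first time it changes), so a copy taken during this window can be made consistent by applying redo at recovery time. Once out of this mode, redo logging goes back to normal.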
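
As a rough sketch of the ordering described above (enter hot backup mode, snapshot, leave hot backup mode), assuming Oracle 10g or later where the database-wide `ALTER DATABASE BEGIN BACKUP` statement is available (older releases use `ALTER TABLESPACE ... BEGIN BACKUP` per tablespace). The function name and the `run_sql` and `take_snapshot` callables are hypothetical placeholders for whatever your DBAs and your SAN tooling actually provide:

```python
from typing import Callable, List


def snapshot_in_hot_backup(run_sql: Callable[[str], None],
                           take_snapshot: Callable[[], None]) -> None:
    """Run a storage snapshot while the database is in hot backup mode.

    run_sql and take_snapshot are hypothetical hooks: one executes a SQL
    statement as a privileged user, the other triggers the SAN snapshot.
    """
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot step fails.
        run_sql("ALTER DATABASE END BACKUP")
    # Archive the current redo log so the redo written during the backup
    # window is preserved alongside the snapshot.
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")


if __name__ == "__main__":
    log: List[str] = []
    snapshot_in_hot_backup(log.append, lambda: log.append("<snapshot LUNs>"))
    print("\n".join(log))
```

The `try`/`finally` is the point of the sketch: a database left in backup mode keeps generating the extra redo, so the end-backup step should run no matter how the snapshot step exits.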

Hope this helps
________________________________
UNIX because I majored in cryptology...
Pete Randall
Outstanding Contributor

Re: Pausing PA-RISC and Itanium servers?

You mean a pause button that would suspend "playback", or processing in this case, then resume when you're ready? Brilliant!!

And despite the fact that it would seem to be easy enough to do, I don't know of anyone that's thought of it. The industry seems to be thinking exactly the opposite. In fact, HP even has an environment dubbed "NonStop".

If I were you, Larry, I'd be talking to the patent office - soon!


Pete
Tingli
Esteemed Contributor

Re: Pausing PA-RISC and Itanium servers?

If the goal is only to stop the application to prevent I/O in the database, you can just cut off the connection between the two servers. Say, stop the nic card, etc.
Patrick Wallek
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

>>Say, stop the nic card

I'm not sure that would be advisable.

If you stop the NIC, then depending on the duration of the outage, you could have lots of connections that get dropped suddenly, which could lead to other issues.

If this is part of a cluster, then the disappearance of an IP address could cause another NIC to come online or, depending on the configuration, it could trigger a package failover to another node.
Tingli
Esteemed Contributor

Re: Pausing PA-RISC and Itanium servers?

Or, stop the Oracle listener.
OldSchool
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

"If the goal is only to stop the application to prevent I/O in the database, you can just cut off the connection between the two servers. Say, stop the nic card, etc."

yeah...right...and what do you think that would do to the running database???
Patrick Wallek
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

>>Or, stop the Oracle listener.

That will only prevent new connections.

Anything currently running in the DB, say a SQL query, will continue to run, and anyone already connected to the DB will retain their connection.
Tingli
Esteemed Contributor

Re: Pausing PA-RISC and Itanium servers?

But currently they just shut down the application server. My suggestion makes it less painful than what it is doing now.
Mel Burslan
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

One word: Oracle Hotbackup Mode

(well three words, all right)
________________________________
UNIX because I majored in cryptology...
Patrick Wallek
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

>>My suggestion makes it less painful than what it is doing now.

No, not necessarily. By shutting things down cleanly, they are guaranteed that NOTHING will be running and that processes are terminated cleanly.

Your suggestions could lead to some abnormal terminations (disabling the NIC) or just disallow new connections (stopping the listener) -- neither of which is what they really want.

Hot backup mode is by far the best tool for this job.
Larry Finnegan
Occasional Advisor

Re: Pausing PA-RISC and Itanium servers?

ITRC ate my post. :'/ The middle tier application server is pretty busy and there are lots of files changing on it constantly. It's not just a simple web server, but a set of server processes (akin to the Oracle listener) along with a proprietary legacy database and hundreds of thousands of files (0 to 2k in size) which are frequently updated. In addition, there are usually special proprietary reports being run on the system as well. So, the I/O on that box really could hit heavily while snapshots are being made unless there is a way to stop I/O safely. This application server is the primary reason why downtime is so long each day. It takes about five minutes to stop and five to seven minutes to start back up again.

Oracle hot backup mode sounds like a possibility and I'll look into it with our middle tier application server vendor to make sure it wouldn't hose any reports (they run pretty much 24x7).

The other thing that makes our situation a bit out of the ordinary is that since our application services very random searches that are never the same twice (public library back-end) most of the standard DB behavior does not apply. So we can't count on predictable DB queries resulting in predictable levels of I/O. That applies to both the proprietary legacy DB and the Oracle DB (not sure if or how it would affect hot backup mode).

Our original procedure on our old VA7410 was to stop all remote services (seven other servers not in the SAN picture) that rely on the middle tier application server, stop the application server, stop the second instance of the application server (for a different use), stop the Oracle DB, stop the listener, kill any remaining processes for the application server user, umount all the SAN hosted file systems, snapshot them (about 30 LUNs now), mount the file systems, start up the Oracle listener and DB, start up the application server, start up all the dependent services on remote servers and we're done. In total this process takes about 15 minutes on a good night.
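
The procedure above is essentially an ordered shutdown, a snapshot, and then a startup in the reverse order. A minimal sketch of that pattern follows; the step names and the stop/start callables are placeholders for illustration, not our actual scripts:

```python
from typing import Callable, List, Tuple

# (name, stop action, start action); all three are hypothetical placeholders.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_snapshot_window(steps: List[Step],
                        take_snapshot: Callable[[], None]) -> None:
    """Stop each step in order, snapshot, then restart in reverse order."""
    stopped: List[Step] = []
    try:
        for step in steps:
            step[1]()              # stop
            stopped.append(step)
        take_snapshot()
    finally:
        # Restart only what was actually stopped, last stopped first.
        for step in reversed(stopped):
            step[2]()              # start


def logged_step(name: str, log: List[str]) -> Step:
    """Build a step that only records what it would do."""
    return (name,
            lambda: log.append("stop " + name),
            lambda: log.append("start " + name))


if __name__ == "__main__":
    log: List[str] = []
    names = ["remote services", "application server", "Oracle DB",
             "Oracle listener", "SAN file systems"]
    run_snapshot_window([logged_step(n, log) for n in names],
                        lambda: log.append("snapshot LUNs"))
    print("\n".join(log))
```

Tracking the stopped steps separately means a failure partway through the shutdown still restarts everything that was already taken down.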

Hot backup mode might help a bit, but it seems the application server is the big time hit. The vendor has no recommendation for how to speed things up in this regard, which is why I'm looking for an OS based solution. However... it's starting to sound like there might not be one.

A final thought: In the past we had an experience where the OS on the application server lost communication with the SAN. It appeared that the OS partially hung waiting for the SAN to come back. Would there be any possibility of unpresenting the LUNs at a time when the application server is doing less than usual (very small windows), taking the snapshot, and then presenting them back to the OS with only a minor hiccup? Unpresenting would certainly cut off I/O. Frankly the idea sounds ghastly to me, but I'm really desperate to shorten this downtime so I'm looking at all possibilities. Can HP-UX gracefully recover from this situation?
Mel Burslan
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

If I am understanding you correctly, you have an application server which changes its disk content quite frequently as well as a database server which does the same. If this is the case, your application architecture is ready for an overhaul or redesign on the very desperate end.

In 3-tiered application systems, web and app server contents should not change that often. Or even if they change, they should be disposable, not requiring a backup, other than a mundane daily or weekly backup of the OS and app server binaries.

If the data stored on the app server is being relied on in the process of application serving, this function needs to be migrated out of the app server and placed on the database server. In general, app servers don't/shouldn't need SAN connections as their storage needs are very limited.

Having said that, your database end can definitely use the hot backup method for consistent backups. The app server, on the other hand, is a totally different story. Momentarily un-presenting the SAN volumes, taking a backup (snapshot) and presenting them again might seem like a good idea, but if there are open files at the time of un-presenting, your snap will be quite meaningless, and should you need to restore to this snap point in the future, you will have some data loss. Even if you can live with this, performing this action may jeopardize the sanity of your application, as the SAN latencies and the time it takes to snap the volumes may exceed a certain threshold and trigger an application panic or, in the worst case, an operating system panic, costing you dearly.

In unix, anything is possible and everything has a price and risk factor. Unfortunately, what price to pay or which risks to take can only be answered by someone like you, who knows the environment inside-out. I personally would not take the risk you are seeking approval for, and would live with 15 minutes of downtime each night for a sane backup. And if this is a production system, the business should agree to this downtime or fund a project to re-engineer the application.

Hope this helps
________________________________
UNIX because I majored in cryptology...
Tingli
Esteemed Contributor

Re: Pausing PA-RISC and Itanium servers?

Another thing to consider for the database backup is Oracle RMAN.
Steven E. Protter
Exalted Contributor

Re: Pausing PA-RISC and Itanium servers?

Shalom,

The OS is designed to run for months without interruption.

You need to rethink your configuration on the SAN to permit a more dynamic situation without need for breaks.

EVA-4000 snapshots can be used to get your backups done with nearly zero downtime.

You don't need to stop the server, you need to reconfigure the SAN and possibly educate yourself on the thing's capabilities.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rita C Workman
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

I'm not an HP storage person....so here goes:

Oracle should be done using hot backup mode; this will make it easier for you and keep the database running to avoid downtime.
Your app server that has a lot of I/O and file activity...well....have you thought of trying to set up something with an additional mirror?
You would have to shut down the app server; attach the new mirror disk & sync it up (lvsync). Then split the extra mirror disk off (lvsplit). [Restart your app server] Mount your mirror disk on some "other" server and make your backup.
The first sync of the disk should take the longest; after that I was thinking you could use lvmerge so it only has to resync the regions its bitmap marks as out of sync (so less time).
It would be a bit to set up, but once done - it's done. Check and see if HP has a product to do this ('cause it's what I do using EMC tools).
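
To illustrate why the later resyncs should be quicker than the first, here is a toy model of a split mirror that tracks changed extents in a dirty set and copies only those back on merge. It is a conceptual sketch only, not how HP-UX LVM implements it; the class and method names are made up for illustration:

```python
from typing import List, Set


class SplitMirror:
    """Toy model: a primary volume and one mirror copy, as lists of extents."""

    def __init__(self, extents: List[str]) -> None:
        self.primary: List[str] = list(extents)
        self.copy: List[str] = list(extents)   # state after the first full sync
        self.dirty: Set[int] = set()           # extents changed since the split
        self.is_split = False

    def split_off(self) -> None:
        """Detach the copy; from now on writes reach only the primary."""
        self.is_split = True
        self.dirty.clear()

    def write(self, index: int, data: str) -> None:
        self.primary[index] = data
        if self.is_split:
            self.dirty.add(index)       # remember what the copy is missing
        else:
            self.copy[index] = data     # mirrored write

    def merge(self) -> int:
        """Reattach the copy, resyncing only dirty extents; return the count."""
        copied = len(self.dirty)
        for index in self.dirty:
            self.copy[index] = self.primary[index]
        self.dirty.clear()
        self.is_split = False
        return copied


if __name__ == "__main__":
    mirror = SplitMirror(["a"] * 8)
    mirror.split_off()
    mirror.write(2, "x")
    mirror.write(5, "y")
    print("extents resynced on merge:", mirror.merge())
```

The resync cost follows the amount of data changed while the copy was split off, not the size of the volume, which is why the window stays short once the initial sync is done.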

Just a thought,
Rita
Mel Burslan
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

Rita and Steven,

I believe his problem is not the time it is taking to perform the snaps or backups. His problem is the time it requires to quiesce his application and its connections so that he can take a clean backup. And unfortunately, I cannot see a way to alleviate that kind of pain even with the best storage system in the world. I think his application infrastructure needs to be revamped to distribute the load so this quiescence of the app does not take that long.
________________________________
UNIX because I majored in cryptology...
Rita C Workman
Honored Contributor

Re: Pausing PA-RISC and Itanium servers?

Mel,

That may well be the case. We too have some "applications" here where the Appl server piece is what takes the longest to stop/start (around 10 min). And frankly there is no way around it. It is strictly a vendor app thing that we cannot control. What we did is make that piece the only issue. By syncing a mirror and then splitting it, each later sync is incremental only & takes us a few minutes. Add that to the 10 min. it takes to stop/start our App server, and we keep the downtime of the App server to generally less than 15 minutes.
For us this was do-able with an acceptable downtime until they change vendors and software.

He may have no way to address the app software stop/start either...

/rcw
Larry Finnegan
Occasional Advisor

Re: Pausing PA-RISC and Itanium servers?

Thanks to everyone for the replies. It looks like we'll just have to live with the app server. Since the vendor designs and owns it, we have no way to get them to revamp anything about it. It's about 30 years old and the migration from the legacy DB to Oracle is not yet complete. However, even with that move, I don't expect this application server to stop using local files that are key to operation. It's not my place to judge their design. So, 15 minutes seems to be our best case scenario. Rita hit it right on the head when she said that we don't really have a choice. Thanks again, some really helpful answers and ideas.