Around the Storage Block

Snapshots, Backups, and other SAP HANA complications.

“I don’t get it! Every Saturday morning about 12:20 AM our system crashes.”

That is how my new manager started his story of why data protection is serious business. I had just started a new job at the campus data center where I noticed odd operational rituals and trinkets hanging in interesting places.

“I watched that system for weeks,” he continued.  

“We had system engineers look at it. We had developers fly in. We couldn’t figure it out, so I bought a sleeping bag and camped out! That first Friday night I fell asleep in my office and I missed it. The next week I was ready – I had set my alarm clock! Watching my bank of terminals, I waited. 12:15…12:16…12:17 and then it happened. Nothing on my monitors indicated what had happened. Running to the datacenter, I found every light in the place was on.

“Who’s here!” I screamed.

A visibly shaken and scared security guard revealed himself from behind one of the stacks.

“What did you do? What did you touch?” 

“Nothing!” he said nervously.

“You had to have done something.”

“No, I did the same thing I always do. I turn on the lights, walk through the stacks to make sure everything is secure, then leave.”

After 10 minutes of interrogation, I reassured him it was OK and escorted him to the door. Just as I was ready to close the door, he stopped me.

“I need to get my flashlight.”

There, stuck to the side of the first cabinet of our mainframe, was this big, long security flashlight held securely to the cabinet by a huge magnet!

Pointing to the location where the flashlight was that night, my new manager made his point, “That is why we have data protection. Because you can never predict who will do something to crash your system!”

One of the games we get to play in the Storage Solution Lab for SAP HANA is the “What if” game. What if the network goes down? What if a hard disk fails? What if the server catches fire? Not really! I’ve never had to use a fire extinguisher to put out a server that instantaneously combusted. What if someone slaps a magnetic flashlight on a cabinet? Needless to say, there are days we think about some of those scenarios – and then we test them.

While working on the HPE Reference Architecture for SAP HANA TDI using HPE MSA 2052 SAN Storage and HPE ProLiant DL560, we implemented a Highly Available (HA) and Disaster Recovery (DR) configuration to make sure the architecture would support real-world customer events. What made this RA more challenging, however, was wondering whether traditional data protection theories aligned with the SAP HANA in-memory database.

For example, the first question I asked myself was “With an in-memory database, do I really need a highly available storage array?” The SAP HANA Administration Guide says persisting data to storage is for “a fault or a failure” purpose. That sounds a lot like a backup to me. That led me to wonder if the whole 3-2-1 data protection best practice (three copies of the data, on two different types of media, with one copy off-site) should be used.

So the engineer in me started out this design, implementation, and test experience questioning the very theories of data and system protection I had come to rely on for years. What I found is a resounding “Yes you should!” – in both cases! However, the lines can gray when you get right down to it.

How SAP HANA Works

There are two important aspects of the SAP HANA database persistence model: preserving the database and protecting the transactions. By convention, a data volume is created to preserve changes to the database. Data savepoints occur every five minutes by default, and each one pushes changes from the in-memory data to the data volume. This is predictable behavior unless there are enough transactions clipping along that a savepoint is triggered before the scheduled time.

To save transactions, a separate log volume is created. This protects each transaction executed against the database. Between these two volumes, the SAP HANA database is protected. When the SAP HANA database services start up, the data from the data volume is loaded first. This loads the last known savepoint of data. Then a comparison of data to transactions is performed and the missing transaction entries are replayed from the log volume. In this way, SAP HANA restores the database to an application-consistent state upon startup. Failures that affect either of these two volumes will require some form of data recovery. You don’t want to do this very often, which is why you want an HA storage system.

When looking at protecting the SAP HANA environment, we often speak in terms of protecting the database in a crash-consistent or an application-consistent state. The normal operation of the SAP HANA database follows a crash-consistent model. For example, when an SAP HANA node crashes, the most recent savepoint could have occurred as long as 4 minutes and 59 seconds earlier. The data volume is then recovered to that savepoint, followed by the replay of the transactions in the log volume for the last 4 minutes and 59 seconds. This two-step process brings the database to an application-consistent state based on a crash-consistent saving model.
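
The two-step startup recovery described above can be sketched as a toy model. This is only an illustration of the savepoint-plus-log idea, not SAP HANA code: the `ToyDatabase` class, its fields, and the sequence numbers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Toy model of savepoint-plus-log persistence (hypothetical, not SAP HANA code)."""
    memory: dict = field(default_factory=dict)       # in-memory state, lost on a crash
    data_volume: dict = field(default_factory=dict)  # image written by the last savepoint
    log_volume: list = field(default_factory=list)   # (seq, key, value) log entries
    savepoint_seq: int = 0                           # last log entry covered by the savepoint
    next_seq: int = 1

    def write(self, key, value):
        # Each change is recorded in the log volume and applied in memory.
        self.log_volume.append((self.next_seq, key, value))
        self.memory[key] = value
        self.next_seq += 1

    def savepoint(self):
        # Push the current in-memory image to the data volume.
        self.data_volume = dict(self.memory)
        self.savepoint_seq = self.next_seq - 1

    def crash(self):
        # Memory is gone; the data and log volumes survive.
        self.memory = {}

    def recover(self):
        # Step 1: load the last known savepoint from the data volume.
        self.memory = dict(self.data_volume)
        # Step 2: replay log entries written after that savepoint.
        for seq, key, value in self.log_volume:
            if seq > self.savepoint_seq:
                self.memory[key] = value
        return self.memory


db = ToyDatabase()
db.write("order-1", "created")
db.savepoint()
db.write("order-1", "shipped")   # after the savepoint: exists only in memory and the log
db.write("order-2", "created")
db.crash()
state = db.recover()
print(state)  # {'order-1': 'shipped', 'order-2': 'created'}
```

If either volume is damaged, one of the two steps has nothing to work from, which is the reason both the data and log volumes need protecting.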

To protect the SAP HANA database using an application-consistent method, software must trigger a global savepoint on the database, effectively locking the database for an instant in time. This is similar to a traditional SQL database quiesce. Triggering a snapshot in conjunction with this global savepoint is the best way to preserve this data state. HPE Recovery Manager Central has this ability when using the HPE 3PAR storage systems; however, there currently isn’t a utility to perform this action for the HPE MSA 2052, so we had to come up with a different protection plan.

How much data can you lose?

When I ask customers this question, they almost always reply: none! However, our sales teams report most SAP clients implement a Recovery Point Objective (RPO) of less than 15 minutes for production systems. Why? It’s expensive to get those last 15 minutes!

With this insight and a good understanding of the unique challenges SAP HANA creates for IT organizations, we set out to create a set of contingency plans using storage replication to create an HA and DR protection plan for the HPE ProLiant DL560 servers and HPE MSA 2052 SAN storage systems. Although we used two HPE MSA 2052 arrays and the MSA’s volume replication to create a fast recovery model to preserve our volumes, this same type of storage replication feature is also available on all the HPE 3PAR and Nimble storage systems TDI certified for SAP HANA.

Let me be the first to say that using snapshots and storage replication is not a complete data protection plan; however, from the testing done in the lab, I can say it worked. It provided a fast and effective way to recover volumes. Using storage replication in conjunction with volume snapshots, we were also able to test a question we are commonly asked: can you create test and dev configurations using snapshots?

Building protective contingency plans

What we tested in the Storage Solutions Lab was a simple three-tier contingency plan to provide a crash-consistent recovery model for SAP HANA:

  1. Storage-based replication for fast snapshot recovery of volumes for a quick point-in-time recovery.
  2. Fast replication and storage snapshot recovery combined with the added protection of using SAP Management Suite file-based recovery.
  3. Remote storage-based replication to a hybrid storage array with lower cost media.

The first step of this plan (1 above) was a simple volume replication of our data and log volumes. This provided a quick recovery point for our volumes; however, replication restrictions on the HPE MSA SAN limited replication to one sync every 30 minutes. (You won’t see this limitation on the 3PAR arrays.) Each 30-minute sync provided a snapshot equivalent to the crash-consistent state the SAP HANA database service would experience if the system crashed at that point in time. This meant the worst-case RPO would be 30 minutes. Trying to get to an RPO of 15 minutes, we added the next contingency step.

The second step expanded the protection model by using storage replication in conjunction with SAP Management Suite’s ability to back up transactions of the log volume. This backup can run on any schedule, and we set it to a short interval of 5 minutes. Adding a backup volume to hold these backups and changing the offset time of the data and log volumes by 15 minutes, we brought our overall contingency plan to a worst-case RPO of approximately 15 minutes.
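
The RPO arithmetic in these two steps can be illustrated with a small calculation. This is a simplified sketch under stated assumptions: it assumes the 15-minute offset staggers two 30-minute replication schedules, and it treats every sync as a usable recovery point. The function name `worst_case_gap` and the schedule tuples are hypothetical, and the exact schedules used in the lab are not given here.

```python
def worst_case_gap(schedules, horizon=120):
    """Return the longest gap, in minutes, between consecutive recovery points.

    schedules: list of (interval_minutes, offset_minutes) pairs (assumed values).
    horizon:   number of minutes of schedule to simulate.
    """
    points = set()
    for interval, offset in schedules:
        # Collect every sync time for this schedule within the horizon.
        points.update(range(offset, horizon + 1, interval))
    ordered = sorted(points)
    # The worst-case data loss window is the widest gap between two syncs.
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# One replication schedule limited to a sync every 30 minutes.
single = worst_case_gap([(30, 0)])

# Two 30-minute schedules, the second offset by 15 minutes (assumption).
staggered = worst_case_gap([(30, 0), (30, 15)])

print(single, staggered)  # 30 15
```

Under these assumptions, a single 30-minute schedule leaves a 30-minute worst-case window, while staggering a second schedule by 15 minutes halves it.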

As an added step to try to follow the 3-2-1 Data Protection theory, we added a secondary replication copy of our snapshots to a remote MSA. The MSA volume snapshot schedule replicated our updated snapshots to provide the ability to restore our data, log, and backup volumes from the lower cost storage system. See the diagram below.

msa2052replication.png

Technically, this configuration doesn’t pass the criteria of the 3-2-1 Data Protection theory because it uses the same MSA technology; however, it’s easy to create an argument that the arrays use different media types - the primary/secondary arrays being all SSD and the remote hybrid array using HDDs.
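
The lenient reading of that argument can be written down as a small check. This is a minimal sketch, assuming the common statement of the 3-2-1 rule (three copies, two media types, one copy off-site), counting SSD and HDD as different media, and treating the remote array as off-site. The `Copy` class and `check_3_2_1` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str      # e.g. "ssd" or "hdd"
    offsite: bool   # assumption: True for the remote array


def check_3_2_1(copies):
    """Evaluate each part of the 3-2-1 rule for a list of data copies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


copies = [
    Copy("primary MSA", "ssd", False),
    Copy("secondary MSA", "ssd", False),
    Copy("remote hybrid MSA", "hdd", True),
]
result = check_3_2_1(copies)
print(result)
```

A stricter reading would compare storage technology rather than drive type, in which case three MSA arrays would count as a single media type and the second test would fail.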

What you can learn from our playing in the lab

From our work in the HPE Storage Solutions Lab for SAP HANA, we can tell you the HPE Reference Architecture for SAP HANA TDI using HPE MSA 2052 SAN Storage and HPE ProLiant DL560 provides a great setup for SAP HANA. These systems support production, test, and dev environments in a very economical and datacenter-friendly footprint.

When protecting your SAP HANA database environment, you will want a super-fast primary storage system based on flash storage to reduce any latency the database creates persisting updates to the storage system. And a very definitive “YES”: you want a fully HA storage system for production configurations!

You will also want to formalize a set of contingency plans to recover from unexpected events - like security guards with big magnetic flashlights! And when it comes down to creating those plans, volume replication and snapshots are a very practical way to preserve a crash-consistent state of your database. Any of the HPE storage systems TDI certified for SAP HANA can support this configuration while also providing that fast copy/clone setup for test and dev.

For those mission-critical production environments where you want to preserve an application-consistent backup, look for HPE Recovery Manager Central and the HPE StoreOnce backup appliance with the HPE Catalyst plugin for SAP to assist you in creating a rock-solid data protection plan.

For more information about the HPE partnership with SAP, see the HPE SAP HANA Solutions page.

If you’re interested in protecting your SAP systems against flashlight-carrying security guards, contact your HPE solutions partner or an HPE Pointnext representative at hpe.com/us/en/services/consulting.html

About the Author

Rheid_Schloss

HPE Storage Solutions Engineer specializing in servers, storage, infrastructure design, solution architecture and implementation.