<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: linux backup of huge data in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193972#M9951</link>
<description>Hello,&lt;BR /&gt;&lt;BR /&gt;The first problem you have to solve is whether a backup that spans many files written over an extended period of time (whether days or just a "few" hours) will do you any good at all. If anything performs a coordinated update across more than one of these files, you might as well back up to /dev/null: that would be much faster, cheaper, and equally useful. (Sorry for being so direct, but you really need to think about this!)&lt;BR /&gt;&lt;BR /&gt;Now, assuming all of these files are completely independent, what most probably happens is that other processes access the filesystem while you do your backup, and you simply do not get enough bandwidth from the disks to the tape drive to keep the tape streaming. A simple audible check of the drive during the backup should confirm this.&lt;BR /&gt;Things you can do:&lt;BR /&gt;Put the data on a mirror set. Break the set at backup time, remount one half read-only, and use that to feed the tape. (This also solves the first problem mentioned. You may still need to quiesce any open databases on the volumes before breaking the mirror.) It might also help to put the tape drive on a separate SCSI controller so the bus is not busy with other requests while feeding the tape.&lt;BR /&gt;Given your requirements, you should seriously consider an enterprise backup solution. Note: if you do go for something like Veritas or Legato, plan on dedicated servers and storage.&lt;BR /&gt;&lt;BR /&gt;Greetings, Martin</description>
    <pubDate>Tue, 17 Feb 2004 22:45:15 GMT</pubDate>
    <dc:creator>Martin P.J. Zinser</dc:creator>
    <dc:date>2004-02-17T22:45:15Z</dc:date>
    <item>
      <title>linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193964#M9943</link>
<description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have 400GB of data on a RH Linux server. Around 350GB is in a single directory consisting of hundreds of database files. Now we need to schedule a backup. The box has a single 100/200GB Ultrium tape drive. A manual backup takes two days and three Ultrium tapes to complete, and the system also becomes very slow while the backup runs.&lt;BR /&gt;&lt;BR /&gt;Can anyone let me know what the good options are for a backup strategy for this box? I have to define a backup policy, put it in a cron job, and hand it over to the Operations group to carry on with the backups. I am not sure how this can be arranged with a backup that needs multiple tapes. It is an IBM Intel-based server. Is there any backup tool other than tar to use with Red Hat? I remember reading about Amanda. Does anyone have good experience with it on large amounts of data?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2004 07:55:44 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193964#M9943</guid>
      <dc:creator>Rasheed Tamton</dc:creator>
      <dc:date>2004-02-17T07:55:44Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193965#M9944</link>
<description>If you don't have an enterprise backup solution like Legato or Veritas, it will take a long time to back up this data, as you know.&lt;BR /&gt;&lt;BR /&gt;On a couple of my RH9 systems I use a product called 'star'. It is easy to download and install, and it runs a bit faster than traditional 'tar', but it is still going to take a long time.&lt;BR /&gt;&lt;BR /&gt;Another option, if you have lots of capacity, is to copy the data to another silo and back it up there. That way, time is of no concern.&lt;BR /&gt;&lt;BR /&gt;If this is an ongoing daily or weekly backup, you should look into a single license for Veritas; I back up 300GB in about 3.5 hours. ArcServe is also available, but it is not very reliable.&lt;BR /&gt;&lt;BR /&gt;Also, try a product called Amanda. I believe it is part of the Red Hat 9 install.</description>
      <pubDate>Tue, 17 Feb 2004 08:06:37 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193965#M9944</guid>
      <dc:creator>Nobody's Hero</dc:creator>
      <dc:date>2004-02-17T08:06:37Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193966#M9945</link>
<description>The only thing I can think of is an SDLT or LTO tape drive. You would get all your data on one tape with hardware compression, and they are fast. We sometimes restore from SDLT tape rather than use "cp" because it is faster.</description>
      <pubDate>Tue, 17 Feb 2004 08:44:58 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193966#M9945</guid>
      <dc:creator>Mark Grant</dc:creator>
      <dc:date>2004-02-17T08:44:58Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193967#M9946</link>
<description>Can't you do incremental backups? Say, incremental every day and a full backup on weekends?&lt;BR /&gt;&lt;BR /&gt;Amanda is said to be slightly hard to use. star ( &lt;A href="http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/star.html" target="_blank"&gt;http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/star.html&lt;/A&gt; ) works great, and I also like mondo ( &lt;A href="http://www.microwerks.net/~hugo/" target="_blank"&gt;http://www.microwerks.net/~hugo/&lt;/A&gt; ) a lot...&lt;BR /&gt;&lt;BR /&gt;J</description>
      <pubDate>Tue, 17 Feb 2004 08:59:57 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193967#M9946</guid>
      <dc:creator>Jerome Henry</dc:creator>
      <dc:date>2004-02-17T08:59:57Z</dc:date>
    </item>
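The daily-incremental / weekly-full scheme suggested above can be sketched with GNU tar's --listed-incremental snapshot support. The /tmp/demo paths and file-based archives below are assumptions for illustration; on the real box the -f target would be the tape device (e.g. /dev/st0) and the directory would be the data filesystem.

```shell
# Sketch of the weekly-full / daily-incremental scheme, assuming GNU tar.
# Paths under /tmp/demo are hypothetical stand-ins for the real data dir.
set -e
mkdir -p /tmp/demo/data /tmp/demo/backup
cd /tmp/demo

echo "base" > data/base.db

# Weekly full (level 0): start from a fresh snapshot file.
rm -f backup/snapshot
tar --listed-incremental=backup/snapshot -cf backup/full.tar data

# A day later, one new file appears...
echo "new" > data/monday.db

# Daily incremental: tar stores only what changed since the snapshot.
tar --listed-incremental=backup/snapshot -cf backup/mon.tar data

# The incremental archive contains the new file but not the unchanged one.
tar -tf backup/mon.tar
```

Restoring would then mean extracting the latest full archive followed by each incremental in order, which is the operational cost of this scheme: a week's restore touches several tapes.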
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193968#M9947</link>
<description>Hi Rasheed,&lt;BR /&gt;&lt;BR /&gt;I assume you are using an Ultrium 215, the slower drive in the LTO-1 family. Even without compression, you should be able to reach 27GB/hr. That means a full backup of 400GB of data should take less than a day; with H/W compression enabled it should be faster. We can see that compression is enabled, since you are not using four 100/200GB tapes for 400GB of data.&lt;BR /&gt;&lt;BR /&gt;The Ultrium drive's backup performance might also be limited by system resources. A 1GHz CPU and 1GB of memory are recommended, and it also depends on how much workload the server carries. You can download HP Library &amp;amp; Tape Tools from &lt;A href="http://www.hp.com/support/tapetools," target="_blank"&gt;http://www.hp.com/support/tapetools,&lt;/A&gt; and run a system performance test to analyze the server's throughput. Maybe the server is the bottleneck.&lt;BR /&gt;&lt;BR /&gt;Besides that, switching to incremental backups is a good way to reduce the amount of data being backed up.&lt;BR /&gt;&lt;BR /&gt;Brice</description>
      <pubDate>Tue, 17 Feb 2004 14:31:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193968#M9947</guid>
      <dc:creator>Brice_3</dc:creator>
      <dc:date>2004-02-17T14:31:42Z</dc:date>
    </item>
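The throughput figures above are easy to sanity-check: at LTO-1's native rate of roughly 27GB/hr, a 400GB full backup should finish in well under a day, while the observed two-day run implies the drive is being fed at only a fraction of that rate.

```shell
# Back-of-the-envelope check of the numbers in the post above.
awk 'BEGIN { printf "expected: %.1f hours\n", 400 / 27 }'   # 400GB at 27GB/hr
awk 'BEGIN { printf "observed: %.1f GB/hr\n", 400 / 48 }'   # 400GB over ~2 days
```

The gap between the two numbers (roughly 15 hours expected versus 48 observed) is the performance problem several replies point at: the drive is almost certainly shoe-shining instead of streaming.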
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193969#M9948</link>
<description>Your backup takes far too much time; you have a performance problem.&lt;BR /&gt;You should get good performance with dump/restore (standard on Red Hat).&lt;BR /&gt;I have used other tools like tar and gtar, but none of them could feed an Ultrium as fast as the hardware can take.&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2004 14:57:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193969#M9948</guid>
      <dc:creator>Olivier Drouin</dc:creator>
      <dc:date>2004-02-17T14:57:14Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193970#M9949</link>
<description>I bet you have millions of little files.&lt;BR /&gt;&lt;BR /&gt;Millions of little files can cause even something like Legato with a fast tape subsystem to take days to restore.&lt;BR /&gt;&lt;BR /&gt;Without knowing more about this particular environment, though, this is just a guess.&lt;BR /&gt;&lt;BR /&gt;I recommend that you find the bottleneck and then look at widening it.&lt;BR /&gt;&lt;BR /&gt;As I speculated above, if you have lots of little files, then the bottleneck is probably the OS doing open/close on each of them.&lt;BR /&gt;&lt;BR /&gt;To test this, try backing up and restoring a single file of 50GB or so. Just dd something from /dev/urandom; using /dev/urandom means any compression done by the tape drive won't skew your results. Then back up and restore the file. It should go quite fast.&lt;BR /&gt;&lt;BR /&gt;Before discussing a solution, you should find the bottleneck.&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Feb 2004 17:01:24 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193970#M9949</guid>
      <dc:creator>Mark Travis</dc:creator>
      <dc:date>2004-02-17T17:01:24Z</dc:date>
    </item>
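The single-big-file test described above can be sketched as below, scaled down so it runs anywhere: create an incompressible file from /dev/urandom and measure how fast tar can stream it. On the real server you would use a much larger file (tens of GB) and write to the tape drive (/dev/st0); the 10MB size and /tmp path here are illustrative assumptions.

```shell
# Scaled-down version of the bottleneck test: one incompressible file.
set -e

# /dev/urandom data defeats the drive's hardware compression, so the
# measured rate reflects real disk-to-drive throughput.
dd if=/dev/urandom of=/tmp/bigfile bs=1M count=10 2>/dev/null

# Stream it through tar the way a backup would. Piping to wc instead of
# writing -f /dev/null matters: GNU tar special-cases /dev/null and
# would skip reading the file data entirely.
# On the real box: time tar -cf /dev/st0 -C / path/to/bigfile
tar -C /tmp -cf - bigfile | wc -c
```

If this single-file run streams at full drive speed while the real backup does not, the bottleneck is in the filesystem traversal or competing I/O, not in the drive.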
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193971#M9950</link>
<description>Oops, I didn't read closely enough. If all you have are hundreds of files, then filesystem open/close shouldn't cause a bottleneck.&lt;BR /&gt;&lt;BR /&gt;Either way, the advice still holds: you need to find the bottleneck.</description>
      <pubDate>Tue, 17 Feb 2004 17:04:03 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193971#M9950</guid>
      <dc:creator>Mark Travis</dc:creator>
      <dc:date>2004-02-17T17:04:03Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193972#M9951</link>
<description>Hello,&lt;BR /&gt;&lt;BR /&gt;The first problem you have to solve is whether a backup that spans many files written over an extended period of time (whether days or just a "few" hours) will do you any good at all. If anything performs a coordinated update across more than one of these files, you might as well back up to /dev/null: that would be much faster, cheaper, and equally useful. (Sorry for being so direct, but you really need to think about this!)&lt;BR /&gt;&lt;BR /&gt;Now, assuming all of these files are completely independent, what most probably happens is that other processes access the filesystem while you do your backup, and you simply do not get enough bandwidth from the disks to the tape drive to keep the tape streaming. A simple audible check of the drive during the backup should confirm this.&lt;BR /&gt;Things you can do:&lt;BR /&gt;Put the data on a mirror set. Break the set at backup time, remount one half read-only, and use that to feed the tape. (This also solves the first problem mentioned. You may still need to quiesce any open databases on the volumes before breaking the mirror.) It might also help to put the tape drive on a separate SCSI controller so the bus is not busy with other requests while feeding the tape.&lt;BR /&gt;Given your requirements, you should seriously consider an enterprise backup solution. Note: if you do go for something like Veritas or Legato, plan on dedicated servers and storage.&lt;BR /&gt;&lt;BR /&gt;Greetings, Martin</description>
      <pubDate>Tue, 17 Feb 2004 22:45:15 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193972#M9951</guid>
      <dc:creator>Martin P.J. Zinser</dc:creator>
      <dc:date>2004-02-17T22:45:15Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193973#M9952</link>
<description>There is a new Ultrium format: generation 2 tapes handle up to 400GB of data. That could do the job for you. HP sells them, and some are certified for Linux.&lt;BR /&gt;&lt;BR /&gt;If funding is available, you might want to investigate one of HP's tape libraries. These can accommodate up to four drives and many tapes.&lt;BR /&gt;&lt;BR /&gt;We have implemented such a solution at our shop.&lt;BR /&gt;&lt;BR /&gt;SEP</description>
      <pubDate>Tue, 17 Feb 2004 22:59:01 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193973#M9952</guid>
      <dc:creator>Steven E. Protter</dc:creator>
      <dc:date>2004-02-17T22:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193974#M9953</link>
<description>Thanks to everyone who commented on the subject. It is an IBM eServer xSeries 250. The database is MySQL. I just need a filesystem backup to tape. The tape drive is (Vendor: HP, Model: Ultrium 1-SCSI, Rev: N15G).&lt;BR /&gt;&lt;BR /&gt;I use 200GB Ultrium tapes, which can hold 200GB of hardware-compressed data. But since most of the files are MySQL database files, I do not get the HW compression from the tape, so I normally use three tapes for the 400GB of data.&lt;BR /&gt;&lt;BR /&gt;We have Legato and BrightStor on other systems, but I am not sure we will go for those with this particular box. The data is from a legacy mainframe and does not change very often; it is normally used for reports, not updates. I need to do the backup once per week only.&lt;BR /&gt;It has dual PIII processors as below:&lt;BR /&gt;model name      : Pentium III (Cascades)&lt;BR /&gt;cpu MHz         : 699.199&lt;BR /&gt;cache size      : 1024 KB&lt;BR /&gt;Phys Mem : 16GB&lt;BR /&gt;&lt;BR /&gt;I do not have enough free space to move the data to another filesystem or to make a mirror.&lt;BR /&gt;&lt;BR /&gt;Please advise.&lt;BR /&gt;Thanks,&lt;BR /&gt;Rasheed&lt;BR /&gt;</description>
      <pubDate>Wed, 18 Feb 2004 06:54:01 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193974#M9953</guid>
      <dc:creator>Rasheed Tamton</dc:creator>
      <dc:date>2004-02-18T06:54:01Z</dc:date>
    </item>
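Since the poster reports getting no hardware compression on the MySQL files, software compression in the pipeline is worth testing before buying anything: database files are often quite compressible, and piping tar through gzip could cut the tape count. The paths below are illustrative stand-ins; on the real box the final redirect would go to /dev/st0 with the drive's hardware compression switched off.

```shell
# Sketch of the software-compression route, assuming hypothetical paths.
set -e
mkdir -p /tmp/db

# Stand-in for a database file: repetitive content compresses well.
yes "row,row,row,your,boat" | head -n 100000 > /tmp/db/table.MYD

# tar streams the tree; gzip compresses before the (simulated) tape.
# Real box: tar -cf - /data | gzip > /dev/st0
tar -C /tmp -cf - db | gzip > /tmp/db-backup.tar.gz

ls -l /tmp/db/table.MYD /tmp/db-backup.tar.gz
```

On a 699MHz PIII the gzip stage may itself become the bottleneck, so it is worth timing a sample before committing to this for the full 400GB.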
    <item>
      <title>Re: linux backup of huge data</title>
      <link>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193975#M9954</link>
<description>Just an idea for more performance with tar:&lt;BR /&gt;Would it be possible to remount the filesystem with "noatime" for the backup?&lt;BR /&gt;With the default atime behaviour, each read of a file (including the reads tar does during the backup) updates the file's access-time entry.&lt;BR /&gt;Sometimes software compression is faster.&lt;BR /&gt;&lt;BR /&gt;Have you tried the z option with tar? (But better not to combine hardware AND software compression.)&lt;BR /&gt;&lt;BR /&gt;dump/restore is an idea, but my last experience is that the filesystem has to be in a quiescent state (remounted read-only or unmounted). Otherwise the backup seems to be okay, but you get "block expected/different block found" errors during restore.</description>
      <pubDate>Mon, 23 Feb 2004 10:10:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/linux-backup-of-huge-data/m-p/3193975#M9954</guid>
      <dc:creator>Thilo Knoch</dc:creator>
      <dc:date>2004-02-23T10:10:14Z</dc:date>
    </item>
  </channel>
</rss>

