
Eric_260
Frequent Advisor

Poor performance

Hello,

I know there has been a lot of discussion on this already, but I was unable to find a solution that fits my problem.

I've looked at and tried a lot of the solutions I found. Maybe I've been reading the wrong posts, I don't know... But anyway! Here's my info:

Filesystem Statistics:

Directories ........ 21791
Regular files ...... 571147
Symbolic links ..... 425
SYSV FIFOs ......... 0
BSD sockets ........ 0
Block Devices ...... 0
Character Devices .. 0
Unknown Objects .... 0
----------------------------
Objects Total ...... 593363
Kbytes Total ...... 71519261

Run Time ........... 4:41:15
Backup Speed ....... 4238,18 (KB/s)

4 hours for 70 GB? Shouldn't it normally be more like 60-70 GB per hour with a single drive?

We have an HP LTO-2 drive on a SCSI-2 connection.

I've seen posts about disk agents, buffers and things like that... nothing worked.
I know that the large number of files we have might impact backup performance.

But anyway, if anyone has a little advice, that would be great!

Thanks!
Steven E. Protter
Exalted Contributor

Re: Poor performance

Have you been through this doc?

http://www6.itrc.hp.com/service/cki/search.do?category=c0&docType=Security&docType=Patch&docType=EngineerNotes&docType=BugReports&docType=Hardware&docType=ReferenceMaterials&docType=ThirdParty&searchString=UPERFKBAN00000726&search.y=8&search.x=28&mode=id&admit=1552802428+1091736050969+28353475&searchCrit=allwords

Attaching a script to help gather data for the doc.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Patrick Wallek
Honored Contributor

Re: Poor performance

What are you using to write to tape? tar, cpio, fbackup, OmniBack/DataProtector?
Eric_260
Frequent Advisor

Re: Poor performance

Never got that doc before!

I'm going to read that right now!


To do the backups, I'm using Data Protector 5.1 on Windows.

The filesystem example above was from a drive for an Oracle instance.

But I get the same performance with other machines running on Unix or Windows.
A. Clay Stephenson
Acclaimed Contributor

Re: Poor performance

It's not clear what version of OB2/DP you are running. If you have already increased the data buffers and segment size, I doubt further tuning is going to help much.

I suspect that the fundamental bottleneck is that there are not enough disk agents to keep the drive busy. Observe the tape drive while the backup is running. If the drive does not stream almost continuously, then you are taking a huge (10-100X) performance hit compared with the throughput of a streaming device.

OB2/DP really performs best with fast tape drives when multiple filesystems (in OB2 speak, objects) are feeding a common media agent. I suspect you could actually back up 3 filesystems as fast as you back up the one. You might try the divide-and-conquer approach by using includes and excludes of different directories so that the same filesystem becomes different objects. You then make sure that the device concurrency is >= the number of objects. The downside to the divide-and-conquer approach is that it is now rather easy to miss something when directories are added.

It also helps greatly if the disk agent and media agent are on the same host so that the network is not a bottleneck. Finally, you are backing up a fairly large number of files; each of these requires a database hit. A few large files will be much faster than many small files for the same total volume of data. The database under DP is improved considerably over that of OB2. You might try reducing the logging level to "Log Directories" if you suspect that the database is a bottleneck.
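As a rough illustration of the concurrency point, here is a quick back-of-envelope sketch (plain Python, not a Data Protector tool; the drive's minimum streaming rate is an assumed figure, and the per-object rate is taken from the stats posted above):

# Estimate whether N concurrent objects (disk agents) can keep the drive streaming.
# Assumption: the objects sit on different disks/hosts so their rates can add up.
DRIVE_MIN_STREAM_MB_S = 15     # assumed data-rate-matching floor; check your drive's specs
PER_OBJECT_MB_S = 4.2          # ~4238 KB/s observed per object in the stats above

for objects in range(1, 9):
    feed = objects * PER_OBJECT_MB_S
    state = "should stream" if feed >= DRIVE_MIN_STREAM_MB_S else "will shoe-shine"
    print(f"{objects} object(s): ~{feed:.1f} MB/s -> drive {state}")

Of course, if all the objects have to share the same 100 Mbit link, their rates cannot actually add up past the wire speed, which is the other half of the problem discussed below.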

If it ain't broke, I can fix that.
Eric_260
Frequent Advisor

Re: Poor performance

Actually, if I click on the link I get no results.

And the script seems to be for a Unix machine. I'm running on a Windows machine.

Don't know if that makes a big difference in speed.

I know that when I was running my AIT device on Unix with Networker, the performance was okay. When we switched the AIT device to Data Protector under Windows, the performance decreased.

Don't know if that's still the case now.
Eric_260
Frequent Advisor

Re: Poor performance

Here are the settings of my drives:

Hardware compression
Concurrency 4
Block Size (Default)
Segment size 2000
Disk Agent 8

I have a MSL6030 with 2 LTO2 Drives.

The backup machine uses the network at 100%.
The CPU load is okay; the machine is about 90% idle during backups.

How can I determine whether the issue is with the DP database?
Ted Buis
Honored Contributor

Re: Poor performance

Are you saying you are backing up over the network? What type of network? You are not likely to keep the LTO-2 drive streaming (without shoe-shining), even with GigE, without pulling data from multiple sources at the same time. This is possible in DP; there must be something on this in the DP documentation.
Mom 6
Eric_260
Frequent Advisor

Re: Poor performance

Yes, doing backup over the network.

We are on 100Mbps.

When backing up 1 client, 1 big file, or even a whole disk, I always get network usage of about 30 Mbps.

Since the drives are 30 MBps, shouldn't the network be used at 100 Mbps?
Ted Buis
Honored Contributor

Re: Poor performance

The drives are 30 to 60 MBytes/second, while 100BaseT is 100 Mbits/second, or less than 10 MBytes/second. Try multiple NICs and trunking, or a GigE link. The packet size of 1504 bytes and the TCP/IP overhead are also a big issue.
Mom 6
Bill Hassell
Honored Contributor

Re: Poor performance

60 GB per hour is impossible over a network. You are looking at the speed of a tape drive connected directly to a computer. Networks are VERY slow compared to today's tape drives. Figure less than 50% of the network's native speed is available for data (this accounts for overhead, packetizing, etc.), and networks are rated in bits, not bytes. A best-case formula would be: wire-speed / 20 = bytes/sec. So for 100BaseT, that's 5 Mbytes/sec. So your stats:

> Backup Speed ....... 4238,18 (KB/s)

is right on for a 100 Mbit link. You could try changing all the NICs and switches to 1000BaseT, but then CPU/driver time may limit the maximum speed, so don't expect 50 Mbytes/sec, more like 30 or so. With 1000BaseT you may also want to use jumbo frames, but this may be incompatible with Windows.
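If it helps, here is that rule of thumb written out as a tiny Python sketch (the /20 factor is a rough rule, not an exact figure):

# "wire-speed / 20 = bytes/sec" applied to common link speeds.
def rough_backup_rate_kb_s(wire_mbit_per_s):
    return wire_mbit_per_s * 1_000_000 / 20 / 1024   # bits/s -> usable bytes/s -> KB/s

for link in (100, 1000):
    print(f"{link} Mbit/s link -> roughly {rough_backup_rate_kb_s(link):,.0f} KB/s usable")

# 100 Mbit/s works out to roughly 4,900 KB/s, close to the reported 4238,18 KB/s;
# 1000 Mbit/s gives ~49,000 KB/s on paper, but expect closer to 30 MB/s in practice.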


Bill Hassell, sysadmin
Ted Buis
Honored Contributor

Re: Poor performance

Try an FTP of a file over your network. Assuming it shows you the transfer statistics as it does in UNIX, you can see the limitations of your network. I've done 7 or 8 MBytes/sec over FDDI at 100 Mbits/sec with FTP, but Bill says 5 MBytes/sec; that would be 18 GBytes/hour best case. With a backup it isn't just reading the disk: each file has to be marked as to when it was last accessed, and there is filesystem overhead, so you never get the highest speeds you might expect just from pulling the data off the disk. Then there is the TCP/IP overhead. You still may need to combine streams just to get as much as you can over the network.
Mom 6
Eric_260
Frequent Advisor

Re: Poor performance

Yeah, right, I was mixing up Mbps and MBps!
Damn it! I hate that when talking about Mbps and MBps there's always a mix-up somewhere...

Anyway, I know the implications of file access, open/close, TCP overhead, etc.

But still, is it normal to get only 4 MB/s on a 100 Mbps link?

Because, okay, let's put the network at 90% capacity, so 90 Mbps.

90 / 8 = 11.25 MB/s
If I do a backup of a single filesystem, I never get that speed; I always get 4 MB/s.

That's only 32 Mbps, or 32% of the network.
If I do a plain transfer between the two machines, I get 100% network usage.

I know that networks are usually slow for backups, but that's the best option my company has so far, so I need to maximize network usage.

Anyway, I'll try to get another solution in place, because if 4 MB/s is really the max I can get, I won't go anywhere. I guess that's it; I'm at the maximum speed we can do.

Thanks for all your help!
Bill Hassell
Honored Contributor
Solution

Re: Poor performance

90/8 is not the right calculation. True, a byte is 8 bits, but that does not account for the TCP/IP overhead. Let's start with an 8 KB block. It can't be transmitted over TCP/IP on a 100BaseT link in one piece: the maximum is only about 1500 bytes per packet (rough numbers), so the 8 KB block must be sent as 6 packets. Each packet needs TCP/IP headers that include IP addresses (source and destination) along with other details. So that 8-bit byte is looking more like 10 bits with all the overhead. And the data doesn't travel in just one direction: the packets must be disassembled, transmitted, acknowledged, retransmitted if necessary, and reassembled into the original data record. That's where server overhead (driver time) gets involved.
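To put rough numbers on the packet math (this sketch only counts TCP/IP header bytes; Ethernet framing, ACK traffic and retransmits add more on top, which is where the "10 bits per byte" feel comes from):

# Split an 8 KB block into packets and count header overhead.
# Assumptions: ~1460 bytes of payload per packet, ~40 bytes of TCP+IP headers.
BLOCK = 8 * 1024
PAYLOAD_PER_PACKET = 1460
HEADER_BYTES = 40

packets = -(-BLOCK // PAYLOAD_PER_PACKET)          # ceiling division -> 6 packets
wire_bytes = BLOCK + packets * HEADER_BYTES
print(f"{packets} packets, {wire_bytes} bytes on the wire for {BLOCK} bytes of data")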

Secondly, you'll never get 90% throughput on your 100 Mbit link. After you browse Rick Jones' NetPerf web pages, you'll understand more about real network performance. http://www.netperf.org/ is the place to go for LAN performance. With some TCP/IP tuning, you might improve on the 50% number, but not by much.

Finally, as mentioned before, you'll probably get shoe-shine performance out of your tape drive and that will seriously affect throughput as well as prematurely wear out your tape drive. You pay a very stiff penalty for not keeping the drive busy. In the old days of reel-to-reel tapes, the drive actually started, recorded and stopped for every record (really old stuff). These drives would literally buzz as the capstan rollers and brakes did their work. Later, the drives started buffering a few Kbytes and allowed the tape to avoid the mechanical start/stop cycles and performance soared.

Then in the 80's, the concept of a streaming tape drive became a reality. No more start/stop. Instead, a small motor would slowly spin the tape up to full speed. Once at speed, data would be recorded continuously. The idea is that the computer would keep sending data fast enough to keep the tape moving. But if the computer could not keep the data coming fast enough, the tape drive would run out of data (an underrun condition) and it would have to stop, backup, take a running start again, figure out where it was on the tape, find the end of the last record and then start recording again at just the right spot.

Streamers were inexpensive and performed well, but they were terrible if the data stream wasn't fast enough. Over an entire tape you could reposition several thousand times, causing recording head wear equivalent to several hundred tape passes without repositioning.

Now, virtually all tape drives are streamers. The difference is that they have an ENORMOUS appetite for data. The LTO-2 drive is no exception. True, these drives have bigger data buffers (many megs), but unless the data stream is slightly faster than the tape drive (measured over many seconds), the tape will still be repositioned once the buffers are exhausted. This is a case where a slower tape drive is actually a better choice: you match the average data rate of the tape drive with the data rate of your backup stream.
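A crude sketch of why a big buffer only postpones the reposition (the buffer size and streaming rate below are assumed, illustrative values; the feed rate is the observed one):

# How long an underfed streaming drive can run before its buffer empties.
BUFFER_MB = 64          # assumption: buffer of this order of magnitude
STREAM_MB_S = 15        # assumed minimum rate the drive writes at while streaming
FEED_MB_S = 4.2         # observed backup feed rate

drain_rate = STREAM_MB_S - FEED_MB_S
print(f"buffer lasts ~{BUFFER_MB / drain_rate:.0f} s, then the drive stops and repositions")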

So the bottom line is that you need a very fast link between computers. GigE is a possibility, just computer-to-computer if necessary, or aggregate multiple LAN cards if both ends support it. Otherwise, you may need to look at a local tape drive to keep the expected backup speeds and not destroy your drive in 6 months.


Bill Hassell, sysadmin
Eric_260
Frequent Advisor

Re: Poor performance

6 months, really?

Mmm, I think we will start working on getting something running on GigE for the moment. That will be better.

Well, thanks to all who have provided answers to help me stop mixing up Mbps and MBps! ;)

And for explaining how the drives work when streaming!

Thanks again!
Hein van den Heuvel
Honored Contributor

Re: Poor performance

The largish number of files will also play a role. It will make it harder to keep the tape streaming, with no chance to do very large IOs. 500K files in 16000 seconds is about 35 files/second. One would expect 2 or 3 IOs per file (inode + data), so the disk was probably doing around 100 IO/sec. That would be approaching the max, depending on exact placement. So do NOT expect a 10x speed bump if you get 10x more potential throughput from the wire. 2x, maybe.
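For reference, the arithmetic behind that estimate, using the run time and file count from the original post (the IOs-per-file figure is an assumption):

# 571,147 files backed up in 4:41:15, assuming 2-3 disk IOs per file.
files = 571_147
run_seconds = 4 * 3600 + 41 * 60 + 15      # 16,875 s
ios_per_file = 2.5

files_per_s = files / run_seconds
print(f"~{files_per_s:.0f} files/s, ~{files_per_s * ios_per_file:.0f} disk IO/s")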

Hein.
Bill Hassell
Honored Contributor

Re: Poor performance

Hein is quite right about small files. sar -a 1 will show you how busy the directory lookups get during a backup. That's why fbackup and Data Protector launch multiple processes to grab data as fast as possible. If the disks are behind an array controller with a lot of cache (100's of megs), then multiple files can be opened and read at the same time. As part of your planning process, you might borrow the Ultrium drive and attach it directly to the HP-UX machine. That will provide two important benchmarks: a full-speed channel, and whether small files are degrading the maximum speed of the tape drive.


Bill Hassell, sysadmin
Ted Buis
Honored Contributor

Re: Poor performance

In one reply Eric said that he was running DP on Windows, so unless he has a UNIX client, I don't think sar will help.
Mom 6
Eric_260
Frequent Advisor

Re: Poor performance

Yep, I'm on Windows.

Anyway, so far I've been able to reduce the backup time by about 1 hour for my 6 backup groups.

I've attached the backup server via GigE to its switch, and that switch is connected to the main switch via GigE.

All the other servers are connected to their switches at 100 Mbps, and then those switches to the main one.

(Yeah that's not the optimal setup, but that's what we have now)

At least now the backup server can receive a lot of stuff from other servers on the network.

So for now that's not so bad, but it could be wayyyyy better! I'll have to deal with this, and I'll try to push for a better backup infrastructure.

Thanks Everyone!
Bill Hassell
Honored Contributor

Re: Poor performance

I think Eric is running the Data Protector tape server on Windows, but the backup is coming from a DP client on the HP-UX box. The reason to run sar -a 1 is to see how much directory activity goes on during the backup period.


Bill Hassell, sysadmin