HPE EVA Storage

SAN got slow suddenly

 
Eric Antunes
Honored Contributor

Hi,

I recently noticed that the Data Protector backups are taking 4 to 5 times longer to complete for the same amount of data.

To troubleshoot the issue, I tested an RMAN backup directly to the server's disks, and it is also taking more than twice as long as it did before the issue appeared.

This has been happening since last month. Any ideas?

Thank you,

Eric Antunes



Each and every day is a good day to learn.
Ivan Ferreira
Honored Contributor

Re: SAN got slow suddenly

A full description of your storage area network would help.

There are many possible reasons for that, including storage utilization by other hosts, data location, disk failures, and MPxIO-related issues.

Check the service times and transfer rates for your disks. If you have historical performance data, compare the current figures against it.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Eric Antunes
Honored Contributor

Re: SAN got slow suddenly

Hi Ivan,

This is a 1.5 TB SAN.

It is part of an EVA 4000 with 2 HP-UX 11iv1 servers, 2 Windows servers (Win Server 2003), 2 32-bit blades (Win Server 2003) and 1 64-bit blade (Win Server 2003).

It has the following hardware:

- 1 "HP 2-ch U320 SCSI HBA" (whatever that means);
- 1 "HP StorageWorks 4/8 Base SAN Switch";
- 1 "HP Universal Rack 10642 G2 Shock ALL";
- 1 "Mod PDU 16A HV WW ALL";
- 1 "HP EVA4000-A 2C1D Array";
- 8 "HDD 146GB FC 10K 1' ALL ADD-ON ALL";
- And, finally, last December we added 1 "HDD ~292GB (not sure about the exact size) FC 10K 1' ALL ADD-ON ALL".

Eric
Richard Tengdin
Trusted Contributor

Re: SAN got slow suddenly

You said you have the following:

- 8 "HDD 146GB FC 10K 1' ALL ADD-ON ALL";
- And, finally, last December we added 1 "HDD ~292GB (not sure about the exact size) FC 10K 1' ALL ADD-ON ALL".

Since you only have 9 drives listed, they must all be in the same disk group. You will have 1373 GiB of total space in the group (with a protection level of None), but fully 20% of that space is held on the 300GB drive.

20% of the space means 20% of the data, which means 20% of the I/O. Your 300GB drive is badly hot-spotted.

Your best solution is to get two 146GB drives, add them to the disk group, and remove the 300GB drive. That configuration provides about the same space, but each of the 10 drives will hold 10% of the space, data, and I/O.
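The hot-spot arithmetic above can be checked with a short sketch. It assumes, as the post does, that data and I/O are spread across the disk group in proportion to each drive's raw capacity; the function name `capacity_shares` is only illustrative, and the drive sizes are the nominal figures quoted in this thread, not measured values.

```python
def capacity_shares(drive_sizes_gb):
    """Return each drive's fraction of the disk group's raw capacity."""
    total = sum(drive_sizes_gb)
    return [size / total for size in drive_sizes_gb]


# Configuration discussed above: eight 146 GB drives plus one 300 GB drive.
mixed = [146] * 8 + [300]
# Proposed configuration: ten 146 GB drives.
uniform = [146] * 10

mixed_shares = capacity_shares(mixed)
uniform_shares = capacity_shares(uniform)

# Assumption: I/O share tracks capacity share.
print(f"300 GB drive, mixed group:   {mixed_shares[-1]:.1%}")   # 20.4%
print(f"146 GB drive, mixed group:   {mixed_shares[0]:.1%}")    # 9.9%
print(f"146 GB drive, uniform group: {uniform_shares[0]:.1%}")  # 10.0%
```

Under that assumption the larger drive carries roughly twice the load of each of its neighbours, which is the imbalance the suggested drive swap is meant to remove.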
Ivan Ferreira
Honored Contributor

Re: SAN got slow suddenly

You mentioned a direct RMAN backup to the server's disks.

Are the source and destination storage the same, or is the backup local?

Have you tried running read and write performance tests only with dd, using /dev/zero and /dev/null?

What is your disk service time (avserv) as reported by sar -d 5 1000 during the backup?

If you have multiple paths, have you verified which path is in use, and tested forcing the use of the second path? We had a performance problem related to one HBA failure in a multi-HBA host.

Also, if the other hosts connected to the storage have changed their load, they can affect overall storage performance. The evaperf tool may help you check the status of your storage.
Eric Antunes
Honored Contributor

Re: SAN got slow suddenly

Richard:

It turns out we have one more "HDD 146GB FC 10K 1' ALL ADD-ON ALL" than I listed. So we have the following:

- 9 "HDD 146GB FC 10K 1' ALL ADD-ON ALL";
- 1 "HDD ~292GB (not sure about the exact size) FC 10K 1' ALL ADD-ON ALL".

Yes, our 292GB disk gets roughly 18% of the total I/O. You made a good point, but we can't add 146GB disks anymore because they have been discontinued.

I noticed that this same 292GB disk has older firmware (H02) than the other disks (H03).


Ivan:

"Is the source and destination storage the same? or the backup is local?"

Yes, the source and destination storage was the same: from SAN to SAN, not local.


"Have you tried doing read and write performance test only with dd, using /dev/zero and /dev/null?"

What do you mean by "only with dd"? No activity on the server besides dd?


"What is your disks service time (avserv) as reported by sar -d 5 1000, during the backup?"

Bad, very bad:


"If you have multiple paths, have you verified which path is using, and tested forcing the usage of the second path? We had a performance problem related to one HBA failure in a multi HBA host."

We have only one HBA. Do we have a bottleneck here?

Will check the evaperf tool this afternoon ...

Thanks,

Eric
Eric Antunes
Honored Contributor

Re: SAN got slow suddenly

I forgot to post the "sar -d 5 333" averages:

          device   %busy   avque   r+w/s  blks/s  avwait  avserv
Average   c9t0d0    0.46    0.63      13     220    0.04    0.42
Average   c4t0d1   14.95    1.24      56    1678    0.50    5.07
Average   c4t0d2    0.60    0.50      11     681    0.01    1.09
Average   c4t0d3    0.01    0.50       0       0    0.00    5.58
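For comparing runs like this one against historical data, the averages can be parsed into named fields. The sketch below assumes the six numeric columns follow the usual HP-UX `sar -d` order (%busy, avque, r+w/s, blks/s, avwait, avserv) and that blks/s counts 512-byte blocks; both are assumptions about the output format, not something stated in the post, and the field names are made up for readability.

```python
# Assumed column order of HP-UX `sar -d`; the names are illustrative labels.
SAR_COLUMNS = ("busy_pct", "avque", "rw_per_s", "blks_per_s", "avwait_ms", "avserv_ms")


def parse_sar_averages(text):
    """Map each device to its averaged sar -d figures."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect: "Average", device name, then six numeric columns.
        if len(fields) != 8 or fields[0] != "Average":
            continue
        rows[fields[1]] = dict(zip(SAR_COLUMNS, map(float, fields[2:])))
    return rows


sample = """\
Average c9t0d0 0.46 0.63 13 220 0.04 0.42
Average c4t0d1 14.95 1.24 56 1678 0.50 5.07
Average c4t0d2 0.60 0.50 11 681 0.01 1.09
Average c4t0d3 0.01 0.50 0 0 0.00 5.58
"""

stats = parse_sar_averages(sample)
for device, s in stats.items():
    # Assumption: one block is 512 bytes.
    mb_per_s = s["blks_per_s"] * 512 / 1e6
    print(f"{device}: busy={s['busy_pct']}% avserv={s['avserv_ms']} ms, about {mb_per_s:.2f} MB/s")
```

Read this way, the busiest device (c4t0d1) is only about 15% busy with a service time near 5 ms, so these particular samples do not show a saturated disk.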
Eric Antunes
Honored Contributor

Re: SAN got slow suddenly

Those averages were collected with the production Oracle Applications 11i instance up and a single "dd if=/dev/zero of=/disc3/x count=1000000" running. No backup was run.
Ivan Ferreira
Honored Contributor

Re: SAN got slow suddenly

Your average service time was good, but you should test with larger block sizes, for example dd if=/dev/zero of=(device) bs=8k (then 256 and 512).
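The effect of block size on sequential write throughput can be illustrated with a rough sketch. It is not a replacement for dd against the real device: it writes zeros to a temporary file on whatever filesystem holds the temp directory, and it reads the "256 and 512" above as 256 KiB and 512 KiB, which is an assumption. The function name `write_throughput` and the 8 MiB test size are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def write_throughput(path, block_size, total_bytes):
    """Write zeros to path in block_size chunks and return MB/s."""
    block = bytes(block_size)
    blocks = total_bytes // block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(blocks):
            os.write(fd, block)
        # Flush to disk so the timing is not only measuring the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return blocks * block_size / 1e6 / elapsed


TOTAL_BYTES = 8 * 1024 * 1024  # small on purpose; use far more for a real test
BLOCK_SIZES = [512, 8 * 1024, 256 * 1024, 512 * 1024]  # 512 is dd's default

results = {}
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "ddtest.bin")
    for bs in BLOCK_SIZES:
        results[bs] = write_throughput(target, bs, TOTAL_BYTES)
        print(f"bs={bs:>7} bytes: {results[bs]:.1f} MB/s")
```

For a meaningful comparison the target should sit on the SAN volume under test and the total size should be large enough to exceed any cache in the path.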

Also check your RMAN parallelism configuration and your overall system resource usage while the backup runs.

I would like to see some performance data collected with sar during your backup session: disk, memory, and CPU.

Also, please specify which one is your SAN disk on the output of sar -d.
Eric Antunes
Honored Contributor

Re: SAN got slow suddenly

Hi Ivan,

Here are the stats collected during the RMAN backup (parallelism=4).

The SAN physical volumes are c4t0d1 and c4t0d2.

Eric