MSA1000 Slow performance, reoccuring


MSA1000 with dual array controllers. Both controllers have 512 MB of cache. Array controller firmware is 5.2. Two W2k3 SP3 DL380 G4 servers, each with a QLogic Fiber HBA. Each connects to an MSA 2/3 hub.
The MSA1000 has two additional MSA20 storage enclosures - that gives me 3 bays full of hard drives.
My cache is set to 50% read 50% write
My hard drive rebuild priority is set to low.
I have 2 HP Ultrium LTO tape drives, one connected to each DL380 server.

My MSA1000 SAN is configured like this:
Bay #1 has (14) 72.8 GB hard drives. Drives 1-7 are configured as Array "A" with RAID 5. Drives 8-14 are configured as Array "B" with RAID 5.
Bay #2 has (14) 72.8 GB hard drives. Drives 1-6 are configured as Array "C" with RAID 5. Drives 7-12 are configured as Array "D" with RAID 5. Drives 13 + 14 are configured as Array "E" with RAID 1 (this is my quorum set).
Bay #3 has (14) 146 GB hard drives. Drives 1-14 are configured as Array "F" with RAID 5.
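
For readers who want the usable capacity of this layout, here is a rough sketch of the arithmetic. It assumes no hot spares, no controller metadata overhead, and the nominal drive sizes quoted above; the function and variable names are made up for illustration.

```python
# (array name, drive count, drive size in GB, RAID level) as listed in the post
ARRAYS = [
    ("A", 7, 72.8, 5),
    ("B", 7, 72.8, 5),
    ("C", 6, 72.8, 5),
    ("D", 6, 72.8, 5),
    ("E", 2, 72.8, 1),   # quorum mirror
    ("F", 14, 146.0, 5),
]


def usable_gb(drives, size_gb, raid_level):
    """Nominal usable capacity, ignoring spares and formatting overhead."""
    if raid_level == 5:
        # RAID 5 gives up one drive's worth of space to parity.
        return (drives - 1) * size_gb
    if raid_level == 1:
        # RAID 1 mirrors, so half the raw space is usable.
        return drives * size_gb / 2
    raise ValueError(f"RAID level {raid_level} not handled in this sketch")


def total_usable_gb(arrays):
    return sum(usable_gb(n, size, level) for _, n, size, level in arrays)


for name, n, size, level in ARRAYS:
    print(f"Array {name}: {usable_gb(n, size, level):.1f} GB usable")
print(f"Total: {total_usable_gb(ARRAYS):.1f} GB usable")
```

Under those assumptions the six arrays come to roughly 3.5 TB of nominal usable space.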

All hard drives are using the default strip/block size, and are configured as BASIC disks.

We use the MSA1000 for central file storage on our network. It also hosts DHCP, IIS, and three SQL databases.

I have a recurring performance issue with our MSA1000 setup. The issue is most noticeable when performing a tape backup to LTO 2 tapes. However, it seems to happen only when one cluster server has held the cluster for several days.

The clearest sign that the problem is present is that, when performing a tape backup with Windows NT Backup, the "Indexing" phase is incredibly slow. If I allow the indexing to complete without any intervention, it takes approximately 8-12 hours. At that point the tape backup itself starts, and the actual backup is also incredibly slow. Backing up around 400 GB takes over 24 hours.

I can temporarily fix this by switching the cluster over to the second DL380 cluster server and running my tape backup from there. The problem goes away: the "Indexing" portion of the backup takes about 15-20 minutes, and the total backup of about 400 GB takes about 7 hours.
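
To put the two backup times side by side, here is a quick back-of-the-envelope calculation. It assumes decimal gigabytes (1 GB = 1000 MB) and treats the quoted durations as totals, indexing included; since the slow case is "over 24 hours", its figure is an upper bound on the real rate.

```python
def throughput_mb_s(gigabytes, hours):
    """Average throughput in MB/s for a transfer of the given size and duration."""
    return gigabytes * 1000 / (hours * 3600)


slow = throughput_mb_s(400, 24)  # degraded case: ~400 GB in 24 hours or more
fast = throughput_mb_s(400, 7)   # after moving the cluster: ~400 GB in about 7 hours

print(f"Degraded: at most {slow:.1f} MB/s")
print(f"After failover: about {fast:.1f} MB/s")
print(f"Slowdown factor: at least {fast / slow:.2f}x")
```

Both figures are averages over the whole job, so they say nothing about where in the path the time is lost.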

I can only assume that the performance seen by my end users is also affected by this gradual degradation. I do a tape backup once a week, and this problem is getting worse. It is also affecting my nightly incremental backups, which are inconsistent in the amount of time they take to complete.

The performance of my MSA1000 seems to degrade over the course of a week. If I manually move ownership of the cluster to the other cluster server, it clears up, and then the process repeats itself. It does not matter which server I allow to own the cluster. The performance issue will appear on either server.

I have let Performance Monitor run for a full week on both servers to monitor CPU, memory, disk queue length and a few other counters. The problem does not seem to involve the processor or memory of the servers. I'm not sure what I can monitor on the MSA1000 itself, as the ACU is not very descriptive. Is there anything I could monitor via the CLI?
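
One way to make a week-long degradation visible in Performance Monitor data is to average a counter per day and look for a rising trend. The sketch below is only illustrative: it assumes the log has been exported to CSV with a timestamp in the first column formatted as MM/DD/YYYY HH:MM:SS.mmm, and the machine name, dates, and values in the sample are invented.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_means(csv_text, counter_substring):
    """Return {date: mean value} for the first column whose header contains counter_substring."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    col = next(i for i, name in enumerate(header) if counter_substring in name)
    buckets = defaultdict(list)
    for row in reader:
        try:
            stamp = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S.%f")
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
        buckets[stamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Invented sample data, for illustration only.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0)","\\NODE1\PhysicalDisk(_Total)\Avg. Disk Queue Length"',
    '"03/01/2010 00:00:00.000","0.5"',
    '"03/01/2010 12:00:00.000","1.0"',
    '"03/02/2010 00:00:00.000"," "',
    '"03/02/2010 06:00:00.000","2.0"',
    '"03/02/2010 12:00:00.000","4.0"',
])

for day, mean in daily_means(SAMPLE, "Avg. Disk Queue Length").items():
    print(day, f"{mean:.2f}")
```

A steadily climbing daily mean for disk queue length or disk latency counters, with flat CPU and memory, would point at the storage path rather than the servers.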

I have recently replaced the fiber cables and upgraded the cache from 256 to 512 MB in each controller. We have also verified that we are using the most up-to-date HBA drivers. We have even started using new LTO tapes each week. All this and no resolution...

I do not use MPIO software - maybe I need to? (FYI, firmware V5.2 was recently installed after one of our array controllers failed. It auto-updated when I installed the replacement controller we ordered. The problem has been seen with both firmware 4.48 and 5.2.)

The real problem that I do not understand is this: if I can resolve the performance issue by rebooting one or both of my cluster servers, what is making it slowly come back through the week?
Martin Smoral
Trusted Contributor

Re: MSA1000 Slow performance, reoccuring

FW 4.x and 5.x are both Active/Passive FW, but for proper failover you should be using the Basic MPIO installed on the Windows servers if you have dual controllers/dual switches/dual HBAs in the servers. You should be able to get better performance if you go to 7.x Active/Active FW; then you can spread the LUNs across the controllers and do some load balancing between the HBAs. As far as the backup goes, it will definitely be faster if the data to be backed up is currently mounted on the same cluster node that is doing the backup. Otherwise it will be slower, because the backup has to run over the network between the cluster nodes.

Re: MSA1000 Slow performance, reoccuring

Each server has only a single Fiber HBA card installed, which is connected to an MSA 2/3 hub. I remember reading somewhere in the 7.x firmware documentation that the MSA 2/3 hub is not supported on the 7.x firmware. Is this still true? I have not been able to find that document since.

Since I only have a single path from each server to the MSA 2/3 hub, do I still need the MPIO software? I don't believe I would get any benefit from using it, but if it's required, I'll install it. It has been running for quite some time without it.

How would I split the LUNs across HBAs? Is that a new feature in the 7.x firmware?

Also - I think I found the smoking gun for our performance problem. It turns out that the cache batteries in each server are not maintaining a charge, which ultimately disables the array accelerator on my onboard Smart Array 6i. I'm working with HP to get the replacements now. Having the cache work intermittently would definitely explain the mysterious issues I've been seeing...

Thanks in advance for the advice guys..
Uwe Zessin
Honored Contributor

Re: MSA1000 Slow performance, reoccuring

Do I understand correctly?

An MSA1000 with two controller modules and two 2/3 hubs, one server connects to one hub only and the other server connects to the second hub only?
Martin Smoral
Trusted Contributor

Re: MSA1000 Slow performance, reoccuring

Uwe, I was wondering that myself.

William, you're right: the A/A FW 7.x is not supported on the MSA hub. If you only have a single HBA, then you're also correct that you do not need the MPIO on the servers and, for that matter, you do not need two MSA controllers either.

The A/A FW and Full MPIO SW would let you balance LUNs across the controllers, but you would need to convert your MSA by removing the 2/3 hubs and replacing them with embedded switches, or with Fiber I/O modules and external switches, and then add a second HBA to each server so you have two completely redundant paths.

If this info has helped, please assign points accordingly....

Re: MSA1000 Slow performance, reoccuring

Just to clarify.

I have two servers. Each has a single HBA. I have only one MSA 2/3 hub, and each server's HBA is connected to it. So there is only one path to the MSA1000 for each server.

How much of a performance increase could I expect if I upgraded the current configuration, installing the required hardware mentioned above to support two HBA connections from each server, and then split the LUNs across the two paths?

Would that effectively double the throughput of my MSA1000?
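
One simple way to reason about this question is a bottleneck model: aggregate throughput is capped by whichever is smaller, the combined host paths or whatever sits behind them (controllers, disks). The sketch below is only that model; the numbers are hypothetical placeholders, not measured or published MSA1000 figures.

```python
def effective_throughput(path_limits_mb_s, backend_limit_mb_s):
    """Throughput is bounded by the smaller of total path bandwidth and the back end."""
    return min(sum(path_limits_mb_s), backend_limit_mb_s)


# Hypothetical figures for illustration only.
PATH_MB_S = 200      # assumed usable bandwidth of one host path
BACKEND_MB_S = 250   # assumed ceiling of controllers plus disks

one_path = effective_throughput([PATH_MB_S], BACKEND_MB_S)
two_paths = effective_throughput([PATH_MB_S, PATH_MB_S], BACKEND_MB_S)

print(f"One path:  {one_path} MB/s")
print(f"Two paths: {two_paths} MB/s ({two_paths / one_path:.2f}x)")
```

In this model a second path doubles throughput only if the back end can keep up with both paths; otherwise the gain stops at the back-end ceiling.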

Just food for thought - I am getting ready to replace the current MSA1000, since HP no longer supports it (hardware parts). I am looking at the MSA2000.

Could someone recommend a good hardware configuration for an MSA2000 with dual/redundant paths? I'm looking to get as much throughput, speed, and IOPS as I can out of this thing. Money is not an issue for this project.

I will be replacing the entire MSA1000 with an MSA2000. That will include all current hard drives (I have 3 shelves of 14 hard drives each - 10k RPM Ultra3 SCSI - 72 GB). It will include all connectivity as well, so I'm really starting with a blank canvas and an open checkbook...