Dramatic decrease in RAID 0/1 space

Vassily Gorbounov
Occasional Advisor

Hi all,
I have an interesting problem that I haven't been able to explain so far (and neither has local HP support).

The situation: I have an AutoRAID 12H with 8x9GB disks and Active Hot Spare enabled.

The output of arraydsp $ID is as follows:

-- Disk space usage --------------------
Total physical = 69465 MB *
Allocated to LUNs = 46000 MB *
Used as Active Hot spare = 8683 MB *
Used by non-included disks = 0 MB *
Used for Redundancy = 13264 MB *
Unallocated (avail for LUNs) = 1518 MB *

As I understand it, that should give at least (8683 + 1518) MB = 10201 MB of potential RAID 0/1 space (not counting space inside the LUNs that has not been allocated to lvols).
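The space accounting can be checked mechanically. This Python sketch parses the "Disk space usage" lines quoted above, confirms that the components add up to the reported total, and computes hot spare plus unallocated space, the poster's estimate of room available for RAID 0/1. The parser assumes the exact `label = N MB` layout shown in this post. Treating hot spare plus unallocated space as RAID 0/1 headroom is the poster's reading, not documented HP behavior.

```python
from __future__ import annotations

import re

# Disk-space section as printed by `arraydsp <id>` in the post above.
ARRAYDSP_SPACE = """\
Total physical = 69465 MB *
Allocated to LUNs = 46000 MB *
Used as Active Hot spare = 8683 MB *
Used by non-included disks = 0 MB *
Used for Redundancy = 13264 MB *
Unallocated (avail for LUNs) = 1518 MB *
"""

# Matches "<label> = <number> MB", ignoring the trailing "*".
LINE_RE = re.compile(r"^(?P<label>.+?)\s*=\s*(?P<mb>\d+)\s*MB")


def parse_space(text: str) -> dict[str, int]:
    """Return {label: megabytes} for every 'label = N MB' line."""
    space = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            space[match.group("label")] = int(match.group("mb"))
    return space


def raid01_headroom_mb(space: dict[str, int]) -> int:
    """Hot spare plus unallocated space (the poster's assumed RAID 0/1 headroom)."""
    return space["Used as Active Hot spare"] + space["Unallocated (avail for LUNs)"]


if __name__ == "__main__":
    space = parse_space(ARRAYDSP_SPACE)
    parts = sum(v for k, v in space.items() if k != "Total physical")
    print("components sum to", parts, "MB; total reported", space["Total physical"])
    print("potential RAID 0/1 headroom:", raid01_headroom_mb(space), "MB")
```

The components (46000 + 8683 + 0 + 13264 + 1518) add up to exactly the 69465 MB total, and the headroom works out to 10201 MB.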

But the output of arraydsp -v $ID gives me the following:

Raid 0/1 blocks = 14848
Raid 0/1 block length = 512
Raid 0/1 capacity = 7 MB *
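The capacity line is simply the block count times the block length. The following quick Python check assumes that arraydsp counts a MB as 2**20 bytes and truncates the result. The output itself does not say which convention it uses.

```python
def raid01_capacity_mb(blocks: int, block_length: int) -> float:
    """RAID 0/1 capacity in MB from the arraydsp -v block count.

    Assumes 1 MB = 2**20 bytes; arraydsp's own rounding rule is not shown.
    """
    return blocks * block_length / 2**20


capacity = raid01_capacity_mb(14848, 512)
print(capacity)       # 7.25 (7,602,176 bytes)
print(int(capacity))  # 7, matching "Raid 0/1 capacity = 7 MB"
```

With decimal megabytes (10**6 bytes) the figure would be about 7.6 MB. Either way, it is a few MB set against a working set measured in GB.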

Of course, this drives the WriteWorkingSet metric sky-high, with an enormous number of back-and-forth relocations between RAID 1 and RAID 5.

My question is: how could this happen, and what should I do to correct the situation?

Thanks in advance.

Maarten van Maanen
Regular Advisor

Re: Dramatic decrease in RAID 0/1 space

I don't quite understand what your problem is. With hot spare on and 8x9GB disks, you have a maximum of 47 GB at your disposal, of which you have allocated 46 GB.
The amount of hot spare is fixed and equals the capacity of your largest disk (8683 MB in your case). The amount of redundancy is also determined by the 12H itself and is not user-definable. The calculation then goes something like this (all figures in MB):
Total capacity 69464
Hotspare 8683
Remains 60781
Redundancy 13264
Usable 47517
Allocated 46000
Unallocated 1517
These numbers are all more or less fixed. They depend only on what you have allocated and included, and on whether hot spare is on or off.
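Maarten's subtraction can be written as a small Python function. He starts from 69464 MB, while arraydsp printed 69465 MB. That accounts for the 1 MB difference between his 1517 MB and the reported 1518 MB unallocated.

```python
def space_breakdown(total_mb, hot_spare_mb, redundancy_mb, allocated_mb):
    """Follow the subtraction above: total -> remains -> usable -> unallocated (all MB)."""
    remains = total_mb - hot_spare_mb
    usable = remains - redundancy_mb
    unallocated = usable - allocated_mb
    return {"remains": remains, "usable": usable, "unallocated": unallocated}


print(space_breakdown(69464, 8683, 13264, 46000))
# {'remains': 60781, 'usable': 47517, 'unallocated': 1517}
print(space_breakdown(69465, 8683, 13264, 46000)["unallocated"])  # 1518
```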
The RAID 0/1 capacity cannot be compared with this information; it changes according to your needs. Perhaps you have just changed your 12H configuration and restored all the data. In that case the 12H will gradually adjust the RAID 0/1 capacity to what it needs. For example, a couple of weeks ago I completely reconfigured our 12H and allocated ALL available space. Just after this, the RAID 0/1 space was only a few hundred MB; now it is 22316 MB and still growing gradually.
I would suggest monitoring the RAID 0/1 capacity with a script that writes the output of arraydsp -v to a file, so you can see how it develops.
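Here is a minimal Python sketch of such a monitoring script. It runs `arraydsp -v <array-id>` (the command used in this thread), extracts the "Raid 0/1 capacity" figure with a regular expression, and appends a timestamped line to a log file. It is meant to be run periodically, e.g. from cron. The regular expression assumes the `Raid 0/1 capacity = N MB` format quoted earlier. The script name and log path are hypothetical. On an HP-UX host of that era, a short shell script would serve the same purpose.

```python
import datetime
import re
import subprocess
import sys
from typing import Optional

# Assumes the line format quoted in this thread: "Raid 0/1 capacity = 7 MB *".
CAPACITY_RE = re.compile(r"Raid 0/1 capacity\s*=\s*(\d+)\s*MB")


def extract_capacity_mb(arraydsp_output: str) -> Optional[int]:
    """Return the RAID 0/1 capacity in MB, or None if the line is missing."""
    match = CAPACITY_RE.search(arraydsp_output)
    return int(match.group(1)) if match else None


def sample(array_id: str, log_path: str) -> None:
    """Run arraydsp -v once and append '<timestamp>\t<capacity MB>' to log_path."""
    result = subprocess.run(
        ["arraydsp", "-v", array_id],
        capture_output=True, text=True, check=True,
    )
    capacity = extract_capacity_mb(result.stdout)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        # "None" is logged if the capacity line could not be found.
        log.write(f"{stamp}\t{capacity}\n")


# Hypothetical usage: python raid01_log.py <array-id> /var/tmp/raid01.log
if __name__ == "__main__" and len(sys.argv) == 3:
    sample(sys.argv[1], sys.argv[2])
```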

Vassily Gorbounov
Occasional Advisor

Re: Dramatic decrease in RAID 0/1 space

Thank you, Maarten, for your reply.
But I would like to clarify the problem.
I gave the first numbers just to show that there ARE possibilities for creating RAID 0/1 space: there is an Active Hot Spare disk that can be used for RAID 0/1, and there is space not yet allocated to LUNs.
So, as I understand it, the array SHOULD build RAID 0/1 space within these limits (up to about 10 GB) when needed.
BUT for the last week or two, the RAID 0/1 size has varied between 0 and 200 MB, while the Write Working Set stays at its usual size of approximately 4 GB.
This makes the WriteWorkingSet Ratio enormous, anywhere from 18 to 40000. And there is no tendency for the situation to correct itself: the RAID 0/1 size doesn't grow, it just oscillates.
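The thread does not define the WWS ratio precisely. Assuming it is the write working set size divided by the current RAID 0/1 capacity, this short Python sketch shows how a stable working set of about 4 GB produces the reported range once RAID 0/1 shrinks toward zero.

```python
def wws_ratio(wws_mb: float, raid01_mb: float) -> float:
    """Assumed definition: write working set size / RAID 0/1 capacity."""
    if raid01_mb <= 0:
        return float("inf")  # no RAID 0/1 space at all
    return wws_mb / raid01_mb


def implied_raid01_mb(wws_mb: float, ratio: float) -> float:
    """Invert the assumed definition: RAID 0/1 capacity implied by a ratio."""
    return wws_mb / ratio


WWS_MB = 4096  # "approximately 4 GB"
print(wws_ratio(WWS_MB, 200))            # 20.48
print(implied_raid01_mb(WWS_MB, 18))     # ~227.6 MB
print(implied_raid01_mb(WWS_MB, 40000))  # 0.1024 MB
```

Under that assumption, a ratio of 18 corresponds to roughly 228 MB of RAID 0/1 and a ratio of 40000 to about 0.1 MB. Both fit the 0 to 200 MB oscillation described.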
Recently we upgraded the controller firmware to HP60 (from HP32), but it didn't help.

I've attached the output of the arraydsp -m command below. Maybe it will help determine the cause of the trouble.

Maarten van Maanen
Regular Advisor

Re: Dramatic decrease in RAID 0/1 space

Thanks for the extra input. My next question is what your 12H is being used for. Also, how much of your 46 GB is actually occupied, and into how many LUNs is it divided? Apart from that, I can't help wondering whether you have some hardware or 12H software problem. RAID 0/1 should not be shrinking and growing all the time. Have you changed any settings or made other system-wide changes?
I would also definitely take a look at your disk and controller error logs. Use the logprint command to do this.
The 12H will always try to match the RAID 0/1 space to the WWS as closely as possible.

Vassily Gorbounov
Occasional Advisor

Re: Dramatic decrease in RAID 0/1 space

Thank you, Maarten, for helping with my problem.
Here is some more information about how our AutoRAID is used and how it has been behaving:
1. Our AutoRAID is used primarily as file storage for a flat-file database (several dozen 300 MB files).
2. There have been no rebuilds or reconfigurations over the past few months.
3. There have been no disk problems (I mean disk errors that could lead to a disk array rebuild). There are about a dozen errors on one disk that appear now and then, but the error description says "Recovered by disk drive". I think that may be a reason to replace the drive, but it has nothing to do with my problem.
4. I've attached an Excel file with a chart of WWS size and RAID size over the past 6 months. It shows that the RAID 1 size has decreased steadily since 17 May, by approx. 10 GB a month, so by the end of August it had come down almost to zero.

Thank you once more for your assistance. I hope this helps find the cause of my problem.

Vassily Gorbounov
Occasional Advisor

Re: Dramatic decrease in RAID 0/1 space

One more addition, concerning how much space is used:
Of the 46 GB allocated to LUNs, we use approx. 20 GB.
That is, the total size of all files in our file systems on the AutoRAID is 20 GB.

But I've read that when files are deleted, the AutoRAID still treats that space as used, so deleting files gains us nothing.

Maarten van Maanen
Regular Advisor

Re: Dramatic decrease in RAID 0/1 space

Looking at the output you supplied, I see that you can actually see the size of the WWS. We are using a 12H with HP54 firmware on both controllers. This firmware only gives me the WWS size relative to RAID 0/1 space, not its actual size. You mentioned you upgraded to HP60. I haven't done that yet, so I have no experience with it. Are both your controllers running the same firmware version? (Check with arraydsp -c.)
As for the errors on one of your disks: they may well be affecting your 12H's performance. Since you have hot spare on and are not particularly short of space, I would take the disk out for a day or so and see if that changes things. However, make sure Auto-Rebuild is on. The rebuild will slow the 12H down, but if you pull the disk in the evening, it will rebuild overnight (a couple of hours at most).

Vassily Gorbounov
Occasional Advisor

Re: Dramatic decrease in RAID 0/1 space

Maarten, regarding WWS size: "arraydsp -m" gives you only the WWS Ratio (regardless of firmware version), but "logprint -t perf" gives much more information, including the following:

Raid 1 capacity = 21185920
Write cache size = 41418752
Working set size = 12107392

I don't know what a 41 GB write cache means here, but the WWS size and RAID 1 capacity can easily be extracted.
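Extracting those values can be scripted. This Python sketch parses `name = value` lines like the ones quoted from `logprint -t perf` and computes the ratio of working set to RAID 1 capacity. The thread does not state the units of these counters. The ratio itself is unit-free. The `to_mib` helper applies only if the values are in bytes, which is an assumption. On that reading the write cache would be 39.5 MiB rather than 41 GB.

```python
import re

# Lines as quoted from `logprint -t perf` in this post.
PERF_SAMPLE = """\
Raid 1 capacity = 21185920
Write cache size = 41418752
Working set size = 12107392
"""

FIELD_RE = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s*$")


def parse_perf(text: str) -> dict:
    """Collect integer 'name = value' fields; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def to_mib(value: int) -> float:
    """Assumption: the counter is in bytes."""
    return value / 2**20


perf = parse_perf(PERF_SAMPLE)
ratio = perf["Working set size"] / perf["Raid 1 capacity"]
print(f"working set / RAID 1 = {ratio:.3f}")  # 0.571
print(to_mib(perf["Write cache size"]))       # 39.5 (if bytes)
```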

As for firmware: we previously had HP32 and have now upgraded to HP60 (applying the appropriate HP-UX patch). Nothing has changed since the firmware upgrade. Both of our controllers are running HP60 now.

Today I pulled the drive that had been giving us errors and, after some time, plugged it back in. Before that, the RAID 1 size was constantly between 0 and 10 MB.
Strangely, since that change the RAID 1 size has started to rise gradually, and it is now over 105 MB.
But the problem remains, because I don't know what caused such a low RAID 1 size, and it could happen again. So far I suspect a controller hardware failure, although the controller error logs are empty, so it could also be a bug.

Maarten van Maanen
Regular Advisor

Re: Dramatic decrease in RAID 0/1 space

The write cache size is the cache on your controller.
By now I would suggest you contact your HP dealer about the problem and give them the data you have collected. It might still turn out to be the disk that gave you errors before. If you do find the cause, I would like to know about it.

Re: Dramatic decrease in RAID 0/1 space

If I may interject: if your database application issues FUAs (Forced Unit Access commands), the array may be honoring them by bypassing its cache, which could be influencing your metrics. The latest ARMserver patches, PHCO_21309 for 10.20 and PHCO_21435 for 11.0 (probably superseded by now), include an enhancement that allows the array to ignore Forced Unit Access commands. Once patched, issue the command:
arraymgr -J Normal (arrayid)
then command:
arraydsp -s
and check that Simplified Resiliency Setting=Normal and Forced Unit Access Response=0.
The array performs much better when it is allowed to use its cache.
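This check can be scripted. The Python sketch below scans `arraydsp -s` output for the two settings named above and reports any that are missing or differ from the target values. It assumes each setting appears on its own line as `Name = value` or `Name=value`, as written in this post. The real output layout may differ, so the parsing may need adjusting.

```python
# Target values named in the post above.
EXPECTED = {
    "Simplified Resiliency Setting": "Normal",
    "Forced Unit Access Response": "0",
}


def settings_mismatches(arraydsp_s_output: str) -> dict:
    """Return {setting: (found, expected)} for every setting that is wrong or missing."""
    found = {}
    for line in arraydsp_s_output.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            found[key.strip()] = value.strip()
    return {
        name: (found.get(name), want)
        for name, want in EXPECTED.items()
        if found.get(name) != want
    }


# Illustrative text, not captured arraydsp output.
sample = "Simplified Resiliency Setting = Normal\nForced Unit Access Response = 1\n"
print(settings_mismatches(sample))
# {'Forced Unit Access Response': ('1', '0')}
```

An empty dictionary means both settings are as recommended.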