Balancing on 12H

SOLVED
Preethi
Occasional Advisor

We have a 12H Disk Array connected to a PA-RISC N-Class server (Hardware 9000/800/N4000-36) running HP-UX 11.0. We have LUNs created on this array and mounted on the Unix servers, and Oracle is installed on them. Recently the array has been entering a "Balancing" state frequently, and during this period the applications running on the Unix servers are unable to access the Oracle database installed on these LUNs. Using the command "arraydsp -a" we see that all the components are in a good state. Is there any specific reason why the array is balancing data so frequently?
11 REPLIES
Thayanidhi
Honored Contributor

Re: Balancing on 12H

Hi,
Balancing usually happens after replacing a failed disk or when the array is new. Frequent balancing? How often does it balance?
Check the disk statuses with arraydsp -d and check that all the disks are included.
You can also check the event logs using arraylog for suspicious failures. See the man page for arraylog.

"Unable to access": what about ioscan? Are all the LUNs claimed?

Regds
TT
Attitude (not aptitude) determines altitude.
Preethi
Occasional Advisor

Re: Balancing on 12H

Thanks TT for analysing this issue. It balances at least once a day and we lose the connection to our database every time, which is highly damaging for our business. The arraydsp -D output shows all disks as "Disk State=INCLUDED" and arraydsp -a shows all components in GOOD condition.
We ran the arraylog -d command for all disks A1 - B6 and saw that under "Read Error Counter Page" the values are "Corrected read errors without delay = 9540055", "Total Corrected read errors = 9540055" and "Total times read ECC used = 9540055". I see similar error messages across all disks A1 - B6.

When I checked the configuration, I saw the listing below, where "Read Cache" is disabled. Are these read errors related to the read cache being disabled, which is in turn causing the array to balance?


Array SCSI configuration:
Controller X SCSI Address = 5
Controller Y SCSI Address = 5
Write Cache = ENABLED
Read Cache = DISABLED
SCSI Parity Checking = ENABLED
SDTR = ENABLED
WDTR = ENABLED
Terminator Power = ENABLED
Unit Attention = ENABLED
Disable Remote Reset = ENABLED
Secondary Controller Offline = DISABLED
Very Early Busy = DISABLED
Queue Full Threshold = 1952
Maximum Queue Full Threshold = 1952
Simplified Resiliency Setting = Unknown
Single Controller Warning = ENABLED
Lock Write Cache On = TRUE
Disable NVRAM on WCE False = FALSE
Disable NVRAM with One Ctrlr = TRUE
Disable NVRAM on UPS absent = FALSE
Force Unit Access Response = 2
Disable Read Hits = FALSE
Resiliency Threshold = 4
Preethi
Occasional Advisor

Re: Balancing on 12H

Also, I just wanted to let you know that all the disks were claimed in the ioscan output.
Mark Grossman
Regular Advisor

Re: Balancing on 12H

Hi,
According to the manual, the "balancing" state means the array is redistributing data among disk modules for better performance.
A failed disk would "rebuild", not balance. However, you are obviously seeing worse performance during balancing.
How full is this array, and how much available space is there? Not disk space that was used and then freed by deleting files, but space that has never been allocated to any LUN?
The 12H likes lots of unused free space.
It will move RAID 1/0 data to RAID 5 space when it runs out. We had very, very bad performance when our 12Hs were near capacity.
Mark
Preethi
Occasional Advisor

Re: Balancing on 12H

Hi Mark,

I got this output from "armdsp -a" command -
--- Disk space usage --------------------
Total physical = 207415 MB *
Allocated to LUNs = 129024 MB *
Used as Active Hot spare = 17366 MB *
Used by non-included disks = 0 MB *
Used for Redundancy = 32237 MB *
Unallocated (avail for LUNs) = 28788 MB *
-----------------------------------------

If I remember right, shouldn't firmware operations like Balancing and Optimizing run as background operations without affecting I/O or access to the LUNs? There was no mechanism available to stop these activities on a VA. Can we stop them manually on the 12H?

We are currently using firmware revision HP56. Is there a later release that corrects this issue?

Thank you so much for analysing our problem.
Mark Grossman
Regular Advisor

Re: Balancing on 12H

Just for kicks, try arraydsp -r. This should give performance recommendations from the 12H.
All it ever told me was to add more disks.

Also, what is your SCSI cable config? We went from two to four cables at one point, then to 8 for failover. I am wondering if a SCSI connection is dead. Do you see anything in /var/adm/syslog about SCSI lbolt or SCSI errors during these times?

Mark
Mark Grossman
Regular Advisor
Solution

Re: Balancing on 12H

Just to add: I really think the answer is that you are filling up the 12H and will either need to add more new disks, or delete the database, reorg it, and lay it all back out clean on your 12H.

12Hs like 40-50% unallocated disk space for best performance. See this link too:
http://forums2.itrc.hp.com/service/forums/questionanswer.do?threadId=63401&admit=-1335382922+1121798533499+28353475

Mark
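As a rough check of the 40-50% guideline against the "armdsp -a" figures posted earlier in this thread, the arithmetic can be sketched as below. This is only an illustration, and it assumes the guideline is measured against total physical capacity, which the thread does not state explicitly.

```python
# Figures copied from the "armdsp -a" output posted earlier in this thread (MB).
usage_mb = {
    "allocated_to_luns": 129024,
    "active_hot_spare": 17366,
    "non_included_disks": 0,
    "redundancy": 32237,
    "unallocated": 28788,
}
total_physical_mb = 207415


def unallocated_percent(unallocated, total):
    """Return unallocated space as a percentage of total physical space."""
    return 100.0 * unallocated / total


# Sanity check: the individual categories add up to the reported total.
assert sum(usage_mb.values()) == total_physical_mb

pct = unallocated_percent(usage_mb["unallocated"], total_physical_mb)
print(f"Unallocated: {pct:.1f}% of physical capacity")
```

On these numbers the array has roughly 14% unallocated, well below the 40-50% suggested above.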
Preethi
Occasional Advisor

Re: Balancing on 12H

Thank you so much for the pointer Mark. That thread pretty much answers all my queries. Thanks once again.
A. Clay Stephenson
Acclaimed Contributor

Re: Balancing on 12H

I suspect that your problems have absolutely nothing to do with "Balancing". When an AutoRAID is balancing, it is moving the most recently used data to RAID 1/0 and moving the less used data to RAID 5, BUT this should be absolutely invisible to the host computer and I/O should appear normal. You need to check the syslog for lbolts and I/O timeouts. The most common error is not using pvchange to increase the I/O timeout on an array LUN from the default 30 seconds to something like 120-150 seconds. This applies to essentially all disk arrays, not just the old 12H.
If it ain't broke, I can fix that.
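A minimal sketch of the timeout change described above: the helper below only builds the pvchange command lines (using the -t option for the I/O timeout in seconds) so they can be reviewed before anyone runs them as root. The helper name and the device file paths are hypothetical; substitute the physical volume paths of your own array LUNs.

```python
import shlex


def pvchange_timeout_commands(pv_paths, timeout_seconds=120):
    """Build (but do not run) one 'pvchange -t' command per physical volume."""
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        " ".join(["pvchange", "-t", str(timeout_seconds), shlex.quote(path)])
        for path in pv_paths
    ]


# Hypothetical device files, used only for illustration.
example_pvs = ["/dev/dsk/c4t0d0", "/dev/dsk/c4t0d1"]
for cmd in pvchange_timeout_commands(example_pvs, 120):
    print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the value 120 is simply the low end of the range suggested in the post.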
Preethi
Occasional Advisor

Re: Balancing on 12H

We did have lbolt issues, and we changed the physical volume I/O timeout to 90 seconds. We do not have those issues any more. The only SCSI error message I see is

vmunix: SCSI: Attempt to access partially open device -- dev: bc002000
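For anyone repeating this check, a small filter along the following lines could pull the SCSI, lbolt and timeout messages out of the syslog. It is a sketch only: the keyword list is an assumption based on the terms used in this thread, and the second sample line is made up purely to show a non-match.

```python
import re

# Keywords taken from this thread; matching is case-insensitive.
PATTERN = re.compile(r"lbolt|scsi|timeout", re.IGNORECASE)


def scsi_related_lines(lines):
    """Return the lines that mention lbolt, SCSI or timeout."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


sample = [
    "vmunix: SCSI: Attempt to access partially open device -- dev: bc002000",
    "an unrelated line used only for illustration",
]
for hit in scsi_related_lines(sample):
    print(hit)
```

On a live system the same function could be fed an open file handle for the syslog (typically /var/adm/syslog/syslog.log on HP-UX) instead of the sample list.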
Mark Grossman
Regular Advisor

Re: Balancing on 12H

A. Clay is right; I had mentioned SCSI lbolts as well. I just wanted to let you know that we had our timeout up to 180 without any problems, in case you need to go higher.