Jim Griffiths
Advisor

SureStore VA 7400 - bizarre behaviour after N class reboot

Hi,

We have a VA7400-based SAN with four N-class servers and a number of L-class and A-class servers attached via a Brocade network. We are also running ServiceGuard across the four N-class servers. We were unable to shut down a package because it complained that certain file systems were in use. With time pressing, we did a forced shutdown via shutdown -h -y 0. The box was left down overnight. On one of the other N-class servers, I/O utilisation, which glance normally reports at 30-50%, went to 100%, the I/O queue length rocketed into the thousands, and heavy I/O apps started to grind to a halt; during this time the Physical I/O Rate halved. We were at a loss as to what was going on, and this activity continued until (bizarrely) the downed box was restarted the following morning. We used HP Command View SDM to look at the 7400 performance stats and, sure enough, I/O throughput fell off a cliff when the box went down and was okay again after the restart the following morning. There were no errors anywhere.

I can't find anything similar in the forum postings. We are a number of firmware revisions behind, so we will upgrade. The problem I have is that we're running business-critical apps and have very limited scope for any testing, so I'm very interested to hear whether anyone has experienced anything similar.

As an aside, the 7400 is supposed to be a high-availability solution, and I'm very concerned that something on one client could affect another!

Any help or comments much appreciated,

Thanks,

Jim
If you need a miracle, play for it (BRIDGE)
6 REPLIES
Eugeny Brychkov
Honored Contributor

Hi Jim,
Please attach the 'armdsp -a' and 'supportshow' (from the switches) outputs to your next reply. Also please analyze the 'armlog -e' output, and check that the HBA drivers are at the latest revision.
One thing I did not understand: does the server see disk utilization as 100%, or does the VA report 100% utilization? Can we localize the problem to the server, to a switch port, or to a VA controller?
Eugeny
Jim Griffiths
Advisor

Eugeny,

Please see the attached armdsp -a VA7400 output. I'm still looking at the armlog output. Where is the HBA driver revision info? Is it this line?

Controller M/C1 Firmware ------ = 38370HP14P1221010834 ???????

Also, a bit of clarification. It was glance, on the N-class server, that immediately started to report 100% I/O utilisation when the OTHER N-class was shut down. We didn't initially connect the two events; we first thought it was a problem on the server. It was only when the OTHER box was restarted the following morning and the I/O problem went away that we retrospectively looked at the stats in the VA itself. The "Total I/O throughput" metric showed a dramatic drop when the N-class was shut down, and a corresponding return to normal when it was restarted.

Thanks,

Jim
Eugeny Brychkov
Honored Contributor

First of all, there was a sar bug that showed an incorrect load percentage. Try searching the forum for the subject 'bogus sar output' (a patch is needed);
Second, I see that the attached file has been cut off at the beginning (only the second half of the output is available). Rebuild priority is set to high. I suspect you could have had a rebuild in progress; because it has high priority, it would slow down the VA's response, fill the buffers with requests and thus raise utilization;
Third, you have HP14 firmware, which should be updated to HP18.
If you attach the whole armdsp output, I may be able to tell you more
Eugeny
Jim Griffiths
Advisor

Eugeny,

Yes, apologies. Please see attached what should be a complete armdsp -a VA7400 output.

I'm interested in what you were saying about high rebuild priority. What happens if I reduce it?

Not sure why you mention sar output? It was glance reporting 100% disk I/O; this was definitely due to poor VA7400 response because, as mentioned, the Physical I/O Rate had halved at the same time as the I/O queue length, normally 2 to 4, went into the thousands!

If a rebuild was in progress, I suspect it would never have completed without restarting the downed server. I think the key question here is why restarting the shut-down server should allow the VA7400 to suddenly be okay.
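The reasoning above (throughput halved while the queue grew from single digits into the thousands) can be made concrete with Little's Law, W = L / λ, which relates average queue length to average throughput and response time. The sketch below is only an illustration: the I/O rates are hypothetical round numbers (the thread gives no actual rates), the queue lengths are taken loosely from the description, and it assumes the glance figures can be read as steady-state averages.

```python
def implied_response_time_ms(queue_length: float, io_per_sec: float) -> float:
    """Little's Law: average time in system W = L / lambda, returned in ms."""
    if io_per_sec <= 0:
        raise ValueError("io_per_sec must be positive")
    return queue_length * 1000.0 / io_per_sec


# Hypothetical figures, NOT measurements from the thread:
# normal:   queue of about 3 requests at 1000 physical I/Os per second
# degraded: queue of about 1000 requests at half that rate
normal_ms = implied_response_time_ms(queue_length=3, io_per_sec=1000)
degraded_ms = implied_response_time_ms(queue_length=1000, io_per_sec=500)

print(f"normal:   {normal_ms:.1f} ms per I/O")
print(f"degraded: {degraded_ms:.1f} ms per I/O")
print(f"slowdown: {degraded_ms / normal_ms:.0f}x")
```

Under these assumed numbers the implied per-request response time rises by a few hundred times, which is consistent with the array (or the path to it) being the bottleneck rather than the host generating more load.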

Thanks very much for your help,

Regards,

Jim
Eugeny Brychkov
Honored Contributor

Everything looks OK except the HP14 controller firmware. Call HP for the upgrade. Do not forget to make a full backup before proceeding;
A high rebuild priority takes precedence over normal host activity, so response will be slower; a low rebuild priority may suspend the rebuild until host activity drops
Eugeny
Eugeny Brychkov
Honored Contributor
Solution

You can see what was happening on the VA at a given time by analyzing the following outputs:
armlog -e VA7400 (controller logs)
logprn -t All -v -a VA7400 (CommandView logs)
Eugeny
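To keep these outputs for later comparison (for example before and after a firmware upgrade), the two commands could be wrapped in a small collector. This is a minimal sketch, assuming a Python 3 interpreter is available on the Command View SDM host and that the array alias is VA7400 as in the commands above. The collect function, the COMMANDS table and the output directory name are hypothetical helpers, not part of Command View SDM.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

ARRAY_ALIAS = "VA7400"  # alias as written in the thread; adjust to your array

# Command lines exactly as quoted in the thread; the dictionary keys are
# hypothetical labels used only to name the output files.
COMMANDS = {
    "armlog_e": ["armlog", "-e", ARRAY_ALIAS],
    "logprn_all": ["logprn", "-t", "All", "-v", "-a", ARRAY_ALIAS],
}


def collect(commands, out_dir):
    """Run each command and save stdout+stderr to a timestamped file.

    Returns a dict mapping each label to the saved Path, or to None when
    the executable is not found on PATH (the command is then skipped).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    saved = {}
    for label, argv in commands.items():
        if shutil.which(argv[0]) is None:
            saved[label] = None  # tool not installed on this host
            continue
        result = subprocess.run(argv, capture_output=True, text=True)
        path = out_dir / f"{label}-{stamp}.txt"
        path.write_text(result.stdout + result.stderr)
        saved[label] = path
    return saved
```

Calling collect(COMMANDS, "va7400-logs") on the management host would write one file per command into that directory; on a machine without the Command View tools it simply reports None for each entry.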