VA7100 controller fault

SOLVED
a_79
Advisor

Hello, I am an HP CE.

I maintain a cluster of two L-class nodes with one VA7100.

I replaced VA7100 controller 2, but one month later controller 2 is faulty again.

I have collected some information.
Could a specialist please help me? See the attachment.

Symptoms:

All of the indicators on controller 2 are off, except the battery indicator, which is flashing.



18 REPLIES
Srinivasa_6
Advisor

Re: VA7100 controller fault

There are several ab.ffff.134 Abterm events. ab.ffff abterms usually indicate a controller hardware failure. Try replacing the faulty controller.
Sameer_Nirmal
Honored Contributor
Solution

Re: VA7100 controller fault

Hi,

The armlog shows that controller C1 was reset and rebooted, possibly because of the C2 failure.

I would collect diagnostic status information using
# armdiag -I -if array_status

You can run an armdiag inquiry against the controller to find out whether it is responding. Running such commands requires contacting the HP Response Center and following their instructions.

From the host side, it is useful to run STM logtool and check the report.

It is worth checking what went wrong with the controller you replaced earlier. Maybe you can get feedback from the repair/diagnostic center.

It would be interesting to know whether the controller 2 failure is caused by a hardware fault or by a "rejection" due to a mismatch between the two controllers.

Lastly, it is recommended to keep the array firmware at the latest level, which is now HP22. You can consider upgrading from the existing HP19 in due course.
Mohanasundaram_1
Honored Contributor

Re: VA7100 controller fault

Hi,

An abterm indicates that you have a serious problem in the array. You have to involve HP support immediately to prevent any data loss.

With regards,
Mohan.
Attitude, Not aptitude, determines your altitude
a_79
Advisor

Re: VA7100 controller fault

Hello, many thanks.

I have two questions:

1) Is the power module OK? Can its output voltage be trusted?

2) Is the midplane OK?
Srinivasa_6
Advisor

Re: VA7100 controller fault

I can see events like:

I2C_DRIVER_FAILURE and VSC_7130_FAILED_EH. These components reside on the midplane. I2C is used for some of the NVRAM mirroring and also for communication between the 2 controllers.

So I would say it's safe to change the midplane along with the faulty controller.
Sameer_Nirmal
Honored Contributor

Re: VA7100 controller fault

Hi,

Looking closely at the armlog, it is quite clear that on two occasions the reason for the C2 failure was a hardware failure of its VSC7130 chip. The chip failure was indicated by an I2C warning and the subsequent reset of C1. The chip resides on the controller card and is monitored over the I2C bus. I believe the I2C bus is looped across the array, monitoring various components, and has its main control circuit on the midplane.

As mentioned in the log, there are 2 VSC chips. I guess that in the case of the VA7100 the host port VSC7130 is the one involved, so the following errors should belong to the host port VSC7130:
7130 Update Error: 0x02, 0x01
7130 Update Error: 0x02, 0x02
You can send these errors to storage engineering to confirm this. The "armdiag" output would be useful too in most cases.

As you asked about the power module, its status can be assessed (see the TS Guide) using the 2 LEDs on it.
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=33737&prodTypeId=12169&prodSeriesId=33737&objectID=lpg60204

If the power module looks OK, then I would suspect that something is wrong at the host port. It may be a faulty or misbehaving FC transceiver. The GBIC is the first suspect, followed by the HBA on the host side.
Change the GBIC first, as it is the most probable cause and is easily replaceable. As I said before, you can run STM logtool and diagnostics on the FC HBA to track any errors on the host side.
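
To see how often each of the events named above recurs, a text export of the armlog could be scanned with something like the sketch below. This is only an illustration, not an HP tool: the event strings are the ones quoted in this thread, while the line layout of the sample excerpt and the names `PATTERNS` and `count_events` are assumptions made for the example.

```python
import re
from collections import Counter

# Event strings quoted in this thread; the surrounding log layout is assumed.
PATTERNS = {
    "abterm": re.compile(r"ab\.ffff\.\d+"),
    "i2c_driver_failure": re.compile(r"I2C_DRIVER_FAILURE"),
    "vsc_7130_failed": re.compile(r"VSC_7130_FAILED_EH"),
    "7130_update_error": re.compile(
        r"7130 Update Error: 0x[0-9a-fA-F]+, 0x[0-9a-fA-F]+"
    ),
}


def count_events(log_text):
    """Count how many lines of log_text mention each known event string."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical excerpt: only the event strings are taken from the thread.
SAMPLE = """\
event 1: I2C_DRIVER_FAILURE
event 2: VSC_7130_FAILED_EH
event 3: 7130 Update Error: 0x02, 0x01
event 4: 7130 Update Error: 0x02, 0x02
event 5: ab.ffff.134
"""

if __name__ == "__main__":
    for name, n in sorted(count_events(SAMPLE).items()):
        print(name, n)
```

Comparing the counts before and after each part replacement would show whether the same VSC7130 and I2C events keep coming back.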
a_79
Advisor

Re: VA7100 controller fault

Hello sirs,

I have replaced the bad controller 2, the two power supplies and the midplane.
But after two days controller 2 broke down again, and the System Fault lamp lit up.

It is horrible!!

I have also collected some new logs; see the attachments.

Another question: the command ioscan -fnCdisk cannot find all paths to the LUNs in the VA7100.
When I replaced the controller, I had to reboot the HP9000 to force the FC HBAs to find all paths to the LUNs.
For example:

ioscan -fnCdisk | more
Class  I   H/W Path             Driver  S/W State  H/W Type  Description
==========================================================================
disk   0   0/0/1/1.2.0          sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk   1   0/0/2/1.2.0          sdisk   CLAIMED    DEVICE    HP DVD-ROM 305
           /dev/dsk/c3t2d0   /dev/rdsk/c3t2d0
           /dev/dsk/disk_query
disk   7   0/4/0/0.8.0.1.0.0.0  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d0  /dev/rdsk/c10t0d0
disk   8   0/4/0/0.8.0.1.0.0.1  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d1  /dev/rdsk/c10t0d1
disk   13  0/4/0/0.8.0.1.0.0.2  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d2  /dev/rdsk/c10t0d2



Someone told me to use the fcmsutil command so that I do not have to reboot the HP9000.
How do I use fcmsutil?
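
For reference, the stale paths in a listing like the one above can be picked out automatically. The sketch below is an illustration only: it assumes the `ioscan -fnCdisk` output has been saved as text with one device per row (wrapped lines rejoined), and the function name `no_hw_paths` is made up for the example. The sample rows are the ones from this post.

```python
def no_hw_paths(ioscan_text):
    """Return (hardware path, description) for each disk row in state NO_HW."""
    result = []
    for line in ioscan_text.splitlines():
        fields = line.split()
        # Device rows start with the class name and an instance number;
        # the header, the separator and the device-file rows are skipped.
        if len(fields) >= 6 and fields[0] == "disk" and fields[1].isdigit():
            hw_path, sw_state = fields[2], fields[4]
            if sw_state == "NO_HW":
                result.append((hw_path, " ".join(fields[6:])))
    return result


# Rows taken from the ioscan listing in this post, wrapped lines rejoined.
SAMPLE = """\
Class I H/W Path Driver S/W State H/W Type Description
==========================================================================
disk 0 0/0/1/1.2.0 sdisk CLAIMED DEVICE SEAGATE ST318404LC
/dev/dsk/c1t2d0 /dev/rdsk/c1t2d0
disk 7 0/4/0/0.8.0.1.0.0.0 sdisk NO_HW DEVICE HP A6188A
/dev/dsk/c10t0d0 /dev/rdsk/c10t0d0
disk 8 0/4/0/0.8.0.1.0.0.1 sdisk NO_HW DEVICE HP A6188A
/dev/dsk/c10t0d1 /dev/rdsk/c10t0d1
disk 13 0/4/0/0.8.0.1.0.0.2 sdisk NO_HW DEVICE HP A6188A
/dev/dsk/c10t0d2 /dev/rdsk/c10t0d2
"""

if __name__ == "__main__":
    for hw_path, description in no_hw_paths(SAMPLE):
        print(hw_path, description)
```

All three NO_HW rows sit behind the same FC hardware path prefix, which fits a host that still holds the old controller's port identity.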
Torsten.
Acclaimed Contributor

Re: VA7100 controller fault

The first thing I would do is upgrade from HP19 to HP22, Command View upgrade included. If in any doubt, call HP (also called CE-assist for you) ;-)

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Mohanasundaram_1
Honored Contributor

Re: VA7100 controller fault

Hi,

I stand by what I said earlier. If it is an abterm, you are bound to have serious issues in the array. Please refer this to HP support immediately to prevent any data loss.

abterm = abnormal termination of the controller. It indicates that the controller was unable to determine the course of action for a particular event occurrence. This is a serious event which needs immediate attention.

With regards,
Mohan.
a_79
Advisor

Re: VA7100 controller fault

Hello sir,

I really want to understand why,
although we have replaced almost all of the parts, the breakdown occurs again and again.

I am an HP CE; I searched kmine and found no solution.

My attachments include the armdsp, armlog and STM tool output, and the HP L2000 configuration.
I would really like your help in analysing this information,
and any further advice you can give.

Thanks for your help.

Yours sincerely,
johnson
Mohanasundaram_1
Honored Contributor

Re: VA7100 controller fault

Hi Johnson,

I do understand that there are no major hits for your problem in kmine. You need to refer this problem to the backend support.

WTEC needs to be involved for this problem. Since you say you are an HP CE, it should not be a problem for you to elevate this case to the next level.

With regards,
Mohan.

P.S. I was an HP Response Center engineer.
Sameer_Nirmal
Honored Contributor

Re: VA7100 controller fault

Johnson,

Did you replace the GBIC module?
Can you post the "armdiag" output?

Refer to the attached document for using
"fcmsutil".

I haven't checked the posted logs, but if the VSC chip is the cause of the failure this time as well, then I would replace the GBIC, the FC cable and the HBA on the host side.
Nguyen Anh Tien
Honored Contributor

Re: VA7100 controller fault

Hi A,
A replacement controller comes with a different port WWN, so the host, which still expects the WWN of the defective controller, will not recognize it.
SOLUTION: Reboot the server if possible, or use fcmsutil:
# fcmsutil replace_dsk
HTH
Tell me if you are still in trouble.
tienna
HP is simple
a_79
Advisor

Re: VA7100 controller fault

Hello ,
I checked the subsystem settings and found that the VA7100's RAID mode is RAID 0+1, with no hot spare. The Volume Setting is 0; does HP19 firmware need 2 rather than 0?


The customer has configured many alternate links. For example, /dev/rdsk/c6t0d0, /dev/rdsk/c7t0d0,
/dev/rdsk/c16t0d0 and /dev/rdsk/c17t0d0 are all the same physical LUN in the VG configuration. Does HP-UX support only 3 alternate links?


Does Command View SDM need to be upgraded?
Nguyen Anh Tien
Honored Contributor

Re: VA7100 controller fault

No!
The alternate links come from the FC cabling.
The VA7100 has 2 cables to the switch and you have 2 FC HBAs connected to the switch, so "ioscan" reports 4 PVs (1 active + 3 alternate paths).
Please assign points to the solutions.
Thanks
HP is simple
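
The path count in this reply can be checked with a few lines. This is just a sketch of the reasoning, assuming two host HBAs and two array host ports that can all see each other through the switch; the labels `hba0`, `hba1` and `C2.H1` are hypothetical (only `C1.H1` appears in the armdsp output posted in this thread).

```python
from itertools import product


def lun_paths(hbas, array_ports):
    """Each (HBA, array port) pair is one path to the same LUN."""
    return list(product(hbas, array_ports))


# Hypothetical labels: two host HBAs and the two VA7100 host ports.
HBAS = ["hba0", "hba1"]
ARRAY_PORTS = ["C1.H1", "C2.H1"]

if __name__ == "__main__":
    paths = lun_paths(HBAS, ARRAY_PORTS)
    print("total paths:", len(paths))
    print("alternate paths:", len(paths) - 1)
    for hba, port in paths:
        print(hba, "->", port)
```

Two HBAs times two array ports gives four device files per LUN, which matches the four /dev/rdsk entries listed for the same physical LUN.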
Torsten.
Acclaimed Contributor

Re: VA7100 controller fault

The alternate paths are OK.

It's not a bad idea to upgrade the firmware and the Command View version, but it's not required.

It gives some more functionality, like LUN formatting and better troubleshooting capabilities, but it's OK to stay at HP19.

Hope this helps!
Regards
Torsten.

a_79
Advisor

Re: VA7100 controller fault

Hello sir,
If the alternate links are OK and HP19 is OK, then does HP-UX 11i need a patch to upgrade the A5158A driver?

Controller 2 in the VA7100 has kept breaking down over the last three months. After I replaced the 2 power supplies and the midplane, controller 2 broke down again. It is horrible.

So I am now questioning every factor which may be relevant to controller 2.

I posted the logs earlier in this thread; please, can anyone help me?

Yours sincerely,

luo johnson
a_79
Advisor

Re: VA7100 controller fault

And how about the Volume Setting?



Vendor ID:______________________________HP
Product ID:_____________________________A6188A
Array World Wide Name:__________________50060b000014e557
Array Serial Number:____________________00SG224J0091
Alias:__________________________________va7100
Software Revision:______________________01.07.00 - 0256 - 030925
Command execution timestamp:____________Apr 23, 2006 3:25:59 AM
------------------------------------------------------------

ARRAY INFORMATION

Array Status:_________________________Warning
Firmware Revision:____________________38370HP19P1031031142
Product Revision:_____________________HP19
Last Event Log Entry for Page 1:______343807

ENCLOSURES

Enclosure at M
Enclosure ID__________________________0
Enclosure Status______________________Good
Enclosure Type________________________HP StorageWorks Virtual Array 7100
Node WWN______________________________50060b000014e557

FRU        HW COMPONENT   IDENTIFICATION                     ID STATUS
===========================================================================
M          Enclosure      00SG224J0091                       Good
M/P1       Power Supply   94020HE01470                       Good
M/P2       Power Supply   94020HE01471                       Good
M/MP1      MidPlane       000601310298                       Good
M/C1       Controller     00PR04111697                       Good
M/C1.H1    Host Port                                         Good
M/C1.J1    BackEnd Port                                      Good
M/C1.G1    Host GBIC      A28ALMJ                            Good
M/C1.B1    Battery        30597:MOLTECHPS:NI2040:2002/4/15   Good
M/C1.PM1   Processor      HP:A6188A:HP19                     Good
M/C1.M1    DIMM           512                                Good
M/D1       Disk           3FP0NAKB                           Good
M/D2       Disk           3FP0NF6A                           Good
M/D3       Disk           3FP0NHAD                           Good
M/D4       Disk           3FP0NGXP                           Good

CONTROLLERS

Controller At M/C1:
Status:_______________________________Good
Serial Number:________________________00PR04111697
Vendor ID:____________________________HP
Product ID:___________________________A6188A
Product Revision:_____________________HP19
Firmware Revision:____________________38370HP19P1031031142
Manufacturing Product Code:___________IJMTU00013
Controller Type:______________________HP StorageWorks Virtual Array 7100
Battery Charger Firmware Revision:____4.3
Front Port At M/C1.H1:
Status:_____________________________Good
Port Instance:______________________0
Hard Address:_______________________1
Link State:_________________________Link Up
Node WWN:___________________________50060b000014e557
Port WWN:___________________________50060b0000146d5e
Topology:___________________________Private Loop
Data Rate:__________________________1 GBit/sec
Port ID:____________________________1
Device Host Name:___________________scm1
Hardware Path:______________________Unknown
Device Path:________________________Unknown
Host GBIC at M/C1.G1:
Status:_____________________________Good
Identification:_____________________A28ALMJ
Battery at M/C1.B1:
Status:_____________________________Good
Identification:_____________________30597:MOLTECHPS:NI2040:2002/4/15
Manufacturer Name:__________________MOLTECHPS
Device Name:________________________NI2040
Manufacturer Date:__________________April 15, 2002
Remaining Capacity:_________________5108 mAh
Remaining Capacity:_________________85 %
Voltage:____________________________12192 mVolts
Discharge Cycles:___________________2
Processor at M/C1.PM1:
Status:_____________________________Good
Identification:_____________________HP:A6188A:HP19
DIMM at M/C1.M1:
Status:_____________________________Good
Identification:_____________________512
Capacity:___________________________512 MB

PORTS

Settings for port M/C1.H1:
Port ID:______________________________1
Behavior:_____________________________HPUX
Topology:_____________________________Private Loop
Queue Full Threshold:_________________750
Data Rate:____________________________1 GBit/sec

DISKS

Disk at M/D1:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 36.4G
Product ID:___________________________ST336605FC
Product Revision:_____________________HP09
Data Capacity:________________________33.378 GB (70000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________111
Node WWN:_____________________________20000004cf27fe9f
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________0000486300000005
Serial Number:________________________3FP0NAKB
Firmware Revision:____________________HP09
Recovery Maps are on this disk.

Disk at M/D2:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 36.4G
Product ID:___________________________ST336605FC
Product Revision:_____________________HP09
Data Capacity:________________________33.378 GB (70000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________112
Node WWN:_____________________________20000004cf27fee3
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________0000486300000005
Serial Number:________________________3FP0NF6A
Firmware Revision:____________________HP09
Recovery Maps are on this disk.

Disk at M/D3:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 36.4G
Product ID:___________________________ST336605FC
Product Revision:_____________________HP09
Data Capacity:________________________33.378 GB (70000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________113
Node WWN:_____________________________20000004cf2f12f7
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________0000486300000005
Serial Number:________________________3FP0NHAD
Firmware Revision:____________________HP09

Disk at M/D4:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 36.4G
Product ID:___________________________ST336605FC
Product Revision:_____________________HP09
Data Capacity:________________________33.378 GB (70000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________114
Node WWN:_____________________________20000004cf2f1328
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________0000486300000005
Serial Number:________________________3FP0NGXP
Firmware Revision:____________________HP09

LUNS

LUN 1:
Redundancy Group:_____________________1
Active:_______________________________True
Data Capacity:________________________35.156 GB
WWN:__________________________________50060b000008ab0c0001000000000009
Number Of Business Copies:____________0

LUN 2:
Redundancy Group:_____________________1
Active:_______________________________True
Data Capacity:________________________28 GB
WWN:__________________________________50060b000008ab0c000200000000000c
Number Of Business Copies:____________0

CAPACITY Totals for Redundancy Group 1:
REGULAR LUNs:_________________________63.156 GB
BUSINESS COPIES:______________________0 bytes

CAPACITY USAGE

Total Disk Enclosures:________________1

Redundancy Group:_____________________1
Total Disks:________________________4
Total Physical Size:________________133.514 GB
Allocated to Regular LUNs:__________63.156 GB
Allocated as Business Copies:_______0 bytes
Used as Active Hot Spare:___________0 bytes
Used for Redundancy:________________70.19 GB
Unallocated (Available for LUNs):___172 MB

SUB-SYSTEM SETTINGS

RAID Level:___________________________HPAutoRAID
Auto Format Drive:____________________On
Hang Detection:_______________________On
Capacity Depletion Threshold:_________100%
Queue Full Threshold Maximum:_________4096
Enable Optimize Policy:_______________True
Enable Manual Override:_______________False
Manual Override Destination:__________False
Read Cache Disable:___________________False
Rebuild Priority:_____________________Low
Security Enabled:_____________________False
Shutdown Completion:__________________0
Subsystem Type ID:____________________0
Unit Attention:_______________________True
Volume Set Partition (VSpart):________False
Write Cache Enable:___________________True
Write Working Set Interval:___________8640
Enable Prefetch:______________________False
Disable Secondary Path Presentation:__False

RESILIENCY SETTINGS

Simplified Resiliency Setting:________Normal Performance (Default)
Enable Secure Mode:___________________True
Disable NVRAM on UPS Absent:__________False
Disable NVRAM on WCE False:___________False
Disable Read Hits:____________________False
Force Unit Access Response:___________1
Lock Write Cache On:__________________True
Performance Goal Configuration:_______Normal Performance
Resiliency Threshold:_________________4
Single Controller Warning:____________True

DISK SETTINGS

Auto Include:_________________________On
Auto Rebuild:_________________________On
Hot Spare:____________________________None
Max Drives per Loop Pair:_____________15
Max Drives per Subsystem:_____________15

ENCLOSURE SETTINGS

Max Enclosures per Loop Pair:_________0
Max Enclosures per Subsystem:_________0

LUN SETTINGS

LUN Creation Limit:___________________128
Maximum LUN Creation Limit:___________128
Migrating Write Destination:__________False

SCRUB SETTINGS

Scrub Restart Period:_________________7190 minutes
Scrub State:__________________________Idle, initialized

OPERATIONS IN PROGRESS

None

WARNINGS

WARNING: An internal FRU monitoring resource has become inoperative.

WARNING: The array has a single controller present.
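
A quick way to confirm from saved armdsp output that controller 2 is missing from the FRU table is sketched below. It is an illustration only: the parsing assumes FRU rows begin with a location such as M or M/C1 and end with the status word, and the location name M/C2 for the second controller is an assumption by analogy with M/C1. The names `parse_frus` and `missing_controllers` are made up for the example.

```python
def parse_frus(armdsp_text):
    """Map FRU location -> status for rows shaped like the FRU table."""
    frus = {}
    for line in armdsp_text.splitlines():
        fields = line.split()
        # FRU rows start with "M" or "M/..." and end with the status word.
        if len(fields) >= 3 and (fields[0] == "M" or fields[0].startswith("M/")):
            frus[fields[0]] = fields[-1]
    return frus


def missing_controllers(frus, expected=("M/C1", "M/C2")):
    """Return the expected controller locations absent from the FRU map."""
    return [loc for loc in expected if loc not in frus]


# A few rows copied from the FRU table in this post.
SAMPLE = """\
FRU HW COMPONENT IDENTIFICATION ID STATUS
M Enclosure 00SG224J0091 Good
M/P1 Power Supply 94020HE01470 Good
M/MP1 MidPlane 000601310298 Good
M/C1 Controller 00PR04111697 Good
M/C1.H1 Host Port Good
M/D1 Disk 3FP0NAKB Good
"""

if __name__ == "__main__":
    frus = parse_frus(SAMPLE)
    print("missing controllers:", missing_controllers(frus))
    print("not Good:", [loc for loc, status in frus.items() if status != "Good"])
```

In this output every listed FRU reports Good, and the only sign of the fault is that no second controller is listed at all, which agrees with the single-controller warning.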


By the way, about fcmsutil usage in an FC hub environment:

I use the command "fcmsutil /dev/td0 replace_dsk -l 1" to reconfigure the A5158A HBA so that it recognizes the new controllers in the VA7100, assuming the VA7100 controllers' loop IDs are 1 and 2 (one FC hub links the VA7100 to the HP9000).
That way I do not have to reboot the L2000 to find the FC links.
Am I correct?