
Re: VA7100 controller fault

 
SOLVED
a_79
Advisor

VA7100 controller fault

Hello, I am an HP CE.

I maintain a 2-L node cluster with one VA7100.

I replaced VA7100 controller 2, but after one month controller 2 is faulty again.

I have collected some info.
Could a specialist please help me? See the attachment.

Symptoms:

All of the indicators on controller 2 are off, except the battery indicator, which is flashing.



18 REPLIES
Srinivasa_6
Advisor

Re: VA7100 controller fault

There are several ab.ffff.134 Abterm events. ab.ffff abterms usually indicate a controller hardware failure. Try replacing the faulty controller.
Sameer_Nirmal
Honored Contributor
Solution

Re: VA7100 controller fault

Hi,

The armlog shows that controller C1 was reset and rebooted, possibly because of the C2 failure.

I would collect diagnostic status information using
# armdiag -I -if array_status

You can run an armdiag inquiry against the controller to check whether it is responding. Running such commands requires contacting the HP Response Center and following their instructions.

From the host side, it is useful to run STM logtool and check the report.

It is worth checking what went wrong with the controller you replaced earlier. Maybe you can get feedback from the repair/diagnostic center.

It would be interesting to know whether the controller 2 failure is caused by a hardware failure or by a "rejection" due to a mismatch between the two controllers.

Lastly, it is recommended to keep the array firmware at the latest level, which is now HP22. You can consider upgrading from the existing HP19 in due course.
Mohanasundaram_1
Honored Contributor

Re: VA7100 controller fault

Hi,

An abterm indicates that you have a serious problem in the array. You should involve HP support immediately to prevent any data loss.

With regards,
Mohan.
Attitude, Not aptitude, determines your altitude
a_79
Advisor

Re: VA7100 controller fault

Hello, many thanks.

I have two questions:

1) Is the power module OK? Can its output voltage be trusted?

2) Is the midplane OK?
Srinivasa_6
Advisor

Re: VA7100 controller fault

I can see events like:

I2C_DRIVER_FAILURE and VSC_7130_FAILED_EH. These components reside over the midplane. I2C is used for some NVRAM mirroring and also for communication between the two controllers.

So I would say it's safe to change the midplane along with the faulty controller.
Sameer_Nirmal
Honored Contributor

Re: VA7100 controller fault

Hi,

Looking closely at the armlog, it is quite clear that on two occasions the reason for the C2 failure was a hardware failure of its VSC7130 chip. The chip failure was indicated by an I2C warning and the subsequent reset of C1. The chip resides on the controller card and is monitored over the I2C bus. I believe the I2C bus is looped across the array, monitoring various components, and has its main control circuit on the midplane.

As mentioned in the log, there are two VSC chips. I guess that in the case of the VA7100 the host-port VSC7130 is the one involved, so the following errors should belong to the host-port VSC7130:
7130 Update Error: 0x02, 0x01
7130 Update Error: 0x02, 0x02
You can send these errors to storage engineering to confirm this. The "armdiag" output would be useful too in most cases.
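
To pull these entries out of a saved armlog before forwarding them, something like the following Python sketch might help. It assumes the log has been saved as plain text and that every entry looks exactly like the two lines quoted above. The function name and the sample text are made up for illustration, and the meaning of the two hex fields is not decoded here.

```python
import re

# Matches lines such as "7130 Update Error: 0x02, 0x01".
# Assumption: both fields are always printed as hex with a 0x prefix.
UPDATE_ERROR = re.compile(
    r"7130 Update Error:\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)"
)


def find_7130_update_errors(log_text):
    """Return a list of (first, second) integer pairs, one per matching entry."""
    return [
        (int(first, 16), int(second, 16))
        for first, second in UPDATE_ERROR.findall(log_text)
    ]


if __name__ == "__main__":
    # Hypothetical sample: the two entries quoted in this thread.
    sample = (
        "7130 Update Error: 0x02, 0x01\n"
        "7130 Update Error: 0x02, 0x02\n"
    )
    for first, second in find_7130_update_errors(sample):
        print(f"7130 Update Error: 0x{first:02x}, 0x{second:02x}")
```

The pairs come back in the order they appear in the log, so repeated entries are easy to count or compare between the two failures.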

As for the power module you asked about, its status can be assessed (see the TS Guide) using the two LEDs on it.
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=33737&prodTypeId=12169&prodSeriesId=33737&objectID=lpg60204

If the power module looks OK, then I would suspect something is wrong at the host port. It may be a faulty or misbehaving FC transceiver. The GBIC is the first suspect, followed by the HBA on the host side.
Change the GBIC first, as it is the most probable cause and easily replaceable. As I said before, you can run STM logtool and diagnostics on the FC HBA to track any errors on the host side.
a_79
Advisor

Re: VA7100 controller fault

Hello all,

I have replaced the bad controller 2, both power supplies, and the midplane.
But after two days controller 2 broke down again, and the System Fault lamp lit up.

It is horrible!!

I have also collected some new logs; see the attachments.

Another question: the command ioscan -fnCdisk cannot find all paths to the LUNs in the VA7100.
When I replaced the controller, I had to reboot the HP9000 to force the FC HBAs to find all paths to the LUNs.
For example:

ioscan -fnCdisk | more
Class  I   H/W Path             Driver  S/W State  H/W Type  Description
==========================================================================
disk   0   0/0/1/1.2.0          sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk   1   0/0/2/1.2.0          sdisk   CLAIMED    DEVICE    HP DVD-ROM 305
           /dev/dsk/c3t2d0   /dev/rdsk/c3t2d0
           /dev/dsk/disk_query
disk   7   0/4/0/0.8.0.1.0.0.0  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d0  /dev/rdsk/c10t0d0
disk   8   0/4/0/0.8.0.1.0.0.1  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d1  /dev/rdsk/c10t0d1
disk   13  0/4/0/0.8.0.1.0.0.2  sdisk   NO_HW      DEVICE    HP A6188A
           /dev/dsk/c10t0d2  /dev/rdsk/c10t0d2
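
The NO_HW entries in a listing like this one can be picked out with a short script. This is only a sketch in Python, assuming the whitespace-separated column order shown in the listing (Class, I, H/W Path, Driver, S/W State, ...). The function name is made up for illustration, and the script only reports the stale paths; it does not rescan or repair anything.

```python
def find_no_hw_paths(ioscan_text):
    """Return a list of (hw_path, device_files) for disk entries in state NO_HW."""
    results = []
    current = None  # device-file list of the NO_HW entry being read, if any
    for line in ioscan_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "disk" and len(fields) >= 5:
            # Columns: Class, I, H/W Path, Driver, S/W State, ...
            current = None
            if fields[4] == "NO_HW":
                current = []
                results.append((fields[2], current))
        elif fields[0].startswith("/dev/") and current is not None:
            # Device special files listed under the preceding NO_HW entry.
            current.extend(f for f in fields if f.startswith("/dev/"))
        # Anything else (header, separator, wrapped description) is ignored.
    return results


if __name__ == "__main__":
    # Sample copied from the listing in this post (two entries only).
    sample = (
        "disk 0 0/0/1/1.2.0 sdisk CLAIMED DEVICE SEAGATE ST318404LC\n"
        "/dev/dsk/c1t2d0 /dev/rdsk/c1t2d0\n"
        "disk 7 0/4/0/0.8.0.1.0.0.0 sdisk NO_HW DEVICE HP A6188A\n"
        "/dev/dsk/c10t0d0 /dev/rdsk/c10t0d0\n"
    )
    for hw_path, device_files in find_no_hw_paths(sample):
        print(hw_path, " ".join(device_files))
```

Saving the ioscan output to a file and feeding its contents to the function gives a quick list of which hardware paths and device files disappeared after the controller swap.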



Someone told me to use the fcmsutil command so that I do not have to reboot the HP9000.
How do I use fcmsutil?
Torsten.
Acclaimed Contributor

Re: VA7100 controller fault

The first thing I would do is upgrade from HP19 to HP22, including the commandview upgrade. If in any doubt, call HP (also called CE-assist for you) ;-)

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Mohanasundaram_1
Honored Contributor

Re: VA7100 controller fault

Hi,

I stand by what I said earlier. If it is an abterm, you are bound to have serious issues in the array. Please contact HP support immediately to prevent any data loss.

abterm = abnormal termination of the controller. It indicates that the controller was unable to determine the course of action for a particular event occurrence. This is a serious event that needs immediate attention.

With regards,
Mohan.
Attitude, Not aptitude, determines your altitude