HPE 9000 and HPE e3000 Servers

Continous scsi reset in L 2000 with sc10

 
SOLVED
Binukuttan VM_1
Frequent Advisor

Continous scsi reset in L 2000 with sc10

Hi all,

My server is getting intermittent SCSI resets and reboots. I ran STM, but it did not isolate any problem.
The log shows messages like this:
SCSI: First party detected bus hang -- lbolt: 82020, bus: 4
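
For anyone triaging a similar log, here is a minimal sketch (not an HP tool) that counts these bus hang messages per bus, which may help show whether the resets are confined to one SCSI bus. The function name `count_bus_hangs` and the filler line in the sample are assumptions for illustration; only the first sample line comes from this thread.

```python
import re
from collections import Counter

# Matches the kernel message quoted in this thread, capturing lbolt and bus.
BUS_HANG = re.compile(
    r"SCSI: First party detected bus hang -- lbolt: (\d+), bus: (\d+)"
)


def count_bus_hangs(lines):
    """Return a Counter mapping SCSI bus number -> number of bus hang messages."""
    counts = Counter()
    for line in lines:
        match = BUS_HANG.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# The first line is from the original post; the second is a placeholder
# standing in for any unrelated syslog line.
sample = [
    "SCSI: First party detected bus hang -- lbolt: 82020, bus: 4",
    "unrelated syslog line (placeholder)",
]

for bus, hits in sorted(count_bus_hangs(sample).items()):
    print(f"bus {bus}: {hits} bus hang message(s)")
```

To use it on a real system, pass it the lines of the syslog file (on HP-UX this is normally /var/adm/syslog/syslog.log) instead of the sample list.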

I am attaching the syslog. If anybody has any thoughts, please advise.
Regards,

binu
7 REPLIES
Robert-Jan Goossens
Honored Contributor

Re: Continous scsi reset in L 2000 with sc10

Try running the command below.

/opt/resmon/bin/resdata -R 89849863 -r /system/events/memory/8 -n 89849857 -a

Regards,
Robert-Jan
Ranjith_5
Honored Contributor

Re: Continous scsi reset in L 2000 with sc10

Hi Binukkutta,

The log file you attached contains the following suggestion:

Feb 23 13:34:12 rcftrl1 EMS [1371]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/system/events/memory/8" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 89849863 -r /system/events/memory/8 -n 89849857 -a


So run that command to get the EMS event details. Those details should help you find out exactly what is causing the problem.
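
If the syslog contains many such notifications, a small script can collect the suggested commands for you. This is only a sketch: the helper name `extract_resdata_commands` is made up for illustration, and the pattern assumes every notification uses the same `resdata -R ... -r ... -n ... -a` shape as the line quoted above.

```python
import re

# Captures the full resdata command suggested inside an EMS notification line.
RESDATA_CMD = re.compile(
    r"(/opt/resmon/bin/resdata\s+-R\s+\d+\s+-r\s+\S+\s+-n\s+\d+\s+-a)"
)


def extract_resdata_commands(lines):
    """Return the distinct resdata commands found, in order of first appearance."""
    commands = []
    for line in lines:
        for cmd in RESDATA_CMD.findall(line):
            if cmd not in commands:
                commands.append(cmd)
    return commands


# The EMS line below is the one quoted from the attached syslog.
ems_line = (
    'Feb 23 13:34:12 rcftrl1 EMS [1371]: ------ EMS Event Notification ------ '
    'Value: "MAJORWARNING (3)" for Resource: "/system/events/memory/8" '
    '(Threshold: >= " 3") Execute the following command to obtain event details: '
    '/opt/resmon/bin/resdata -R 89849863 -r /system/events/memory/8 -n 89849857 -a'
)

for cmd in extract_resdata_commands([ems_line]):
    print(cmd)
```

The script only prints the commands; they still have to be run on the server itself to see the event details.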

Regards,
Syam
Andrew Fong
Advisor

Re: Continous scsi reset in L 2000 with sc10

I experienced the same problem just this Tuesday. We lost the connection to all the disks on one SC10, but luckily all disks are mirrored to another SC10, so the system kept working for the whole day until HP showed up during the maintenance window. The cause was a faulty Bus Controller Card (BCC). The HP CE replaced it, and everything has been back to normal since. During the reboot, it looks like vgsync was run, and all the problem LVs with stale extents returned to current.
How are you ?
mits
Respected Contributor

Re: Continous scsi reset in L 2000 with sc10

Hi,

It was good to hear HP resolved your SCSI problem. But I have a small concern about your system. As the others pointed out, your system logged an EMS memory monitor event.

That is not a SCSI or disk error event; it is a memory error event. It may be a single-bit memory error, but I would suggest you check the error details anyway, using the command /opt/resmon/bin/resdata -R 89849863 -r /system/events/memory/8 -n 89849857 -a

Here is the description of /system/events/memory:

*** MONITOR /system/events/memory:
System Memory Monitor

This resource monitors events for system memory. Event monitoring
requests are created using the Monitoring Request Manager. Monitoring
requests to detect changes in device status are created using the
Peripheral Status Monitor (psmmon(1m)) and Event Monitoring Service (EMS).

For more information see the monitor man page, (dm_memory(1m)).

I hope this helps to prevent future hardware problems.
Binukuttan VM_1
Frequent Advisor

Re: Continous scsi reset in L 2000 with sc10

Hi all,

So I have to suspect the server memory. The EMS monitoring log says it is a single-bit memory error.

But please tell me: when I ran STM, why didn't it log any errors or show any HPMC error?

Can I suspect any disks in the SC10, or the SC10 backplane?
Awaiting your valuable reply.

Thanks in advance and regards,
Binu
Ranjith_5
Honored Contributor
Solution

Re: Continous scsi reset in L 2000 with sc10

Hi Binukkutta,

The lbolt error on the SC10 is a known problem. It normally happens when there is a disk firmware mismatch. If any disks in this JBOD were replaced recently, the problem may have started then, because the replacement drive has a different firmware version.

To find the firmware version of a hard disk, I use:

mstm --> Select the disk drive --> ( On the menu bar go to ) Tools --> Firmware Update --> Info.


This displays the firmware version of the disk. Make sure that all the disks have the same firmware version. If not, you need to update the firmware so that it matches the other disks.
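
Once the firmware versions have been noted down from mstm for each disk, the comparison can be sketched as below. The disk names and firmware strings are hypothetical placeholders, not values from this system, and `find_firmware_mismatches` is an illustrative helper, not part of STM.

```python
from collections import Counter


def find_firmware_mismatches(firmware_by_disk):
    """Return (most common firmware, {disk: firmware} for disks that differ)."""
    if not firmware_by_disk:
        return None, {}
    counts = Counter(firmware_by_disk.values())
    majority, _ = counts.most_common(1)[0]
    odd_ones = {
        disk: fw for disk, fw in firmware_by_disk.items() if fw != majority
    }
    return majority, odd_ones


# Hypothetical values copied by hand from the mstm Info screens.
firmware_by_disk = {
    "disk_slot_1": "FW_A",
    "disk_slot_2": "FW_A",
    "disk_slot_3": "FW_B",  # e.g. a recently replaced drive
}

majority, odd_ones = find_firmware_mismatches(firmware_by_disk)
print(f"most common firmware: {majority}")
for disk, fw in odd_ones.items():
    print(f"{disk} differs: {fw}")
```

This only flags the odd disk out; it says nothing about which firmware version is the right one.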

But note that this is a very risky process. You can even lose all your data if it is not done properly. The best method is to get a similar disk with the same firmware version and swap it in; otherwise, you can ask HOTRC to do this for you. Make sure that all the disks have the same part number too. There are a lot of things that need to be considered before updating disk firmware. As mentioned, consult HOTRC/IRC.

Before doing any of this, take two sets of full backups, including Ignite, and verify the backups.

Reply with your observations.

Regards,
Syam
mits
Respected Contributor

Re: Continous scsi reset in L 2000 with sc10

Hi,

A single-bit memory error is not related to the SCSI error. It is a separate problem. The system should be able to recover the memory data, since this is a correctable error. I am not sure which log you checked with STM. Do you mean there was no memory error entry in the memory error log or the PDT? And there should be no HPMC, since a single-bit memory error is an LPMC. You can follow the EMS message on how to deal with your memory error. If you have any concerns about it, you can contact the HP support center.
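
To illustrate why a single-bit error is correctable, here is a toy Hamming(7,4) example. It is an assumption-laden sketch: it is not the ECC scheme used by this server's memory, just the textbook code that shows how parity bits let one flipped bit be located and repaired.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d0, d1, d2, d3 = data_bits
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = d0, d1, d2, d3
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    return code[1:]


def hamming74_correct(codeword):
    """Return (data bits, error position); position 0 means no error found."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # flip the bit the syndrome points at
    return [code[3], code[5], code[6], code[7]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1  # simulate a single-bit error at position 5 (list index 4)
recovered, position = hamming74_correct(stored)
print(recovered == data, position)
```

The data comes back intact and the error is only logged, which is why such an event shows up as a warning rather than crashing the system.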