Operating System - OpenVMS

EXTENDED SENSE DATA RECEIVED

 
SOLVED
Joewee
Regular Advisor

EXTENDED SENSE DATA RECEIVED

Hi All,


Please find the error below. I know this error is normal and can be ignored under normal working conditions, but my concern is that it is occurring very often: in the past few hours we have received the same error 13 times. Kindly suggest what should be done about this.



******************************* ENTRY 1461. *******************************
ERROR SEQUENCE 11624. LOGGED ON: CPU_TYPE 00000007
DATE/TIME 22-FEB-2009 00:45:52.57 SYS_TYPE 0000000C
SYSTEM UPTIME: 4 DAYS 20:12:26
SCS NODE: XXXX OpenVMS AXP V7.1-1H1

HW_MODEL: 00000621 Hardware Model = 1569.

DEVICE ERROR AlphaServer 8400 5/440

GENERIC DK SUB-SYSTEM, UNIT _$1$DKG0:
DEC HSZ70


HW REVISION 5A373756
HW REVISION = V77Z
ERROR TYPE 05
EXTENDED SENSE DATA RECEIVED
SCSI ID 00
SCSI ID = 0.
SCSI LUN 00
SCSI LUN = 0.
SCSI SUBLUN 00
SCSI SUBLUN = 0.
PORT STATUS 00000001
%SYSTEM-S-NORMAL, NORMAL SUCCESSFUL
COMPLETION
SCSI CMD C800002A
000062FA
0003
WRITE EXTENDED
SCSI STATUS 00
GOOD

EXTENDED SENSE DATA

EXTENDED SENSE 00060070
98000000
00000000
000011A1
00000000
00000000
00000000
00000000
03FD0F01
00000041
00000000
00000000
00000000
20200000
20202020
3038475A
33373035
37563038
00005A37
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
UNIT ATTENTION
SENSE CODE = A1(X)
UCB$L_ERTCNT 00000010
16. RETRIES REMAINING
UCB$L_ERTMAX 00000010
16. RETRIES ALLOWABLE
ORB$L_OWNER 00010004
OWNER UIC [001,004]
UCB$L_CHAR 1C4D4008
DIRECTORY STRUCTURED
FILE ORIENTED
SHARABLE
AVAILABLE
MOUNTED
ERROR LOGGING
CAPABLE OF INPUT
CAPABLE OF OUTPUT
RANDOM ACCESS
UCB$L_STS 08021810
ONLINE
SOFTWARE VALID
UNLOAD AT DISMOUNT
UCB$L_OPCNT 0026964D
2528845. QIO'S THIS UNIT
UCB$L_ERRCNT 0000000E
14. ERRORS THIS UNIT
IRP$L_BCNT 0000042A
TRANSFER SIZE 1066. BYTE(S)
IRP$L_BOFF 00000000
TRANSFER PAGE ALIGNED
IRP$L_PID 000100AA
REQUESTOR "PID"
IRP$Q_IOSB 00000000
00000000 IOSB, 0. BYTE(S) TRANSFERRED



Thanks,
Joe


17 REPLIES 17
Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

Added to that...

MRLB_SYSTEM> sho dev dkg

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$1$DKG0:      (XXXX)    Mounted             16  DATA01         3589737   123   1
$1$DKG1:      (XXXX)    Mounted             15  DATA02        12659713     9   1
$1$DKG2:      (XXXX)    Mounted              6  DATA03        13546263     3   1
$1$DKG100:    (XXXX)    Mounted              6  DATA04        14665543     1   1
$1$DKG101:    (XXXX)    Mounted              6  DATA05        16570172     1   1
$1$DKG102:    (XXXX)    Mounted              6  DATA06        16713550     1   1
MRLB_SYSTEM>


All these disks are connected to one redundant controller pair of type HSZ70. Could this be a problem on the controller side? All of the above errors were logged within a span of a few hours.


Jur van der Burg
Respected Contributor

Re: EXTENDED SENSE DATA RECEIVED

Use DECevent to decode the error. It's a unit attention with ASC A1 / ASCQ 11, which comes from the controller. Check the other disks as well; it could be one bad disk hanging up the bus.

Jur.
Wim Van den Wyngaert
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

Run $ anal/disk/read xxx for the device.

If parity errors are shown, you have bad blocks on the disk. You can mark the bad file with "set file/nomove" and replace it with a good copy from another system.

If no parity errors are shown, the disk probably recovered well.

New disks sometimes give these errors for a few days and then stop. You can also encounter the error when previously free space is allocated for the first time.

Advice: always protect your data with shadowing or RAID.

Wim
marsh_1
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

hi,

If you can get onto the storage controllers, try 'run fmu' and then a 'show last most' command to display the most recent error.

Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

Hi Mark,

Please find the FMU output below. There was nothing recent in it.


HSZ020> run fmu

Fault Management Utility

FMU> sho last most

Last Failure Entry: 4. Flags: 000FF981
Template: 1.(01) Description: Last Failure Event
Occurred on 01-JUL-2008 at 09:29:03
Power On Time: 10. Years, 86. Days, 9. Hours, 22. Minutes, 2. Seconds
Controller Model: HSZ70
Serial Number: ZG74904396 Hardware Version: H01(47)
Firmware Version: V77Z(00)
Informational Report
Instance Code: 01010302 Description:
An unrecoverable hardware detected fault occurred.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 1.(01)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Last Failure Code: 018700A0 (No Last Failure Parameters)
Last Failure Code: 018700A0 Description:
A processor interrupt was generated with an indication that the (//) RESET
button on the controller module was depressed.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 135.(87)
Restart Type: 2.(02) Description: Automatic hardware restart


HSZ019> run fmu

Fault Management Utility

FMU> sho last most

Last Failure Entry: 2. Flags: 000FF981
Template: 1.(01) Description: Last Failure Event
Occurred on 01-JUL-2008 at 09:28:52
Power On Time: 10. Years, 86. Days, 12. Hours, 7. Minutes, 15. Seconds
Controller Model: HSZ70
Serial Number: ZG80507380 Hardware Version: H01(47)
Firmware Version: V77Z(00)
Informational Report
Instance Code: 01010302 Description:
An unrecoverable hardware detected fault occurred.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 1.(01)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Last Failure Code: 018700A0 (No Last Failure Parameters)
Last Failure Code: 018700A0 Description:
A processor interrupt was generated with an indication that the (//) RESET
button on the controller module was depressed.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 135.(87)
Restart Type: 2.(02) Description: Automatic hardware restart


Hi Wim,

Anal/disk/read is in progress.


Hi Jur,

I'm not sure how to decode the error with DECevent.

Can you help me with that, please?




Thanks for all your replies.


Regards,
Joe
Volker Halle
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

Joe,

DECevent is an HP analysis tool and can be downloaded and installed on your OpenVMS system.

http://h18023.www1.hp.com/support/svctools/decevent/index.html

Volker.
Volker Halle
Honored Contributor
Solution

Re: EXTENDED SENSE DATA RECEIVED

Joe,

note that you can also install DECevent on a Windows System and run the OpenVMS ERRLOG.SYS analysis under Windows.

See the DECevent home page...

Volker.
Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

Volker,

Many thanks. I will check it now.

Will keep you all updated.

thanks
Joe
Wim Van den Wyngaert
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

fmu show last all full could show even more info.

Wim
Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

Wim,

Thanks.

anal/disk/read gave only warnings:


SYSTEM> anal/disk/read $1$DKG0:
Analyze/Disk_Structure for _$1$DKG0: started on 23-FEB-2009 08:42:45.32

%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-W-OPENFILE, file (5739,34,1)
error opening file for read check
-SYSTEM-W-ACCONFLICT, file access conflict
%ANALDISK-W-FUTCREDAT, file (234230,1,1) [ORACLE7_MRLB1.DB_MRLB1]SYSTEM_STARTUP_SQLNETV2.LOG;8705
creation date is in the future
%ANALDISK-W-FREESPADRIFT, free block count of 3589329 is incorrect (RVN 1);
the correct value is 3589261


Any suggestions on this?


Joe.
Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

And I noticed one more thing now.

FMU> exit

FMU -- Normal termination, status: 1.
Power Supply failure detected.
Fan failure cleared.

HSZ019>


I cleared this message through

> clear cli

but it popped up again.

Could this be the cause of the problem?
marsh_1
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

hi,

Try clear cli again. If the message pops up again, can you run fmu again and do show parameters?

Richard Brodie_1
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

ASC A1 / ASCQ 11 is indeed "fan fault is fixed". I would pull the other ASC/ASCQ pairs, and see what else shows up. Even without DECevent, they are easy enough to read off.
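
Pulling the pairs out of a long report by hand gets tedious, so here is a rough Python sketch of the idea. It assumes the report text looks like the entry pasted at the top of this thread (an "EXTENDED SENSE" line followed by one hex longword per line) and that the ASC and ASCQ are the two low-order bytes of the fourth longword. The function name sense_pairs is made up for illustration; it is not part of any OpenVMS or DECevent tool.

```python
import re
from collections import Counter

HEX8 = re.compile(r"^[0-9A-Fa-f]{8}$")

def sense_pairs(report_text):
    """Return a list of (ASC, ASCQ) tuples, one per EXTENDED SENSE block.

    Assumption: the report layout matches the entry pasted in this thread.
    """
    pairs = []
    words = None  # longwords collected for the block being read, if any
    for line in report_text.splitlines():
        tokens = line.split()
        if (len(tokens) == 3 and tokens[:2] == ["EXTENDED", "SENSE"]
                and HEX8.match(tokens[2])):
            words = [tokens[2]]          # first longword of a new block
        elif words is not None and len(tokens) == 1 and HEX8.match(tokens[0]):
            words.append(tokens[0])
            if len(words) == 4:
                fourth = int(words[3], 16)
                # low-order byte is the ASC, the next byte is the ASCQ
                pairs.append((fourth & 0xFF, (fourth >> 8) & 0xFF))
                words = None             # ignore the rest of this block
        else:
            words = None
    return pairs

# The first longwords of the entry from the original post, repeated twice
# to stand in for a report with two entries.
sample = """\
EXTENDED SENSE 00060070
98000000
00000000
000011A1
00000000
"""
counts = Counter(sense_pairs(sample * 2))
for (asc, ascq), n in counts.most_common():
    print("ASC %02X / ASCQ %02X: %d" % (asc, ascq, n))
```

Any pair other than A1/11 that turns up in the tally is the one worth looking up.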

Volker Halle
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

Joe,

in a previous entry I wrote:

'note that you can also install DECevent on a Windows System and run the OpenVMS ERRLOG.SYS analysis under Windows.'

I was apparently thinking of WEBES SEA (System Event Analyzer), where you can do exactly this.

After scanning the DECevent NT user guide, I'm not sure that you can actually analyze OpenVMS Alpha ERRLOG.SYS files with this tool under Windows.

Volker.
Joewee
Regular Advisor

Re: EXTENDED SENSE DATA RECEIVED

Mark,

I cleared it again and it hasn't come back. I have been monitoring for quite some time now and it hasn't popped up again.

Richard,

I'm not sure what "ASC/ASCQ" is. Do I have to do anything physically for this?

Volker,

Thanks. I tried to install it on Windows, but it said a file was missing, and I don't have many rights on this PC, so I left it there.


All,

There are now no more errors on any of the disks. Still, out of curiosity, I would like to know what made the disks log so many errors in such a short span of time, and what can be done to prevent it in the future, if possible.

Thanks for all your replies.


Joe.
Richard Brodie_1
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

ASC Additional sense code
ASCQ Additional sense code qualifier

This is SCSI standards speak for error code/subcode. Jur pulled these out using DECevent but your original report is smart enough to decode the ASC at least as:

SENSE CODE = A1(X)

You can find the ASC/ASCQ values in the raw data: they are bytes 12 and 13 (counting from zero) of the data block returned, i.e. the two low-order bytes of the fourth longword.

EXTENDED SENSE 00060070
               98000000
               00000000
               000011A1
                   ^ ^
                ASCQ ASC

If you see other codes you can look them up in the service guide: http://h18004.www1.hp.com/products/storageworks/techdoc/controllers/EK-HSZ70-SV-B01.html
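
For anyone who prefers to see the byte positions spelled out, here is a minimal Python sketch. It assumes the report prints the sense buffer as 32-bit longwords with the first buffer byte in the low-order position (little-endian), and that the data is fixed-format SCSI sense data, where the sense key is the low nibble of byte 2 and the ASC and ASCQ are bytes 12 and 13. The function name decode_sense is only illustrative.

```python
import struct

def decode_sense(longwords):
    """Decode a few fields from sense data given as hex longword strings.

    Assumptions: longwords are listed in dump order, each printed with the
    lowest-addressed byte on the right, and the buffer is fixed-format
    sense data.
    """
    # Rebuild the byte stream: each longword contributes 4 bytes, low byte first.
    raw = b"".join(struct.pack("<I", int(lw, 16)) for lw in longwords)
    return {
        "response_code": raw[0] & 0x7F,  # 0x70 for current, fixed-format data
        "sense_key": raw[2] & 0x0F,      # 6 means UNIT ATTENTION
        "asc": raw[12],                  # additional sense code
        "ascq": raw[13],                 # additional sense code qualifier
    }

# First four longwords of the entry in the original post.
sense = decode_sense(["00060070", "98000000", "00000000", "000011A1"])
print("sense key %X, ASC %02X, ASCQ %02X"
      % (sense["sense_key"], sense["asc"], sense["ascq"]))
# prints: sense key 6, ASC A1, ASCQ 11
```

The sense key of 6 agrees with the UNIT ATTENTION line in the report, and A1/11 is the pair discussed above.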
Wim Van den Wyngaert
Honored Contributor

Re: EXTENDED SENSE DATA RECEIVED

My guess is that the first write gave "extended sense", so a rewrite was done and this worked. Did the disk move the block to the bad block list, or did the rewrite simply work better than the first write?

My experience is that often a whole range of bad blocks is found, and as long as blocks are allocated from this range, you have the problem.

fwiw

Wim