Operating System - Tru64 Unix

QBB0: GP HSlink overflow error of SoftQbb0 (HardQbb0)

 
Vicente_18
Occasional Contributor

Hi all,

We have two AlphaServer GS160 systems. When the backup runs (12 vdumps in parallel), the system crashes, and the binary error log reports the following event:

Description: Console Data Log Event at Thu 30 Mar 2006 04:25:51 GMT+02:00
File: /var/adm/binary.errlog
================================================================================

COMMON EVENT HEADER (CEH) V2.0
Event_Leader xFFFF FFFE
Header_Length 260
Event_Length 752
Header_Rev_Major 2
Header_Rev_Minor 0
OS_Type 1 -- Tru64 UNIX
Hardware_Arch 4 -- Alpha
CEH_Vendor_ID 3,564 -- Hewlett-Packard Company
Hdwr_Sys_Type 35 -- GS40/80/160/320 Series
Logging_CPU 0 -- CPU Logging this Event
CPUs_In_Active_Set 1
Major_Class 113
Minor_Class 0
Entry_Type 113 -- Console Data Log Event
DSR_Msg_Num 1,968 -- AlphaServer GS160
Chip_Type 11 -- EV67 - 21264A
CEH_Device 255
CEH_Device_ID_0 x0000 03FF
CEH_Device_ID_1 x0000 0007
CEH_Device_ID_2 x0000 0007
Unique_ID_Count 0
Unique_ID_Prefix 15,000
Num_Strings 5

TLV Section of CEH
TLV_DSR_String AlphaServer GS160 6/731
TLV_OS_Version Compaq Tru64 UNIX V5.1B (Rev. 2650)
TLV_Sys_Serial_Num AY11204820
TLV_Time_as_Local Thu 30 Mar 2006 04:25:51 GMT+02:00
TLV_Computer_Name su010082
Entry_Type 113

Console_Data_log

START OF SUBPACKETS IN THIS EVENT

Halt Frame Header Subpacket - V1.0
Time_Stamp x0000 3603 1E02 0B25 Time Stamp
Seconds[7:0] 37 Seconds
Minutes[15:8] 11 Minutes
Hours[23:16] 2 Hours Unix = GMT Ovms = Local
Day[31:24] 30 Day
Month[39:32] 3 March
Year[47:40] 54 2006

System Machine Check Error Frame Subpacket - Version 1
whami 0 CPU Reporting Error
frame_size x0000 00E8
frame_flags x0000 0000
processor_offset x0000 0018
system_offset x0000 00A0
mchk_code x0000 0200
ev6_mchk_code[31:0] x200 660 - System Fault
frame_revision x0000 0001 GS80-160-320 BitToText Revision=2106.2002.01
i_stat x0000 0000 0000 0000 IBox Status Register
dc_stat x0000 0000 0000 0000 Dcache Status Register
c_addr x0000 0000 0000 0000 Cbox read register field
error_address[42:6] x0 Error Address of last reported ECC or Parity error
c_syndrome_1 x0000 0000 0000 0000 CBox Syndrome 1
upper_qw_syndrome[7:0]x0 Syndrome for Upper Quadword
c_syndrome_0 x0000 0000 0000 0000 Cbox Syndrome 0
lower_qw_syndrome[7:0]x0 Syndrome for Lower Quadword
c_stat x0000 0000 0000 0000 CBox Read C_STAT
c_sts x0000 0000 0000 0000 CBox Read Register C_STS
block_status[3:0] x0 Shared
mm_stat x0000 0000 0000 0000 Memory Management Status Register
opcode[9:4] x0 Opcode of the Instruction that Caused the Error
exc_addr x0000 0000 0000 0000 Exception Address Register
pc[63:2] x0 Exception Address
ier_cm x0000 0000 0000 0000 Interrupt Enable and Current Processor Mode Register
cm[4:3] x0 Kernel
asten[13] x0 AST Interrupt Enable
sien[28:14] x0 Software Interrupt Enables
pcen[30:29] x0 Performance Counter Interrupt Enables
eien[38:33] x0 External Interrupt Enable
isum x0000 0000 0000 0000 Interrupt Summary Register
astk[3] x0
aste[4] x0
asts[9] x0
astu[10] x0
si[28:14] x0
pc[30:29] x0
cr[31] x0
sl[32] x0
ei[38:33] x0
pal_base x0000 0000 0000 0000 Pal Base Register
pal_base[43:15] x0 Base Physical Address for PALcode
i_ctl x0000 0000 0000 0000 Ibox Control Register
ic_en[2:1] x0
spe[5:3] x0
sde[7:6] x0
sbe[9:8] x0
bp_mode[11:10] x0
hwe[12] x0
sl_xmit[13] x0
sl_rcv[14] x0
va_48[15] x0
va_form_32[16] x0
single_issue_h[17] x0
pct0_en[18] x0
pct1_en[19] x0
call_pal_r23[20] x0
mchk_en[21] x0
tb_mb_en[22] x0
bist_fail[23] x0
chip_id[29:24] x0 ChipId = EV6 PASS 1
vptp[47:30] x0
sext[63:48] x0
process_context x0000 0000 0000 0000 Process Context Register
ppce[1] x0 Process Performance Counting Enable
fpe[2] x0 Floating Point Enable
aster[8:5] x0 AST Enable
astrr[12:9] x0 AST Request
asn[46:39] x0 Address Space Number
uncorr_cpu_error_sum x0000 0000 0000 0001 Uncorrectable Error or Fault Summary
QBB0[0] x1 QBB0 uncorrectable Error or Fault
QBB0_csrs_to_be_logged x0000 0000 0001 0000 Registers logged for QBB0:
global_port[16] x1 Global Port
QBB1_csrs_to_be_logged x0000 0000 0000 0000
QBB2_csrs_to_be_logged x0000 0000 0000 0000
QBB3_csrs_to_be_logged x0000 0000 0000 0000
QBB4_csrs_to_be_logged x0000 0000 0000 0000
QBB5_csrs_to_be_logged x0000 0000 0000 0000
QBB6_csrs_to_be_logged x0000 0000 0000 0000
QBB7_csrs_to_be_logged x0000 0000 0000 0000

System Error Frame Header Subpacket - V1.0

Global Port Error Frame Subpacket - Version 2
base_physical_address x0000 0FFF FFC0 0000 Base physical addess
entity[22:18] x10 Global Port Module (GP)
qbb_id[41:36] x3F QBB0
GPA_GPL_ERR_SUM x0000 0000 0000 0000 GPA GPLink Error Summary Register
gpl_fail_gpd[0] x0 Failing GPD = GPD0
GPA_EXT_OVFL_ERR_SUM x0000 0000 0000 0004 GPA Extended Overflow Error Summary Register
Q1T_Ovfl_Err[2] x1 Q1 Table overflow error
GPA_HSL_ERR_SUM x0000 0000 0000 0000
hsl_fail_gpd[0] x0 Failing GPD = GPD0
GPA_GPL_ERR_ADDR0 x0000 0000 0000 0000 GPA GPLink Error Address Register 0
addr_13_6[7:0] x0 Address <13:6>
GPA_GPL_ERR_ADDR1 x0000 0000 0000 0000 GPA GPLink Error Address Register 1
addr_21_14[7:0] x0 Address <21:14>
GPA_GPL_ERR_ADDR2 x0000 0000 0000 0000 GPA GPLink Error Address Register 2
addr_29_22[7:0] x0 Address <29:22>
GPA_GPL_ERR_ADDR3 x0000 0000 0000 0000 GPA GPLink Error Address Register 3
addr_37_30[7:0] x0 Address <37:30>
GPA_GPL_ERR_ADDR4 x0000 0000 0000 0000 GPA GPLink Error Address Register 4
cmd[6:0] x0 Command=NOP (dirty cid is invalid)
addr_38[7] x0 Address <38>
GPA_GPL_ERR_ADDR5 x0000 0000 0000 0000 GPA GPLink Error Address Register 5
source_cid[5:0] x0 Source Commander ID = CPU0 QBB0
GPA_GPL_ERR_ADDR6 x0000 0000 0000 0000 GPA GPLink Error Address Register 6
dirty_cid[5:0] x0 Dirty Commander ID = CPU0 QBB0
GPA_HSL_ERR_ADDR0 x0000 0000 0000 0000 GPA HSLink Error Address Register 0
addr_13_6[7:0] x0 Address <13:6>
GPA_HSL_ERR_ADDR1 x0000 0000 0000 0000 GPA HSLink Error Address Register 1
addr_21_14[7:0] x0 Address <21:14>
GPA_HSL_ERR_ADDR2 x0000 0000 0000 0000 GPA HSLink Error Address Register 2
addr_29_22[7:0] x0 Address <29:22>
GPA_HSL_ERR_ADDR3 x0000 0000 0000 0000 GPA HSLink Error Address Register 3
addr_37_30[7:0] x0 Address <37:30>
GPA_HSL_ERR_ADDR4 x0000 0000 0000 0000 GPA HSLink Error Address Register 4
cmd[6:0] x0 Command = NOP (dirty cid is invalid)
address_38[7] x0
GPA_HSL_ERR_ADDR5 x0000 0000 0000 0000 GPA HSLink Error Address Register 5
source_cid[5:0] x0 Source Commander ID = CPU0 QBB0
GPA_HSL_ERR_ADDR6 x0000 0000 0000 0000 GPA HSLink Error Address Register 6
dirty_cid[5:0] x0 Dirty commander ID = CPU0 QBB0
GPD_GPL_ERR_SUM x0000 0000 0000 0000 GPD GPLink Error Summary Register
slice0_err_summ[31:0]x0 Slice 0 Error Summary
qw_in_err[6:4] x0 Slice 0 Quadword in Error
syndrome[15:8] x0 Slice 0 Syndrome
slice1_err_sum[63:32]x0 Slice 1 Error Summary
qw_in_err[38:36] x0 Slice 1 Quadword in Error
syndrome[47:40] x0 Slice 1 Syndrome
GPD_HSL_ERR_SUM x0000 0000 0000 0000 GPD HSLink Error Summary Register
slice0_err_sum[31:0]x0 Slice 0 Error Summary
qw_in_err[6:4] x0 Slice 0 Quadword in Error
syndrome[15:8] x0 Slice 0 Syndrome
slice1_err_sum[63:32]x0 Slice 1 Error Summary
qw_in_err[38:36] x0 Slice 1 Quadword in Error
syndrome[47:40] x0 Slice 1 Syndrome


----------------------------------------------
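
The decoded fields in the log above are plain bit slices of the raw register values, so they can be checked by hand. The following is a minimal Python sketch of that check, not part of any HP tool: the helper name `bits` is hypothetical, and the register values are copied from the log. The log shows year value 54 as 2006; the offset that implies (1952) is an inference from this one entry, so the sketch leaves the year undecoded.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value, as in a field label like Day[31:24]."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Raw values copied from the log above.
time_stamp = 0x0000_3603_1E02_0B25            # Halt Frame Header Time_Stamp
uncorr_cpu_error_sum = 0x0000_0000_0000_0001
qbb0_csrs_to_be_logged = 0x0000_0000_0001_0000
gpa_ext_ovfl_err_sum = 0x0000_0000_0000_0004

# Field positions taken from the labels in the log (Seconds[7:0], Minutes[15:8], ...).
stamp = {
    "seconds": bits(time_stamp, 7, 0),
    "minutes": bits(time_stamp, 15, 8),
    "hours": bits(time_stamp, 23, 16),
    "day": bits(time_stamp, 31, 24),
    "month": bits(time_stamp, 39, 32),
    "year_raw": bits(time_stamp, 47, 40),  # raw field only; the log renders it as 2006
}

flags = {
    "QBB0[0]": bits(uncorr_cpu_error_sum, 0, 0),
    "global_port[16]": bits(qbb0_csrs_to_be_logged, 16, 16),
    "Q1T_Ovfl_Err[2]": bits(gpa_ext_ovfl_err_sum, 2, 2),
}

print(stamp)
print(flags)
```

The only non-zero error bit in the Global Port subpacket is Q1T_Ovfl_Err; the GPLink and HSLink summary and address registers are all zero.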

The System Event Analyzer reports possible problems in the Global Port cables, the Global Port Module, or the Hierarchical Switch.
We have replaced all of that hardware, and the QBB as well, but the problem persists.
Both GS160s show the same error, so I think this is not a hardware fault. Any ideas?
Hein van den Heuvel
Honored Contributor

Re: QBB0: GP HSlink overflow error of SoftQbb0 (HardQbb0)

Welcome to the Tru64 forum.

I agree with your assessment that if two separate systems show very similar errors, the cause is more likely to be software than hardware. I suppose it is possible that both systems were built with incorrect configurations (PCI slot usage?), but I find that highly unlikely.

I hope you have already officially reported this event to HP Support?

This question is beyond the scope of this forum, but you may be lucky and get a 'seen this before' answer while support sorts it out.

If you have not reported the problem to HP yet, please do so now. You have recent software and serious hardware; you deserve the best support you can get, and you deserve it now. Do not just wait around here.

Regards,
Hein.


Ivan Ferreira
Honored Contributor

Re: QBB0: GP HSlink overflow error of SoftQbb0 (HardQbb0)

I would suggest installing WEBES. WEBES can translate the binary.errlog and generate very useful reports indicating which component has the problem.

For example:

wsea n analyze binary.errlog out report.txt
wsea n translate binary.errlog out translate.txt
Why do it the hard way, when you can do it the easy way?