HP-UX

Question about log analysis

 
최경민_1
Occasional Advisor


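This is an rx8640 running HP-UX 11.23, configured with vPars (2 partitions).

I was going through the logs and could not make sense of them, so I am asking the experts here for help.

Please advise on roughly what these events mean and what action I should take.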

--------------------------------------------------------

>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:54:31 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is INFORMATION(1).







Event data from monitor:



Event Time..........: Fri Jun 12 07:54:30 2009

Severity............: INFORMATION

Monitor.............: fpl_em

Event #.............: 267

System..............: kaka01



Summary:

INIT initiated





Description of Error:



This is the equivalent of a TOC event in the PA RISC Architecture. On IPF

systems, this event is called an INIT.

This event can be triggered by the "tc" command from the MP, or from the button

labeled "TOC" or "Transfer of Control" on the Management card or bezel of the

system. There are also other causes of an INIT generated by software.

Data: Local CPU Number



Probable Cause / Recommended Action:





Software has requested an INIT or the INIT button has been pressed.

No action is required.





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318b2700000000

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#267



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0xf680007913e00000 0x000000000000000b

Time Stamp: Thu Jun 11 22:05:29 2009

Event keyword: INIT_INITIATED

Alert level name: Fatal

Reporting vers: 1

Data field type: Implementation dependent data field

Decoded data field:

Reporting entity ID: 19 ( Cab 0 Cell 1 CPU 3 )

Reporting entity Full Name: System Firmware

IPMI Event ID : 121 (0x79)





>---------- End Event Monitoring Service Event Notification ----------<



>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:54:31 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is SERIOUS(4).







Event data from monitor:



Event Time..........: Fri Jun 12 07:54:31 2009

Severity............: SERIOUS

Monitor.............: fpl_em

Event #.............: 548

System..............: kaka01



Summary:

OS legacy PA hex fault code (Bxxx)





Description of Error:



OS legacy PA hex fault code (Bxxx). Possible I/O error or system panic



Probable Cause / Recommended Action:





fault/panic

-





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318b2700000003

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#548



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0xbf80033812e00000 0x000000000002b200

Time Stamp: Thu Jun 11 22:07:31 2009

Event keyword: HP-UX_HEX_FAULT_CODE

Alert level name: Critical

Reporting vers: 1

Data field type: Legacy 20-bit PA HEX chassis code

Decoded data field: Legacy PA HEX Chassis Code = 0x2b200

Reporting entity ID: 18 ( Cab 0 Cell 1 CPU 2 )

Reporting entity Full Name: HP-UX Kernel

IPMI Event ID : 824 (0x338)





>---------- End Event Monitoring Service Event Notification ----------<



>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:54:31 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is MAJORWARNING(3).







Event data from monitor:



Event Time..........: Fri Jun 12 07:54:31 2009

Severity............: MAJORWARNING

Monitor.............: fpl_em

Event #.............: 547

System..............: kaka01



Summary:

OS crashdump started (D700)





Description of Error:



OS crashdump started (D700)



Probable Cause / Recommended Action:





panic occurred

-





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318b2700000006

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#547



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0x7f80033712e00000 0x00000000000ad700

Time Stamp: Thu Jun 11 22:07:33 2009

Event keyword: HP-UX_CRASHDUMP_STARTED

Alert level name: Warning

Reporting vers: 1

Data field type: Legacy 20-bit PA HEX chassis code

Decoded data field: Legacy PA HEX Chassis Code = 0xad700

Reporting entity ID: 18 ( Cab 0 Cell 1 CPU 2 )

Reporting entity Full Name: HP-UX Kernel

IPMI Event ID : 823 (0x337)





>---------- End Event Monitoring Service Event Notification ----------<



>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:54:31 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is INFORMATION(1).







Event data from monitor:



Event Time..........: Fri Jun 12 07:54:31 2009

Severity............: INFORMATION

Monitor.............: fpl_em

Event #.............: 549

System..............: kaka01



Summary:

OS dump status (EFxx)





Description of Error:



A memory dump occurred on the system, or on one partition in a cellular system.

The dump occurred due to either a non-recoverable error or by user request.

EFxx is the OS dump status (the success/failure of the writing of the dump).

EF00 = success (followed by either EF0A = successful dump with sync, or EF09 =

successful dump without sync), EFFF = a general error, EFFE = dump path

assertion failure, EFFD = no dump was taken by default, choice or failure, EFFC

= dump was aborted by user.



Probable Cause / Recommended Action:





panic, mca, or INIT path: attempt to write out the dump is complete

No action required





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318b2700000009

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#549



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0x7f80033912e00000 0x00000000000aef00

Time Stamp: Thu Jun 11 22:09:35 2009

Event keyword: HP-UX_DUMP_STATUS

Alert level name: Warning

Reporting vers: 1

Data field type: Legacy 20-bit PA HEX chassis code

Decoded data field: Legacy PA HEX Chassis Code = 0xaef00

Reporting entity ID: 18 ( Cab 0 Cell 1 CPU 2 )

Reporting entity Full Name: HP-UX Kernel

IPMI Event ID : 825 (0x339)





>---------- End Event Monitoring Service Event Notification ----------<



>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:54:31 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is MINORWARNING(2).







Event data from monitor:



Event Time..........: Fri Jun 12 07:54:31 2009

Severity............: MINORWARNING

Monitor.............: fpl_em

Event #.............: 349

System..............: kaka01



Summary:

Firmware is adding a DEGRADED cpu node to the device tree.





Description of Error:



Firmware is adding a device tree node for a CPU that is degraded in

functionality. The cpu should not be trusted and will not be active in the

system.



Probable Cause / Recommended Action:





A CPU that is not fully functional is installed in the cell board.

Replace cpu indicated in data field of event.





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318b270000000c

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#349



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0x6380014f10e00000 0x0000000000000004

Time Stamp: Thu Jun 11 22:09:56 2009

Event keyword: FW_INSTALLED_CPU_DEGRADED

Alert level name: Warning

Reporting vers: 1

Data field type: Actual data

Decoded data field:

Reporting entity ID: 16 ( Cab 0 Cell 1 CPU 0 )

Reporting entity Full Name: System Firmware

IPMI Event ID : 335 (0x14f)





>---------- End Event Monitoring Service Event Notification ----------<



>------------ Event Monitoring Service Event Notification ------------<



Notification Time: Fri Jun 12 07:59:32 2009



kaka01 sent Event Monitor notification information:



/system/events/ipmi_fpl/ipmi_fpl is >= 1.

Its current value is MAJORWARNING(3).







Event data from monitor:



Event Time..........: Fri Jun 12 07:59:32 2009

Severity............: MAJORWARNING

Monitor.............: fpl_em

Event #.............: 832

System..............: kaka01



Summary:

A dimm or CPU has been deconfigured or failed testing





Description of Error:



A dimm or CPU has failed and is not operational for the system. This event is

emitted prior to determining if the cell should be integrated into the

Partition.



Probable Cause / Recommended Action:





A deconfigured dimm or cpu has been detected. Examine earlier events to isolate

the problem.

-





Additional Event Data:

System IP Address...: 10.223.15.11

Event Id............: 0x4a318c5400000000

Monitor Version.....: A.01.00

Event Class.........: System

Client Configuration File...........:

/var/stm/config/tools/monitor/default_fpl_em.clcfg

Client Configuration File Version...: A.01.00

Qualification criteria met.

Number of events..: 1

Associated OS error log entry id(s):

None

Additional System Data:

System Model Number.............: ia64 hp server rx8640

EMS Version.....................: A.04.20

STM Version.....................: C.60.00

System Serial Number............: SGH4729NCY

Latest information on this event:

http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#832



v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v





IPMI event hex: 0x6b000efd10e00000 0x010000004a3180ef

Time Stamp: Thu Jun 11 22:10:55 2009

Event keyword: CELL_HW_DEGRADED

Alert level name: Warning

Reporting vers: 1

Data field type: Timestamp

Decoded data field: Thu Jun 11 22:10:55 2009

Reporting entity ID: 16 ( Cab 0 Cell 1 CPU 0 )

Reporting entity Full Name: System Firmware

IPMI Event ID : 3837 (0xefd)





>---------- End Event Monitoring Service Event Notification ----------<



3 Replies
한일형
Temporary Advisor


Hello.

First, please check whether the system has ever rebooted while writing a crash dump.
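The pasted log already hints at the answer: the HP-UX_DUMP_STATUS event carries the chassis code 0xaef00, and the event description lists what each EFxx value means. A minimal Python sketch of that lookup follows. The function name `decode_dump_status` is made up for illustration, and it assumes the EFxx status sits in the low 16 bits of the 20-bit code, which matches how the other events in this log line up (0x2b200 with Bxxx, 0xad700 with D700).

```python
# Meanings copied from the "OS dump status (EFxx)" event description in the log.
DUMP_STATUS = {
    0xEF00: "success",
    0xEF0A: "successful dump with sync",
    0xEF09: "successful dump without sync",
    0xEFFF: "general error",
    0xEFFE: "dump path assertion failure",
    0xEFFD: "no dump was taken by default, choice or failure",
    0xEFFC: "dump was aborted by user",
}


def decode_dump_status(chassis_code: int) -> str:
    """Return the meaning of an EFxx dump status carried in a 20-bit chassis code."""
    # Assumption: the low 16 bits hold the EFxx status; the top 4 bits are ignored.
    status = chassis_code & 0xFFFF
    if status >> 8 != 0xEF:
        raise ValueError(f"0x{status:04x} is not an EFxx dump status")
    return DUMP_STATUS.get(status, "unknown EFxx status")


# Value taken from "Legacy PA HEX Chassis Code = 0xaef00" in the log.
print(decode_dump_status(0xAEF00))  # prints "success"
```

If that reading is right, the firmware log is saying the dump was written successfully, so a dump should exist on disk.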



Look for the dump under /var/adm/crash, and if one was written, ask HP to analyze it together with the logs.
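As a rough way to do that check, the sketch below lists what is under the crash directory together with modification times, so an entry dated around Jun 11 22:07 can be matched against the log. It is only a sketch: `list_crash_dumps` is a made-up helper, it assumes each saved dump sits in its own `crash.N` subdirectory, and it assumes a Python interpreter is available on the host, which may not be true on an 11.23 system.

```python
import os
import time


def list_crash_dumps(root="/var/adm/crash"):
    """Return (directory name, last-modified time) pairs for saved dumps, oldest first."""
    dumps = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # Assumption: each saved dump lives in its own "crash.N" directory.
        if name.startswith("crash.") and os.path.isdir(path):
            dumps.append((name, os.path.getmtime(path)))
    return sorted(dumps, key=lambda item: item[1])


if __name__ == "__main__":
    if os.path.isdir("/var/adm/crash"):
        for name, mtime in list_crash_dumps():
            print(name, time.ctime(mtime))
    else:
        print("/var/adm/crash not found")
```

An empty result does not prove that no dump was taken; the dump may have been saved elsewhere or may still be sitting on the dump device.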
monoworld
Regular Advisor


Judging from the log, it looks like a CPU failure occurred and triggered a TOC.

A TOC occurs when a reboot is needed, for example after an OS panic, or when the TOC command is entered.
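One thing that makes the order hard to see: every notification has a Notification Time of Jun 12 07:54 to 07:59, while the "Time Stamp" lines in the DETAILS sections fall between 22:05 and 22:10 on Jun 11. Sorting on the DETAILS time stamp gives the sequence as the firmware logged it. The Python sketch below does this for pasted notification text. `event_timeline` is a hypothetical helper, and the date parsing assumes English day and month names.

```python
import re
from datetime import datetime

# Matches the "Time Stamp" and "Event keyword" lines of one DETAILS section,
# whether or not blank lines separate them.
DETAIL_RE = re.compile(
    r"Time Stamp:\s*(?P<stamp>[^\n]+?)\s*\n\s*Event keyword:\s*(?P<keyword>\S+)"
)


def event_timeline(text):
    """Return (datetime, keyword) pairs sorted by the IPMI time stamp."""
    events = []
    for match in DETAIL_RE.finditer(text):
        when = datetime.strptime(match.group("stamp"), "%a %b %d %H:%M:%S %Y")
        events.append((when, match.group("keyword")))
    return sorted(events)


# Three DETAILS excerpts from the log above, deliberately out of order.
sample = """\
Time Stamp: Thu Jun 11 22:09:56 2009
Event keyword: FW_INSTALLED_CPU_DEGRADED

Time Stamp: Thu Jun 11 22:05:29 2009
Event keyword: INIT_INITIATED

Time Stamp: Thu Jun 11 22:07:31 2009
Event keyword: HP-UX_HEX_FAULT_CODE
"""

for when, keyword in event_timeline(sample):
    print(when.strftime("%H:%M:%S"), keyword)
```

Run against the full paste, this orders the events as INIT_INITIATED (22:05:29), HP-UX_HEX_FAULT_CODE (22:07:31), HP-UX_CRASHDUMP_STARTED (22:07:33), HP-UX_DUMP_STATUS (22:09:35), FW_INSTALLED_CPU_DEGRADED (22:09:56) and CELL_HW_DEGRADED (22:10:55).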



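After the reboot, compare the previous CPU count with the current CPU count in the OS; I expect they will differ. If it turns out to be a hardware failure, a support engineer should be able to help you resolve it.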



You can check the CPU count with a tool such as glance.
최경민_1
Occasional Advisor


한일형, monoworld, thank you both for your answers.