3PAR StoreServ Storage

Nodes with Dump or HBA core files

 
SOLVED
SNPP
Occasional Advisor

Hello. I have:
HP 3PAR 7400c (4 nodes) 3.3.1.410 (MU2)+P30,P32,P34,P36,P37,P38,P39 + Upgrade_Tool_180809.U014
VSP 5.0.4.0-25509.

I recently installed patch P39 on the 3PAR and updated the Service Processor from 5.0.3.1-25112 to 5.0.4.0-25509.

Since then I get the following warning:

admithw

2018-11-23 08:09:55 +03 Created task.
2018-11-23 08:09:55 +03 Updated Executing "admithw -reporterror" as 3:59765
2018-11-23 08:09:55 +03 Updated Checking nodes...
2018-11-23 08:09:55 +03 Updated Checking volumes...
2018-11-23 08:09:55 +03 Updated Checking system LDs...
2018-11-23 08:09:55 +03 Updated Checking ports...
2018-11-23 08:09:55 +03 Updated Checking SAS HBA firmware...
2018-11-23 08:09:55 +03 Updated 4 Port(s) require firmware upgrades.
2018-11-23 08:09:55 +03 Updated This will require approximately 6 minute(s).
2018-11-23 08:11:56 +03 Updated Checking state of disks...
2018-11-23 08:11:56 +03 Updated Checking for drive table upgrade packages.
2018-11-23 08:11:56 +03 Updated Package check completed
2018-11-23 08:11:56 +03 Updated Checking cabling...
2018-11-23 08:11:56 +03 Updated Checking if this is an upgrade that added new types of drives...
2018-11-23 08:11:57 +03 Updated Checking for cage ordering...
2018-11-23 08:11:57 +03 Updated Checking for disks to admit...
2018-11-23 08:11:57 +03 Updated Checking for drive table upgrade packages.
2018-11-23 08:11:57 +03 Updated Package check completed
2018-11-23 08:11:58 +03 Updated 0 disks admitted
2018-11-23 08:11:58 +03 Updated
2018-11-23 08:11:58 +03 Updated Checking admin volume...
2018-11-23 08:11:58 +03 Updated Admin volume exists.
2018-11-23 08:11:58 +03 Updated Checking cage firmware...
2018-11-23 08:11:58 +03 Updated Checking if logging LDs need to be created...
2018-11-23 08:11:58 +03 Updated
2018-11-23 08:11:58 +03 Updated System has less cages than nodes; using -ha mag for logging LDs.
2018-11-23 08:11:58 +03 Updated
2018-11-23 08:11:58 +03 Updated No new logging LDs need to be created
2018-11-23 08:11:58 +03 Updated Checking if preserved data LDs need to be created...
2018-11-23 08:11:58 +03 Updated
2018-11-23 08:11:58 +03 Updated System has less cages than nodes; using -ha mag for preserved data LDs.
2018-11-23 08:11:58 +03 Updated
2018-11-23 08:11:58 +03 Updated No new preserved data LDs need to be created
2018-11-23 08:12:12 +03 Updated Checking if system scheduled tasks need to be created...
2018-11-23 08:12:12 +03 Updated Checking if the rights assigned to extended roles need to be updated...
2018-11-23 08:12:12 +03 Updated No need to update extended roles rights.
2018-11-23 08:12:12 +03 Updated Checking spares...
2018-11-23 08:12:12 +03 Updated Rebalancing and adding FC spares...
2018-11-23 08:12:16 +03 Updated FC spare chunklets rebalanced; number of FC spare chunklets increased by 0 for a total of 2232.
2018-11-23 08:12:16 +03 Updated Rebalancing and adding NL spares...
2018-11-23 08:12:19 +03 Updated NL spare chunklets rebalanced; number of NL spare chunklets increased by 0 for a total of 4990.
2018-11-23 08:12:19 +03 Updated Rebalancing and adding SSD spares...
2018-11-23 08:12:19 +03 Updated No SSD PDs present
2018-11-23 08:12:19 +03 Updated Checking default CPGs...
2018-11-23 08:12:19 +03 Updated Checking srdata volume...
2018-11-23 08:12:19 +03 Updated System Reporter data volume exists.
2018-11-23 08:12:20 +03 Updated Checking default AO configuration...
2018-11-23 08:12:20 +03 Updated Checking fibre channel ports configuration...
2018-11-23 08:12:23 +03 Updated Checking system health...
2018-11-23 08:12:23 +03 Updated Checking alert
2018-11-23 08:12:23 +03 Updated Checking ao
2018-11-23 08:12:23 +03 Updated Checking cabling
2018-11-23 08:12:23 +03 Updated Checking cage
2018-11-23 08:12:24 +03 Updated Checking cert
2018-11-23 08:12:24 +03 Updated Checking dar
2018-11-23 08:12:25 +03 Updated Checking date
2018-11-23 08:12:29 +03 Updated Checking file
2018-11-23 08:12:29 +03 Updated Checking fs
2018-11-23 08:12:29 +03 Updated Checking host
2018-11-23 08:12:30 +03 Updated Checking ld
2018-11-23 08:12:30 +03 Updated Checking license
2018-11-23 08:12:30 +03 Updated Checking network
2018-11-23 08:12:30 +03 Updated Checking node
2018-11-23 08:12:35 +03 Updated Checking pd
2018-11-23 08:12:41 +03 Updated Checking pdch
2018-11-23 08:12:45 +03 Updated Checking port
2018-11-23 08:12:45 +03 Updated Checking qos
2018-11-23 08:12:45 +03 Updated Checking rc
2018-11-23 08:12:45 +03 Updated Checking snmp
2018-11-23 08:12:45 +03 Updated Checking task
2018-11-23 08:12:46 +03 Updated Checking vlun
2018-11-23 08:12:46 +03 Updated Checking vv
2018-11-23 08:12:46 +03 Updated Checking sp
2018-11-23 08:12:46 +03 Updated Component -------Summary Description------- Qty
2018-11-23 08:12:46 +03 Updated File      Nodes with Dump or HBA core files   1
2018-11-23 08:12:46 +03 Updated -----------------------------------------------
2018-11-23 08:12:46 +03 Updated         1 total                               1
2018-11-23 08:12:46 +03 Updated
2018-11-23 08:12:46 +03 Updated
2018-11-23 08:12:46 +03 Updated admithw has completed.
2018-11-23 08:12:46 +03 Completed scheduled task.
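
To compare this task log with plain checkhealth output, it can help to strip the timestamp and status prefix first. This is a minimal sketch, assuming every progress line has the shape `<date> <time> <UTC offset> Updated <message>` as in the log above. The function name `task_messages` is made up for illustration and is not a 3PAR command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the message part of "Updated" task log lines.
# Lines with another status (Created, Completed) are dropped.
task_messages() {
    sed -nE 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]+ Updated ?(.*)$/\1/p'
}

# Sample lines copied from the task log above.
task_messages <<'EOF'
2018-11-23 08:12:23 +03 Updated Checking system health...
2018-11-23 08:12:46 +03 Updated admithw has completed.
2018-11-23 08:12:46 +03 Completed scheduled task.
EOF
```

The log would have to be saved to a file or piped in from wherever it was copied; the sketch does not read it from the array.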


checkhealth -detail

Checking alert
Checking ao
Checking cabling
Checking cage
Checking cert
Checking dar
Checking date
Checking file
Checking fs
Checking host
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking pdch
Checking port
Checking qos
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Checking sp
Component -------Summary Description------- Qty
File      Nodes with Dump or HBA core files   1
-----------------------------------------------
        1 total                               1

Component -Identifier- ----Detailed Description----
File      node:3       Dump or HBA core files found
---------------------------------------------------
        1 total


The "HP 3PAR StoreServ 7000 and 7000c Storage Troubleshooting Guide" says:

This condition might be transient because the Service Processor retrieves the files and cleans up the dump directory. If the SP is not gathering the dump files, check the condition and state of the SP.

There are no active tasks on the Service Processor.
Rebooting the Service Processor and node 3 did not help.

Thanks in advance for your help with this!

sanj_s
HPE Pro

Re: Nodes with Dump or HBA core files

Hi,

This looks like a hardware issue with the HBA. However, please send me the serial number of the array in a private message so that I can check further.

Regards,

I am an HPE Employee


SNPP
Occasional Advisor

Re: Nodes with Dump or HBA core files

Hi, sanj_s
Sorry, but my 3PAR and SP do not have access to the Internet.

sanj_s
HPE Pro

Re: Nodes with Dump or HBA core files

Hi,

This needs more analysis, so I would ask you to log a case with HPE support.

Regards,

I am an HPE Employee.

 


SNPP
Occasional Advisor

Re: Nodes with Dump or HBA core files

I looked into this problem in more detail and found the following.
I turned on the display of service messages, and in the tasks I saw:
3PAR_update_error.png
COREDUMP: recovered files:
/var/core/proc/saved/analysis.cimserver.24076.Nov23-08:08:12
/var/core/proc/saved/core.cimserver.24076.Nov23-08:08:12
/var/core/proc/saved/log.cimserver.24076.Nov23-08:08:12.

After a manual reboot of node 3:

3PAR_bios_error.png

BIOS log entry stored in /pr_mnt/bioslogs/idelog.node3.2018-Nov23-10-12-53

Is it possible to access these files to get detailed information?

SNPP
Occasional Advisor
Solution

Re: Nodes with Dump or HBA core files

I recently contacted HPE support, and with their help the problem is solved.

Connect via SSH to the Service Processor as the hpesupport user:

login as: hpesupport
hpesupport@VSP password: *******

The programs included with the HPE Linux system are free software; the
exact license terms for each program are described in the individual
files in /usr/share/doc/*/copyright.

HPE Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Last login: Mon Dec 10 10:22:33 2018 from IP_MyComputer

Connect from the Service Processor to the 3PAR as the root user:

hpesupport@VSP:~$ ssh -i /home/hpesupport/.ssh/3PAR_SN root@3PAR_IP
The authenticity of host '3PAR_IP (3PAR_IP)' can't be established.
RSA key fingerprint is SHA256:******************************************.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '3PAR_IP' (RSA) to the list of known hosts.
root@3PAR_IP's password: ***********
Mon Dec 10 10:23:54 +04 2018

Check the status and look at the error messages:

root@3PAR_SN-2 Mon Dec 10 10:23:54:~# checkhealth -detail
Checking alert
Checking ao
Checking cabling
Checking cage
Checking cert
Checking dar
Checking date
Checking file
Checking fs
Checking host
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking pdch
Checking port
Checking qos
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Checking sp
Component -------Summary Description------- Qty
File      Nodes with Dump or HBA core files   1
-----------------------------------------------
        1 total                               1
Component -Identifier- ----Detailed Description----
File      node:3       Dump or HBA core files found
---------------------------------------------------
        1 total
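
If more than one node is flagged, the identifiers can be pulled out of the detail table before deciding which node to switch to. This is a minimal sketch, assuming the output has been saved or piped as text and that the detail rows look like the one above (`File`, then `node:<n>`). The function name `nodes_with_dumps` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the identifiers (for example node:3) from the
# "File" rows of the detail table. The summary row does not match because
# its second field is "Nodes", not "node:<n>".
nodes_with_dumps() {
    awk '$1 == "File" && $2 ~ /^node:[0-9]+$/ { print $2 }'
}

# Sample input copied from the checkhealth -detail output above.
nodes_with_dumps <<'EOF'
Component -------Summary Description------- Qty
File      Nodes with Dump or HBA core files   1
Component -Identifier- ----Detailed Description----
File      node:3       Dump or HBA core files found
EOF
```

For the sample input this prints `node:3`.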

Switch to Node 3:

root@3PAR_SN-2 Mon Dec 10 10:24:32:~# rsh node3
Linux 3PAR_SN-3 2.6.32 #1 SMP Mon May 7 16:56:42 PDT 2018 x86_64

Usage of the HPE 3Par Operating System is subject to the terms of
the End-User License Agreement (EULA), with the exclusion of software licensed
under other terms.  The EULA can be found in /usr/share/doc/tpd/copyright.

Many of the programs included with the HPE 3Par Operating System are
freely redistributable; the exact distribution terms for each program
are described in the individual files in /usr/share/doc/*/copyright.

Look at the contents of the directory /var/core/proc/saved/:

root@3PAR_SN-3 Mon Dec 10 10:24:42:~# cd /var/core
root@3PAR_SN-3 Mon Dec 10 10:24:49:/var/core# cd proc
root@3PAR_SN-3 Mon Dec 10 10:24:55:/var/core/proc# cd saved
root@3PAR_SN-3 Mon Dec 10 10:25:00:/var/core/proc/saved# ls -lSah
total 53M
-rw-r--r-- 1 root root 209M Nov 23 08:08 core.cimserver.24076.Nov23-08:08:12
-rw------- 1 root root 1.5M Nov 23 08:08 log.cimserver.24076.Nov23-08:08:12
-rw-r--r-- 1 root root 4.8K Nov 23 08:08 analysis.cimserver.24076.Nov23-08:08:12
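
Judging only from the three names in this listing, the files seem to follow a `<kind>.<process>.<pid>.<timestamp>` pattern, so the crashed process here would be cimserver with PID 24076. That pattern is my reading of the names, not something taken from HPE documentation. A minimal sketch that splits a name this way, with a made-up function name `parse_core_name`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a saved core file name into its parts.
# Assumes the name has the form <kind>.<process>.<pid>.<timestamp>.
parse_core_name() {
    local name=${1##*/}          # drop the directory part
    local kind=${name%%.*}       # text before the first dot
    local rest=${name#*.}
    local process=${rest%%.*}    # text between the first and second dot
    rest=${rest#*.}
    local pid=${rest%%.*}        # text between the second and third dot
    local stamp=${rest#*.}       # everything after the third dot
    printf 'kind=%s process=%s pid=%s time=%s\n' "$kind" "$process" "$pid" "$stamp"
}

parse_core_name /var/core/proc/saved/core.cimserver.24076.Nov23-08:08:12
```

For the path above this prints `kind=core process=cimserver pid=24076 time=Nov23-08:08:12`.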

Create a backup of the files:

root@3PAR_SN-3 Mon Dec 10 10:25:03:/var/core/proc/saved# cli sendhome 3 /var/core/proc/saved/core.cimserver.24076.Nov23-08\:08\:12
root@3PAR_SN-3 Mon Dec 10 10:27:02:/var/core/proc/saved# cli sendhome 3 /var/core/proc/saved/log.cimserver.24076.Nov23-08\:08\:12
root@3PAR_SN-3 Mon Dec 10 10:27:13:/var/core/proc/saved# cli sendhome 3 /var/core/proc/saved/analysis.cimserver.24076.Nov23-08\:08\:12

Delete the files:

root@3PAR_SN-3 Mon Dec 10 10:27:45:/var/core/proc/saved# rm /var/core/proc/saved/*
root@3PAR_SN-3 Mon Dec 10 10:28:16:/var/core/proc/saved# ls
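
The `rm` with `*` above removes everything in the directory. A narrower variant deletes only regular files whose names start with `core.`, `log.` or `analysis.`, which are the three prefixes seen in this thread. This is a minimal sketch under that assumption, with a made-up function name `purge_saved_cores`. It is demonstrated on a scratch directory, and on a real array it should only be used once HPE support has confirmed that the files are no longer needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove only saved core artifacts from a directory.
purge_saved_cores() {
    local dir=$1 f
    for f in "$dir"/core.* "$dir"/log.* "$dir"/analysis.*; do
        [ -f "$f" ] || continue   # skip unmatched patterns and non-regular files
        printf 'removing %s\n' "$f"
        rm -- "$f"
    done
}

# Demonstration against a scratch directory, not a real node.
demo=$(mktemp -d)
touch "$demo/core.cimserver.24076.Nov23-08:08:12" \
      "$demo/log.cimserver.24076.Nov23-08:08:12" \
      "$demo/keep.txt"
purge_saved_cores "$demo"
ls "$demo"        # keep.txt is the only file left
rm -r -- "$demo"
```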

Check the status:

root@3PAR_SN-3 Mon Dec 10 10:28:18:/var/core/proc/saved# checkhealth -detail
Checking alert
Checking ao
Checking cabling
Checking cage
Checking cert
Checking dar
Checking date
Checking file
Checking fs
Checking host
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking pdch
Checking port
Checking qos
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Checking sp
System is healthy

End the session:

root@3PAR_SN-3 Mon Dec 10 10:28:51:/var/core/proc/saved# exit
logout
rlogin: connection closed.
root@3PAR_SN-2 Mon Dec 10 10:30:00:~# exit
logout
Connection to 3PAR_IP closed.

Thx.