
BL460 Gen8 ESXi 6.5 crashes - FC timeout

 
wavecomas


Hello everyone.

Does anyone have comments or advice on the following?

Our new ESXi hosts crash when the I/O load increases, for example during replication or while servers are running backups. It seems that the datastores time out. I am able to log in via SSH, but all commands hang. Sometimes we also get a purple screen. The installation is fresh, from the latest HP customized image released in May 2017.
It happens on all servers. They are HP BL460 Gen8 with 2x E5-2670 CPUs.
All firmware is at the latest version:
Smart Array P220i Controller 8.0
HP FlexFabric 10Gb 2-port 554FLB Adapter 11.1.183.23 Embedded
HP QMH2572 8Gb FC HBA v. 8.02.00 Slot 1
iLO 2.53 May 03 2017
System ROM I31 06/01/2015  

I have applied all kinds of tweaks, such as VMware KB 2149043, but it is not helping. I also set Numa.FollowCoresPerSocket to 1 and increased some buffers.

I can see the following in vmkernel.log:

2017-05-21T16:11:51.403Z cpu22:65607)ScsiDeviceIO: 2927: Cmd(0x439d0080d2c0) 0x2a, CmdSN 0x800e001c from world 70695 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:51.581Z cpu9:65594)ScsiDeviceIO: 2927: Cmd(0x439500fdc3c0) 0x8a, CmdSN 0x800e0003 from world 72763 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:51.833Z cpu9:65594)ScsiDeviceIO: 2927: Cmd(0x439500e36ec0) 0x8a, CmdSN 0x800e002b from world 72763 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:51.861Z cpu9:65594)ScsiDeviceIO: 2927: Cmd(0x439500e99240) 0x8a, CmdSN 0x800e0029 from world 72763 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:53.741Z cpu9:65594)ScsiDeviceIO: 2927: Cmd(0x43950c7a0580) 0x8a, CmdSN 0x800e0026 from world 72763 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:53.769Z cpu9:65594)ScsiDeviceIO: 2927: Cmd(0x439515797d40) 0x8a, CmdSN 0x800e002f from world 72763 to dev "naa.600c0ff0002726a2690b0d5801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2017-05-21T16:11:54.987Z cpu31:65616)ScsiDeviceIO: 2962: Cmd(0x439500e6ac40) 0x12, CmdSN 0x20e1 from world 0 to dev "naa.600c0ff00027270470efb15801000000" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.

2017-05-21T16:12:02.296Z cpu28:73453)WARNING: VSCSI: 3488: handle 8218(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:28
2017-05-21T16:12:02.296Z cpu28:73453)WARNING: VSCSI: 2645: handle 8218(vscsi0:0):Ignoring double reset
2017-05-21T16:12:13.197Z cpu5:70670)HBX: 2956: 'ssd1': HB at offset 3575808 - Waiting for timed out HB:
2017-05-21T16:12:13.197Z cpu5:70670)  [HB state abcdef02 offset 3575808 gen 33 stampUS 16052366931 uuid 59217d07-3ebfa0b3-c844-d89d676c1100 jrnl <FB 17825600> drv 14.81 lockImpl 4 ip 172.16.124.14]
2017-05-21T16:12:13.996Z cpu5:65791)HBX: 2956: 'ssd2': HB at offset 3575808 - Waiting for timed out HB:
2017-05-21T16:12:13.996Z cpu5:65791)  [HB state abcdef02 offset 3575808 gen 33 stampUS 16049366467 uuid 59217d07-3ebfa0b3-c844-d89d676c1100 jrnl <FB 4058856> drv 14.81 lockImpl 4 ip 172.16.124.14]
2017-05-21T16:12:13.998Z cpu27:65792)HBX: 2956: 'ssd2': HB at offset 3575808 - Waiting for timed out HB:
2017-05-21T16:12:13.998Z cpu27:65792)  [HB state abcdef02 offset 3575808 gen 33 stampUS 16049366467 uuid 59217d07-3ebfa0b3-c844-d89d676c1100 jrnl <FB 4058856> drv 14.81 lockImpl 4 ip 172.16.124.14]
2017-05-21T16:12:18.717Z cpu16:70671)HBX: 2956: 'ssd2': HB at offset 3575808 - Waiting for timed out HB:
2017-05-21T16:12:18.717Z cpu16:70671)  [HB state abcdef02 offset 3575808 gen 33 stampUS 16049366467 uuid 59217d07-3ebfa0b3-c844-d89d676c1100 jrnl <FB 4058856> drv 14.81 lockImpl 4 ip 172.16.124.14]
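
For anyone trying to read the ScsiDeviceIO lines above, here is a minimal Python sketch that counts failed commands per device, opcode and host status. The function name `summarize` and the two lookup tables are my own illustration, not part of any VMware tool. The opcode names are the standard SCSI ones (0x12 INQUIRY, 0x2a WRITE(10), 0x8a WRITE(16)). The host status names follow the commonly documented ESXi values, where H:0x5 is an aborted command, so treat that table as an assumption and check it against VMware's documentation.

```python
import re
from collections import Counter

# Standard SCSI opcodes that appear in the log above (subset only).
OPCODES = {0x12: "INQUIRY", 0x2A: "WRITE(10)", 0x8A: "WRITE(16)"}

# Assumed ESXi host status names (subset); verify against VMware docs.
HOST_STATUS = {0x0: "OK", 0x3: "TIMEOUT", 0x5: "ABORT", 0x8: "RESET"}

# Matches the "Cmd(...) <opcode>, ... to dev "<naa>" failed H:.. D:.. P:.." part.
LINE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .*? to dev "(?P<dev>[^"]+)" '
    r'failed H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+)'
)


def summarize(lines):
    """Count failed SCSI commands by (device, opcode name, host status name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a ScsiDeviceIO failure line (e.g. HBX or VSCSI)
        op = int(m.group("op"), 16)
        host = int(m.group("h"), 16)
        key = (
            m.group("dev"),
            OPCODES.get(op, hex(op)),          # fall back to the raw hex value
            HOST_STATUS.get(host, hex(host)),  # fall back to the raw hex value
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_scsi.py < vmkernel.log
    for (dev, op, host), n in summarize(sys.stdin).most_common():
        print(f"{n:6d}  {dev}  {op}  H:{host}")
```

Run over the excerpt above, this would group the write failures on naa.600c0ff0002726a2690b0d5801000000 separately from the single INQUIRY failure on the second device, which makes it easier to see whether one LUN or the whole path is affected.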