Re: 5-6 seconds timeout (scsi_cmd/nmp errors) in ESXi 4.1 host and guests (BL460g6)

compiler · ‎03-28-2011

Hi all.

We're seeing strange timeouts in our Linux guests (Ubuntu 10.04 x86_64). A simple "df -h", or saving a file in vim, sometimes takes 5 seconds to complete. The machine stays responsive (I can type characters and they appear on the terminal), but I have to wait for the previous command to finish.
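Because the stall is intermittent, it can help to run the same command repeatedly and record how long each run takes. A minimal Python sketch of that idea follows; the function names and the 1-second threshold are assumptions made for illustration, not something taken from the setup described here.

```python
import subprocess
import time


def time_command(cmd, runs=20, pause=0.0):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
        time.sleep(pause)
    return durations


def find_stalls(durations, threshold=1.0):
    """Return (run_index, seconds) for every run slower than `threshold` seconds."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold]
```

For example, find_stalls(time_command(["df", "-h"], runs=50, pause=1.0)) would list only the runs that took longer than a second, which makes it easier to line the stalls up with timestamps in the host log.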

I've run a simple "df" on the host machine (Blade BL460g6 with ESXi 4.1u1 and only 2 local disks in RAID 1), and it also sometimes takes the same 5-6 seconds to complete. These messages appear in /var/log/messages:

Mar 28 08:23:25 vmkernel: 19:17:32:35.454 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bf226d40) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:25 vmkernel: 19:17:32:35.454 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bf244940) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:26 vmkernel: 19:17:32:36.446 cpu1:4468)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.600508b1001030383632343834301100" - issuing command 0x4102bfa0a740
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)NMP: nmpCompleteRetryForPath: Retry world recovered device "naa.600508b1001030383632343834301100"
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)scsi_cmd_alloc returned NULL!
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)scsi_cmd_alloc returned NULL!
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)scsi_cmd_alloc returned NULL!
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)scsi_cmd_alloc returned NULL!
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bfa7e140) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.600508b1001030383632343834301100": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bf246140) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bf2a0c40) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:26 vmkernel: 19:17:32:36.451 cpu0:4112)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bf2ca140) to NMP device "naa.600508b1001030383632343834301100" failed on physical path "vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 28 08:23:27 vmkernel: 19:17:32:37.446 cpu1:4468)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.600508b1001030383632343834301100" - issuing command 0x4102bfa7e140
Mar 28 08:23:27 vmkernel: 19:17:32:37.454 cpu0:4112)NMP: nmpCompleteRetryForPath: Retry world recovered device "naa.600508b1001030383632343834301100"
(...)
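To see at a glance which commands are failing and on which path, the nmp_CompleteCommandForPath entries can be tallied with a short script. The Python sketch below is only an illustration: the regular expression is written against the lines quoted in this post, the function and constant names are made up, and the opcode table covers just two standard SCSI opcodes (0x28 is READ(10), 0x2a is WRITE(10)). The host status value (H:) is reported as-is rather than decoded.

```python
import re
from collections import Counter

# Pattern for the "Command ... failed on physical path ..." entries shown above.
# [^"]+ also spans a line break, so device names wrapped by a terminal still match.
NMP_FAILURE = re.compile(
    r'Command (?P<opcode>0x[0-9a-fA-F]+) \(0x[0-9a-fA-F]+\) to NMP device '
    r'"(?P<device>[^"]+)" failed on physical path "(?P<path>[^"]+)" '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) P:(?P<plugin>0x[0-9a-fA-F]+)'
)

# Two standard SCSI opcodes; anything else is reported by its raw hex value.
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# One entry copied from the log in this post, for demonstration.
SAMPLE = (
    'Mar 28 08:23:25 vmkernel: 19:17:32:35.454 cpu0:4112)NMP: '
    'nmp_CompleteCommandForPath: Command 0x2a (0x4102bf226d40) to NMP device '
    '"naa.600508b1001030383632343834301100" failed on physical path '
    '"vmhba0:C0:T0:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.'
)


def summarize(log_text):
    """Count failed commands per (command name, physical path, host status)."""
    counts = Counter()
    for m in NMP_FAILURE.finditer(log_text):
        opcode = int(m["opcode"], 16)
        name = SCSI_OPCODES.get(opcode, m["opcode"])
        counts[(name, m["path"], m["host"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in summarize(SAMPLE).items():
        print(n, key)
```

In the excerpt above every failed command is 0x2a, a write, on the single local path vmhba0:C0:T0:L0. That fits a problem on the write side of the local controller, although the log alone does not prove the cause.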

We have only local RAID 1 disks (no SAN storage), and we have lots of other ESXi hosts running perfectly...

Googling the error turns up advice to buy the "Battery Backed Write Cache", but our other ESXi hosts don't have it and they don't show the same errors or timeouts.

Thanks for any help.

compiler · ‎03-29-2011

Answering myself... (in case I can help others with the same problem).

Following the recommendations in [1], we bought and installed the Battery Backed Write Cache (BBWC) module and the problem disappeared:

[1] http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01832427&lang=en&cc=us&taskId=135&prodSeriesId=420496&prodTypeId=18964
