Operating System - VMware

[LINUX KERNEL] Add. Sense: Invalid field in cdb

 
RAPHAELLEB
Collector

[LINUX KERNEL] Add. Sense: Invalid field in cdb

Hello,

 

Issue encountered:

Since mid-November 2024, around 75% of my virtual machines have been spamming their journalctl logs with the following SCSI messages:

Feb 25 16:29:06 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#667 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Feb 25 16:29:06 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#667 Sense Key : Illegal Request [current] 
Feb 25 16:29:06 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#667 Add. Sense: Invalid field in cdb
Feb 25 16:29:06 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#667 CDB: Write same(16) 93 08 X X X X X X X X 00 00 00 18 00 00
Feb 25 16:29:06 virtualmachine kernel: blk_update_request: critical target error, dev sdb, sector 14855984 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0 


It is always the same disk, /dev/sdb (the second disk), and always the same error messages:

FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Sense Key : Illegal Request [current] 
Add. Sense: Invalid field in cdb
CDB: Write same(16) 93 08
blk_update_request: critical target error, dev sdb, sector X op 0x9:(WRITE_ZEROES)

When does it happen?

  • At completely random times of day. A virtual machine can log around 1,000 such lines in journalctl, or none at all.
  • Nothing that looks related to the problem appears in vmkernel.log on any of the ESXi hosts.
  • The affected VMs share no common ESXi host or datastore: roughly 3/4 of the virtual machines are affected, spread across all ESXi hosts and datastores.
  • The messages appear on all our virtual machine templates (database, application, GUI...).
  • Some virtual machines belonging to instances (groups of around 24 VMs) do the same work as others, with the same versions and configurations, yet do not log these messages.

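One low-effort check the list above suggests (a sketch, assuming the journalctl line layout shown in the excerpts, i.e. a "Feb 25 16:29:06 host kernel: ..." prefix): tally the error lines per day to see whether the bursts correlate with any scheduled job.

```shell
# Tally "critical target error" kernel messages per day from journalctl
# output piped on stdin, e.g.:  journalctl -k --no-pager | count_per_day
# Assumes the default short timestamp format ("Feb 25 16:29:06 ...").
count_per_day() {
    grep 'critical target error' | awk '{print $1, $2}' | sort | uniq -c
}
```

If the counts cluster at fixed times, a cron job, fstrim.timer, or an application maintenance task becomes a prime suspect.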
 

Technical environment specifications:

  • Storage vendor: HP 3PAR 9450
  • Storage OS version: 3.3.2 MU1 (P15)
  • SAN switch vendor between storage and blade center: Brocade
  • SAN switch integrated into the BladeCenter: Brocade 16Gb/28c SAN Switch
  • BladeCenter vendor / model: HPE BladeSystem c7000 Enclosure G3
  • BladeCenter firmware version: 4.90
  • Blade server vendor and models: HPE ProLiant BL460c Gen9 (32 servers) & ProLiant BL460c Gen10 (16 servers)
  • OS installed on the blade servers: VMware ESXi 7.0.3 build 23794027
  • Datastore types: VMFS 5 & 6
  • Disk provisioning type on the guest OS side: Thick Provision Lazy Zeroed
  • Guest OS (virtual machines) vendor and versions: Red Hat Enterprise Linux 8.4 to 8.10
  • Guest kernel versions: 4.18.0-305.el8.x86_64, 4.18.0-425.3.1.el8.x86_64 and 4.18.0-553.8.1.el8_10.x86_64
  • VMware Tools version: 12.3.5.46049 (build-22544099)
  • Virtual hardware compatibility: version 19
  • Disk architecture inside the Linux virtual machines: LVM

 

Firmware of the Gen9 blade servers:

  • HP FlexFabric 10Gb 2-port 534M Adapter 7.18.82 Slot 1
  • HP FlexFabric 10Gb 2-port 536FLB Adapter 7.18.82 Embedded
  • HP QMH2572 8Gb 2P FC HBA - FC 08.08.01 Slot 2
  • iLO 2.82 Feb 06 2023 System Board
  • Intelligent Platform Abstraction Data 25.00 System Board
  • Intelligent Provisioning 2.50.164 System Board
  • Power Management Controller Firmware 1.0.9 System Board
  • Power Management Controller FW Bootloader 1.0 System Board
  • Redundant System ROM I36 v2.60 (05/21/2018) System Board
  • SAS Programmable Logic Device Version 0x03 System Board
  • Server Platform Services (SPS) Firmware 3.1.3.21.4 System Board
  • Smart HBA H244br 7.00 Embedded
  • System Programmable Logic Device Version 0x17 System Board
  • System ROM I36 v2.90 (04/29/2021) System Board

 

Firmware of the Gen10 blade servers:

  • Drive HPG4 Port=1I:Box=1:Bay=1  
  • Drive HPG4 Port=1I:Box=1:Bay=2  
  • Embedded Video Controller 2.5 Embedded Device  
  • HP FlexFabric 10Gb 2-port 534M Adapter 7.18.82 Mezzanine Slot 2  
  • HP FlexFabric 10Gb 2-port 536FLB Adapter 7.18.82 Embedded ALOM  
  • HP QMH2672 16Gb FC HBA for BladeSystem c-Class 8.08.232 Mezzanine Slot 1  
  • HPE Smart Array P204i-b SR Gen10 4.11 Embedded RAID  
  • HPE Smart Storage Energy Pack 1 Firmware 0.70 Embedded Device  
  • iLO 5 2.55 Oct 01 2021 System Board  
  • Innovation Engine (IE) Firmware 0.2.2.3 System Board  
  • Intelligent Platform Abstraction Data 9.4.0 Build 18 System Board  
  • Intelligent Provisioning 3.31.63 System Board  
  • Power Management Controller Firmware 1.0.7 System Board  
  • Power Management Controller FW Bootloader 1.1 System Board  
  • Redundant System ROM I41 v2.54 (09/03/2021) System Board  
  • Server Platform Services (SPS) Descriptor 1.2 0 System Board  
  • Server Platform Services (SPS) Firmware 4.1.4.505 System Board  
  • System Programmable Logic Device 0x1E System Board  
  • System ROM I41 v3.34 (09/30/2024) System Board

 

Impact

So far, no actual impact has been detected, but since the word "critical" appears in the message, it generates a large number of tickets in our monitoring tool.

It also makes things harder for application vendors when there is a genuine application issue to debug.

 

What I tried

First of all, it is very difficult to identify the culprit, as there are many intermediaries between the storage and the Linux virtual machines.

On my side, I am only in charge of the blade centers up to the Linux virtual machines. The SAN switches and the storage array are managed by another team in my company, but I am currently working with them to resolve this.

 

As I described above, this is not new: the issue was first detected around mid-November 2024.

 

The story so far:

  • Early September to early December 2024: major application updates on 5 instances (groups of around 24 virtual machines), which also required OS, kernel and system package updates.
  • Late October to late November 2024 (48 ESXi hosts to update): minor VMware ESXi update, 7.0.3 build 19482537 to build 23794027.
  • Mid-November 2024: a massive number of monitoring tickets was detected, all on the same error message.
  • Mid-November 2024: a VMware ticket was opened. They said VMware Tools was not at the latest version; I updated it during the application updates of the 5 instances.
  • Late January 2025: the messages came back on those 5 updated instances (not on all of their virtual machines), and remained on the instances that had not been updated.
  • Early February 2025: a VMware ticket was opened. They said to ask the storage vendor.
  • Early February 2025: a Red Hat ticket was opened. They also said to ask the storage vendor.
  • Mid-February 2025: a ticket was opened with my company's storage team. It is still in progress.

 

It probably began before mid-November, but that is when it really became a nuisance.

 

However, a colleague in my company analyzed the issue and told me:

It's strange, because:

1 - Your virtual machine disks are thick provisioned.

2 - Verification of the VPD page in-guest:

 

[root@virtualmachine ~]# sg_vpd --page=0xb2 /dev/sdb
Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 0

  Write same (16) with unmap bit supported (LBPWS): 0   <------ 

  Write same (10) with unmap bit supported (LBPWS10): 0  
  Logical block provisioning read zeros (LBPRZ): 0
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 1
  Descriptor present (DP): 0
  Minimum percentage: 0 [not reported]
  Provisioning type: 0 (not known or fully provisioned)
  Threshold percentage: 0 [percentages not supported]

 

3 - Why does the OS / application seem to send WRITE SAME(16) commands?

Example : 

Feb 01 20:33:51 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#130 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Feb 01 20:33:51 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#130 Sense Key : Illegal Request [current]
Feb 01 20:33:51 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#130 Add. Sense: Invalid field in cdb
Feb 01 20:33:51 virtualmachine kernel: sd 0:0:1:0: [sdb] tag#130 CDB: Write same(16) 93 08 X X X X X X X X 00 00 00 08 00 00
Feb 01 20:33:51 virtualmachine kernel: blk_update_request: critical target error, dev sdb, sector 30670800 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0

It is indeed the storage side that rejects the WRITE SAME with an error (DRIVER_SENSE), but what we need to understand is why the OS sends these SCSI commands at all when it has been told they are not supported.

 
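On the kernel side, whether a zero-out request is translated into WRITE SAME(16) is governed by per-queue limits exposed in sysfs; my understanding of this kernel family is that sd sizes those limits from the provisioning mode it detected at probe time, which is worth verifying on your exact kernel. A quick way to inspect them (the sysfs path is a parameter only so the helper can be exercised offline):

```shell
# Print the block-queue limits that govern zero-out translation for a disk.
# Real path on a guest: /sys/block/sdb/queue; the argument exists only so
# the function can be tested against a fake directory.
show_zeroing_limits() {
    q="${1:-/sys/block/sdb/queue}"
    for f in write_same_max_bytes write_zeroes_max_bytes; do
        printf '%s=%s\n' "$f" "$(cat "$q/$f")"
    done
}
```

A non-zero write_same_max_bytes on an affected VM would explain why the guest still emits WRITE SAME(16) even though the VPD page reports LBPWS=0.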

I checked whether a service was the cause of the problem, using the pidstat command:

pidstat -d 1

But no service or process appears to write to the disk when the message occurs:

Feb 25 16:24:04 virtualmachine kernel: blk_update_request: critical target error, dev sdb

04:24:02 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:24:03 PM   247   2152101      0.00     16.00      0.00       0  java

04:24:03 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:24:04 PM     0      1227    256.00    144.00      0.00       0  systemd-journal
04:24:04 PM   247   2152101      0.00   2052.00      0.00       0  java

04:24:04 PM     0   3220187      0.00      0.00      0.00       1  kworker/u256:0-flush-253:3
04:24:04 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:24:05 PM     0   3672057      0.00      8.00      0.00       0  rsyslogd

04:24:05 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:24:06 PM     0      1230      0.00     32.00      0.00       0  jbd2/dm-3-8

Feb 25 12:01:36 othervirtualmachine kernel: blk_update_request: critical target error, dev sdb,

12:01:33 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:01:34 PM     0      1214      0.00     24.00      0.00       0  jbd2/dm-10-8
12:01:34 PM     0      1257      0.00     60.00      0.00       0  systemd-journal
12:01:34 PM     0      2588      0.00      0.00      8.00       0  xxxxxxx

12:01:34 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command

12:01:35 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:01:36 PM     0      1257      0.00    116.00      0.00       0  systemd-journal
12:01:36 PM     0   1449141      0.00      0.00      0.00       1  kworker/u256:1-events_unbound

12:01:36 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:01:37 PM     0   1369065      0.00      4.00      0.00       0  pidstat
12:01:37 PM     0   3553469      0.00      4.00      0.00       0  vmtoolsd

12:01:37 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:01:38 PM     0       757      0.00     24.00      0.00       1  jbd2/dm-0-8
12:01:38 PM     0      1235      0.00     32.00      0.00       0  jbd2/dm-13-8
12:01:38 PM     0      1239      0.00      4.00      0.00       0  jbd2/dm-8-8
12:01:38 PM     0      1251      0.00     12.00      0.00       0  jbd2/dm-14-8
12:01:38 PM     0      1257      0.00     16.00      0.00       0  systemd-journal
12:01:38 PM     0      1260      0.00     48.00      0.00       0  jbd2/dm-6-8

Feb 25 09:53:44 othervirtualmachine kernel: blk_update_request: critical target error, dev sdb,

09:53:41 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:42 AM     0      1079      0.00      0.00      0.00       1  jbd2/dm-4-8
09:53:42 AM    27    275798    578.22    827.72      0.00       0  mysqld

09:53:42 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:43 AM    27    275798    376.00    656.00      0.00       0  mysqld

09:53:43 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:44 AM    27    275798     88.00    416.00      0.00       0  mysqld

09:53:44 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:45 AM     0      1047      0.00      4.00      0.00       0  jbd2/dm-5-8
09:53:45 AM     0      1059    348.00    156.00      0.00       0  systemd-journal
09:53:45 AM     0      1064      0.00      8.00      0.00       0  jbd2/dm-9-8
09:53:45 AM     0      1079      0.00      4.00      0.00       1  jbd2/dm-4-8
09:53:45 AM    27    275798    472.00    760.00      0.00       0  mysqld
09:53:45 AM     0   1135691      0.00      4.00      0.00       0  pidstat
09:53:45 AM     0   1143255      0.00      0.00      0.00       1  kworker/u256:0-events_unbound

09:53:45 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:46 AM    27    275798  13408.00    948.00      0.00       0  mysqld
09:53:46 AM     0   1169146      0.00      0.00      0.00       1  kworker/0:2-events_power_efficient

09:53:46 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
09:53:47 AM    27    275798  18688.00    744.00      0.00       0  mysqld


Another remark:

  • I can force the messages to appear by reformatting an LVM partition to ext4 and remounting it.
  • Likewise when I create a new partition on another disk (/dev/sdc for example) and mount it on a newly created directory.
  • The messages seem to go away after a virtual machine reboot, but come back about two weeks later.

 

Can you help me figure out what other tests I could still run on the virtual servers, or on the VMware side?

 

Regards.

3 REPLIES
utnoor
HPE Pro

[LINUX KERNEL] Add. Sense: Invalid field in cdb

Hi RAPHAELLEB,

The error messages indicate that the kernel could not complete I/O to the disk; it could be an issue on the other side rather than the OS end.

Kindly ask VMware and 3PAR to review it from their end.

Please remember: "it's not an OS issue"



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Accept or Kudo
RAPHAELLEB
Collector

Re: [LINUX KERNEL] Add. Sense: Invalid field in cdb

Hello,

I forgot to mention some information:

The storage array is shared internally with other projects running on other ESXi platforms, and none of those projects have this problem.
So the problem does not seem to be a storage issue, especially since ESXi does not log any errors.

 

Apparently, I have finally found the difference between the Linux virtual machines that have the problem and those that don't.
On the affected ones, the file /sys/class/scsi_disk/0:0:1:0/provisioning_mode is set to "disabled", while on the unaffected ones it is set to "unmap" or "full".

 

The "disabled"-mode virtual machines seem to switch to "full" mode after a simple reboot.

 

But that does not explain why provisioning_mode randomly changes on VMs that were built from the same OS template with the same packages.
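To compare the whole fleet without checking each VM's logs, a small helper like this (hypothetical; the sysfs base directory is a parameter only so it can be tested offline) prints the mode per SCSI disk. The provisioning_mode attribute is, to my knowledge, writable, so it may also be possible to force an affected disk back without a reboot, though that would treat the symptom rather than the cause.

```shell
# List provisioning_mode for every SCSI disk under the given sysfs base
# (real path on a guest: /sys/class/scsi_disk).
list_prov_modes() {
    base="${1:-/sys/class/scsi_disk}"
    for d in "$base"/*; do
        [ -r "$d/provisioning_mode" ] || continue
        printf '%s: %s\n' "$(basename "$d")" "$(cat "$d/provisioning_mode")"
    done
}

# Possible (unverified) live workaround for the device from the logs:
#   echo full > /sys/class/scsi_disk/0:0:1:0/provisioning_mode
```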

utnoor
HPE Pro

Re: [LINUX KERNEL] Add. Sense: Invalid field in cdb

Hi RAPHAELLEB,

If you have a Red Hat login ID, you can review this article => https://access.redhat.com/solutions/1256863

As per the Red Hat verified article, it is not an OS issue: those messages indicate that the server successfully submitted the I/O to the target, but the target REJECTED the I/O request with an error.

This may indicate that LUNs have been unpresented from the storage system without the server being made aware of the change.

Ask your storage vendor to ascertain why, and under what circumstances, the storage is returning the Illegal Request sense to the server.

You can look up the SCSI sense codes in this list ==> https://www.t10.org/lists/asc-num.htm
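For quick reference, the values seen throughout this thread decode as follows from those T10 lists: opcode 0x93 is WRITE SAME(16), sense key 5h is ILLEGAL REQUEST, and ASC/ASCQ 24h/00h is INVALID FIELD IN CDB. A tiny lookup covering just these entries:

```shell
# Mini decoder for the SCSI values discussed in this thread only
# (entries transcribed from the T10 op-code and ASC/ASCQ assignments).
decode_opcode() {
    case "$1" in
        93) echo 'WRITE SAME(16)' ;;
        *)  echo "opcode $1: not in this mini-table" ;;
    esac
}
decode_asc() {
    case "$1" in
        24/00) echo 'INVALID FIELD IN CDB' ;;
        *)     echo "ASC/ASCQ $1: not in this mini-table" ;;
    esac
}
```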
