HPE SimpliVity
1833758 Members
2200 Online
110063 Solutions
New Discussion

Re: ESXi freeze causing Omnistack problem?

 
KptH
Occasional Advisor

ESXi freeze causing Omnistack problem?

Hi Simplivity gurus!

Simplivity 4.1.3

ESXi, 7.0U3i

Running on DL380 Gen10

Had an incident earlier today, never seen before, where one out of two OVC in a cluster suddenly stopped working for a short period of time. A failover took place and all the VM’s survived (just some minor hiccups) and after just a few more seconds the frozen OVC was up’n running again.

Looking in the Omnistack log i found this kind of logging indicating freezing taking place in ESXi?

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.460600] watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [HangDetector:1143]

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.460669] watchdog: BUG: soft lockup - CPU#0 stuck for 31s! [vmtoolsd:883]

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.460716] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [irqbalance:727]

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.460717] Modules linked in: tiadriver(O) ses enclosure xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c xt_tcpmss iptable_filter vmw_vsock_vmci_transport vsock smartpqi(O) scsi_transport_sas binfmt_misc megaraid_sas(O) nls_iso8859_1 crct10dif_pclmul crc32_pclmul ghash_clmulni_int

el pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd joydev vmw_balloon psmouse vmxnet3 vmw_vmci pata_acpi

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.460737] watchdog: BUG: soft lockup - CPU#2 stuck for 31s! [swapper/2:0]

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.461888] CPU: 3 PID: 1143 Comm: HangDetector Tainted: G           O L  4.14.275-svt1 #2

Sep 19 13:49:10 omnicube-ip162-162 kernel: [12193571.461889] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021

In ESXi vmkernel.log I found this;

2023-09-19T13:47:18.286Z cpu48:2104206)FSS: 7377: Failed to open file 'hpilo-d0ccb4'; Requested flags 0x5, world: 2104206 [amsd], (Existing flags 0x5, world: 2104177 [sut]): Busy

2023-09-19T13:47:18.286Z cpu48:2104206)FSS: 7377: Failed to open file 'hpilo-d0ccb3'; Requested flags 0x5, world: 2104206 [amsd], (Existing flags 0x5, world: 2104177 [sut]): Busy

2023-09-19T13:47:18.413Z cpu48:2104206)FSS: 7377: Failed to open file 'hpilo-d0ccb4'; Requested flags 0x5, world: 2104206 [amsd], (Existing flags 0x5, world: 2104177 [sut]): Busy

2023-09-19T13:47:18.413Z cpu48:2104206)FSS: 7377: Failed to open file 'hpilo-d0ccb3'; Requested flags 0x5, world: 2104206 [amsd], (Existing flags 0x5, world: 2104177 [sut]): Busy

2023-09-19T13:48:28.027Z cpu46:36206693)WARNING: VisorFS: 1094: Attempt to setattr non sticky dir/file from tar mount

2023-09-19T13:48:49.118Z cpu39:2108722 opID=f175a48a)NFS: 6257: Status:No connection. Retrying synchronous write I/O 1 of 25 times

When googling “Failed to open file 'hpilo-d0ccb4'” I found some references to problems in the ilo-driver causing ESXi (6.7...) to freeze. Anyone who has experienced this problem running 4.1.3 + ESXi 7.0U3i + SVTSP-2023_0110? A known problem solved if updating the stack to the latest 7.0U3n / 2023_0913?

Or my analysis is maybe totally wrong? Other ideas about what was going on?

///Kpt H

5 REPLIES 5
gustenar
HPE Pro

Re: ESXi freeze causing Omnistack problem?

Hello @KptH

 

Yea, this is indeed a known condition. Check VMware's KB:

https://kb.vmware.com/s/article/91212

 

There is a workaround there, but the fix is included on version VMware vSphere ESXi 7.0 U3n which we now support. 

 



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Accept or Kudo
KptH
Occasional Advisor

Re: ESXi freeze causing Omnistack problem?

Hi @gustenar!

Thanks a lot for the link!

Do you have any insights into where the particular circular dependency taking place in a SVT-environment (the KB says "There is a circular dependency between a resource provided by a VM and the ESXi host consuming this resource (NFS mount)." I mean if ESXi logging not taking place in VMFS-datastores hosted by the OVC there shouldn't be any, or?

BR

Kpt H

 

gustenar
HPE Pro

Re: ESXi freeze causing Omnistack problem?

In a SimpliVity system all the datastores used by VMs are NFS datastores. Only the OVC lives in a VMFS datastore. 



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Accept or Kudo
KptH
Occasional Advisor

Re: ESXi freeze causing Omnistack problem?

@gustenar Right, my mistake! But still, where is the "circular dependency" taking place between ESXi and OVC?

BR

Kpt. H

Vinky_99
Esteemed Contributor

Re: ESXi freeze causing Omnistack problem?

@KptH

Good day! 

The term "circular dependency" in the context of SimpliVity and ESXi may not refer to a literal circular dependency between ESXi and the OVCs but rather a dependency loop or issue that involves the interaction between these components. In the context you provided, it's not a traditional circular dependency like you might encounter in software development or configuration management.

The specific log message you mentioned, "There is a circular dependency between a resource provided by a VM and the ESXi host consuming this resource (NFS mount)," does not describe a circular dependency between ESXi and OVCs but rather a circular dependency within the VMs or between VMs and the ESXi hosts related to NFS mounts.

To clarify, here's what the message implies:

1. Resource Provided by a VM: One or more VMs in your environment are providing a resource. This resource could be an NFS share, a network service, or something similar.

2. ESXi Host Consuming This Resource: The ESXi hosts are consuming or utilizing this resource. In the context of your log message, it appears to be related to an NFS mount. ESXi hosts often mount NFS shares from storage devices or other VMs to access data.

3. Circular Dependency: The message indicates that there is a circular dependency or loop within this configuration. For example, VM A might be exporting an NFS share, and VM B or the same ESXi host might be consuming that share, creating a loop or dependency that can cause instability or issues.

To pinpoint the exact source of this circular dependency, you would need to investigate your VM configurations, including their NFS shares, and the ESXi host configurations to ensure that NFS mounts are correctly set up and there are no unexpected dependencies or loops. Check VM configurations, ESXi host settings, and the NFS server configurations to identify and rectify any misconfigurations or circular dependencies.

Hope this helps! Let me know...

These are my opinions so use it at your own risk.