HPE Morpheus VM Essentials

Any known issues with GFS2 Pools and Reservation Conflicts on all 3 Nodes of the Cluster.

 
I123Habc
Occasional Advisor

I have been having issues with a GFS2 cluster volume on 3 nodes of HPE VM, presented via a virtual iSCSI device, where the GFS2 pool was set up with the Morpheus web UI.

At first it worked seamlessly, but then reservation conflicts started. Restarting various services and disabling and re-enabling the clone resource got the GFS2 volume mounted again, and everything seemed fine until a VM was powered on, at which point access was dropped once again.

I then simply migrated the VM to the next node that could still read the GFS2 pool, but as soon as I tried to power it on there I got I/O errors and lost access again.
Every node that failed had the same messages in journalctl:

Sep 19 13:19:26 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: Error 6 writing to journal, jid=2
Sep 19 13:19:26 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: about to withdraw this file system
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: Requesting recovery of jid 2.
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: Journal recovery complete for jid 2.
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: Glock dequeues delayed: 0
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: telling LM to unmount
Sep 19 13:19:31 ih-hpe24node3 kernel: dlm: gfs2pool: leaving the lockspace group...
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: recover_prep ignored due to withdraw.
Sep 19 13:19:31 ih-hpe24node3 kernel: dlm: gfs2pool: group event done 0
Sep 19 13:19:31 ih-hpe24node3 kernel: dlm: gfs2pool: release_lockspace final free
Sep 19 13:19:31 ih-hpe24node3 kernel: gfs2: fsid=21k17fiak9rbbo2:gfs2pool.2: File system withdrawn
Sep 19 13:19:31 ih-hpe24node3 kernel: CPU: 77 PID: 56168 Comm: gfs2_logd/21k17 Not tainted 6.8.0-83-generic #83-Ubuntu
Sep 19 13:19:31 ih-hpe24node3 kernel:  gfs2_withdraw+0xd7/0x160 [gfs2]
Sep 19 13:19:31 ih-hpe24node3 kernel:  gfs2_log_flush+0x66d/0xb00 [gfs2]
Sep 19 13:19:31 ih-hpe24node3 kernel:  gfs2_logd+0x90/0x330 [gfs2]
Sep 19 13:19:31 ih-hpe24node3 kernel:  ? __pfx_gfs2_logd+0x10/0x10 [gfs2]

At the same time, all 3 nodes have iSCSI sessions to the storage, and multipath is showing 4 paths on each node.

I then seemed to have this recurring error on one node:

root@ih-hpe24node1:/# journalctl -u pacemaker -u corosync -f
Sep 19 13:47:18 ih-hpe24node1 fence_scsi[160930]: Please use '-h' for usage
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  error: Operation 'reboot' [160929] targeting ih-hpe24node1 using hpevm_gfs2_scsi returned 1
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [ /usr/sbin/fence_scsi:268: SyntaxWarning: invalid escape sequence '\s' ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [   if not re.search(r"^" + dev + "\s+", out, flags=re.MULTILINE): ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [ 2025-09-19 13:47:18,051 ERROR: Failed: keys cannot be same. You can not fence yourself. ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [  ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [ 2025-09-19 13:47:18,051 ERROR: Please use '-h' for usage ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  warning: hpevm_gfs2_scsi[160929] [  ]
Sep 19 13:47:18 ih-hpe24node1 pacemaker-fenced[158298]:  notice: Operation 'reboot' targeting ih-hpe24node1 by ih-hpe24node3 for pacemaker-controld.3266@ih-hpe24node3: Error occurred (complete)
Sep 19 13:47:18 ih-hpe24node1 pacemaker-controld[158302]:  notice: Peer ih-hpe24node1 was not terminated (reboot) by ih-hpe24node3 on behalf of pacemaker-controld.3266@ih-hpe24node3: Error
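
The "keys cannot be same. You can not fence yourself" error suggests the key fence_scsi uses for the node doing the fencing is the same as the key registered for the target it is asked to remove. A rough way to check what is actually registered on the LUN and how the stonith resource is parameterised (the /dev/mapper path here is assumed from the lsblk output further down) would be something like:

# Show the fence_scsi stonith resource and its options as Pacemaker sees them
pcs stonith config hpevm_gfs2_scsi

# List the PR keys registered on the shared LUN and the current reservation holder
sg_persist --in --read-keys /dev/mapper/2000339775c260001
sg_persist --in --read-reservation /dev/mapper/2000339775c260001

Each node should show up with its own distinct key; duplicate keys across nodes would explain the "cannot fence yourself" failure.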


I have torn down the GFS2 pool and started again, and will see if I encounter the same instability.

Current version I am using is:

v. 8.0.7-2

Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble

I will see how I get on with a new deployment of a GFS2 pool. I did have hardware issues on one host; however, given a 3-node cluster I would expect it to tolerate the loss of one node.
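
For what it's worth, a quick way to confirm what the cluster believes it can tolerate (quorum votes) and whether fencing has been firing is:

# Quorum state and vote counts for the 3-node cluster
corosync-quorumtool -s

# Overall resource state plus the recent fencing history
pcs status --full
pcs stonith history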

 

I have also tried this with the 22.04 release.

There seems to be an issue with SCSI persistent reservations (PR) that possibly looks like the issue logged by Red Hat below, and it is questionable whether the same fix has been applied to the Ubuntu image used by HPE VM:

https://bugzilla.redhat.com/show_bug.cgi?id=2164869
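
To compare against whatever fix that bug introduced, the relevant component versions on the hosts can be pulled with something like this (package names here are the stock Ubuntu ones; HPE VM may ship its own builds):

# Kernel plus the fencing, multipath, iSCSI, DLM and GFS2 userspace packages
uname -r
dpkg -l | grep -E 'fence-agents|multipath-tools|open-iscsi|dlm-controld|gfs2-utils'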

The iSCSI storage is showing all kinds of PR reservation conflicts:

Nov 7 09:00:43 SR iqn.2024-12.com.hpe:ih-node1:11556:23d000011-1, Cmd <Write> PR Conflict accessing Stor1:0
Nov 7 09:00:43 SR iqn.2024-12.com.hpe:ih-node1:11556:23d000014-1, Cmd <Write> PR Conflict accessing Stor1:0
Nov 7 09:00:48 SR iqn.2024-12.com.hpe:ih-node2:55431:23d000014-1, Cmd <Write> PR Conflict accessing Stor1:0
Nov 7 09:00:48 SR iqn.2024-12.com.hpe:ih-node2:55431:23d000011-1, Cmd <Write> PR Conflict accessing Stor1:0


We have tried changing the multipath path_checker policy as a workaround, but to no avail.
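
In case it helps anyone comparing setups, the effective setting multipathd is actually running with can be confirmed and reapplied roughly like this (nothing vendor-specific assumed here):

# Show the effective multipath configuration and the checker in use
multipathd show config | grep -E 'path_checker|no_path_retry'

# After editing /etc/multipath.conf, reload the daemon and re-check the paths
multipathd reconfigure
multipath -ll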

All is fine until a VM is moved to another node or a compute node is shut down. The cluster itself and access to the storage show as fine; the issue seems to be with access to the GFS2 file system:

root@ih-node1:/var/opt/morpheus-node/vm# journalctl | grep -i gfs2
Nov 07 09:11:10 ih-node1 kernel: INFO: task gfs2_logd/m2rf6:668262 blocked for more than 614 seconds.
Nov 07 09:11:10 ih-node1 kernel: task:gfs2_logd/m2rf6 state:D stack:0 pid:668262 tgid:668262 ppid:2 flags:0x00004000
Nov 07 09:11:10 ih-node1 kernel: gfs2_glock_wait+0x44/0xd0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: ? run_queue+0xa6/0x1d0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_glock_nq+0xa5/0x2c0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_inode_lookup+0x1c8/0x430 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: ? signal_our_withdraw+0x1dc/0x4c0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: signal_our_withdraw+0x1dc/0x4c0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_withdraw+0x77/0x160 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_logd+0x1ef/0x330 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: ? __pfx_gfs2_logd+0x10/0x10 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_glock_dq+0x126/0x130 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_glock_dq_uninit+0x14/0x60 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_dirty_inode+0x1ca/0x2c0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_update_time+0x73/0xe0 [gfs2]
Nov 07 09:11:10 ih-node1 kernel: gfs2_file_write_iter+0x29c/0x4a0 [gfs2]

 

# pcs resource
* Clone Set: dlm-clone [dlm]:
* Started: [ ih-node1 ih-node2 ih-node3 ]
* Clone Set: san_aece3-clone [san_aece3]:
* Started: [ ih-node1 ih-node2 ih-node3 ]
root@ih-node2:/home/smadmin#

 

root@ih-node1:/var/opt/morpheus-node/vm# ls -l /mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3
ls: cannot access '/mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3': Input/output error
root@ih-node1:/var/opt/morpheus-node/vm# lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
loop1                       7:1    0 63.8M  1 loop  /snap/core20/2669
loop2                       7:2    0 50.9M  1 loop  /snap/snapd/25577
loop3                       7:3    0 89.4M  1 loop  /snap/lxd/31333
loop4                       7:4    0 63.8M  1 loop  /snap/core20/2682
loop5                       7:5    0 50.8M  1 loop  /snap/snapd/25202
loop6                       7:6    0 91.4M  1 loop  /snap/lxd/35819
sda                         8:0    0  1.7T  0 disk
├─sda1                      8:1    0    1G  0 part  /boot/efi
├─sda2                      8:2    0    2G  0 part  /boot
└─sda3                      8:3    0  1.7T  0 part
  └─ubuntu--vg-ubuntu--lv 252:0    0    1T  0 lvm   /
sdb                         8:16   0  1.7T  0 disk
└─md127                     9:127  0    0B  0 -1
sdc                         8:32   1    0B  0 disk
sdd                         8:48   0  200G  0 disk
└─2000339775c260001       252:1    0  200G  0 mpath /mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3
sde                         8:64   0  200G  0 disk
└─2000339775c260001       252:1    0  200G  0 mpath /mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3
sdf                         8:80   0  200G  0 disk
└─2000339775c260001       252:1    0  200G  0 mpath /mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3
sdg                         8:96   0  200G  0 disk
└─2000339775c260001       252:1    0  200G  0 mpath /mnt/a3ba5335-39f1-4222-9cc7-5ba93caaece3
root@ih-node1:/var/opt/morpheus-node/vm# iscsiadm -m session
tcp: [17] 192.168.80.38:3260,1 iqn.2006-06.com.stor:775c260200000016.stor1 (non-flash)
tcp: [18] 192.168.80.39:3260,3 iqn.2006-06.com.stor:775c260200000016.stor1 (non-flash)
tcp: [19] 192.168.81.39:3260,3 iqn.2006-06.com.stor:775c260200000016.stor1 (non-flash)
tcp: [20] 192.168.81.38:3260,1 iqn.2006-06.com.stor:775c260200000016.stor1 (non-flash)
root@ih-node1:/var/opt/morpheus-node/vm#
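
Given the PR conflicts only appear on writes, it may be worth checking whether each node's key is registered on every underlying path, not just on the multipath device (the sdX names are taken from the lsblk output above):

# Read the registered PR keys on each path of the 200G LUN
for d in /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    echo "== $d"
    sg_persist --in --read-keys "$d"
done

A path with a missing registration could produce exactly this kind of reservation conflict as soon as multipath routes a write down it.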

 

Are there any additional debug files/outputs that would be required to understand where this is going wrong? Is anyone from Engineering/Product reviewing these posts?

Any helpful suggestions would be greatly appreciated.

I have the output from this command if it would be helpful; I was trying to find a way to attach it here.

cat /sys/kernel/debug/gfs2/m2rf6me3cnh2pf\:san/glocks > /tmp/gfs2_glocks_dump_node1.log
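
Other outputs that are commonly requested for GFS2/DLM problems, alongside the glock dump, are the DLM lockspace state and a Pacemaker/corosync report covering the incident window, e.g.:

# DLM lockspaces and the dlm_controld debug buffer
dlm_tool ls
dlm_tool dump

# Bundle corosync/pacemaker/fencing logs around the time of the withdraw (adjust the window)
crm_report -f "2025-09-19 13:00" -t "2025-09-19 14:00" /tmp/gfs2_incident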