Serviceguard

SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

Unix Team_6
Advisor


Hi all,

I'm hoping the HP Serviceguard team sees this and can help. I will log a ticket with HP, but the forums are often faster...

Our config:
Two DL580s running Red Hat 4.6 (64-bit) with MCSG A11.18 and SGX, with SAN storage mirrored using mdadm.

Problem:
On cmhaltpkg, the deactivate_volume_group routine fails (because it cannot kill the processes holding lvols open), but the package control script then tries to deactivate the md mirrors anyway! We are left in a state where some md mirrors are stopped (mdadm -S) while others fail to stop, because the VG is still active with some of its lvols still open. Seems like a code bug to me.

Here is the code in question:

elif [[ "$1" = "stop" ]]
then
    echo -e "\n####### Node \"$(hostname)\": Halting package at $(date) #######"

    check_gfs

    halt_services

    customer_defined_halt_cmds

    if [[ "$HA_APP_SERVER" = "post-IP" ]]
    then
        verify_ha_server $1
    fi

    remove_ip_address

    if [[ "$HA_APP_SERVER" = "pre-IP" ]]
    then
        verify_ha_server $1
    fi

    umount_fs

    deactivate_volume_group

    deactivate_md

    verify_physical_data_replication $1

    # Check exit value
    if (( $exit_value == 1 ))
    then
        echo "###### Node \"$(hostname)\": Package halted with ERROR at $(date) ######"
        exit 1

It seems to me that if either umount_fs or deactivate_volume_group fails, the script SHOULD NOT go on to run deactivate_md, because doing so leaves a mess that can only be cleared by rebooting the nodes.

So I think there should be a global variable. If umount_fs or deactivate_volume_group fails to do an umount or a vgchange -a n, it sets this variable, and the stop section above checks the variable before deciding whether to run deactivate_md.
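For illustration, the guard described above might look roughly like this. It is a minimal sketch, not HP's code: the three routines are stubs standing in for the real ones in the package control script, and storage_halt_failed and md_stop_attempted are hypothetical names used only here. The deactivate_volume_group stub simulates the failure we are seeing.

```shell
# Sketch only. The functions below are stubs, not the real control-script routines.
storage_halt_failed=0   # hypothetical global flag; 1 means an umount or vgchange -a n failed
md_stop_attempted=0     # bookkeeping for this sketch only
exit_value=0

umount_fs() {
    # Stub: assume every file system unmounted cleanly.
    return 0
}

deactivate_volume_group() {
    # Stub: simulate "vgchange -a n" failing because lvols are still open.
    storage_halt_failed=1
    exit_value=1
}

deactivate_md() {
    # Stub: the real routine stops the md mirrors (mdadm -S).
    md_stop_attempted=1
}

umount_fs
deactivate_volume_group

# Only stop the mirrors if everything above them was released.
if [ "$storage_halt_failed" -eq 0 ]
then
    deactivate_md
else
    echo "ERROR: umount or VG deactivation failed; leaving md mirrors running"
fi
```

With the simulated failure, deactivate_md is never called and exit_value stays at 1, so the package would still halt with an error, but the mirrors remain intact underneath the still-active VG.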

Has anyone else had a similar problem? Any comments?

The same should apply to the package "start" section: if activating the mirrors fails, it should not try to activate the VG or mount the lvols.

This is not the first time we have found bugs in the package control script and had to fix them ourselves. Maybe HP knows about this, or has a newer version that fixes the issues above?
Thanks,

Unix Team
6 REPLIES
Prasu
Frequent Advisor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

Hi,

We had the same problem. On our server, some scripts run from cron were accessing the LVs every minute, so cmhaltpkg was not able to unmount them.

What we did was comment out (hash) the crontab entries before running the cmhaltpkg command.

You can also add some entries to the package control file to kill the processes when cmhaltpkg tries to unmount the LVs.
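As a rough sketch of what such an entry could look like (an assumption on my part, not something taken from the HP script): a small helper that uses fuser from psmisc to kill whatever still has files open on a mount point. The helper name kill_fs_users, the FUSER_CMD override and the example mount point are all made up for illustration.

```shell
# Hypothetical helper, for example for customer_defined_halt_cmds.
# FUSER_CMD is an invented override so the command can be dry-run (FUSER_CMD=echo).
kill_fs_users() {
    # fuser -k kills the matching processes (SIGKILL by default);
    # -m matches every process using the file system mounted at the given path.
    ${FUSER_CMD:-fuser} -k -m "$1"
}

# Example call (the mount point is invented):
# kill_fs_users /mnt/pkg_data
```

Killing processes this way is blunt, so it is worth trying it with FUSER_CMD=echo first to see exactly what would be run.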



Regards
Prasu
Serviceguard for Linux
Honored Contributor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

Are you running XDC? That is the only way MD mirroring is supported because of the possibility of data corruption.
Ragu_3
Trusted Contributor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

>> Seems like a code bug to me

Looks more like an interoperability issue. cmhaltpkg is causing more volume groups to be corrupted; maybe HP will work closely with the upstream LVM hackers and get this issue fixed. The LVM hackers may be hampered by a dearth of the test resources that HP has! Can HP help the community move forward?


--
A hacker is not a cracker
http://www.catb.org/jargon/html/index.html
Debian GNU/Linux for the Enterprise! Ask HP ...
Unix Team_6
Advisor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

Yes, we are running XDC.
Serviceguard for Linux
Honored Contributor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

Then it MAY be a bug. Call it in.
Unix Team_6
Advisor

Re: SGX 11.18 problem - vgchange -a n fails but it tries to stop md mirrors!

I've already logged a ticket. Ta.