mdadm: fail to stop array /dev/md0: Device or resource busy

 
Unix Team_6
Advisor

mdadm: fail to stop array /dev/md0: Device or resource busy

Hi,

We use Serviceguard on Linux on ProLiants with mirrored SAN devices. Occasionally, when we halt a package, mdadm fails to stop the array and we get this error:

mdadm: fail to stop array /dev/md0: Device or resource busy

Only a reboot fixes it; manually trying to stop the array with mdadm doesn't work either. Our package uses EMC SAN devices in their own volume group (vgsan01).

Does anyone have a more reliable method of stopping mdadm than rebooting, or an idea why mdadm sometimes refuses to stop?

8 REPLIES
Steven E. Protter
Exalted Contributor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

Shalom,

Using software RAID on an EMC SAN is a little beyond what the designers of mdadm had in mind. I believe it is supported, however.

So.

cat /proc/mdstat

Before triggering the problem.

Then make the problem happen by bringing down the package.

cat /proc/mdstat

Post the output back here.

/var/log/messages or dmesg may be helpful.
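The before/after comparison suggested above could be scripted roughly as follows. This is only a sketch: the helper names are made up, and the package name in the comment is a placeholder for your own.

```shell
# Hypothetical helpers: snapshot /proc/mdstat to a file, then diff two
# snapshots taken before and after halting the package.
snap_mdstat() {            # snap_mdstat <outfile>
    # Fall back to a marker line if no md arrays are active on this host
    cat /proc/mdstat > "$1" 2>/dev/null || echo "no md arrays" > "$1"
}

diff_snaps() {             # diff_snaps <before> <after>
    diff -u "$1" "$2"
}

# Typical use around a package halt:
#   snap_mdstat /tmp/mdstat.before
#   cmhaltpkg <pkgname>        # reproduce the failure
#   snap_mdstat /tmp/mdstat.after
#   diff_snaps /tmp/mdstat.before /tmp/mdstat.after
#   dmesg | tail -n 50         # kernel messages around the failure
```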

Knowing which Linux distribution you are using would be extremely helpful.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Stuart Browne
Honored Contributor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

What might also be of use is the output of 'lsof /dev/md0' when it's having this issue.
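To turn that into something reusable, a small helper along these lines could list the PIDs holding the device open. The function names are invented for illustration; the parsing step is split out so it also works on captured lsof output.

```shell
# Hypothetical helper: print the unique PIDs holding a device open,
# using lsof when available and falling back to fuser.
pids_from_lsof() {         # reads lsof output on stdin, prints unique PIDs
    awk 'NR > 1 { print $2 }' | sort -un
}

holders() {                # e.g. holders /dev/md0
    if command -v lsof >/dev/null 2>&1; then
        lsof "$1" 2>/dev/null | pids_from_lsof
    else
        fuser "$1" 2>/dev/null
    fi
}
```

Any PIDs this prints must exit (or be killed) before `mdadm --stop /dev/md0` can succeed.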
One long-haired git at your service...
Serviceguard for Linux
Honored Contributor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

Are you using the Extended Distance Cluster product that is available for Serviceguard for Linux? That is the ONLY way MD mirroring is supported. There is a potential for data corruption with a "do it yourself" process.

To understand more about XDC the docs are available here: http://docs.hp.com/en/ha.html
Unix Team_6
Advisor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy


I'm very interested in the comment from the Serviceguard for Linux team that mdadm is only supported with EDC (Extended Distance Cluster). This is news to us. We run it fine on over a dozen non-extended clusters and it works beautifully.

Why does HP only support it with EDC?

According to the link you sent, the latest manual on managing Serviceguard (7th edition) talks about using mdadm to mirror but says nothing about it only being supported with EDC - so why mention it at all? We've been using mdadm on Linux SG clusters since before EDC was ever released, and we found lots of docs from HP about how to set it up and use it - that's how we installed it in the first place. It's been working fine for over 18 months on some clusters without any problems at all.

Could you please explain some more? We are very surprised by your comment that it is only supported with EDC.

As for our problem, it's fixed. The cause was that the database didn't shut down - it was still running. SG failed to kill all the PIDs, so it force-deactivated the VG; with Oracle processes still holding the device, mdadm (rightly) refused to stop the array. Unlike SG on HP-UX (which reports a package shutdown failure if processes are still using it), on Linux it simply force-deactivates anyway and reports the package shutdown as good! Interesting. We will add some extra checks so that next time, if the application is still running, the package shutdown fails and we can investigate before trying to stop mdadm and restart it on the other node.

Unix Team_6
Advisor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

Sorry, to add: we are mostly using Red Hat AS4 u4, and now AS4 u5.
Unix Team_6
Advisor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

OK, I've found some more info.

In the XDC manual it says: "Earlier versions of Serviceguard supported MD as a multipathing software".

So before XDC existed, MD was supported with Serviceguard on Red Hat, but only in basic multipath mode - which seems a bit strange to me. MD does both mirroring and multipathing, so it seems weird that HP only supported the multipathing part. I also suspect it's a marketing ploy to make money: if you want MD mirroring "support" from HP, you have to purchase XDC as well as Serviceguard, when in fact MD works just fine without XDC.

We're running all our clusters on EMC SAN too, which I note is also unsupported - HP wants you to buy their XP/EVA arrays in order to get "support", when in fact it works fine on EMC too. Another marketing ploy, if you ask me.
Serviceguard for Linux
Honored Contributor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

It is not a marketing ploy.

If you read the MD docs, they indicate that MD is not designed for sharing in clusters. There is at least one case that can cause data corruption, and XDC has code to prevent that case. That's why we did not support MD for software mirroring until we released that product.

In unsupported configurations we cannot respond to data corruption issues, since the corruption may stem from exactly that case.
Unix Team_6
Advisor

Re: mdadm: fail to stop array /dev/md0: Device or resource busy

Thanks Serviceguard for Linux Team.

We run EMC PowerPath on all our Linux Serviceguard clusters, and as well as the production DB on node A we run a standby DB on node B via Data Guard, so we're pretty well protected against corruption. We test all our clusters rigorously before deployment by pulling cables (SAN, network, power) while they're up and running, and they always keep running or fail over cleanly without any problems.

We've never ever seen corruption on a Serviceguard cluster, on HP-UX or Linux - no matter how hard we try to create one :-)

Still, in future I think we will deploy XDC on any Linux Serviceguard clusters - just to be safe.