
Re: HBMM Experience

 
Jack Trachtman
Super Advisor

HBMM Experience

I'm excited to see that HP finally released the HBMM (Host-Based Mini-Merge) functionality (as a patch), but I'm leery of installing the first release of something so obviously complex (I have only one cluster, which is for production, so there is no place to test this feature). I've extracted the doc from the patch and see that there are some implementation decisions to be made.

Does anyone have any experience yet with HBMM?
John Gillings
Honored Contributor

Re: HBMM Experience

Jack,

Sure! We in HP have got LOTS of experience with it :-) It's been heavily tested for a very long time, including an external field test over the past 6 months or so.

We're also very excited to see it finally released, as this is a solution to one of the biggest headaches for customers.

I can understand OpenVMS customers being reluctant to install something this new on their production systems, but those of you with test systems, please give it a good thrashing to make certain that we haven't missed anything. The sooner we can get this rolled out to everyone, the better.

Jack, one possibility is to install the kit on production, but not enable mini merge on all your disks. You should still be able to use merge prioritisation and other new features.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: HBMM Experience

No mini-merge chez nous.

But we are on 7.3 and use the mini-copy on every reboot, without any problem except for one time when the boot was interrupted while the shadow mini-copy was active. We then had 2 members of a shadow set that were not the same. There is a patch for that.

So, if you test it, focus on interrupting it while it is doing the mini-merge.

Wim
Wim
Kris Clippeleyr
Honored Contributor

Re: HBMM Experience

Jack, and others,

Be aware that the HBMM V0100 kit has been put on hold by engineering due to a slight problem that will be fixed within a few weeks with the V0200 version of this patch.

Greetz,

Kris
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Ian Miller.
Honored Contributor

Re: HBMM Experience

I've just read the hold notification sent via openvms.org. It sounds like a significant problem and makes me wonder about all that testing that has been going on.
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor

Re: HBMM Experience

Jack,


I just received this:



Engineering Hold Notice for VMS732_HBMM-V0100

PROBLEM DESCRIPTION:

If a shadow set member device name is specified with the SET SHADOW command,
the command will fail with the following error:

%SYSTEM-F-IVDEVNAM, invalid device name

This failure occurs even though it may be valid or necessary to specify a
shadow set member device name with the SET SHADOW command qualifier that was
used.

For example:

$ SET SHADOW $1$DGA107:/COPY_SOURCE
%SYSTEM-F-IVDEVNAM, invalid device name

The qualifiers that should allow a shadow set member device name are:
/COPY_SOURCE, /FORCE_REMOVAL, /MEMBER_TIMEOUT, /READ_COST, and /SITE.

Commands that take a shadow set name work correctly.

PROBLEM RESOLUTION:

This issue will be corrected by the future VMS732_HBMM-V0200 ECO kit. Expected
release timeframe for this kit is one to two weeks.

WORKAROUND:

There is no workaround for this problem. If customers experience this problem
the VMS732_HBMM-V0100 kit can be removed from the system with a PRODUCT UNDO
PATCH command. The PRODUCT UNDO PATCH command will remove the last patch
installed. If patch kits were installed after the VMS732_HBMM-V0100 kit,
those kits will have to be removed before the HBMM kit can be removed. Note
that this will only work if the kit was installed with the SAVE_RECOVERY_DATA
option.

If the HBMM kit was not installed with the SAVE_RECOVERY_DATA option but
replaced files were renamed and archived as image_name_OLD, the kit can be removed
by renaming these archived images to the normal image name (removing the
_OLD) and making them the latest images on the system.

If neither of the above options can be used, the kit can be removed by
restoring a pre-kit installation backup.

After removing the VMS732_HBMM-V0100 kit the system must be rebooted.
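
(Not part of the notice.) As a rough sketch of the first removal option described above, and assuming the kit was installed with recovery data saved and that no later patch kits sit on top of it, the sequence might look like the following. The two SHOW commands are only there to check the state before undoing anything.

```dcl
$! Sketch only: check what is installed and whether recovery data was saved.
$ PRODUCT SHOW HISTORY
$ PRODUCT SHOW RECOVERY_DATA
$! Undo the most recently installed patch (works only with saved recovery data).
$ PRODUCT UNDO PATCH
$! The notice says a reboot is required after removal, for example:
$ @SYS$SYSTEM:SHUTDOWN
```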




I guess WE will be waiting some more.
I am the one known at all the European DECUS meetings for asking about this issue again and again in every Engineering Panel, so it can be assumed that I am eagerly waiting,
but a few more weeks of waiting is all there is to it, I guess....


Sorry to bring such bad news.


Jan


Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: HBMM Experience

Sorry,
I only now noticed Kris' entry. Don't know how I missed it.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Zahid Ghani
Frequent Advisor

Re: HBMM Experience

Jack, we have a split-site cluster and have been eagerly waiting for HBMM. I got involved in testing the 7.3-1 version using the test facilities provided by HP. I spent a day testing and trying to break it. The time it took to complete a mini-merge was in the range of 2-4 minutes.
I am planning to introduce HBMM around November and, just like you, am a bit nervous and would prefer the reassurance of somebody else having tried it on their production system.
As a suggestion, maybe you could ask your HP contact whether they can provide some test facilities. I would also be interested in other people's plans and their experience of HBMM.
Robert Brooks_1
Honored Contributor

Re: HBMM Experience

As one of the members of the team that created HBMM, I'm certainly interested in this thread!

While the minimerge functionality is clearly
the focus of our work, we also spent a fair
amount of time on the copy/merge prioritization scheme.

Through judicious use of
$ SET SHADOW DSAn/PRIORITY = n and the
SHADOW_REC_DLY system parameter, it is now
possible to predict with 100% certainty the
order in which recovery operations (that is,
a merge or a copy) take place on a system
and cluster.

Has anyone tried this feature yet?

(note: we apologize for the hold placed on the V7.3-2 kit; we've solved the problems
that caused the initial hold, and just found
another problem that happened due to oddly-failing hardware. Without the failing
hardware, it's unlikely that we'd have found this issue. Unfortunately, the hardware
died, and reproducing the mode of failure
has not been easy).
Jack Trachtman
Super Advisor

Re: HBMM Experience

Robert,

Aren't the SET SHADOW DSAn/PRIORITY = n and the SHADOW_REC_DLY features only available after the HBMM patch is installed? I just checked the SET SHADOW Help on a V7.3-2 system and don't see a reference to /PRIORITY.
Ken Fairfield
Occasional Advisor

Re: HBMM Experience

In response to Rob's question, yes, I've been testing the priority stuff quite a bit. You should have seen two questions sent to support by Brent Murphy (which I "fed" to him).

The priority settings really do what you say, thanks! :-) But it took a bit of learning to find that (a) the priorities are node-specific so they need to be set on each node independently, (b) you can't set a priority on a shadow set before it's mounted, and (c) you need to do a Set Shadow/Evaluate=Resources after setting the priorities, e.g., during boot, if you want the shadow sets to merge and/or copy according to the (new) priority settings.
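
Points (a) through (c) could be sketched as a fragment run on each node once the shadow sets are mounted, for example from the site startup procedure. This is only an illustration: the virtual unit names DSA1 and DSA2 and the priority values are made up, and it assumes that a larger /PRIORITY value is serviced first, so check the kit documentation before relying on it.

```dcl
$! Hypothetical fragment; run on EACH node, since priorities are node-specific.
$! The shadow sets must already be mounted on this node.
$ SET SHADOW DSA1: /PRIORITY=8000   ! example value: recover this set first
$ SET SHADOW DSA2: /PRIORITY=2000   ! example value: recover this set later
$! Have shadowing re-evaluate pending copy/merge work with the new priorities.
$ SET SHADOW /EVALUATE=RESOURCES
```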

We also found that shadow copies are _not_ processed ahead of full-merges as the documentation says, _unless_ the associated shadow set is given a higher priority than the pending full merges. We are unable to tell whether the documentation is wrong (or incomplete), or whether the implementation is not quite right (VMS731_HBMM-V0100 ... the -V0200 kit was too late for us).
Jan van den Ende
Honored Contributor

Re: HBMM Experience

Robert,

Reading Ken's remarks I would strongly suggest that every effort be made to get the documentation SYNCHRONIZED with the software!

Now finally SCSI Minimerge sees the light in the form of HBMM.
But you will probably understand that HBMM will be most interesting to sites that are the most sensitive to disruption. Those disruptions can arise equally well from failing software as from GOOD software that is USED WRONG. And that would be all the more painful if the software is used according to the documentation, where THAT is wrong!

And, YES, we would like to get it in by yesterday, but we MUST be sure we understand what CAN and what CAN NOT be done to mould it to our wishes, before we dare.

PLEASE, have the documentation right as well!!


... reading this back, I feel it sounds a bit harsher than I intended, but the essence still stands.

And, of course, I owe an enormous THANK YOU to those who finally got it working!
I have been informed of some bits and pieces of the hurdles that had to be taken, and everyone that took part in getting it done should receive a big bonus if I had a vote in that!

Once more, thanks, and I hope to be able to implement HBMM soon!


Jan
Don't rust yours pelled jacker to fine doll missed aches.
Robert Brooks_1
Honored Contributor

Re: HBMM Experience

I'll attempt to address the points raised
individually - thanks for the feedback.

>Aren't the SET SHADOW DSAn/PRIORITY = n and
>the SHADOW_REC_DLY features only available
>after the HBMM patch is installed?
Yes, this is part of the HBMM kit.
I didn't mean to imply otherwise. My point was
that while the main focus of our work was on
minimerge in and of itself, the prioritization
stuff is (I think) pretty interesting, and I was wondering if anyone had explored that aspect of the kit.


>The priority settings really do what you say, thanks! :-) But it took a bit of
>learning to find that (a) the priorities are node-specific so they need to be
>set on each node independently, (b) you can't set a priority on a shadow set
>before it's mounted, and (c) you need to do a Set Shadow/Evaluate=Resources
>after setting the priorities, e.g., during boot, if you want the shadow sets
>to merge and/or copy according to the (new) priority settings.

Hmmmm. I thought that a), b), and c) were
well-documented -- sorry it was not clear
on all three of those cases.


>We also found that shadow copies are _not_ processed ahead of full-merges as
>the documentation says, _unless_ the associated shadow set is given a higher
>priority than the pending full merges. We are unable to tell whether the
>documentation is wrong (or incomplete)

Yes, uh, well, uh, this was somewhat of
a surprise to us earlier this week :-(
We *do* go into pretty good detail about the
prioritization stuff and the minimerge/full copy/full merge hierarchy, but that hierarchy
is only enforced per shadow-set, not per
system or per-cluster. This was an unfortunate discovery.

What *does* happen across the cluster
correctly is that all minimerges are done
independent of priority. A 2nd pass is made
to process any other work (full copy, full merge) -- this second pass is done in priority
order. Of course, what we really need to
do is perform the 2nd pass, picking up
only full copies (and any late-arriving
minimerges), and then perform a 3rd pass
over the shadowsets, performing any work
that needs to be done in priority order.

This prospective 3rd pass will *not* be part
of the soon-to-be-released V7.3-2 kit; this
is just my musing about a better way to go
about picking up what recovery operations
need to be done in the correct order.

>or whether the implementation is not quite right
Well, it's an engineering problem no
matter what, since we largely write the documentation (tech writers clean it up, but
the technical accuracy is on our shoulders).

Having said that, I'd say the implementation
is not quite correct.

>Reading Ken's remarks I would strongly suggest that every effort is given to
>getting the documentation SYNCHRONIZED to the software!

>PLEASE, have the documentation right as well!!

You may find this hard to believe, but we
actually spend an amazing amount of time
on the documentation. In fact, we spent
so much time that when discrepancies like this crop up, we're all a bit embarrassed.

I hope that the examples we've added for
HBMM policy manipulation take away some
of the confusion you'll likely first have
when you see the syntax required for
HBMM policy creation and use.

>... reading this back I feel it sounds a bit harsher than I intended, but the
>essence still stands.
We've been a lot harder on ourselves
internally than you have been here . . . :-)
Wim Van den Wyngaert
Honored Contributor

Re: HBMM Experience

Just curious ...

How are you testers checking that the shadow set is consistent, i.e. that all disks contain the same info after the copy/merge is complete? One of the patches on HBMCopy solved such a problem, but I found no tool to check it.

Maybe the shadowing masters would like to check this thread too.
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=666457

Wim
Wim
Volker Halle
Honored Contributor

Re: HBMM Experience

Wim,

There is ANAL/DISK/SHADOW, which does exactly this: it compares all blocks on all members for identical contents. This is a new feature in V7.3-2.
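
For those on V7.3-2, a minimal invocation might look like the following. DSA42 is a made-up virtual unit name, and since every block on every member is read, expect it to take a while on large volumes.

```dcl
$! DSA42: is a hypothetical shadow set (virtual unit); substitute your own.
$ ANALYZE/DISK_STRUCTURE/SHADOW DSA42:
```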

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: HBMM Experience

Thanks. But I'm stuck on 7.3.

Wim
Wim
Ian Miller.
Honored Contributor

Re: HBMM Experience

There was an old tool that the CSC had which would do a block compare of shadow set members. Perhaps you could get this from your local support centre.
____________________
Purely Personal Opinion
Fodil ATTAR
Frequent Advisor
Solution

Re: HBMM Experience

Hi Jack,

We have experienced trouble (cluster crashes) on nodes which had the HBMM-V0100 and SHADOW-V0200 patches.

It has been strongly recommended to raise the dynamic parameter WBM_MSG_UPPER from 100 to 1 000 000 ASAP on all nodes which receive these two patches.
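
Since the parameter is dynamic, the change could presumably be made on a running node with SYSGEN, roughly as sketched below. This assumes your version accepts the value; note also that a later reply in this thread says the change is only relevant for the limited-release V7.3-1 kit. To keep the value across reboots it would normally also be added to MODPARAMS.DAT.

```dcl
$! Sketch only: change the dynamic parameter on the running system.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW WBM_MSG_UPPER
SYSGEN> SET WBM_MSG_UPPER 1000000
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
```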

Anyway, mini-merge efficiency is astonishing.

Fodil,
We must become the change we want to see cf. GHANDI
Jack Trachtman
Super Advisor

Re: HBMM Experience

Does anyone have additional information on Fodil's experience?
Fodil ATTAR
Frequent Advisor

Re: HBMM Experience

Hi,

Some more information about the context.

Separate groups of clusters using SAN devices:

1st: GS1280 (Galaxy), all on VMS 7.3-1
2nd: GS160 & Alpha 4000, all on VMS 7.3-1

Fodil.
We must become the change we want to see cf. GHANDI
Robert Brooks_1
Honored Contributor

Re: HBMM Experience


>It has been Strongly recommended to pass ASAP
>the dynamic parameter WBM_MSG_UPPER from 100
>to 1 000 000 on all nodes which receive theses
>two patches.

Please note that this is only relevant for
the HBMM kit on V7.3-1. That kit is not
generally available to all folks; it's a
limited release to roughly a dozen customers.

There is no need to modify the WBM sysgen
params if you are using the production HBMM kit on V7.3-2.