MSA Storage

EMA12000 with OpenVMS7.3-1

 
BR788072
Occasional Contributor

Hi,
We have an EMA 12000 comprising:
6 shelves,
14 x 36GB disks, 33 x 72GB disks
2 x HSG80 - ACS 8.6S-13, 2 x 256MB cache
2 x EL Switch
1 x MDR
1 x MSL5026 Tape Library, 1 x 110/220 SDLT Drive

This is served by two AlphaServers, a 4000 and a 4100, running OpenVMS 7.3-1.
The disks are configured as RAID 3/5 and partitioned to give 1 x 50GB drive (on the 36GB disks), with the remainder split into 17GB and 23GB logical volumes (to suit our bespoke software).

Data is written to our main partition then periodically archived to one of the smaller partitions.

After 10 months' use, we are now seeing data errors relating to the headers on the main disk.

We have seen one complete disk failure, which resulted in a RAID reconstruction (data errors were then noticed on the reconstructed set), and one 'misconfigured' disk (since swapped out).

Our two HSG80s are configured for multibus failover. It appears that no matter what we do (preferred path etc.), the only HSG that does any work is the bottom controller (hardware E16). (The top is E12, and has always been the slowest.) The only disk with a preferred path is our main disk.

Our archive disks are not normally mounted, again due to the bespoke software, which was originally designed for optical disks. Usually a disk is mounted, archived to, then dismounted. When needed, the disk is remounted and read. We see errors reported to VMS whenever a disk is first mounted.

Has anyone any experience of a similar setup and similar problems?
(We have also had numerous errors on just about every item in the SAN, resulting in most components being changed. We are now starting to think that this kit is just not suitable for our application.) Everything has been patched to the latest versions.
Appreciate any ideas.
Andrew
3 REPLIES
BR788072
Occasional Contributor

Re: EMA12000 with OpenVMS7.3-1

You wouldn't believe this. After 12 months of looking, I've at last found manuals relating to 8.6. Where? On IBM's site!
Uwe Zessin
Honored Contributor

Re: EMA12000 with OpenVMS7.3-1

Andrew,

the PREFERRED_PATH setting is only a hint that applies when both controllers boot at the same time. A single controller restart will cause a failover of all units to the other controller. They will stay there, because it is not possible to configure a failback on the HSG. A host can override this for each unit at any time with '$ SET DEVICE /SWITCH /PATH='.
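
As a rough sketch of that host-side override: the unit name $1$DGA100 and the path string below are made-up placeholders, not values from your system, so substitute whatever your own SHOW DEVICE output reports. The exact display varies between OpenVMS versions.

```dcl
$! Hypothetical unit and path names, for illustration only.
$! 1. List the I/O paths known for the unit; the display should mark the current one.
$ SHOW DEVICE /FULL $1$DGA100:
$! 2. Ask the host to move this unit to a path through the other controller.
$!    The path string is adapter name, a dot, then the controller port WWN.
$ SET DEVICE $1$DGA100: /SWITCH /PATH=PGA0.5000-1FE1-0000-0D11
$! 3. Check which path each multipath unit is using afterwards.
$ SHOW DEVICE /MULTIPATH
```

Since the HSG does not fail back on its own, a switch like this would have to be repeated (by hand or from a startup procedure) after any controller restart.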

I wonder: how have you measured that the top controller is 'slower' than the bottom?

Can you post the output of '$ SHOW DEVICE/MULTIPATH' as a small .TXT attachment?

Here is a pointer to HP's site, which you might find useful:
http://h18000.www1.hp.com/products/storageworks/acs/documentation.html

It is very unusual that all components show errors - are you sure you don't have an environmental problem? Temperature, power, grounding?
BR788072
Occasional Contributor

Re: EMA12000 with OpenVMS7.3-1

Hi Uwe.
I'm impressed. Thanks for the link to the documentation.

Now, the 'thing'.
Following installation, we had about 6 disks fail (originally 18GB instead of 36GB). All were changed to 36GB.
We then started using a rolling backup script to exercise the SAN. Tests commenced and we saw errors. We then tried a slow, careful process of substitution and elimination: changed controller, no difference; changed memory, no difference; upgraded memory, no difference. Just about everything has been tried. We were eventually persuaded that these errors did not necessarily represent something to worry about (;-)!
(Shelf was changed, switches changed, firmware upgraded - everything!)

Then we lost a disk on an archive set. On reconstruction we saw data errors - users could no longer access the data.

Now we are seeing data errors on our main disk - it's getting worrying!

The speed issue is purely subjective - connect a terminal to either controller and enter show *** - there is a noticeable speed difference.

We also have an issue with our MSL5026 becoming 'invisible' - there one day, missing the next.

(We're fairly convinced it's not environmental, considering all the other equipment in the room.)

Does that give you any further clues?
Andrew