Operating System - OpenVMS

GS160 failures

Wim Van den Wyngaert
Honored Contributor

GS160 failures

I'm in an asking mood today.

I have a GS160 with 2 QBBs, each running a VMS node. There is dual everything.
My question is: what can go wrong in one QBB that brings the other down too? E.g., can an electrical problem in QBB1 cause a failure in QBB2?
Wim
11 REPLIES
Åge Rønning
Trusted Contributor

Re: GS160 failures

Fatal errors or corrupted packets on the global switch will take all partitions down, if I remember correctly.
VMS Forever
Mike Naime
Honored Contributor

Re: GS160 failures

Power problems can take you down.

Having to replace a failed PCI card in the old 320 required that I power cycle the entire 320 box before that QBB would come back online. This is one of the reasons that we do not use the GS160/320 for production systems.

Do you have shared memory between the QBB's? Or are you Hard Partitioned with no shared memory?
VMS SAN mechanic
Wim Van den Wyngaert
Honored Contributor

Re: GS160 failures

No shared memory.
Once we had to bring it down because a tape drive had to be switched together with its local SCSI.
Wim
Uwe Zessin
Honored Contributor

Re: GS160 failures

A GS-class system is not a fault-tolerant system! There used to be a problem if one had to replace a defective CPU in QBB0 (of course, CPUs _always_ failed there...), which made it necessary to power down the whole box - don't know if this is fixed, now.
.
Denver Technology
Occasional Advisor

Re: GS160 failures

Hi Wim
As Mike has already said, power can be an issue. Unlike the Marvel GS1280, the GS160/320 has only a single power rail for all QBBs/fire boxes, which I think was a major design oversight. Losing a PDU will bring all partitions down. And our GS320 IS running our production system, which is why we have a GS320 DR system as well!
Keith Parris
Trusted Contributor

Re: GS160 failures

In addition to a subset of the possible hardware failures, firmware upgrades also require all partitions to be down at once.
Orrin
Valued Contributor

Re: GS160 failures

Hi,

Uncanny that you should ask the question today; just last week (Thursday) we upgraded our GS160 and added another partition.
On Friday the system crashed and we had no idea what the problem was. We logged a call with HP, and the engineer came out here and did some checking.

The error message we got was that there is an error with CPU0 in QBB 0, and it takes the entire system down.

After about two hours of fiddling around, the answer we got was that the firmware levels in the two QBBs did not match, so he did an upgrade so that all firmware and microcode levels were the same.

Second, he mentioned that a missing cable (the cable that goes from the PCI bus to the display on the front panel) might be the problem, so he swapped the status of the two PCI shelves, i.e. master to slave.

After all this he was still not sure that the problem was fixed. We have the system up and running; it is a test box so we're not too fussed, but we plan to add a partition on the production system soon.

Thanks, hope this helps.

Orrin
Mike Naime
Honored Contributor

Re: GS160 failures

From my point of view, if you do not require the larger horsepower of the combined QBB's, then the 160/320 are a waste of power and floor space.

When hard partitioned, the 160 gives me the equivalent of one rack of ES45's. The 160 takes up 3.5 floor tiles, where the rack of 45's is one floor tile.

Our 320 is hard partitioned into 8 QBBs (4 CPUs with 8 GB RAM each), the equivalent of 2 racks of ES40s, but it uses 5 floor tiles.

Like Keith pointed out, if you are servicing the box to upgrade firmware, you take it all down. This is what makes it a bad "production" box for me. Also, maintenance and licensing on the GS cost 10X that of equivalent ES4x boxes. Another reason not to use it.

So far, replacing CPU's and Memory in the 160/320 has not taken down the rest of the system.
VMS SAN mechanic
Wim Van den Wyngaert
Honored Contributor

Re: GS160 failures

I fully agree with you, Mike.

But it was management that decided to buy this one ... with only 1 dual-CPU per QBB. It is ready for expansion ... while the platform is in phase-out.
Wim
Jan van den Ende
Honored Contributor

Re: GS160 failures

Like most (all?) management: penny-wise & pound-foolish.
And they listen better to sales people than to their own techies... (probably because we don't buy them rich lunches, or golf parties, or ...? ;-(

I know your feeling, I sympathise, but I'm afraid that's not much help.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Keith Parris
Trusted Contributor

Re: GS160 failures

I know of quite a few GS160s and GS320s which are partitioned into multiple nodes, typically with 1 QBB per node.

Why would one do this instead of using a rack of ES40/45s? One reason is I/O: the ES40/45 has a limited number of I/O slots, and the GS160/320 can have many more. If you require 6 rails of CI, for example, with 2 PCI slots per CIPCA, plus maybe a couple of LAN adapters, that exceeds the 10 slots available in an ES40.

Another reason is Galaxy. You can use shared memory as a cluster interconnect between nodes in a GS160/320, with lower latency than you can get with Memory Channel. Galaxy shared memory can also be used as an IP network, with lower latency and higher bandwidth than Gigabit Ethernet. And you can shift CPUs around between Galaxy instances to handle varying workloads.
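To illustrate that last point: CPU reassignment between Galaxy instances can be done online from DCL (or graphically via the Galaxy Configuration Utility). A rough sketch from memory, where the instance name GLX2 and CPU number 4 are made-up examples; verify the exact syntax against the OpenVMS Galaxy documentation for your version:

```
$ SHOW CPU                      ! list the CPUs this instance currently owns
$ STOP/CPU/MIGRATE=GLX2 4      ! hand CPU 4 over to Galaxy instance GLX2
```

The receiving instance picks the CPU up without a reboot of either instance, which is what makes this useful for shifting capacity between workloads during the day.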