
Sandro Schaer_1
Advisor

complete power failure in data center

Hi folks,

This is just a true story that happened a few weeks back. No questions to answer. Just sit back and enjoy...


We recently had a power failure of the complete data center. It houses mainframes, lots of Unix and NT servers, and a few storage systems. Not a single device was running! Except for the lights, everything was without power. The cause was an overvoltage that made one of the huge uninterruptible power supplies fail. That in turn overloaded the other one and tripped just about every circuit breaker.

Now for the downtimes.

- The old NCR mainframe was down for almost two days. The main reason was the disk drives: some of them are almost 20 years old and simply would not start. Some network multiplexers hadn't been shut down for more than 20 years...

- Of the countless Windows servers, a few were fried completely. Unfortunately these included the primary and secondary DNS... Along with the network devices, this part of the data center was not 100% working for 8 hours.

- NCR Unix had some issues with filesystems. Downtime: 5 hours.

- HP Unix (13 servers, 3 va74x0, 1 FC60, 1 AutoRAID, a couple of FC switches, MCSG clusters) was up and running 42 minutes after the 'big bang'! None of the 40+ Oracle databases was seriously damaged. Only one power supply of a va7400 died. It was replaced in less than 1 hour, and this includes delivery time! (Unfortunately none of the HP-UX based applications could be accessed due to network problems.)


What did we learn from it?
- Really good admins are priceless.
- Keep any users out of the data center, they just make you nervous.
- HP support is amongst the best available.
- HP products don't fail as often as NCR, Dell, Compaq.
- Enjoy the situation. It's one of the rare chances to prove you're worth your money...


Hope this never happens to any of you. It's worse than hell.
Pete Randall
Outstanding Contributor

Re: complete power failure in data center

Sandro,

I agree with all your conclusions and have only one comment to add concerning "- keep any users out of the data center, they just make you nervous": *most* managers are users!


Pete

harry d brown jr
Honored Contributor

Re: complete power failure in data center

Sandro,

Out of curiosity, what models of NCR are you running?

It's not surprising that older disk drives and power supplies failed after running for years. We used to experience that quite often, which is why our new data center models use SAN disk (EMC), redundant power paths (3 of them), dual UPSs, diesel generators, failover capability to the other data center, and of course security that prevents "users" from even visiting the data centers.

live free or die
harry
Sandro Schaer_1
Advisor

Re: complete power failure in data center

NCR Unix runs on two 3550 systems. Each has two 90 MHz Pentium processors, 128 MB of memory, and roughly 20 GB of disk (largest disk is 2 GB)...

The NCR mainframe is a 9844 with ap/dsp4. Some of the disks are 6099; the others I'd have to check. Largest disk is 700 MB...

The multiplexers mentioned are of type NCR 721.


Veeeeeeeery old but mostly reliable equipment.
Rita C Workman
Honored Contributor

Re: complete power failure in data center

Want another laugh...

We built our DR site (w/ EMC technology & new RP8400s). Well, on the day we had everyone slated to come in and test our wonderful new DR plan with our SRDF sync'd disks and our big new Continental Cluster... we had a disaster.
No... honest... a real disaster. At our Disaster Recovery site, no less.

Seems that with all the flooding, the power company decided to just drop all power in that area to fix something or other... and poof... no power. Mgmt had gone with a mammoth UPS unit that apparently held everything up for several hours, but then died. Servers, disk arrays, the whole SAN on that side... DEAD. (Yes, they are looking into putting in a generator out back now, hopefully.) So even though power came back on, nothing came back up because the UPS hadn't been 'reset'.
My Director thought I was playing a joke when I called him at home.

Fortunately, the power was back up before folks got in, and after a few nervous moments with the disk array everything went off without a hitch.

What a way to prove to mgmt that they need to remember that the guy with the pliers on the utility pole down the block can stop a system too !!! tee hee

Lesson learned:
Always come in early....always !!!

Rita
John Bolene
Honored Contributor

Re: complete power failure in data center

At both our primary and backup sites, we have one massive UPS system that holds the critical parts of the building and computer center for 10 minutes. The massive room full of batteries is staggering.

Each center also has 4 diesel generators; 3 are needed to run the load. These generators start up immediately when the center goes on UPS and take 3 minutes to stabilize. They are then cut into the power feed.
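
For anyone who wants to run the same back-of-the-envelope check on their own site, here is a rough sketch of the arithmetic behind the figures above: how much battery time is left once the generators are ready, and how many generators can fail to start before the load is no longer covered. The numbers are the ones from this post; the function and variable names are made up for illustration, and it assumes all generators are the same size and start at the moment the UPS takes over.

```python
# Figures quoted in the post; names below are hypothetical, for illustration only.
UPS_HOLDUP_MIN = 10        # UPS carries the critical load for 10 minutes
GEN_STABILIZE_MIN = 3      # generators need 3 minutes before being cut in
GENERATORS_INSTALLED = 4
GENERATORS_REQUIRED = 3    # number needed to run the load


def ups_margin_minutes(holdup_min, stabilize_min):
    """Battery minutes remaining once the generators are ready to take the load.

    A negative result means the batteries run out before the generators are ready.
    """
    return holdup_min - stabilize_min


def tolerable_generator_failures(installed, required):
    """How many generators may fail to start while the load is still covered."""
    return max(installed - required, 0)


if __name__ == "__main__":
    margin = ups_margin_minutes(UPS_HOLDUP_MIN, GEN_STABILIZE_MIN)
    spare = tolerable_generator_failures(GENERATORS_INSTALLED, GENERATORS_REQUIRED)
    print("battery margin after generator start (min):", margin)
    print("generators that may fail to start:", spare)
```

With the numbers above that works out to 7 minutes of battery margin and one spare generator, which is only a paper estimate; real batteries age and real loads grow.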

We have one week of diesel fuel in the tanks, a few thousand gallons.

We are sometimes notified by our power company during the summer that we need to run the generators as they need the extra capacity for utility customers. Because of this arrangement, we get a special price for the electricity we use from them.

It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com