
Re: Very strange power issue - DL380 G2

 
Russel Olinger
Occasional Contributor

Very strange power issue - DL380 G2

I definitely have a strange issue.

I was given a DL380 G2 with 6 GB of memory and six 72 GB drives configured as RAID 5 with a spare (4+1+1, if you will). It ran fine at a data center for 2.5 years. No problems. I bring it home, reformat the drives and install Windows 2003 Server. I spent about a week getting the whole thing prepped - it stayed online just fine.

I move it down to a new data center and within hours the thing hangs. It continues to hang randomly, after as little as 30 minutes or as long as 8 hours - but eventually it does hang.

I change out the memory, check BIOS settings, Windows settings - everything looks good. I bring the server home, plug it in to work on it, and it stays up without issue for 7 days before I bring it back to that data center again. Within hours of being back in the data center it hangs. I suggest they have a power problem: the power is spiking or sagging, and it's causing the machine to hang.

And when I say hang, I mean the server is still on and running, but Windows just locks up. Not even the blue screen of death - it simply freezes on whatever screen was last up. Mouse and keyboard don't work, the monitor starts doing weird things, the NICs stop working, etc. But it's not power cycling - that has to be done manually. I ran system-level diagnostics and rebuilt the machine with a clean build of Windows 2003 with all the patches, but did not load any of my apps or data onto the server.

I bring it back into that colocation data center and within hours it hangs again. I ask them to move the power cables from the cabinet power to standard wall power using extension cords. They do, and the server runs fine for another 6 days without a single lockup.

So HA! I think we have verified the source as their cabinet power. They are running 110 V power on a 20 A circuit to each cabinet. It feeds a UPS, which then feeds a power strip - my server and other servers are plugged into the strip. I tell them either the cabinet power is spiking, the UPS is spiking or the strip is bad. They move it to another cabinet and the same thing occurs. It locks up.

Both cabinets they had my server in had identical power setups - what are the odds both setups were bad? In addition, each of those cabinets had plenty of other servers (from other companies) in them. Mine was the only HP, but it was also the only one hanging. No other customer was having power-related issues with their equipment.

None of this makes sense.

They point the finger at my hardware, saying it's a problem with my system and, legitimately, that they have 10 other servers in each of those racks - mine is the only one with a problem, so it must be my hardware. They also say they have over 1,000 servers connected to identical power setups in all of their cabinets and none of them are having, or have ever had, a problem like this. That argument is legit and it's hard to argue with.

But I point the finger at their power setup, legitimately, because it runs fine on their regular wall-outlet building power, it runs fine at my house and it ran fine for 2.5 years at another data center. In fact it ran cleanly at one data center for 2.5 years and at two different homes for weeks at a time. It is only at that colocation facility that the system hangs. My argument is legit and is hard to argue with.

I took the server home and have had it replaced and will be bringing the new one back into the colocation facility later this week. But I fear a repeat of what has already happened.

Has anyone heard of anything like this before, or could anyone offer an explanation as to why this was occurring? My buddies at the data center where the server ran clean for 2.5 years have only been able to say "maybe THIS server is very sensitive and just so happens to be the first one at that facility to detect power fluctuations causing it to hang". That original server is back online in a lab at the original facility I got it from. It's been online and running for almost 3 weeks now with no issue.

The difference between the first data center, the two homes and the colocation facility is that the colocation facility had a UPS in the middle of the power configuration. The UPS is a TrippLite SMART 3000RM2U. The colocation facility used three different UPSs (all identical) during the testing and there was no change; the server locked up every time.

Any thoughts?
4 REPLIES
e4services
Honored Contributor

Re: Very strange power issue - DL380 G2

Power supply? Maybe the regulated power they have is a bit lower than the wall, say 109 V vs. the wall's 112 V, and the power supply hiccups. Sounds feasible.
Russel Olinger
Occasional Contributor

Re: Very strange power issue - DL380 G2

Part of the testing ran the server on one power supply, on both, then on the other power supply. In all three scenarios the server locked up. Then we changed the power supplies and the problem still persisted.

But at the other locations (2 homes and another data center) it didn't matter if we ran it on either supply or both - it never locked up.

I put in a new server last night (same model, but new everything) and so far, since 9pm last night, the new server has stayed online. Just about 12 hours so far, this is a good sign, but I don't trust anything yet.
Joshua Small_2
Valued Contributor

Re: Very strange power issue - DL380 G2

The other thing that changes is the networking. I would try placing the server in the data center (using their power) without connecting any network cables for 48 hours and seeing what happens.
I've seen some switches create bad VLAN-related traffic which has crashed some NIC drivers. Perhaps there's some worm doing the rounds in the data center that you're not as immune to as you'd expect.

I'm clutching at straws, but then, there's not a lot to go on here.
chongkan
Trusted Contributor

Re: Very strange power issue - DL380 G2

Compare the electrical requirements with the values measured on the power strip at the data center.

http://h18002.www1.hp.com/products/quickspecs/11473_div/11473_div.HTML#Technical%20Specifications

How long does it take to hang when plugged in at the DC? You can leave the server running an unlimited number of loops of an Insight Diagnostics test to see what happens.

Is this happening with two different servers? If not, maybe the power backplane should be replaced for testing purposes.
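
On the "how long does it take to hang" question: one way to pin the time down without watching the console is a small heartbeat logger. The following is only a rough sketch, assuming Python is available on the server; the file name `heartbeat.log`, the 30-second interval and all function names are made up for illustration and are not part of any HP tool. Each beat is forced to disk, so after a lockup and manual power cycle the last timestamp in the file shows roughly when the machine froze.

```python
import os
import time
from datetime import datetime, timedelta


def write_heartbeat(path, now=None):
    """Append one ISO-8601 timestamp and force it to disk."""
    stamp = now if now is not None else datetime.now()
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp.isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a hard freeze


def read_heartbeats(path):
    """Return all logged timestamps in file order."""
    with open(path, encoding="utf-8") as f:
        return [datetime.fromisoformat(line.strip()) for line in f if line.strip()]


def find_gaps(stamps, max_gap):
    """Return (last_seen, next_seen) pairs whose spacing exceeds max_gap.

    Each pair brackets a period when the logger was not running,
    for example a hang followed by a manual power cycle.
    """
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]


def run_heartbeat(path="heartbeat.log", interval_s=30):
    """Log a heartbeat every interval_s seconds until the machine stops."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Start `run_heartbeat()` at boot, then after each lockup run `find_gaps(read_heartbeats("heartbeat.log"), timedelta(seconds=90))`. The first element of each pair is the last moment the server was known to be alive, which can be lined up against whatever logs the UPS or the colocation facility keep.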