ProLiant Servers (ML,DL,SL)
12-27-2000 04:00 PM
Unexpected reboot
We have a pair of ProLiant 5500s with 1 GB of RAM and RAID 5 on a Smart Array 3200 controller. For about a year now, one of the servers has been rebooting unexpectedly. It is running NT 4.0 and SQL 6.5 but nothing else. I was working on it once and saw it go down. There is no blue screen error. The screen goes black and I see it boot the BIOS and so on. It is on the same UPS as the other server, and only one goes down. I changed the power supply long ago. I swapped the memory with the other server to no avail. The only error in the NT log is that the shutdown was unexpected and that the Compaq array has valid data and has restored it. (Thank God for that controller.) You get the same result if you pull the plug. I ran the diagnostics and the only error I saw was array parity error 4, so I changed the controller. It now gives me array parity error 1 with the new controller (weird). This machine crashed only once every 2 months or so before we put it in production, but now it goes down every 5-10 days. It has always come back OK so far, but this cannot go on. I have contacted the software manufacturer and they assure me the software is installed elsewhere and is stable. It is hard to work on this machine because it cannot be down for more than 30 minutes (24/7/365), so I must have a plan before I can work on it. The program is critical and I fear the only answer I have is to put it on another server. Any ideas would be appreciated.
3 REPLIES
12-28-2000 04:00 PM
Re: Unexpected reboot
This sounds to me like you may have a system board problem. That would be the first thing to try. Good luck, and I know how you feel.
Later
Mike
01-30-2001 04:00 PM
Re: Unexpected reboot
If the problem is still happening, email support@compaq.com. Are there any common software or operational factors around the time of the reboots? Do they happen at various times of day? What software processes are running at the time? Based on the info here, it could be software, memory, or the system/processor board. Extended memory diagnostics can give you confidence in the memory, but they are time consuming.
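One way to approach the time-of-day question is to collect the reboot times from the event log and tally them. The following is only a minimal sketch in Python: the timestamps are made-up placeholders, the one-timestamp-per-line format is an assumption, and the function names are hypothetical. It counts reboots per hour of day and lists the gap in days between consecutive reboots.

```python
from collections import Counter
from datetime import datetime


def parse_times(lines, fmt="%Y-%m-%d %H:%M"):
    """Parse one timestamp per non-empty line and return them sorted."""
    return sorted(datetime.strptime(s.strip(), fmt) for s in lines if s.strip())


def reboots_by_hour(times):
    """Count how many reboots fell in each hour of the day (0-23)."""
    return Counter(t.hour for t in times)


def days_between(times):
    """Gap in days (one decimal) between consecutive reboots."""
    return [
        round((later - earlier).total_seconds() / 86400, 1)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    # Placeholder data, NOT taken from the server in this thread.
    sample = [
        "2000-11-02 02:10",
        "2000-11-09 02:05",
        "2000-11-15 14:30",
    ]
    times = parse_times(sample)
    print("Reboots per hour of day:", dict(reboots_by_hour(times)))
    print("Days between reboots:", days_between(times))
```

If most reboots cluster around one hour, that points toward a scheduled job or backup; an even spread with shrinking gaps looks more like load or hardware.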
05-22-2001 04:00 PM
Re: Unexpected reboot
We had a similar problem on a clustered server, where one node kept crashing every week or two with no bugcheck. The frequency seemed to go up over time, as did our usage. I also replaced the memory first, which did not fix it either. I then replaced the system board, along with the CD drive AND cable. The problem went away after this, but I was never 100% sure which of the 3 components was bad. I replaced the CD drive and cable because we had two CD drives go bad in the same timeframe as our problems. This was the primary node for a 24/7 Active-Active cluster, so I was more interested in fixing it than in finding the exact component that was bad. Compaq is good about giving parts, so I would get them to send a new system board on warranty to see if that fixes the issue. If not, look for a cable, etc. that may have been pinched. You do have to fault yourself partly for allowing a server that crashes to go into production. Good quality control would have kept this from happening.
Other notes:
-A memory leak in the software can cause this issue. We found a leak in SQL 7 a couple of years ago which did just this, but then again, running over 10K databases was a little dicey at best. You can detect memory leaks by using perfmon and other tools to look at memory usage, working sets, etc. All companies that have these leaks seem to deny their existence until blatantly obvious proof is thrown in their face.
-Replacing the system board should only take about 10 minutes if you time it right. I would drop the server at 2 AM if needed, and replace the board after doing a dry run on another identical server if possible. Other options are to look through the server hardware manual first, or to schedule the Compaq rep to replace it for you. After a system board install, you have to let the Compaq utilities do hardware discovery again. This should not cause any issues if all the parts were placed back in their original slots. Compaq has done a good job on their design to allow for quick replacement of system boards, etc.
-If this is a real 24/7/365 server, then it should be clustered, mirrored or hooked up to a load balancer. If a company expects 24/7 from IT, then they need to invest in the equipment which will make this realistic. One server is not a 24/7/365 solution, regardless of OS, manufacturer, etc. Hardware fails, IT people make mistakes, all software has bugs. Real 24/7 always means redundancy.
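On the memory leak note, here is a rough sketch of one way to turn logged perfmon samples into a number you can show a vendor. It assumes you have already exported a counter such as a process working set into (seconds elapsed, bytes) pairs; the sample values below are placeholders and the function name is hypothetical. It fits a least-squares line through the samples and reports the growth rate. A steady positive slope under steady load is a hint worth investigating, not proof of a leak.

```python
def growth_rate(samples):
    """Least-squares slope of value over time.

    samples: list of (seconds_elapsed, bytes_used) pairs.
    Returns bytes per second; positive means usage is trending up.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator


if __name__ == "__main__":
    # Placeholder samples, NOT measurements from any real server:
    # one reading per hour over four hours.
    samples = [
        (0, 200_000_000),
        (3600, 203_000_000),
        (7200, 207_500_000),
        (10800, 210_000_000),
        (14400, 214_000_000),
    ]
    rate = growth_rate(samples)
    print(f"Working set growth: {rate * 3600 / 1_000_000:.2f} MB per hour")
```

Comparing the slope across several days, or across restarts of the service, helps separate a real leak from normal cache warm-up.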