RX4640 freezes randomly
Integrity Servers
03-23-2011 07:46 AM
Hi, I am a newcomer to this forum, so please forgive me if I am asking in the wrong place. Here is the issue:
We have an RX4640 server running HP-UX 11 in our lab. Since last year it has been freezing randomly; the interval between freezes ranges from a couple of days to a few weeks. When it "freezes", the server is powered off. I have to restart it with the iLO MP command-line command (PC-->ON) to bring it back online. Sometimes (very rarely) that doesn't work, and I have to physically unplug the power cords from the server and plug them back in to make it reboot. After the server is powered up, all applications and databases work just fine.
The server's firmware info is as follows:
MP FW : E.03.30
BMC FW : 04.05
EFI FW : 05.48
System FW : 04.21
I recently noticed that whenever the server froze, fatal alerts were logged in the system event log. The following is the log record for the latest freeze (obtained from MP-->SL):
Log Entry 532: 22 Mar 2011 22:10:55
Alert Level 7: Fatal
Keyword: Type-02 127008 1208328
System shut down or reset caused by sensor reading
Logged by: Baseboard Management Controller;
Sensor: System Event - 5V
Data2: OEM Code2: 0xC3
0x204D891E6F022660 C300A870C3120300
I also viewed the logs in the OS log directory with the command "/usr/sbin/diag/contrib/slview -f /var/stm/logs/os/fpl.log.02". However, it showed two log entries for this freeze instead of just one:
Log Entry 8402:
Alert Level 7: Fatal
Keyword: SHUTDOWN_OR_RESET_ON_SENSOR
System shut down or reset caused by sensor reading
System shut-down or reset caused by sensor reading.
Logged by: Baseboard Management Controller
Data: 0x204d891e6f022660 0xc300a870c3120300
Tue Mar 22 22:10:55 2011
Generator: Baseboard Management Controller
Sensor Type: System Event
Sensor Number: 195
Cause: A sensor reading in the system was determined to be non-recoverable and the system was shut down or reset.
Action: Read the system logs to find which sensor was out of range.
Log Entry 8401:
Keyword: IPMI Type-02 Event
Logged by: Baseboard Management Controller
Data: 0x204d891e6f022650 0x76255401c3020300
Tue Mar 22 22:10:55 2011
Generator: Baseboard Management Controller
Sensor Type: Voltage
Sensor Number: 195
Cause/Action : No information available.
While inspecting previous logs I found similar fatal alerts around the time of each freeze, but they came from different "sensor numbers" (such as 97, 205, etc.).
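For anyone cross-referencing the two views: the raw data words in the entries above seem to carry both the timestamp and the sensor number. Below is a minimal Python sketch of that decoding. The byte layout is an assumption inferred only from the entries quoted in this thread, not from HP documentation, and `decode_event` is a made-up helper name.

```python
from datetime import datetime, timezone


def decode_event(word1, word2):
    """Decode the two 64-bit data words of one event-log entry.

    Assumed layout (inferred from the entries in this thread only):
      word1, bytes 1..4 -> big-endian Unix timestamp in seconds
      word2, byte 4     -> sensor number (same value as "OEM Code2")
    """
    b1 = int(word1, 16).to_bytes(8, "big")
    b2 = int(word2, 16).to_bytes(8, "big")
    seconds = int.from_bytes(b1[1:5], "big")
    # Interpreting the seconds as UTC reproduces the time the log viewer printed.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return when, b2[4]


# Data words of Log Entry 532 / 8402 as shown above.
when, sensor = decode_event("0x204D891E6F022660", "C300A870C3120300")
print(when.isoformat(), sensor, hex(sensor))
```

Under this reading, the hex "OEM Code2: 0xC3" from MP-->SL and the decimal "Sensor Number: 195" from slview are the same value, so sensor 97 would show up as 0x61 and sensor 205 as 0xCD.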
My questions are:
1) What are these sensors? Is there a document that tells what each sensor number represents?
2) Given these alerts, does anyone know what exactly triggered the system shutdown?
3) Is there a quick way to fix it? (This server is out of warranty, and we don't have much budget to replace parts.)
4) If there is no quick or easy fix, is there a way to set the server to reboot automatically each time it freezes (shuts down)? (Say, using MP command scripts, if there is such a thing.)
Thanks in advance for your advice!
- Leon
3 REPLIES
03-23-2011 12:37 PM
Solution
With the information you have provided, there does seem to be a problem with voltages. From these particular alerts I can see a problem with the 5VDC Voltage Rail. This in particular could be a result of a faulty Voltage Regulator Module. If other voltage rails are reporting problems, then it could be a faulty system board or even power supply.
>Log Entry 532: 22 Mar 2011 22:10:55
>.
>.
>Sensor: System Event - 5V
Check your logs in further detail. If each Sensor refers to the 5V rail, then I would suspect that the 5V VRM on the system board is faulty. This could mean replacing the entire system board (unless you can find that VRM module; there are three installed in total: 5V, 12V, and 3.3V).
To answer your other questions:
> What are these sensors?
The system has circuitry called the Baseboard Management Controller (BMC). This is responsible for monitoring and reporting the base functions of the system, such as voltages.
> is there a way we can set the server to automatically reboot after each time
I don't believe so. Shutting down is how the server tries to prevent further damage from occurring. Continuing to operate with voltages going out of range could lead to further damage.
03-23-2011 01:57 PM
Re: RX4640 freezes randomly
Robert,
Thanks for the insights. They are very helpful. I did check the other alerts, and I saw that another sensor (#97) also complained about the 5V voltage being out of range. There were also several fatal alerts in which sensor number 205 complained about "1.5V MR PwrGood", such as the log record below:
Log Entry 545: 23 Mar 2011 15:01:24
Alert Level 7: Fatal
Keyword: Type-02 127008 1208328
System shut down or reset caused by sensor reading
Logged by: Baseboard Management Controller;
Sensor: System Event - 1.5V MR PwrGood
Data2: OEM Code2: 0xCD
0x204D8A0B44022750 CD00A870CD120300
I have no idea what this log means except that it is a fatal alert. I wonder if there are any other logs (in addition to the Event Log) I should look into to find more information...
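To check whether every fatal entry points at the same rail or at several, the saved MP-->SL output can be tallied by sensor name. The following is a rough Python sketch, assuming the log text has been captured to a string in the format shown in this thread. `fatal_sensors` is a hypothetical helper, and the sample text holds only the two entries quoted here.

```python
from collections import Counter

# The two fatal entries quoted in this thread, as printed by MP-->SL.
SAMPLE_LOG = """\
Log Entry 532: 22 Mar 2011 22:10:55
Alert Level 7: Fatal
Keyword: Type-02 127008 1208328
System shut down or reset caused by sensor reading
Logged by: Baseboard Management Controller;
Sensor: System Event - 5V
Data2: OEM Code2: 0xC3
0x204D891E6F022660 C300A870C3120300
Log Entry 545: 23 Mar 2011 15:01:24
Alert Level 7: Fatal
Keyword: Type-02 127008 1208328
System shut down or reset caused by sensor reading
Logged by: Baseboard Management Controller;
Sensor: System Event - 1.5V MR PwrGood
Data2: OEM Code2: 0xCD
0x204D8A0B44022750 CD00A870CD120300
"""


def fatal_sensors(log_text):
    """Count how often each sensor name appears in Alert Level 7 entries."""
    counts = Counter()
    fatal = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Log Entry"):
            fatal = False  # a new entry starts; its alert level is not known yet
        elif line.startswith("Alert Level 7"):
            fatal = True
        elif fatal and line.startswith("Sensor:"):
            counts[line.split(":", 1)[1].strip()] += 1
    return counts


for name, count in fatal_sensors(SAMPLE_LOG).most_common():
    print(count, name)
```

If only the 5V sensor shows up, that fits the 5V VRM suspicion; a mix of rails, as in these two entries, fits the system board or power supply suggestion from the earlier reply.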
03-23-2011 01:58 PM
Re: RX4640 freezes randomly
Robert,
Thanks for the insights. They are very helpful. I did check the other alerts, and I saw that another sensor (#97) also complained about the 5V voltage being out of range. There were also several fatal alerts in which sensor number 205 complained about "1.5V MR PwrGood", such as the one below:
Log Entry 545: 23 Mar 2011 15:01:24
Alert Level 7: Fatal
Keyword: Type-02 127008 1208328
System shut down or reset caused by sensor reading
Logged by: Baseboard Management Controller;
Sensor: System Event - 1.5V MR PwrGood
Data2: OEM Code2: 0xCD
0x204D8A0B44022750 CD00A870CD120300
I have no idea what this log means except that it is a fatal alert. I wonder if there are any other logs (in addition to the Event Log) I should look into to find more information...
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2024 Hewlett Packard Enterprise Development LP