GS80 fails due to high temperature - how to set threshold?
10-25-2005 01:00 AM
[I've sent this to the servers forum too, but maybe not everyone follows it.]
Whenever the temperature in the data center rises above 29*C, one of our two GS80s fails.
Q1: I'm not sure why only one of them fails, although this one is more heavily loaded?
Q2: Also, is there a way to see/set temperature thresholds for the QBBs and PCI drawer?
As far as I can see, the SCM console only shows the current temperature.
Q3: Currently the QBB/PCI temperature is around 30*C. Is this OK or is it high?
I've read through the "AlphaServer Management Station User's guide" - there is a note that "The warning limits are not user-configurable." (regarding temperature).
Q4: So here is another question: it seems that this software is not installed on my GS80s. Where can I find it? I can't find an "installation guide" for AMS on the HP site.
Thanks in advance for your comments (which will be appreciated :-) )
10-25-2005 01:10 AM
Re: GS80 fails due to high temperature - how to set threshold?
you can see the current temperature via
sysconfig -q envmon
See man envconfig
greetings,
Michael
10-25-2005 01:25 AM
Re: GS80 fails due to high temperature - how to set threshold?
Here's another question then:
jerry:root# envconfig -q
ENVMON_CONFIGURED = 1
ENVMON_GRACE_PERIOD =
ENVMON_MONITOR_PERIOD =
ENVMON_HIGH_THRESH = 50
ENVMON_USER_SCRIPT =
jerry:root#
According to the above, envmon is configured, but on my GS80s it does not start:
"Environmental Monitoring Daemon did not start...trying again"
So if it is not running, envmond could not have shut down the server when the temperature threshold was exceeded. Is that right?
So the server shuts down by itself?
10-25-2005 01:56 AM
Re: GS80 fails due to high temperature - how to set threshold?
are there any relevant error messages in /var/adm/messages?
If so, please post them.
Michael
10-25-2005 02:08 AM
Re: GS80 fails due to high temperature - how to set threshold?
Regarding envmond, during server startup:
"Environmental Monitoring Daemon did not start...trying again", three times.
10-25-2005 02:17 AM
Re: GS80 fails due to high temperature - how to set threshold?
In some versions, i.e. configurations with V5.1A + PK#3/PK#4/PK#5 (you did not mention what you have), envmond does not start if "community public" has been removed from /etc/snmpd.conf and/or if the system is not configured as a DNS client (i.e. nslookup exits unsuccessfully).
If that is not the issue you can debug the envmon startup as follows:
cd /usr/sbin
cp -p envmond envmond.orig
vi envmond (and comment out the line Env_Daemonize)
(see extract below between --------)
---------------------------------
proc envmon_main {} {
# Env_Daemonize
global SYSMANUI
set SYSMANUI cli
---------------------------------
Then run envmond interactively as follows:
envconfig stop
/usr/sbin/envmond -ui cli
In the output on your terminal, you might see things like:
No response from server
while executing
"exec /bin/nslookup $cluAlais "
(procedure "getLocalIPv4List" line 15)
invoked from within
"getLocalIPv4List "
(procedure "getCommunityString" line 52)
invoked from within
"getCommunityString "
(procedure "envmon_main" line 54)
invoked from within
"envmon_main"
(file "/usr/sbin/envmond" line 852)
__ Johan /.
_JB_
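The two preconditions described above could be checked before starting envmond with something like the following. This is only a rough sketch: the function names are made up for illustration, it assumes a POSIX grep, and it assumes that the exit status of nslookup reflects whether the lookup worked.

```shell
#!/bin/sh
# Sketch only: check the two preconditions named in the post above.
# has_public_community and dns_client_works are hypothetical helper names.

# Succeeds if FILE (default /etc/snmpd.conf) has an uncommented
# "community public" line.
has_public_community() {
    conf=${1:-/etc/snmpd.conf}
    grep -E '^[[:space:]]*community[[:space:]]+public([[:space:]]|$)' "$conf" >/dev/null 2>&1
}

# Succeeds if nslookup can resolve NAME (default: this host's name).
# Assumes nslookup returns a non-zero exit status on failure.
dns_client_works() {
    name=${1:-$(hostname)}
    nslookup "$name" >/dev/null 2>&1
}
```

For example, `has_public_community || echo "community public missing"` followed by `dns_client_works || echo "nslookup failed"` would point at either cause before going into the debug procedure.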
10-25-2005 02:47 AM
Re: GS80 fails due to high temperature - how to set threshold?
I have Tru64 5.1 with PK5, and here's what I found following your suggestions. There is no output when envmond starts :-( :
jerry:root# grep public /etc/snmpd.conf
community public 0.0.0.0 read
jerry:root# grep Dae envmond
# Env_Daemonize
jerry:root# envconfig stop
jerry:root# envmond -ui cli
jerry:root# ps -Af | grep env
root 525311 524289 0.0 Oct 24 ?? 0:23.12 /usr/openv/volmgr/bin/ltid
root 919597 902628 0.0 16:37:19 pts/3 0:00.01 grep env
10-25-2005 05:01 PM
Solution
Are you sure it was an environmental event that brought it down? Please paste a console excerpt, or suggest other evidence. Any messages output by the firmware before taking your system down will not show up in the messages file, and must be copied from your personal console logs.
On the GS80-class system, the thermal shutdown threshold is 50C, though I think the firmware may take other action at 45C. If you are saying your lab temperature was near 30C, then maybe your internal system temperature was near 45C, and maybe the rug was properly pulled out from under you. I cannot explain why envmond did not do it for you, but it should have.
FYI, at the SCM prompt, "show status" will show you the last alert, which may give you more info on the environmental event that brought the system down. ...or it may simply tell you that you pulled the plug last week (or whatever).
As for getting envmond to run, one reason it will fail to start is if /var/run/envmond.pid is lying around and stale, though I'm not sure how this could happen. Check for this file and delete it if found.
It is possible that, if a valid thermal shutdown was initiated by firmware (and envmond was somehow unable to let you down gracefully), the envmond script did not have time to delete its pid file, and won't start now.
Also, I believe envmond can be run in debug mode:
envconfig stop
rcmgr set ENVMON_DEBUG 1
envconfig start
This may give you some output in syslog's user.log.
(note that this mode is unsupported, so do not run in production like this, though I really don't see much harm except chatty log files)
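The stale pid file check mentioned above could be scripted roughly as follows. This is a sketch under assumptions, not something shipped with Tru64: pidfile_is_stale is a made-up helper name, and it assumes the pid file holds a plain numeric process ID on its first line. Run it as root, since kill -0 also fails for processes owned by another user.

```shell
#!/bin/sh
# Sketch only: report whether a pid file points at a process that no longer
# exists. pidfile_is_stale is a hypothetical helper, not a Tru64 command.
# Assumes the first line of the file is a numeric PID.

pidfile_is_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1        # no pid file, so nothing is stale
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) return 0 ;;         # empty or non-numeric: treat as stale
    esac
    if kill -0 "$pid" 2>/dev/null; then
        return 1                         # process exists: pid file is live
    fi
    return 0                             # no such process: pid file is stale
}
```

A possible use would be `pidfile_is_stale /var/run/envmond.pid && rm /var/run/envmond.pid`, followed by `envconfig start`.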
10-25-2005 07:16 PM
Re: GS80 fails due to high temperature - how to set threshold?
First of all, you're right about /var/run/envmond.pid. Removing it solved the envmond issue :), and just for the record:
jerry:root# envconfig stop
jerry:root# rcmgr set ENVMON_DEBUG 1
jerry:root# envconfig start
Environmental Monitoring Daemon did not start...trying again
Environmental Monitoring Daemon did not start...trying again
Environmental Monitoring Daemon did not start after 3 tries.
jerry:root# rm /var/run/envmond.pid
jerry:root# envconfig start
Environmental Monitoring Daemon started.
"show status" displays no alerts, but alerts were not enabled. So I've issued "enable alert" and now I'll wait for another failure.
I'll post any new info regarding this.
Again thanks to all folks who spent some time on this thread !
11-02-2005 08:25 PM
Re: GS80 fails due to high temperature - how to set threshold?
The temperature in the data center got high again and the server failed again.
For the record GS80 is composed of two QBBs:
QBB0 with 2 CPUs and 2 Memory modules
QBB1 with 4 CPUs and 4 Memory modules
Unfortunately on scm console there is no alert :-(.
However "show system" on scm shows CPU 0 and QBB backplane on QBB0 as faulted:
Par hrd/csb CPU  Mem  IOR3 IOR2 IOR1 IOR0 GP  QBB Dir PS  Temp
    QBB#    3210 3210 (pci_box.rio)       Mod BP  Mod 321 (°C)
(-) 0/30    --pf --pp --.- --.- P0.1 P0.0 P   f   P   PPP 25.5
(-) 1/31    PPPP PPPP --.- --.- --.- --.- P   P   P   PPP 25.0
But after 2-3 hours, when temperature gets lower, GS80 boots ok, and "show system" output is without faults.
Do you have some opinion regarding this?
10x to all,
Regards
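To compare "show system" output between a failure and a later clean boot, a saved console log can be scanned for faulted fields. The following is only a sketch: find_faults is a made-up name, and it assumes that data rows start with "(", that the columns come in the order of the header pasted above, and that a lowercase "f" marks a fault, which is how the output is read in this post.

```shell
#!/bin/sh
# Sketch only: list faulted fields in saved SCM "show system" output.
# find_faults is a hypothetical helper. Assumptions: data rows begin with "(",
# fields follow the header order from the post, and "f" means faulted.

find_faults() {
    awk '
    substr($0, 1, 1) == "(" {
        # Column names in the order of the pasted header.
        split("Par QBB CPU Mem IOR3 IOR2 IOR1 IOR0 GP-Mod QBB-BP Dir-Mod PS Temp", name, " ")
        # Fields 3..12 are status columns; field 13 is the temperature.
        for (i = 3; i <= 12 && i <= NF; i++)
            if (index($i, "f") > 0)
                printf "QBB %s: %s = %s\n", $2, name[i], $i
    }' "$@"
}
```

It reads from the files given as arguments, or from standard input when none are given, for example `find_faults console.log` where console.log is whatever file the console session was captured to.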