01-31-2006 06:47 PM
Why did this server shutdown?
Yesterday one of our servers shut down during normal business hours, and we did not initiate the shutdown.
I couldn't find any hints in the syslogs, and no events that could have triggered this shutdown.
The only notable thing is that this server normally gets its power from a UPS. At the time of the shutdown the UPS was being serviced and had therefore been bypassed, so the server was running on normal external power.
All I can see in the syslog and in the logs of the Oracle database running on this server is that it was a normal system shutdown. But what caused it?
I have attached the syslogs to this thread. The file werwarda.txt also contains the head of the output of the last command. As you will see, the shutdown was initiated at 14:32 and nobody was logged on to the system at that time.
We suspected that envd had initiated the shutdown. There are two events configured in /etc/envd.conf:
OVERTEMP_CRIT:y
OVERTEMP_EMERG:y
/usr/sbin/reboot -qh
FANFAIL_CRIT:y
FANFAIL_EMERG:y
/usr/sbin/reboot -qh
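To double-check which envd events are enabled and what each one would run, the excerpt above can be parsed mechanically. This is a minimal Python sketch, not an HP tool: it assumes the layout shown in the post (EVENT:y/n lines followed by the command line that handles them), and the helper name parse_envd_conf is hypothetical. A real envd.conf may contain further sections that this sketch ignores or misreads.

```python
# Hypothetical helper; assumes the envd.conf layout quoted in this thread,
# where EVENT:y/n lines are followed by the command that handles them.
def parse_envd_conf(text):
    """Return {event: action} for every event enabled with ':y'."""
    actions = {}
    pending = []  # enabled events still waiting for their action line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, flag = line.partition(":")
        if sep and name.isupper() and flag in ("y", "n"):
            if flag == "y":
                pending.append(name)
        else:
            # Any other line is treated as the action for the preceding events.
            for event in pending:
                actions[event] = line
            pending = []
    return actions


conf = """OVERTEMP_CRIT:y
OVERTEMP_EMERG:y
/usr/sbin/reboot -qh
FANFAIL_CRIT:y
FANFAIL_EMERG:y
/usr/sbin/reboot -qh
"""

for event, action in parse_envd_conf(conf).items():
    print(f"{event} -> {action}")
```

With this configuration all four events map to /usr/sbin/reboot -qh, a quick halt that, as far as is generally documented, does not run the /etc/rc shutdown scripts.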
But I couldn't find any trace that either of these events took place.
Checking the system with SAM and STM didn't show any failing components.
This is an rp3440 running HP-UX 11.11 with one processor and 16 GB RAM. It is connected through two Tachyon FC HBAs to an EMC CX300. The CX300 logs also showed the rp3440 going down, but the other connected servers and all other components in this SAN showed no problems.
It is weird that this was the only server out of around 30 Windows and HP-UX servers that went down.
Where else can I find a hint as to what caused this shutdown?
Regards Stefan
01-31-2006 06:55 PM
Re: Why did this server shutdown?
Anything in the tombstones file, or in /var/adm/crash?
01-31-2006 07:11 PM
Re: Why did this server shutdown?
Please check (and post) the event log /var/opt/resmon/log/event.log (the file name may be event*.log; check the one with the latest modification time).
This file will show you the status of events such as over-temperature conditions.
Regards,
Shameer
01-31-2006 07:13 PM
Re: Why did this server shutdown?
On the system (through the OS):
===============================
Check whether an event was logged around the time the system rebooted:
/var/opt/resmon/log/event.log
Also check whether the system created a crash dump in /var/adm/crash (look at the latest timestamp).
Check the tombstones file /var/tombstones/ts99: open it and check the timestamp. If the timestamp matches the time the system rebooted, you need to log a call with HP to decode the tombstones file.
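The timestamp comparison can be scripted. This is a minimal Python sketch, assuming the file's last-modification time reflects when it was last written; the reboot time is taken from this thread, and the 10-minute tolerance and the helper names near_event and file_written_near are hypothetical choices for illustration.

```python
import os
from datetime import datetime, timedelta


def near_event(mtime, event_time, tolerance=timedelta(minutes=10)):
    """True if an epoch mtime lies within tolerance of event_time (naive local time)."""
    return abs(datetime.fromtimestamp(mtime) - event_time) <= tolerance


def file_written_near(path, event_time, tolerance=timedelta(minutes=10)):
    """Compare a file's last-modification time with the reboot time."""
    return near_event(os.path.getmtime(path), event_time, tolerance)


if __name__ == "__main__":
    reboot_time = datetime(2006, 1, 31, 14, 32)  # time of the unexplained shutdown
    path = "/var/tombstones/ts99"
    if os.path.exists(path):
        verdict = "near" if file_written_near(path, reboot_time) else "not near"
        print(f"{path} was last written {verdict} the reboot time")
```

A tombstone that was only rewritten at the next boot will of course show the boot time, so a match is a hint to decode the file, not proof of a hardware event.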
On the system (through the GSP):
================================
A. Check the GSP system log with the following commands:
1. Log in to the GSP.
2. Type SL for the system log.
3. Type E for system events.
4. Type T for text mode.
Check the system condition around and before the time of the reboot.
B. Check the power status with the following commands:
1. Log in to the GSP.
2. Type CM for the command menu.
3. Type PS for power status.
Hope this information helps.
Cheers,
AW
01-31-2006 07:16 PM
Re: Why did this server shutdown?
I'd check the GSP/CSP on the server. There may be a hardware problem or an HPMC logged there that indicates the cause.
A CPU could have gone offline. An HPMC forces an immediate reboot.
The other steps posted before mine are excellent and should also be followed.
You can use the GSP/CSP to TOC a system, even if /etc/rc.config.d/savecrash is not configured to save crashes.
If you get a crash dump, use q4 analysis to create a file for HP Response to analyze. These types of crashes usually result in patching.
As a general step, make sure the machine is current on quarterly patches and look for I/O hang patches. I got nailed by that a year ago: the patches were not all rated 3 stars and were therefore not in the quarterly release installed on the system.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
01-31-2006 07:31 PM
Re: Why did this server shutdown?
Have you checked the /etc/shutdownlog file?
Also try: # last reboot
Also check the /var/adm/crash directory in case a crash caused the reboot. You can also check the /var/tombstones/ts99 file for a valid CPU timestamp.
Hope this helps,
Raj.
01-31-2006 07:53 PM
Re: Why did this server shutdown?
/etc/shutdownlog states:
10:05 Thu Aug 18, 2005. Halt: (by servora!root)
17:58 Thu Aug 18, 2005. Halt: (by servora!root)
14:35 Tue Jan 31, 2006. Halt:
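Entries like these can be scanned for halts with no recorded user, which is exactly what stands out in the last line. This is a minimal Python sketch; the regular expression assumes the layout of the three /etc/shutdownlog lines quoted above and may not match every entry format HP-UX writes. The names ENTRY and parse_shutdownlog are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, based on the /etc/shutdownlog entries quoted above:
#   "10:05 Thu Aug 18, 2005. Halt: (by servora!root)"  or  "14:35 Tue Jan 31, 2006. Halt:"
ENTRY = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}) \w{3} (?P<date>\w{3} \d{1,2}, \d{4})\. "
    r"(?P<kind>\w+):\s*(?:\(by (?P<who>[^)]*)\))?\s*$"
)


def parse_shutdownlog(text):
    """Yield (timestamp, kind, who) tuples; who is None when no user was recorded."""
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # ignore lines that don't fit the assumed layout
        when = datetime.strptime(f"{m['time']} {m['date']}", "%H:%M %b %d, %Y")
        yield when, m["kind"], m["who"]


log = """10:05 Thu Aug 18, 2005. Halt: (by servora!root)
17:58 Thu Aug 18, 2005. Halt: (by servora!root)
14:35 Tue Jan 31, 2006. Halt:"""

for when, kind, who in parse_shutdownlog(log):
    print(when, kind, who or "<no user recorded>")
```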
The only entries in /var/opt/resmon/log/event.log are from the system startups in August and yesterday. They are caused by the EMC software, which puts one of the FC adapters into "non-participating" mode. There are no events around the shutdown time.
There is no entry in /var/adm/crash, and the tombstones files have the timestamps of the last system starts. Nothing has a timestamp near the shutdown time of 14:32.
The logs I could access through the web console didn't show anything I could identify as the source of this shutdown. The only thing I see in the CL is that a normal system shutdown was initiated. Please have a look at the attached Word document with the CL and SL around the time in question.
GSP>CM>PS shows everything normal (two power supplies, three fans and the temperature).
I have to admit that this server is not current on quarterly patches, so as a first step I will look for the latest patch sets.
But I can't see a crash anywhere, especially as the Oracle database shut down cleanly. That is not something I have seen in a system crash situation.
Thanks for your responses so far. I will look at the GSP again to see whether I overlooked something yesterday. I will also check for the latest patch sets.
Please be so kind as to have a look at the attached Word document and see whether you can spot something I missed.
Kind Regards
Stefan
01-31-2006 08:10 PM
Re: Why did this server shutdown?
Can you please check /etc/rc.log for any errors?
Also check your /var/adm/syslog/OLDsyslog.log.
That may provide a few leads.
Chan
01-31-2006 08:20 PM
Re: Why did this server shutdown?
It shows that it was halted at 14:35 on Jan 31:
14:35 Tue Jan 31, 2006. Halt:
But how it was halted, I'm not sure.
Can you look at /var/adm/syslog/OLDsyslog.log around this timestamp (14:35) and shortly before it, to see what activity was going on?
Hth,
Raj.
01-31-2006 08:23 PM
Re: Why did this server shutdown?
Check the cron log.
If other servers have password-free ssh access to this one, check /var/adm/syslog/syslog.log and the cron logs on those servers as well.
It looks like a normal shutdown.
Check the keyboard logs in your operations department. Maybe someone is embarrassed to admit what they did.
Check the SD-UX software logs; perhaps a swinstall with -x autoreboot=true was run.
SEP
01-31-2006 08:36 PM
Re: Why did this server shutdown?
A wild guess: did someone turn off the system while the UPS maintenance was under way? Or was there a fault on the UPS side when the UPS was bypassed and the system switched to raw mains power?
Rgds
PS: I saw the change in Chan's status to Wizard. Congrats, Chan!
01-31-2006 09:17 PM
Re: Why did this server shutdown?
The syslog states in plain German(!) that the shutdown was performed by envd, the system's physical environment daemon ("Beendigung durch Signal 15" is German for "terminated by signal 15"):
Jan 31 14:32:36 servora /usr/sbin/envd[1853]: Beendigung durch Signal 15
Hence there must have been something suspicious in the physical surroundings. Since the messages are so sparse, it looks as if a "reboot -q" occurred.
However, what speaks against this is that a "reboot -q" should not, as far as I know, execute the /etc/rc scripts, so Oracle should not have been taken down cleanly.
Still, check whether other servers at the same location reported errors relating to the physical environment.
regards,
John K.
01-31-2006 09:18 PM
Re: Why did this server shutdown?
The syslog (OLDsyslog) says that the system got a signal 15 (normal shutdown), but there is no hint of where this signal came from.
Here are the last entries of the OLDsyslog:
Jan 30 19:30:00 servora su: + tty?? root-oracle
Jan 31 11:13:46 servora rlogind[5493]: Login failure (exit(127) from login(1))
Jan 31 14:32:36 servora /usr/sbin/envd[1853]: Beendigung durch Signal 15
Jan 31 14:32:36 servora diagmond[1850]: Exit due to user requested abort
Jan 31 14:32:38 servora cimserver[1330]: HP-UX WBEM Services stopped.
Jan 31 14:32:45 servora inetd[958]: Going down on signal 15
Jan 31 14:32:45 servora rpcbind: terminate: rpcbind terminating on signal. Restart with "rpcbind -w"
Jan 31 14:32:45 servora syslogd: going down on signal 15
So the syslog says that the shutdown was initiated at 14:32 and the shutdownlog tells us that the shutdown completed at 14:35.
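That timeline can be reconstructed by filtering syslog lines into a window before the halt recorded in /etc/shutdownlog. This is a minimal Python sketch: syslog lines carry no year, so the year is passed in explicitly, and the 15-minute window and the helper names are hypothetical choices. The sample lines are copied from the OLDsyslog excerpt above.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line; syslog omits the year."""
    stamp = " ".join(line.split()[:3])  # e.g. "Jan 31 14:32:36"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def lines_in_window(lines, start, end, year):
    """Return the lines whose timestamp falls within [start, end]."""
    selected = []
    for line in lines:
        try:
            when = parse_syslog_time(line, year)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if start <= when <= end:
            selected.append(line)
    return selected


syslog = [
    "Jan 31 11:13:46 servora rlogind[5493]: Login failure (exit(127) from login(1))",
    "Jan 31 14:32:36 servora /usr/sbin/envd[1853]: Beendigung durch Signal 15",
    "Jan 31 14:32:45 servora syslogd: going down on signal 15",
]
halt = datetime(2006, 1, 31, 14, 35)  # "Halt" entry from /etc/shutdownlog
window = lines_in_window(syslog, halt - timedelta(minutes=15), halt, 2006)
first = parse_syslog_time(window[0], 2006)
print(len(window), "lines;", halt - first, "from first shutdown message to halt")
```

Here the gap between the first shutdown message and the recorded halt is 2 minutes 24 seconds.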
But none of the logs shows who or what started the shutdown process.
No software or patch installation took place at that time, and the cron logs of this and the other servers didn't show anything useful either.
Fortunately this was a clean shutdown and not a crash, so there was no damage to the database or any data. All we lost was the time it took to figure out what happened, to confirm that there was no hardware problem, and to restart the system.
At the moment I suspect it is somehow connected to the UPS maintenance, and we are investigating in this direction.
@Steven:
I wouldn't suspect my colleagues, but it is true that the necessary password is known to several people. Normally, though, such a "mistake" should leave a trace in some log or shell history. I checked wtmp (werwarda.txt in the first attached file) and the .sh_history of the root account, but found no hint of this, either here or on the other HP-UX servers.
Following this road would mean that either one of the other system admins is terribly misusing his knowledge or that we have a hacker somewhere in the company. And this person must be much better and faster than me to have left no trace so far.
And I really, really hope we don't have a problem like this!
BTW: How are you doing in Israel?
Kind Regards
Stefan
01-31-2006 10:06 PM
Re: Why did this server shutdown?
@John:
As far as I know, this message only means that envd received a signal 15 and is going down. It is a translation of the original message "Going down on signal 15".
I think this is the result, not the source, of the signal 15.
And I agree that the clean shutdown of the Oracle database speaks against a reboot -q. So I don't suspect envd, as it is configured to do a reboot -qh.
Kind Regards
Stefan
01-31-2006 10:14 PM
Re: Why did this server shutdown?
What do you think about assigning some points now?
Best Regards,
Raj.
01-31-2006 10:27 PM
Re: Why did this server shutdown?
It looks like some system process issued a halt and the system rebooted.
I would check the shell history and also look at your Oracle alert log. In the OLDsyslog, rpcbind terminates with the message 'Jan 31 14:32:45 servora rpcbind: terminate: rpcbind terminating on signal. Restart with "rpcbind -w"', so check your patching.
Regards,
Simon
01-31-2006 10:46 PM
Re: Why did this server shutdown?
I have seen this before, once.
A UPS runs a self-check regularly, and if the battery is low, this may cause the system to reboot.
That could happen if the self-check ran while a battery connection was loose during the maintenance work.
Steve Steel
01-31-2006 10:49 PM
Re: Why did this server shutdown?
Also check /var/tombstones for any reported hardware errors; if your UPS is faulty, you should find confirmation there.
Regards,
Simon
01-31-2006 10:49 PM
Re: Why did this server shutdown?
Do you have monitoring tools on the system, something like Tivoli or OpenView?
Those tools can also issue a shutdown command, and you won't see anyone logging in.
If you have them, look into their log files.
grtz. Mark
01-31-2006 11:27 PM
Re: Why did this server shutdown?
@Raj:
Please check my profile: I do assign points, so there is no need to remind me. Just give me some time to check and double-check the information and tips I got.
@Simon:
This entry was written during the shutdown that took place at that time. We are currently in normal business hours, so I'm not interested in restarting any processes. But I will check for and install the latest quarterly patches as soon as possible.
The UPS was serviced around this time. It is working fine now, but did it or did it not have an effect on our server during the maintenance and testing? We can't answer that at the moment.
@Steve:
Thanks for this info. We are currently investigating in this direction. Good to hear that someone has seen an effect like this before.
@Mark:
No, we don't have any monitoring on this server or on any other HP-UX server. We also don't have any scheduled reboots that might have been triggered at the wrong time. The time is also correct: we have an internal NTP server, and we checked the log files of the CX300 and of this server, which show no time difference.
I still hope we can find an explanation and that we don't have to search for a hacker.
Kind Regards
Stefan
02-01-2006 12:04 AM
Re: Why did this server shutdown?
If any user is listed in /etc/shutdown.allow, check that user's history, crontab and related logs.
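Listing the relevant users can be scripted. This is a minimal Python sketch, assuming the usual HP-UX layout of /etc/shutdown.allow (one "hostname login" pair per line, # for comments, + as a wildcard); check shutdown(1M) on your system before relying on it. The helper name allowed_users and the sample entries are hypothetical.

```python
def allowed_users(text, hostname):
    """Return login names allowed to shut down hostname, per shutdown.allow-style lines."""
    users = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        host, user = fields
        if host in (hostname, "+"):  # '+' as host is assumed to mean any system
            users.append(user)  # a '+' here would mean every user
    return users


# Hypothetical sample contents, for illustration only.
sample = """# system  login
servora root
+       oper
otherhost admin
"""
print(allowed_users(sample, "servora"))
```

Each name returned is a candidate whose .sh_history, crontab and related logs are worth reviewing.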
regards,
John K.
02-01-2006 12:11 AM
Re: Why did this server shutdown?
Did you check what SEP highlighted?
Someone executing "swinstall -x autoreboot=true .... " is one of the most likely possibilities.
You can check the logs in /var/adm/sw/swinstall.log and /var/adm/sw/swremove.log.
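Searching those logs for reboot-related lines can be done with a short script. This is a minimal Python sketch that only does a case-insensitive substring search; it assumes nothing about the SD-UX log format beyond plain text, and the helper name reboot_mentions is hypothetical.

```python
import os

LOGS = ["/var/adm/sw/swinstall.log", "/var/adm/sw/swremove.log"]


def reboot_mentions(lines, keywords=("reboot",)):
    """Return (line_number, line) pairs containing any keyword, case-insensitively.

    The default keyword also matches "autoreboot".
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path in LOGS:
        if not os.path.exists(path):
            continue
        with open(path, errors="replace") as handle:
            for number, line in reboot_mentions(handle):
                print(f"{path}:{number}: {line}")
```

Any hit near 14:32 on Jan 31 would be worth a closer look; no hits would support the statement that no software installation took place.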
Regards,
02-01-2006 12:13 AM
Re: Why did this server shutdown?
Are there any error reports on your MSP/GSP?
Can you also post those log files?
If it has to do with a power failure, there should be an entry in those log files.
grtz. Mark
02-01-2006 12:28 AM
Re: Why did this server shutdown?
I'm curious to know whether you are running ups_mond or UPS Manager II or something else.
At least a year ago I had a machine shut down on me because of a three-year-old battery that needed replacing. I'm pretty sure it was a shutdown, though, not a reboot. My memory isn't great, though...
Good Luck with the investigations.
Cheers.