08-25-2004 11:03 AM
Downtime checklists
Thanks in advance,
Joe
08-25-2004 12:28 PM
Re: Downtime checklists
1)
Has the system got power?
Can you get into the GSP logs?
Do you know how to interpret the GSP logs?
If not, place a hardware support call.
2)
System comes up normally, what should be checked?
- /var/adm/syslog/syslog.log
- /var/adm/syslog/OLDsyslog.log
- root mail
- /var/tombstones
- /etc/rc.log
- /etc/shutdownlog
- crashdump (/var/adm/crash)
- stm logs
- GSP logs
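The file-based part of the list above could be walked with a small script. This is only a rough sketch: the paths are the ones named in the checklist, while the `check_paths` helper and its `root` prefix are made up for illustration so the check can be tried against a copy of the files. Root mail, stm logs and GSP logs are not covered because they are not simple file paths.

```python
import os
import time

# Paths taken from the checklist above; /var/tombstones and
# /var/adm/crash are directories, the rest are plain files.
POST_BOOT_CHECKS = [
    "/var/adm/syslog/syslog.log",
    "/var/adm/syslog/OLDsyslog.log",
    "/var/tombstones",
    "/etc/rc.log",
    "/etc/shutdownlog",
    "/var/adm/crash",
]


def check_paths(paths, root="/"):
    """Return a (path, exists, last_modified) tuple for each entry.

    `root` is a hypothetical prefix so the function can be pointed at a
    copy of the files instead of the live system.
    """
    report = []
    for path in paths:
        full = os.path.join(root, path.lstrip("/"))
        if os.path.exists(full):
            stamp = time.localtime(os.path.getmtime(full))
            report.append((path, True, time.strftime("%Y-%m-%d %H:%M", stamp)))
        else:
            report.append((path, False, None))
    return report


if __name__ == "__main__":
    for path, exists, mtime in check_paths(POST_BOOT_CHECKS):
        status = f"last modified {mtime}" if exists else "missing"
        print(f"{path}: {status}")
```

A recent modification time on /etc/shutdownlog or a new entry under /var/adm/crash is a hint about where to start reading; the script only points at the files, it does not interpret them.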
08-25-2004 03:14 PM
Re: Downtime checklists
I think you will find that if you mirror (or RAID) everything, use only hot-plug disks, and have multiple NICs and switches, clean power and a clean environment, then your checklist is almost useless, and that's a good thing. If you filter all your patches and upgrades through a sandbox, then crashes will become extremely rare. I speak from experience, as I have recently passed the 5 year mark with zero unplanned production downtime. I don't need no stinkin checklist for crashes.
08-25-2004 04:57 PM
Re: Downtime checklists
This kind of checklist reminds me of the "disaster/recovery" documentation that was requested from site personnel at my previous job by one of the other big IT companies. They wanted detailed procedures of how we would go about troubleshooting a problem on a system that was down. I laughed and told the IT Lead (who was quite UNIX-savvy himself and laughing right along with me) that I'd be more than happy to hand them a copy of all the HP-UX manuals because I wasn't writing volumes of documentation that already existed from the vendor. I'm not sure what, if anything, got submitted.
Jeff Traigle
08-26-2004 03:52 AM
Re: Downtime checklists
From my work experience I know that a lot of time is wasted before you get a result from the vendor, simply because of the time spent gathering the necessary information.
What you should always do when you have a crash is see if you can get your system up and running again, and have a co-worker log a call right away.
Even if you get the system back into production it is still a good idea to have the support people help you figure out the cause of the crash.
08-26-2004 04:00 AM
Re: Downtime checklists
08-26-2004 04:06 AM
Re: Downtime checklists
These policies are just as important (if not more important) to develop as a hardware/software checklist. This is a conversation to have with your fellow team members/employees.
08-26-2004 04:39 AM
Re: Downtime checklists
If a system went down then I would notify the 1st tier support personnel. The 1st tier folks would in turn notify those who need to know, and then the info was disseminated out to the user community.
At another location I would notify the helpdesk directly even though I would be 3rd tier.
Still another location would post the system status on a web site and consider that notification of all users.
Again, this is a question for your co-workers, peers, supervisors, etc...
08-26-2004 09:07 PM
Re: Downtime checklists
The last one was a change to the application, which we had to get people off the system to rectify. I made the change and I put it back after some worrying that it was some other issue.
The one before that was a suspected controller failure. We have Serviceguard, but the load seemed to be too much for one server, so it wasn't particularly comfortable here for a while.
Anyway I think I tend to react the same way.
1. What has changed, can I think of anything recent that might cause a problem?
2. What do the forums say, has anyone seen this before? Yes, I quite often come here to search before logging a call with HP.
3. Check logs, syslog, dmesg (or /var/adm/messages), cmviewcl, bdf, vmstat.
4. Log a call with HP.
5. Look at the disk array, LAN cards.
6. HP went through armlog -e, armdsp -a, armdiag etc...
While this is going on my manager is telling the users what is happening (via the helpdesk) and our DBA is looking for database problems.
So there it is, my rather haphazard route to rectifying unplanned downtime.
I hope this helps a little.
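Step 3 above could be captured in one go, which also gives you something to attach when logging the call in step 4. The sketch below is only an illustration: the command names are the ones listed in step 3, the `collect` helper is hypothetical, and any command that is not installed on the system is simply reported as not found.

```python
import shutil
import subprocess

# Commands named in step 3 above; which of them exist depends on the system.
TRIAGE_COMMANDS = [
    ["dmesg"],
    ["cmviewcl"],
    ["bdf"],
    ["vmstat"],
]


def collect(commands, timeout=30):
    """Run each command found on PATH and return {command name: output}."""
    results = {}
    for argv in commands:
        name = argv[0]
        if shutil.which(name) is None:
            results[name] = "(not found on this system)"
            continue
        try:
            done = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            results[name] = done.stdout + done.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            results[name] = f"(failed: {exc})"
    return results


if __name__ == "__main__":
    for name, output in collect(TRIAGE_COMMANDS).items():
        print(f"===== {name} =====")
        print(output)
```

The timeout is there so that one hung command on a sick system does not stall the rest of the collection.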
08-27-2004 12:09 AM
Re: Downtime checklists
#1) Start up application on failover system.
#2) Check to see if this looks like a long outage or a short outage (just a best guess). Short outage example: a sys admin rebooted the machine my app runs on without telling me of any planned downtime. Long outage example: no one knows why the machine crashed.
If short outage then just wait for App to come back up.
If long outage then email a predefined list of managers and ask them to have their people log into the failover machine.
For a system, I would think it would be similar. Get ready to move your critical applications if necessary and have a list of people you need to notify.
Best regards,
Kent M. Ostby
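The two-step decision in the post above could be written down roughly as follows. This is a sketch under stated assumptions: the `plan_response` function and the manager addresses are made up for illustration, and the short-versus-long call stays a human best guess passed in as a flag.

```python
def plan_response(expected_long, managers):
    """Return the ordered list of actions for an outage.

    `expected_long` is the best-guess judgement from step #2;
    `managers` is the predefined notification list (hypothetical here).
    """
    # Step #1 happens regardless of how long the outage looks.
    actions = ["start application on failover system"]
    if expected_long:
        for manager in managers:
            actions.append(
                f"email {manager}: have your people log into the failover machine"
            )
    else:
        actions.append("wait for the application to come back up")
    return actions


if __name__ == "__main__":
    # Example addresses are placeholders, not real contacts.
    for step in plan_response(True, ["ops-manager@example.com"]):
        print(step)
```

Keeping the notification list as data rather than in someone's head is the main point: the list can be reviewed ahead of time, which is when it is easiest to get right.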