09-29-2004 12:17 AM
Re: What is allowed ?
I have been managing VMS clusters since 1986 - I know what redundancy is!
Responding to my comment with talk of powering a system down and burying it in concrete: I don't feel I am being taken seriously.
09-29-2004 12:44 AM
Re: What is allowed ?
I have never worked in an environment as demanding as yours, and we didn't have any detailed rules about what was allowed and what was not. Things were decided case by case, and the user base was much smaller. You have a highly complex system, and I don't think you will ever be able to compile a final list.
09-29-2004 03:15 AM
Re: What is allowed ?
On one site, we did two controller battery changes on an HSD. The first went fine, but the Compaq engineer who did the second was not as skilled and needed more than 2 minutes (about 20 minutes, in fact :-( ).
So a controller battery change must be treated as a dangerous operation.
I think the simplest attitude is to schedule any change for a time when there is no activity and we have some spare time to repair things. Yes, this often means coming in on Saturday afternoon or evening, working a good part of the night, and keeping all of Sunday free in case something goes wrong.
09-29-2004 04:15 AM
Re: What is allowed ?
We have a simple rule: no planned hardware maintenance during production hours.
OTOH, we are not yet trading 24 hours a day, so we have a window to do what we need to do.
Greetings, Martin
09-29-2004 05:15 AM
Re: What is allowed ?
" Talking about powering a system down and burying it in concrete as a response to my comment - I don't feel like treated serious. "
I was not in the least trying to avoid taking you seriously; far from it.
Actually, it was just a verbatim repeat of the opening line of various training and symposium sessions on the broad subject, both from my former life in chemistry and in IT.
It is more or less a description of what you WOULD need if you ever intended to specify a system for Orange Book "A" certification.
Maybe you should take it as a description of the utter limit, which immediately shows how irrelevant it is in practice.
Wim.
On changing HSx batteries:
We had dual-redundant HSZ40s, and now have dual-redundant HSG80s, on redundant sites.
HBVS over both.
_IF_, while changing the batteries of one HS on one site, things SOMEHOW get messed up and both controllers leave service (that has happened to us too, yes), THEN we fall back to reduced shadow sets from one site only, and the penalty will be a shadow merge (finally, Engineering, THANKS for HBMM... as soon as we hear enough from others to dare use it).
BUT, bottom line: we do NOT LOSE service. (And next time we ask for a more experienced engineer!)
Jan
09-29-2004 11:31 AM
Re: What is allowed ?
First rule is to keep the systems and applications running.
Second rule is if something goes wrong to notify the appropriate persons and get things working as soon as feasible. A lot depends on the system admins and management.
Our team of 2 system admins has a lot of leeway. Also our applications people have a lot of leeway.
1. What is not allowed?
A. Most hardware maintenance. We don't change live network connections on production systems unless it is needed in an attempt to fix a critical problem.
On some sites, developers aren't allowed to make changes on a production system except during approved maintenance.
Exceptions:
We can swap an external tape drive. (This normally doesn't make things any worse. The tape drive is down already. This is also an advantage of external tape drives.)
We can swap a failed member of a non-RAID shadow set. (The admin still has to make sure the right disk is swapped, and swapped properly.)
B. Software changes to a running system including procedures can be disallowed or restricted depending on system and amount of risk. (See what is allowed below for some possibilities.)
2. What is allowed? That depends a lot on the system admin and management, and it can vary from system to system. At our 2 sites, we have a lot of leeway in making changes as long as we don't impact production.
A. At a minimum, monitoring of the clusters/systems. You need to discover whether something has gone wrong with the nodes or with the network, and report it to the proper person(s). I call the main person(s) and send an email to the others.
I have AVAIL_MAN monitoring all 8 of our critical VMS production standalone systems and 1 critical production cluster, spread over 2 sites. I have it enabled so that it can fix problems on the nodes. I also use the MONITOR utility.
B. Use of console manager if a problem occurs.
C. System procedures can be changed. (The admin must assess the amount of risk. If there is too much risk of impacting a production system, don't do it.) Design procedures modularly so that they can be tested, preferably on a non-production system and preferably without having to reboot. This can depend on how much confidence management has in the system admin, or on its prior experience with them. At some sites, the system manager may need to get approval.
D. Approved personnel (software configuration management, a developer, or someone else, depending on your company) can make changes on production systems. At one site, I had to get signed approvals before making changes to a live production system. Developers may or may not need prior approval to work on a live production system. Rules for this can be stringent or more relaxed. (Developers must assess the amount of risk. If desired, developers must log changes to production systems or go through a configuration management procedure.)
Lawrence
09-29-2004 07:02 PM
Re: What is allowed ?
"Rules for this can be stringent or more relaxed. (Developers must assess the amount of risk. If desired, developers must log changes to production systems or go through a configuration management procedure.)"
If the system is mission critical, there is NO option other than going through a change management procedure. NO RELAXED RULES.
I'm working on both sides of the fence, so I _know_ that developers are best kept miles away from production systems. For very experienced maintenance programmers (which is, IMHO, a separate discipline in programming and system development) you may once in a while need a one-time exception, for those cases where analysing a problem requires access because it cannot be done any other way (and that DOES happen). But be VERY reluctant. It's up to them to prove they need access, and to prove that they (and their tools) can be trusted in a production environment.
Willem
OpenVMS Developer & System Manager
09-29-2004 08:19 PM
Re: What is allowed ?
Another very interesting Topic!
I am sure most people, if not all, have change management of some sort. There are some golden rules that I follow:
1. Carry out a risk analysis: is the change in the list of actions that can be carried out with the full cluster down, a site down, or a node down?
2. Spell out the risks (for the benefit of management and clients).
3. Define the regression (back-out) paths.
4. Plan the work and have cut-off points.
5. Make sure management is aware of the risks and signs off the work. We use a change management form.
Like many of you, I have had bad experiences where a change that was deemed perfectly safe turned out not to be: for example, moving a monitor from the top of a server caused the node to crash.
I would be interested to know whether people categorise their risks like we do, and what they have in each category.
Category A - Full cluster shutdown
Category B - Site shutdown
Category C - Node shutdown
Category D - Cluster up but no users (applications down)
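Zahid's five golden rules and four outage categories map naturally onto a change-request record that cannot be marked ready until every rule has been addressed. This is a hypothetical Python sketch, not Zahid's actual change management form: the names `RiskCategory`, `ChangeRequest` and `missing_items`, and the example request, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskCategory(Enum):
    """Outage categories from the post, most to least disruptive."""
    A = "Full cluster shutdown"
    B = "Site shutdown"
    C = "Node shutdown"
    D = "Cluster up but no users (applications down)"


@dataclass
class ChangeRequest:
    """Hypothetical change record; one field per golden rule (needs Python 3.9+)."""
    description: str
    category: RiskCategory                                   # rule 1: outage class needed
    risks: list[str] = field(default_factory=list)           # rule 2: risks spelled out
    regression_path: str = ""                                # rule 3: back-out path
    cutoff_points: list[str] = field(default_factory=list)   # rule 4: cut-off points
    management_signoff: bool = False                          # rule 5: signed off

    def missing_items(self) -> list[str]:
        """List the golden rules that are not yet satisfied."""
        missing = []
        if not self.risks:
            missing.append("risks not spelled out")
        if not self.regression_path:
            missing.append("no regression (back-out) path")
        if not self.cutoff_points:
            missing.append("no cut-off points in the plan")
        if not self.management_signoff:
            missing.append("management has not signed off")
        return missing

    def ready(self) -> bool:
        """A change may go ahead only when nothing is missing."""
        return not self.missing_items()


# Hypothetical example; the category is assumed purely for illustration.
cr = ChangeRequest(
    description="Replace controller battery (hypothetical example)",
    category=RiskCategory.B,
    risks=["both controllers may leave service"],
    regression_path="continue on the other site's shadow set members",
    cutoff_points=["abort if the swap takes longer than planned"],
)
print(cr.missing_items())  # ['management has not signed off']
print(cr.ready())          # False
```

Keeping the category as a required field forces rule 1 (the risk analysis) to be answered before anything else, while the remaining rules are simply reported as outstanding until they are filled in.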
Zahid
09-29-2004 08:21 PM
Re: What is allowed ?
It's all about managing risk. What's the risk if you don't do X, and what's the risk if you do? Pre-change testing, careful procedures, and pre-determined procedures to back out the change and to deal with things that go wrong (because sometimes they will). Exactly what is allowed and what is not depends heavily on the system setup, the applications, the required availability and local politics.
Purely Personal Opinion
09-29-2004 08:41 PM
Re: What is allowed ?
"change management so at least everyone knows who is doing what and when even if the approvers don't understand what you are doing :-)"
I see you've dealt with similar managers. But seriously, if change approvers put their mark on a piece of paper without understanding it, then it's their lookout. If things go wrong, you have someone to point to.
My attitude is: if I have carried out the risk analysis to the best of my ability, then that's all one can do. If there are lessons to be learnt, then fair enough: amend the procedures for the future.
I agree that generalising the risk categories is site-dependent, BUT some things are common. Also, something one person expects to be safe may be something another person has had a bad experience with. Sharing those experiences might save someone from a 'bad day'.
Zahid