Looking for ideas about Unix SysAdmin Best Practices

Francisco Mancardi_1
Frequent Advisor

Hi all out there!
For the last four years I have been working as a SysAdmin,
and after a while I became the leader of three people doing
the same kind of work at a site that today has 30 Sun boxes,
3 Sun Enterprise servers, a lot of disk
and two tape libraries.

I believe that, especially when several people 'put their hands'
on the same group of machines, if you don't have good techniques
to ensure coordination you are going to shoot yourself in the foot
in the near future. (I have experience as a software developer,
where the same situation arises.)

On my own, taking ideas from here and there, I have developed
a set of guidelines.

I would like to know about the Best Practices that other SysAdmins
have established, and whether "big companies" have their own standards
or best practices defined.


When I think of Best Practices I don't mean the too general
guidelines like:

. document
. do backup frequently
. do recover to be sure your backups are good

but more detailed practices (well, I call them more detailed, though
maybe they are as general as the ones stated above) like

. Keep the documentation near the files you have modified
(if possible inside the files)

. No anonymous modification is allowed

. Maintain a log of all the changes you make to the
system, on the system

. No task will be considered finished and well done until
the documentation that allows the task to be repeated is done
and checked by another person
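
A minimal sketch of the "no anonymous modification" and "log on the system" guidelines above, in Python. The log location, the line layout, and the function names `format_entry` and `log_change` are assumptions made for illustration; nothing in the thread prescribes them.

```python
import getpass
import socket
from datetime import datetime, timezone
from pathlib import Path


def format_entry(who, host, when, summary):
    """Build one log line: UTC timestamp, host, author, then a free-text summary."""
    if not who.strip():
        raise ValueError("anonymous changes are not allowed")
    if not summary.strip():
        raise ValueError("a change entry needs a summary")
    stamp = when.strftime("%Y-%m-%d %H:%M:%SZ")
    return f"{stamp} {host} {who.strip()}: {summary.strip()}"


def log_change(summary, logfile, who=None):
    """Append one entry to a change log kept on the system itself.

    `logfile` is a hypothetical path chosen by the site; `who` defaults to
    the login name of the person running the script.
    """
    who = who if who is not None else getpass.getuser()
    line = format_entry(who, socket.gethostname(), datetime.now(timezone.utc), summary)
    with Path(logfile).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Something like `log_change("raised maxusers, rebuilt kernel", "/var/adm/CHANGELOG")` would then leave a dated, signed line behind; the path here is only an example.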




Well, I'm here waiting for your feedback.

Best Regards

Francisco

Re: Looking for ideas about Unix SysAdmin Best Practices

If you're looking for something fairly abstract, you might want to take a look at this...

http://www.usenix.org/sage/publications/code_of_ethics.html

HTH

Duncan

I am an HPE Employee
Vincent Farrugia
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

Scott_14
Regular Advisor

Re: Looking for ideas about Unix SysAdmin Best Practices

Sounds like you have 3 people handling a lot of systems, so I believe communication is very important. You could try having a 15-minute meeting or get-together; how often depends on the people involved. This way everyone can bring up the things going on.

At one place I worked, they had several binders with a sort of homemade form for user changes, adds, deletes, printer stuff, kernel stuff, and other things that a sysadmin may change or interact with. That way, if someone was not around, you could just see what was done last or what was in progress. Just make sure everyone agrees to keep it current; too many times it gets put aside and the information goes stale. For bigger stuff, just assign it to someone so you don't have people stepping on each other.

scot
Roger Baptiste
Honored Contributor
Solution

Re: Looking for ideas about Unix SysAdmin Best Practices

Franc,

We have a stricter set of guidelines
for production systems. For test and development boxes there is more flexibility.

Some of them: (only for production systems)

1) Agree on a prescheduled maintenance window
with the user groups for the production systems. Say, every Saturday from 1 AM to 8 AM, if needed, we can take the system offline
for maintenance work.

2) Document every time the system is rebooted, whatever the reason.

3) Every user request on the system should
be validated by the manager of the user group
as well as Sysadmin group. (automated ticketing
software is used to ease this).

4) Follow a standard set of security guidelines to tune the system (in practice these cannot be
applied to all production systems).

5) Always make a copy of any system file
before you modify it (password, cluster files, etc.).

6) Establish and follow guidelines for creating LVs and VGs on your systems. E.g., some systems use distributed striping, some use LV striping, and the extent size differs. The same goes for VG naming conventions.

7) Work closely with the DBAs on db-related issues, i.e., for any db-related request, get their assent.

8) Monitor the performance of the system on a regular basis (how depends on what sort of tool/software is used).

9) Set up alerts (for big shops, it is not needed, since ITO would be there by default)
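
As an illustration of item 5 above, here is a small Python sketch that copies a file next to itself with a timestamp suffix before anyone edits it. The function name `backup_before_edit` and the `.bak` naming scheme are assumptions, not something taken from the post.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_before_edit(path, when=None):
    """Copy `path` to `<name>.<YYYYMMDD-HHMMSS>.bak` in the same directory.

    Returns the path of the copy. Refuses to overwrite an existing copy.
    """
    src = Path(path)
    when = when or datetime.now()
    dest = src.with_name(f"{src.name}.{when:%Y%m%d-%H%M%S}.bak")
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite {dest}")
    # copy2 copies the contents and tries to preserve mode and timestamps;
    # it does not preserve ownership.
    shutil.copy2(src, dest)
    return dest
```

Calling it right before opening the file in an editor keeps the previous version beside the live one, so the question "what did this look like yesterday?" can be answered with a plain diff.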


The list can go on depending on the site.
But please remember to keep the documentation
as minimal as possible. Otherwise it won't last.

-raj
Take it easy.
Roger Baptiste
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices


Another thing is to get your team to meet regularly, once a week, to discuss the things they are working on, issues with users, etc.
The idea is to have better coordination and be aware of the larger picture.


-raj
Take it easy.
John Bolene
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

I can't stress enough about having a play machine to test stuff on.

It is much better to know that something is going to work on the production machines than to experiment on them and then have to fix them.

We have a test box that we play on, and an Ignite tape that we can reload when things go wrong, which happens about once a week. We can also test some of those things that we would never dream of doing on the production box, just to see what happens.

We have several people working on this machine and can schedule exclusive time for possible destructive testing.

We test patches and new products on this machine also before they go onto the developers machines and then into production.

We have a cheap PC set up with linux and jabber (check jabber.org) for instant messaging within the group. This sure beats having to email or use the phone.

Communication and coordination are key with many projects going on at the same time.

The other comments have covered most of everything else I had to say.

It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
Francisco Mancardi_1
Frequent Advisor

Re: Looking for ideas about Unix SysAdmin Best Practices

Thank You all a lot.

I will try to assemble a document with all your advice and tips
and will put it in the forum (in the near future)


Best Regards

Francisco
A. Clay Stephenson
Acclaimed Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

Hi:

Actually having a test machine is not enough. You really need 3 levels.

1) A 'Sandbox' - this is the play machine. It is used to verify patches, new database releases, new OS's, and new products, and if possible it's on a separate network. Anything and everything is fair game on this guy.

2) A 'Testbox' - this is used for applications development and cannot be clobbered at the drop of a hat. Crashing this box costs money so don't do it. It is used to test latest software modifications and is refreshed with production data fairly often.

3) A 'Production' box - keep your hands off.
Only after patches, modifications, etc. have bubbled up through the sandbox and testbox are they applied here and only during scheduled maintenance periods.

Having this many levels means that I have not had an unscheduled downtime in the production environment for over four years and counting. I have never even had a hardware failure severe enough to knock down a production box.

--------------------------------------------

My other paranoid practice is to make sure that all of my boxes have 'LIFEBOAT' disks. I never use anything but hot-plug drives, and at least once a week I do a raw copy of a boot drive to a lifeboat disk. This is in addition to mirroring. If a bad patch is applied or, more likely, I do something really dumb, I simply shut down, remove the boot disks and mirrors, and move the lifeboat into the boot slot. I'm back up in minutes - faster than with Ignite. I do make make_recovery tapes as well, but then again I'm paranoid.



If it ain't broke, I can fix that.
Bill McNAMARA_1
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

I find it easier to administer bigger platforms.
It's less of a problem when you break something...!

Really though, regular backups. Test your backups.
Test your backups again, backup again... is generally my experience.

Later,
Bill
It works for me (tm)
Rita C Workman
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

Hey Clay...I love that lifeboat disk. Gotta see if I can do something like that here.

Here's a couple things I do.
1. Keep documentation (even my old scribbled notes on things) on the system in one directory...all call tickets are logged there too, detailed for quick reference in a file called call_tkts, so anybody can find it. All admin folks and backups have access to these files. No secrets.
2. Keep all UNIX/admin scripts under one directory ( /scripts). All scripts must contain remarks and when/who wrote them. Again, no secrets.
3. Communicate... communicate....
we may not all know everything but we should all be aware of what is going on. I don't like unnecessary surprises (unless it's jewelry).
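
The rule in item 2 (every script carries remarks plus who wrote it and when) can be checked mechanically. The sketch below is one way to do it in Python; the `# Author:` and `# Date:` tags are a hypothetical header convention invented for the example, and `audit_scripts` is not an existing tool.

```python
from pathlib import Path

# Hypothetical header convention: each script carries these comment tags
# somewhere in its first few lines.
REQUIRED_TAGS = ("# Author:", "# Date:")


def missing_tags(text, head_lines=15):
    """Return the required tags that do not start any of the first `head_lines` lines."""
    head = [line.strip() for line in text.splitlines()[:head_lines]]
    return [tag for tag in REQUIRED_TAGS
            if not any(line.startswith(tag) for line in head)]


def audit_scripts(directory):
    """Map each file directly under `directory` to the tags it lacks.

    Files with a complete header are left out of the result; subdirectories
    are not descended into.
    """
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            gaps = missing_tags(path.read_text(encoding="utf-8", errors="replace"))
            if gaps:
                report[path.name] = gaps
    return report
```

Run against the shared scripts directory, an empty result means every script says who wrote it and when.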

My biggest issue is that I don't like somebody doing something nobody else was aware of. I don't like smoke and mirrors..and I don't like folks who hide things and appear only when they want to 'save the day'. I believe in keeping things between admins equally known by all.
Rgrds,
Rita

Santosh Nair_1
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

A couple of things that are really important (at least for me) are communication among the admin staff and a good change management system.

The first is critically important. Many times I've run into situations where one person is working on a machine and someone else is undoing the first person's work...communication is very important.

As for the change management system, it comes in very handy for keeping track of changes on a machine and, more importantly, for finding trends (determining lemon machines).

-Santos
Life is what's happening while you're busy making other plans
Bernie Vande Griend
Respected Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

It's been mentioned already, but I'll reiterate.
The 3 best practices are: testing, change management, and server standards.

Testing: like Clay says, whenever possible try the change, new software, parameter change, etc. on a system that won't impact users. His idea of having a test box and a sandbox is great if you can afford it. If possible you should have one for each OS version, hardware type (probably not feasible), and major application.

Change Management: This doesn't have to be elaborate, but have a place where ALL changes are documented, whether small or large, so anyone can refer back to see what has been done. In some organizations this can become a political nightmare, but it is useful for sysadmins to have their own mechanisms for keeping track of things.

Standards: Have a list of things that should be the same on every machine and make sure they are all set up that way: common list of depots, standards for user logins, DNS, system prompts, history, /etc/profile, LVM naming standards, etc. This will help you all stay on the same page.
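
One way to make the "same on every machine" list checkable is to write the standard down as data and compare each host against it. The Python sketch below assumes the per-host settings have already been collected somehow (how is site-specific and not shown); `find_drift` and the setting names in the usage note are hypothetical.

```python
def find_drift(baseline, hosts):
    """Compare each host's settings with the baseline.

    `baseline` maps setting name -> expected value.
    `hosts` maps host name -> {setting name -> actual value}.
    Returns {host: {setting: (expected, actual)}} for every mismatch;
    a setting missing on a host is reported with an actual value of None.
    """
    drift = {}
    for host, settings in hosts.items():
        diffs = {
            key: (expected, settings.get(key))
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diffs:
            drift[host] = diffs
    return drift
```

For example, with a baseline of `{"dns": "10.0.0.2", "vg_prefix": "vg"}`, a host that reports a different DNS server shows up in the result with the expected and actual values side by side, and hosts that match the standard do not appear at all.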
Ye who thinks he has a lot to say, probably shouldn't.
Bill Hassell
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

And here's something that is often overlooked: education

"There's only one thing more expensive than education and that's ignorance" (author unknown)

I've tried a number of education methods: classroom, plain old books, interactive video conferencing, web-based, self-paced (fancy name for CDROMs), but the real value comes with networking, that is, interacting with a live instructor.

Another very cost-effective method is attending a professional conference, something that specializes in HP products and services (sales pitch alert)...and that would be HP World, sponsored by Interex, the International Association of Hewlett-Packard Computing Professionals (www.interex.org).

As many forum participants will tell you, a few days at HP World is worth a month of classes as you discuss best practices with your peers. Plan to attend HP World 2002 in Los Angeles and check out the Interex web site for membership and services.


Bill Hassell, sysadmin
harry d brown jr
Honored Contributor

Re: Looking for ideas about Unix SysAdmin Best Practices

Francisco ,

Documentation is the key, and accurate documentation is a savior. Use something like Visio to document your systems, especially the following items:
wiring diagrams, serial numbers, which cards are in which slots, the location of each computer in the cabinet, the location of the cabinet, software installed, OS level, patches, file system layouts, memory, disks, and many other things. I made mine available on an internal web site, which lets me answer questions about "resource" availability at a quick glance. Of course the key is to KEEP IT UP TO DATE.

Also, as Dan said, education is so damn important. Sun's Fasttrack is the best course Sun has for getting experienced admins cranking on Suns. You'll learn "jumpstart" and how to mess with the boot params and devices.

Clay also has a lot of good points that I'll even think about doing - especially the hot-pluggable disks!

Also, work with your developers and software suppliers about keeping everything out of the root filesystemS so that root disk recoveries, patch installs, kernel parameter changes, and operating system installs are smoother.

Also, re-evaluate your backups and schedules. Make sure you are doing a make_recovery on HP systems EVERY DAY - they don't take that long. Make sure backups are audited to ensure that you can recover from them. I've known backups that were done for years, only to find out before a major upgrade that the backups were pure garbage - actually blank tapes - fortunately we found this before the upgrade!
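
A rough sketch of what "auditing" a backup can mean at its simplest: read the whole thing back and make sure it is not empty. This Python example assumes the backup is a tar archive readable as a file, which is an assumption made for illustration and is not how make_recovery tapes work; `verify_archive` is a made-up name. It only proves the archive is readable, not that the data in it is the data you wanted.

```python
import tarfile


def verify_archive(path):
    """Read every regular file in a tar archive end to end.

    Returns the number of files read. Raises ValueError if the archive
    holds no regular files, and lets tarfile's own errors propagate if
    the archive is damaged or unreadable.
    """
    checked = 0
    with tarfile.open(path, "r:*") as archive:
        for member in archive:
            if member.isfile():
                with archive.extractfile(member) as fh:
                    # Pull the data through in 1 MiB chunks so a truncated
                    # or corrupt member fails here, not during a restore.
                    while fh.read(1024 * 1024):
                        pass
                checked += 1
    if checked == 0:
        raise ValueError(f"{path}: archive contains no files")
    return checked
```

A real audit would go further and restore a sample to a scratch area, which is the "test your backups" advice given elsewhere in this thread.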

Security. You have to be anal about security. No common user should ever have access to any shell! Users should be running applications and be shielded from Unix. In my opinion, only sysadmins (including DBAs) should have access to any shell! Operators doing backups should also be kept away from Unix shells! Offload account setup and user passwd administration to another group by developing scripts and such to make the process easier.

Monitoring. Get glance on your Sun boxes and any other Unix machines you have. Buy a perfview license and set up a collection server. Produce at least weekly reports showing performance and resource availability. Be proactive, not reactive. See a problem coming rather than being blindsided by one.

And most importantly, ENJOY yourself! Being a sysadmin can be a lot of fun!

live free or die
harry




Live Free or Die