
Re: Mistakes

 
Martin Johnson
Honored Contributor

Re: Mistakes

I know a guy who worked all weekend to fix a problem that kept crashing a system. This hero was given a bonus and a promotion for his efforts.

What management didn't know is that the scripts this hero fixed were written by him about a year before the problem occurred!

You gotta love it! Write poor code and get a bonus and promotion for fixing it.


Marty
John Bolene
Honored Contributor

Re: Mistakes

I agree with the others: it depends.

Management should make the following available.

training
test machines
play time
information resources

Given these, there is very little latitude for making mistakes.

If you can test the scripts and procedures to the point that they work on a test machine, then there should be no reason for failure on the production machines. There will ALWAYS be the occasional OOPS, but these might be once a year.

Personally, in my 30-year computing career, there have been only 2 major OOPS. The first came when, without much training at all, they put me on as night operator of an IBM 360. I managed to wipe out 2 disk drives, which took 5 people over 2 weeks of overtime to rebuild. They still kept me, but trained me better and started having backups made. The other was when I was researching a system patch to apply to a running operating system (yes, there are machines on which you can patch a running operating system). I messed it up and they had an hour of downtime that cost about $30K. Neither of these machines had a test machine available.



I have not made any major mistakes since then. I do all my playing and testing on test machines where it does not matter if they get fried; we just reload them from backups if that happens.

Backout and contingency plans should be made for all changes. Stupid stuff happens, some of which cannot be planned for.

Some folks cannot get much of anything right (get rid of them), but I do trust most people to want to do the right thing. They just may not know what the right thing is or the best way to go about getting it done.

More than my 2 cents in this post.
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
Martin Johnson
Honored Contributor

Re: Mistakes

John,

I agree with you on the use of test machines. Unfortunately, unless they are identical to the production machines, you can still have unexpected problems.

The problem with test machines is that they cost money. How many companies do you know that are willing to buy a V2500 for a test machine? I know mine isn't. We tested changes to be made to MC/ServiceGuard on our tech test cluster (two K class systems with 2 GB of memory). Everything worked fine. When we made the changes to production (2 V class systems with 8 GB of memory), the cluster would not come up. It took us over 2 hours to identify and fix the problem (exceeded shared memory limits within Sybase).

Marty
Ameet_HP
Frequent Advisor

Re: Mistakes

Mistakes can happen at any stage, in any work. Do not think of releasing him just because he is a contractor (it is bad if employees get the right to make mistakes and contractors get a release notice, huh!). Check how much of the rest of his job he has done successfully. Did he admit the mistake and agree to take precautions in future?

I would not agree to release him only for making a mistake. In fact, you can warn him twice; the third time, it's up to you.

Some companies have a good policy: if any user makes a mistake, he needs to document it with the error and the solution, and then send this document to the group so that others read it and learn from it. This policy will never let him hide his mistakes. Stay cool, you may find him working intelligently tomorrow.
TrustNo1
Regular Advisor

Re: Mistakes

If nothing said above is taken to heart, be cautious: "Those who can, do; those who can't are moved to management."

~jdk
Dare to Dream
Jim Turner
HPE Pro

Re: Mistakes

Never forget your six P's:
"Prior Planning Precludes Piss Poor Performance."

We have an Assoc. SysAdmin who issued a "chmod 750 .*" in an attempt to catch dot files. (Wait for it...) Yes, that's right, ".*" also matches ".." which means his chmod walked up the tree. Ugly. He's still here, albeit wiser and more careful now.
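
For anyone who wants to see why that pattern bites, here is a rough sketch in Python. It uses the standard fnmatch module as a stand-in for the shell's wildcard matching, so it only illustrates the pattern logic and does not touch any files. The entry names (.profile, .kshrc, notes.txt) and both helper functions are made up for illustration.

```python
import fnmatch

# Names a shell could consider when expanding a pattern in a directory.
# "." and ".." are real directory entries; the other names are hypothetical.
entries = [".", "..", ".profile", ".kshrc", "notes.txt"]

def naive_dot_match(names):
    """Mimic a shell that expands '.*' without skipping '.' and '..'."""
    return [n for n in names if fnmatch.fnmatchcase(n, ".*")]

def safe_dot_match(names):
    """Select dot files only, never the current or parent directory."""
    return [n for n in names if n.startswith(".") and n not in (".", "..")]

naive = naive_dot_match(entries)
safe = safe_dot_match(entries)

print("naive:", naive)  # includes "." and ".."
print("safe: ", safe)   # dot files only
```

Whether a given shell actually hands "." and ".." to the command when it expands ".*" varies by shell and version, so it is worth running "echo .*" first to see what a pattern expands to before pointing chmod at it. A commonly used narrower pattern is ".[!.]*", which skips both "." and ".." (and also misses names that start with two dots).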

Also, not all issues are technical or programmatic . . .

We had (keyword: had) a Sr. SysAdmin who drew a diagram on a whiteboard in a meeting of managers and directors. One block was labelled "PFM". Everyone was diligently jotting notes and sketches. A good bit into his presentation, one of the directors finally asked what the "PFM" block did. He calmly stated that it was "Pure F*cking Magic", and they wouldn't understand it anyway. Security guard, pasteboard box, badge, door. Ouch. (I'm still trying to figure out if he was patently stupid or simply had stones so big walking was difficult!)

Cheers,
Jim