
Re: greatest blunders

 
U.SivaKumar_2
Honored Contributor

greatest blunders

Hi,

Please post the greatest blunders you think you have made in your life as a system administrator.


regards,
U.SivaKumar

"Best Men are moulded out of Faults" - Shakespeare
Innovations are made when conventions are broken
169 REPLIES 169
Chris Wilshaw
Honored Contributor
Solution

Re: greatest blunders

Well that would be one of 2.

1) Creating a filesystem on a database raw log location (that was the last time I used SAM for any LVM-type work).

2) Believing what I was told when someone said that we could remove a FS from a system as the database that was using it had moved to another machine. Apparently, Ingres is a little picky when you take one of its data locations away, even if it's not being used. That led to an inconsistent database, and several days of tension while it was fixed.
Jean-Louis Phelix
Honored Contributor

Re: greatest blunders

Hi,

Using a LAN console, then telnetting to another LAN console to do a CTRL-B RS. Guess which machine rebooted ...
It works for me (© Bill McNAMARA ...)
H.Merijn Brand (procura
Honored Contributor

Re: greatest blunders

Trying to test how "hot-swap" disks work. The fast way: just pull one out. Needless to say that this (production) machine went down.
Enjoy, Have FUN! H.Merijn
Robert-Jan Goossens
Honored Contributor

Re: greatest blunders

Working with NIS for the first time and putting +root in the passwd file. Guess what happened!
Ravi_8
Honored Contributor

Re: greatest blunders

Applying SNAplus2 for 11i on 11.0 and crashing the system.
never give up
Vicente Sanchez_3
Respected Contributor

Re: greatest blunders

Thinking that the computer world is logical.
Tomek Gryszkiewicz
Trusted Contributor

Re: greatest blunders

Deleting all hidden files in /root: rm -rf .* .........

Regards,
Tomek
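
A note on Tomek's `rm -rf .*`: in many shells the pattern `.*` also expands to `.` and `..`, and older `rm` implementations would happily recurse into the parent directory. A minimal sketch of a narrower pair of patterns, run in a scratch directory with made-up file names (all names here are illustrative assumptions):

```shell
#!/bin/sh
# Scratch directory with hypothetical file names, nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch .profile ..odd visible

# '.[!.]*' matches dot-names whose second character is not a dot.
# '..?*'   matches names starting with two dots plus at least one more character.
# Neither pattern can expand to '.' or '..'.
set -- .[!.]* ..?*
hidden="$*"
printf '%s\n' "$@"
```

Listing the expansion with `printf` or `echo` before handing it to `rm` is a cheap way to see exactly what would be deleted.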
BFA6
Respected Contributor

Re: greatest blunders

Hi,

Forgetting how links work. Copied what I thought was a file to a file, but ended up overwriting a linked .profile - oops.

Good job we had a backup :-)

Hilary
kish_1
Valued Contributor

Re: greatest blunders

Created a home area for a new user and, instead of changing the permissions for the new user, I gave *
share the power of the knowledge
Michael Tully
Honored Contributor

Re: greatest blunders

Many years ago I deleted all the files in /var that were over 10 days old. The best thing was that no one ever noticed....

However, I saw a good one the other day: one administrator rebooted the wrong machine.... where's that egg?
Anyone for a Mutiny ?
Bill McNAMARA_1
Honored Contributor

Re: greatest blunders

I've done this with people looking over my shoulder (while in single user):

echo "/dev/vg00/lvol6 /tmp vxfs delaylog 0 2" > /etc/fstab
reboot!!

Other good ones:
mv /dev/ /Dev
(try it - and don't ask why!!)

Later,
Bill
It works for me (tm)
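
On Bill's fstab line: a single `>` replaces the whole file with the one echoed entry, whereas `>>` appends to it. Presumably appending was the intent. A small sketch of the difference, using a temporary file instead of the real /etc/fstab (the first entry is a made-up example, not from the post):

```shell
#!/bin/sh
# Scratch stand-in for the file; never the real /etc/fstab.
fstab_copy=$(mktemp)

# Hypothetical existing entry, only here so there is something to preserve.
printf '%s\n' '/dev/vg00/lvol3 / vxfs delaylog 0 1' > "$fstab_copy"

# '>>' appends the new entry and keeps the existing lines.
# A single '>' here would have left this as the only line in the file.
echo '/dev/vg00/lvol6 /tmp vxfs delaylog 0 2' >> "$fstab_copy"

cat "$fstab_copy"
```

Keeping a copy of the original before editing gives a way back if the wrong operator slips in anyway.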
Donald Kok
Respected Contributor

Re: greatest blunders

worst thing?:

ls /somedir
#oh that's rubbish
rm -r *
# removed all in current directory
My systems are 100% Murphy Compliant. Guaranteed!!!
Olebile
Frequent Advisor

Re: greatest blunders

The worst one I did was because of laziness. The intention was to remove a directory under /. So I typed rm -r, then used the IntelliMouse to copy and paste the directory name. Guess what... the mouse copied only part of the directory name, which matched other mounted directory names. Everything under those directories was zapped, although you still get the error message "mounted filesystem could not be deleted". Only the mount point remains; you'll be horrified to see what lies underneath. Always type when using rm -r....
Performance Monitoring is not always easy
U.SivaKumar_2
Honored Contributor

Re: greatest blunders

Hi Michael ,

Your case is common with racked servers. Sun uses
a locator to identify the server in the rack before a reboot is issued.

I doubt HP has anything like that.

regards,
U.SivaKumar
Innovations are made when conventions are broken
Christian Gebhardt
Honored Contributor

Re: greatest blunders

Hi
As a newbie in UNIX I had an Oracle test installation on a production system
production directory: /u01/...
test directory: /test/u01/...

deleting the test installation:
cd /test
rm /u01

OOPS ...

After several bdf commands I noticed that the wrong lvol was shrinking and stopped the delete command with Ctrl-C.

The database still worked without most of its binaries and libraries, and after a restore from tape, without stopping and starting the database, all was OK.

I love oracle ;-)

Chris
Justo Exposito
Esteemed Contributor

Re: greatest blunders

Hi,

Developed a script to change the permissions of all the files and subdirectories under the current directory and, when run, it changed all the permissions on the system.

Regards,

Justo.
Help is a Beautiful word
Systeemingenieurs Infoc
Valued Contributor

Re: greatest blunders

trying to undo script-wrapping :

cd /usr/bin
rm -f remsh : mv remsh.org remsh
=

remsh was rapidly restored; I overlooked the disappearance of mv. It took 1 week for someone to notice it (only cron jobs were affected; the other jobs used /sbin/mv).
A Life ? Cool ! Where can I download one of those from ?
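
Why mv vanished in the post above: with `:` typed where `;` was meant, the shell never starts a second command. Everything after `rm -f` is just a list of file names, and `mv` is one of them. A sketch that replays the line in a scratch directory with empty stand-in files (the directory and files are assumptions for illustration, not the real /usr/bin):

```shell
#!/bin/sh
# Scratch directory with empty stand-ins for the real binaries.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch remsh remsh.org mv

# ':' is not a command separator here; it is one more file name,
# so rm receives the arguments: remsh  :  mv  remsh.org  remsh
rm -f remsh : mv remsh.org remsh

# Show whatever survived in the scratch directory.
ls -A "$demo"
```

Because of `-f`, rm stays silent about the nonexistent `:` file, so nothing on screen hints that the line did something different from what was intended.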
Pete Randall
Outstanding Contributor

Re: greatest blunders

Well, there's the old standard of re-booting a workstation, only to find out the window the programmer had up was actually logged into the production server. The only "good" thing was that, even after I su'd to root, the shutdown message said that the programmer had done it.

Pete

Pete
Pete Devlin
Valued Contributor

Re: greatest blunders

From home I got the syntax to tar wrong whilst testing a newly replaced tape drive :- tar -cvf /etc/passwd /dev/rmt/0m. Then my connection dropped.... Luckily it was a dev box at the weekend & there was a console session running so I was able to travel the 3 miles and, after first copying /usr/newconfig/etc/passwd into place, I recovered from tape. No one knew. I only ever use files in /tmp now for testing tape devices with tar!!
Cheers
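
The tar mix-up above comes down to argument order: the name straight after `-f` is the archive that tar writes, and the names after that are the files to put into it. A minimal sketch with a scratch archive file standing in for the tape device (the paths and file names are illustrative assumptions):

```shell
#!/bin/sh
# Scratch directory; the archive is a plain file, not /dev/rmt/0m.
scratch=$(mktemp -d)
echo "test data" > "$scratch/sample.txt"

# Order matters: archive name first (right after -f), then the files to store.
(cd "$scratch" && tar -cf test.tar sample.txt)

# List the archive contents to confirm what was written.
listing=$(tar -tf "$scratch/test.tar")
echo "$listing"
```

Swapping the two names would make tar overwrite `sample.txt` with an archive, which is what happened to /etc/passwd in the post.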
harry d brown jr
Honored Contributor

Re: greatest blunders


Learning hpux? Naw, that's not it....maybe it was learning to spell aix?? sco?? osf?? Nope, none of those.

The biggest blunder:

One morning I came in at my usual time of 6am, and had an operator ask me what was wrong with one of our production servers (servicing 6 banks). Well nothing worked at the console (it was already logged in as root). Even a "cat *" produced nothing but another shell prompt. I stopped and restarted the machine and when it attempted to come back up it didn't have any OS to run. Major issue, but we got our backup tapes from that night and restored the machine back to normal. I was clueless (sort of like today)

The next morning, the same operator caught me again, and this time I was getting angry (imagine that). Same crap, different day. Nothing was on any disk. This of course was before we had RAID available (not that that would have helped). So we restored the system from that night's backups and by 8am the banks had their systems up.

So now I have to fix this issue, but where the hell to start? I knew that production batch processing was done by 9PM, and that the backups started right after that. The backups completed around 1am, which were good backups, because we never lost a single transaction. But around 6am the stuff hit the fan. So I had a time frame: 1am-6am, something was clobbering the system. I went through the crons, but nothing really stood out, so I had to really dive into them. This is the code (well almost) I found in the script:

cd /tmp/uniplex/trash/garbage
rm -rf *

As soon as I saw those two lines, I realized that I was the one that had caused the system to crap out every morning. See, I needed some disk space, and I was doing some house cleaning, and I deleted the sub-directory "garbage" from the "/tmp/uniplex/trash" directory. Of course the script is run by root, which attempted to "cd" to a non-existent directory, which failed, so cron was still cd'd to "/" when it proceeded to "rm -rf *" my system!

live free or die
harry
Live Free or Die
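
The two-line cron script in harry's story fails open: when `cd` fails, the `rm` still runs in whatever directory the job started in. One common guard is to stop as soon as `cd` fails. A minimal sketch, where `clean_dir` is a hypothetical helper name and not something from the original script:

```shell
#!/bin/sh
# Hypothetical helper: empty a scratch directory only if we can enter it.
clean_dir() {
    target=$1
    # Subshell, so the caller's working directory never changes.
    (
        cd "$target" || exit 1   # stop here if the directory is missing
        rm -rf ./*               # only reached when cd succeeded
    )
}
```

A cron job could then call `clean_dir /tmp/uniplex/trash/garbage || echo "cleanup skipped" >&2`, so that a missing directory produces a message instead of a wiped root filesystem.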
Pete Randall
Outstanding Contributor

Re: greatest blunders

OK, I'll confess (if Harry can admit that blunder, I guess I can, too). My worst blunder was trying to share an FC60 disk array in a non-MC/SG environment.

Installed the switch, tested connectivity from both servers, cloned the production database's volume group onto another volume group, exported the volume group, imported onto the development server, started bringing up the development DB - trashed the production DB. I'd grabbed the wrong map file and imported the wrong VG.

But this wasn't my greatest blunder!! We coincidentally had some hardware problems with the FC60 and I mistakenly blamed the DB issues on them. Once we got the hardware all straightened out, I proceeded to do the same thing all over again, blindly believing that vgchange would protect me because vgchange supposedly won't let both systems activate the same VG - NOT!!

The good thing about this whole scenario is that we got really good at restoring the production DB - we've got the procedure down pat now.

Pete

P.S. We do share the FC60 to this day - we're just a lot more careful about which VG we're actually dealing with from which server.

Pete
harry d brown jr
Honored Contributor

Re: greatest blunders

Pete, sorry to drag you out kicking and screaming, but my blunder was about 12 years ago :-)

live free or die
harry
Live Free or Die
Pete Randall
Outstanding Contributor

Re: greatest blunders

Harry,

I'm a slower learner, I guess. Mine was this past summer.

;^)

Pete

Pete
Robert Thorneycroft
Valued Contributor

Re: greatest blunders

Having been called in the middle of the night to find out there was a problem with a change that had been made to the system, I suggested restoring the file back from the copy on an EMC BCV.

Unfortunately the operator somehow got the wrong idea and managed to kick off a BCV restore, which as anyone who knows anything about EMC commands will realise is a completely different thing.

Anyway, the long and short of it was that the system, which runs an extremely busy large Oracle database, was taken back to its state of two days earlier. It then took about a day and a half to recover by rolling forward through the redo logs.