09-18-2002 02:46 AM
Worst system corruption ever experienced..
Before I send this disk off to HP Labs for comment, thought I'd get comments from y'all.
Scenario:
MC/ServiceGuard clustered HP L1000 servers running 11.0 with a VA7100. Running Oracle 8.0.6 as a package.
On the fabulous Friday the 13th we were testing a move of the package from the ProductionA to the ProductionB server and the switch failed; the logfile complained that the VGs could not be deactivated.
The ProdA server was rebooted and came up with +/- 15 failures in the boot up sequence, mainly loadable modules, network and file systems -- we could not log on. The server was rebooted again, and this time it gave us a GSP level 13 software panic as soon as the root check was done.....
The server's boot disk was rebuilt -- time factor -- and I vgimported the mirrored boot disks that had failed. What I found was simply incredible... All files to do with LVM, networking and servicing had been zeroed out: although they were still there, and their sticky bits were there, they were 0 bytes in size. E.g.:
-rSxr--r-- root sys 0 lvcreate
Most of the LVM binaries were affected this way (I realise that they are ELF shared executables), but I also found that some ASCII files were zeroed...
/etc/services
/etc/fstab
As well as the utmp, btmp, last, lastb etc....
All these files were zero'd at exactly 1600hrs on the 10th, but the failure was only picked up on the 13th when the files associated with the package switch were utilised (LVM, network etc.)
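A quick way to inventory files like these on an imported mirror is to ask find for empty regular files. This is only a sketch: `list_zeroed` is a made-up helper name, and the `/mnt/oldroot` path in the usage note is an assumed mount point, not something taken from this thread.

```shell
# Hypothetical helper: print every zero-length regular file under the
# directories given as arguments.
list_zeroed() {
    # -xdev   : stay on the filesystem of each starting directory
    # -type f : regular files only (skips symlinks, directories, devices)
    # -size 0 : size rounds up to zero 512-byte blocks, i.e. the file is empty
    find "$@" -xdev -type f -size 0 -print
}
```

Something like `list_zeroed /mnt/oldroot/sbin /mnt/oldroot/etc | xargs ls -ld` would then show the mode and modification time of each hit, which is the information the rest of the thread keeps coming back to.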
Could this be inode corruption? I checked syslog and all important logs and NOTHING was logged at that time. This is too involved to have been a hack, and we all know there is no such thing as a virus for HP-UX????? ;-)
I have not in the years of doing UX from 9.X seen a corruption of this magnitude, I have never had to rebuild an HP because of a corruption like this....
Comments please...!!!!!
MND
09-18-2002 02:48 AM
Re: Worst system corruption ever experienced..
Anything interesting in the PIM / HPMC (from BCH)?
Later,
Bill
09-18-2002 02:55 AM
Re: Worst system corruption ever experienced..
While I doubt this has anything to do with your problem, it does relate to the notion that there's "no such thing as a virus for (Unix)". While there may not be viruses, per se, there are hacks:
http://www.mi2g.com/cgi/mi2g/press/100902.pdf
Courtesy of SANS NewsBites.
Pete
09-18-2002 03:04 AM
Re: Worst system corruption ever experienced..
Thanks, but the server was not complaining, and it is back up and running. Will check PIM though.... good pointer.
Pete,
Again, thanks, really interesting -- did not think there was a hacker bored enough to write a UN*X virus.... although I heard about the slapper virus yesterday....
MND
09-18-2002 03:49 AM
Re: Worst system corruption ever experienced..
It's also no surprise that several LVM executables are corrupted... in fact there are only two different ones (one in /sbin and one in /usr/sbin) with lots of links to them. Corrupting one means corrupting all.
To be honest, I would start an intense investigation into what happened on Sep 10th, 1000hrs. Which people have root access? Who logged in as root? What was done at that point in time?
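To see how many names really share one file, the inode number can be used as the key. A minimal sketch, assuming the commands are hard links as described above; `same_inode` is a hypothetical helper name, and `-inum` is a widely supported find primary rather than something this thread confirms for the release in question.

```shell
# Hypothetical helper: list every path under directory $1 that is a hard
# link to the same inode as file $2.
same_inode() {
    dir=$1
    file=$2
    # ls -di prints "<inode> <name>"; keep only the inode number
    inum=$(ls -di "$file" | awk '{ print $1 }')
    # -xdev matters: inode numbers are only unique within one filesystem
    find "$dir" -xdev -inum "$inum" -print
}
```

For example, `same_inode /sbin /sbin/lvcreate` would print all names backed by that one file. If every name in the list is zero bytes, a single truncation explains all of them.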
I'm not excluding the possibility that a SW bug or a HW defect could cause these special symptoms, but I personally don't believe it.
Regards...
Dietmar.
09-18-2002 04:12 AM
Re: Worst system corruption ever experienced..
Frankly, I lean towards thinking this was done by 'someone' not 'something'. Someone who wondered what would happen if she/he took certain files and did a simple >"file" to them all.
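The effect of such a redirection is easy to reproduce on a scratch file: the name and permissions survive, only the contents go. A minimal sketch; `truncate_demo` and the scratch path are made up for illustration, and it should only ever be pointed at a throwaway file.

```shell
# Hypothetical demo: create a scratch file, truncate it with a bare
# redirection, then report its permission string and size.
truncate_demo() {
    f=$1
    printf 'pretend this is a binary\n' > "$f"
    chmod 755 "$f"
    : > "$f"                              # same effect as a bare  > "$f"
    ls -l "$f" | awk '{ print $1, $5 }'   # mode string and size in bytes
}
```

Running `truncate_demo /tmp/scratch.$$` should report an executable mode string next to a size of 0, which matches the pattern seen on the damaged disk.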
Just a thought,
Rita
09-18-2002 04:14 AM
Re: Worst system corruption ever experienced..
Thanks, I fsck'd the filesystems before I mounted them on the new disk, and there were no vxfs errors at all.
As to the logins, well, I can see from the OLDsyslog.log that nothing happened there at that time. I know who was working on the server at the time, and there is no chance of them doing anything dangerous as far as I am concerned; they are good unix engineers. Unfortunately utmp, btmp and all commands to do with them (last, lastb) are also zeroed; the .sh_history shows nothing untoward.
My first thought was hack attack, but now I am uncertain......
I am seriously thinking of sending the whole disk to HP labs for their thoughts; it is just too weird for my liking.
MND
09-18-2002 04:24 AM
Re: Worst system corruption ever experienced..
It's suspicious that the utmp & btmp are zeroed. What about the /var/adm/wtmp file?
Any suspicious gaps in syslog.log?
Did you check dmesg prior to rebooting? Anything in there?
Check the /etc/passwd & group files for UID 0 accounts. Check for hidden "." files in key dirs.
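The UID 0 check can be done with a one-line awk filter over the passwd file. A minimal sketch; `uid0_accounts` is a hypothetical helper name, and the optional argument lets it read a copy of the file from the imported disk instead of the live one.

```shell
# Hypothetical helper: print the login name of every account whose UID is 0.
# Reads /etc/passwd by default, or the passwd-format file given as $1.
uid0_accounts() {
    # passwd fields: name:password:uid:gid:gecos:home:shell
    awk -F: '$3 == 0 { print $1 }' "${1:-/etc/passwd}"
}
```

Anything other than root in the output deserves a closer look. For the hidden files, `find /some/dir -xdev -name '.*' -print` is the usual starting point.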
I've had files zeroed by system crashes, but not nulled out while no system problems were encountered. They were files that were being accessed at the time the system nosedived.
Rgds,
Jeff
09-18-2002 04:34 AM
Re: Worst system corruption ever experienced..
Later,
Bill
09-18-2002 04:51 AM
Re: Worst system corruption ever experienced..
Not that this is a cause but more of an investigative angle:
If your environment is a trusted system, then check the accounting trail from that timeline (9/10/02 10:00AM). BTW, if you're not on trusted systems, this might be the perfect opportunity to convert; it keeps nice little trails of who is doing what on your boxes.
Also have your networking engineers run an audit of the logfiles on their switch since they can trace back who was logged on to that system IP address at that time and possibly be able to see the traffic that was passed to it.
09-18-2002 07:16 AM
Re: Worst system corruption ever experienced..
HTH
Marty
09-19-2002 12:31 AM
Re: Worst system corruption ever experienced..
I have checked the tombstones Bill, and all seems well...
Dietmar, I am experiencing power problems at the mo, but will send a list, it is very interesting, because at 16h00 half of the files were changed and at 16h03 exactly, the other half.
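One portable way to pull out everything modified inside a time window like that is to bracket it with two reference files and let find compare against them. A minimal sketch; `changed_between` and the scratch directory name are made up, and timestamps are given in the `[[CC]YY]MMDDhhmm` form that `touch -t` accepts.

```shell
# Hypothetical helper: list regular files under $1 modified after time $2
# and no later than time $3.
changed_between() {
    dir=$1
    from=$2
    to=$3
    ref=/tmp/mtime_ref.$$            # scratch directory for reference files
    mkdir "$ref" || return 1
    touch -t "$from" "$ref/from"
    touch -t "$to" "$ref/to"
    # -newer is strict: mtime > from;  ! -newer: mtime <= to
    find "$dir" -xdev -type f -newer "$ref/from" ! -newer "$ref/to" -print
    rm -rf "$ref"
}
```

Called as `changed_between /mnt/oldroot 200209101559 200209101604` (mount point assumed), it would list both batches in one go. Narrowing the window to a single minute separates the 16h00 group from the 16h03 group.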
The only thing that took place at those times was a (trusted) ftp....?????
Jeff, my mistake on the btmp,utmp... they are there, it is just the commands that parse these files that were zero'd -- ie last, lastb.
Bill, yeah, I checked cron -- nothing dangerous, just an oracle script to copy logfiles to the DR server.
Frank, due to the nature of the application, I cannot (unfortunately) run C2.... pity!
Marty, checked .sh_history -- clean as a whistle.... oh and by the time I got the message you already had your CROWN! Welcome to royalty sire! May you wear it with pride!!!! (10 points just for the hell of it, once I have resolution, otherwise peeps think I am happy! ;-)
MND
09-19-2002 02:41 AM
Re: Worst system corruption ever experienced..
I only need 3112 points to make Pharaoh. Please .. !!
;^)
Pete
No points, please - just kidding.
Pete
09-19-2002 02:53 AM
Re: Worst system corruption ever experienced..
Very interesting. One thing you could try: on the mirror you've imported, run an ncheck on the files which are zeroed and see whether ncheck reports 0 clusters for each file, or lots of clusters. If lots of clusters, note them and see if the range of clusters for all zeroed files is similar, i.e. indicating that an area of the disk may have developed a bad spot and thus corrupted all the files residing on that part of the disk. Certainly possible, IMHO.
ie.
ncheck -F vxfs -S - /dev/
This will list the clusters on the disk that file resides on. Repeat for lots of zeroed files and compare clusters.
09-19-2002 05:36 AM
Re: Worst system corruption ever experienced..
'lrwxrwxrwt 1 root sys 13 Oct 6 1999 wtmp -> /var/adm/wtmp'
creates a nice empty file...
Regards,
Jac
09-19-2002 06:40 AM
Re: Worst system corruption ever experienced..
Dietmar/Stefan, I have attached a shortlist of affected files (there are more) and the associated nchecks for them... Not sure what I read from this.
You will notice that on the ncheck of the /usr/bin directory the last does not exist, but this is also true for the /usr filesystem.
/etc/services is the same -- zero'd -- and so is /etc/fstab.
Thoughts?
09-19-2002 07:43 AM
Re: Worst system corruption ever experienced..
Checked your output; I was really after ncheck with the uppercase -S option, not lowercase -s. Uppercase -S gives sector numbers. Can you redo it with uppercase -S, as in the ncheck command in my previous post?
09-19-2002 08:32 AM
Re: Worst system corruption ever experienced..
If hardware fails, then open files can be zeroed out. However, I have no clue as to what would be opening those files simultaneously. Someone syncing network services... understandable, but it is not likely that someone was writing fstab and services at the same time as last, lastb, arp, etc. were all running and then the disk farted. Could happen, but very slim chance.
I would look more at who would be scripting something which would look at system critical files and gather information.
Copying services and other system files to an archive, and at the same time running a bit of user accounting with last/lastb make more sense in this case than hardware, especially since it is back up and running.
If that is not possible though, then look for internal/external hacks and attacks. While you say you cant run C2, can you turn off services like NFS, NIS, FTP, Telnet, rlp??? Also, I'd really watch that 2nd server for similar activity.
Last thought on this as to another possible cause.
One time, I accidentally copied a Solaris intel binary to a Sparc system. It tried to run the binary, and did whack out the system. anyone with root access playing with strange/foreign binaries?
Regards,
Shannon
09-19-2002 11:15 PM
Re: Worst system corruption ever experienced..
Oops, helps to do the man ncheck_vxfs...... and to read the post correctly innit???? Missed the -
Attached the ncheck for /sbin* and /usr/*bin*, hope this makes more sense.
Thanks all for your continued interest and help with this problem.
I am still not convinced about the hack. The environment is just too clean.
MND
09-19-2002 11:26 PM
Re: Worst system corruption ever experienced..
Been thru the attachment, seems to me that the files we are grep'ing for aint there. All of them.....
They are in the -s but not the -S
MND
09-19-2002 11:40 PM
Re: Worst system corruption ever experienced..
"They are in the -s but not the -S"
Been thru the attachment; all your lv commands are there with the correct number of clusters. I presume this is on the imported mirror lv, not your boot lv? You only listed the ncheck info for the /usr lvol, not for / (/etc), so I couldn't check /etc/fstab etc. Anyway, the ncheck output looks fine.
I'm still of the opinion that you had a hardware problem. Something wrote to the disk to cause the zeroed files. I see 2 possibilities:
1. A bad spot developed on the disk where these files are located, thus 'losing' them. Fsck didn't lose the files from the file list (you can see them) but lost the data they were linked to or where the physical files were, and thus they are now 0 bytes.
2. Some sort of disk problem occurred, and when fsck was run it had a problem or got confused and zeroed the files in question. Been known to happen. Depends on patch levels (which I presume are up to date) or what the underlying hardware problem is (could be disk, cable, controller).
Have you checked the xstm (utility -> logtool) hardware logs for the day this happened, to see if any hardware errors were recorded? This is about the best place I can see to try to support my opinion.
09-19-2002 11:51 PM
Re: Worst system corruption ever experienced..
"Been thru the attachment, all your lv commands are there with the correct number of clusters."
The lv commands in /sbin are not there, only in the /usr/sbin, attached an output of the difference between the /sbin in /dev/vg00/lvol3 and the affected lvol in the imported VG, you will see all the commands are not there.
I seem to agree with the idea that it is a hardware failure, but am still uncertain. The site is 2000 km away and, as I stated, very secure.
Still gets me that these failures were at 16h00 and 16h03 exactly, and this is all that happened at that time.... EXACTLY the time.
#last -R| grep "Sep 10 16"
root ftp 10.100.2.27 Tue Sep 10 16:02 - 16:03 (00:00)
root ftp 10.100.2.27 Tue Sep 10 15:59 - 16:00 (00:00)
MND
09-20-2002 12:04 AM
Re: Worst system corruption ever experienced..
So nothing in the xstm hardware logs for that day/time?
09-20-2002 12:10 AM
Re: Worst system corruption ever experienced..
I will certainly need to look into this, but will need an administrator to help me.
I am attaching the /etc/services and fstab's nchecks, you will see that services does not exist in the affected filesystem, the fstab is a difficult one, as I re-created it on one of the reboots -- so it will lie a bit!