
Re: Lost / partition

 
Neme
Advisor

Lost / partition

I have lots of customers (in different cities) that have Linux installed (Fedora Core 3, Mandrake 10, 10.1, 10.2) and working. Several of the machines have stopped working with the same error. When I try to restart the system, the following message appears:

No inittab found
Enter your runlevel

And when I try to boot with runlevel 3, 0, or 5, it doesn't work. I booted the machine with the rescue option and fortunately my data partition (which I had created as a separate partition) was there. Only the root partition was lost. The /etc directory has only a few files left.

I mounted the / partition and the /data partition. Only the / partition was compromised; the /data partition was OK. Then I reinstalled without formatting the data partition, restored /etc from my backup, and it works fine.

But I don't know what's going on. Why did the machines suddenly lose the inittab and the / partition? I looked at the messages log and nothing related to the problem appears. Once I tried to reproduce the problem on a machine at my office, and I got the same error by using cups many times in a recursive shell, but it happened only once.

The hardware is different, i.e., one machine has SCSI, another IDE or SATA; there are Dell machines, IBM machines, unbranded machines; the kernel is 2.4 or 2.6.

They run only dataflex, and some sites use only a server in a LAN environment (the users access the server only via PuTTY on Windows machines or ssh on Linux machines). They don't have internet access.

The users don't have access to the server and the disk space is OK (only 20-30% of the disk is used), so a virus is probably not the problem...
At one customer, there is also a NetWare server on the same power supply and that server never hangs (so we rule out power problems).
Claudio Cilloni
Honored Contributor

Re: Lost / partition

Different kernels... different hardware... but (I guess) the same applications and the same final use of the machine.
Maybe it is a bug in the software they use. That's the common thing between them.

Be sure there are no applications/services/daemons/logins that run as root unless it is absolutely needed.

Does dataflex (I don't know what it is) run as root, or do the ssh users log in as root?

ciao
Claudio
Gopi Sekar
Honored Contributor

Re: Lost / partition


Initially I thought the machines had been compromised by hackers who also removed the / partition, but then you said none of the machines are connected to the internet.

It could be a problem with the application they are using: some silly bug which, instead of removing a temporary file, simply removes / (?)

Does the application (dataflex) run as root? If so, try running it as a normal user, if it can run properly that way. Also, do the users log in to that machine as root or as normal users? This could be a user mistake too.

If all the machines (located in different cities and used by different people) crash at the same time (I assume so), then it is more likely a bug in the application that is triggered only under specific conditions (date/time, routine maintenance job, backup, etc.).

Regards,
Gopi
Never Never Never Giveup
Neme
Advisor

Re: Lost / partition

The dataflex utility runs under a normal user that has write access only to his home directory. The errors occurred at different times and on different days. The most interesting thing is that we have the same environment in lots of places, with exactly the same configuration, and with no errors!!
We thought it might be the filesystem type, so we tried reiser, ext2, and ext3, but that did not resolve the problem. Why does the data partition not get compromised?
We created a separate partition for the dataflex utility and the problem continued. The / partition gets lost but the /data and /dataflex partitions are still there!
Neme
Advisor

Re: Lost / partition

I forgot to thank you very much for the help!!!
Gopi Sekar
Honored Contributor

Re: Lost / partition


Are you using any backup utility or some periodic application that does system maintenance?

Do any of the components in dataflex run with root access?

How do the users power down their dataflex servers (if they do at all)? I am asking this silly question because one of my customers, who comes from a DOS background, simply switches off the power to shut down the Linux box :)

How is the server maintained? Is it maintained by one of the local non-Linux users or by a Linux expert/system admin? Are any front-end tools used for system maintenance?
Never Never Never Giveup
Neme
Advisor

Re: Lost / partition

We created a user to power off the system. This user automatically runs a halt command. We manage the systems remotely. We have the same environment here and at other customers that has been up for more than 80 days...

Thanks!
Neme
Advisor

Re: Lost / partition

For backup we use the tar utility, which runs once a day from cron (tar zcvf xxxx.tgz ...).
Gopi Sekar
Honored Contributor

Re: Lost / partition




I don't think tar can be rude enough to erase a file system :)

Is the file name always static, or is it decided automatically by a script? If it is, check the script closely for any malfunction.

Does the backup job run as root, or again as the dataflex user?
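
One thing to look for in a script-built name is a variable that ends up empty, so that a path collapses to / or to some other system directory. The thread does not show the real backup script, so the following is only a minimal Python sketch of a guard, assuming a single known backup directory; `checked_target` and its arguments are hypothetical names used for illustration.

```python
import os


def checked_target(base_dir, name):
    """Resolve name under base_dir and refuse anything that escapes it."""
    if not base_dir or not name or not name.strip():
        raise ValueError("empty directory or file name")
    base = os.path.realpath(base_dir)
    if base == os.sep:
        raise ValueError("refusing to work directly under /")
    # An absolute name such as "/etc" replaces base in os.path.join,
    # and "../etc" climbs out of it; both are caught by commonpath below.
    target = os.path.realpath(os.path.join(base, name))
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes %s: %r" % (base, name))
    return target
```

A root cron job would call something like this before writing or deleting any path it computed, and stop on `ValueError` instead of continuing with a bad path.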

Gopi
Never Never Never Giveup
Neme
Advisor

Re: Lost / partition

The backup runs as root, but the problem usually occurs during working hours, not at the time the backup is running.
Gopi Sekar
Honored Contributor

Re: Lost / partition


I am not very sure, but it would still be worth the time to revisit the backup script and check for any potential error.

Also, if it happens during working hours, then some script or utility used by one of the users may be causing the problem.

Since you mentioned that it runs at several locations without any problem, find out how the daily work routine differs between a working location and a crashed location. Maybe that will give some insight into the problem.
Never Never Never Giveup
Neme
Advisor

Re: Lost / partition

The users don't use any special scripts. They only use PuTTY to access the server and run a script to enter the dataflex program. The script is the same on all the installations, both the ones that are OK and the ones that sometimes have the problem...

The users don't access the server to get a shell; they access it only to use the system. When they stop using the system, the script logs them out.
Wim Van den Wyngaert
Honored Contributor

Re: Lost / partition

The inittab file is normally read by the init process that is started with /sbin/init.

How did that get started if / isn't available ?

Wim
Wim
Neme
Advisor

Re: Lost / partition

The / partition is not completely lost. Most of the files get lost, mainly in the /etc directory. Sometimes /var isn't lost, and the logs show nothing about the error. Sometimes /var is not there. It is a very silly problem.
Maybe the kernel is trying to access the swap area and writing data to the wrong partition?
I can't guess why the data partition is always ok...
Stuart Browne
Honored Contributor

Re: Lost / partition

So you're saying that the partition is fine, and there's still a valid filesystem on it (fsck in single user mode), but the content of the filesystem gets corrupted?
One long-haired git at your service...
Wim Van den Wyngaert
Honored Contributor

Re: Lost / partition

And if so, did you check lost+found ?
Wim
Neme
Advisor

Re: Lost / partition

lost+found is always empty and the data isn't corrupted; the files disappear. If we boot with a rescue CD and mount the / partition, for instance, /etc/hosts is there but /etc/passwd isn't. Some files are there but lots of them are gone! The inittab is not there, so the "No inittab found, enter your runlevel" message appears.
These problems are causing lots of trouble, because we sometimes need to reinstall the Linux OS in places that are very far from here.
In one case, the problem happened on a machine that had been used for more than 2 years with the Novell OS and never had a single problem!
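
Since the logs show nothing, one way to narrow down when the files disappear might be to record a manifest of /etc at regular intervals and compare it with the previous one. Below is a minimal Python sketch of that idea, not something already in use here; `snapshot` and `compare` are hypothetical helper names, and storing the manifests on the /data partition (which has always survived) is an assumption.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, and device nodes
            try:
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                continue  # unreadable file: leave it out of the manifest
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def compare(old, new):
    """Return (missing, changed) lists of relative paths between two manifests."""
    missing = sorted(set(old) - set(new))
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    return missing, changed
```

Run from cron every few minutes against /etc, the first manifest in which `missing` contains `passwd` or `inittab` would bound the time of the loss, which could then be matched against who was logged in and which jobs were running.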
Claudio Cilloni
Honored Contributor

Re: Lost / partition

Did you make any software upgrade or new installation on this machine recently?
Gopi Sekar
Honored Contributor

Re: Lost / partition


Is it possible to get a list of the daily routine work done by the users at two different places: one where the system never crashed and one where the system crashes often?

This will give us some insight into what is going wrong.

Never Never Never Giveup
Bruce Copeland
Trusted Contributor

Re: Lost / partition

Do any of the systems that have failed do regular forced filesystem checks?

I once saw something pretty similar to this on a Fedora 2 system. The problem was ultimately traced to recent changes in distributions that no longer do regular forced filesystem checks. Under certain conditions that can allow a lot of filesystem errors to accumulate.
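
For ext2/ext3 filesystems, whether periodic forced checks are still enabled can be read from the output of `tune2fs -l` on the root device. The following is a rough Python sketch that parses that output; it assumes the "Maximum mount count" and "Check interval" field names printed by e2fsprogs (worth verifying against the installed version), it does not apply to reiserfs, and the function names are made up for illustration.

```python
def parse_tune2fs(text):
    """Turn 'Key:   value' lines from `tune2fs -l` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the version banner) are ignored
            fields[key.strip()] = value.strip()
    return fields


def periodic_checks_enabled(fields):
    """True if either a mount-count or a time-based forced check is active."""
    # A maximum mount count of -1 or 0 and an interval of 0 both mean "never".
    max_mounts = int(fields.get("Maximum mount count", "-1"))
    interval = int(fields.get("Check interval", "0").split()[0])
    return max_mounts > 0 or interval > 0
```

If this returns False on the machines that failed and True on the ones that did not, that would be one concrete difference between the two groups.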

Bruce
Neme
Advisor

Re: Lost / partition

I didn't make any new installations recently. The users use the same script to use the system. Once they log out, they close the connection (no one uses the server for any purpose other than the system).
Elabro
Advisor

Re: Lost / partition

Hi,

Actually, you can try the Active@ Partition Recovery utility to restore the partition. It has the best methods I've seen. Great tool; you won't regret trying it.
http://www.partition-recovery.com/