donatella sabellico
Occasional Advisor

Corruption of a file

Hi, I am sending this like a "message in a bottle"...
I would like some hints on this problem. I have a binary file from which all the zeros (0) are missing(!). We have never seen corruption like this.
Comparing the good file against the corrupted one, we discovered that the two files are identical except for the fact that in the bad file
all the zeros (0) are missing.
In our correct file, each record has a fixed size of 512 bytes. Since the
record space is not completely used, some fields are set to
0 (zero).
The corruption of this file is evident, but we cannot figure out who or what on HP-UX 11i could cause it (I don't consider the possibility that someone caused it intentionally). We simulated it by opening the binary file with vi and then saving it with wq!. On this machine there are OpenView and also HP Data Protector, on HP-UX 11.11i.
Do you have any idea about that?

Thanks!
16 REPLIES
RAC_1
Honored Contributor

Re: Corruption of a file

What is this file? An HP-UX executable? If yes, run cksum on both copies and compare the checksums and sizes.
There is no substitute to HARDWORK
donatella sabellico
Occasional Advisor

Re: Corruption of a file

It is my application's data file, not an HP-UX 11.11i system file.
donatella sabellico
Occasional Advisor

Re: Corruption of a file

One more point: the file size is approximately half of the original file size. In fact, we first realized something was wrong by looking at the size. Then, when we examined the binary file, we found that all the zeros were gone!
Stephen Keane
Honored Contributor

Re: Corruption of a file

Do you mean zero as in the character '0', or zero in binary, i.e. the NUL character? Editing a true binary file with vi is not a great idea, especially if there are NULs in it. Did vi 'corrupt' the file? If not, try comparing the output of xd run against each file, with the output saved as a text file.

xd -vxc file1 > file1.txt
xd -vxc file2 > file2.txt

You can safely edit file1.txt/file2.txt with vi as they will be text files.
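To illustrate the idea, here is a minimal sketch using od -An -tx1 as a portable stand-in for xd -vxc (the file names and contents below are made up for the demo, not taken from the actual system):

```shell
# Two small sample files standing in for the good and corrupted copies
printf 'AB\0\0CD\0\0' > /tmp/good.dat      # with embedded NUL bytes
printf 'ABCD'         > /tmp/bad.dat       # same data, NULs stripped
# od -An -tx1 dumps raw hex; a rough equivalent of xd -vxc
od -An -tx1 /tmp/good.dat > /tmp/good.txt
od -An -tx1 /tmp/bad.dat  > /tmp/bad.txt
# diff on the text dumps shows exactly where the 00 bytes vanished
diff /tmp/good.txt /tmp/bad.txt | head
```

Diffing the dumps rather than the binaries keeps the comparison safe for text tools.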
Alessandro Pilati
Esteemed Contributor

Re: Corruption of a file

Donatella,
how did you get the 0-less file?
Did you compile it on your server?
Or did you transfer it from another server (via rcp, ftp or some other method)?

Regards,
Alessandro
if you don't try, you'll never know if you are able to
Torsten.
Acclaimed Contributor

Re: Corruption of a file

Hi Donatella,

Where did the file come from? Did the size change before or after you used vi? vi can't work with a binary file and *will* corrupt it! You can't simulate anything by opening a binary file with vi - try "strings" instead to show some "readable" content.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
James R. Ferguson
Acclaimed Contributor

Re: Corruption of a file

Hi Donatella:

Running 'vi' on a file with nulls (binary zeros) will remove the NUL characters.

# dd if=/dev/zero of=/tmp/myfile count=2
# echo "theend" >> /tmp/myfile
# ls -l /tmp/myfile   #...shows a size of 1031 bytes
# vi /tmp/myfile      #...and simply do wq!
# ls -l /tmp/myfile   #...size is now 7 bytes, including the newline at the end

Doing other operations on binary files with standard text-oriented filters is also dangerous. 'sed' could also have corrupted your file if, for instance, the output of a substitution was renamed over your original file; the NUL characters are stripped there, too.
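A quick sketch of that effect, using tr as the text filter (the file names here are invented for the demo):

```shell
# Build a 12-byte file containing embedded NUL bytes
printf 'AB\0\0CD\0\0EF\0\0' > /tmp/nul.demo
wc -c < /tmp/nul.demo                      # 12 bytes on disk
# Pipe it through a text-oriented filter that drops NULs
tr -d '\0' < /tmp/nul.demo > /tmp/nul.stripped
wc -c < /tmp/nul.stripped                  # 6 bytes - every NUL silently gone
```

The file shrinks by exactly the number of NUL bytes, which matches the "half the size, zeros missing" symptom described above.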

Regards!

...JRF...

donatella sabellico
Occasional Advisor

Re: Corruption of a file

Hi guys, thanks for your answers.
We don't use vi on this file.
We know perfectly well what vi can do to a binary file.
BUT our customer reported a problem in which our application was down.
When I examined the data file that our application uses, it was corrupted in the way I explained before.

So we need to understand what could cause this.
We *simulated* this corruption using vi, but our customer told us that they don't run vi on this file.
We are trying to figure out what else on HP-UX 11i could cause the same corruption, because we know that our application cannot do that.

Thx.
Stephen Keane
Honored Contributor

Re: Corruption of a file

ftp in ascii mode instead of binary?
A. Clay Stephenson
Acclaimed Contributor

Re: Corruption of a file

I would bet the bank that this is not an OS problem but rather a problem with your code; otherwise, this same problem would be appearing in files all over your box. Even though you say it can't possibly be your application, I beg to differ. Among the things that could cause this: failing to read before doing a write, so that the original data are not preserved; or, if you are using high-level (buffered) I/O, missing an fseek() before an fwrite(). There are some patches that fix "possible data corruption" and I would look for those, but for a problem isolated to one file it's all but certain that the fault is in the application code.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: Corruption of a file

This is normal Unix behavior and your file is called a "sparse" file. When a file is written randomly such as writing record 1, then skipping to record 500 and writing another record, Unix will create a sparse area which is undefined because it was never written. So the filesystem knows about record 1 and record 500, but the rest of the space is skipped. The application knows that this is the case (or the programmers better know this) and will not access the undefined space. The actual occupied space for this file is much less than the maximum defined record would indicate, hence the name 'sparse'.

However, it seems that the application is indeed accessing the undefined space, and standard Unix behavior is to supply a string of nulls for the undefined space. You can verify this with a simple cp of the file. When you cp the file, it is accessed sequentially and the filesystem code supplies a stream of nulls for records 2 through 499.

Now your application *may* set the unused fields to nulls and the occupied space is exactly the same whether records have non-null characters or not. In that case, a cp of the file will produce the same result, but certain backup programs will attempt to minimize occupied space by turning sequential null records into sparse pointers. fbackup has this as an option (not default). Check with Data Protector documents about sparse file optimization.

A core file from a crashed program is often a sparse file which you can see by copying the core file and comparing the two sizes with ll. Or you can create a sparse file like this:

$ dd if=/etc/issue bs=1 oseek=999999 count=1 of=/tmp/sparse.test
1+0 records in
1+0 records out
$ ll /tmp/sparse.test
-rw------- 1 root sys 1000000 Sep 6 11:11 /tmp/sparse.test
$ du /tmp/sparse.test
16 /tmp/sparse.test

If you run the dd command without the oseek option (dd if=/etc/issue of=/tmp/sparse.test) then du won't change much (occupied space) but ll will drop to just a few dozen bytes (actual records).


Bill Hassell, sysadmin
donatella sabellico
Occasional Advisor

Re: Corruption of a file

The 'sparse' idea is a good hint... I will check this with Data Protector, but do you know if HP-UX 11i does this by default, or if it can only be done manually using fbackup or via Data Protector? Sorry to re-iterate the questions, but my customer is pushing on this.
Thx!
A. Clay Stephenson
Acclaimed Contributor

Re: Corruption of a file

I very much doubt that this is a sparse file problem, because a read() would make this invisible to the application. For example, in your case, if only the bytes at offsets 0 and 511 were non-NUL data, a read of 512 bytes would silently fill in the missing bytes with NULs, and your application would have no means of even knowing that it happened. If sparse files worked any other way, they would be unusable. The frecover option to restore sparse files as sparse files is -s; in OB2/DP, you need to check the "Restore sparse files" option box. You should note that neither fbackup nor OB2/DP has an option to back up sparse files, because they have no way of knowing they are read()ing sparse data. Again, the OS silently fills in the "missing" data with NULs. The restore operations work by detecting some threshold number of consecutive NUL bytes; when that is observed, an lseek() is done to skip past those bytes.

od and/or hd are much better tools for examining binary data but NO tools other than those which can look at the filesystem itself can know about sparse data.

If it makes you feel better, you can blame sparse files but I will stick with my bad code hypothesis.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: Corruption of a file

And I agree with Clay since reading these files will have the missing/undefined records filled with zeros. As Clay mentioned, the default for backup programs is to save and restore files with real zeros, but even with a sparse file, reading an undefined record will indeed return a string of NULs. But I suspect that the application code is much more complicated than that and will need better debugging tools to view the file contents. If you use xd -xc to display the file, you'll see the missing records full of NULs.

Now in your description, you say that the record is not complete; that is, it is 512 bytes long but a portion of the record is undefined or not used. Now we are right in the middle of the application code. If 512 bytes of data were read, then there must be something in every byte location. I suspect that the method used to define or display the content of the record is incorrect. The filesystem knows nothing about the data inside records--all of that is defined by the application.


Bill Hassell, sysadmin
donatella sabellico
Occasional Advisor

Re: Corruption of a file

Hi, thanks for your reply.
The corruption is evident also because the original (good) file was about 2 MB.

When the corruption occurred, it was 500 KB.

The zeros are no longer there.
We had a number of records of 512 bytes each, and part of each 512 bytes was written with significant data.
Our application reads/writes this file 512 bytes at a time (never less!), and the file normally only grows (in this case it shrank) because we keep writing to it.

The unused portions of each 512-byte record are filled with zeros.

Throughout the whole file, in each 512-byte record, the zeros are no longer there!
The file was significantly smaller than the original, and the zeros were gone. We also verified this using HexComp, a plug-in for Beyond Compare.

We don't do anything that could make the file smaller than the original.
What I saw using HexComp is that in each 512-byte record of this file the zeros are no longer there.
We don't do that from a code point of view.

Thx!
A. Clay Stephenson
Acclaimed Contributor

Re: Corruption of a file

Since I didn't write your code, and since you aren't very forthcoming about your design, I can only speculate as to the cause of your problem. If this problem is restricted to only your file (and it appears that it is), it is unlikely in the extreme that this is an OS problem. One thing that could cause a smaller file is the filesystem filling up, but that does not explain why NULs are not being written. Even sparse files would appear to be normal in size (as shown by ls -l) but would consume fewer blocks than expected within the filesystem itself. My best guess is that your application code does less than rigorous error checking after every I/O operation, and thus some operations went blindly ahead after an abnormal event occurred.
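A hypothetical sketch of the discipline Clay recommends, in shell terms (the file names are invented): check the status of every I/O step, and never replace the original until the new copy is known good.

```shell
# Stand-in data file for the demo
printf 'sample record data' > /tmp/app.dat
# Write to a temporary file first, and check that the write succeeded
if ! cp /tmp/app.dat /tmp/app.dat.new; then
    echo "write failed (filesystem full?) - original left untouched" >&2
    exit 1
fi
# Only commit the update once the new copy is complete
mv /tmp/app.dat.new /tmp/app.dat
```

The same write-then-rename pattern, with the return value of every read()/write() checked, is what keeps an abnormal event (such as a full filesystem) from silently truncating the live file.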
If it ain't broke, I can fix that.