Operating System - HP-UX

jane zhang
Regular Advisor

tar command changed directory size

Hi all,

I used the following command to move a directory from one box to another.
On box A:
cd EN2001.1; tar -cvfh - . | remsh csihp08 "(cd /homes/boxB/EN2001.1; tar -xvf - .)"

The original size on box A is:
# du -sk EN2001.1
1723142 EN2001.1

After the tar command, the size on box B is much bigger:
# du -sk EN2001.1
4501585 EN2001.1

Can you explain?

Jane
A. Clay Stephenson
Acclaimed Contributor

Re: tar command changed directory size

Almost certainly that is a result of 1) sparse files or 2) symbolic links being followed.

Sparse files are those with "holes" in them. For example, write 1 byte at offset 0, then seek to offset 1,000,000 and write 1 byte. You now have a file which occupies only 2 blocks, but ls -l reports its size as 1MB. When a sparse file is copied, the "missing" bytes are filled in with NULs. More advanced backup tools like OmniBack or fbackup are able to preserve sparse files.
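
The example above can be reproduced with a short shell sketch. The directory and file names are made up for the demo, and it assumes a filesystem that supports holes; the allocated figure reported by du will vary with the filesystem's block size.

```shell
#!/bin/sh
# Sketch: build a sparse file and compare its apparent size with its allocated size.
# demo_dir and sparse.dat are hypothetical names used only for this demo.
demo_dir="${TMPDIR:-/tmp}/sparse_demo.$$"
mkdir -p "$demo_dir" || exit 1
sparse="$demo_dir/sparse.dat"

# Write one byte at offset 0.
printf 'A' > "$sparse"

# Write one byte at offset 1,000,000. seek= skips the unwritten region;
# conv=notrunc keeps the byte that is already at offset 0.
printf 'B' | dd of="$sparse" bs=1 seek=1000000 conv=notrunc 2>/dev/null

# Apparent size: the byte count that ls -l reports.
apparent=$(wc -c < "$sparse" | tr -d ' ')
# Allocated size: the kilobytes that du reports as actually occupied on disk.
allocated_kb=$(du -k "$sparse" | awk '{print $1}')

echo "apparent size : $apparent bytes"
echo "allocated size: $allocated_kb KB"

rm -rf "$demo_dir"
```

On a filesystem with sparse-file support, the allocated size should come out far smaller than the apparent size.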
If it ain't broke, I can fix that.
jane zhang
Regular Advisor

Re: tar command changed directory size

Hi,

I do not really want to go back to the OmniBack tape to restore if I can do it online.

The big size difference is definitely not desired. Does the tar command have any option to preserve the holes? I also need to preserve the symbolic links.

Thanks,

Jane
A. Clay Stephenson
Acclaimed Contributor

Re: tar command changed directory size

You should be able to change your tar | tar pipeline to an fbackup | frecover -s pipeline. See the man pages for fbackup (which doesn't really know about sparse files) and frecover (which does) for details.

Vanilla tar or cpio cannot preserve sparse files.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: tar command changed directory size

There is no way for any backup program to tell the difference between a normal and a sparse file since (as with all Unix flavors) a regular file's format and contents are determined solely by the creating program. What frecover and Omniback do is notice a string of nulls, stop copying them to disk until non-zero data is encountered, then perform an lseek to skip to that particular record and continue writing. The filesystem remembers the empty space by not allocating data blocks to the areas that were never written.

It is very important to understand that 1723142 is NOT the true size of the file. If you use cp or tar or cpio or pax or dump, they will all create a copy that is 4501585 in size. This is the true size of the file, and if this is a database file, the unoccupied space may eventually become occupied, increasing the original file's disk usage. It is also important to note that the file behaves exactly the same way whether the null records exist on disk or not. To see this, use the cksum command on both files. You can also use the cmp command to compare the files. Other than the result from du, the files are truly identical.
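
The cksum and cmp check can be tried on a throwaway pair of files. This is only a sketch: the file names are invented, and it assumes the filesystem supports holes and does not itself skip zero-filled blocks when the copy is written.

```shell
#!/bin/sh
# Sketch: a sparse file and a fully written copy have identical contents.
# demo_dir, sparse.dat and dense.dat are hypothetical names for this demo.
demo_dir="${TMPDIR:-/tmp}/sparse_cmp.$$"
mkdir -p "$demo_dir" || exit 1
sparse="$demo_dir/sparse.dat"
dense="$demo_dir/dense.dat"

# Sparse original: one byte at offset 0 and one at offset 1,000,000.
printf 'A' > "$sparse"
printf 'B' | dd of="$sparse" bs=1 seek=1000000 conv=notrunc 2>/dev/null

# cat reads the hole as NULs and writes every byte, so the copy is normally not sparse.
cat "$sparse" > "$dense"

# Checksums are taken from stdin so the output does not include the file name.
sum_sparse=$(cksum < "$sparse")
sum_dense=$(cksum < "$dense")
if cmp -s "$sparse" "$dense"; then same=yes; else same=no; fi

echo "cksum sparse : $sum_sparse"
echo "cksum dense  : $sum_dense"
echo "cmp identical: $same"
# Only the disk usage is expected to differ.
du -k "$sparse" "$dense"

rm -rf "$demo_dir"
```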

It is very seldom the case that a backup then restore of a set of files and directories will be the same size. A sparse file can make the copied result larger, but a sparse directory (a directory that once had thousands of files but they were deleted) when copied will be much smaller. This is normal behavior.

As far as symlinks go, this can be quite complicated to resolve. A symlink is really just a string that contains the path of the target. There is nothing truly connecting the symlink to a real file or path. So if you restore the link on another system, it may point to nothing. A hard link is just a directory entry that points to the same file and will always work when restored.
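
A small sketch of that difference, using invented file names. Removing the original file here only stands in for a restore where the link's target did not come along.

```shell
#!/bin/sh
# Sketch: a symlink stores only a path string; a hard link is another name for the same file.
# demo_dir and the file names below are hypothetical.
demo_dir="${TMPDIR:-/tmp}/link_demo.$$"
mkdir -p "$demo_dir" || exit 1

echo "payload" > "$demo_dir/original.txt"
# Second directory entry for the same file.
ln "$demo_dir/original.txt" "$demo_dir/hard.txt"
# New link whose only content is the string "original.txt".
ln -s original.txt "$demo_dir/soft.txt"

# Stand-in for a restore where the target is missing.
rm "$demo_dir/original.txt"

# The hard link still reaches the data.
hard_content=$(cat "$demo_dir/hard.txt")
# -h is true for a symlink; -e follows the link and fails if the target is gone.
if [ -h "$demo_dir/soft.txt" ] && [ ! -e "$demo_dir/soft.txt" ]; then
    soft_state=dangling
else
    soft_state=ok
fi

echo "hard link still reads: $hard_content"
echo "symlink is: $soft_state"
ls -l "$demo_dir/soft.txt"

rm -rf "$demo_dir"
```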


Bill Hassell, sysadmin
Elmar P. Kolkman
Honored Contributor

Re: tar command changed directory size

What you can do is a find on boxA to find symbolic links to make sure they are the cause of your problem:
find EN2001.1 -type l -exec ls -lad {} ';'

Or, to get some more info

find EN2001.1 -type l | while read link
do
echo "`ls -lad $link` -> `ls -adlL $link` `du -sk $link`"
done

When you add up the results from the du's in the loop, they could very well add up to your difference, in which case the -h option of tar is the cause of your size difference. If not, sparse files could very well be the cause, in which case you have to look at the type of files and think about better ways to duplicate the data, for instance a dump and import.
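
Another way to test the -h theory is to compare the size of the archive stream with and without -h. The sketch below does this on a throwaway tree; all names in it are invented for the demo. The same two tar | wc -c pipelines could presumably be run inside EN2001.1 on box A, at the cost of reading the whole directory twice.

```shell
#!/bin/sh
# Sketch: measure how much tar's -h option (follow symlinks) inflates an archive.
# demo_dir, tree, outside, big.dat and big.lnk are hypothetical names.
demo_dir="${TMPDIR:-/tmp}/tar_h_demo.$$"
mkdir -p "$demo_dir/tree" "$demo_dir/outside" || exit 1

# A 200 KB file that lives outside the tree being archived.
dd if=/dev/zero of="$demo_dir/outside/big.dat" bs=1024 count=200 2>/dev/null
# Inside the tree there is only a symlink to it.
ln -s ../outside/big.dat "$demo_dir/tree/big.lnk"

# Archive size in bytes when the link is stored as a link.
without_h=$( (cd "$demo_dir/tree" && tar -cf - .) | wc -c | tr -d ' ')
# Archive size in bytes when -h makes tar store the target's contents instead.
with_h=$( (cd "$demo_dir/tree" && tar -chf - .) | wc -c | tr -d ' ')

echo "archive without -h: $without_h bytes"
echo "archive with -h   : $with_h bytes"

rm -rf "$demo_dir"
```

If the two numbers differ by roughly the growth seen on box B, followed symlinks explain it; if they are close, sparse files are the more likely cause.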
Every problem has at least one solution. Only some solutions are harder to find.
jane zhang
Regular Advisor

Re: tar command changed directory size

Hi,

Thank you all. I will need to consider another method to move the directory, which contains symbolic links.

Jane
Michael Schulte zur Sur
Honored Contributor

Re: tar command changed directory size

Hi,

According to the man pages, tar does not follow links unless explicitly told to. Have you compared individual file sizes? What about block size?

greetings,

Michael
Dietmar Konermann
Honored Contributor

Re: tar command changed directory size

Just a short hint... the pax(1) command also copies sparse files as sparse (and of course handles symbolic links). Have a look at the man page and watch out for the -rw option.

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)