FTP a file

 
SOLVED
Oviwan
Honored Contributor

FTP a file

Hey Folks

We have a file on windows with this content:
F00092 2132.0F03B.....

We created this file with the bcp utility of MS SQL Server, using the parameter DATAFILETYPE = 'widechar' because the data contains ö, ä, ü, etc.

If we FTP it to our HP-UX box, we get the following in either ASCII or binary mode:
ÿþF00092 2132.0F.....

Is there a way to remove these first two characters "ÿþ" from the file? If I copy these characters into the shell I get "^?~".

The files are 20 GB to 90 GB in size. With vi we get "Line too long"...

Another thing I tried was to copy the file and then run
dd if=/tmp/file1 of=/tmp/file2 bs=1 skip=2
to skip the first two bytes, but this takes too long (about 300 MB per 30 minutes).

Does anyone have a fast way to do this?

Thanks in advance

Regards
13 REPLIES
Geoff Wild
Honored Contributor

Re: FTP a file

zip the file first on Windoze...then ftp as bin....

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Steven Schweda
Honored Contributor

Re: FTP a file

> the size of the files are 20GB to 90GB.

> zip the file first on Windoze...then ftp
> as bin....

Released versions of Info-ZIP UnZip (5.52 is
current) can not cope with files bigger than
2GB. Unreleased versions

ftp://ftp.info-zip.org/pub/infozip/beta/

should do better, but it's not obvious to me
why Zip+UnZip would do any better than binary
FTP.

If you really are using binary FTP, and those
bytes appear at the destination, then I'd
suspect that those bytes are there at the
source. Do you have a low-level tool like a
UNIX "od" which you could use to see the real
data in the source file?

If those bytes are really in the source file,
then there may be data other than the first
two bytes which will also cause problems.
Hein van den Heuvel
Honored Contributor

Re: FTP a file

bcp (bulk copy) accepts a slew of options.

-w selects Unicode characters, so that's 2 bytes per character. Is that what you are ready to deal with?

From the bcp documentation:

http://msdn2.microsoft.com/en-us/library/ms188289.aspx

"Unicode character format data files follow the conventions for Unicode files. The first two bytes of the file are hexadecimal numbers, 0xFFFE. These bytes serve as byte-order marks, specifying whether the high-order byte is stored first or last in the file."


It seems to me that those 'funny' characters are supposed to be there for a proper Unicode file usage.
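
To check whether a transferred file really starts with such a marker, something along these lines might help. It is only a sketch: detect_bom is a made-up helper name, and it assumes your od accepts the POSIX -A and -t options.

```shell
# detect_bom FILE: report which 2-byte byte-order mark (if any) starts FILE.
# detect_bom is a hypothetical helper name, not part of bcp or HP-UX.
detect_bom() {
    # Read only the first two bytes and print them as hex, e.g. "fffe".
    first=$(dd if="$1" bs=2 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case "$first" in
        fffe) echo "UTF-16LE" ;;   # bytes FF FE: low-order byte stored first
        feff) echo "UTF-16BE" ;;   # bytes FE FF: high-order byte stored first
        *)    echo "none" ;;       # no 2-byte marker found
    esac
}
```

Because it reads just two bytes, it returns immediately even on a 90 GB file.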

What do you intend to do with the file once it is on HP-UX?

If the hpux consuming program really can not deal with the MSB/LSB flag, then I suspect you want to look in the direction of Perl or a C program for a solution.
Perl supports Unicode, but also supports 'binmode'.

With binmode you could read the file in say 2048 or 512 byte chunks and stagger the output by 2 bytes as needed, much like dd, but using larger steps through the data.
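
The same idea can be sketched in plain shell, without Perl. This is an untested-on-HP-UX sketch, and strip_first_two_bytes is a made-up helper name. It lets dd consume only the first two bytes and then hands the rest of the open file to cat, which copies with its own larger buffer instead of one byte at a time.

```shell
# strip_first_two_bytes SRC DST: copy SRC to DST without its first two bytes.
# strip_first_two_bytes is a hypothetical helper name.
strip_first_two_bytes() {
    {
        # Consume exactly two bytes from the shared input descriptor ...
        dd bs=2 count=1 of=/dev/null 2>/dev/null
        # ... then cat continues from byte 3 and copies the remainder.
        cat
    } < "$1" > "$2"
}
```

Usage would be strip_first_two_bytes /tmp/file1 /tmp/file2. Where tail supports a byte offset, tail -c +3 /tmp/file1 > /tmp/file2 should give the same result. Either way only the marker is dropped; the remaining data is still two bytes per character.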

What record length is expected?
What record terminator expected?
What was the record select expression? All columns?

By default bcp will deliver cr-lf (\r\n) line ends and tab (\t) column separators.

Be sure to do a $ od -x file | head to get an impression of what is coming at you in detail.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Hein van den Heuvel
Honored Contributor

Re: FTP a file

Oh and btw... when mucking with this kind of stuff, do yourself a favor and make a small test case first.

As my teacher used to say... design allowing for your first solution to be thrown away... because you will.

In your case that probably means re-running bcp with the SELECT restricted to, say, 1000 rows (SQL Server spells this TOP 1000 rather than LIMIT). Now transfer and try.
So much more manageable!

Cheers,
Hein.
Oviwan
Honored Contributor

Re: FTP a file

Hey Hein

The bcp parameters are:
-t "" -r ""\n

I'm trying to load these files (2000 of them) into an Oracle database with SQL*Loader. It works for every row except the first, because of these characters.

So I will load each file with SQL*Loader, then write the first line of each file to another file and remove these characters. After that I will load this new file.

The problem is that I can't edit a 20 GB file with vi, for example.

What is the best way to redirect the first line of a file to another file?

Thanks
Hein van den Heuvel
Honored Contributor

Re: FTP a file

Is there a good, strong reason you used a custom string as the record terminator?
That's why 'vi' fails!
Why not accept the default (\r\n) or specify just the unix standard newline (\n = linefeed)?

In perl you can set the record terminator to that string, but not in vi. So maybe this will work (untested; replace TERMINATOR with your record terminator string):

perl -pe 'BEGIN { $/ = "TERMINATOR" } $_ = substr($_, 2) if $. == 1' old > new

This might work on a small file but fail on a large one due to 'slurp'ing the file. If so, use read/sysread to stomp through the file in chunks, copying as you go.

hth,
Hein.


rajdev
Valued Contributor

Re: FTP a file

Hi Oviwan,

Have you tried using sed, head, cut, etc.?
I am not sure whether they work for files larger than 20 GB; otherwise you will have to use the split command to split the file into multiple smaller files.

>>>>> what is the best way to redirect the first line of a file to an other file?

you can try this :

### check that you have enough space

head -1 file | cut -c 3- > file.line1
sed '1d' file > file.remaining

you can use these files to load to the database.

Regards,

Rajdev

Hein van den Heuvel
Honored Contributor

Re: FTP a file

More thoughts....

Maybe you do want the \r\n dos default terminator to keep it a two-byte sequence. Just tell sqlloader?

>>> and it works also for each row except for the first because this characters.

If all but the first line is there, then why not fix the database entry for that first line?

You could extract just enough to have the first record, using for example: $ dd count=1 bs=512 if=export of=tmp
Then edit this small file and sqlload the missing record?

Hein.


Oviwan
Honored Contributor

Re: FTP a file

What I'm trying to do is migrate an MS SQL database to Oracle.

I export the tables with bcp of MS SQL (with the -w parameter and -r "") to a *.dat file.

I create the *.ctl files with the Oracle Migration Workbench. The record separator is automatically "". I could edit the files to change it, but there are over 2000 files...

Now I have the ctl and dat files for SQL*Loader.

The first two bytes of the dat files are these ugly characters, so the sqlldr process fails. If I delete these characters with vi and then run sqlldr, the ö, ä and ü characters are converted to a ¿ sign. When I open the dat file with vi, I see ö, ä and ü.

Does anyone have experience with this kind of migration?