REMOVING UNICODE CHARACTERS from file

MBacc · ‎09-08-2009

Does anyone know how to remove unicode characters from a file in unix?

Steven E. Protter · ‎09-08-2009

Shalom,

It would help to know which characters, specifically, and how they got there. How was the file transmitted: Samba, FTP, or as an email attachment?
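One way to answer the "which characters" question before deleting anything is to look at the offending bytes. A minimal sketch, assuming a POSIX shell with tr, od and grep available; file.utf8 is an example name and its contents are made up for illustration. It deletes every ASCII control and printable byte so only the non-ASCII bytes are left to dump in hex, then lists the lines that contain them.

```shell
# Sample input for illustration only: "cafe" with an accented final e,
# written as the UTF-8 bytes C3 A9 (octal 303 251).
printf 'caf\303\251\n' > file.utf8

# Delete all ASCII control and printable bytes; whatever survives is
# non-ASCII. Dump the survivors in hex to see what they are.
LC_ALL=C tr -d '[:cntrl:][:print:]' < file.utf8 | od -An -tx1

# List the line numbers that contain at least one non-ASCII byte.
LC_ALL=C grep -n '[^[:cntrl:][:print:]]' file.utf8
```

Here the hex dump shows c3 a9, which is the UTF-8 encoding of é.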

dos2unix

See the man page; it might help.
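If the stray characters turn out to be DOS carriage returns (what dos2unix deals with), tr can also remove them on a system where dos2unix is not installed. A minimal sketch; dos.txt and unix.txt are illustrative file names and the sample contents are invented:

```shell
# Sample input for illustration: two lines with DOS (CRLF) line endings.
printf 'line one\r\nline two\r\n' > dos.txt

# Delete every carriage-return byte, leaving Unix (LF) line endings.
tr -d '\r' < dos.txt > unix.txt
```

Note that this deletes every CR byte in the file, not only those at the end of a line.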

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Matti_Kurkela · ‎09-08-2009

Your problem can be rephrased as "remove everything that is not an ASCII control character or an ASCII printable character".

When a problem is presented in this way, it's easy to find a solution using the standard "tr" command.

Example: file.utf8 contains UTF-8-encoded Unicode characters, and file.txt will be the stripped version.

export LC_ALL=C
tr -dc '[:cntrl:][:print:]' < file.utf8 > file.txt
unset LC_ALL

Setting the environment variable LC_ALL to C for the duration of this command is important: it switches off Unicode support and tells tr that only ASCII characters are considered "printable". Each byte of a multibyte UTF-8 character then falls outside both classes and is deleted.
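To see the effect on a concrete input, here is a small sketch that builds a sample file.utf8 with printf and runs the same three commands on it. The contents are invented for illustration; \303\251 and \303\257 are the octal UTF-8 bytes for é and ï.

```shell
# Sample input: "cafe naive" with an accented e and an i with diaeresis,
# both encoded as two-byte UTF-8 sequences.
printf 'caf\303\251 na\303\257ve\n' > file.utf8

# Keep only ASCII control and printable bytes; drop everything else.
export LC_ALL=C
tr -dc '[:cntrl:][:print:]' < file.utf8 > file.txt
unset LC_ALL

cat file.txt    # prints: caf nave
```

The accented letters are dropped entirely rather than replaced with a plain ASCII letter, so the accented "café" becomes "caf", not "cafe".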

This command can be run as a one-liner, too:

LC_ALL=C tr -dc '[:cntrl:][:print:]' < file.utf8 > file.txt
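For several files, the one-liner can be wrapped in a small shell function that edits each file in place. A sketch only: the function name strip_non_ascii and the .tmp suffix are hypothetical choices, not part of any standard tool, and cv.txt is an invented sample file.

```shell
# Hypothetical helper: strip non-ASCII bytes from each named file in place.
# Writes to "<name>.tmp" next to the original and replaces the original
# only if tr succeeded.
strip_non_ascii() {
    for f in "$@"; do
        LC_ALL=C tr -dc '[:cntrl:][:print:]' < "$f" > "$f.tmp" &&
            mv "$f.tmp" "$f"
    done
}

# Usage example: "resume" with two accented e's encoded as UTF-8.
printf 'r\303\251sum\303\251\n' > cv.txt
strip_non_ascii cv.txt
cat cv.txt    # prints: rsum
```

Because mv replaces the original with the newly created temporary file, the result gets default permissions from your umask, so any special permissions or ownership on the original are not preserved. An existing file named "<name>.tmp" would also be overwritten.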

MK
