09-08-2009 11:46 AM
REMOVING UNICODE CHARACTERS from file
Does anyone know how to remove Unicode characters from a file in Unix?
2 REPLIES
09-08-2009 12:30 PM
Re: REMOVING UNICODE CHARACTERS from file
Shalom,
It would help to know which characters specifically and how they got there. Samba? An FTP transfer? An email attachment? If so, how was the file transmitted?
dos2unix (on HP-UX the equivalent command is dos2ux)
See the man page, it might help.
SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
09-08-2009 12:45 PM
Re: REMOVING UNICODE CHARACTERS from file
Your problem can be re-phrased as "remove everything that is not an ASCII control character or an ASCII printable character".
When a problem is presented in this way, it's easy to find a solution using the standard "tr" command.
Example: file.utf8 contains UTF-8 encoded Unicode characters, and file.txt will be the stripped version.
export LC_ALL=C
tr -dc '[:cntrl:][:print:]' < file.utf8 > file.txt
unset LC_ALL
Setting the environment variable LC_ALL to C for the duration of this command is important: it explicitly switches off the Unicode support and tells tr that only ASCII characters are considered to be "printable".
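To see what this does in practice, here is a small sketch run against a throwaway file. The file names sample.utf8 and sample.txt are made up for the illustration, and the octal escapes \303\251 and \303\257 are the UTF-8 bytes for "é" and "ï". Note that the accented letters are deleted outright, not converted to their plain ASCII equivalents.

```shell
# Hypothetical test file: "café naïve" written as raw UTF-8 bytes
printf 'caf\303\251 na\303\257ve\n' > sample.utf8

# Same commands as above, applied to the sample file
export LC_ALL=C
tr -dc '[:cntrl:][:print:]' < sample.utf8 > sample.txt
unset LC_ALL

# The multi-byte characters are gone; the space and the newline survive
cat sample.txt    # prints: caf nave
```

The newline is kept because it is an ASCII control character, which is why '[:cntrl:]' is part of the set.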
This command can be run as a one-liner too:
LC_ALL=C tr -dc '[:cntrl:][:print:]' < file.utf8 > file.txt
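Before stripping anything, it can be worth checking how many bytes would actually be removed. A possible way to do that, sketched here with a made-up sample file, is to drop the -c option so that tr deletes the ASCII bytes instead and leaves only the ones the command above would throw away:

```shell
# Hypothetical test file: "café naïve" written as raw UTF-8 bytes
printf 'caf\303\251 na\303\257ve\n' > sample.utf8

# Without -c, tr deletes the ASCII bytes; wc -c counts what is left over.
# The trailing tr strips the padding some wc implementations add.
removed=$(LC_ALL=C tr -d '[:cntrl:][:print:]' < sample.utf8 | wc -c | tr -d ' ')

echo "bytes that would be removed: $removed"    # prints: bytes that would be removed: 4
```

A count of 0 means the file is already pure ASCII and does not need stripping.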
MK