grep for asian characters in UTF8 file
08-27-2005 03:13 AM
Each line has two columns, separated by a tab: an English sentence in the first column and its Chinese, Japanese, or Korean translation in the second.
However, some lines contain only English and are of no use to me, so I want to locate and discard the English-only lines that contain no valid Asian data.
cat -A lets me see certain escape codes in this file, such as tabs (^I) and carriage returns (^M$). It also shows that each Asian sentence begins with an uppercase M, which I assume is some code signalling a switch to Asian text.
I guess what I am really looking for is a way to grep for Asian characters.
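A note on the uppercase M: cat -v (and therefore cat -A) prints every byte with the high bit set as "M-" followed by the character for its low seven bits, so the M marks each non-ASCII byte rather than a switch of script. A small sketch, assuming GNU cat; the sample line is made up, and the octal escapes \346\227\245 are the UTF-8 bytes of U+65E5:

```shell
# Print one hypothetical tab-separated line with a CRLF ending through cat -A.
# Octal escapes are used so the example does not depend on the terminal encoding.
show_bytes() {
    printf 'day\t\346\227\245\r\n' | LC_ALL=C cat -A
}

show_bytes
```

With GNU coreutils the three bytes should appear as M-f, M-^W and M-%, between the ^I for the tab and the ^M$ for the line ending.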
08-27-2005 04:27 PM
cat -v
or have you also tried using:
strings
Also, you might want to check your shell environment variables:
$ set
Just to verify your settings for UTF.
08-27-2005 06:58 PM
cat -v gives the same result. (I'm just not sure how to interpret the escape codes, if any.)
Running strings (Debian binutils) on it only gives me the English: no Asian characters, just empty space.
Output of locale:
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8
snippet of my file: (see attachment if this appears garbled)
--------------------------
--------------------------
08-28-2005 04:56 PM
grep -v '^[a-zA-Z[:space:]]*$'
08-28-2005 05:18 PM
08-28-2005 09:06 PM
I don't know the exact syntax, but I think you get the idea.
Just use grep -v to drop all lines that contain only a-z, A-Z, spaces, tabs, and digits.
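One possible way to spell that out, as a sketch rather than a tested recipe for the original file: English sentences also contain punctuation, so the bracket expression here covers all printable ASCII instead of only letters and digits. In a UTF-8 locale, classes such as [:alpha:] can also match CJK characters, so the sketch forces the C locale. The sample lines and the function names are made up for illustration.

```shell
# Hypothetical sample: tab-separated columns, CRLF endings,
# octal escapes for the UTF-8 bytes of a CJK character.
make_sample() {
    printf 'Good morning.\t\346\227\245\r\n'
    printf 'No translation here, 123.\t\r\n'
    printf 'Thanks!\t\346\227\245\346\227\245\r\n'
}

# Drop lines made up only of printable ASCII and whitespace.
# LC_ALL=C makes the range " -~" mean bytes 0x20-0x7E;
# -a stops grep from treating the input as binary.
drop_ascii_only() {
    LC_ALL=C grep -av '^[ -~[:space:]]*$'
}

make_sample | drop_ascii_only
```

To filter a real file, feed it on standard input, for example drop_ascii_only < input.txt > filtered.txt (both file names are placeholders).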
08-28-2005 09:35 PM
08-28-2005 10:09 PM
I think GNU grep has an option to look for characters by their ASCII values.
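If the option meant here is -P (Perl-compatible regular expressions, available when GNU grep is built with PCRE support), byte values can be written as \xHH escapes. A minimal sketch under that assumption, selecting lines that contain at least one byte outside the ASCII range; the sample lines and function names are hypothetical:

```shell
# Hypothetical sample: one English-only line, two with UTF-8 bytes in column two.
sample_lines() {
    printf 'Hello.\t\346\227\245\r\n'
    printf 'English only.\t\r\n'
    printf 'Goodbye.\t\346\227\245\r\n'
}

# Keep lines containing any byte in the range 0x80-0xFF.
# LC_ALL=C makes grep work on single bytes; -a forces text mode.
keep_non_ascii() {
    LC_ALL=C grep -aP '[^\x00-\x7F]'
}

sample_lines | keep_non_ascii
```

This is the positive form of the filter: it keeps the lines with non-ASCII data instead of describing the English-only lines to drop.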
08-28-2005 10:22 PM
This is not related to your question, but I need some information about Korean language setup.
1. If I use the iconv command, it converts English to Korean, but when I try through the keyboard (I changed to Korean fonts in Windows), I get no Korean input from the keyboard (only English is accepted). Please help me with this.
08-28-2005 11:18 PM
Perhaps you could instead look for a certain line structure, e.g. lines ending with a certain code not followed by any text.
For instance, based on your attachment, start by discarding lines ending in either "
- and afterwards throw away the rest, among other lines, those ending in "LS31" followed by two digits.
If that is an idea, try the below single-line example as a starting point:
# grep -vE ".*[
regards,
John K.
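The "LS31 followed by two digits" part of the suggestion above could look roughly like the sketch below. Only that ending comes from the post; the sample lines, the function names, and the assumption that the code sits at the very end of the line (before the carriage return) are illustrative guesses, since the attachment is not reproduced here.

```shell
# Hypothetical lines: one ends in a code, one has a translation,
# and one ends in "LS31" without the two digits.
sample_codes() {
    printf 'Good morning.\tLS3107\r\n'
    printf 'Good evening.\t\346\227\245\r\n'
    printf 'See you.\tLS31\r\n'
}

# Drop lines ending in LS31 plus exactly two digits.
# [[:space:]]*$ allows for the trailing carriage return.
drop_code_endings() {
    LC_ALL=C grep -avE 'LS31[0-9]{2}[[:space:]]*$'
}

sample_codes | drop_code_endings
```

Only the first sample line is removed; the line ending in a bare "LS31" stays, because the pattern requires the two digits.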