BOM character issue

AwadheshPandey
Honored Contributor

Dear Gurus,

I am getting BOM characters () in unix files. Do you have any idea how to get rid of them?

Regards,

Awadhesh
Steven Schweda
Honored Contributor

Re: BOM character issue

> [...] BOM character () [...]

I don't know what a BOM character is, and, as
you can see, this forum is not very good at
rendering exotic non-ASCII characters.

> I am getting [...] in unix files.

Getting _how_? Which "unix files"? (What
_are_ "unix files"?)

> Do you have any idea how to get rid of them?

Stop putting them in there in the first
place?

man sed
Matti_Kurkela
Honored Contributor

Re: BOM character issue

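BOM = Byte-Order Mark (the character U+FEFF), an optional feature of Unicode text files. It should appear only at the very beginning of a file. Current versions of the Unicode standard deprecate U+FEFF in the middle of text; its old meaning there, ZERO WIDTH NO-BREAK SPACE, is now expressed with U+2060 WORD JOINER.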

http://en.wikipedia.org/wiki/Byte_order_mark
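Since a BOM is only expected at the start of a file, it can help to see where the stray ones actually are. A minimal sketch, assuming a POSIX shell and awk; bom_lines is a hypothetical helper name, and the UTF-8 BOM bytes (EF BB BF, octal 357 273 277) are built with printf:

```sh
#!/bin/sh
# bom_lines FILE: print the number of every line of FILE that contains a UTF-8 BOM.
# (Hypothetical helper, not a standard command.)
# A hit on any line other than 1 means a BOM sits in the middle of the text,
# e.g. after concatenating several BOM-prefixed files with cat.
BOM=$(printf '\357\273\277')

bom_lines() {
    # index() returns 0 when the BOM string does not occur in the line.
    awk -v bom="$BOM" 'index($0, bom) > 0 { print NR }' "$1"
}
```

For example, `bom_lines notes.txt` (a placeholder file name) printing "1" and "40" would point to a second file's BOM glued in at line 40.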

In UTF-8, the BOM is represented as the three-byte sequence 0xEF, 0xBB, 0xBF (octal 357, 273, 277).
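To check whether a particular file really starts with those three bytes, one option is to dump its first bytes in hex. A minimal sketch assuming only the POSIX dd, od and tr utilities; has_utf8_bom is a hypothetical helper name:

```sh
#!/bin/sh
# has_utf8_bom FILE: exit status 0 if FILE begins with the bytes EF BB BF.
# (Hypothetical helper, not a standard command.)
has_utf8_bom() {
    # Read at most the first 3 bytes, print them as hex ("ef bb bf"),
    # then drop the spaces and newline to get a single string.
    first=$(dd if="$1" bs=3 count=1 2>/dev/null | od -A n -t x1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}
```

Usage might look like `has_utf8_bom notes.txt && echo "notes.txt starts with a BOM"`, with notes.txt standing in for a real file.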

This "bomfilter" script strips UTF-8 BOM characters from any text piped to it, wherever they occur:

#!/bin/sh
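# Build the 3-byte UTF-8 BOM; printf is used because echo's escape handling differs between systems.
BOM=$(printf '\357\273\277')
# Delete every BOM found anywhere in the text read from standard input.
sed -e "s/$BOM//g"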

Examples of use:

bomfilter < bomtext.txt | more

grep someword bomtext.txt | bomfilter | more
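bomfilter works on streams. To clean the files themselves, a variant of the same sed substitution can write to a temporary file that is then copied back. A sketch under these assumptions: POSIX sh and sed (sed -i is a non-POSIX extension, so it is avoided), a BOM is removed only when it begins line 1, and strip_bom_inplace is a hypothetical helper name:

```sh
#!/bin/sh
# strip_bom_inplace FILE...: remove a leading UTF-8 BOM from each FILE.
# (Hypothetical helper, not a standard command.)
BOM=$(printf '\357\273\277')

strip_bom_inplace() {
    for f in "$@"; do
        out="$f.nobom.$$"   # temporary file next to the original
        # Delete the BOM only if it begins line 1; other lines pass through unchanged.
        if sed "1s/^$BOM//" "$f" > "$out"; then
            # Copy back with cat so the original file keeps its owner and permissions.
            cat "$out" > "$f"
            rm -f "$out"
        else
            rm -f "$out"
            return 1
        fi
    done
}
```

For example, `strip_bom_inplace *.txt`. Anchoring the pattern to the start of line 1 leaves any BOMs deeper in the file alone; for those, the bomfilter script above, which removes them everywhere, is the better fit.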

MK