BOM character issue

Honored Contributor

Dear Gurus,

I am getting a BOM character () in unix files. Do you have any idea how to get rid of it?


It's kind of fun to do the impossible
Steven Schweda
Honored Contributor

Re: BOM character issue

> [...] BOM character () [...]

I don't know what a BOM character is, and, as
you can see, this forum is not very good at
rendering exotic ASCII characters.

> I am getting [...] in unix files.

Getting _how_? Which "unix files"? (What
_are_ "unix files"?)

> Do you have any idea how to get rid of it?

Stop putting them in there in the first place.

man sed
Honored Contributor

Re: BOM character issue

BOM = Byte-Order Mark, an optional feature in Unicode text files. It should appear at the beginning of the file only. The modern version of the Unicode standard says it should not be used in the middle of text.


In UTF-8, the BOM is represented as a three-byte sequence: 0xEF,0xBB,0xBF.
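Before filtering anything, it may help to confirm that a file really starts with those three bytes. The following is only a sketch: has_utf8_bom is a hypothetical helper name, and it assumes your od accepts the POSIX options -A, -t and -N.

```shell
# Hypothetical helper: exit status 0 if the file named in $1 begins
# with the UTF-8 BOM bytes EF BB BF, non-zero otherwise.
has_utf8_bom() {
    # Dump only the first 3 bytes as hex, then drop od's spaces and newline.
    first3=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}
```

Usage would be something like: has_utf8_bom bomtext.txt && echo "BOM found"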

This "bomfilter" script filters the UTF-8 BOM out of any text piped to it:

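# bomfilter: strip every UTF-8 BOM (0xEF 0xBB 0xBF) from standard input.
# printf is used because echo's handling of octal escapes varies between systems.
BOM=$(printf '\357\273\277')
sed -e "s/$BOM//g"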

Examples of use:

bomfilter < bomtext.txt | more

grep someword bomtext.txt | bomfilter | more
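Since the BOM should only appear at the very beginning of a file, a narrower variant could remove it from the start of the first line only and leave any later occurrence untouched. This is a sketch under assumptions, not a tested HP-UX recipe: strip_leading_bom is a hypothetical name, and LC_ALL=C is set so that sed treats the input as plain bytes.

```shell
# Hypothetical variant of bomfilter: remove the UTF-8 BOM only when it
# is the first thing on line 1 of standard input.
strip_leading_bom() {
    bom=$(printf '\357\273\277')     # the three bytes EF BB BF
    LC_ALL=C sed -e "1s/^$bom//"     # address 1 limits the edit to the first line
}
```

It would be used the same way as bomfilter, for example: strip_leading_bom < bomtext.txt | more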