15 2010

Bash: unbom (to remove UTF-8 BOMs)

Tests for and removes UTF8 BOMs.

#!/bin/bash
for F in $1
do
  if [[ -f $F && `head -c 3 $F` == $'\xef\xbb\xbf' ]]; then
      # file exists and has UTF-8 BOM
      mv $F $F.bak
      tail -c +4 $F.bak > $F
      echo "removed BOM from $F"
  fi
done

USAGE: ./unbom *.txt

The magic is tail -c +4 which strips the first 3 bytes.

Leave a Reply

Elements allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>