Bash: unbom (to remove UTF-8 BOMs)

Tests for and removes UTF-8 BOMs (the bytes EF BB BF at the start of a file).

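#!/bin/bash
# Loop over every argument, not just $1, so that ./unbom *.txt handles all files.
for F in "$@"
do
  if [[ -f "$F" && "$(head -c 3 "$F")" == $'\xef\xbb\xbf' ]]; then
      # file exists and starts with the UTF-8 BOM
      mv "$F" "$F.bak"             # keep the original as a backup
      tail -c +4 "$F.bak" > "$F"   # copy everything from byte 4 onward
      echo "removed BOM from $F"
  fi
done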

USAGE: ./unbom *.txt

The magic is tail -c +4, which starts output at byte 4 and so strips the first 3 bytes (the BOM).
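To see what tail -c +4 does in isolation, here is a small sketch that builds a throwaway file starting with a BOM and compares the leading bytes before and after. The scratch directory and the file names demo.txt and stripped.txt are made up for this demonstration; od and tr are used only to print the bytes as hex.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmp=$(mktemp -d)

# Write a 9-byte file: the 3 BOM bytes (EF BB BF, given in octal) plus "hello\n".
printf '\357\273\277hello\n' > "$tmp/demo.txt"

# tail -c +4 starts output at byte 4, so bytes 1-3 (the BOM) are dropped.
tail -c +4 "$tmp/demo.txt" > "$tmp/stripped.txt"

# Hex dump of the first 3 bytes of each file.
before=$(head -c 3 "$tmp/demo.txt" | od -An -tx1 | tr -d ' \n')
after=$(head -c 3 "$tmp/stripped.txt" | od -An -tx1 | tr -d ' \n')
echo "before: $before"   # efbbbf
echo "after:  $after"    # 68656c, i.e. "hel"
```

Checking the bytes as hex like this avoids depending on how the terminal or locale renders the BOM, which is usually invisible.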

2 thoughts on “Bash: unbom (to remove UTF-8 BOMs)”

  1. Alessandro says:

    Good & thanks for sharing. You may want to change:
    for F in $1
    do

    to something like:

    while [[ x$1 != x ]]
    do
    F="$1"
    shift
