Friday, January 25, 2013

ANSI and UTF-8 without BOM

Java expects text files to have no BOM. Its UTF-8 decoder does not strip a leading BOM; it passes it through as the character U+FEFF.




From Wikipedia: The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.


// U+FEFF is the Unicode character that the UTF-8 byte order mark (bytes EF BB BF) decodes to. This is the Java representation of a BOM.
public static final String UTF8_BOM = "\uFEFF";
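
As a minimal sketch of how this constant could be used: once the text has been decoded, a leading BOM is just the first char of the String and can be cut off. The class name BomStrings and the method stripBom are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this example only.
final class BomStrings {
    // Same constant as above: the BOM as a single Java char.
    static final String UTF8_BOM = "\uFEFF";

    // Returns the text without a leading BOM; text without a BOM is returned unchanged.
    static String stripBom(String text) {
        return text.startsWith(UTF8_BOM) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // The three BOM bytes followed by the ASCII text "key=value".
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                      'k', 'e', 'y', '=', 'v', 'a', 'l', 'u', 'e'};
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println(decoded.startsWith(UTF8_BOM)); // true: the decoder kept the BOM
        System.out.println(stripBom(decoded));            // key=value
    }
}
```

Stripping after decoding keeps the example short; it assumes the whole text already sits in memory as a String.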


If the file in front of you doesn't contain any special characters (only plain ASCII), a UTF-8 version without a BOM is byte-for-byte identical to an ANSI version of the file.
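
A small sketch to check this claim. It assumes that "ANSI" means windows-1252, the usual Western European Windows code page; on other systems "ANSI" may refer to a different code page. The class name AnsiVersusUtf8 and the method sameBytes are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named for this example only.
final class AnsiVersusUtf8 {
    // Assumption: "ANSI" is windows-1252.
    static final Charset ANSI = Charset.forName("windows-1252");

    // True if the text encodes to exactly the same bytes in ANSI and in UTF-8.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(ANSI), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("plain text"));    // true: ASCII only
        System.out.println(sameBytes("M\u00FCnchen"));  // false: the u-umlaut is 1 byte in ANSI, 2 in UTF-8
    }
}
```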

If you want to read a file with a BOM in Java, have a look at
http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
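
One possible approach, not necessarily the one from the linked page, is to open the file as UTF-8 and skip the first character if it is U+FEFF. The class name BomAwareReader and the method open are made up for this sketch, and it assumes the file really is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for this example only.
final class BomAwareReader {
    // Opens a UTF-8 text file and skips a leading BOM (U+FEFF) if there is one.
    static BufferedReader open(Path file) throws IOException {
        BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        reader.mark(1);                 // remember the start of the stream
        if (reader.read() != 0xFEFF) {
            reader.reset();             // first char was not a BOM (or the file is empty): rewind
        }
        return reader;
    }
}
```

The caller is responsible for closing the returned reader, for example with try-with-resources: try (BufferedReader in = BomAwareReader.open(path)) { ... }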

