Friday 15 April 2011

Java: read bytes from a UTF-8 file


I have a file that contains UTF-8 data. The file has no BOM (byte order mark) and no length/size information for any Unicode word or line.

I need to read a given length of data starting at a given offset (yes, in bytes!). An API function that reads bytes by offset and length would be really helpful.

Example content: "100 degree information" (with the degree written as the single character °). The length of this content is 9, so if I ask to read 9 bytes, it should read everything. Currently it reads only 8 characters. It seems that the API is treating the Unicode character as 2 characters.
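A quick way to see this mismatch is to compare a string's character count with its UTF-8 byte count. The sketch below is only an illustration: the class name Utf8Lengths, its helper methods, and the sample string "100° info" are hypothetical, chosen because the string is 9 characters long and contains a degree sign. It relies only on String.getBytes with StandardCharsets.UTF_8 (Java 7+).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes needed to encode s as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Space-separated hex dump of the UTF-8 encoding of s
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample text: 9 chars, one of them the degree sign U+00B0
        String text = "100\u00B0 info";
        System.out.println("chars: " + text.length());
        System.out.println("bytes: " + utf8Length(text));
        System.out.println("degree sign: " + utf8Hex("\u00B0"));
    }
}
```

Running main should print 9 characters, 10 bytes, and c2 b0 for the degree sign: the single extra byte is exactly why a 9-byte read yields only 8 characters.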

How can I read the content correctly? Which API should I use for this?

But the Unicode degree character is actually two bytes when encoded in UTF-8: the degree symbol is represented by the bytes c2 b0. If you really want to read bytes at a specific offset in a file, you can use RandomAccessFile in Java, but I doubt that is what you really want. It seems you want to read characters, so perhaps the easiest way is to use a Reader (FileReader, or an InputStreamReader with an explicit "UTF-8" charset, since FileReader's classic constructors use the platform default encoding) and either read into a 9-character array, or read only 9 characters into a larger character array. For example:

  // java.io: Reader, InputStreamReader, FileInputStream
  try (Reader reader = new InputStreamReader(new FileInputStream(fileName), "UTF-8")) {
      char[] buffer = new char[1024];
      int count = reader.read(buffer, 0, 9); // up to 9 chars, not bytes
  }
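To contrast the two approaches mentioned in the answer, here is a minimal sketch, assuming the offset and length are either both in bytes or both in characters. The class ByteRangeReader, its methods readUtf8Range and readChars, and the demo file contents are hypothetical. The byte-based method uses RandomAccessFile.seek and readFully, then decodes the bytes as UTF-8; the character-based method loops over Reader.skip and Reader.read, because a single read call may return fewer characters than requested.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ByteRangeReader {
    // Byte-based: read exactly `length` bytes at byte `offset`, then decode as UTF-8.
    // If the range cuts a multi-byte character, the partial bytes decode as U+FFFD.
    static String readUtf8Range(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf); // throws EOFException if fewer bytes remain
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    // Character-based: skip `offset` chars, then read up to `length` chars.
    // Counts are UTF-16 chars; Reader.read may return fewer than requested, so loop.
    static String readChars(File file, long offset, int length) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            long skipped = 0;
            while (skipped < offset) {
                long n = reader.skip(offset - skipped);
                if (n <= 0) break; // end of stream
                skipped += n;
            }
            char[] buf = new char[length];
            int total = 0;
            while (total < length) {
                int n = reader.read(buf, total, length - total);
                if (n < 0) break; // end of stream
                total += n;
            }
            return new String(buf, 0, total);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file: "abc" followed by a 9-char string whose degree sign takes 2 bytes
        File tmp = File.createTempFile("utf8demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "abc100\u00B0 info".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8Range(tmp, 3, 10)); // 10 bytes cover all 9 chars
        System.out.println(readChars(tmp, 3, 9));      // 9 chars, same text
        System.out.println(readUtf8Range(tmp, 3, 9));  // 9 bytes give only 8 chars
    }
}
```

The last call in main reproduces the symptom from the question: 9 bytes starting at offset 3 decode to only 8 characters, because the degree sign consumes two of them.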
