
Converting files to UTF-8

Here’s a common problem you often face as a Java ME programmer:

You’re internationalizing your game or application, so you send all of your files full of text and labels to the translators, and you get back a bunch of files that are saved in some standard character encoding scheme, but not utf-8. You can’t bundle these files directly into the resource directory of your game’s jar file because some devices won’t be able to read them. Ideally, the solution is to tell the translating service that you need the files in utf-8 format, but often it isn’t the developer who is in charge of this, and sometimes such information gets lost in the shuffle. So your product manager hands you a pile of files and leaves you to figure out how to make them work.

Many standard text editing programs (emacs, for example) are capable of reading in a text file in one encoding and saving it in another. But if you’re a professional software engineer, you don’t want to waste your time opening up fifty files one by one, changing the encoding, and resaving them — especially if you’re likely to get more files and updates later.

What to do?

There are command-line tools for changing the character encoding of a file (iconv on unix-like systems, for example, or the native2ascii tool that ships with the JDK). But looking at my options, I’d say the simplest and most portable solution is to just write a trivial little Java SE file converter, like this:

import java.io.FileInputStream;
import java.io.FileOutputStream;

/**
 * A utility to convert text files to utf-8.
 */
public class FileEncoder {

  /**
   * args[0] is the input file name and args[1] is the output file name.
   */
  public static void main(String[] args) {
    try {
      FileInputStream fis = new FileInputStream(args[0]);
      // for a local file, available() tells us the number of bytes to read
      byte[] contents = new byte[fis.available()];
      // loop in case read() doesn't fill the whole array in one call
      int offset = 0;
      while (offset < contents.length) {
        int count = fis.read(contents, offset, contents.length - offset);
        if (count == -1) {
          break;
        }
        offset += count;
      }
      fis.close();
      // decode the bytes using the source file's encoding...
      String asString = new String(contents, "ISO8859_1");
      // ...then re-encode the characters as utf-8
      byte[] newBytes = asString.getBytes("UTF8");
      FileOutputStream fos = new FileOutputStream(args[1]);
      fos.write(newBytes);
      fos.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}
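
To convert a single file, you compile this class and run it with the input and output file names as arguments, something like java FileEncoder labels_fr.txt labels_fr_utf8.txt (the file names here are just an example).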

Because it’s written in Java, you can call this directly from your Ant build script (see this post for an example of calling an arbitrary Java program from an Ant script). That way you can actually leave the originals as they are and create the corrected files on the fly while building the rest of the resources for each target device.
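
For example, if the translators hand back a whole directory of files, you don’t even need to list them individually in the build. Here’s a minimal sketch of a wrapper that runs the FileEncoder above over everything in a directory (the directory names are just placeholders):

import java.io.File;

/**
 * Converts every file in an input directory with the FileEncoder above,
 * writing the utf-8 versions into a separate output directory.
 */
public class BatchEncoder {

  public static void main(String[] args) {
    File inputDir = new File("translations");  // originals from the translators
    File outputDir = new File("res");          // converted files for the jar
    outputDir.mkdirs();
    File[] files = inputDir.listFiles();
    for (int i = 0; i < files.length; i++) {
      if (files[i].isFile()) {
        String source = files[i].getPath();
        String target = new File(outputDir, files[i].getName()).getPath();
        // reuse the single-file converter defined above
        FileEncoder.main(new String[] { source, target });
      }
    }
  }

}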

In the above example, I’ve hard-coded "ISO8859_1" as the encoding of the source file. That’s ISO Latin-1, a character encoding I see a lot of here in France. For a list of the other encodings supported by Java (and their names as used in Java code), look here. Note that the encoding names are a little different in Java SE (formerly J2SE) than they are in Java ME (J2ME): in the above Java SE program I write the output file as "UTF8", but once I’ve read the resource file into a byte array in the Java ME program on the device, I convert it to a String as follows:

String contentsOfMyDataFile = new String(dataFileByteArray, 0, dataFileByteArray.length, "utf-8");
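
For context, here’s a minimal sketch of that device-side reading step, assuming the file is bundled in the jar and read with getResourceAsStream (the class name, resource name, and buffer size are just placeholders):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Reads a utf-8 text resource from the jar and returns its contents as a
 * String. Only classes available in CLDC (java.io.InputStream and
 * java.io.ByteArrayOutputStream) are used.
 */
public class ResourceText {

  public static String readUtf8Resource(String resourceName) throws IOException {
    InputStream is = ResourceText.class.getResourceAsStream(resourceName);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[512];
    int count;
    // copy the whole resource into the byte array output stream
    while ((count = is.read(buffer)) != -1) {
      baos.write(buffer, 0, count);
    }
    is.close();
    byte[] dataFileByteArray = baos.toByteArray();
    // note the lowercase "utf-8" encoding name on the Java ME side
    return new String(dataFileByteArray, 0, dataFileByteArray.length, "utf-8");
  }

}

So in the midlet you’d call something like String labels = ResourceText.readUtf8Resource("/labels_fr.txt"); (again, the resource name is hypothetical).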

Now if you want to hard-code a string with non-ascii characters directly into your Java ME application, what you do is completely different from what you do when reading resource files from the jar. In the code, you use escape characters: "\u" is the signal that what follows is a unicode character code. A standard example is printing a price in euros: to put the euro symbol in a String in your code, you would write "\u20ac".
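
For example (the label text here is made up, but the escape is the standard one for the euro sign):

// The compiler replaces the unicode escape with the euro symbol in the String.
String priceLabel = "Price: 2.99 \u20ac";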

For a list of character code charts, look here.


Practical Java™ME Game Projects with MIDP 1/2/3

My new book Practical Java™ME Game Projects with MIDP 1/2/3 has just been announced on the publisher’s website! It’s coming out in September.

Highlights include a lot of the same topics covered in this blog — most blog entries are at least related to projects I did for this book. The examples in the book are complete games and covered in more depth than here, plus lots of additional topics are explained and illustrated!