English/Transliterated Persian Translator

UPDATE: Not even a week after I made this, Google announced that it will be shutting down or deprecating all of the APIs used in this project. This is slightly frustrating.


Quick, pronounce this:

هاورکرافت من پر مارماهى است

This is Persian. It uses the Perso-Arabic alphabet. If you're like me (that is, you don't know Persian), you can't pronounce this easily, so tools that translate from English to Persian aren't of much use. Instead, you want an English to transliterated Persian translator; unfortunately, these don't exist.

So I made one.

How it Works

Google provides APIs for translation, English to Persian transliteration, and Arabic diacritization.

Transliteration is the act of converting the alphabet used to represent a language from one to another. Transliteration from the Latin alphabet into other scripts (for example, Arabic-based alphabets) is well-studied. Often a user will have access to only a western keyboard, but want to type in Persian, or Chinese, or Russian. Tools like Google Transliteration will do this, but they don't go the other way, from the native script back to Latin letters.

Transliterating well is enormously difficult, as hard as translation: word meaning and intent are important. For example, how do you pronounce "bow"? It's not clear, because the word for the thing that shoots an arrow is pronounced differently from the word for bending at the waist. In other alphabets, these could have different transliterations.

Transliterating poorly, however, is often not hard: make a one-to-one letter map from one alphabet to the other. In Persian, though, the written form normally omits the short vowels, so doing this naively would result in a mess of consonants. Diacritization, the act of adding the vowel pronunciation marks to Persian, can provide the vowels.
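
To make the problem concrete, here is a minimal Python sketch of the naive letter-map approach. It is not the code used in this project: the `LETTER_MAP` table and the `naive_transliterate` name are made up for illustration, and the table covers only a handful of letters and the three short-vowel marks.

```python
# Hypothetical, deliberately tiny letter map (an assumption for illustration);
# a real table would have to cover the whole Perso-Arabic alphabet.
LETTER_MAP = {
    "\u0627": "a",  # alef
    "\u067E": "p",  # pe
    "\u0631": "r",  # re
    "\u0633": "s",  # sin
    "\u062A": "t",  # te
    "\u0645": "m",  # mim
    "\u0646": "n",  # nun
    "\u064E": "a",  # fatha, short "a" mark
    "\u0650": "e",  # kasra, short "e" mark
    "\u064F": "o",  # damma, short "o" mark
}


def naive_transliterate(text):
    """Map each character independently; unknown characters pass through."""
    return "".join(LETTER_MAP.get(ch, ch) for ch in text)


plain = "\u0645\u0646"         # "man" as normally written, without vowel marks
marked = "\u0645\u064E\u0646"  # the same word with a fatha after the first letter

print(naive_transliterate(plain))   # mn
print(naive_transliterate(marked))  # man
```

Without the vowel mark the output is just consonants; with it, the same one-to-one map produces something pronounceable.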

To translate from English to transliterated Persian, the tool first translates the English to Persian, then diacritizes the result, and finally transliterates the diacritized Persian-alphabet text into the Latin alphabet. The first two steps are done using Google's APIs, and the last step is implemented, in rudimentary fashion, by me. The first two steps also tend to make errors: after all, translation and diacritization are hard problems. Nonetheless, the end result seems to be OK: for simple sentences, it is readable and either correct or close to correct. Often the translation itself will be poor, which reflects how hard a problem translation is.
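
The three steps can be sketched in Python as a function that takes each step as a parameter. This is only an illustration of the structure, not the project's code: the function name is hypothetical, and the two `fake_*` helpers are local stand-ins for the remote translation and diacritization calls (they make no network requests and know a single word).

```python
def english_to_transliterated_persian(text, translate, diacritize, transliterate):
    """Run the three steps in order: translate, diacritize, transliterate."""
    persian = translate(text)      # English -> Persian script
    marked = diacritize(persian)   # add short-vowel marks
    return transliterate(marked)   # Persian script -> Latin letters


# Stand-ins (assumptions) for the two remote steps; each knows one word only.
def fake_translate(english):
    return {"full": "\u067E\u0631"}.get(english, english)


def fake_diacritize(persian):
    # Insert a damma (short "o") after the first letter of the known word.
    return {"\u067E\u0631": "\u067E\u064F\u0631"}.get(persian, persian)


# Rudimentary local step: a tiny, hypothetical character map.
LATIN = {"\u067E": "p", "\u0631": "r", "\u064F": "o"}


def simple_transliterate(persian):
    return "".join(LATIN.get(ch, ch) for ch in persian)


print(english_to_transliterated_persian(
    "full", fake_translate, fake_diacritize, simple_transliterate))  # por
```

Passing the steps in as functions keeps the local transliteration testable on its own, which matters when the remote services are slow, wrong, or gone.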

The other direction, transliterated Persian to English, is easier implementation-wise, because Google's APIs will transliterate the "pinglish" back to real Persian (transliteration API) and then translate that to English (translation API). It's just a matter of stringing the APIs together.
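
"Stringing together" here amounts to function composition. A small sketch, again with hypothetical names and with local stand-ins in place of the two remote calls:

```python
from functools import reduce


def chain(*steps):
    """Return a function that feeds its input through each step in order."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


# Stand-ins (assumptions) for the transliteration and translation services.
def fake_to_persian_script(pinglish):
    return {"por": "\u067E\u0631"}.get(pinglish, pinglish)


def fake_translate_to_english(persian):
    return {"\u067E\u0631": "full"}.get(persian, persian)


pinglish_to_english = chain(fake_to_persian_script, fake_translate_to_english)
print(pinglish_to_english("por"))  # full
```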

(The pronunciation of the phrase at the top? "havercrafte man pore marmahi ast")