Tuesday 22 May 2012

A new system for transliterating Khmer

The very idea of "a new system" has a bad name these days (perhaps because the people who propose them tend to be stuck on the notion of newness as the primary purpose of invention, or else because they pursue systematization as an end in itself).  So, instead, I'm calling this "a modest proposal" for transliterating Khmer (into Romanized phonetics).  Hopefully, the notion of a modest proposal doesn't have such negative connotations.

I'm not offering a lengthy explanation of the advantages of the system, because the main advantage of the system is that it can be understood easily enough without much explanation.

If you're already fluent in Cambodian, you should be able to figure out the correspondence just by comparing the poem in Khmer (top-left) to the transcription (top-right).  If your first language is English, but you have already had the (frustrating) experience of using the inconsistent mix of phonetic symbols found in Khmer-English dictionaries and textbooks, you should find this system relatively easy to guess at.

The system isn't perfect, and it isn't intended to be perfect: it's intended to offer a flexible solution that can easily be used with pen-on-paper to take notes (for students learning Khmer at any level).

It does also have the advantage of a one-to-one correspondence with I.P.A. symbols (more on that later) but we should be wary about pursuing that correspondence too far in the name of perfection: with increasing accuracy of phonetic notation, you have decreasing flexibility for the student and increasing confusion for the native speaker.

If we transcribed Khmer with a high level of phonetic accuracy, we would have very different spellings for different dialects, and even for different accents found within Phnom Penh.  The same is true of English within the city of London, England: if we spelled English more accurately (noting the sound more perfectly) we would spell words very differently for people living in different neighborhoods of London.

If you don't think this is a problem for Cambodian, try comparing the work of Judith Jacob to the work of Huffman and Proum.   They both use I.P.A. symbols in a mutually-incompatible way, partly because of differences in their analyses of the language, but partly also because of differences in the dialect (or idiolect) of the speakers they treated as standard.

Now try comparing both of the above to the more recent work of J.M. Filippi.  If you actually did all of this comparative reading (in a systematic way) you would probably have invented your own transcription system for Khmer by now.

I think there are many students informally creating their own "halfway measures" in the process of learning the language, trying to note down Khmer phonetics in the absence of a practicable standard.  I remember being shocked at the hand-written notes taken by two different foreigners in Phnom Penh, which used a mix of different symbols from different dictionaries (with mutually-incompatible phonemic assumptions).

Returning to the illustration that introduces my own "modest proposal": the phonetic values of the symbols are as simple and self-evident as possible, while giving priority to the ease of rendering them with a pen.

For anyone who already has some knowledge of both Khmer and the Latin alphabet (as used for English, French, etc.), it should be fairly easy to figure out the entire system just by reading the single (short) poem in the top half of the illustration above.

The phonetic values of the symbols are systematic. I don't just mean that they correspond systematically to the I.P.A. symbols; rather, I mean that there's a systematic relationship internal to this ("new") set of symbols.

As you can see from the chart below the poem, there is a systematic relationship between ą and ǫ (one is "open a", and the other is "open o"; thus, without any complex linguistic explanation, it is very easy for a student to catch on to the logic of why ę is different from é, just by studying a few examples).  The "hook below" (forming ǫ from o) thus has a consistent, logical meaning; likewise, the "forte mark above" (forming é from e) has a consistent meaning (so it is easy to remember how á is different from ą, because it follows the same pattern as the other vowel-markings).

These symbols are especially good at rendering sounds that are "foreign to foreigners".  Look at the first word on line five (in both Khmer and transliteration: រឿង = Rÿüŋ).  If you were to find this word in five different Khmer-English dictionaries, you would find five different (awkward) attempts to render the sound into English letters.  If you then consulted the dictionaries' explanations as to how these vowels are pronounced, you'd be even more confused.  The I.P.A. symbols, meanwhile, are very difficult for anyone to interpret who isn't a linguist (and, even worse, they aren't very clear or useful in their application to Khmer).

In I.P.A., Rÿüŋ becomes Rɨɘŋ.  If you try writing them both down quickly with a pen, you'll see what I mean about the impracticality of the I.P.A. symbols: it is very easy for ɨ to look like a lower-case t, and it is hard for anyone to rapidly draw ɘ/ə/e without making mistakes.

Unlike many other systems, this one is actually practical for use with pen and ink.  By contrast, it is impossible to write (in cursive script) the symbols that the I.P.A. chart relies upon (bottom-right of the illustration).  The I.P.A. symbols also rely on the eye differentiating too many symbols that look extremely similar.

On the I.P.A. chart, you've basically got the letter e upside down, backwards, etc. etc., and there is no systematic relationship between them in learning Khmer (e.g., you can't guess how a relates to ɑ by comparing them to how ϵ relates to ϶; but, in the system I'm now modestly proposing, these are á vs. ą and é vs. ę; this is both easier to remember, and has a logic to it that is suited to the contrasts internally necessary for the language).
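For anyone who keeps their notes on a computer rather than on paper, the correspondences mentioned so far can be sketched as a small lookup table.  This is only a partial illustration in Python: it contains just the pairs named in this post (á/a, ą/ɑ, é/ϵ, ę/϶, and the ÿ/ɨ and ü/ɘ implied by Rÿüŋ = Rɨɘŋ), not the full chart, and the names TO_IPA and to_ipa are made up for the example.

```python
import unicodedata

# Partial table: ONLY the pairs named in this post, not the full chart.
# Keys are normalized to NFC so each accented vowel is a single code point.
_PAIRS = {
    "á": "a",
    "ą": "ɑ",
    "é": "ϵ",
    "ę": "϶",
    "ÿ": "ɨ",
    "ü": "ɘ",
}
TO_IPA = {unicodedata.normalize("NFC", k): v for k, v in _PAIRS.items()}


def to_ipa(text):
    """Replace each known symbol with its I.P.A. counterpart.

    Characters that are not in the (partial) table pass through unchanged.
    """
    text = unicodedata.normalize("NFC", text)
    return "".join(TO_IPA.get(ch, ch) for ch in text)


print(to_ipa("Rÿüŋ"))
```

Because the table is one-to-one, it can also be inverted to go from I.P.A. back to the hand-writing symbols; anything not yet listed (consonants, for instance) is simply left as it is.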

As I've hinted at before, if you have five different Cambodians say this word out loud, you will not get the exact same vowel sound all five times; however, each speaker will (probably) be consistent in their pronunciation of the cluster -ÿü- in different contexts.  This is another reason to shift away from the direct use of I.P.A. symbols when you don't want to use full phonetic notation: in travelling around Cambodia, you need to be able to think, "Oh, that's the way they pronounce rÿüŋ here"; ultimately, symbols like ÿ just indicate a category (of allophones), and not an entirely specific sound.

Is this system systematic enough?  Is it specific enough?  How much is enough?  Frankly, the answer may be a matter of taste.

If you look at the first line of the poem, you'll see the symbol ː in the middle of the last word, ˀąnláːy.  This symbol is used to explicitly mark a long vowel where it is important, but I don't mark every vowel as long or short with this symbol.  This is, by the way, the standard I.P.A. method of marking long vowels, and it has the awkward name of "the triangular colon" (obviously, in hand-writing, everyone just puts down two dots).  Sometimes, in Khmer, it is important to distinguish a long vowel from a short vowel (as in my former example of ស្លាប់ = sláˑp, not sláːp) but it is only important in a minority of words.  Cambodian also has vowels that can be of variable duration, without changing the meaning of the word.  I don't mark any of the vowels as "especially long" in the first four words of the poem (Préy Véŋ viel véŋ = ព្រៃវែងវាលវែង) because what is really important for the student to know is the type of vowel sound in these words (é is not the same as ę) but it would be possible for someone to add more detail (such as distinguishing véŋ from véːŋ) --adding many more dots and markings to the page (and thus making the system more complicated, but also more accurate).

This discussion (kept as brief as possible) has been completely devoted to vowel sounds.  Why?  Because Cambodian has huge difficulties in transliterating vowel sounds, but almost no problems with transcribing consonant sounds.  The one innovation that the reader will see (on lines two and three of the poem in the illustration) is a distinction between ch and cʜ, to help clear up confusion between the different consonants in Khmer that bear this sound.  Despite the fact that I'm a Pali scholar, I actually don't think it's useful to insert further markings to indicate the correspondence to the Pali and Sanskrit alphabet (you'll notice at the end of line six I have láęy, but the corresponding Khmer "l" is Pali retroflex-l, writ with a dot below, as ḷ, looking rather too much like an exclamation point).

The one symbol that may surprise some people is the use of a small circle for the schwa sound (shown in the middle of the chart, corresponding to I.P.A. ə).  This is because Khmer frequently has a very short schwa (whose I.P.A. symbol is even harder to write than ə).  I have seen this sound noted just with an apostrophe, perhaps because Khmer speakers often omit it, or pronounce it almost like a pause between other sounds.  The small circle proposed here is less confusing (when later reading your own hand-writing) than an apostrophe, and, you'll note, I still use a full-size schwa (ə) for the rare situations in which this vowel sound can be heard in full with some duration and emphasis (e.g., I wrote line 8 as "Préy ˀəy…").

I did use this system myself for several years.  (How many years?  Well, my comparative study of Cambodian phonetics and romanization dates back at least 10 years, and there are still some resources on the internet that I uploaded during that period).

A new system doesn't need to be perfect: it simply needs to offer some advantages over the ones that were used before.

What are my expectations for this system of transliteration?  It seems reasonable to assume that it would never be used by more than five people --although it might be extremely useful to all five of them.

Conversely, there are some positive examples of new transcription systems that have come into use simply through people posting them onto the internet, and allowing flexibility for anyone to use them however they see fit (i.e., unlike a government-enforced spelling standard).

For Taiwanese (a.k.a. Hoklo) a new system was proposed simply on a website and then came to be used in providing sub-titles to karaoke songs on YouTube.  Why did this happen?  Well, people needed a way to write the phonetics of the language that could work well enough on the internet, and they didn't find it in the dictionaries.

Currently, the universe of Cambodian YouTube videos is dominated by informal Romanization that's neither accurate nor consistent (see the lyric sheets that accompany almost any Khmer song online).  Partly because the Cambodian internet is dominated by overseas Khmer who really do need phonetic transcription (because they're born and raised in Long Beach or Paris, as the case may be) there is a real need for a standard that is "good enough" --but much more practical than full I.P.A. notation.

My own modest proposal can be simplified further, simply by omitting anything the reader finds annoying (I'm guessing that most Khmer would not bother to note the glottal stop [ˀ] at the start of ˀąnláːy, nor the long vowel [ː] toward the end of it).
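For typed text, that kind of simplification amounts to deleting a few optional marks.  Here is a minimal sketch in Python, assuming the transcription is stored as ordinary Unicode text; the function name simplify and its drop parameter are hypothetical, and the default list of marks contains only the two mentioned here (the glottal stop and the long-vowel mark).

```python
# Optional marks that a reader may choose to leave out.
GLOTTAL_STOP = "\u02c0"  # the raised glottal-stop mark
LONG_MARK = "\u02d0"     # the I.P.A. "triangular colon"


def simplify(text, drop=(GLOTTAL_STOP, LONG_MARK)):
    """Return text with every character listed in `drop` removed.

    All other letters, including the accented vowels, are left untouched.
    """
    return "".join(ch for ch in text if ch not in drop)


print(simplify("ˀąnláːy"))
```

With the defaults, the glottal stop and the length mark are removed and the rest of the word is left alone; passing a different drop list lets each reader decide what counts as annoying.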

So, I've set out a simple grid of symbols that anyone can use or ignore as they please: that's the modest proposal.  However, the one thing I'm not modest about is that this new system is less of a mess than the other systems that I've seen in use in Phnom Penh, from the classrooms to the street-signs.  Phonetic notation for classroom use has driven the development of the other standards aforementioned (J.M. Filippi uses a confusing hybrid of I.P.A. symbols, but it is probably the best textbook currently available, despite whatever flaws it may have) --but the only way to make a system that lasts is to invent a system that native speakers themselves adopt.  In the 21st century, whatever people prefer to use on YouTube videos probably will come to define the written language (no matter how I may choose to transliterate this poem).