Ranto (JBR Appendix

Allophony is one of those features of spoken languages that even fluent polyglots are often completely unaware of, since it's possible to pick up a foreign system by “feel”, without being conscious of the gory details of what's going on in one's mouth and throat (and occasionally nose).

The “phonemes” that each language builds words out of aren't predefined globally standard units; they are sets of sounds grouped together by criteria made up by that language. Fluent speakers hear all the different realisations of the same phoneme (known to linguists as its “allophones”) as essentially the same thing, but this is because they've learned to ignore what's really hitting their ears.

As an example, English has a phoneme /l/ with allophones that can include the following:

The L in bland is a “plain” voiced alveolar lateral approximant. It's produced by putting the blade of the tongue against the ridge behind the teeth, forming a characteristic kind of incomplete seal (air escapes around the sides, hence the name “lateral”). It's accompanied by the humming effect produced in the throat that's known as “voicing” (which continues uninterrupted all through that word).
The L in athlete is something else: since it follows a dental fricative sound, it tends itself to be pronounced dentally, with the tip of the tongue still touching the teeth, while the rest of the tongue will already be getting in position for the following vowel. Meanwhile, there's no voicing during the TH, and it takes a while for it to start again afterwards, so this time the sound is a partly devoiced dental lateral fricative with palatalisation.
And as for the L in wool, the dominant pronunciation in southern England and parts of the US isn't even a lateral – it's just a semivowel pronounced in the back of the mouth (very like the one at the start of the word). In the dialects that don't go this far (such as my own), it's still normal for this /l/ to be “darkened” (technically, “velarised”), with the back of the tongue raised in the same way that it would be to produce a following /ɡ/.

These within‐set variations may seem trivial – because English phonology defines them as trivial. But different languages build the sets in different ways and have different attitudes to what makes sounds “essentially the same” or “obviously distinct”. It isn't as simple as “all sounds involving a lateral articulation are /l/”; some English /l/s aren't laterals at all, and many languages divide the space up into several discrete lateral phonemes – dialects of Irish have up to four! Which means the things that sound to you like clear instances of L (or R, or T…) may be unrecognisable as such to those with a foreign linguistic background.

So what happens in Esperanto? Are the individual sounds as totally invariant as the corresponding letters, or do they flex slightly to make things easier to pronounce in context? Does N always represent a purely dental/alveolar nasal even in words like sinjoro = “Mr.” or Honkongo = “Hong Kong”? Anybody trying to pronounce the word ŝnuro = “rope” is likely to find it much easier if it's legal to make the N slightly devoiced, with the tongue making contact somewhere further back in the mouth than usual (to match the preceding Ŝ), and possibly coloured by the following vowel – in fact the whole word might be pronounced with lip‐rounding. Is that allowed?

Or consider words like eKZorci = “to exorcise”, oBServi = “to observe”, with their consonant clusters that mix voiced and voiceless sounds. An English‐speaker would naturally expect the whole cluster to follow the voicing of its first element, so that the words are effectively pronounced as if they were written eKSorci, oBZervi. However, Zamenhof's native languages do this the other way round, making them eGZorci, oPServi, and these pronunciations are so widespread among Esperantists that some sources advocate them as the realisations to aim for. Is that standard Esperanto or just a tolerated mispronunciation that learners should eventually hope to eliminate?
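The two competing habits can be made concrete with a toy model. The following is only a sketch under loud assumptions: it works on spelling rather than on real phonetics, the pairing table is an illustrative subset of the Esperanto consonants, and the function names and the "progressive"/"regressive" labels are inventions for this example, not anything found in Zamenhof's writings or in any official description.

```python
# Illustrative (assumed) voiced -> voiceless obstruent pairs, by Esperanto spelling.
VOICED_TO_VOICELESS = {
    "b": "p", "d": "t", "g": "k", "v": "f", "z": "s", "ĵ": "ŝ", "ĝ": "ĉ",
}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}


def is_obstruent(ch):
    """True for the consonants that take part in the toy voicing rule."""
    return ch in VOICED_TO_VOICELESS or ch in VOICELESS_TO_VOICED


def set_voicing(ch, voiced):
    """Return ch with the requested voicing; anything unpaired is left alone."""
    if voiced:
        return VOICELESS_TO_VOICED.get(ch, ch)
    return VOICED_TO_VOICELESS.get(ch, ch)


def assimilate(word, direction):
    """Make every obstruent cluster agree in voicing.

    direction="progressive": the first consonant of the cluster wins.
    direction="regressive":  the last consonant of the cluster wins.
    """
    if direction not in ("progressive", "regressive"):
        raise ValueError("direction must be 'progressive' or 'regressive'")
    chars = list(word)
    i = 0
    while i < len(chars):
        if not is_obstruent(chars[i]):
            i += 1
            continue
        # Find the end of this run of obstruents.
        j = i
        while j < len(chars) and is_obstruent(chars[j]):
            j += 1
        leader = chars[i] if direction == "progressive" else chars[j - 1]
        voiced = leader in VOICED_TO_VOICELESS
        for k in range(i, j):
            chars[k] = set_voicing(chars[k], voiced)
        i = j
    return "".join(chars)


for word in ("ekzorci", "observi"):
    print(word, assimilate(word, "progressive"), assimilate(word, "regressive"))
```

Run on ekzorci and observi, the "progressive" setting yields the eKSorci, oBZervi spellings and the "regressive" setting yields eGZorci, oPServi. Neither matches the invariant pronunciation that the spelling itself implies.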

Zamenhof's initial writings on this topic were unclear, but when pushed for an answer he went on the record with a position of unequivocal wishy‐washiness. The impossible invariant pronunciations were always the technically correct ones, but there shouldn't be any requirement for Esperantists to learn to pronounce them that way; instead everybody should be allowed to get away with using the pronunciations they find natural, because (not “so long as”!) it wouldn't interfere with mutual intelligibility.

This has crazy implications in two directions at once. On the one hand, it confirms that in correctly pronounced Esperanto the principle of “one sound: one letter; one letter: one sound” is an ironclad law, and that the only rule about what allophonic variation occurs in what contexts is that none occurs in any context. In other words, Esperanto isn't quite a real human language, in that its phonological system lacks the components required to make it speakable.

And on the other hand, Esperantists are encouraged to talk with an accent, mispronouncing and mishearing things in whatever way they're used to. The examples Zamenhof gave were all pronunciations that would be natural for him as a native of Białystok; they weren't likely to result in communication problems, but this is unsurprising given Esperanto's close compliance with the Eastern European phonological standard. Even within Europe, speakers of other languages following Zamenhof's instructions would be likely to cause trouble with their varying habits:

A lot of UK English‐speakers naturally turn aRmo = “an arm” and amo = “love” into homophones;
French‐speakers tend to merge kioM = “how much” with kioN = “what (obj.)”;
Spanish‐speakers have trouble distinguishing raBi = “to rob” from raVi = “to delight” or Juro = “a law” from Ĵuro = “an oath”.
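The three mergers above can be caricatured as rewrite rules applied to the spelling. This is a deliberately crude sketch: the function names are made up for this example, the regular expressions are rough assumptions about each accent rather than accurate descriptions of any real speaker, and the direction of each merger (for instance V collapsing into B rather than the reverse) is chosen arbitrarily.

```python
import re


def non_rhotic(word):
    """Assumed UK-style habit: drop R unless a vowel follows."""
    return re.sub(r"r(?![aeiou])", "", word)


def final_nasal_merge(word):
    """Assumed French-style habit: word-final M and N fall together."""
    return re.sub(r"m$", "n", word)


def b_v_and_j_merge(word):
    """Assumed Spanish-style habit: V merges with B, and Ĵ with J."""
    return word.replace("v", "b").replace("ĵ", "j")


def collides(accent, first, second):
    """True if two distinct words come out identical under the given accent."""
    return first != second and accent(first) == accent(second)


print(collides(non_rhotic, "armo", "amo"))
print(collides(final_nasal_merge, "kiom", "kion"))
print(collides(b_v_and_j_merge, "rabi", "ravi"))
print(collides(b_v_and_j_merge, "juro", "ĵuro"))
```

Each call reports a collision for the pair named in the list, while a pair the accent leaves alone (say armo and amo under the nasal rule) stays distinct.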

All of these mispronunciations are avoidable, if learners put special effort into overcoming their ingrained articulatory habits. That task is an ordinary part of learning to speak a language… but it's only possible if the target language has a working phonological system of its own that you can acquire!

The official rule is that there are no rules, because there don't need to be rules. As usual the only people who really can get by on their native intuitions without learning any explicit new rules are learners from the same corner of Europe as Zamenhof. Everyone else has to imitate them without the aid of any authoritative guidelines.

LEARN NOT TO SPEAK ESPERANTO

ALLOPHONY