Esperanto was invented in 1887 by an oculist from Białystok, Dr Ludwig L Zamenhof (AKA “Doctor Hopeful”). Even its proponents estimate there to be barely a million Esperanto speakers in the world (largely Central/Eastern Europe); compare Albanian with about six million, Mandarin Chinese with 1000 million, and English with (depending how you count) 400 to 1800 million. Here in Scotland I've met more Klingon speakers than Esperantists!
Most people I know despise Esperanto, but largely for daft reasons – “Everyone speaks English nowadays anyway”, “It sounds a bit foreign”, “It has no cultural identity of its own”, etc. I, on the other hand, dislike it for being:
So the result of Zamenhof's labours is that it's inconceivable that any constructed international auxiliary language, however good, could succeed.
An optimally designed world auxiliary language would be
My contention is that Esperanto contrariwise is
It looks like some sort of wind‐up‐toy Czech/Italian pidgin. And if there's one part of this world that doesn't need a local pidgin, it's Europe, which is not only the continent whose languages are best covered by online translation services, but also the home of the current de facto global lingua franca: English.
If Esperanto vanished from existence, nothing of value would be lost; the world shows no sign of wanting to learn an invented common tongue. Maybe someday that'll change – but if it does, we'll have no shortage of candidates to choose from, since Esperanto has any number of better designed but less well known competitors. (They may have fewer existing speakers, but the difference is dwarfed by the billions they'd need to gain to be accounted a success.) Or the UN could hire a linguist or two and get a language purpose‐built, the way Hollywood now routinely does for fantasy movies!
I'm following standard convention for linguistics here: /slant/ brackets for phonemic analyses, [square] ones for phonetics, and angle brackets (invisible if your browser is ignoring my CSS) for words spelled in the conventional orthography. The proper Unicode characters are now used both for IPA symbols and Esperanto's accented letters, as in ĉirkaŭŝmiraĵo = “stuff smeared around” (pronounced roughly “cheer‐cow‐shmee‐RAH‐zhoh”).
Please bear in mind that my critique is aimed at Esperanto's pretensions as a global auxiliary language; if you're a hobbyist polyglot looking for a seventh European language to learn, feel free to waste your spare time on it. If you're a hobbyist language inventor looking for something to base your constructed language on, I'd advise you to start from somewhere else. This 130‐year‐old design isn't worth trying to fix up; and Esperantists refuse to consider the possibility of directed reforms anyway, because the core grammar established in the “Fundamento” was set in stone in 1905.
The mutually distinct sound units that a given language recognises as elementary building blocks for word‐making are referred to as “phonemes”. Esperanto has 23 consonants and (if we assume the common diphthongs like OJ count as sequences rather than phonemes) just five vowels:
By way of comparison, English (or my dialect, anyway – details available) has 24 consonants and 19 vowels, since diphthongs like OY are normally counted as phonemic.
Languages vary hugely in what articulations they are willing to recognise as an “R‑sound”, or even a “T‑sound” (which in English can among other things be an aspirated alveolar plosive, a glottal stop, or a tap; in Spanish that tap is more likely to be heard as R, and T is unaspirated and dental). In the absence of clear definitions, the compromise Esperantists have settled on is that “R‑sounds” like the French/German version are acceptable but the best pronunciation is the Italian one, which also happens to be standard in the Slavic family.
Does the inventory really need to include a /ts/ phoneme (C as in cico = “a teat, nipple”, pronounced “TSEE‑tso”), given that the second column has an odd gap where its voiced partner /dz/ would go? Esperanto could easily do without some of those consonant phonemes, like all these languages that get by with far fewer:
The languages of Zamenhof's home region show a strong areal resemblance; Belorussian and Yiddish and Lithuanian all use similar sets of consonant phonemes, prominently featuring various sounds that are uncommon in global terms and falling into a characteristically European grid pattern. Compare the Esperanto inventory with the following (parentheses indicate phonemes that are slightly disguised by the spelling system):
Zamenhof included in Esperanto every Polish phoneme with a consistently used letter or digraph of its own, leaving out only the ones that are harder to recognise as phonemes: the nasal vowels, “soft” (palatalised) consonants, and /dz/.
Complaints about the ugliness of this roster of sounds are always brushed off as a matter of taste. But surveys say distinctions like /h/‐vs.‐/x/, /ts/‐vs.‐/tʃ/, /v/‐vs.‐/w/, /z/‐vs.‐/ʒ/ are statistically rare, so it's the people who find Esperanto's phonology bizarre and awkward who are being objective! Zamenhof himself clearly acknowledged some sounds as worth avoiding: /ʒ/ (as in seiZure) occurs in just one common Esperanto root, while /x/ (as in loCH) is even rarer, appearing only in uncommon roots. Adding them to Esperanto was a misstep that could have been fixed easily at any point.
It's bad enough that Zamenhof couldn't see past the spelling conventions of the first language he learned to write with the Roman alphabet, but note that I compare the inventory to Eastern Polish; modern standard Polish has no phonemic /h/‐vs.‐/x/ distinction. It's not just some vague Central/Eastern European bias – it's specifically a local dialect of Polish!
Unsurprisingly, Esperanto spelling is much more regular than English, in which GH is famously unruly – see my own Spelling Reform page. Its graphemes (the contrastive units in a spelling system) can even be charted in a strict one‐to‐one correspondence with its phonemic inventory.
Using a fully phonemic orthography (one phoneme: one grapheme) is a sensible policy. Unfortunately, sources such as “Teach Yourself Esperanto” go further, claiming that it is phonetic (one sound: one letter) – which is (a) pointless and (b) infeasible. In any speakable human tongue, phonemes have permitted ranges of allophonic variation in different contexts, and getting used to these rules is how you learn to speak (and hear) without a foreign accent. But Zamenhof declared both that such variation was forbidden and that Esperantists didn't need to follow the rules. A great help.
The way this spelling system irregularly pairs up Ĝ not with a circumflexed K but with Ĉ, and Ŝ not with a circumflexed Z but with Ĵ, must be intended to increase the recognisability of Romance‐derived words like riĉa = “rich”; but if that was such a high priority, why is the language chock‐full of spellings like ekskuzi, ĥoroj, kvarco = “to excuse, choirs, quartz”? As an inevitable side‐effect it erodes the recognisability of words from anywhere else, such as ĝazo = “jazz” or deĵoro = “duty” (Polish dyżur). And what was the point of dressing up the velar fricative Ĥ as a form of the glottal approximant H, or of introducing a special separate “semivowel” diacritic and then only using it on Ŭ, not Ĭ?
Esperanto's distinctive accented letters were a blatant display of parochial spelling traditions. Most of the world's typewriters had a W key; very few had one for Ŭ. And as for Ĉ, Ĝ, Ĥ, Ĵ, Ŝ keys… families of accented consonants (like Ć, Ť, Ż) are a common feature of the writing systems of the area from the Baltic to the Balkans; Zamenhof's idea of how to make it more international was to avoid those particular diacritics in favour of one normally found over vowels in the Romance family. The result was a set of hybrid accented characters that suited everybody equally badly.
The problems with these diacritics were obvious enough to force a concession: the Fundamento permits us to resort to the digraphs CH, GH, HH, JH, SH, plus unadorned U – hence chirkaushmirajho. Many Esperantists nonetheless reject this scheme and advocate other, heretical ASCIIifications such as cxirkauxsxmirajxo.
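For anyone who wants to see the two ASCII schemes side by side, here is a minimal Python sketch. The function names are my own labels rather than anything official, and it assumes lowercase input typed with precomposed accented characters.

```python
# Hypothetical helper names; only lowercase, precomposed letters are handled.
H_SYSTEM = {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"}
X_SYSTEM = {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"}


def transliterate(word, table):
    """Replace each accented letter with its ASCII stand-in from `table`."""
    return "".join(table.get(ch, ch) for ch in word)


def h_system(word):
    """The digraph scheme permitted by the Fundamento (Ŭ becomes plain U)."""
    return transliterate(word, H_SYSTEM)


def x_system(word):
    """The unofficial scheme that tacks an X onto the base letter."""
    return transliterate(word, X_SYSTEM)


print(h_system("ĉirkaŭŝmiraĵo"))  # chirkaushmirajho
print(x_system("ĉirkaŭŝmiraĵo"))  # cxirkauxsxmirajxo
```

The conversion only runs one way: under the Fundamento scheme Ŭ and U collapse together, so the accented spelling can't always be recovered mechanically from the output.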
Just to show how easy it is, here is an alternative system with no diacritics (all compound phonemes become compound graphemes):
I've heard from a good few independent inventors of schemes like this – it's a no‐brainer. But reforming ĉirkaŭŝmiraĵo into txirkawxmirajo just demonstrates how much else you'd need to change before you'd have anything worth using!
Natural languages have sets of “phonotactic” rules governing the ways sounds are allowed to come together in sequences (both the things speakers need to be able to pronounce and the things they need to be able to recognise). Some languages – such as Hawaiian – permit only open syllables with no consonant clusters of any sort; others – such as English – have rules allowing words like hung, strengths, visions but prohibiting the ones we might write as nguh, tle, sionsvi. Instead of giving Esperanto any sort of explicit phonotactic framework, Zamenhof relied on his gut feelings about what kinds of sequence needed to be avoided. As a result his dictionary has, for example, lots of instances of the sequence CI, but oddly few cases of CU, because in the Slavic family /ts/ is historically a /k/ affected by a following front vowel.
Zamenhof coined plenty of words like knabĉjo, postscio, ŝtrumpojn (= “sonny, hindsight, stockings (obj.)”) but none like pjuz, snoŭi, ŭiv (cf. English pews, snowy, weave). Does their nonexistence mean those sequences of sounds are forbidden, or are they just accidental gaps waiting to be filled? New coinages have often led to arguments – and been resolved by the application of a parochial set of phonotactic prejudices. Thus for instance when Indian Esperantists coin the word Bharato as their preferred name for “India”, it ends up in dictionaries as Barato.
Some of Esperanto's individual sound units are themselves phonetically compounds; for instance, the affricate C is like a combination of T and S, but Esperantists are expected to be able to tell the difference between vircenso = “a census of men” and virtsenso = “a sense of virtue” (and likewise for Ĉ/TŜ and Ĝ/DĴ). On the other hand, the common diphthongs EJ, AJ, OJ, UJ, EŬ, AŬ are supposed in theory to be mere ordinary sequences of vowel plus consonant, though this orthodoxy has a couple of problems. For a start it implies that naŭa (= “ninth”) divides into syllables as /na wa/ (cf. na za = “nasal”); but if /wa/ is a legal syllable in a common word, why does Esperanto go to such lengths to avoid it in borrowings like Vajomingo = “Wyoming”? And if AJ is just a chance encounter of A and J, why do they keep happening to end up together like this while O and Ŭ never once meet in Zamenhof's whole dictionary? This is the sort of pattern that usually leads phonologists to suspect such diphthongs of being compound phonemes, just like the similar ones in English.
Esperanto's phonotactics are above‐averagely permissive, or to see it from a learner's point of view, difficult. This is another Eastern European trait – compare the way English avoids initial /ʃt‐/ in favour of /st‐/, German does the reverse, and Spanish avoids either, but the Slavic languages and Esperanto allow both: stelo = “star”, ŝtelo = “theft”. And in one case, Esperanto goes even further, ignoring a phonotactic constraint that's part of a widespread local standard: continental European languages may permit many closed syllables, but they generally cut down on the range of possibilities by losing the voiced/voiceless distinction in final consonants (which is why Austrians pronounce Arnold with a final [t] sound and Russians pronounce Chekhov with an [f]). Esperanto mostly avoids letting words end in voiced obstruents (any of B, D, G, Ĝ, Ĵ, V, Z), but makes a surprising exception for just three words, all taken from Latin: apud, sed, sub = “at, but, under”.
The whole problem is that Zamenhof mistook his own prejudices about “euphony” for an off‐the‐shelf global standard of phonotactic elegance. There is no such standard; Italian is full of tongue‐twisters to Japanese‐speakers (investigarlo = “to investigate it”) and vice versa (hyakugyoo = “a hundred lines”). Creating a phonological system acceptable to everybody requires at the very least a recognition that the language is made up of sounds rather than just letters.
While Zamenhof basically failed to give his brainchild any phonotactic system worthy of the name, he did manage to specify a canonical way to make Esperanto more euphonious: drop some of those final vowels!
The “roots” Zamenhof created for his dictionary appear to abide by a set of rules forbidding repeated letters, or strings of more than two vowels or four consonants in a row; but thanks to the way words are put together, all of these things are nonetheless commonplace in Esperanto sentences: ekssklavoj ekkriis = “ex‑slaves cried out”.
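Those three apparent rules are easy to turn into a checker. This is a rough Python sketch of my reading of them, not anything Zamenhof specified. It assumes a single lowercase word with no spaces or punctuation, and it counts Ŭ and J as consonants.

```python
import re

VOWELS = "aeiou"  # assumption: ŭ and j are counted as consonants


def violations(word):
    """List which of the three apparent root rules a single word breaks."""
    w = word.lower()
    found = []
    if re.search(r"(.)\1", w):
        found.append("repeated letter")
    if re.search(rf"[{VOWELS}]{{3,}}", w):
        found.append("more than two vowels in a row")
    if re.search(rf"[^{VOWELS}]{{5,}}", w):
        found.append("more than four consonants in a row")
    return found


for word in ["lingvo", "ekssklavoj", "ekkriis"]:
    print(word, violations(word))
```

On this reading lingvo passes cleanly, ekssklavoj breaks two of the rules at once (the doubled S, and the five‐consonant run KSSKL), and ekkriis has doubled letters.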
Word stress is the only part of Esperanto's phonology that Zamenhof specified clear rules for: it always falls on the second‐last syllable. This of course means that after you've learned the word VIvi = “to live”, with a stressed I, you need to recognise viVANta with no stressed I as a suffixed form of the same root (= “living”), an annoyance that (for instance) initial stress would avoid. Mind you, this is assuming that everybody agrees what “stress” is, which they don't. In English, it's a combination of factors including loudness, pitch, and duration; but there are plenty of major languages that make some or all of those into independent variables, or that don't stress any syllable in particular.
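The penult rule itself is mechanical enough to sketch in a few lines of Python. Since where the syllable boundaries fall is open to dispute, this version (with function names of my own invention) marks only the stressed vowel rather than a whole syllable, and assumes that Ŭ and J never count as vowels.

```python
VOWELS = "aeiou"  # assumption: ŭ and j never carry a syllable


def stressed_vowel_index(word):
    """Index of the second-last vowel, or of the only vowel if there is one."""
    positions = [i for i, ch in enumerate(word.lower()) if ch in VOWELS]
    if not positions:
        raise ValueError("no vowel to stress")
    return positions[-2] if len(positions) > 1 else positions[0]


def mark_stress(word):
    """Uppercase the stressed vowel and lowercase everything else."""
    w = word.lower()
    i = stressed_vowel_index(w)
    return w[:i] + w[i].upper() + w[i + 1:]


print(mark_stress("vivi"))     # vIvi
print(mark_stress("vivanta"))  # vivAnta
```

The output makes the complaint visible: the stressed I of the bare root is no longer the stressed vowel once a suffix is added.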
In this context, simplicity means learnable rules for building speakable words. A good proportion of the world's population find any syllable more elaborate than “(optional) consonant plus vowel” hard to pronounce, which limits things unreasonably; but it would be easy enough for an auxiliary language to make do with syllable structure rules like those of Spanish, where it never gets much trickier than uno, dos, tres.
Zamenhof boasted of having made Esperanto sound pleasantly similar to Italian, but his idea of how to achieve this objective was to begin with consonant‐crammed roots and tack on inflections with initial vowels: lingv‐oj = “languages”. But Italian allows few syllable‐initial strings of consonants (mainly things like /bl‑, ɡr‑, sp‑/); Esperanto permits many. Italian uses closed syllables sparingly (chiefly ending in /l, n, r/); Esperanto loves them. And the rigid penult‐stress rule may be like Italian, but it's even more like Polish.
When two nouns are run together into a compound, potentially creating an ugly consonantal logjam, it's explicitly left to the coiner's personal taste whether the first noun should keep its final vowel (as in mezOnombro = “an average”) or whether it should be dropped (as in pendŝnuro = “a hanging‐rope”). If I cross a hamster with an ostrich, I get to call the result a hamstrstruto! Deferring such decisions to Esperantists' native‐tongue prejudices results in a dictionary full of entries like fiŝknelo, kristnasktago, sciencfikcio = “fish quenelle, Christmas day, science fiction”.
If you hear a string of syllables like “laBLAloSAliAbla”, you might guess that it's la BLAlo SAli Abla. But wait – how do you know it isn't laBLAlos AL iABla? The idea that you can identify nouns by the way they end in ‑O (and so on) breaks down if you can't tell which vowels are word‐final; you have to learn to recognise the individual roots first to be able to pick them out, and the fact you need to ignore the class‐marker vowels while doing so just makes it harder.
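To see the ambiguity mechanically, here is a small Python sketch that lists every way of carving a stream of sounds into known words. The lexicon is a toy one containing only the made‐up candidate words from the two readings above; a real listener would be searching a much larger vocabulary.

```python
def segmentations(stream, lexicon):
    """Return every way to split `stream` into words found in `lexicon`."""
    if not stream:
        return [[]]
    results = []
    for end in range(1, len(stream) + 1):
        head = stream[:end]
        if head in lexicon:
            for rest in segmentations(stream[end:], lexicon):
                results.append([head] + rest)
    return results


# Toy lexicon: just the candidate words from the two readings above.
LEXICON = {"la", "blalo", "sali", "abla", "lablalos", "al", "iabla"}

for parse in segmentations("lablalosaliabla", LEXICON):
    print(" ".join(parse))
# la blalo sali abla
# lablalos al iabla
```

Both parses survive, and nothing in the final vowels tells them apart; only knowing the roots does.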
Vocabulary is the most arbitrary part of language: words vary in form more or less randomly between unrelated tongues, and none of them work any better or worse than any other. Vocabulary is also the most superficial layer of a language: words are constantly getting borrowed from one dictionary into another, and this can add up to a significant proportion of the lexicon without the result being a different language. And thirdly, vocabulary is the kind of language learning that most people find relatively routine – even if there's a lot of it to do, it's no harder than learning the names of new acquaintances, while the rest is a matter of unlearning entrenched mental habits.
Above all, though, vocabulary is the most obvious aspect of a foreign tongue, so padding out your Warsaw‐centric constructed language with Romance dictionary entries can be an effective way of making it seem less parochial.
In this case I'll take “clarity” to mean having an adequate stock of technical, poetic, and everyday words to be generally usable. Zamenhof was if anything overzealous in this department – the wordlist attached to the Fundamento not only has entries such as lol‑ = “cockleweed” and pips‑ = “pip” (a disease of poultry) but two different roots, kis‑ and ŝmac‑, both equated to English “kiss” (and French baiser).
The inverse problem, overlooked by Zamenhof, is that language learners want to be able to start communicating with as little rote learning of vocabulary as possible. The trick here is to devise a strictly limited shortlist of essential words (like “house” or “clothes”) which can be used as metonyms standing in for more specialised terms (like “palace” or “sou'wester”) and combined into self‐explanatory compound words (like “treehouse” or “nightclothes”). Unless you're designing a Newspeak, you'll also have a dictionary full of fancy words, but their definitions can be written using just the essential set! This idea was pioneered by “Basic English”, which cut its core list to 850 words (largely by cheating); more recent schemes have demonstrated that a language designed from the ground up with lexical efficiency in mind can do much better.
Zamenhof followed notably warped selection criteria. His vocabulary sources were chosen mainly for their appeal to educated nineteenth‐century Europeans – and that's not a question of recognisability; Russian had more speakers than Italian, it was just less fashionable. Then he threw scraps to key local ethnic groups by adding a few words from an assortment of tongues: nepre (= “certainly”) from Russian nepremenno, tornistro (= “a rucksack”) from Danish tornister, tuj (= “immediately”) from Lithuanian tuoj… and so on. A real global auxiliary language would instead make consistent use of the vocabulary sources that have been spread across the whole world, whether by expansionist empires or the scientific community.
The language is full of spelling‐based borrowings – especially placenames, which frequently look as if they've been transliterated into Cyrillic and then back without regard for pronunciation: Jamaica becomes Jamajko, New Guinea becomes Nov‐Gvineo, Washington becomes Vaŝingtono, and so on.
When I say “Romance” dictionary entries, what I really mean is the members of that family that Zamenhof saw as prestigious. A lot of the time he used dog‐Latin garbled in a distinctively dated and parochial fashion, but there are obvious cases where he picked a basic vocabulary item directly from French – for instance, the verb “to buy” is comprar in Portuguese/Galician/
Esperanto goes way over the top in flagging lexical categories (AKA word classes or “parts of speech”) via its neat but somehow risible final vowel system:
This grand scheme was based on the idea that every root is an elementary semantic unit with one associated adjective, verb, and so on, each of which is equally basic – an idea with an attractive air of symmetry and logic, but one that turns out to be fatally flawed. Some Esperantists are still in denial to this day, but the authoritative position of the Academy of Esperanto is that each root has an underlying category; viv‑, for instance, is fundamentally a verb.
Esperanto now has a distinctive derivational system that allows any root to disguise itself faultlessly as an adjective, adverb, verb, or noun by simply changing its final vowel… while simultaneously requiring users to identify the class it originally started in to know how to coin other words from it (the most famous example is: brosi, kombi = “to brush, to comb”, and broso = “a brush”, but kombo = “an act of combing”, not “a comb”). The major reorganisation required to actually fix this might have been feasible in the nineteenth century, but by the time the problem was understood the language's grammar had been declared “untouchable”, so it was just papered over.
Knowing whether a word is functioning as a verb or noun or whatever isn't enough to be able to predict how it's going to function in sentences. Each word class actually covers multiple hidden subclasses:
If you already know the meaning of a word, you may be able to guess what subclass it falls into… or you may not, since words that are otherwise synonyms can differ in behaviour (e.g. say/speak/tell*). The final vowel system may tell you what general category a word fits into, but to know how it will behave you still need to remember its individual dictionary entry.
Grammatical categories like “adjective” or “preposition” are based not on universal logical principles but on arbitrary conventions that vary from language to language. The ones Zamenhof took for granted are based on the traditions of classical grammars, which are a poor fit for many of the tongues of Europe, let alone other continents. Hungarians won't be used to prepositions; Germans have to learn that adverbs aren't the same as plain adjectives; and Slavs have to cope with a definite article…
Shoehorning words into this system can mangle them horribly.
Esperanto is oddly happy to sacrifice final vowels, no matter how much they contribute to a word's recognisability. Asia becomes Azio, coffee/café becomes kafo, quasi (= “as if”) becomes kvazaŭ, and so on from alpaca and banana through to yoga and zebra. If only there were fewer word classes to distinguish, maybe some nouns could end in ‑A or ‑E… which would also make the rhymes in Esperanto poetry more interesting!
Esperanto follows regular morphological rules for both derivation (building new vocabulary items) and inflection (fitting words for their role within a sentence). Zamenhof put a lot of work into creating a range of widely applicable derivational affixes, such as ‑ig = “render” (or “cause, arrange to have done”) and its intransitive partner ‑iĝ = “become” – as in blankIGi = “whiten (something)”, blankIĜi = “whiten (go pale)”. Nonetheless, his original ideas required various amendments before they were usable, and they still look rotten to me.
These affixes are often stretched in unpredictable ways. The suffix ‑aĵ is used to form more‐or‐less “concrete” derived nouns ranging from bovaĵo = “(some) beef” to majstraĵo = “a masterpiece”; then there's ‑uj meaning “(bulk) container” as in cigaredujo = “cigarette box”, which Zamenhof also applied in pomujo = “apple tree” (not “apple barrel”) and Belgujo = “Belgium” (not “Belgian ghetto”). Modern Esperantists have mostly given up on these uses and say pomarbo, Belgio instead. As if to rub it in, Zamenhof even came up with an explicitly meaningless suffix ‑um for use when all other inspiration failed: nazo = “nose”, nazumo = “pince‐nez”. Meanwhile, having both freely attachable prefixes and suffixes inevitably leads to derivational ambiguities such as fireĝido: is that fi‐reĝido = “corrupt prince” or fireĝ‐ido = “offspring of a tyrant”?
Who needs all these affixes? Isn't the two‐word expression “make white” adequate? (Don't tell me we need complex affixing rules to produce indefinably subtle poetic shades of meaning; Literary Chinese had no such rules, but is renowned for its nuanced poetry.) In particular, why do these affixes (like ‑ej in dormejo = “a dormitory, place for sleeping”) need to be treated as a privileged class of special roots? They're all explicitly licensed to appear as independent words (ejo = “a place”), and meanwhile ordinary roots (like loko = “a place”) are equally free to form compounds by the same sorts of regular and productive derivational processes (sidloko = “a seat, place for sitting”), so why treat them as two different kinds of thing? Even dictionaries while itemising the closed lists of “official” prefixes and suffixes admit that there's no way of justifying it.
Esperanto often seems strangely resistant to some of the commonest and most recognisable affixes spread across the globe by the “classical” languages. Compare the prevalence of the abstract noun endings ‑acy, ‐ia, ‐ity, ‐(at)ion with Esperanto's use of ‑eco. Those ‑ion words that Esperanto does condescend to admit have to hide their family resemblance; thus regiono = “region” but nacio = “nation”.
Clockwork morphology can produce some amusing quirks. There are false resemblances (foresta = “absent”, fosilo = “a spade”, grandmama = “big‐breasted”); there are absurdly fussy distinctions (edzigi, edziĝi, edzinigi, edziniĝi, geedzigi (sin), geedziĝi all mean “to marry”); and then there are ambiguities such as kataro = “catarrh” vs. kataro = “a herd of cats” – there are so many of these I've given them their own page.
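The kataro case can be reproduced with a toy parser. The Python sketch below uses a deliberately tiny root and suffix list of my own choosing (two roots and one suffix), so it illustrates how the clash arises rather than modelling the real dictionary.

```python
ROOTS = {"kat": "cat", "katar": "catarrh"}  # toy root list
SUFFIXES = {"ar": "collection of"}          # toy suffix list


def parses(word):
    """Every root or root+suffix analysis of a noun ending in -o."""
    if not word.endswith("o"):
        return []
    stem = word[:-1]
    found = []
    for root in ROOTS:
        if stem == root:
            found.append((root,))
        for suffix in SUFFIXES:
            if stem == root + suffix:
                found.append((root, suffix))
    return found


print(parses("kataro"))
```

With just these entries, kataro comes back with two analyses, one as the bare root katar‑ and one as kat‑ plus ‑ar, and the spelling gives no hint which was meant.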
Strangest of all, though, is the prefix mal‑ (inspired by Russian malo = “little”), which is a meaning‐reverser like Newspeak “un‐”. The only word for “bad” is malbona, “on the left” is maldekstre, “to open” is malfermi, and so on. It's an imaginative vocabulary shortcut, but it's often gratingly artificial, not to mention longwinded (“cheap” is malmultekosta), inconsistent (“to the south” isn't malnorde), and misleading (malodora isn't “malodorous”)!
Esperanto has a special suffix to mark “feminine” (or to be more accurate, female) nouns: ‑in (from German; in the Romance family that's a unisex diminutive). But this has no equivalent “masculine” marker – being male is just taken to be the default!
Turning to the other side of morphology, instead of following the messy fusional inflectional groundplan widespread in Europe, Zamenhof worked to give his invented language the sort of pseudo‐agglutinative model made popular by Volapük.
Most of Zamenhof's explanations of Esperanto grammar were dedicated to setting out its inflectional system, but they assume a readership familiar with the traditional terminology for all the kinds of inflection that happen in classical languages. Indeed, his original pamphlets explicitly addressed a readership familiar with Latin. But didn't those people already have a shared second language?
The big difference between inflection and everything else that has been mentioned so far is that it wouldn't have been hard for Esperanto to do without inflection entirely. Yes, there need to be mechanisms for indicating whether a given argument is the agent, and whether it's plural, and when the narrative is set; but none of these things require mandatory word endings. Consider for instance the English sentence yesterday you hit the three white sheep, which makes everything clear without inflecting anything. The Esperanto version is hieraŭ vi frapIS la tri blankAJN ŝafOJN; but what information would be missing if I was allowed to say hieraŭ vi frapI la tri blankA ŝafO?
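The agreement rule at work in blankAJN ŝafOJN can be written out as a short Python sketch. The function name and its parameters are mine, and it covers only the adjective‐plus‐noun case from the example sentence.

```python
def noun_phrase(adjective_roots, noun_root, plural=False, accusative=False):
    """Build an adjective+noun phrase with the mandatory agreement endings."""
    # The same -j and -n tail is copied onto every adjective and the noun.
    tail = ("j" if plural else "") + ("n" if accusative else "")
    words = [root + "a" + tail for root in adjective_roots]
    words.append(noun_root + "o" + tail)
    return " ".join(words)


print(noun_phrase(["blank"], "ŝaf", plural=True, accusative=True))
# blankajn ŝafojn
print(noun_phrase(["blank"], "ŝaf"))
# blanka ŝafo
```

Every word in the phrase carries an identical copy of the tail, which is the redundancy the uninflected version does without.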
Esperanto's morphological system is at least more straightforward than alternatives like the Hebrew/Arabic system of triconsonantal roots, but a lot of the time it's more heavily dependent on inflectional endings than the norm even among modern European languages. Zamenhof could have combined Romance‐style order‐based case‐marking with Germanic‐style periphrastic tense constructions; instead each time he picked an inflecting approach, which just so happens to be the standard followed in the Slavic family.
The down side of having a limited and highly regimented set of inflections to mark common grammatical features is that they can make the language sound oddly monotonous. In Latin, for every adjective that's inflected as a third‐declension feminine accusative plural you're also liable to get one that's neuter or vocative or irregular or something and thus has a quite different form; but Esperanto can easily end up full of repeats of the same distinctive word endings.
Perhaps Esperanto's strangest inflectional feature is its forms like iri RomEN = “to go Romewards”. That may look like a proper noun in some sort of obscure lative case, but Esperanto doesn't inflect nouns like that – honest! No, no, it's, um, a case‐marked adverb.
Esperanto noun phrases can be identified by the fact that they must contain at least one noun (AKA “substantive”), which must end in ‑O and inflect both for number and for case. I mean, unless it's a pronoun, numeral, infinitive, or any of the other things that get to be exceptions; but the rules do apply to personal names, which get turned into Esperanto nouns as in Jakobo amas Julion = “James loves Julia”. However, enough Julias and Karlas and Marias complained that it has become common for female names to cheat and use ‑A instead of ‑O.
It can be hard to explain to people who aren't used to them what case‐ or number‐inflections are for. The former is extremely tricky; but even the latter is hardly obvious if you're not used to it. Why are zero secondS and one point zero secondS plural? Indeed, what's the point of pluralising two secondS? Why do we need to pluralise nutS, oatS, and vegetableS when rice, wheat, and fruit are singular? Esperanto could have eliminated the need for answers to these questions by emulating Japanese, which essentially does without plurals (one ninja, two ninja…), or Tagalog, which marks number only if it seems relevant (using a separate regular plural‐marker word).
The same goes for case; if you're designing a universal language, obligatory inflections are always a bad idea. In a sentence like venigu viaN monoN = “bring your money”, the verb is marked as a transitive imperative, so the inanimate noun that's sitting where direct objects usually go can't be anything else; but it's absolutely compulsory for both the noun and its accompanying possessive to carry the redundant ending. And yet the sky doesn't fall for sentences where the object is incapable of carrying case endings: venigu iom da mono = “bring some money”. Compare la reĝo estos maljunulo = “the king will be an old man”, distinguished from “an old man will be the king” by word order alone. Case‐marking isn't needed, so why make it mandatory?
Languages disagree not only on the most natural way to indicate which of a sentence's components is the subject (Russian gives nouns fusional case endings, Japanese has particles after noun phrases, Swahili uses prefixes on verbs, and Mandarin relies on word order), but even on how to define this traditional notion of “subject”.
These inflections combine to give phrases like ĉiujn tiujn ĵaŭdojn = “all those Thursdays”. That use of ‑J as a regular plural might be familiar to the Italians (one percent of the world's population) who sometimes use ‑I, or even the Slavs (five percent) who use ‑I or ‑И; but compare ‑S, known to practically everybody who has any hope of recognising Esperanto's vocabulary sources! Meanwhile, case‐marking ‑N might be familiar to German‐speakers (two percent), though even in German it's a plural ending too. No; in fact the people who are meant to find ‑ojn endings natural are the speakers of Ancient Greek (zero percent)… except that they never used ‑oin to mark the accusative plural.
The Esperanto ‐N suffix appears not only on direct objects of verbs but on various other sentence constituents – hence lundoN rajdu ĉevaloN nordeN unu mejloN en LondonoN = “on Monday, ride a horse northward one mile into London”. And yet even though it occurs on both adverbs and temporal expressions, it never appears on temporal adverbs such as hodiaŭ = “today”.
Zamenhof's coverage of pronouns in the sixteen rules of the Fundamento actually left out half of them, and failed to recognise the related category of “determiners” (which includes various things he thought were pronouns, “correlatives”, articles, or adjectives). Mind you, the existence of determiners is one of those facts about how human languages work that wasn't recognised until after he was dead, so it's not his fault the language he designed was so bad.
Not all languages distinguish between “a/some fish” and “the fish”, and even within Europe the ones that do it with articles use them in subtly different ways. For instance, the Esperanto article la occurs in dek minutoj post LA unua = “ten past one”; LA Dio benu vin = “God bless you”; and LA birdmigrado estas mirinda = “bird migration is remarkable” (all out of “Teach Yourself Esperanto”). Far from clarifying the situation, Zamenhof declared this to be something that people should consider giving up on.
I'm not saying Zamenhof should have devoted a separate word class to determiners. He already had too many of those; even if he really needed to have a definite article, it didn't need to be in a special irregular category of its own! Come to that, there are languages (like Japanese) that don't treat pronouns as a special case – syntactically and morphologically, they're simply nouns.
Instead of trying to find any sort of standard, Zamenhof copied his whole system of “personal pronouns” from English, though most of the individual words were given Romance disguises. No mechanism is provided for translating “I (humble masculine)”, “we (exclusive dual)”, or even plain “you (plural)”; but if you're accustomed to a compulsory distinction in the third‐person singular (only) between male, female, and nonhuman referents, you're in luck! That might look appealingly regular if your native tongue is a typical Indo‐European one where “she” can just mean something arbitrarily feminine‐gender, but almost everyone else is used to having a single third‐person pronoun that can apply to anybody.
The possessive forms are a total mess. The pronouns that end in I each get their own special pseudo‐adjectival form (ĝia = “its”, ilia = “their”). However, the “correlative” pronouns (such as iu = “someone”) only get a quasi‐genitive form (ies = “someone's”), which isn't shared with the plural and nonhuman forms – instead those are apparently expected to make do with plain prepositional phrases (de iuj = “some people's”, de io = “something's”).
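To show how many separate rules that adds up to, here's a rough Python sketch of the three‐way split. The function name `possessive` and the pronoun list are my own assumptions for illustration, not anything Zamenhof published, and the sketch makes no attempt to cover every correlative.

```python
# Assumed list of personal pronouns (all ending in -i).
PERSONAL_PRONOUNS = {"mi", "vi", "li", "ŝi", "ĝi", "ni", "ili", "oni", "si"}


def possessive(word):
    """Return a possessive form using three unrelated strategies."""
    if word in PERSONAL_PRONOUNS:
        # Rule 1: personal pronouns take a pseudo-adjectival -a.
        return word + "a"
    if word.endswith("iu"):
        # Rule 2: singular "individual" correlatives swap -u for -es.
        return word[:-1] + "es"
    # Rule 3: plural and nonhuman correlatives fall back on a "de" phrase.
    return "de " + word


for w in ["ĝi", "ili", "iu", "iuj", "io"]:
    print(w, "->", possessive(w))
# ĝi -> ĝia, ili -> ilia, iu -> ies, iuj -> de iuj, io -> de io
```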
These word‐forms may not display much regularity, in the sense of behaving like normal nouns, but they do score highly for uniformity, in the sense of “did you say li estas, ni estos, or mi estus?”
Esperanto adjectives end in a superficially latinate ‑A, then add inflections to agree with the noun they modify. If there's any logic behind this, wouldn't it imply you need to put similar markers on the definite article la? That's how things work in the natural languages Zamenhof copied the word from: if there's one place in a noun phrase where inflections belong, it's on articles.
Zamenhof's things‐ending‐in‐A category includes “third”, but not “three”; “many” and “every kind of”, but not “every”; “their” and “one's”, but not “whose”… part of the problem is that many of the words that he classed as adjectives (and many he didn't) are technically determiners, and follow subtly different grammatical rules. You can say la nova domo = “the new house”, but not (as in Italian) la mia domo = “the my house”.
Above all, why oh why did Zamenhof give his “simple” international language universal obligatory case‐and‐number concord? The Esperanto for “the houses are new” is la domoJ estas novaJ – which is on the fussy end of the scale even by European standards. Compare French les maisonS sont nouvelleS, where the “plural endings” are silent; German die HäusER sind neu, where the predicate shows no concord; or Russian domA novY, which has a special short form. Even Volapük didn't get it this wrong – domS binom nulik!
English may depend on an adjective to say “a new house”, but many languages go about things differently. Some, like Japanese or Korean, prefer to express the same idea using stative verbs (“being‐new house”); others, like Quechua, use appositional nominals (“new‐thing house”). Doing without the lexical category of adjectives can eliminate the need for a whole bunch of grammatical rules.
Thanks to the root‐classes fiasco, talking about abstractions like “awkward‐ness” in Esperanto requires knowledge of the root's underlying class:
The standard European model is to have inflectional comparatives (like “newer”), but just this once Zamenhof instead adopted an idea from as far away as France: the phrasal comparative as in pli nova = “more new”. Mind you, globally speaking the most widespread approach is to say something like “this house is new beyond that one”.
Zamenhof declared numerals an entirely separate part of speech from nouns and adjectives and so on, with no case‐agreement (though bizarrely “one” does have a plural: unuj kontraŭ aliaj = “against each other”). This is another giveaway of his background: in the Slavic languages “one sheep, three sheep, five sheep” all work differently (Polish jedna owca, trzy owce, pięć owiec), and if that's what you're used to, putting them all in the same oddball special category looks sensible – though it turns out there's still a dividing line between tri mil ŝafoj = “three thousand sheep” and tri milionoj da ŝafoj = “three million sheep”.
Tridek duonoj is “thirty halves”; tridek‐duonoj is “thirtyseconds”; and tri dekduonoj is “three twelfths”. They're distinguishable in writing, but we hardly needed an invented language to get an internationally standard way of writing “³ ⁄ ₁₂”!
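The collision can be checked mechanically. The following Python sketch is a deliberately crude model that assumes speech is just the written form minus spaces and hyphens (ignoring stress and pauses); the names `PHRASES` and `spoken_form` are made up for the example, the code uses plain ASCII hyphens, and the value given for “thirtyseconds” is that of a single such unit.

```python
from fractions import Fraction

# The three phrases from the text, mapped to the quantities they denote.
PHRASES = {
    "tridek duonoj": 30 * Fraction(1, 2),  # thirty halves
    "tridek-duonoj": Fraction(1, 32),      # thirty-seconds (one such unit)
    "tri dekduonoj": 3 * Fraction(1, 12),  # three twelfths
}


def spoken_form(phrase):
    """Crude model of speech: drop the spaces and hyphens."""
    return phrase.replace(" ", "").replace("-", "")


spoken = {spoken_form(p) for p in PHRASES}
values = set(PHRASES.values())
print(len(spoken), "spoken form;", len(values), "different values")
# prints: 1 spoken form; 3 different values
```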
The “basic” number‐terms tri, trio, tria (= “three, threesome, third”) are a crowded jumble, making a mockery of the
You might think it's obvious that numbers always appear in a fixed position before whatever they enumerate – there's wide agreement on this among the world's biggest languages, regardless of which way round they put adjective and noun. But a startling proportion of more minor languages (such as Fula, Malagasy, and Thai) have postposed numerals, and if there isn't a rule forbidding that order, dek unu (literally “ten one”) is ambiguous: “eleven” or “a single ten”?
Numbers naturally get strung together in complex combinations – but unlike the big four word‐classes, they can end in plain consonants (and even consonant clusters) with no class‐marker vowel, resulting in tongue‐tangling compounds like sescent‐sepdek‐kvara = “674th”. Compare the personal pronouns, which never need to form compounds and yet always have a root‐final vowel.
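Here's a small Python sketch of how such an ordinal gets assembled, covering only 1 to 999. The function names are hypothetical, the hyphens are plain ASCII ones, and the rule that a multiplier of one is left unspoken (“cent”, not “unucent”) is standard Esperanto usage rather than something stated above.

```python
UNITS = ["", "unu", "du", "tri", "kvar", "kvin", "ses", "sep", "ok", "naŭ"]


def cardinal_parts(n):
    """Return the component words of the cardinal for 1 <= n <= 999."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1..999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    parts = []
    if hundreds:
        # A multiplier of one is left implicit: 100 is "cent".
        parts.append((UNITS[hundreds] if hundreds > 1 else "") + "cent")
    if tens:
        parts.append((UNITS[tens] if tens > 1 else "") + "dek")
    if units:
        parts.append(UNITS[units])
    return parts


def ordinal(n):
    """Hyphenate the cardinal parts and add the adjectival ending -a."""
    return "-".join(cardinal_parts(n)) + "a"


print(ordinal(674))  # prints: sescent-sepdek-kvara
```

Every join in that output is consonant against consonant (nt‐s, k‐kv), because nothing in the number roots supplies a buffer vowel.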
Why, other than because of European tradition, do we need a one‐word label for 10³ (“thousand” = mil instead of “ten hundred”) but not for 10⁴ (“myriad”) or 10⁵ (“lakh”); and a label for 10⁶ (“million” = miliono) but not for 10⁷ (“crore”) or 10⁸ (a Japanese “oku”)? If Esperanto was built around the S.I. system of prefixes this might make sense, but there's no sign Zamenhof ever heard of “kilo‐” etc. Indeed, pico is the Esperanto for pizza!
Esperanto has several dozen prepositions – a word class so named because they are positioned before nouns in phrases like “in phrases”. European languages with large numbers of cases, such as Russian, divide prepositions into subsets depending on whether they are followed by genitive, dative, or whatever, and many of them can alternatively take the accusative case (also used on direct objects) to show “motion towards”. Esperanto borrows this trick, but collapses all the other cases into the nominative (subject) form, with results that can be confusing: in effect it's “of I”, not “of me”.
Esperanto's ‐N ending simply replaces some prepositions, modifies the meanings of others, and never associates with the rest. Zamenhof didn't just mix these prepositional functions confusingly into his case system, he also made them officially ill‐defined!
Esperantists like to pretend that the difference between sub la tablo = “beneath the table” and sub la tablon = “(to) under the table” is some sort of abstract, semantically triggered phenomenon unconnected to the individual prepositions, but that won't wash since a few of them (including al = “to”) indicate “motion towards” without the ‑N. “Motion away” gets no such special treatment; instead Esperanto just creates a compound prepositional phrase, el sub la tablo = “out from under the table”. If we're allowed compounds, why not use them for al sub la tablo?
You might be surprised how few languages have the category “preposition”. Where Yiddish expresses the phrase jump onto a table via a preposition slightly assisted by case‐marking, Vietnamese uses chained verbs (“jump ascend table”); Finnish has highly specific cases (“jump table‐ALLATIVE”); and Punjabi goes for postpositions (“jump table onto”). Even English prepositions disobey the usual European rules by appearing with no following argument: I broke the table I jumped onto.
Prepositions are another kind of word that can suffer from an unavoidable vowel shortage in compounds like postftiza, subskvamoj, transŝprucis = “post‐consumptive, underscales, gushed across”.
Many of Esperanto's preposition‐plus‐verb compounds are element‐by‐element clones of opaque idiomatic expressions:
“Adverbs are formed” (we are told) “by adding e to the root”; but this is only true of adverbs that have adjective equivalents (cf. English “‐ly words”). Plenty of other words that function as verb modifiers (such as plu = “more, further”) are irregular, while the set of words ending in ‑E also includes things like absolute = “absolutely”, which turn up modifying absolutely anything and only get labelled “adverbs” because that's a traditional wastebasket category.
Some of Esperanto's adverbs belong to the esoteric word class of things ending in ‑aŭ. This set includes, for instance, the plain adverb baldaŭ = “soon” (which even has a comparative and a superlative), the less regular mostly‐adverb ankoraŭ = “still”, and the non‐adverb anstataŭ = “instead of”. Other obvious candidates, such as jam = “already”, were arbitrarily left out as irregular bare roots.
Esperanto grammar favours a proliferation of adverbs. “Whistling” in “whistling, I set out” can't be a mere adjective fajfantA describing the subject – no, it's got to be “whistlingly”: fajfantE mi ekiris. Likewise, “it's good that you came” becomes ke vi venis estas bonE; and “last night it was raining” becomes hieraŭ noktE estis pluvantE – literally, “yesterday nightly there‐was rainingly”.
While English (like the Romance tongues) derives “‐ly words” from adjectives by adding a distinctive suffix, many languages survive happily without any such category, instead making do with adjectives and phrasal expressions – and I'm not talking about Classical Nahuatl here; I mean languages like German, where schnell covers both “quick” and “quickly”. Esperanto instead follows the model of Polish and distinguishes adverbs from their adjective equivalents just by their final vowel.
The only justification for the omnipresent agreement marking on adjectives was that it might occasionally make it easier to keep track of which word modifies which if they get reshuffled for some reason. So why do adverbs behave so differently – don't Esperantists want to be free to express themselves by scrambling I ate only a slightly surprisingly cooked sausage into surprisingly I ate an only slightly cooked sausage without affecting the meaning? Agreement markers on adverbs could give them that sort of freedom, but they somehow lose interest in this principle when it doesn't result in Esperanto becoming more like a European language.
The bare root ĉiam = “always” has an adverb counterpart ĉiamE = “perpetually”. But wasn't it already an adverb with essentially that meaning? What's really going on here is that ĉiame only exists as a by‐product of ĉiamA = “perpetual” (compare ĉie, ĉieA, ĉieE = “everywhere, ubiquitous, ubiquitously”). The adverbs that Esperanto flags as a primary lexical category on a par with verbs and nouns are exactly the ones least deserving of that status: the ones that are minor variants of the adjectives.
Unlike adjectives, adverbs, and nouns, which layer their inflectional endings “on top of” the word‐class marker vowel (as in dom‐O‑J‑N), in the case of verbs the different endings each replace the final ‑I of the infinitive and take over its class‐marking function in addition to their own (in a surreptitiously fusional manner). Leaving aside participles for now, Esperanto verbs have five alternative endings: the imperative/jussive ‑u, the tense marks ‑is/‐as/‐os, and the conditional ‑us.
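The replace‐don't‐stack behaviour is easy to sketch in Python. The table and function names below are my own labels for illustration; the endings themselves are the ones just listed.

```python
# Each ending replaces the infinitive's -i rather than being added after it.
VERB_ENDINGS = {
    "infinitive": "i",
    "imperative": "u",
    "past": "is",
    "present": "as",
    "future": "os",
    "conditional": "us",
}


def conjugate(infinitive, form):
    """Swap the final -i of an infinitive for the ending of the given form."""
    if not infinitive.endswith("i"):
        raise ValueError("expected an infinitive ending in -i")
    return infinitive[:-1] + VERB_ENDINGS[form]


print(conjugate("esti", "present"))      # prints: estas
print(conjugate("esti", "conditional"))  # prints: estus
print(conjugate("vivi", "imperative"))   # prints: vivu
```

Strip the ending from any of these outputs and what's left (est‐, viv‐) carries no marker of being a verb at all, which is what I mean by the endings taking over the class‐marking job.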
If you're wondering why Esperanto needs a special conditional inflection, appearing (unlike the Romance equivalent) in both “if” and “then” clauses of a condition, and masquerading as an extra basic tense… well, it might have something to do with the fact that's what Polish has. The conditional can be confusing for learners at the best of times, but just to make things especially bad for those reading the English version of the Fundamento, they're told it's the “subjunctive mood”. No, that's the verb‐form in “long live the king!”, covered in Esperanto by vivu!
It should be apparent to anglophones that special verb endings for infinitives, conditionals, and future tenses are a redundant complication. Likewise the imperative inflection: fancy polite forms are all very well, but for obvious reasons most languages arrange it so commands can be given via the most basic verbal “stem” available! What may be less obvious is that English is itself over‐complex in some ways, with its vestigial subject‐agreement and its obligatory tense distinctions even where the context makes them nonsensical. None of this is necessary; mandatory tense inflections for example can be replaced with auxiliary verbs (“will”), adverbs (“soon”), or if you insist, optional inflections.
One feature displayed by verbs in almost all human languages, though sidelined in Latin‐based grammatical folklore, is aspect, the distinction (e.g.) between I forgot and I have forgotten. The mechanisms Esperanto provides for marking aspect are a random collection of unreliable makeshifts, such as Slavic‐style uses of derivational prefixes to give near‐synonyms with added aspectual overtones.
I always thought the forms that Zamenhof picked for his tense inflections (‑os for future?) were bafflingly unmotivated; finally it turns out that he was following an established tradition. Various earlier and more obscure constructed languages had adopted similar schemes; Pantos‐Dîmou‐Glossa even picked the same three arbitrary vowels to indicate past, present, and future.
Verb valency features – transitivity/intransitivity, passivisation and so on – are another of those fields where Esperantists congratulate themselves about how logical and regular the language is, but I'm not so sure.
Participles such as viv‐inta/ ‐anta/ ‐onta = “having lived/living/about to live” merge tense, aspect, and voice alternations into a single (fusional) unit. Even when given a final ‑A or ‑E they're still verblike enough to have objects: fajfante melodion mi ekiris = “whistling a tune, I set out”.
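As a sketch of how the forms line up, the following Python snippet tabulates them. Splitting each suffix into a tense vowel plus ‑nt‑ or ‑t‑ is just my own way of organising the table, the names are hypothetical, and the passive set (‑ita/‑ata/‑ota) is included on the assumption that it mirrors the active one.

```python
TENSE_VOWEL = {"past": "i", "present": "a", "future": "o"}
VOICE_CONSONANTS = {"active": "nt", "passive": "t"}


def participle(root, tense, voice="active", ending="a"):
    """Build a participle: root + tense vowel + voice consonants + class vowel."""
    return root + TENSE_VOWEL[tense] + VOICE_CONSONANTS[voice] + ending


print([participle("viv", t) for t in ("past", "present", "future")])
# prints: ['vivinta', 'vivanta', 'vivonta']
print(participle("fajf", "present", ending="e"))   # prints: fajfante
print(participle("esper", "present", ending="o"))  # prints: esperanto
```

The same vowels i/a/o that mark tense on finite verbs turn up here bundled with voice, before the class vowel is even reached.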
In the active voice, you can use either plain tensed verbs ending in ‑is/‐as/‐os or compounds using the participles, which have extra aspectual implications; but in the passive, compounds are the only option, so it's unclear whether they're meant to carry the same connotations. This design flaw eventually led to a schism within the ranks of the Esperantist movement: should “smoking is forbidden” be la fumado estas malpermesITA or la fumado estas malpermesATA? The trouble with using ‑ita is that ‑is describes a past event, which seems to imply that the prohibition has ended (“smoking used to be forbidden”). The trouble with ‑ata is that ‑anta describes an ongoing process, which seems to imply that the prohibition is only just coming into force (“smoking is being forbidden”). After decades of squabbling, the “itismo” faction succeeded in having their interpretation declared orthodox, but the truth is, both sides were right: Zamenhof's scheme makes no sense.
The tendency of European tongues to form passives by way of elaborate participial circumlocutions is an accidental side‐effect of the way their actual passive‐voice verb endings have eroded away; there are much more streamlined ways of doing it. Look at Mandarin: wǒ mà tā means “I scold him/her”, and just as we insert extra verbs to express “can scold” or “will scold”, they have one for “undergo”: wǒ bèi mà = “I am scolded”.
Forms like vivanta are designed to superficially resemble those used in compound tenses in the modern Romance languages, but none of those languages use constructions like vi estas vivontaj = “you (pl.) are about‐to‐live”. Guess what language builds a “future tense” out of what is etymologically an (imperfective) present form of “to be” plus a participle (agreeing with the subject)? Yes, Polish: (wy) będziecie żyli = “you (masc. pl.) will live”.
This is another context where inflectional regularity can mean repetitive sentences: “you were going to have lived” is vi estis estonta vivinta, or vi estis estontaj vivintaj if you're plural. Some Esperantists say that participles, being adjectives, can be freely converted into verbs, so it would be better to compress this into vi estontis vivinta(j) or logically even vi vivintontis. Fortunately, nobody does.
The fact that esperanto means “someone hoping” is itself a glitch. The standard pattern is clear enough: if bonA means “good” and you want to say “someone good”, you add a suffix: bonULo. But this breaks down for participles: brulantA means “burning”, but “someone burning” is just brulantO, which should by rights mean something like “current burningness” or “an ongoing conflagration”.
Conjunctions are an entirely traditional lexical category, but their intricacies are mostly a matter of syntax rather than morphology, so Zamenhof's inflection‐obsessed grammar guide implicitly denies the category even exists, let alone distinguishing the different types.
Letting a noun function unmodified as a verb is unthinkable in Esperanto; and yet outside the system of class‐marker vowels this sort of unsignposted category‐swapping happens all the time. For instance, some temporal prepositions moonlight as subordinating conjunctions: dum = “during/while”, ĝis = “up to/until”. Zamenhof didn't include any equivalent for “since”, leaving that to be covered by de kiam = “from when”; but as de is already badly ambiguous, modern Esperantists mostly seem to avoid that in favour of ekde kiam.
Correlative conjunctions are ones that operate in pairs, sometimes as the same word repeated, like ĉu… ĉu… = “whether… or…”, and unpredictably sometimes not, like tiel… kiel… = “as… as…” (more literally “that‐much… how‐much…”). You might expect “the bigger, the better” would involve some variant of that last, but instead it uses a pair of super‐specialised conjunctions stolen from German: ju pli granda, des pli bona.
It's actually not all that rare for a language to lack some or all types of conjunctions… but it doesn't tend to make their grammars any simpler! Still, one frequently used trick deserves a mention: at least half of the world's non‐Indo‐European languages treat nouns conjoined in lists as if the “and” was a preposition (or equivalent) meaning “along‐with”, entirely separate from clausal “and‐then”.
A lot of Esperanto's conjunctions come from Latin (nek, sed, tamen = “neither, but, however”), but the single commonest one is an especially eccentric choice. “And” happens to be i (or something similar) not only in most of the Romance languages but also the Slavic family; and yet instead Zamenhof went all the way to Ancient Greek for kaj.
The question‐forming word ĉu is a neat idea… though maybe a bit redundant, when interrogative intonation or punctuation will do – you agree? But its form is copied from its inspiration, the Polish czy (or maybe Ukrainian chy), rather than resembling the question words like kiu = “who?” etc.
Some traditionally prestigious languages like Latin and Greek (and others such as Polish) have what's known as “free” word order, which doesn't mean that sentence constituents can be shuffled at random with no effect on the meaning; it means that instead of sentences always being Subject–Verb–Object (or whatever), the ordering is determined partly by information structure: words conveying new or important information get the best seats. Esperanto propagandists sometimes seem to confuse free word order with free speech, but setting word‐order defaults is no more an infringement of your civil liberties than is any other kind of grammatical rule.
The linguistic feature most effective at freeing adjectives to wander the sentence without ambiguity is gender agreement (preferably with a dozen or more different genders); for nouns, it's having the verb heavily inflected for agreement with its arguments. Esperanto lacks any trace of either feature; all it's got is number agreement and a single overworked case distinction, so it's more limited in its options. It often relies on word‐order rules to make sentences intelligible, but most of these rules are undocumented.
Europeans are familiar with the idea of sets of sentences being related via order‐shuffling rules such as question‐inversion: I am reading it → am I reading it?. That's a complication Esperanto doesn't share; it's mi legas ĝin → ĉu mi legas ĝin?… which only makes it more perplexing that it does have WH‐extraction. When the question is mi legas kion? = “I am reading what?”, Esperanto avoids that simple word order just as English does – instead question words like “who/where/why?” move to the start of their clause: kion mi legas? = “what am I reading?”
Some of Esperanto's word‐order conventions are no more than optional defaults; others (although taken for granted in grammars) are unbreakable. “Yesterday you hit the three white sheep” may legally become la tri ŝafojn blankajn vi frapis hieraŭ, but it's never ŝafojn hieraŭ tri frapis la vi blankajn! The following “obvious” order rules demonstrate classically European default assumptions:
Esperanto's pretensions towards flexible word order are intended to allow those accustomed to (for instance) Subject–Object–Verb sentences to keep that familiar order and say mi ĝin legas = (literally) “I it am‐reading” (cf. French je le lis). But if reshuffles might happen for no better reason than that, we can't rely on them for the purpose they serve in most natural languages: to allow expressive shifts of emphasis.
Not only does Esperanto's so‐called free word order fail to allow for exotic possibilities like postposed articles, in some cases it forbids reorderings that are commonplace in English, such as a pie of which I ate only half → a pie I only ate half of.
Zamenhof's efforts to explain Esperanto grammar focussed on its morphology and neglected its syntax, so it's no surprise that Esperanto's phrase structure rules and so on usually turn out to be like the ones he grew up with. The syntactic rules of a heavily inflecting language like Russian can afford to be relatively lightweight, but Esperanto needs something slightly better thought out.
Zamenhof didn't entirely ignore the topic: one whole sentence out of his sixteen rules deals with syntax, and in particular, negative constructions. Instead of explaining anything about how they work, it defines a way they don't work, wrongly! The approach that he was assuming we need to be warned not to use was the system of negative concord that was standard in the Slavic languages but derided by nineteenth‐century schoolteachers because it wasn't used in Latin.
Polish has strong subject agreement on verbs, and usually omits subject pronouns – compare Latin cogito ergo sum = “(I) think therefore (I) am”. In particular it always leaves the indefinite pronoun implicit in sentences like pada = “it's raining (neut. sg.)”. Esperanto uses explicit arguments instead of subject agreement – but it does borrow the idea of impersonal verbs, misinterpreting them as literally subjectless. Then to really complicate matters Esperanto allows ellipsis of repeated subjects. So does la fiŝoj estas bongustaj sed pluvas mean “the fish are tasty but it's raining” or “the fish are tasty but are falling as rain”?
The way reported speech works in Esperanto is another obtrusively parochial feature. If on Tuesday I tell you morgaŭ mi iros tien = “tomorrow I will go there”, on Wednesday you might report that as vi diris, ke hodiaŭ vi venos ĉi tien = “you said that today you would (literally: will) come here”. Everything is rendered from your point of view, with the exception of tense inflections, which are always preserved as direct quotes. The rule could have been to make everything consistently direct (“…that tomorrow I will go there”) or consistently indirect (“…that today you were going to come here”), but Esperanto insists on mixing the two, because that's what the Slavic languages do.
Relative clauses in Esperanto, such as la homo, kiu gajnis = “the person who won”, use pronouns out of the interrogative column on the so‐called correlatives table. Muddling these two functions is a trademark misfeature of many European languages that causes unnecessary confusion in sentences like “I asked the person who won”. Approaches without this ambiguity can be a lot less trouble all round; for instance Esperanto could just imitate the languages that allow la homo, ke tiu gajnis = (literally) “the person such‐that that‐one won”. This makes possible sentences like ĉiuj, ke mi estas pli juna ol tiuj = “everyone I'm younger than”, which is otherwise unrelativisable in Esperanto (you can't say “everyone than whom I am younger” because ol = “than” is a conjunction, not a preposition). It could even distinguish between “restrictive” and “descriptive” relative clauses the way English does, although that might require it to abandon its use of strict Central/Eastern European comma placement rules.
Esperanto reflexives use a pronoun si that covers all of “it/him/her/one/themselv(es)”, but not “my/our/yourselv(es)”; it's ili vidis SIN = “they saw themselves” but vi vidis VIN = “you saw yourselves”. If you're guessing this is another Slavicism, no: for once it's a case where early Romance‐ and Germanic‐speaking learners managed to impose their native habits on Esperanto before its syntax was completely established. Zamenhof himself tended to forget this rule!