Esperanto's phonemic inventory looks perfectly normal as long as you compare it with the Central/Eastern European standard model; it's only when it is measured against the global average that it looks frankly bizarre. Of course, Zamenhof wasn't in a position to make such a comparison – it wasn't until 1984 that resources such as the UCLA Phonological Segment Inventory Database (UPSID for short) were compiled, sampling hundreds of languages chosen to represent the world's language families.
Since we are in a position to do that, let's have a look! We'll need to start by massaging the data slightly – we don't want to be told, for instance, that /m/ is nearly four times as common as /d/ just because the database has split the vote for “alveolar” and “dental/alveolar” subvarieties of /d/ (which are barely distinguishable). But with that taken care of, let's see what sort of basic phonemic inventory you get if you collect up all the sounds that occur in at least half of the languages in the survey. It ends up like this:
Yes, /ŋ/ as in haNGiNG is considerably commoner in worldwide terms than any of the sounds that Esperanto spells C, Ĝ, Ĥ, Ĵ, P, Ŝ, V, Z! In fact, if I weren't letting /i/ claim the votes of its minor variants, it would also fall off the list. That's a sign that this inventory may be a bit too minimalist (copying the basic Latin five‐vowel system was one of Zamenhof's few good decisions!), so let's relax the criteria and let in the sounds shared by at least a third of the surveyed languages:
If you evened out that one gap in the middle, either by adding a /dʒ/ (as in JuDGe) or by removing the /tʃ/ (as in CHurCH), you'd have something quite close to a workable, regular, “average” phonemic inventory that happens to be about the same size as the Roman alphabet. But even after growing to include the glottal stop /ʔ/ (as in “uh‐oh!”) and the palatal nasal /ɲ/ (Spanish Ñ), the consonant inventory above still lacks most of those strange, exotic sounds like V.
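The procedure above (fold barely distinguishable variants together, then keep every segment found in at least a given fraction of languages) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the six inventories are made up, not UPSID data, and the `MERGE` table is a hypothetical stand‐in for whatever variant‐merging rules you choose.

```python
from collections import Counter
from fractions import Fraction

DENTAL = "\u032a"  # combining "bridge below" diacritic marking a dental articulation

# Hypothetical merge table: fold barely distinguishable variants into one
# basic segment so they don't split the vote (not UPSID's own scheme).
MERGE = {
    "t" + DENTAL: "t",
    "d" + DENTAL: "d",
    "n" + DENTAL: "n",
    "i\u02d0": "i",  # long /iː/ counted as /i/
}

def normalise(inventory):
    """Map each segment to its basic form; returns a set, so one vote per language."""
    return {MERGE.get(seg, seg) for seg in inventory}

def common_segments(inventories, threshold):
    """Return the segments that occur in at least `threshold` of the inventories."""
    counts = Counter()
    for inventory in inventories:
        counts.update(normalise(inventory))
    n = len(inventories)
    # Fraction keeps the comparison exact at thresholds such as 1/3.
    return {seg for seg, c in counts.items() if Fraction(c, n) >= threshold}

# Six made-up inventories, purely for illustration.
SAMPLE = [
    {"p", "t", "k", "m", "n", "i", "a", "u"},
    {"p", "t" + DENTAL, "k", "m", "n" + DENTAL, "ŋ", "i", "a", "u"},
    {"p", "t", "k", "m", "n", "ŋ", "i\u02d0", "a", "u", "d"},
    {"b", "t", "k", "m", "n", "i", "a", "u", "d" + DENTAL, "f"},
    {"p", "t", "k", "m", "n", "ŋ", "i", "a", "u"},
    {"p", "t", "k", "m", "n", "i", "a", "u", "f"},
]

print(sorted(common_segments(SAMPLE, Fraction(1, 2))))
print(sorted(common_segments(SAMPLE, Fraction(1, 3))))
```

In this toy sample /ŋ/ scrapes past the one‐half threshold with three votes out of six, while lowering the bar to a third admits /d/ and /f/ with two votes each; /b/ stays out either way. Without the merge step, the two variants of /d/ would get one vote apiece and neither would clear even the one‐third threshold, which is exactly the vote‐splitting problem the data massaging is meant to avoid.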
Of course, when linguists bring real‐world evidence to debates like this, the Esperantists start whining – “But those languages in UPSID are mostly ones I've never heard of! When you take into account the fact that people are more likely to speak European languages than ones out of the depths of the rain forest, Esperanto's phonemic inventory looks fine, probably!” Now, this argument hardly seems consistent with their usual attitude to the prospect of anglophones imposing their preferences on the small fry, but more importantly, I've checked, and the answer is no. When you survey the world's top two dozen or so languages, weighting each vote by the number of native speakers, Esperanto still looks obviously parochial, because then the biggest voting bloc isn't Poland and its neighbours, it's Southern and Eastern Asia. Speakers of major languages are:
- more likely to have a phonemically aspirated /tʃʰ/ (CH as in aTCHoo!) than to have either /dʒ/ or /ts/ (Esperanto Ĝ, C), and in general, more likely to have a systematic phonemic distinction between a whole column of plain voiceless plosives and one of aspirated plosives (/p t k/ vs. /pʰ tʰ kʰ/) than to have a plain /h/;
- more likely to have retroflex plosives (the characteristically Indian‐sounding /ɖ ʈ/, pronounced with the tongue curled back) than to have /ʒ/ (i.e. Ĵ);
- more likely to distinguish between /n/ and /ŋ/ than between /v/ and /w/, and in general, more likely to have four different nasal phonemes than to have only one or two (Esperanto M, N).
Meanwhile, there's a fairly even three‐way split over the handling of suprasegmental features. One third of the votes go towards phonemic stress distinctions (ABstract vs. abSTRACT), and one third towards phonemic tones (as in Mandarin mā/má/mǎ/mà = “mother/hemp/horse/scold”); but there's no way of taking an average, and the only sane approach is to side with the remaining third, which goes for option three: have neither.
However, coming up with a segmental inventory the same size as Esperanto's from such a survey is straightforward enough; gathering up the top scorers gives:
It's really not much more like Zamenhof's. Furthermore, while the UPSID approach is designed to provide information about human language in the abstract, this one is subject to passing fads. The survey results change from decade to decade along with the world's demographics – and the trend is currently away from Eastern Europe. (I should also mention that polling only the linguistic heavyweights disenfranchises Africa, with its numerous medium‐sized languages.)
That sixth vowel /ə/ (which occurs twice in AgendA) just beats /tʃʰ/ to make it onto the chart, spoiling what would otherwise have been another nice plausible five‐vowel system. The problem is that this isn't something that has been designed as an intuitive, coherent phonology; a setup with three plosive columns, /b/ vs. /p/ vs. /pʰ/, just happens to be what you get when you average out the two‐column grids common in Europe (/b p/) and China (/p pʰ/) with the four‐column layout popular in India (/b bʰ p pʰ/). Most learners would probably find a setup with only a two‐way distinction easier to handle; and by the same token, a global auxiliary language probably shouldn't have a voiced fricative column just for /z/, either.
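The speaker‐weighted survey works like the UPSID tally, except that each language's vote counts in proportion to its number of native speakers, and you keep the top scorers up to a fixed inventory size rather than applying a percentage cut‐off. Here is a minimal Python sketch of that idea. The three "languages" and their weights are invented placeholders, not census figures, chosen only to mirror the plosive grids described above: a Chinese‐like /p pʰ/, a European‐like /b p/ and an Indian‐like /b bʰ p pʰ/.

```python
from collections import defaultdict

ASP = "\u02b0"  # modifier letter small h, marking aspiration

def weighted_shares(languages):
    """Return each segment's share of the total speaker-weighted vote."""
    total = sum(speakers for speakers, _ in languages)
    votes = defaultdict(int)
    for speakers, inventory in languages:
        for seg in set(inventory):  # one weighted vote per segment per language
            votes[seg] += speakers
    return {seg: v / total for seg, v in votes.items()}

def top_segments(languages, size):
    """Pick the `size` segments with the largest weighted share (ties broken alphabetically)."""
    shares = weighted_shares(languages)
    ranked = sorted(shares, key=lambda seg: (-shares[seg], seg))
    return ranked[:size]

# Hypothetical (speakers, plosive inventory) pairs; the weights are made up.
SAMPLE = [
    (3, {"p", "p" + ASP}),                    # two-column grid, aspiration contrast
    (2, {"b", "p"}),                          # two-column grid, voicing contrast
    (1, {"b", "b" + ASP, "p", "p" + ASP}),    # four-column grid
]

print(top_segments(SAMPLE, 3))
```

With these weights /p/ gets every vote, /pʰ/ four sixths, /b/ half and /bʰ/ only a sixth, so a three‐slot inventory comes out as /b p pʰ/: the three‐column setup that nobody's native grid actually has.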