A Phonemic Substitution Cipher

Justin B Rye 01 Sep 12


So you want an imaginary High Elven Tongue as supplementary colour for your tabletop fantasy roleplaying campaign?  Or maybe you're doing the rough draft of your epic space opera trilogy and need placeholder names for the alien planets?  Or perhaps you're only after some token mock‐foreign verbiage to go in the background of your cartoons?  Well, if you were J. R. R. Tolkien it's obvious what you'd do, but there's no need to go to the lengths of inventing an entire functional constructed language and using that; all you need to do is take some existing real‐world language and apply a simple set of rules to turn it into an unrecognisable cryptolect!


People all over the world have been doing informal versions of this for centuries, usually so that they can communicate safe from eavesdropping while swindling the uninitiated.  This is what's known as a “cryptolect”, a phenomenon that anglophones are most likely to be familiar with via the examples of Cockney rhyming slang (“Flying Squad” → “Sweeney (Todd)”) and Pig Latin (“Flying Squad” → “eyeingflay oddsquay”).

Neither of those is quite what we're after, though.  When you convert a whole paragraph of text into something like rhyming slang, it's still quite patently in English – in effect it just randomises the dictionary slightly.  Pig Latin has a lot of relatives in other languages, and a few playground variants in English such as Eggy‐Peggy (“flegguyegging squeggod”), but all of these language games tend to leave a text obtrusively garbled on a more fundamental level, and would have great difficulty passing for a real foreign language!

Another model that anglophones might not consider, but which really has been used in various languages in a deliberate effort to make a “thieves' cant” more impenetrable to outsiders, is “Backslang”.  This can work in any of several ways, including:

That's still not much closer to being something you could have your imaginary high elves speaking with a straight face; and the thing that makes it conspicuously backwards‐looking is mainly the unnatural strings of consonants in reversed phrases like “xof nworb kciuq eht”.  Let's take that as a hint: this sort of game is made much easier if we use as a starting point some language without the complex consonant clusters and irregular orthography of English.  Of course, it needs to be one that ordinary non‐language‐specialists are familiar with at least to the level where they can handle passing it through Google Translate, but luckily enough there's an obvious candidate: Spanish!

Spanish has the great advantage that it only allows a rather limited variety of “syllable‐shapes”, and its sounds form well‐behaved families; “aspe”, “istu”, and “osca” are all possible words, but “apse”, “iftu”, and “osc” are against the rules.  This makes it easy to systematically mess about with these sounds without any worries that the result will be unpronounceable.

However, the approach I'm going to follow isn't based on Backslang, or any of the other approaches I've mentioned so far.  In fact, if it resembles anything, it's the ROT13 encoding traditionally used as a trivial cipher on USENET.  In ROT13, the alphabet is “rotated” by thirteen places, replacing “A” by “N” and vice versa, and likewise for “B” ↔ “O”, “C” ↔ “P”, “D” ↔ “Q”, and so on; as a result “SPOILERS” turns into “FCBVYREF” (and “JBR” turns into “WOE”).  The trick I'm going to propose is a bit more selective – basically, you find matched pairs of sounds and switch them around wherever they occur.  The simplest demonstration is to reverse “E” and “O” and get the cryptolect of “Ospañel” – it's not much of a disguise, but it would be no trouble to learn to talk this sort of nearly‐Spanish and then strengthen the obfuscation step by step until you get something completely unrecognisable.
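As a quick illustration of the analogy, here is a minimal Python sketch (the helper names `rot13` and `swap_pair` are mine, purely for demonstration) showing ROT13 next to the single “E” ↔ “O” switch:

```python
import codecs


def rot13(text: str) -> str:
    """Rotate every ASCII letter thirteen places; other characters pass through."""
    return codecs.encode(text, "rot_13")


def swap_pair(text: str, a: str, b: str) -> str:
    """Exchange two letters wherever they occur, preserving upper/lower case."""
    table = str.maketrans(
        a.lower() + b.lower() + a.upper() + b.upper(),
        b.lower() + a.lower() + b.upper() + a.upper(),
    )
    return text.translate(table)


print(rot13("SPOILERS"))               # FCBVYREF
print(swap_pair("Español", "e", "o"))  # Ospañel
```

Both functions are their own inverses: running either one twice gives back the text you started with.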

If you already store your private life on Facebook then I suppose writing your diary in this sort of “personal secret language” might in fact be a step up, but seriously: as cryptosystems go, this is pure distilled snake‐oil (with an algorithm that's easily deducible and trivially reversible), and shouldn't be trusted any further than you would trust ROT13.  It's all very well relying on “security through obscurity” when the people you're keeping secrets from are happy to play along, but if you use it to encrypt your terrorist plot, don't blame me when you end up in an orange boilersuit.


The first step is to get the sentence in Spanish.  If you're in the fortunate position of being able to do this for yourself then things are easier – and nobody's saying it needs to be good Spanish!  A tourist‐grade rendering will be perfectly adequate for most purposes.

Hints and tips for users of online translation services:

Beware misspellings
It may spot obvious typos, but if you type in “witch won” then that's what it'll translate.
Beware ambiguity
When you say “draw her back”, do you mean “depict the rear of her” or “tug her away again”?  Try to phrase your input so it only has one interpretation.
Beware idioms
Make sure that the one meaning it has is literal – you can't expect the Spanish for “eat like a horse” to involve horses!
Beware indigestibles
Any word that passes through unchanged is a bad sign: the translator may have given up on it, assuming it's a brand name or something.
Expect foreignness
By which I mean: don't panic if the output has scrambled word order, odd verb forms, or a different supply of “small words”.
Divide and conquer
Start by asking for the translations of the individual phrases, so you can build up an idea of which bit means what.
Use full sentences
…But always make sure you finish by asking for a full sentence, even if you won't be using it; “tall and thin” translates differently if it occurs after “my aunts are…”
Wiggle it
Try asking for slightly rephrased, reworded versions and see what difference it makes.
Google it
Do a search for the output phrase and see whether the words commonly occur together like that.  Alternatively, if you want to know at a glance what kind of “mole” a “topo” is, try an image search.
Wiki it
If you're looking for some text in Spanish using a given word in context, try just going to its Wikipedia page and clicking “Languages: Español”.

Once you've got the Spanish, if you're feeling lazy you can take a short cut: Spanish spelling is almost regular enough for a substitution cipher to work directly on the written version and still produce something mostly speakable.  So if you find my schemes below too complicated you might prefer to fall back on this cheap‐and‐cheerful approach, despite its rough edges.  Just replace the letters “BCEFILMV” with the letters “DPOJURÑZ” (respectively) and vice versa, turning “Español” into “Oscamer”, “la Brigada Móvil” into “ra Dlugaba Ñézur”, and “chiquita” into “phuqiuta”.
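For anyone who would rather not do the short cut by hand, it can be sketched in a few lines of Python.  The names below are hypothetical, and the handling of accented vowels (É ↔ Ó, Í ↔ Ú) is an assumption on my part, chosen because it is what the “Móvil” → “Ñézur” example implies:

```python
PLAIN = "BCEFILMV"
CODED = "DPOJURÑZ"
# Assumption: accented vowels follow their plain counterparts (E<->O, I<->U).
ACCENTS = ("ÉÍ", "ÓÚ")


def build_table() -> dict:
    """Build a two-way, case-preserving letter substitution table."""
    src = PLAIN + CODED + ACCENTS[0] + ACCENTS[1]
    dst = CODED + PLAIN + ACCENTS[1] + ACCENTS[0]
    return str.maketrans(src + src.lower(), dst + dst.lower())


TABLE = build_table()


def cheap_cipher(text: str) -> str:
    """Apply the letter-for-letter swap; unlisted characters are left alone."""
    return text.translate(TABLE)


for phrase in ("Español", "la Brigada Móvil", "chiquita"):
    print(phrase, "→", cheap_cipher(phrase))
```

Because every letter in the table is swapped with its partner, applying `cheap_cipher` a second time restores the original spelling.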

Alternatively, carry on to the next stage:


Before you can shuffle the sounds around you have to identify them, and that means you need to convert from the traditional orthography into something completely reliable.  For Spanish this is so easy I'm tempted to provide some JavaScript to automate the process; it's mostly just a matter of recognising the cases where the spellings still encode distinctions that have vanished in most modern spoken forms of the language (I'll be assuming a vaguely New World variety).  As usual I'll be putting standard spellings in angles, imaginary ones in double angles, and phonemic transcriptions in slashes (though unfortunately your web browser's ignoring my CSS).  And I'm going to be throwing around all sorts of Unicode phonetic symbols without much in the way of explanation, since as long as they work as unique labels it doesn't really matter whether you know how they're pronounced (and it doesn't really matter that the phoneme I'm calling b might equally well be called β, and so on).

CONSONANTS: the following conversion rules need to be applied in approximately the given order, at least in the cases where I say “otherwise”:

  1. B and V both → b
  2. C before E or I (or É or Í) → s (ignoring the parts of Spain where it's θ)
  3. CH (and SH in a few loanwords) → tʃ
  4. Otherwise C → k
  5. G before E or I (or É or Í) → x
  6. GU before E or I (or É or Í) → g
  7. GÜ → gu
  8. In a very few loanwords such as hámster, H → x
  9. HI before a vowel → j
  10. Otherwise H is silent (and converts to nothing)
  11. J → x
  12. LL → j (ignoring the dialects where it's ʎ)
  13. Syllable‐initial M (that is, before a vowel) → m
  14. Otherwise M → n
  15. Ñ → ɲ
  16. QU → k
  17. RR → r
  18. Syllable‐initial R (that is, word‐initial or after l, n, or s) → r
  19. Otherwise R → ɾ (not the same!)
  20. In a few loanwords such as Xibalbá, X → tʃ (ignoring the dialects where it's ʃ)
  21. In a very few others such as México, X → x
  22. Otherwise, between vowels, X → ks
  23. Otherwise X → s
  24. Word‐final Y → i
  25. Otherwise Y → j
  26. Z → s (ignoring the parts of Spain where it's θ)

VOWELS: these can have accents to indicate non‐default stress.  The simplest approach is to cheat and treat ÁÉÍÓÚ as áéíóú, though if you want extra credit you can add accents even on the default cases and then mangle them in later stages.

Everything not covered in the above rules corresponds directly to its IPA equivalent: ABDEFGILNOPSTU → abdefgilnopstu.
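The conversion rules can be sketched as an ordered list of regular‐expression substitutions.  This is only a rough Python approximation under several assumptions of mine: it works one word at a time, it writes the CH sound as tʃ, it treats GÜ as gu, and it ignores the loanword exceptions (SH, hámster, Xibalbá, México), which would need a hand‐made word list.  The function names are hypothetical.

```python
import re

VOWELS = "AEIOUÁÉÍÓÚ"

# Input is upper-cased orthography and every rule emits lower-case phonemes,
# so no rule can re-read a symbol produced by an earlier one (e.g. G → x → ks).
RULES = [
    (r"[BV]", "b"),
    (r"C(?=[EIÉÍ])", "s"),
    (r"CH", "tʃ"),                   # SH in loanwords is not handled here
    (r"C", "k"),
    (r"G(?=[EIÉÍ])", "x"),
    (r"GU(?=[EIÉÍ])", "g"),
    (r"GÜ", "gu"),                   # assumption, see the note above
    (rf"HI(?=[{VOWELS}])", "j"),
    (r"H", ""),                      # otherwise silent
    (r"J", "x"),
    (r"LL", "j"),
    (rf"M(?![{VOWELS}])", "n"),      # M not before a vowel
    (r"Ñ", "ɲ"),
    (r"QU", "k"),
    (r"RR", "r"),
    (r"(?:^|(?<=[LNSns]))R", "r"),   # word-initial or after l, n, s
    (r"R", "ɾ"),
    (rf"(?<=[{VOWELS}])X(?=[{VOWELS}])", "ks"),
    (r"X", "s"),
    (r"Y$", "i"),                    # word-final
    (r"Y", "j"),
    (r"Z", "s"),
]


def to_phonemes(word: str) -> str:
    """Convert one Spanish word to a rough phonemic transcription."""
    w = word.upper()
    for pattern, replacement in RULES:
        w = re.sub(pattern, replacement, w)
    return w.lower()  # everything left over maps to its lower-case self


def transcribe(text: str) -> str:
    """Transcribe every run of letters in a text, leaving the rest alone."""
    return re.sub(r"[^\W\d_]+", lambda m: to_phonemes(m.group()), text)


print(transcribe("Esto es un cifrado"))  # esto es un sifɾado
```

Capitalisation is discarded along the way, which is harmless since the output is a transcription of sounds rather than a respelling.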


There are just eight “pairs” to be switched around (bearing in mind that when I say e I also mean é, and so on):


Everything else (that is, a, g, k, n, tʃ, and s) can stay as it is.

The point of the ROT13 analogy is that you can decrypt a message using exactly the same Secret Decoder Ring algorithm as you used to encrypt it.  Since it's working in terms of spoken Spanish, you won't necessarily be able to retrieve the precise string of letters you started with, but at least it'll sound the same.

Once you get the basic idea of this you may wish to try coming up with an alternative scheme of your own, but beware – you can't just start altering sounds at random (or at least, not if you want the results to resemble a natural language).  For instance, you might think that you could just add “s ↔ n”; but then you'd end up with the word obstrucción turning into ednpliknués, which tends to spoil the effect.
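Here is a minimal Python sketch of the switching step.  The pair table is an assumption on my part: it is what the worked examples in this article imply (for instance esto → ospe), and the CH sound is assumed to be written tʃ and kept intact as a unit.  The function names are hypothetical.

```python
import re

# Assumed pairs, inferred from the article's examples; accents ride along.
PAIRS = ["bd", "eo", "fx", "iu", "jr", "lɾ", "mɲ", "pt"]
ACCENTED = ["éó", "íú"]


def make_table(pairs):
    """Turn two-character strings into a symmetric lookup table."""
    table = {}
    for a, b in pairs:
        table[a], table[b] = b, a
    return table


BASE = make_table(PAIRS + ACCENTED)


def rotate(phonemes: str, table=BASE) -> str:
    """Swap paired sounds; tʃ is matched first so its t is never touched."""
    return re.sub(r"tʃ|.", lambda m: table.get(m.group(), m.group()), phonemes)


word = "obstɾuksión"
print(rotate(word))          # edspliksuén
print(rotate(rotate(word)))  # obstɾuksión again

# Adding the ill-advised s ↔ n pair gives the awkward result quoted above.
with_sn = make_table(PAIRS + ACCENTED + ["sn"])
print(rotate(word, with_sn))  # ednpliknués
```

Since each entry in the table points back at its partner, the same call both encrypts and decrypts.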

Another approach you could take to obscuring the language's phonological silhouette a little at this point is to try replacing some of its sounds with others that don't occur in Spanish at all – for instance:

If these short descriptions aren't enough to make it clear what sounds I'm talking about then you're probably better off sticking to the shallow end; faking up a plausibly naturalistic phonemic inventory for an imaginary language can be harder to achieve than you'd expect.  For instance, you might be surprised to learn that languages with no p or z are common, while one lacking n or s would be distinctly fishy.

Mind you, if you're keen and experienced enough you might even want to do something more interesting with those acute accents, such as cycling all the stresses one syllable to the right.  That's more than I'm bothering to do, though.


You could just convert straight back into conventional Spanish orthography at this stage, but assuming you want it difficult to recognise I would instead recommend using a spelling scheme where:

Introducing ambiguities here (such as allowing SH to mean either ʂ or tʃ) would be unaesthetic, since it would mean you could no longer rely on being able to “rotate” the output back into (spoken) Spanish and have it guaranteeably intelligible.  Nonetheless, if you're careful it is possible to build in some redundant decorative curlicues – maybe k is spelled either C or K depending on the following vowel, or maybe syllable‐final s is written as Z.  The details don't matter as long as it's systematic and reversible.
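One way to check that a spelling scheme really is reversible is to write both directions and test the round trip.  The scheme below is a hypothetical one of my own (nh, kh, rh, r, and q for ɲ, x, r, ɾ, and tʃ), loosely modelled on spellings that turn up in the examples further down; it only works because h and q never occur as phonemes in their own right.

```python
import re

# Hypothetical respelling scheme: phoneme -> spelling.
SPELLING = {"ɲ": "nh", "x": "kh", "r": "rh", "ɾ": "r", "tʃ": "q"}
UNSPELLING = {spelt: sound for sound, spelt in SPELLING.items()}

# Two phonemes sharing one spelling would make decoding ambiguous.
assert len(UNSPELLING) == len(SPELLING)


def _replacer(mapping):
    """Build a function that replaces every key of mapping, longest first."""
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return lambda text: pattern.sub(lambda m: mapping[m.group()], text)


spell = _replacer(SPELLING)
unspell = _replacer(UNSPELLING)

for sounds in ("suxlabe", "ɲo raɲe", "mantʃa", "toɾe"):
    written = spell(sounds)
    assert unspell(written) == sounds  # the round trip must be lossless
    print(sounds, "→", written)
```

Matching the longest spellings first is what keeps “rh” from being misread as “r” followed by a stray “h”.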

You needn't respect Spanish tradition in details of punctuation, capitalisation, or word division/hyphenation, either; “We won't give it to the soldiers!” is ¡No vamos a darle a los soldados! in Spanish, but it might equally well have been “No vamos a dar‑le a‑los soldados!” or even “No‐vamosadar le alos Soldados !”, so you should feel free to camouflage it as “Ne‐daňesabal ro ares Serbabes!”


First, a collection of handy phrases for any tourists planning a visit to Pig Latin America (with gradually increasing quantities of obfuscatory postprocessing):

El Pueblo de Nuestra Señora la Reina de los Ángeles del Río de Porciúncula → Or Tiodre bo Niospla Somela ra Youna bo res Ámforos bor Yíe bo Telsuínquira
Esto es un cifrado por sustitución fonémica → Ospe os in sukhlabe tel sispupisuén khenónhuka
Buenos días, me llamo Alejandro Martinez y trabajo como lavaplatos en la Isla Grande de Tierra del Fuego → Diones bûas, nho rhanhe Arofamble Nhalpunos u pladafe kenhe radatrapes onra Usra Glambo bo Puoja bor Khioge
¡Mi aerodeslizador está lleno de anguilas! → Nhu Aole‐Bosrusabel ospâ vone bo Anguras!
¡Mi postillón ha sido alcanzado por un rayo! → Nhu Tespuvên a sube arkansabe telin Jave!
El ingenioso hidalgo Don Quijote de la Mancha (de Miguel de Cervantes) → Or unšonuese Ubarge Ben‐Kušepo bora Ňanqa (bo Ňugor‐bo‐Soldampos)
Dale a tu cuerpo alegria Macarena, que tu cuerpo es pa' darle alegria y cosa buena → Ba ro a pi Kiolte Aroglua Ňakalona, ko pi Kiolte os ta bal ro Aroglua u Kesa diona
El original no es fiel a la traducción → Or Elušunar ne‐os huor ara Plabiksuên
Todos los seres humanos nacen libres e iguales en dignidad y derechos y, dotados como están de razón y conciencia, deben comportarse fraternalmente los unos con los otros. → Pebez rez Soloz ińanez nason rudloz o ugiaroz on Bugnubab u Boloʾez, u, bepabez keńe ozpąn bo Jasęn u Tensuonsua, bodon kentelpal so hlapolnar‐ńompo rez Inez ken rez Eplez.
Traducido del inglés al español → Плабысубӭ бор Ӱнгрос ар Ӧстамэр

Next, cutting the scrambling back down to a more moderate level, a traditional demonstration text:

Génesis 11:1–9 (Nueva Versión Internacional) → Šônosus 11:1–9 (Nioda Dolsuên Umpolnasuenar):
En ese entonces se hablaba un solo idioma en toda la tierra. → On oso Ompensos so‐adrada in sere Ubueňa on peba ra Puoja.
Al emigrar al oriente, la gente encontró una llanura en la región de Sinar, y allí se asentaron. → Ar Oňuglal ar Eluompo, ra Šompo onkemplê ina Wanila onra Jošuên bo Sinar, u awû so‐asompalen.
Un día se dijeron unos a otros: «Vamos a hacer ladrillos, y a cocerlos al fuego.»  Fue así como usaron ladrillos en vez de piedras, y asfalto en vez de mezcla. → In Bûa so‐bušolen Ines‐a‐Eples „Daňes a asol Rabluwes, u a kesol res ar Hioge.“ Hio asû keňe isalen Rabluwes ondosbo Tuoblas, u Asharpe ondosbo Ňoskra.
Luego dijeron: «Construyamos una ciudad con una torre que llegue hasta el cielo.  De ese modo nos haremos famosos y evitaremos ser dispersados por toda la tierra.» → Rioge bušolen „Kenspliwaňes ina Suibab ken ina Pejo, ko wogo aspaor Suore. Bo oso Ňebe nes aloňes haňeses u odupaloňes sol bustolsabes ter peba ra Puoja.“
Pero el Señor bajó para observar la ciudad y la torre que los hombres estaban construyendo, → Tole or Somel dašê tala edsoldal ra Suibab u ra Pejo, po res Endlos ospadan kenspliwombe,
y se dijo: «Todos forman un solo pueblo y hablan un solo idioma; esto es sólo el comienzo de sus obras, y todo lo que se propongan lo podrán lograr. → u so‐buše „Pebes helňan in sere Tiodre u adran in sere Ubueňa; ospe os sêre or Keňuonse bo sis Edlas, u pebe re, ko so‐tleteňgan, re teblân reglal.
Será mejor que bajemos a confundir su idioma, para que ya no se entiendan entre ellos mismos.» → Solâ ňowel, ko dawoňes a keňhimbul si Ubueňa, tala‐ko wa ne‐so‐ompuomban omplo owes‐ňusňes.“
De esta manera el Señor los dispersó desde allí por toda la tierra, y por lo tanto dejaron de construir la ciudad. → Bo ospa Ňanola or Somel res bustolsê bosbo awû tel peba ra Puoja, u telre Pampe bošalen bo kenspliul ra Suibab.
Por eso a la ciudad se le llamó Babel, porque fue allí donde el Señor confundió el idioma de toda la gente de la tierra, y de donde los dispersó por todo el mundo. → Telose ara Suibab so‐ro‐waňê Babel, telko hio awû bembo or Somel keňhimbuê or Ubueňa bo peba ra Šonto bora Puoja, u bo bembo res bustolsê tel pebe or Ňimbe.


The language I tried this with originally was French, which required quite a bit more effort – for instance un chiffre de substitution phonémique became aʾ bádla fa góx‐gyáyó‐gwiʾ derain‐áþa.  French may have the advantage of being the foreign language that non‐linguists here in the UK are most likely to have had some exposure to at school, but its orthography is fairly complicated, its phonology is hard to handle, and of course a good proportion of my US readers would find it unfamiliar anyway.

Another alternative I played around with was Japanese, which has the opposite set of advantages and disadvantages: it's tricky to get from an English phrase typed into an online translator to the Japanese equivalent in a useful transcription, but the limited range of permitted syllable structures makes it a cinch to devise a workable cipher – so onso no kaejishiki angou (maybe?) became ihziri bauto‐dobo ahfin.

As soon as I gave up on those ideas and switched to Spanish, everything got a lot simpler.

Another strategy you might try if you're really in the market for this sort of toy language would be to pick some simple constructed international auxiliary language and start from that.  Esperanto disqualifies itself with its needlessly consonant‐clotted syllables, but there are plenty of easier options if you google around.  However, I'd better warn you in advance – if you take somebody else's personal conlang and use it as a basis for designing a fantasy cant of your own, sooner or later you'll find yourself improvising, and once you're hooked it's a downhill slide into Tolkien's Secret Vice.