EUROPAN

The Decipherment of Non‑Linear B

23 Jun 2016 Justin B Rye


INTRODUCTION

Non‐linguists may not realise that one of the few attributes all human languages seem to share is a characteristic underlying grammatical structure that can be represented using “tree diagrams”.  When you learn Spanish, or even something more exotic like Japanese, you might have trouble with the vowel sounds or the pronouns or the verb endings, but one of the things you don't need to relearn is the idea that sets of adjacent words go together in phrases and phrases go together in clauses and clauses go together in sentences – an overall phrasal structure that organises strings of words into a hierarchy of syntactic nodes.  You know, this sort of thing:

             Sentence
            ⭩        ⭨
   Noun Phrase      Verb Phrase
       ⭩            ⭩        ⭨
   Pronoun        Verb     Noun Phrase
      We          love       trees

Different languages then add different amounts of extra complexity on top of this underlying structure, allowing branches of the tree to be lopped off and reshuffled.  For instance, if you say “Grammar lessons I always detested!” then you're throwing the spotlight on the object noun phrase “grammar lessons” by displacing it to a special position at the start of the sentence.  But the structure is still there underneath – for a start, it's what stops you moving the phrase all the way out of the sentence and saying something like “Grammar lessons.  Do you remember your schooldays too?  I always detested!”

But is that tree‐like syntactic hierarchy a necessary feature of languages?  If we ever meet aliens, can we expect this to be something our languages and theirs have in common?

Back in the nineties a language hobbyist named Jeffrey Henning invented a fictional alien language that he called “Fith”.  Instead of a hierarchical tree structure, Fith used a stack‐based grammar (and if you're wondering what that means, well, his original web pages long ago went the way of all good things, but there's still a potted summary here).

Computer science types tend to be unimpressed with stack parsers, since in principle any sentence in such a grammar can always be reanalysed in terms of a tree structure.  I've always thought this was rather missing the point, since the native speakers' mental grammars weren't processing it that way, but okay, let's forget stacks.  It's perfectly possible for a language to have a grammar whose sentences are structurally incompatible with a representation in terms of tree structure, and here's an example to prove it.

Any time‐tourists who've been following along by visiting the settings of my various other chronlangs should note that this one is very much not for casual daytrippers.  See also my pages of SF linguistics and linguistic SF; or if you're bored with my stuff, try this out‐of‐copyright classic.

(Do I need to put another of those “PLEASE NOTE: THIS BIT'S FICTION” disclaimers here?  I'm rather hoping it's obvious that the idea of life on a moon of Jupiter is made up – and that background's just intended as superficially believable quasiscience, not as something I'm interested in defending as likely.  But there's a sense in which the grammar of a made‐up language isn't fictional, any more than Tetris is fictional.  We may only have played it as a simulation on a computer screen, but as long as it has real rules, it's a real game!)


PALAEOGRAPHY

“Europan” is an alien language known from relics found on the moons of Jupiter.  The general assumption is that its vaguely cephalopodlike speakers evolved approximately half a billion years ago on Europa, which still has traces of an extinct biosphere at its Jupiter‐facing pole, but this is conjecture; it is quite conceivable (for instance) that they originated elsewhere and Europa was only a colony.  The impression that their sphere of influence was limited to these satellites may be an artefact of the higher chances of relics being preserved in the outer system.  It is known that there were originally multiple obelisks containing deliberately constructed archives on each of the Galilean moons, but none have been found on Io, and the three magnetic anomalies identified on Europa itself have yet to be excavated.  The available data comes from one partial cache on Ganymede and two pristine obelisks on Callisto, each containing thousands of engraved platinum plaques.

The Europan writing system was purely logographic, with each glyph representing a particular word.  The closest terrestrial equivalent is the Chinese system, but even that takes shortcuts by incorporating phonological hints into most of its characters via a “rebus” principle that incidentally provides modern scholars with clues to the historical pronunciation of Chinese.  In the case of Europan we have no such clues to the level of the language's structure analogous to phonology; more or less all we know is that the physical medium they used for communication was electrical rather than auditory.  Conversations required two individuals to be close enough to one another for their electrolocatory auras to overlap, and for the modulation produced by each separate electrical organ to be distinguished – in effect, “face to face”.

Since each tentacle was associated with one or more electrical organs, a large number of them could be in range at any time, and a similar number of individual words could be transmitted in parallel.  We might have expected it to be natural for their language to treat words expressed via neighbouring organs as syntactically related, but in fact the reverse was true: related words tended to be transmitted on widely separated channels, with associated tags to indicate the relationship.  There may well have been a physiological basis for this – it is possible, for instance, that the mechanism used to tag two words as related may have required the secretion of the same trigger chemical in both electrocytes, and that depleting the secretory capacity of one would also reduce that of its neighbours while leaving more distant organs unaffected.

The physical circumstances of Europan communication are also likely to be the explanation behind the fact that sentence lengths in the known corpus cut off abruptly at a maximum of thirty words.

Written Europan glyphs are in effect hexgrid braille cells made up of 24 dots, each either filled or empty, making them conveniently convertible for cataloguing purposes into six‐digit values in base 16.  Where one word is syntactically dependent on another, the child has a final row of cells below the main glyph body which echoes the top line (and first digit) of the parent glyph.  Then each sentence is written as a line of glyphs, or in special circumstances as a ring (which may represent a survival of an earlier standard practice, rather like the way we start our sentences with a capital letter styled to resemble the lettering of Roman inscriptions).  In cases where the child/parent relationship between words is ambiguous, the parent is the candidate furthest away in the sentence‐ring.
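
As a rough illustration of the cataloguing arithmetic (24 dots at one bit each give 24 bits, or six hexadecimal digits), here is a minimal Python sketch.  The reading order of the dots, the treatment of a filled dot as a 1, and the function names `glyph_code` and `parent_tag` are all assumptions made for the example; the only thing taken from the description above is that the top line corresponds to the first digit.

```python
def glyph_code(dots):
    """Convert 24 filled/empty dots (True/False) into a six-digit hex code."""
    if len(dots) != 24:
        raise ValueError("a glyph has exactly 24 dots")
    value = 0
    for filled in dots:          # assumed reading order: first dot = highest bit
        value = (value << 1) | (1 if filled else 0)
    return f"{value:06X}"


def parent_tag(code):
    """First hex digit, i.e. the (assumed four-dot) top line echoed by a child."""
    return code[0]


# Round trip for a code quoted in the text (E#C2576C, glossed SMITH).
dots = [bit == "1" for bit in f"{0xC2576C:024b}"]
print(glyph_code(dots))              # C2576C
print(parent_tag(glyph_code(dots)))  # C
```

If the tag really is a single hex digit, then only sixteen distinct tags exist, which would explain why a rule for choosing between several candidate parents is needed at all.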

Other than the use of whitespace between words, sentences, and paragraphs, there is no trace of punctuation; subclauses, quotations, questions, parenthetical remarks, and so on are all clearly signposted either by the dependency labelling or by extra marker words.

It is plausible that the glyphs may have originated as pictograms of some sort, but if so then just like their East Asian equivalents they subsequently developed from iconic into arbitrary symbols.  It has also been suggested that the words in the “spoken” language were themselves in some sense iconic representations of sensory data, though given that the Europans' two main sensory modalities were electrolocation and sonar it is unlikely that this would make the glyphs' meanings any easier for humans to guess.


SYNTAX

The syntactic analysis implicit in the design of the glyphs recognises just two lexical categories, which turn out to be essentially equivalent to our verbs and nouns; if there are any words that can be either (as in English “talk the talk”), they are disguised by the writing system, which represents only their independent functions.  The standard glosses used in the Europan Epigraphic Catalogue conventionally distinguish verbs from nouns by writing the latter in capital letters.

The fundamental rule of Europan syntax is that each word may have any number of children, but can have no more than one parent (the arrows in structural diagrams radiate outwards from parent to children).  Since words come in two types and each word can have either sort of parent, there are just four different standard types of syntactic relationship in the language.  This apparent simplicity comes at the cost of making individual dictionary entries more idiosyncratic.

N→N (noun child of noun parent): “specifier”

This construction is used to form quantifiers (“a drop of water”, “some people”), determiners (such as THE), and inalienable associatives (thus for instance sore→PROSOMA→OFFSPRING→ME, “My child has a headache”).

Other N→N constructions (such as FOOD→BUG, “litter, rubbish”) may be more like English compound nouns, in that forming a new one is like coining a new vocabulary item rather than applying a grammatical rule.  Proper nouns are a special case of this, commonly made up of two or more noun glyphs strung together in a chain; repeated references to the same name tend to truncate it to the parent glyph.

V→N (noun child of verb parent): “argument”

A verb may have zero, one, or more nouns attached as arguments, but all of them have the same relationship to their parent verb.  In human languages, verbs may have a subject, a direct object, an indirect object, and so on; but Europan fractionates all such constructions into multiple intransitive verbs that are then chained together in V→V constructions:

own → move → adjust
YOU   FOOD   ME

“I put the food in your possession”; or in other words, “I give you some food”.

No overt distinction is made between verbs that are argumentless because the subject is being left vague and ones that function as standalone modifiers with no role for an argument.  When a verb has more than one argument, they don't have any inherent order, so where temporal sequence is significant it needs to be specified explicitly (see next).

N→V (verb child of noun parent): “descriptor”

Descriptors are verbs subordinated to nouns, usually translatable in terms of an adjective or relative clause (SIBLING→speak, “a sibling, who is talking”).  Many Europan verbs are glossed in terms of predicate adjectives anyway (clever→SIBLING, “the sibling is clever”), so when these verbs take a noun as parent it makes sense to translate them as attributive adjectives (SIBLING→clever, “a clever sibling”).  The two constructions “a clever sibling”/“the sibling is clever” may seem to say much the same thing, but while a noun can be modified by any number of N→V descriptor constructions, it can take part in only one V→N argument construction, so that one stands out as the primary focus of the clause.

A descriptor verb may have subverbs (usually requiring the relative clause translation: SIBLING→speak→obstruct, “a sibling, who can't speak”).  However, it cannot take an explicit argument, perhaps because the noun to which the descriptor is subordinated is felt to be effectively its subject.

Not all verbs in N→V relationships function as modifiers like this; INSTANCE, for instance, is one of a specialised family of nouns that serve to “nominalise” the attached subclause.

V→V (verb child of verb parent): “subverb”

This construction resembles the strings of verbs that can occur in English sentences (“I can't help wanting to keep trying to…”).  Many terrestrial languages use “serial verb” chains heavily, dividing ideas that English expresses using a specialised word (such as “collect”) into a concatenated sequence of subcomponents (“come/grasp/go”).  Europan can take this much further, since its syntax never needs to be limited to simple linear chains – just as a verb may have any number of arguments, it is also possible for a single parent verb to have multiple modifiers and subclauses attached:

ME ←believe→ false
YOU← equal ←quote→ own →THE
SMITH stand →FOOD

“I do not believe that you are Smith or that the food belongs to them” (E#C2576C = SMITH being the commonest Europan personal name element).  In complex sentences, it may not be apparent from the syntax whether a child verb is a modifying subverb (like false above) or a full subordinate clause (like equal), but in practice any ambiguity could be resolved via further modifiers.
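
The four relationship types can be read mechanically off the glosses, since nouns are written in capitals and verbs in lowercase.  The Python sketch below labels each parent→child link in the simple chains quoted above; the names `RELATION`, `category`, and `relations` are invented for the example, and it only handles linear chains, not branching sentences.

```python
# (parent category, child category) -> name of the construction
RELATION = {
    ("N", "N"): "specifier",
    ("V", "N"): "argument",
    ("N", "V"): "descriptor",
    ("V", "V"): "subverb",
}


def category(gloss):
    """Catalogue convention: nouns are glossed in capitals, verbs in lowercase."""
    return "N" if gloss.isupper() else "V"


def relations(chain):
    """Label each parent→child link in a chain such as 'SIBLING→speak→obstruct'."""
    words = chain.split("→")
    return [
        (parent, child, RELATION[(category(parent), category(child))])
        for parent, child in zip(words, words[1:])
    ]


for parent, child, name in relations("sore→PROSOMA→OFFSPRING→ME"):
    print(f"{parent} → {child}: {name}")
# sore → PROSOMA: argument
# PROSOMA → OFFSPRING: specifier
# OFFSPRING → ME: specifier
```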

All the above sentences are tame by the standards of the Europan language.  Some of its sentence structures are difficult even to fit onto a page as 2D diagrams, but the Europans themselves could parse them on the fly when received as batches of words “spoken” in parallel, with their interrelationships shown by the accompanying tags:

[11‐glyph sentence]
false SMITH YOU quote stand THE believe equal FOOD ME own

TOPOLOGY

Europan is like terrestrial languages in that its sentences can be diagrammed as directed graphs – that is, sets of nodes linked by arrows running from “parent” to “child”.  In all terrestrial languages, the graphs are simple, connected, acyclic, rooted trees.  They have one and only one node without a parent: the “Sentence” node, which despite being the “root” of the tree is conventionally placed at the top.  Europan doesn't recognise the same sort of syntactic category nodes, only overt verbs and nouns as in a dependency tree, but it does allow verbs (and only verbs) to be parentless.  Indeed, as the two example Europan sentences given so far demonstrate, they often have just one such “source” node as the ultimate ancestor of every other, thus forming a classical “tree structure”.  However, this is not invariably the case; it is also possible for there to be several parentless nodes (where the sentence is incompletely connected) or for there to be none (where a graph forms a cycle).
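
Because no word has more than one parent, a sentence can be stored as a simple lookup from each word to its parent, which makes it easy to count parentless nodes and to test for a classical tree.  The Python sketch below does this for the giving sentence from the SYNTAX section; the links follow one reading of that diagram (own→move→adjust, with YOU, FOOD, and ME as the respective arguments), and the function names and the two-word loop are hypothetical.

```python
def sources(parent_of):
    """Return the parentless words of a sentence given as {child: parent or None}."""
    roots = [word for word, parent in parent_of.items() if parent is None]
    nouns = [word for word in roots if word.isupper()]   # capitals = noun gloss
    if nouns:
        raise ValueError(f"only verbs may be parentless, found {nouns}")
    return roots


def is_tree(parent_of):
    """True if there is exactly one source and every word descends from it."""
    if len(sources(parent_of)) != 1:
        return False
    for word in parent_of:
        seen = set()
        while parent_of[word] is not None:
            if word in seen:          # walked round a loop: not a tree
                return False
            seen.add(word)
            word = parent_of[word]
    return True


# Assumed reading of the giving sentence.
give = {"own": None, "move": "own", "adjust": "move",
        "YOU": "own", "FOOD": "move", "ME": "adjust"}
print(sources(give), is_tree(give))   # ['own'] True

# A made-up two-verb loop: no parentless node at all.
loop = {"a": "b", "b": "a"}
print(sources(loop), is_tree(loop))   # [] False
```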

“Disconnected” sentences made up of adjacent but syntactically independent clauses are common, often serving to present causally unrelated events that are being considered together, as in “I continued eating as the current grew stronger” (or equally “The current grew stronger as I continued eating”).  Various more complex relationships between clauses are also handled by just putting them together within a sentence:

go → posit                 infer ← go
YOU          potential            ME

“If you're going then I'm going.”  Although diagrammed as three separate subgraphs, this would be uttered as a single batch of seven words, and written as something along the lines of YOU posit go potential go infer ME, where each instance of go is the parent of the noun and subverb on the other side of the sentence.  Note the presence there of another kind of parentless node: the free‐floating word potential, which marks the sentence as conditional.  Effectively, it works like an English initial adverb (such as “effectively”), which stands on the margins of a sentence and modifies it as a whole.
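
The way each instance of go picks up the words on the far side can be checked against the rule given under PALAEOGRAPHY, that an ambiguous parent is the candidate furthest away in the sentence‐ring.  The following Python sketch assumes that “furthest” means the shorter way round a ring of seven positions, and it does not try to settle ties; `ring_distance` and `resolve_parent` are names invented for the example.

```python
def ring_distance(i, j, n):
    """Shortest number of steps between positions i and j on a ring of n glyphs."""
    d = abs(i - j)
    return min(d, n - d)


def resolve_parent(child, candidates, n):
    """Pick the candidate position furthest from the child around the ring."""
    return max(candidates, key=lambda pos: ring_distance(child, pos, n))


sentence = ["YOU", "posit", "go", "potential", "go", "infer", "ME"]
go_positions = [i for i, word in enumerate(sentence) if word == "go"]  # [2, 4]

for child in (0, 1, 5, 6):   # YOU, posit, infer, ME all carry the tag for "go"
    parent = resolve_parent(child, go_positions, len(sentence))
    print(f"{sentence[child]} (position {child}) -> go at position {parent}")
# YOU (position 0) -> go at position 4
# posit (position 1) -> go at position 4
# infer (position 5) -> go at position 2
# ME (position 6) -> go at position 2
```

Under these assumptions the first go ends up as the parent of infer and ME, and the second as the parent of YOU and posit, matching the description above.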

Such constructions are in theory sufficient on their own to make Europan incompatible with a description limited to conventional tree structures, but this is arguably an illusion caused by our choice of terminology; we might instead have decided to call each connected subgraph a “sentence” and used some other word (such as “utterance”) for a set of subgraphs spoken in parallel.
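
Splitting an “utterance” into its connected subgraphs is a routine computation.  Here is a minimal Python sketch using a simple union‐find over the conditional sentence above; the two instances of go are numbered only to keep them apart, and the function name is an assumption of the example.

```python
def connected_subgraphs(words, arrows):
    """Group words into connected subgraphs, ignoring arrow direction."""
    group = {word: word for word in words}

    def find(word):
        # Follow links until reaching the representative of the group.
        while group[word] != word:
            word = group[word]
        return word

    for parent, child in arrows:
        group[find(child)] = find(parent)   # merge the two groups

    clusters = {}
    for word in words:
        clusters.setdefault(find(word), []).append(word)
    return list(clusters.values())


words = ["go1", "posit", "YOU", "potential", "go2", "infer", "ME"]
arrows = [("go1", "posit"), ("go1", "YOU"), ("go2", "infer"), ("go2", "ME")]

for subgraph in connected_subgraphs(words, arrows):
    print(subgraph)
# ['go1', 'posit', 'YOU']
# ['potential']
# ['go2', 'infer', 'ME']
```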

The more significant topological anomaly permitted in Europan syntax is the cyclic graph (or “loop sentence”).  These are less common, but far from unusual in contexts such as reciprocal interactions:

adjust →ME→ own
MONEY← move move →FOOD
own ←YOU← adjust

This sentence could be translated relatively literally as “I give food to you who give money to me” or more idiomatically as “I sell you food”.  Pop‐science accounts of the decipherment of Europan that insist on seeing every facet of their culture as deriving neatly from their biology invariably attribute this interest in symmetrical social relationships to the fact they were hermaphrodites.  Be that as it may, the claim often seen in such accounts that loop sentences were used to express reflexive verbs is a misunderstanding – on the contrary, the whole category of potential loops passing through a single noun was avoided.  Even in the layer‐two sentence “So does it [the set of all sets] contain itself?”, the grammar of the reflexive construction is straightforward, by Europan standards:

infer ←reciprocate→ contain
query THAT stand

Europan syntax makes it impossible for multiple loops to occur within a single connected subgraph, but a sentence containing several separate loops is possible in principle.  There are no cases in the available corpus, but this may not be significant; the clearest evidence we have that Europans found some types of loop sentences entirely natural is that cycles were used in contexts that show no obvious need for them.  For instance, they occur as a way of “fine‐tuning” sequential or positional relationships:

ME upstream stand
stand south YOU

That is, “I am southward of you, who are upstream of me”, or equally, “You are upstream and to the north of me”.  Tighter loops are possible, but the only known example of a three‐verb loop is still undeciphered:

e#9b3169
  ⭩ ⭦  
e#cb5ad3 e#f76600
E#5AB561 obstruct
E#5D8CA8

The N→N construction E#5AB561E#5D8CA8 may be a name, and the verb e#f76600 is also used elsewhere in the same text, which may be discussing either a game or a legislative process.  There is also one clear case in a philosophical treatise of the tightest possible kind of loop, though it is presented more as an obtrusively clever epigram than as a piece of normal conversational Europan:

bad←?OVERVIEW← increase
↑   ↓
increase →FACT→bad

The apparent meaning is along the lines of “The less reliable the data, the less reliable the worldview; the less reliable the worldview, the less reliable the data”.  It is especially notable because it casually breaks another of the rules obeyed by all human languages: the structural diagram requires two arrows between the same pair of nodes, which means it isn't a “simple” graph.

[6‐glyph sentence]
?OVERVIEW increase bad bad increase FACT
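
The earlier claim that multiple loops cannot occur within a single connected subgraph follows directly from the one‐parent rule: a subgraph of n words has at most n arrows (one arriving at each word), whereas a connected graph containing more than one loop needs at least n + 1.  The Python sketch below finds loops by walking from child to parent, using the “I sell you food” sentence as data.  The links between the stacked verbs are not visible in the diagram as printed, so own→move→adjust is assumed here by analogy with the giving sentence; the numbering of the duplicated verbs and the function name `loops` are likewise inventions of the example.

```python
def loops(parent_of):
    """Find every loop in a sentence given as {child: parent or None}."""
    found = []
    finished = set()                 # words whose ancestry is already explored
    for start in parent_of:
        path = []
        word = start
        while word is not None and word not in finished and word not in path:
            path.append(word)
            word = parent_of[word]
        if word is not None and word in path:     # the walk closed on itself
            found.append(path[path.index(word):])
        finished.update(path)
    return found


# Assumed reading: verbs numbered 1 sit in the FOOD column, 2 in the MONEY column.
sell = {
    "ME": "adjust2", "own1": "ME", "move1": "own1", "FOOD": "move1",
    "adjust1": "move1", "YOU": "adjust1", "own2": "YOU", "move2": "own2",
    "MONEY": "move2", "adjust2": "move2",
}

found = loops(sell)
print(len(found), len(found[0]))   # 1 8
```

On this reading the loop passes through eight of the ten words, with FOOD and MONEY hanging off it as ordinary arguments.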

LEXICON

The full Europan Epigraphic Catalogue has over ten thousand entries, which for the size of the corpus is a notably small vocabulary, but still big enough that a large number of them will almost certainly never be deciphered.  The following selection of important vocabulary items should be enough to give a general impression of the language.  Each entry gives:

  1. The glyph itself.  Noun glyphs (uppercase) are composed of asymmetric “strands”, while verb glyphs are symmetrical patterns.
  2. The EEC glyph‐code, which is simply the hexadecimal equivalent of the glyph's binary series of full and empty dots.  The catalogue uses this as a natural “alphabetical order”.
  3. The gloss, which is a single word conventionally used as the human‐readable equivalent of the E‐number (but to be understood more as a mnemonic than an accurate translation).  All of the examples given have glosses accepted by both the CADMUS and ESA schools of decipherment.
  4. Notes on the word's usage.

Verbs predominate over nouns in this selection not because there are more of them but because they tend to be both more linguistically interesting and easier to assign a definite meaning.  Many nouns are like E#B6CB38: we know it refers to some basic means of transport, but we can't be sure whether it was the equivalent of a pony, rickshaw, or jetpack.



AFTERWORD

Many things remain unclear about the Europans.

The normal assumption is that the Europans felt the answers to be somehow sensitive information.  It has been widely suggested that they were in fact postbiological entities who only represented themselves as still being cephalopodlike organisms because they were censoring the part of the story where their species developed the technologies that eventually led to their tragic fall (or glorious transcendence).

A newer and more subversive variant of this theory is that the Europans were not the creators of the message but its intended recipients.  The obelisk builders passed through this system just as the locals seemed sure to develop the means to emerge from their home ocean at some point within the next million years, so they left a message in a format designed (with superhuman skill and attention to detail) to be easy for them to learn, and incorporating a corpus of texts translated from the various tongues of the Europan Iron Age.  Of course, if this theory is correct, the existence of unopened obelisks is evidence that the locals failed to live up to the expectations of their patrons.  It also suggests that we have been misguided in our own expectation that the message would ultimately prove to have a valuable payload of technological arcana.  Perhaps the real gift was the part we've mistaken for wrapping paper: the Europan language itself.