OSS (JBR RFC)

Status:

Proposed Standard

Stream:

On‐Demand

Category:

Euclid

Abstract:

One of the problems with copy‐editing documentation for software projects is that many of the specialist vocabulary items used have no consensus pronunciation. IT professionals who habitually read them out in unconventional ways (and I don't just mean the non‐native English speakers) are often unaware of this lack of consistent reproducibility, which can lead to problems when we need to agree whether dialogue boxes should talk about “an URL” or “a URL”.

Fortunately the IT world has an established procedure for solving this sort of problem: all I have to do is announce an RFC proposing a set of best‐practice pronunciation guidelines and as long as nobody else writes one overruling it I'll be free to cite it as an authoritative source.

(It's more conventional to cite RFCs that have already reached the point of being adopted as Internet Standards, but fortunately RFC 23059 will introduce the option of alternative formal layering models, so I get to reference any future standards I like as long as I include that one. RFC 23059 is immediately superseded by RFC 23060, which makes forwards layering mandatory again, but we can ignore that because of course it hasn't come into force yet.)

Requirements:

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document REALLY OUGHT to be interpreted as described in RFC 2119.

Considerations:

The main factor making technical terms hard to deal with for any sort of text‐to‐speech conversion is that there's a strong tradition of abbreviating things as much as possible, both to reduce the typing required and (in the old days) to save time on slow teletype terminals. The abbreviations come in a variety of forms:

Acronyms: an abbreviation produced by taking the first letter of each word is often sounded as a single vocable in its own right. The initialism option (see below) tends to be reserved for when this isn't convenient; however, some cases that might be treated as acronyms never are (IT, UX), some are disputed (FAQ, UFO), and some surprising strings of letters are traditionally given acronymous interpretations by the addition of “token vowels” (SCSI = “scuzzy”, VRML = “vurmle”).
Initialisms: these are similar abbreviations that are spelled out letter by letter. In other contexts it's not uncommon for these to be made typographically distinct from acronyms (as e.g. N.S.A. versus Nasa), but in technical writing many words need to stick to canonical forms composed of lowercase alphanumerics. The software world also seems keen on confusing the issue with hybrid acronym/initialisms like jpeg = “jay‐peg” and ldapd = “el‐dap‐dee”. Notice that the stress tends to go on the last syllable in initialisms but the first in acronyms and hybrids.
Affixisms: this category is treated similarly to the hybrids above, but has a distinct origin – the widespread use of a single letter as a distinguishing prefix (or occasionally suffix) attached to an existing term, commonly serving to mark it as having a particular origin. Thus “GNU zip” becomes gzip (= “jee‐zip”) and “new compress” is ncompress (= “en‐compress”).
Numeronyms: an abbreviation mechanism that turns “abbreviation” into a10n and “internationalisation” into i18n. In these cases the standard way of reading them out is always to reexpand them.
Truncations: long words cut down to just the first syllable or two. This approach is especially common for standard directory names such as /sys = “siss”, short for “system”.
Disemvowelments: command names like cp and mv are transparently recognisable shorthand forms, taking the vowel sounds for granted as too trivial to bother recording (a popular approach since Ancient Egypt). These can be read out complete with the implied vowels (“copy”, “move”); the alternative of spelling out the short forms character by character is best reserved for the cases that actually are initialisms.
Contractions: the “lossy compression” version of the above, as in chgrp and fmt, which throw out arbitrary internal consonants as well as vowels. In these cases the omissions seem more significant; instead of being fully restored (giving “change‐group”, “format”, etc.) they can be treated rather like the sort of contractions that use apostrophes (“wouldn't've”), and padded with the bare minimum supply of indistinct token vowels to make the abbreviation speakable: “tch'grup”, “f'mmt”.
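
The numeronym mechanism is the one category above that is entirely mechanical, so it can be sketched in a few lines. This is only an illustration, assuming the usual convention of first letter, count of omitted letters, last letter; the function name numeronym is made up for the purpose.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of omitted letters + last letter.

    Hypothetical helper for illustration; words too short to have any
    interior letters are returned unchanged.
    """
    if len(word) < 3:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("abbreviation"))          # a10n
print(numeronym("internationalisation"))  # i18n
```

Going in the other direction is the hard part, which is why the reading rule is simply to reexpand.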

The only 100 % regular strategy would be to read absolutely everything character by character: unix = “you‐en‐eye‐ex”, and so on. This becomes awkward very quickly, hence the need for normative guidelines on when to use which of the other available approaches. Initialisms also have the rarely considered drawback that there are no ISO‐approved names for the letters of the Latin alphabet – dialectally variable cases include H, J, R, W, and Z, though fortunately almost all of these have a clearly dominant pronunciation that we can declare official. Even where the letter‐names are uncontroversial they sometimes lack standardised spellings: is Q the letter “cue”, or “kew”, or “queue”, or what? The Unicode Consortium has really dropped the ball on this one, with its circular definitions like “U+0051 Q = LATIN CAPITAL LETTER Q”.
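
For what it's worth, the fully regular strategy is trivial to automate, which is part of why it's so unappealing. The sketch below is an assumption‐laden illustration: the function name spell_out is invented, the letter‐name spellings are just the ones used in this document's own examples, and the table is deliberately left partial because there's no standard to fill the gaps from. It joins the names with a plain ASCII hyphen‐minus.

```python
# Letter names as spelled in this document's own examples; the table is
# deliberately partial, since there is no standard spelling to fall back on.
LETTER_NAMES = {
    "d": "dee", "e": "ee", "f": "eff", "i": "eye", "n": "en",
    "p": "pee", "s": "ess", "t": "tee", "u": "you", "x": "ex",
}


def spell_out(term: str, names: dict[str, str] = LETTER_NAMES) -> str:
    """Read a term character by character using the supplied letter names."""
    letters = term.lower()
    # Refuse to guess a name for any character the table does not cover.
    missing = sorted({ch for ch in letters if ch not in names})
    if missing:
        raise ValueError(f"no agreed letter-name for: {', '.join(missing)}")
    return "-".join(names[ch] for ch in letters)


print(spell_out("unix"))  # you-en-eye-ex
```

Asking it for Q raises an error rather than picking between “cue”, “kew”, and “queue”, which seems like the honest behaviour.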

A further complicating factor is the way a software engineering background tends to involve familiarity with the mathematical use of characters from the Greek alphabet such as alpha, mu, and chi. As if things weren't confusing enough, there's also a strong tradition in IT of writing them in ASCIIified forms – udeb instead of “μdeb”, TeX instead of “Teχ”, and so on. Fortunately there is at least general agreement on what these letters' names are and how those names are spelled. While there are two major traditions in pronouncing them, for once the more modern and international scheme is more widely used in the US, so let's adopt that as canonical: β = “bayta”, not “beeta”, τ = “tao”, not “taw”, and so on.

The International Radiotelephony Spelling Alphabet (also unhelpfully misnamed the “Nato phonetic alphabet”) is available to cover a few use‐cases, but falls at the first hurdle by failing to distinguish Latin A from Greek alpha; besides which, if NFS is going to be pronounced “november foxtrot sierra” then we'd have been better off not abbreviating Network File System in the first place.

Guidelines:

The following examples of best current practice are to be adopted on the RFC Internet Standards Track.

#: Twitter's one positive achievement was getting everybody in the world to agree that they were creating hashtags, not octothorpetags or anything else like that, so at last we can declare “hash” the victor. This frees up “number sign” as an unambiguous label for № and “pound sign” for £.
-: this ASCII symbol can stand in for a hyphen, em‐ or en‐dash, or minus sign, but officially according to the Unicode standard each of those names refers to a different specialised multibyte glyph, while this seven‐bit character is named “hyphen‐minus”. Of course, any time it really is functioning as a hyphen it passes unmentioned, as in apt-get = “apt get” and UTF-8 = “you tee eff eight”; where it does get a name it's usually “dash” (not to be confused with dash).
-o: this convention for command options has been around since the Multics era without it ever being properly settled whether that leading character should be read out as “dash” or “minus”. However, for long options --option is canonically “dash‐dash‐option”, and standardising in that direction also makes things like -0 easier to deal with. It's only in the few cases where +o does the opposite of -o that it makes sense to highlight this fact by reading the latter as “minus‐oh”. Note meanwhile that “hyphen‐minus‐oh” is completely out of the running.
.: there's no general agreement whether the punctuation mark is a “period” or a “full stop”, although the Unicode standard sides with the latter. Fortunately the character that appears in abbreviations isn't that punctuation mark anyway: a version string like 2.0.10-1 is neither a sentence nor a decimal fraction, so the only way to read it out is character by character, “two dot zero dot one zero dash one”. (Speakers of Commonwealth English ought to be able to agree not to read 0 as “nought” or “nil” if Americans can refrain from calling it “oh” or “aught”.)
.ext: things would be tidier if we could simply insist on filename extensions being read out character by character like initialisms (“dot ee‐ecks‐tee”), but this rule seems unlikely to be widely adopted without an explicit exemption at least for consonant‐vowel‐consonant extensions such as .doc and .zip.
/: Unicode thinks this is called “solidus”; wrong again, it's “slash” (muddling it with backslash is forgivable if and only if you also confuse your ps and qs). The character is usually left unspoken in terms like TCP/IP or in filesystem paths like /dev/null, but the top‐level “root directory” / is a special case: to avoid confusion with the /root directory it always needs to be read out as “slash”. Users with directories full of fan‐fiction therefore need to be careful how they label them.
acl: commonly pronounced “ackle”, as in getfacl.
adm: adm and admin are both common as abbreviations for either “administrator” or “administration”, so instead of long‐winded and ambiguous reexpanded readings it makes sense to give them distinct truncated pronunciations: mdadm = “em‐dee‐ad'm”, but kadmind = “kay‐admin‐dee”.
arch: almost always a truncation of “architecture” rather than “archive”, but either way the pronunciation is “ark” as in multiarch. One significant exception is that arch linux is pronounced like “archfiend”, not “archangel”.
ascii: as an anglocentric legacy standard, this is entitled to a matching pronunciation, like an old‐fashioned Anglo‐Latin rendering of the plural of “ascius” (“equatorial”): ascii = “assy‐eye”.
bin: nothing to do with the “Recycle Bin”; it's a truncation of “binary”, so for instance binutils is “byne‐you‐tills”. The sbin directory was so named at the dawn of the epoch because it contained “static binaries”, not, as people nowadays often assume, “system binaries”, but either way that's “ess‐byne”.
c#: the spec for this language carefully dictates that its name is written as a pair of ASCII characters but pronounced instead as C sharp, a note with a frequency of 277.18 Hz.
c++: ++ isn't a sequence of addition operators, it's a single postfix‐increment operator, so attaching it to c gives “see‐inc”. However, using an incremented operand in the same expression as an additive modifier gives unspecified results, which means “C and C++” can be pronounced any way you like.
c2go: the numeronym c??go might represent “cargo”, but that expansion would risk confusion with the Rust package manager of the same name, so it's safer to read it as “congo”.
cache: a French loan that rhymes with “panache”. If there was any such thing as “cash memory” then it might make sense to differentiate this with an anglicised spelling‐pronunciation, but then it would end up rhyming with “backache”, which isn't an improvement.
cfg: this and its variants cf, conf, and config occur not only as extensions but as affixes and freestanding words; it can be hard to guess which variant a given piece of software will use or whether it'll stand for “configure”, “configuration”, or (assuming there's any such word) “configurator”. This makes reexpanding them essentially unworkable – especially on Debian‐derived systems, where “config files” need to be kept distinguishable from “conf‐files”. The solution is to leave the simple truncations conf and config as they are while giving “token vowel” pronunciations to the contracted forms (cf = “kuff”, cfg = “kuff'g”).
ch: in commands such as chacl, chroot, chsh the prefix is a lossy abbreviation for “change”. This entitles them to be padded out with token vowels where necessary: “tchackle”, “tch'root”, “tchush”.
char: almost always (as in chardet, charmap, charset) a truncation of “character”. Depending on the accent this either comes out as “car” or “care” – it doesn't matter which as long as it isn't treated as a truncation of “charade” or “charcoal”.
ck: this works much like cfg. The forms ck and chk occur unpredictably as abbreviations of “check”; the more contracted version can be kept distinct from the one that merely elides a vowel and silent consonant, giving cksum = “k'‐sum”, chktex = “check‐TeX”.
ctl: an abbreviation for “control(ler)” that mostly occurs as a suffix; the dropped consonants are left dropped, so ioctl = “eye‐oh‐cuttle”.
dash: a shell seemingly named with the intention of making things like dash -c dot awkward to communicate over the phone. Fortunately where present it tends also to be available under the name sh (a lossy contraction, not an initialism).
dd: it doesn't exactly stand for anything, but it's clearly at least pretending to be an initialism, so it might as well be treated the same way as all its less pathological neighbours like dc, df, and du.
debconf: an example that shows the importance of paying attention to where the stressed syllables were. The Debian package configuration tool debconf is “debconf” while DebConf, the annual Debian Conference, is pronounced “debconf”.
dev: this can occur as a truncation of either “device” or “developer/development”, but either way it's unstressed, which means the vowel is automatically reduced to something between “div” and “duv”; thus for instance udevadm (with u for “userspace”) is approximately “you'd've‐'ad'm”.
dir: a truncation of “directory”. That unstressed first syllable has several variants on both sides of the pond, but to make it reliably pronounceable with the final r included we need to standardise on “dire”, as in chdir = “tch'dire”, dirmngr = “dire munger”.
dracut: the software is named after Dracut, Massachusetts, which is in turn named after an English hamlet that for once spells its name less obscurely than its US namesake: Draycot = “dray‐c't”.
efi: “ee‐fye” for “extensible firmware interface” barely counts as an acronym rather than initialism, and its successor uefi = “you‐ee‐fye” takes it a step further.
eiffel: a programming language invented by a francophone, so it's “eff‐ell”, not “eye‐full”.
etc: a top‐level directory name that comes pre‐abbreviated, but ignore the folklore about /etc standing for “Editable Text Configuration” or “Extended Tool Chest” and /usr for “Unix System Resources” or “Universal Standard Repository” or some such; they were just the “miscellaneous” directory and the “user” directory (originally reserved for non‐system files). etc isn't really even an initialism for “Et Cetera” (that would be “E.C.”), so it makes more sense to use the reexpanded form, just as usr gets reexpanded to “user”.
exec: a truncation of “execute”, so for instance rexec is “ar‐ecks‐ick” and O_CLOEXEC is “o'kloh‐ecks‐ick”.
fs: pronunciations of fsck are very poorly standardised even though the first half is routinely an initialism and the second we've already got a rule for; the canonical answer is “eff‐ess‐uck”.
fuse: “f(ilesystem in) use(rspace)”, so “eff‐yous”.
git: not in fact an abbreviation, and pronounced with a “hard” g as in gif.
gn: instead of the g being silent as in English “gnaw” it's traditional for it to be pronounced as in French or Italian, so gnocchi = “nyawk‐kee”, gnome = “nyohm”, gnupg = “nyoop‐jee”, and so on.
grep: for consistency with gedit, ghex, gparted, gzip, and so on, this is “jee‐rep”.
gui: this is often given an acronymous pronunciation as “gooey”, but it's more consistent to treat it like cli, tui, ui, ux, and so on, which are almost universally initialisms.
h: the dialectal variant “a haitch” is arguably more logical than “an aitch”, but it has to be deprecated so that we can talk about HTTPS URLs without the old “a or an?” problem making a comeback.
id: where this isn't Freudian psychoanalytic jargon it's often mistaken for an initialism, but in origin it's just a truncation of the word “identification”, so the correct pronunciation is “eyed”, as in pid = “pee‐eyed”, uid = “you‐eyed”, and uuid = “double‐you‐eyed”.
if: where this means “interface” it always needs to be spelled out as an initialism to avoid confusion; so for instance ifup is always “eye‐eff‐up”, never “if‐you‐pee”.
init: a truncation for “initialisation”, so /etc/init.d = “et‐cetera‐inish‐dot‐dee”, sysvinit = “siss‐five‐inish”.
ip: unlike if this one doesn't need to take evasive action and can be read out acronymously.
iw: this network utility has inexplicably been given a name that means “of goat‐willows (Salix caprea)” in Polish, and is therefore correctly pronounced “eef”.
k: an especially common one‐letter distinctive prefix (here usually representing “KDE”); thus knotes = “kay‐notes”, kate = “kay‐ate”, konqueror = “kay‐onkeror”, and so on.
k3b: in this case the numeronym “k???b” can only mean “kebab”.
lib: a truncation for “library”, so for instance glibc = “jee‐lybe‐see” and libexec = “lybe‐ecks‐ik”.
libreoffice: the only language this can plausibly be in is French, so “lee‐braw‐feece”.
linux: this postdates the general acceptance of the Teχ convention, which means it would normally be expected to have a final X. Since instead it hearkens back to the older tradition of unix it has always needed to provide a definitive HOWTO.
ln: this is “link” with a significant consonant discarded along with the vowel, so it's in the most radical category of contractions. Reexpanding it would get it mixed up with link, a separate tool for doing the same thing, so instead it needs to be padded out with just a token vowel as “lun”.
logo: a programming language with a name officially taken from the Greek “λόγος” (“logos”, with short vowels) meaning “word”, not from the English “logo” (“low‐go”) meaning “corporate emblem”.
ls: a consonant‐losing contraction of “list”. Making it pronounceable doesn't even require a token vowel, exactly; it'll work with just a syllabic l as in “it'll” plus a final s as in “syllables”. Likewise, lsattr = “'lls‐att'r” and lsinitramfs = “'lls‐inish‐ram‐eff‐ess”.
m4: not the fourth version of anything, and not a multiplier (“mmmm”!); it's officially a numeronym, “m????” = “macro” (which is in turn a truncation of “macroinstruction”, but that would be m15).
mgr: ambiguous in some contexts, but in the IT world it's never short for “monseigneur”, and is fully expanded as in qmgr = “queue manager”.
mit-scheme: German, “mit‐shehmuh”.
mk: common as a prefix, as in mknod, mkswap, mktemp and so on. For strict consistency with the usual guidelines these would be expanded as “make‐node” and so on, but in order to keep them distinct from invocations of /usr/bin/make, the suggested approach is instead to use a token unstressed vowel and standardise on “McNod”, “McSwop”, “McTemp”.
mysql: the prefix “My” here was originally a Scandinavian name (ultimately derived as it happens from the word mu) with a vowel sound that doesn't exist in English, but anglophones usually pronounce it as if it was the unrelated Scandinavian name “Maj”. Meanwhile the pronunciation “sequel” has deeper roots than the initialism; SQL was in fact renamed from SEQUEL to avoid a trademark. If in doubt, switch to mariadb which makes everything much simpler.
nginx: like “minx” except with initial ng as in “haNGiNG” and of course a final X.
opt: occurs as a truncation for both “option(al)” and “optimise(r)/optimisation”, so getopt = “get‐opsh” while ocamlopt = “oh‐camel‐opt”.
passwd: this might have been named pass or pw or something, but since instead it's technically a “lossy” contraction it's legal to pronounce it as “passw'd”, assuming you distinguish that from “password” in the first place.
php: although this was originally an initialism for “Personal Home Page”, since it abandoned that interpretation it's acceptable to read it out simply as “phup” (but not “ph'p”, which can lead to code injection exploits).
pi: just as β is “bayta”, not “beeta”, the Greek letters with names ending in I are all correctly pronounced to rhyme with “ski”, not “sky”: ξ = “ksee”, π = “pee”, φ = “phee”, χ = “khee”, and ψ = “psee”.
plymouth: indirectly named after a UK city with a spelling that matches the way Chaucer would have pronounced it; these days it's “plimmuth”, or probably to the locals “pl'mmph”.
po4a: a slightly unorthodox numeronym, to be read out in its expanded form, “polenta”.
proc: a truncation for “process(or)”, as in nproc = “en‐proce”, procmail = “proce‐male”.
ps: where this stands for PostScript it's just an initialism, so for instance psutils = “pee‐ess‐you‐tills”. On the other hand where it's a lossily contracted form of “process(es)” it can get by with just a token vowel – so for instance pstree = “pus‐tree”.
python3: a Monty Pythonism, traditionally pronounced “throatwobbler mangrove three”.
route: the choice here is between the original French‐style pronunciation as a confusing homophone of “root” and the alternative that makes it a confusing homophone of “rout”. The solution is to go a step further into spelling pronunciation and standardise on “ro‐yute”.
rx: the use of rx for “receive/reception” and tx for “transmit/transmission” is a deviant abbreviation style left over from nineteenth‐century telegraphy (where rx/tx were faster than r./t.), but in speech it's simpler to forget about dots and dashes and pretend they're initialisms.
sata: sata, like atapi and pata, derives from the original ATA standard, which for trademark reasons never stood for anything. There's no consensus on whether they're “at‐uh” or “ay‐tuh”, but at least unlike “data” they don't seem to have a breakaway “ah‐tuh” faction.
socks: network protocols with an s for “secure” almost always treat that as a separate one‐letter affix (imaps = “eye‐map‐ess”, scp = “ess‐copy”, etc.). However, this one is just a truncation of “sockets” with the pluralisation restored, and is therefore pronounced “sox”.
sox: obviously “soX”.
std: in an IT context this is always a lossy contraction of “standard”, so it just needs a token vowel to make it pronounceable: stdin = “st'd‐in”, zstd = “zayta‐st'd”.
su: originally this stood for “SuperUser” – the “Set/Substitute/Switch User/UID” interpretations were later retrofits. Some argue that its core function is to create a new shell, which need not necessarily involve a change of UID, so its name is best seen as short for “subshell”. However, its younger sibling sg is definitely an initialism for “switch‐group” or similar, so for consistency they have to be “ess‐you” and “ess‐jee”.
tab: where it doesn't represent the name of a key (Tab↹), “tab” is a suffix truncated from “table”, so crontab = “krontayb” and fstab = “eff‐ess‐tayb”.
TeX: short for the Greek “τέχνη” (“technē”) meaning “art/skill/craft”, which is why it's pronounced with a χ = “kh” sound rather than an X = “ks”. A major trendsetter, followed by LaTeX, XeTeX, MathJaX, and so on.
tla: a T.L.I. would be “tee‐ell‐eye”, but a three‐letter acronym is correctly referred to as a “tlah”.
tmp: a disemvowelled shorthand for “temp”, which is in turn a truncation of “temporary”; as with cfg, when multiple abbreviated forms are in use we need to give audibly distinguishable pronunciations to cases like tempfile (“temp‐fyle”) and ones like systemd-tmpfiles (with “t'mp‐fyles”).
tr: a very lossy contraction of “translate”, or arguably “transliterate”, but that only makes it clearer that we don't want it reexpanded: just add a dummy vowel and make it “tur”.
u: where this occurs as a prefix it can be hard to guess whether it's u for “uniform”, “universal”, “Unix”, “user(space)”, or something else, but the initial is the same anyway. The tricky cases are the ones where it isn't a prefixed u.
ubuntu: unlike variants such as Kubuntu and Xubuntu, this isn't a word with a distinguishing prefix. It's a Zulu term that might reasonably be rendered (ignoring complexities such as tones) as “ooh‐booh‐ntoo” or “oob‐oont‐oo”, but probably not “you‐bunt‐you”.
umount: again the u isn't a distinguishing prefix, it's short for “un”. Omitting a single consonant while leaving all the vowels is a perverse use of the lossy contraction strategy (especially when there's also an established abbreviation mnt), but to go by the book we ought to pronounce it “uh‐mount”.
unix: this predates the general acceptance of the Teχ convention, so it's just plain “you‐nics”. The pun on “eunuchs” only works in dialects with strong reduction of unstressed vowels.
url: the early‐nineties RFCs defining URIs, URLs, and URNs treated them consistently as initialisms (“a U.R.L.”). If the alternative strategy of treating them as acronyms (“an Url”) was ever going to take over it would have happened decades ago, so it's time for the Society for the Prevention of Cruelty to Dead Horses to step in.
usleep: a case where the initial is a distinguishing prefix but isn't u: it's an ASCII placeholder for µ meaning “micro”. The Ancient Greek letter's name is correctly pronounced “moo” (or pedantically “mü”), not “mew”.
utils: a (repluralised) truncation for “utilities”, so for instance findutils = “fyned‐you‐tills”.
var: a truncation for “variable(s)”, so efivar = “ee‐fye‐vair”.
vi: this isn't an initialism, it's the first two letters of “visual”, and that isn't pronounced “vye‐zhew‐al”, so there's only one defensible pronunciation: “six”.
w3m: “w???m” can only mean “whelm”.
www: a so‐called abbreviation that's much longer than the phrase it's meant to be shortening, unless you go for the Welsh solution and pronounce it “oo‐woo”.
Χ: the Chi windowing system. Remember that χ represents a “kh”‐sound as in “loch”, and the pronunciation of the letter‐name “chi” recognised as correct by students of Ancient Greece isn't the anglicised “kye”, it's “khee”, as in “Loch Freuchie”. Strictly speaking that's post‐classical, but only pedants use the older aspirated‐stop pronunciation “k‑Hee”.
Χ11: this is a numeronym abbreviating an initial Χ and final 1 with a single character in between; since that character was in fact a 1 this expands to Χ11 = “khee‐one‐one”.
xss: an exceptional case, since it's neither xss nor χss: it's ❌︎ss for “cross‐site‐scripting”, so the initialism is “cross‐ess‐ess”.
z: there's no agreement on whether the last letter of the Latin alphabet is “zed” or “zee”, so it's safer to assume Z represents its Greek lookalike Ζ, as in zlib = “zayta‐lybe”, and the compression tool ΧΖ = “khee‐zayta”. It would be unnecessarily pedantic to use the authentic Ancient Greek pronunciation ζ = “zdeh‐ta”.
zsh: commandline shells tend to have pronounceable acronymous names (“bash”, “dash”, “fish”, etc.), so instead of spelling out an exception like this one as if it stood for “Zymotechnic Session Handler” or something, it's more consistent to say csh = “cush”, ssh = “sush”, zsh = “zush”.
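
The numeronym readings given above for c2go, k3b, m4, po4a, and w3m all rest on the same rule: each run of digits stands for that many omitted letters. A rough sketch of how a candidate expansion could be checked against that rule follows. The function names numeronym_pattern and expansions are invented for illustration, the candidate word lists are supplied by the caller, and the sketch assumes the omitted characters are plain lowercase a to z.

```python
import re


def numeronym_pattern(abbrev: str) -> re.Pattern:
    """Turn a numeronym such as 'k3b' into a regex such as 'k[a-z]{3}b'.

    Hypothetical helper: digit runs become 'that many lowercase letters',
    everything else is matched literally.
    """
    # Splitting on a capturing group puts the digit runs at the odd indices.
    parts = re.split(r"(\d+)", abbrev.lower())
    regex = "".join(
        f"[a-z]{{{int(part)}}}" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(regex)


def expansions(abbrev: str, candidates: list[str]) -> list[str]:
    """Return the candidate words that the numeronym could stand for."""
    pattern = numeronym_pattern(abbrev)
    return [word for word in candidates if pattern.fullmatch(word.lower())]


print(expansions("c2go", ["cargo", "congo", "cago"]))  # ['cargo', 'congo']
print(expansions("po4a", ["polenta", "pasta"]))        # ['polenta']
```

As the c2go case shows, the pattern only narrows the field; choosing between the survivors still takes a normative guideline.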