ENGLISH FOR SOFTWARE LOCALISATION

2010–2024 Justin B Rye

SECTION A – FOREWORD

Welcome to a reference collection of tips from my documentation reviews on the debian-l10n-english mailing list, now slightly updated (and I suspect there may be some additions on their way eventually).

A1: introduction

These notes could go on the Debian Wiki, if it wasn't for the fact that typing a paragraph or two of text into my web browser is enough to remind me that editing text is easier in a text editor.  Besides, I don't want to have to defend my notes against well‐intentioned sabotage by people who half‐remember some piece of mumbo‐jumbo handed down to them by their English teacher; this may be a prescriptive style guide, but it's one primarily designed to help people write the way competent native speakers really do in the twenty‐first century.  The idea is that next time I'm reviewing something claiming to be “an unix software that allows to run an own irc‐based proxy” I'll just be able to point at prefabricated summaries of what's wrong with it.  (Yes, that's how I'm highlighting “bad example” usages.)

A2: contents

SECTION A – FOREWORD
1: introduction | 2: contents | 3: folklore
SECTION B – VOCABULARY
1: disallowed! | 2: false friends | 3: ambiguities | 4: odds and ends
SECTION C – GRAMMAR
1: relativisation | 2: definiteness | 3: tenses | 4: plurals | 5: modifiers
SECTION D – STYLE
1: dialect | 2: colloquialisms | 3: formalisms | 4: miscellaneous
SECTION E – ORTHOGRAPHY
1: spelling | 2: case | 3: hyphens | 4: flyspecks | 5: listings | 6: leftovers
SECTION F – CONTENT
1: general | 2: debconf | 3: extended descriptions | 4: synopses
SECTION G – AFTERWORD

A3: folklore

First I'd better get out of the way some “grammar folklore” rules with no particular basis in linguistic reality.  They have never been real features of the grammar of English as used by even the most universally admired writers – they're delusions propagated by people who want to be able to look down on all the members of the general public who fail to obey their imaginary rules.  We might nonetheless choose to abide by these taboos just to avoid the arguments.

“Restrictive Which”
English‐speakers may introduce “restrictive” relative clauses (see C1) with either “which” or “that”.  The myth says the only grammatical variety is the one in which the introductory word is “that”.
“Sentence‐Final Prepositions”
Not only are these completely grammatical, sometimes they're compulsory – here's one for your school English teacher to think about.
“Sentence‐Initial Conjunctions”
People claim you can't use words like “and” or “but” to start a sentence.  But why shouldn't we?  It seems to have no trace of a rationale.
“Singular They”
This has been idiomatic since Middle English, and is the only natural way of saying something like “I suppose either Alice or Bob must have lost their key”.
“Split Infinitives”
Some constructions seem to more than suggest that allowing modifiers between “to” and its verb can be unavoidable.

Some prescriptivists insist on usages like “none of you knows whom he would choose if he were I”, even though they're long extinct in most brands of natively spoken English.  Following their advice is a good way of making yourself sound as if you were brought up on the Lost Island of Snooty Robots.


SECTION B – VOCABULARY

English offers plenty of opportunities for picking the wrong word.  Sometimes it even seems to be systematic about it; for instance, it often presents a three‐way choice between “‑ing” noun, plain noun, or “‑ation” noun, all of them more or less synonyms (“some counting”, “a count”, “a computation”).  The “‑ing” words can be tricky to fit into a sentence, since they keep some of their old verbal habits, while “‑ation” words tend to be fancy and abstract.

B1: disallowed!

This one crops up so often I'm putting it right at the top.

You can't “allow to” do something (as in “this option allows to compile code”).  You can say that “this option allows you to compile code”, or “this option allows code compilation”, or even “this option allows code to be compiled”; but if there's no direct object noun phrase immediately after the verb, it's almost certainly ungrammatical (and the same goes for “permit to”).  Native‐anglophone readers will know what you mean, but they'll also suspect you've got a funny accent.

Besides, unless the software is something like PAM, how likely is it that it literally “allows” me to do something otherwise forbidden?  It enables or simplifies doing things, or helps me do them, or simply does them.
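
If you wanted to catch this one mechanically, something along the following lines might serve as a first pass.  It's only a minimal sketch in Python, not a feature of lintian or any other existing tool; the name find_bare_allow is made up for illustration, and it deliberately skips “allowed to” and “permitted to” on the assumption that those are usually legitimate passives (“users are allowed to log in”).

```python
import re

# Hypothetical helper, not part of lintian or any other real tool.
# Flags "allow"/"permit" followed directly by "to", with no object in between.
# Past participles are left out on purpose: "is allowed to" is a normal passive.
BARE_ALLOW = re.compile(
    r"\b(?:allow(?:s|ing)?|permit(?:s|ting)?)\s+to\b",
    re.IGNORECASE,
)


def find_bare_allow(text):
    """Return every 'allows to'-style phrase found in text."""
    return [m.group(0) for m in BARE_ALLOW.finditer(text)]


print(find_bare_allow("This option allows to compile code."))      # ['allows to']
print(find_bare_allow("This option allows you to compile code."))  # []
```

A hit only means the phrase deserves a second look; deciding whether the fix is “allows you to”, “enables”, or plain “does” is still a job for a human.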

B2: false friends

Well known cases where the English word doesn't mean what speakers of most European languages expect.

beware of…     when you mean…
actual         current
arrive         succeed
conscience     consciousness
consequent     consistent
demand         request
especially     specifically
eventual       random/possible/hypothetical
experiment     experience
few            several/a few
funny          fun
mention        give/specify
pretend        claim
relative       relevant
respective     corresponding/appropriate
sensible       sensitive

B3: ambiguities

Each of the following words has more than one well established idiomatic meaning, so you need to be aware of the possible misinterpretations.

“Archive”
Package repositories are “archives”, but so are individual .deb files (they're ar archives).  There are quite a few technical labels for subdivisions of the Debian archives, including “area”, “distribution”, “component”, “release”, and “section”.  Most of them present opportunities for confusion.
“Binary”
If you mean to include Perl utilities and exclude JPEGs, “executables” is clearer.  If instead you're talking about Debian “binary packages”, those are officially so called regardless of whether they contain binary data or ASCII text; even the ones providing kernel source‐code count as “binary” rather than “source” packages.  Both kinds are accessed via the kind of “sources” listed in /etc/apt/sources.list.
“Console”
This is commonly used as the opposite of “graphical”, but also more narrowly as “run in a VT login” (like startx).  And then there are “console games”…
“Database”
The .odb file?  The collection of abstract tables?  The package?  The RDBMS executable?  The process?  The information it stores?  Even “MySQL server” can be either software or hardware.
“Desktop”
In software terms, either the virtual workspace presented by a Desktop Environment or the suite of programs used to implement this; in hardware terms, anything from one specific type of workstation (“desktop versus tower form factors”) to a general term for non‐smartphones (“desktop versus mobile apps”).
“Directory”
A folder in my file system or an LDAP‐style database?  (Oh, and is a “file system” the storage volume presented as a directory hierarchy under some mount point, like /home, or is it the storage format, like NFS?  But somehow this one never seems to cause trouble.)
“Email”
Is “an email” an address or a message?  (Compare “the IP” – an address or a piece of Intellectual Property?)  There's some disagreement over the spelling, but increasingly many authorities recommend the unhyphenated version, so I've retrained my own fingers to agree with them.
“Online”
Is the “online documentation” for an “online game” stored on my /usr partition or their wiki?  This confusion is traditional, but not all traditions are worth preserving.
“Orphan”
A package without a maintainer (as per Debian Policy) or a stray installed package with no reverse dependencies (see deborphan(1))?  Sometimes that second type is labelled as “obsolete” packages, but that's the word used by APT (for instance in apt-patterns(7)) to mark installed packages with no current version in the archives.
“Root”
All sorts of things get called “root”, from directories to servers to windows (and things are even worse for those of us who pronounce “route” as a homophone).  Always make it clear whether you're talking about the administrative login for my addressbook database or whether you mean the system superuser.
“System”
This usually means the OS or one of its subcomponents rather than, say, the oppressive patriarchal capitalist system, but it can be hard to guess exactly which bit, so remember to include a hint.  It's increasingly used in place of “individual computer” in circumstances where that might not mean physical hardware.

B4: odds and ends

Abbreviations
It's easy to let abbreviations from your native language (“p. ex.”, for example) slip through untranslated.  “Resp.” (or, worse, “BZW”) is a particular giveaway: English doesn't have a generally recognised abbreviation for “respectively”, because we hardly ever use the word.  Most of the time the best idiomatic translation is either “or” or nothing.
Come to that, even abbreviations that do occur in English may be worth avoiding for stylistic reasons.  Replacing the Latinisms “i.e.” and “e.g.” with equivalent English phrases (“that is”, “such as”) can make a text seem subtly less technical, and eliminates the danger of confusing them.
Based
The word “‐based” is often unnecessary padding.  An “Ajax‐based” app is the same as an Ajax app, a “network‐based” connection is a network connection, a “Qt‐based” GUI is a Qt GUI, and so on.
Logins
Is it “to login to my PC”, “to log in to my PC”, or “to log into my PC”?  Well, the noun is one word, a “login”; but for the verb, since you can “log yourself in” it must be two words (the same rule applies for “backup”, “breakdown”, “checkout”, “logout”, “lookup”, “setup”, and “shutdown”).  Then the “in to” isn't the kind that means “into”; it's just a coincidental sequence of “in” and “to” (compare “giving in to temptation”), so the form I'd recommend is “log in to”.
Management
Although admins spend their time maintaining their systems using APT while developers are managing software releases, it's the former activity that's known as “package management” while the latter is “package maintenance”!
Wares
All the ‐ware words are uncountable; that is, there's no such thing as “a firmware” or “several hardwares”.  Instead it's treated as a material – “some glassware”, “a piece of malware”.  Much of the time if you've written “softwares” the word you were looking for was “programs” or “applications”.  While I'm on the subject, notice that software is installed on a computer, but hardware is installed in a computer.

SECTION C – GRAMMAR

By which I mean an obviously incomplete survey of syntax, morphology, and so on.  If you're looking for apostrophe‐pedantry, it's filed under Orthography.

C1: relativisation

English has four basic types of relative clause.

  1. Ones like this, which you construct using “which” (or “who”, “whereby”, or some other “WH‐word”) preceded by a comma.  These are “descriptive” relative clauses, and only add supplementary, parenthetical information; Germans should be careful not to confuse them with the following.
  2. Ones which are constructed using a “WH‐word” without a comma.  These are known as “restrictive” relative clauses, on the grounds that they define an identifying characteristic of the entity in question.  The main problem with them is the fanatical which‐hunters who want to have them declared ungrammatical.
  3. Ones that you construct using “that”.  These are another brand of restrictive relative phrase; they have the advantage of not waving a red flag at the pedants, but then again, using “that” rather than “who” with a human referent tends to sound a bit stilted to many native speakers (including me).
  4. Ones you construct using no such word.  A third way of forming restrictive relative clauses – lightweight, but often hard to follow.

If in doubt, don't overlook the option of cutting it into two or more separate sentences.

C2: definiteness

Definite vs. indefinite vs. nothing is far too complicated to explain here beyond the rule of thumb that the definite article “the” is for when both writer and reader can identify the thing being referred to.

The question of whether it's “the file FOO” or “the FOO file” is a similar issue of information management, since the answer is that it's either, depending mainly on what's news and what's background knowledge:

Non‐native speakers also tend to have trouble guessing whether to refer back to a previously mentioned idea with “this” or “that”.  This can be really difficult (that was an example).

On the other hand, the rule for whether the indefinite article is “a” or “an” is clear‐cut as long as you ignore the spellings – what you need to know is how the following word is pronounced, and whether it begins with a consonant sound (“a laptop”, “a one‐off”, “a USB device”) or a vowel sound (“an option”, “an hour”, “an xterm”).  Unfortunately, there are a few debatable cases, since some of the things we may need to refer to don't have established consensus pronunciations.  Is it “a FAI server” or “an FAI server”?  How about “fsck” or “SASL” or “URL”?  Sometimes people even alternate between saying “an /etc/hosts file” (pronouncing it “etceterahosts”) and “a /etc subdirectory” (“slash‐ee‐tee‐cee”).

C3: tenses

There's no room here for a full explanation of the rules of the English tense system (besides, if I said that technically it has a grand total of two tenses I would only confuse people…) but here are some hints for the bits I see causing trouble most often.

Watch out for the subtle distinction between the simple past tense, which marks things as over and done with, and the perfect construction with “have”, which marks them as having continued relevance (not quite the same thing as being recent).  There are slight differences in usage between dialects, but basically, a warning message saying “FOO was broken” suggests that it is now fixed; a warning message saying “FOO has been broken” implies the opposite.

English has a system of “sequence of tenses”, where past tense marking on a main clause spills over onto subclauses: “I said my name was Sam”.  This can even happen when the tense mark on the main clause doesn't really indicate past time: “I could stop tomorrow if I wanted”.

The “used to” construction, as in “I always used to get this wrong”, has the annoying quirk of lacking any present tense equivalent – it would be logical if you could carry on with “…and in fact even now I'm still using to get it wrong”, but alas, natural languages don't run on logic!  What's more, even when you get the grammar right, the “used to” construction can easily lead to confusion.  For example, “the software used to do this” would be fine in speech (since the past‐habitual marker is pronounced “yoost” instead of “yoozd”), but it can be ambiguous when written down.

Some dialects have complex rules for when you should use “shall” rather than “will”, but not mine – grep tells me I only use “shall” when I'm quoting something that includes the word “shall”.

C4: plurals

All of the following things are singular in English (or at least, it would be grammatically correct to follow them with “is here” rather than “are here”):

Non‐count nouns also take singular agreement: “all software is fallible”, and so is “mathematics”, or (this side of 1950) “data”.  On the other hand, “a lot of people are here”, while “Alice and/or Bob” can go either way.

Although “each politician” takes singular agreement, and faces are ordinarily distributed on a one‐per‐person basis, it's entirely non‐satirical to say “the politicians showed their faces”.  “Their face” would imply it was shared.

Nouns that modify other nouns are usually unpluralisable (just like the adjectives they resemble), so a collection of managers of windows is a “window manager collection”, not a “windows managers collection”.  But then again, a conference of managers of events is quite likely to be an “events managers conference”, and I can't offer any sort of rationale for these exceptions.

C5: modifiers

Most adjectives can occur either before the thing they describe or after a linking verb (“lonely Jim is lonely”); a few can't (“the lone ranger is alone” vs. “the alone ranger is lone”).  The word “own” may resemble an adjective, but it isn't allowed to appear in either position without the support of a possessive word.  “Its own name” is fine, but “an own name” has to become “a name of its own”.

Nouns “used as adjectives” don't behave exactly like natural adjectives.  They pile up immediately before their head noun, never mixing in with the adjectives to participate in phrases like “a simple, shell, useful script”.

The “dangling modifier” is another of those deprecated constructions that native speakers get away with all the time: “after reinstalling my PC the bug got worse!”  Interpreted pedantically, this sentence claims that the bug performed the reinstallation…


SECTION D – STYLE

Matters of style are essentially arguable, but if you don't want my advice, you don't have to ask for it.

D1: dialect

When people say something is “bad grammar”, what they often mean is that it obeys the grammatical rules of the wrong dialect, which is a stylistic issue.  The real reason for avoiding slangy or dialectal usages isn't that they're inherently bad, it's that they're less universally understood, especially by readers who are themselves non‐native speakers.

As you may have noticed, even though this page is itself written in my usual British‐English HTML style, the variety of English it recommends for debconf templates is the one that goes with an en_US locale.  Other Debian subprojects use en_GB, or have no standard – and even in package description reviews we're often better off letting people follow whatever standard they know best rather than forcing them to adopt one they're uncomfortable with.

Educated American English isn't completely homogeneous anyway; and where there's variation we need to avoid confusing or annoying speakers of either variety.  Take for example the unpleasantly ambiguous phrase “in case”.  For some anglophones, the instruction “unplug your PC immediately just in case of a short circuit” means “conditionally, if and when a short circuit occurs, unplug your PC”; for others (including me) it means “unconditionally, to avert a short circuit, unplug your PC now”.

D2: colloquialisms

Using an informal register has the advantage that it can give a friendly impression; but there's also a risk that this chumminess may be unwelcome in a context where your readers just want you to get on with conveying information concisely and coherently.  Spoken English tends to leave more things implicit, since a real‐world context normally makes what you mean instantly apparent.  A classic example of a usage that's frowned upon in formal writing but taken for granted in conversation is the ambiguous use of “like”: does “options like FOO” mean “those options that resemble FOO, possibly excluding FOO itself, and certainly excluding options unlike FOO”?  Or does it mean “any arbitrary option, such as FOO”?

Colloquial English often uses sequences of independent clauses, you just splice them one after another with nothing to signpost how they fit together, they're called run‐on sentences, like this, see?  Constructions like that are deprecated in writing, but often all that's needed to fix them up is a few commas promoted to semicolons.

Addressing the audience directly with second person (“you/your”) has advantages and disadvantages – it can make life harder for translators – but first person (“I/me/mine/we/us/our”) in documentation is generally a bad idea; it's not only informal, it's also confusing.  Is the speaker the upstream author, some random NMUer, or an animated paperclip?

D3: formalisms

An excessively formal register should also be avoided.  Convoluted uses of balanced antitheses within multi‐line relative clauses within hypothetical conditionals can be a very concise way of saying something, but they force readers to do extra work to “unpack” it.  Even when your display of syntactic knotwork is technically perfect, if it bores everybody into skipping that paragraph you might as well not have written it.

Long, elaborate sentence structures can increase the risk of scoping ambiguities: “One should not fail to avoid making a foolish error and leave the button unpressed”.  On the other hand, once you start breaking everything up into bite‐size chunks there's the danger you'll introduce referential ambiguities: “There's a button above the off switch.  The off switch should be recognisable because it's red.  Press it.”

The impersonal pronoun “one” (as in “using this emulator one can play arcade games”) almost always strikes me as hopelessly formal; either replace it with generic “you” or rephrase the whole thing.  Similarly, over‐reliance on passive verbs (“a test‐tube was heated…”) is generally unpopular.  Contrary to its bad reputation, the passive voice sometimes provides the most natural and direct way of continuing a sentence (“walking in the door, I was greeted by my friend Pat, so I went over…”); but that's no excuse for saying “please note that it is important that the button should immediately be pressed” when you mean “press it!”

Revising a sentence to introduce or eliminate a passive construction is an opportunity for syntactic problems to creep in and leave you with your pronouns pointing at the wrong things:

[ORIGINAL] “Once FOO has installed BAR, it should be removed.”
[HALF‐EDITED] “Once BAR has been installed, it should be removed.”
[FIXED] “Once BAR has been installed, FOO should be removed.”

D4: miscellaneous

NOTE that tags saying “NOTE” are a bad sign.  Documentation is entirely constructed out of strings of notable points, tacked together into (preferably) coherent paragraphs.  If you need to sprinkle it with labels saying READ THIS PART, that probably means it's a bit of a mess.

Gender‐neutralising by explicitly saying “he or she” is often clunky (though not as ugly as telling half the human race they don't count as people).  If you want to avoid breaking the taboo against “they” with a singular, there are some alternatives that avoid the issue:

Avoid unnecessary redundancy and repetition.  Even if it makes sense to refer to the same thing several times, it's considered poor style in English to use the same word repeatedly unless it's deliberate emphasis.  This rule can cause a lot of trouble if you're trying to describe how users usually used to use useful userspace usage‐monitors…


SECTION E – ORTHOGRAPHY

This is the field where I'm most likely to be bossy, since languages and writing systems are two different kinds of thing.  Once there's a community of mother‐tongue English‐speakers who have grown up talking about “less items”, complaints from people who say “fewer items” are pointless – it's one of the ways English is spoken, so it gets to be listed in dictionaries.  But orthographies are artificial rule‐systems propagated via schools, and have no native speakers.  If you spell it as “fiewer itoms” then you're just failing to comply with the standard.

E1: spelling

If you run lintian with all the optional bells and whistles turned on it has checks for quite a few common typos.

Yes, I'm an en_GB‐er myself, but US spellings strike me as a clear improvement in the vast majority of cases.  The best known difference is that en_US expands “i18n” as “internationalization”, while en_GB mostly uses “internationalisation”.  However, the OED prefers “‑ize” (as did “The Times” when I was young), and there are a few words that are “‑ise” in both systems, including “advertise”, “compromise”, “exercise”, “promise”, “revise”, “supervise”, and “surprise”.
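
The always‐“‑ise” list lends itself to a simple automated check.  What follows is a rough Python sketch rather than anything a real spell‐checker does: the function name find_bad_ize is invented, the stems are just the seven words listed above, and the handful of endings it recognises is my own assumption.

```python
import re

# Stems come from the list of words that are "-ise" in both en_GB and en_US.
# The inflections handled (-e, -es, -ed, -ing) are an assumption for this sketch.
ALWAYS_ISE = re.compile(
    r"\b(?:advert|comprom|exerc|prom|rev|superv|surpr)iz(?:e|es|ed|ing)\b",
    re.IGNORECASE,
)


def find_bad_ize(text):
    """Return words spelled with "-ize" that should be "-ise" everywhere."""
    return [m.group(0) for m in ALWAYS_ISE.finditer(text)]


print(find_bad_ize("We advertize the release and supervize the build."))
print(find_bad_ize("Please internationalize it; that would be a surprise."))
```

The first call reports “advertize” and “supervize”; the second reports nothing, since “internationalize” is acceptable en_US and “surprise” is already spelled correctly.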

Other major categories of divergence:

GB           US          Notes
centre       center      (but always ogre, auger)
colour       color       (but always glamour, error)
dialogue     dialog      (but always fugue, Prolog)
mediaeval    medieval    (but always aerial, era)
travelling   traveling   (but always felling, feeling)

The un‐American spelling “programme” still exists, as a British word for TV shows and the like, but these days the computer variety is always “program”.

“Disc/disk” is a strange one: it started as a regional spelling variation but has taken root as a technical distinction between the Compact Discs and other optical media standardised by European audio companies and the hard or floppy disks standardised by the US computer industry.

E2: case

Package synopses are rather like titles, but that doesn't mean they take Lots of Uppercase Letters; the Developer's Reference recommendation is not to capitalise them.  This doesn't mean that you should write “gNU”, though!  We have to distinguish situational capitalisation, imposed by context, from lexical capitalisation, which is part of the spelling of a word.  A normal word can vary from all‐lowercase to first‐letter‐uppercase to all‐uppercase depending on factors like whether it's at the start of a sentence or whether it's in a newspaper headline.  But words like “GNU” or “Linus” or “English” involve letters that are inherently uppercase, written that way regardless of context.

Words with intrinsically lowercase characters are rare outside the world of science and technology (where it can mean the difference between “millitesla” and “megaton”).  But in IT, strings such as “/usr/bin/perl” or “itsupport@example.org” often have to be invoked precisely verbatim, and even strings like “https” or “usb” may need to be entered in a configuration file in lowercase.  The same logic is often applied to package names such as “awk” or “gnome”, which may be left uncapitalised at the start of a sentence in documentation – after all, “apt show GNOME” won't find anything.  Rather than insist on a stylistic policy for this issue that requires people to agree on some particular obscure analysis, it's safest to advise keeping package names out of sentence‐initial position where possible.

Upstream software project “brand names” are a different matter, and are upstream's decision.  If they call it “FOObar” or “FooBar” we should respect the capitals, but if their website calls it “the foobar project” it's not clear whether they're leaving it unmarked or declaring it uncapitalisable.  Incidentally, does anybody have any idea under what circumstances it's appropriate for Debian documentation to label brand names as registered trademarks?  My own suspicion is that there's never any serious reason for us to put such labels on anything; if we were going to get sued for not saying Microsoft® Windows® it would have happened decades ago.

One context where I'm happy to see what looks like titlecase in a package synopsis is for things like cups, where including the expansion as “Common UNIX Printing System” makes it easier to see at a glance that it's doing double duty as an explanation for the name as well as a description.

E3: hyphens

Compounds like “front end” tend to become “front‐end” and then “frontend” as the term gets used more.  Programmers are often early adopters of new jargon, so there's an unfortunate tendency for documentation to be written in a style that's unfamiliar and offputting for the readers who need it most.  Feel free to talk in your private shorthand on the development mailing list, but try to stick to the more newbie‐friendly forms (“file system”, “web server”) when you're addressing the wider public.

I know of a couple of gotchas: being “online” isn't the same as being “on line”, “plaintext” is not the same thing as “plain text”, a “username” is not the same as a “user name”, and “userspace” isn't “user space”.  You'd think the hyphenated versions would make good compromise candidates, but that rarely seems to work… instead my own rule of thumb is: if Wikipedia still treats it as two words, that's what the average reader probably expects.

Structurally complex noun phrases tend to acquire hyphenation not because they're becoming single words but just to make it easier to distinguish (e.g.) a “real‐time machine‐translation system” from a “real time‐machine translation‐system”.

Extra hyphens also occur with phrasal modifiers like “an easy‐to‐use application”, but here they serve to mark the whole thing as a unit; the hyphens aren't needed when the same phrase appears after a linking verb (“it's easy to use”).  You might think the same applies to multi‐word modifiers made up of adverb plus participle, as in “an easily used application”, but since these are never structurally ambiguous a hyphen is considered redundant.

E4: flyspecks

(A cover‐term for backticks, apostrophes, and opening or closing single or double quotation marks.)

The rules for apostrophe use are an obstacle course of arbitrary complexities, where errors are usually spell‐checker‐proof (and the real joke is that they almost never cause ambiguity – we could get along happily with no apostrophes anywhere).  English possessive apostrophes are particularly shambolic.

There's some debate about the use of apostrophes on inflected forms of numbers, acronyms, and so on (“GUI's”, “GPL'ed”, “1990's”).  Most style guides recommend leaving them out (“one OS, many OSs”), but this advice isn't widely followed.

The “logical” style of quotation mark placement, where punctuation is kept outside the bracketing quotes unless it's part of the original text, is prohibited by many US style guides… so let's ignore them in favour of the Jargon File.

And then there's the question of single vs. double quotation marks vs. fancy Unicode ones.  I personally prefer to stick to ASCII in contexts where users are likely to want to do command‐line searches or use copy‐and‐paste.  I also use the “"” character by default, reserving the “'” character for use as an apostrophe or second‐level quotation mark.  Although that's what I learned at school, people tell me it's the American style; and by happy coincidence it's also the style preferred on d‑l‑e, but as long as a given text is consistent I won't object particularly.  (Well… not unless you're using ``TeX'' quotation marks, that is.  Please don't; I'm sure they would get typeset into something beautiful if only they were being post‐processed by LaTeX, but sitting there in my terminal emulator they'll just look rubbish.)

Some writers use single quotation marks not to indicate quotations but as an ASCII workaround for tagging verbatim strings – the sort that I'm HTMLising here in a nonproportional font.  Thus for instance they might say that 'remake' is yet another "simple" replacement for 'make'; this is all very well, but trying to apply it consistently would often make text look too fussy.

E5: listings

Lists where some of the items are themselves slightly complex often benefit from being rephrased (and in particular re‐ordered) for clarity.  For instance, “it supports FOO, BAR, and BAZ with QUUX or QUUX2” is ambiguous in a way that “it supports BAZ with QUUX or QUUX2, plus FOO and BAR” is not.  Another tactic is to upgrade the separators between list items from commas to semicolons:

[UNCLEAR] “spam, bacon and eggs, and spam, eggs, bacon, and spam”
[CLEARER] “spam; bacon and eggs; and spam, eggs, bacon, and spam”

Where a list is organised by bullet points, d‑l‑e has developed a sort of house style.

    It features:
     * leading single‐indented asterisk (or maybe dash);
     * semicolon at the end of each item;
     * final period (full stop).
   

However, a simpler approach, less integrated into the surrounding text, is still okay by me as long as it's self‐evident what it's a list of.

     * Independent items
     * Asterisks
     * Capitalisation
     * No other punctuation (or not much)
   

Lists read more smoothly if items are kept structurally “parallel” – usually all adjective phrases, all noun phrases, or all verb phrases, not a mixture.

    Avoid writing them like this.
     o   broken parallelisms!
     o   insufficiently similar;
     o   Don't go together very well
   

Mind you, if it's only two or three bullet points it might work better as a plain old sentence; lists with sublists are particularly worth flattening.  And although it's important to make it clear whether the list is exhaustive, it's easy to overdo it – there's no need to say “some of its features, for example, include (but are not limited to) FOO, BAR, and BAZ, among many others”!
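
For anyone generating descriptions from a script, the d‑l‑e house style described above (asterisk bullets, a semicolon after each item, and a final period) is easy enough to produce automatically.  Here is a minimal Python sketch; the function name dle_list and the single‐space indent are my own choices for illustration, not an established convention.

```python
def dle_list(intro, items):
    """Format items as a bullet list: semicolons between items, period at the end."""
    if not items:
        raise ValueError("need at least one item")
    # The introductory line ends with a colon, added here if it is missing.
    lines = [intro.rstrip(":") + ":"]
    for item in items[:-1]:
        lines.append(f" * {item};")
    # The last item closes the sentence that the introduction started.
    lines.append(f" * {items[-1]}.")
    return "\n".join(lines)


print(dle_list("It features", [
    "leading asterisk",
    "semicolon at the end of each item",
    "final period",
]))
```

Note that this does nothing to keep the items structurally parallel or to decide whether a list is the right format in the first place; those are still judgement calls.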

E6: leftovers

Ampersands:
Using “&” within text is considered informal (though for some reason it's okay for “Baz & Quux, Solicitors”).  Slash as a shorthand for “or” is often worth avoiding too, since it's easily misinterpreted (compare “text/html”, “TCP/IP”, and “CVSROOT/config”).
Commas:
Commas present different difficulties depending on where you acquired your punctuation skills.  Europeans should beware of excess commas changing the meaning of their relative clauses; native anglophones should bear in mind that splicing sentences together with just commas is very informal.
There's a weak consensus in style guides that lists like “FOO, BAR and BAZ” usually need an extra (“serial”) comma: “FOO, BAR, and BAZ”.
Digits:
Lowish integers should usually be written out, while the rest follow LC_NUMERIC=en (999.999 is almost a thousand, and 999,999 is almost a million).  The European tradition of interpreting “billion” as “tera‑” rather than “giga‑” (and so on) is almost extinct in the UK, but meanwhile we've got “tebibytes” to worry about.
Ellipses:
The use of “(FOO, BAR, ...)” to indicate an open‐ended list may be standard C syntax, but it isn't common in English prose; use an “etc.” instead of an ellipsis.
Emphasis:
The accepted ugly ASCII stand‐in for emphatic mark‐up is to tag text as *bold*, _underlined_, or (rarely) /italic/.  Keep it to a bare minimum, though – excessive emphasis is REALLY ANNOYING.
Exclamation marks:
These can occasionally be justifiable, but see above on emphasis!!1!
Question marks:
The use of interrogative forms in debconf prompts is tightly regulated: you're only allowed a question mark if it's Type: boolean.  When you need to turn “is the Pope Catholic?” into something that technically isn't a question, the easiest approach is to transform it into “please specify whether the Pope is Catholic.”
Spaces:
There should be no space before “:”, “?”, or “!”.  Between sentences we're standardising on one space rather than two, which isn't what I'm used to, but for a start it's more resilient against HTMLification.
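
To make the Digits entry above a bit more concrete, here is a small Python sketch.  The cut-off of ten for “lowish” is purely my assumption for the example (the entry deliberately doesn't give a number), and render_integer is a made-up name.

```python
# Words for the integers we are willing to spell out (assumed cut-off: below ten).
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]


def render_integer(n, spell_below=10):
    """Spell out lowish integers; give the rest en-style thousands separators."""
    if 0 <= n < min(spell_below, len(WORDS)):
        return WORDS[n]
    return f"{n:,}"  # comma groups thousands; "." would be the decimal point


# The two readings of "billion", and the two kinds of "tera-ish" byte count.
SHORT_SCALE_BILLION = 10**9   # "giga-": the usual modern English sense
LONG_SCALE_BILLION = 10**12   # "tera-": the older European sense
TERABYTE = 10**12             # SI prefix
TEBIBYTE = 2**40              # IEC binary prefix

print(render_integer(3), render_integer(999999))
print(TEBIBYTE - TERABYTE)  # how far apart the two byte units are
```

So three comes out as “three” and 999999 as “999,999”, while a tebibyte turns out to be nearly ten percent bigger than a terabyte.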

SECTION F – CONTENT

This is arguably outside the remit of a localisation mailing list, but while we're reviewing a piece of documentation it makes sense to do some fact‐checking and general editing.

F1: general

The setting determines where the dividing line is between things being technical jargon and general knowledge.  TLAs usually ought to be expanded or explained the first time they're used – and if they aren't used more than once, why waste time introducing the abbreviation in the first place?  But that doesn't mean you need to interrupt your DIY Integrated Circuits HOWTO to explain what a “P.C.” is.

F2: debconf

See the Debconf Spec and the existing Templates Style Guide (now part of the Developer's Reference).

Debconf dialogues should almost never need to mention debconf, or even “the installer”; these are technical implementation issues that should be transparent to the user.  Besides, mentioning installation in the middle of an upgrade or dpkg-reconfigure run is just confusing.
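
A couple of these debconf points are mechanical enough to check automatically.  The following Python sketch is only an illustration: lint_prompt and its warning strings are my own inventions rather than part of any existing tool.  It combines the question-mark rule from E6 with the advice above about not mentioning debconf or the installer.

```python
import re


def lint_prompt(template_type, text):
    """Return a list of style warnings for one debconf prompt."""
    warnings = []
    # Question marks are only allowed in "Type: boolean" templates.
    if "?" in text and template_type != "boolean":
        warnings.append("question mark outside a boolean template")
    # Implementation details should stay invisible to the user.
    if re.search(r"\bdebconf\b|\bthe installer\b", text, re.IGNORECASE):
        warnings.append("mentions debconf or the installer")
    return warnings


print(lint_prompt("boolean", "Should the server be reconfigured?"))
print(lint_prompt("string", "Is the Pope catholic?"))
```

These are warnings rather than errors, since the advice is “almost never”, not “never”; a human still has to decide whether a given mention is justified.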

When you need to give an example hostname, don't give free advertising to myhost.com, randomword.com, or foo.com; use an RFC‐compliant one like example.org.
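
The relevant RFC here is RFC 2606, which sets aside example.com, example.net, and example.org, plus the .example top-level domain, for use in documentation.  As a minimal sketch (the function name is hypothetical, and I'm deliberately ignoring the other reserved names such as .test and .invalid, which have different purposes), a check might look like this:

```python
# Names reserved for documentation use by RFC 2606.
DOC_TLDS = {"example"}
DOC_DOMAINS = {"example.com", "example.net", "example.org"}


def is_documentation_hostname(hostname):
    """True if hostname falls under a domain reserved for examples."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] in DOC_TLDS:
        return True
    # Otherwise compare the last two labels against the reserved domains.
    return ".".join(labels[-2:]) in DOC_DOMAINS


for name in ("www.example.org", "foo.com", "mail.myhost.com"):
    print(name, is_documentation_hostname(name))
```

Only the first of those three passes; the other two belong to somebody, whether or not they deserve the publicity.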

It isn't necessarily appropriate to ask “would you like to reconfigure your server?” if the reader might be a sysadmin reluctantly following corporate guidelines for software installations on the company's server.  All you know for sure is that it's up to the reader to answer the question “should the server be reconfigured?”

F3: extended descriptions

See DevRef 6.2.1 to 6.2.3 (and, salvaged from the archives, some old guidelines by Colin Walters).

Questions like how the software is implemented and what standards it conforms to can wait.  The basic point of a package description is to announce what this .deb is for – what can it do to solve users' problems and make their lives more fun?

The project homepage is the easiest place to get this kind of text, but don't take that to mean you should just copy it word for word off GitHub: their blurb isn't designed to convey the same information as a package description.  So “diverging from upstream” isn't an issue here any more than it's a problem that the man page is different from the FAQ.

Upstream blurbs may involve confusingly divergent specialised uses of terms like “distribution” or “contrib package” or (if you're unlucky) “free software”, and may be full of hard‐sell advertising copy designed to compete with some unmentioned proprietary equivalent.  Remember that the interests of our users always take priority over the developer's ego; stick to an objective summary of the software's pros and cons.

Unless it's going in “Section: (lib)devel”, you should try to avoid “developerese”; the typical user only wants to know what your application is good for, not how it's implemented.  If libeg-bin is part of EGlib and provides a utility called eg_tool, don't assume that's self‐evident, and make sure that the text makes sense as a description of libeg-bin.  If the significance of the name isn't obvious, the extended description is a good place to put an explanation.  (If it's a TLA you may be able to get away with just using the expansion as the package synopsis.)  I'm the kind of user who finds it easier to get a mental handle on a piece of software if its name has some intelligible connection to its function, so I often ask “why the name?” in d‑l‑e package reviews.  There seem to be quite a few programmers out there who are content to dub their project “yix” just because that's a quick and easy key‐sequence to type on a Dvorak layout, but that label will often be the first aspect of their brainchild that people encounter as they browse through the menus.  Think of it as the most basic starting point of the user interface!

Reimplementations of existing software should be careful not to live in the past, phrasing their descriptions purely in terms of how libfoo-tng was an improvement on libfoo – especially if libfoo2 might have all the same features.  At best, once libfoo-tng succeeds in becoming the standard implementation and libfoo vanishes from the repositories, users will be left relying on software archaeology to work out what purpose your package serves.  And I can never resist pointing out just how eighties the fad of calling things “The Next Generation” is!  Beware dated content – references to boot‐floppies or X11R6 support, game reviews assuming that 3D acceleration is a novelty, and so on.  In fact it's a good idea to avoid claiming that your package is notably “modern” (in ten years when it's an orphaned relic that text will be an annoyance); say what its features are (e.g. “graphical”), and let readers make up their own minds about whether that's an advantage.

Some other varieties of Too Much Information:

F4: synopses

The balancing act between too little information and too much is particularly hard for short descriptions.  One thing you should usually leave out is the programming language – it might fit in the long description, but it's a waste of space to say that python-pylibpython-mcpython (Section: python) is written in Python.  Use debtags!

The Developer's Reference says that package synopses should be (articleless) noun phrases referring to the package – that is, they should fit the template “$PACKAGE provides a/the/some $SYNOPSIS” (though the alternative two‐part format popular with large families of packages also has explicit DevRef backing).  They should not follow the example of the man pages that base their description line on verb phrases (“$BINARY lets you $DESCRIPTION” or “$BINARY is designed to $DESCRIPTION”).  The logic of standardising on noun phrases goes like this:

Apologies for the linguistics jargon (which strictly speaking isn't even accurate – I should be talking about N‑bars, not NPs).  I advise non‐syntacticians just to focus on the template approach.
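
For anyone who prefers to see the template approach spelled out, here is a minimal Python sketch.  The function names are mine, the synopsis for libeg-bin is a made-up example, and the “a” slot stands for “a” or “an” as appropriate; the idea is just to generate the candidate sentences so that a human can judge whether at least one of them reads naturally.

```python
ARTICLES = {"a", "an", "the", "some"}


def template_readings(package, synopsis):
    """Fill the "$PACKAGE provides a/the/some $SYNOPSIS" template three ways."""
    return [f"{package} provides {det} {synopsis}"
            for det in ("a", "the", "some")]


def starts_with_article(synopsis):
    """Synopses are meant to be articleless, so flag a leading article."""
    words = synopsis.split()
    return bool(words) and words[0].lower() in ARTICLES


synopsis = "command-line utility for EGlib"  # hypothetical synopsis
for reading in template_readings("libeg-bin", synopsis):
    print(reading)
print(starts_with_article(synopsis))
```

If none of the three readings makes a sensible sentence (as tends to happen with verb-phrase synopses like “lets you edit files”), the synopsis probably needs rephrasing.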


SECTION G – AFTERWORD

I suspect my reviews on the mailing list give the impression I'm some sort of nit‐picking dimwit, so please bear in mind that the best way of spotting typos, grammatical ambiguities, missing definitions, and so on is to approach the text from the point of view of somebody who doesn't already know what it's trying to say.  If you find that sort of ignorance annoying, I apologise; but this may be an indicator that you should delegate the task of writing user documentation to others.

If you disagree with me about some point of grammar or style or whatever, don't worry; at the end of the day, it's the maintainer's decision, not mine, and you're welcome to join the mailing list to provide an alternative viewpoint!