From Mists of Time

This post was originally published in Language International 8:3 (1996).

Dear Editor,

The article by Dr William J. Niven on machine-friendliness and reader-friendliness in machine translation, in the 7:6 (December 1995) issue of Language International, has prompted me to offer Dr Niven a few remarks.

Dr Niven is perfectly right in maintaining that highly formatted text is used in technical writing to induce the reader to read more slowly or to draw the reader’s attention to some part of the text. As a general rule, however, good formatting serves MT as well as it serves reading and human translation, while bad formatting produces poor results even from the most brilliant of human translators. Page and text layout could and should be modified when the changes serve translation, understood here as a particular kind of documentation localization. With the principle of real localization in mind, reader-friendly could eventually equal machine-friendly and vice versa: ease of reading for a human being could mean ease of reading for a machine, and a document that is easy for a machine to read is definitely easy for a human being to read.

In this perspective it is not always true that the tabular format, for example, provides more logical structuring and clearer distribution of information. Technical writers use tables very cautiously, as they are difficult to read even for a well-educated human being. Difficulty in reading formatted text is therefore not peculiar to MT, and this could in some way explain why technical writers are increasingly shifting to SGML. SGML and mark-up languages in general have the unique capability of describing information in a software-independent format, and all major word processors are now capable of formatting text using mark-up techniques, thus making it easier to separate text from formatting codes. In addition, the algorithm for isolating and shedding tags should be quite simple, hence format decoding should not be a titanic endeavor. Analysis and pre-analysis could be greatly improved by effective use of spelling, syntax and grammar checkers, which technical writers actually use very little, while controlled language could be used to simplify writing and to regularize punctuation, whose usage can vary greatly from one culture to another.
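To give an idea of how simple tag isolation and shedding can be, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names `shed_tags` and `restore_tags` are hypothetical, and the pattern treats anything between angle brackets as a tag, which ignores SGML comments, marked sections, tag minimization and attribute values containing ">". A real system would need a proper SGML parser.

```python
import re

# Assumption: a tag is any run of characters enclosed in angle brackets.
# This is a simplification and does not cover full SGML syntax.
TAG = re.compile(r"<[^<>]+>")


def shed_tags(marked_up):
    """Separate plain text segments from tags.

    Returns (segments, tags), where each tag is stored together with the
    index of the text segment it precedes, so it can be put back later.
    """
    segments = []
    tags = []
    pos = 0
    for match in TAG.finditer(marked_up):
        if match.start() > pos:
            segments.append(marked_up[pos:match.start()])
        tags.append((len(segments), match.group()))
        pos = match.end()
    if pos < len(marked_up):
        segments.append(marked_up[pos:])
    return segments, tags


def restore_tags(segments, tags):
    """Re-insert the stored tags around the (possibly translated) segments."""
    out = []
    next_tag = 0
    for i in range(len(segments) + 1):
        # Emit every tag recorded as preceding segment i.
        while next_tag < len(tags) and tags[next_tag][0] == i:
            out.append(tags[next_tag][1])
            next_tag += 1
        if i < len(segments):
            out.append(segments[i])
    return "".join(out)


if __name__ == "__main__":
    source = "<para>Press <emph>Stop</emph> now.</para>"
    text, codes = shed_tags(source)
    print(text)   # the text segments only, ready for analysis
    print(restore_tags(text, codes) == source)
```

The text segments can be passed to the translation step on their own, and the tags re-inserted afterwards, provided the number of segments is unchanged. Translation that reorders segments would need a more careful alignment than this sketch offers.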

Technical writers, however, are rarely aware of what writing for machine translation, or with translatability in mind, means. Unfortunately, technical developers do write their own manuals: what technical writers receive to work on is mostly a document complete in itself, which can be “refined” but not “rewritten” as is often necessary. Despite lacking the time and the inclination, developers go well beyond the technical specification and do not trust technical writers’ ability. This attitude comes from the popular belief that an arts degree implies an inadequate grasp of technical concepts, which in fact are often technicalities.

Technicalities are, for example, the ambiguities in the reading and interpretation of “fault”, “defect”, “error”, and “mistake”, and controlled language could serve to settle the question. So could knowledge-based MT systems, if they made wider use of thesauri and used descriptors as pointers to subject-area flags. The use of synonyms, however, should not be a problem in a controlled language, where one term points to one meaning and terms can be organized in a well-defined structure.
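The principle of one term pointing to one meaning can be sketched as a simple vocabulary check. The following Python fragment is only an illustration: the name `find_unapproved_terms` is hypothetical, and the mapping of “defect” to “fault” and of “mistake” to “error” is an arbitrary choice made for the example, not a recommendation on how these four terms should be distinguished.

```python
import re

# Hypothetical controlled vocabulary: each discouraged variant points to
# the single approved term. The pairings are illustrative assumptions.
VARIANT_TO_APPROVED = {
    "defect": "fault",
    "mistake": "error",
}


def find_unapproved_terms(text, vocabulary=VARIANT_TO_APPROVED):
    """Return (position, word found, approved term) for each discouraged word."""
    findings = []
    for match in re.finditer(r"[A-Za-z]+", text):
        approved = vocabulary.get(match.group().lower())
        if approved is not None:
            findings.append((match.start(), match.group(), approved))
    return findings


if __name__ == "__main__":
    sentence = "A defect in the pump. Operator mistake."
    for position, word, approved in find_unapproved_terms(sentence):
        print(f"{position}: replace '{word}' with '{approved}'")
```

A check of this kind only matches single words exactly. It does not handle plurals, compounds or context, which is where thesauri and subject-area descriptors would come in.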

When technical writers are given the proper space to work, they often have to cope with the typical European attitude towards technical documentation: do not offend the reader (the user, the buyer, the client) with prose that is too simple.

As to the passive and nominalization, these are simple expedients for solving the common problem of how to address the reader: German and Italian both have a courtesy form, but addressing the reader directly is still considered inelegant and clumsy. In Italian, the infinitive is now widely used as a form of indirect imperative, while nominalization is used when the agent is evident or made clear elsewhere.

Ultimately, a translator is a technical writer alone and an MT system is a translator too. Modern CAT systems also produce over-literal translations: no machine is capable of converting one culture into another. This is not a syntax problem: writing is the process of reproducing human thought in a readable manner.

The spread of English as the technical language has made too many bad translations acceptable and accepted. Quality in technical translation is no longer a linguistic property; usability and suitability are. And this is as true for human translation as for machine translation.

The question of machine-friendliness versus reader-friendliness is misleading. Translators do not like post-editing because they feel overridden by their clients on behalf of a machine. Technical writers cannot write with translatability in mind because they are not taught to, nor are translators and MT developers taught to cope with the subtleties of technical writing. But who should teach a technical writer to write with translatability in mind? And who should teach technical writing to and for translators, so that they better understand the technical documents they are called on to process? MT developers sit in an ivory tower, in a world apart: their work is not for mere translators. MT is still an academic pursuit, and this goes a long way towards explaining the success of poor but cheap MT systems for the PC. They can decode formatting codes, and even if the output still reflects the original syntax, even if it is coarse, garbled and flawed, it can tell the user whether the document is worth a costly human translation.