In God we trust; others must provide data

The earliest known match for the quote in this post’s title was spoken in 1978 by pathologist Edwin R. Fisher while addressing a subcommittee of the U.S. House of Representatives. Fisher stated that the adage was already a cliché, so the originator remains anonymous for now. The quote has since often been (mis)attributed to W. Edwards Deming, but there is no substantive support for assigning it to the launcher of the TQM movement.

This is just another blog post on translation quality, written to fulfill the promise made to ContentQuo’s Kirill Soloviev to take a look at the product’s introductory video.

The video provides no real insight into the product, and not much information about it is available on the website either, apart from a generic list of features and the usual marketing claims.

The first statement in the press release reads “data-driven linguistic quality management,” but there is no evidence that the product makes any use of data in a machine-learning fashion, i.e. using rules (algorithms) to analyze data and learn patterns and glean insights from the data.

On the contrary, the “solid theoretical approaches to quality measurement” boasted in the press release and on the website are actually the same old-fashioned error-counting routines that have been used for decades and are known for being costly and ineffective.

Indeed, probably to entice the many traditional translation buyers, ContentQuo’s website puts an obvious emphasis on error annotation, error categories, error severities, error penalties and error weights. Unfortunately, no standard definitions exist of error and error categories, and the ContentQuo website suggests agreeing on what quality is for each product or customer and complying with the MQM-DQF “issue catalog.” Alternatively, quality models can reportedly be customized by changing the list of categories, severities, error penalties and weights, and pass/fail scores.
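To make the error-counting approach described above concrete, here is a minimal sketch of how such a scorecard is typically computed. It is not ContentQuo’s implementation, which is not documented: the category names, severity penalties, weights, the per-1,000-words normalization and the pass/fail threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical values: in practice these are agreed on per product or customer.
SEVERITY_PENALTY = {"minor": 1, "major": 5, "critical": 10}
CATEGORY_WEIGHT = {"accuracy": 1.0, "fluency": 1.0, "terminology": 0.5}


@dataclass
class Annotation:
    """One error annotated by a human reviewer."""
    category: str
    severity: str


def quality_score(annotations, word_count, per_words=1000):
    """Weighted penalty points, normalized per `per_words` words reviewed."""
    total = sum(
        SEVERITY_PENALTY[a.severity] * CATEGORY_WEIGHT[a.category]
        for a in annotations
    )
    return total * per_words / word_count


def passes(annotations, word_count, threshold=20.0):
    """Pass if the normalized penalty does not exceed the (assumed) threshold."""
    return quality_score(annotations, word_count) <= threshold


errors = [
    Annotation("accuracy", "major"),
    Annotation("fluency", "minor"),
    Annotation("terminology", "minor"),
]
print(quality_score(errors, word_count=500))  # 6.5 points over 500 words
print(passes(errors, word_count=500))
```

Every number in this sketch is a convention, not a measurement: changing a weight or the threshold changes the verdict without anything changing in the translation.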

The hinges of “modern” translation quality assessment are accuracy, fluency and adequacy. These concepts have been devised in the usual academic circles and are largely open to subjective interpretation. In fact, there is no single method to describe, identify, and evaluate errors, let alone weigh them.

Adequacy is loosely defined as the extent to which the meaning expressed in the gold-standard translation or the source is also expressed in the target translation. Likewise, fluency is defined as the extent to which a translation is well-formed grammatically, contains correct spellings, adheres to common use of terms, titles and names, is intuitively acceptable and can be sensibly interpreted by a native speaker.

No metric yet exists to measure any of these properties.

As a matter of fact, accuracy, in statistics, is defined as the closeness of a measured value of a quantity to its true value, so it is measurable, and it is indeed measured. Obviously, the common definition of accuracy in translation does not match that in statistics. Indeed, it could not, as its definition is extremely vague, having much to do with the similarity of meaning. Some define it as a bilingual notion referring to the correspondence between the source and target text (adequate transfer of meaning). One way to measure it might be via edit distance, which has unfortunately proved to be largely ineffective. Recently, a metric has been devised in the MT field that could help. Its name is LEPOR, and it takes purely statistical elements into account, like precision and recall.
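For readers unfamiliar with the statistical elements mentioned above, the following sketch shows a plain Levenshtein edit distance and a bag-of-words precision and recall against a reference translation. This is not LEPOR, which combines such elements with further factors; the function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if items match)
            ))
        prev = curr
    return prev[-1]


def precision_recall(candidate, reference):
    """Word-level precision and recall of a candidate against a reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # words shared, counting repeats
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
print(edit_distance(candidate.split(), reference.split()))
print(precision_recall(candidate, reference))
```

Both measures only compare surface forms, which is one reason edit distance alone says little about whether meaning was transferred.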

In the end, ContentQuo appears to be nothing more than a nice try. Undoubtedly, it looks like a comprehensive “solution” in the current context, where no TMS provides anything similar, and if you are a traditional translation buyer, with a typical TMS, stuck in the middle of TEP, etc., ContentQuo might well be made for you.

In this respect, asked to comment on this post before publication, Kirill Soloviev pointed out that ContentQuo has been designed as a platform to enable multiple different and yet practical approaches to translation quality measurement and management in an enterprise setting, in the belief that no “silver bullet” exists in language quality.

He also pointed out that the initial product release focuses on the analytical, human-driven approach to translation quality measurement, as the one most popular to date despite its well-understood limitations.

Unfortunately, though, machines are prevailing in every arena where non-creative tasks are involved, allowing considerable savings in terms of efficiency, speed, reliability, and expense, and every smart company today puts its data to good use and uses ML for (predictive) analytics, even for translation quality.

Not only can ML be used to predict translation quality outcomes; every predictive score should also be compared with actual outcomes. Machines are much faster than humans at sorting through content, and they can produce results that can be statistically validated. Machines can compute a post-factum quality score from content profiling and initial requirements (checklists), traditional translation “QA” (i.e. checking for machine-detectable errors in punctuation, numbers, inline tags, capitalization or extra spaces, missing translations or terminology inconsistencies), correlation and dependence, precision and recall, and edit distance. As in e-discovery, machines could also forward only the questionable samples, i.e. those that might require human review.
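A few of the machine-detectable checks listed above, together with the idea of forwarding only questionable segments to humans, can be sketched as follows. The function names and the specific rules are assumptions for illustration; a real QA tool would also handle inline tags, terminology and locale-specific number formats, which this sketch ignores.

```python
import re

# Matches integers and simple decimal or grouped numbers such as 10, 3.5, 1,000.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def qa_issues(source, target):
    """Return a list of machine-detectable issues for one segment pair."""
    if not target.strip():
        return ["missing translation"]
    issues = []
    if "  " in target:
        issues.append("extra spaces")
    # Naive check: assumes numbers are written identically in both languages.
    if sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(target)):
        issues.append("number mismatch")
    src_end = source.rstrip()[-1:]
    if src_end in {".", "?", "!"} and not target.rstrip().endswith(src_end):
        issues.append("final punctuation mismatch")
    return issues


def needs_review(segments):
    """Keep only the segment pairs with at least one issue, for human review."""
    flagged = []
    for source, target in segments:
        issues = qa_issues(source, target)
        if issues:
            flagged.append((source, target, issues))
    return flagged


segments = [
    ("Press 2 buttons.", "Premere 2 pulsanti."),
    ("Wait 10 seconds.", "Attendere 5 secondi."),
]
for source, target, issues in needs_review(segments):
    print(target, issues)
```

Only the second pair is forwarded, so the reviewer’s time goes to the segment that actually looks suspicious.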

The ContentQuo website also boasts a smart random sampling function, although it provides no further information on it, mentions no feature to set quality thresholds, and does not even define “smart.” And yet, this would be extremely useful in combination with the equally boasted real-time quality scorecards that are said to be calculated instantly as soon as linguistic errors are annotated.
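Since the website does not say what makes the sampling “smart,” the following is only a sketch of plain, reproducible random sampling of segments for review. The function name, the sampling rate and the minimum sample size are assumptions, not ContentQuo features.

```python
import random


def sample_segments(segments, rate=0.1, minimum=5, seed=None):
    """Draw a simple random sample of segments without replacement.

    The sample size is `rate` of the total, but at least `minimum`
    and never more than the number of segments available.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    size = min(len(segments), max(minimum, round(len(segments) * rate)))
    return rng.sample(segments, size)


segments = [f"segment {i}" for i in range(200)]
review_batch = sample_segments(segments, rate=0.1, seed=42)
print(len(review_batch))
```

A sample like this only becomes useful for decisions once it is paired with an explicit quality threshold, which is exactly what the website leaves unspecified.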

ContentQuo seems to be just another tool of the trade to allow translation industry players to pompously pretend they measure quality, while wallowing in the same old information asymmetry as ever. On the other hand, if the only tool you have is a hammer, you are going to treat everything as if it were a nail. Likewise, you cannot solve the problems you are facing using the same kind of thinking you used when they were created.


Author: Luigi Muzii
