A decade or so ago, the idea took hold that translation memories and glossaries were assets in a budgetary sense, as if they carried an intrinsic value. Indeed, this was an axiom, a statement taken as a self-evident truth, so evident as to be accepted without controversy or question and to serve as a starting point for further reasoning and argument.
This axiom was a corollary to another axiom, the quality axiom, which states that fewer translators produce a more consistent output. This absurdity is as deep-rooted as it is deceptive, as if a reader could distinguish some ten thousand words in a million. Moreover, since their beginnings, translation memories have been disputed on the grounds of quality control, intellectual property, impoverishment of the language, and the value chain. Indeed, 30 years after their arrival, translators still tend to correct full matches in a translation memory, as Marco Trombetti noted in a recent article. This shows that, contrary to what the asset axiom assumes, the value of translation memories and glossaries is not intrinsic: it comes from the ability to exploit them.
The asset axiom was a brilliant invention of technology providers to get service providers to purchase translation memory management software and bank on the recovery of profitability that would supposedly ensue from discounts on full and fuzzy matches. Nevertheless, although brilliant, the discount pitch was also unethical and poisoned the whole industry to the point that, still today, partly because of an immobilizing coopetition, no standard whatsoever has been reached on how to calculate these matches.
Also, translation memories and glossaries have no inherent feature for quality assessment. Metadata, on the contrary, can be used successfully in quality assessment and management, as Smartling did with its Quality Confidence Score and the transition from the traditional reactive approach to a predictive one.
How? With metadata, specifically translation metadata: the data describing every translation project, from the due date to the delivery date, from the number of days worked to the number scheduled, and so on, not to mention the ratings assigned to linguists and managers.
This also shows why and how metadata and performance measurement are linked. Metadata gives companies many options to measure performance, provided it is available and, more importantly, available in real time, so that instead of measuring past performance it becomes possible to spot trends and predict future performance. In other words, the power of data lies in how it is used.
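The idea can be made concrete with a small sketch. Assuming hypothetical project metadata records that track only due and delivery dates (real systems record many more fields), even a trivial least-squares trend over delivery delays turns historical records into a forward-looking signal:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical project metadata records; field names are illustrative only.
@dataclass
class ProjectRecord:
    due: date
    delivered: date

records = [
    ProjectRecord(date(2023, 1, 10), date(2023, 1, 9)),
    ProjectRecord(date(2023, 2, 15), date(2023, 2, 17)),
    ProjectRecord(date(2023, 3, 20), date(2023, 3, 20)),
    ProjectRecord(date(2023, 4, 25), date(2023, 4, 28)),
]

# Delay in days for each project (negative means delivered early).
delays = [(r.delivered - r.due).days for r in records]

# Simple least-squares trend over the project sequence: a positive
# slope means delays are growing, i.e. performance is deteriorating.
n = len(delays)
xs = range(n)
mean_x, mean_y = (n - 1) / 2, sum(delays) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, delays)) / \
        sum((x - mean_x) ** 2 for x in xs)

print(f"average delay: {mean_y:.1f} days, trend: {slope:+.2f} days/project")
```

A real predictive system would of course weigh many more signals (word counts, linguist ratings, content types), but the principle is the same: the value is in the metadata being collected continuously, not in any single record.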
Memsource has released something similar to Smartling’s QCS with its Machine Translation Quality Estimation, which computes MT quality scores before post-editing, much like the scores assigned to translation memory matches. MTQE compares any new output with past post-editing data to infer the post-editing effort: the higher the score, the lower the effort.
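Memsource has not published MTQE’s internals, so the following is only a deliberately simplified sketch of the family of scores it resembles: a token-level edit distance between raw MT output and its post-edited version, rescaled to a fuzzy-match-like percentage. Measured over past jobs, such a score is exactly the kind of post-editing-effort signal a quality-estimation model would be trained to predict:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over token sequences.
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ta != tb)))    # substitution
        prev = cur
    return prev[-1]

def pe_score(mt_output, post_edited):
    # Percentage score analogous to a fuzzy-match score:
    # 100 means no post-editing was needed at all.
    mt, pe = mt_output.split(), post_edited.split()
    dist = levenshtein(mt, pe)
    return round(100 * (1 - dist / max(len(mt), len(pe))))

print(pe_score("the cat sat on mat", "the cat sat on the mat"))  # → 83
print(pe_score("the cat sat on the mat", "the cat sat on the mat"))  # → 100
```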
In the last few years, applications in the realm of machine learning and neural networks have proliferated, and yet few in the translation industry seem to have seen this coming. And although radical technological advances are no new phenomenon, a slew of unexpected changes rarely happens all at once.
This explains the noisy clamor around AI, when real AI has not yet entered the translation industry. And it is honestly hard to see “translators being the ones who will turn off the lights in the offices after everyone else has long gone home.” Not in any foreseeable future.
On the other hand, data is useless without context: it is necessary to know how, when, and why it was collected to draw any conclusions. That is precisely what metadata provides; when analyzed, it helps to understand the associated data and becomes a resource in its own right. Also, like any other data, metadata is only worth having if you use it.
This is where machine learning can help, improving process and business efficiency. By using translation metadata on both the buyer and the vendor side, it should be possible to forecast translation demand and supply cycles, especially for content types and domains, as well as quality endpoints. Performance analysis based on project data would help improve and fine-tune processes, starting with dynamic pricing.
The mistaken conflation of ML and AI in an improbable equation becomes apparent in quality assessment. Any QA tool that is ‘smart enough’ to autonomously perform accurate translation quality measurements would also be ‘smart enough’ to do the translation itself, better, faster, and cheaper, and we would all be fools not to entrust every translation task to it. In contrast, however sophisticated, any QA tool can perform only comparative checks and often runs into false positives precisely because it is not ‘smart enough.’
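A minimal example of such a comparative check, and of how it inevitably produces false positives, is a number-consistency check that flags any digit sequence in the source that does not reappear verbatim in the target (the example sentences are invented for illustration):

```python
import re

def number_check(source, target):
    # Comparative check: flag any digit sequence in the source
    # that does not reappear verbatim in the target.
    src_nums = re.findall(r"\d+", source)
    tgt_nums = re.findall(r"\d+", target)
    return [n for n in src_nums if n not in tgt_nums]

# Genuine error: "15" was mistranslated as "50".
print(number_check("Deliver in 15 days", "Livraison sous 50 jours"))

# False positive: the translator legitimately spelled the number out.
print(number_check("2 options are available", "Deux options sont disponibles"))
```

The check has no understanding of what it compares; it cannot tell a mistranslated figure from a correctly spelled-out one, which is exactly the point being made about QA tools in general.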
Unfortunately, project managers are seldom willing to spend much of their time filling out forms, especially when they do not see any immediate return. In fact, entering data to describe a job can be tedious, despite the usefulness of this data for discovery, documentation, and management.
This metadata concerns project management, process management, and business administration.
On the other hand, missing, inadequate, or incorrect metadata, and its misuse, can be a serious hindrance. A typical example is segment-level TU metadata in TMs. TMs have the mechanism to preserve this metadata which, unfortunately, is all too often missing, irrelevant, or simply wrong, thus damaging the quality of TMs and, hence, of the MT systems that are supposed to be trained with them. Although TMs have been around for almost 30 years now, and corpus linguistics has long ceased to be an arcane discipline, very few industry players, especially those who are supposed to get the most from TMs, have learnt to deal with TU metadata. Are TMs assets, then?
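For concreteness, here is what segment-level TU metadata looks like in a minimal TMX 1.4 fragment, and how little code it takes to read it. The property names and values are illustrative, not taken from any real TM:

```python
import xml.etree.ElementTree as ET

# Minimal TMX fragment with TU-level metadata (creationdate, changeid,
# and a custom domain property). Structure follows TMX 1.4.
TMX = """<tmx version="1.4">
  <header creationtool="demo" srclang="en" datatype="plaintext"
          adminlang="en" segtype="sentence" o-tmf="demo"/>
  <body>
    <tu creationdate="20190301T120000Z" changeid="reviewer-7">
      <prop type="x-domain">legal</prop>
      <tuv xml:lang="en"><seg>Force majeure</seg></tuv>
      <tuv xml:lang="it"><seg>Forza maggiore</seg></tuv>
    </tu>
  </body>
</tmx>"""

root = ET.fromstring(TMX)
for tu in root.iter("tu"):
    # Collect both the standard attributes and any custom <prop> entries.
    meta = dict(tu.attrib)
    meta.update({p.get("type"): p.text for p in tu.findall("prop")})
    print(meta)
```

When these fields are filled in consistently, a TM can be filtered by domain, date, or reviewer before being used for training or leverage; when they are empty or wrong, none of that is possible, which is precisely the hindrance described above.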
Smartling’s QCS is then a real innovation, allowing buyers to know what they are paying for and thus to assess and budget their translation effort on factual data. The next step might be improving the QCS through translatability assessment, and then comparing it with a post-factum score built from a combination of checklist, correlation-and-dependence, precision-and-recall, and edit-distance scores. Wait a minute: the last three scores would require TM metadata!
This lack of attention to metadata is a side effect of the coopetition afflicting the translation industry, with industry players belatedly launching one standardization initiative after another without really focusing on any that might actually produce a remarkable and lasting outcome. This is the case with the TAPICC effort: despite being promising, it has produced little more than nothing in two years.
So, it is very hard to see the rationale of, and to imagine a future for, a language data marketplace. Metadata might be worth exchanging, provided a standard format were ever agreed upon. In any case, why use tokens? Where is the innovation? Where is the vision? Timothy Leary had some too, and he was not even Georgian.