Let’s face it: Translation is widely regarded as a trivial job. Investigating the rationale of this view is pointless, so let’s take it as a matter of fact and focus on the ensuing behaviors, starting with the constantly increasing unwillingness to pay a fee, however honest and adequate, for a professional performance. The attitude of many industry players towards their customers does not help correct this view of translation as something cheap, either.
Unfortunately, as the Internet has become pervasive, the idea has progressively taken hold that goods (especially virtual ones) and indeed services should be ever cheaper and better. The (in)famous zero marginal cost theory has heavily contributed to the vision of an upcoming era of nearly free goods and services, “precipitating the meteoric rise of a global Collaborative Commons and the eclipse of capitalism.” But who will pay for this? Are governments supposed to subsidize all infrastructural costs to let marginal cost pricing prevail? Well, no. But keep reading anyway.
Does any of this have to do with data? Data has entered the discussion because of big data. The huge amounts of data manipulated by the large tech corporations have led to the assumption that translation buyers, and possibly industry players too, could do the same with language data. This has also raised expectations about the power of data, expectations that may be exaggerated or detached from reality.
Indeed, the pulverization of the industry has produced a series of problems that have not yet been resolved. The main one is that no single player is big and vertical enough to hold an amount of data (especially language data) so large and up to date that it could even remotely be considered big data or used in any comparable way.
Also, more than 99.99 percent of translations today are performed through machine translation, and the vast majority of the training data of major online engines comes from sources other than the traditional industry ones. Accordingly, the data that industry players can offer, make available, and even use for their own business purposes is comparatively scarce and poor. In fact, this data is neither vertical enough nor broad enough in scope to enable any player, including or maybe especially the largest ones, to impact the industry. A certain ‘data effect’ does exist, but only because online machine translation engines are trained with the huge amount of textual data available on the Internet, regardless of the translation industry.
For these reasons, a marketplace for language data might be useless, if not pointless. It might be viable, but the data available would hardly be the data needed.
For example, TAUS Matching Data is an elegant exercise, but its practicality and usefulness are yet to be proved. It is based on DatAptor, a research project pursued by the Institute for Logic, Language and Computation at the University of Amsterdam under the supervision of Professor Khalil Sima’an. DatAptor “aims at providing automatic methods for inducing dedicated translation aids from large translation data” by selecting datasets from existing repositories. Beyond the usual integrity, cleanliness, reliability, relevance, and prevalence issues, the traditional and unsolved issue of information asymmetry persists: Deep linguistic competence and subject-field expertise, as well as a fair proficiency in data management, are needed to be sure that the dataset is relevant, reliable and up to date. And while the first two might possibly be found in a user querying the database, they are harder to find in the organization collecting and supposedly validating the original data.
Also, several translation data repository platforms are available today, generally built by harvesting data through web crawling. The data of the highest-resourced online machine translation engines comes from millions of websites or from the digitization of book libraries.
The initiative of a self-organized group of ‘seasoned globalization professionals’ from some major translation buyers may be seen as part of this trend. This group has produced a list of best practices for translation memory management. Indeed, this effort proves that standardization requires models and protocols, not applications.
TMs are not dead and are not going to die as long as CAT tools and TMSs remain the primary means in the hands of translation professionals and businesses to produce language data.
At this point two questions arise: What about the chance of having different datasets from the same original repository available on the same marketplace? And what about synthetic data? So, the challenge of selecting and using the right data sources remains unsolved.
Finally, the coopetition paradox also applies to a hypothetical language data marketplace. Although many translation industry players may interact and even cooperate on a regular basis, most of them are unwilling to develop anything that would benefit the entire industry and keep struggling to achieve a competitive advantage.
For all these reasons, blockchain is not the solution for a weak-willed, overambitious data marketplace.
As McKinsey’s partners Matt Higginson, Marie-Claude Nadeau, and Kausik Rajgopal wrote in a recent article, “Blockchain has yet to become the game-changer some expected. A key to finding the value is to apply the technology only when it is the simplest solution available.” In fact, despite the amount of money and time spent, little of substance has been achieved.
Leaving aside the far-from-trivial problem of the immaturity, instability, expensiveness, and complexity (if not obscurity) of the technology and the ensuing uncertainty, blockchain may be successfully used in the future to secure agreements and their execution (possibly through smart contracts), but hardly for anything else in the translation business. Competing technologies are also emerging as less clunky alternatives. Therefore, it does not seem advisable to put your money in a belated and misfocused project based on a bulky, underachieving technology as a platform for exchanging data that will still be exposed to ownership, integrity, and reliability issues.
Metadata is totally different: It can be extremely interesting even for a translation data marketplace.
The fact that big data is essentially metadata has possibly not been discussed enough. The information of interest for data-manipulating companies does not come from the actual content posted, but from the associated data vaguely describing user behaviors, preferences, reactions, trends, etc. Only in a few cases are text strings, voice data and images mined, analyzed, and re-processed. Even in these cases, the outcome of the analysis is stored as descriptive data, i.e. metadata. The same applies to IoT data. Also, data is only as good as the use one is capable of making of it. In Barcelona, for example, within the scope of the Decode project, mayor Ada Colau is trying to use data on the movements of citizens generated by apps like Citymapper to inform and design a better system of public transport.
In translation, metadata might prove useful for quality assessment, process analysis and re-engineering and market research, but it is much less considered than language data and even more neglected than elsewhere.
Language data is obsessively demanded but ill-curated. As usual, the reason is money: Language data is supposed to be immediately profitable, by leveraging it through discount policies or by training machine translation engines. In both cases, it is seen as a means at hand to sustain the pressure on prices and reduce compensation to linguists. Unfortunately, the quality of language data is generally very poor, because curating it is costly.
Italians use the expression “fare le nozze coi fichi secchi” (make a wedding with dry figs) for an attempt to accomplish something without spending what is necessary, while Spaniards say “bueno y barato no caben en un zapato” (good and cheap don’t fit in a shoe). Both expressions recall the popular adage “There ain’t no such thing as a free lunch.”
This idea is common to virtually every culture, and yet translation industry players still have to learn it, and possibly not forget it.
We often read and hear that there is a shortage of good talent on the market. On the other hand, many insist that there is plenty and that the only problem of this industry is its ‘bulk market’ (whatever this means, and regardless of how reliable those who claim this are, boast of being, or are wrongly presumed to be). Of course, if you target Translators Café, ProZ, Facebook or even LinkedIn to find matching teams, you most probably have a problem in knowing what talent is and which talents are needed today.
Let’s face it: The main reason for high-profile professionals (including linguists) being unwilling to work in the translation industry is remuneration. And this is also the main reason for the translation industry and the translation profession to be respectively considered as a lesser industry and a trivial job. In an endless downward spiral.
Bad resources have been driving out the good ones for a long time now. And if this applies to linguists, who should be the ribs, nerves and muscles of the industry, imagine what may happen with sci-tech specialists.
In 1933, in an interview for the June 18 issue of The Los Angeles Times, Henry Ford offered this advice to fellow business people, “Make the best quality of goods possible at the lowest cost possible, paying the highest wages possible.” Similarly, to summarize the difference between Apple’s long-prided quest for premium prices and Amazon’s low-price-low-margin strategy, on the assumption it would make money elsewhere, Jeff Bezos declared in 2012, “Your margin is my opportunity.”
Can you tell the difference between Ford, Amazon, and any ‘big’ translation industry player? Yes, you can.