Harnessing the Power of Data: A Change in Translation Education

This post was originally prepared as a proposal for the AUC/SCE 1st Conference on Localization, Translation and Interpreting: Bridging Gaps (April 15 & 16, 2019, Cairo, Egypt). The proposal was rejected for being outside the scope of the conference.

Over the last few years, the need to adapt educational practice in university-level schools for translators to rapidly changing real-life conditions has become acute. Nevertheless, almost everywhere, translation education still follows an old-fashioned approach.

The main obstacle new translation graduates encounter when they start practicing is a blatant inability to cope with the realities of an ever-evolving world.

Even young translators are seldom computer-savvy or numerate; most are broadly, and frequently obsessively, interested in languages only, regardless of the new requirements coming from an increasingly technologized world. Meanwhile, translation practice is evolving outside the usual conduits.

Today, translation competence is a three-legged table, resting on data, tools, and knowledge. The three legs must be of the same length, and must then grow at the same pace, for the table not to wobble.

Language is a technology, while knowledge is the foundation for the skills that make a translator capable of mastering all the tools of the trade, producing, accessing, and using data.

Data has entered the discussion because of big data, to the point that a now common refrain says that data is the new oil. However, just like oil, data is of little use until it is refined into something profitable. To this end, data owners should care for and protect their reserves as a precious resource, but managing data can be problematic, time-consuming, and costly, even for those who know how to do it. So, it is something that should be learnt as early as possible.

Data curation may require different tools but the same kind of skills and, especially in the case of language data, one kind of knowledge.

On the other hand, the huge amounts of data manipulated by the large tech corporations have led to the assumption that translation buyers, and possibly industry players too, could do the same with language data.

This has also led to an expectation that may be exaggerated or detached from reality. Indeed, the data that industry players can offer and make available, and even use for their own business purposes, is definitely not big data; on the contrary, it is scarce and poor. In fact, neither the verticality of this data nor the width of its scope is remotely sufficient to enable any player, including or maybe especially the largest ones, to impact the industry. A certain ‘data effect’ exists only because online machine translation engines are trained on the huge amount of textual data available on the Internet, independently of the translation industry.

What sets language data apart from standard (big) data is the deep linguistic competence and subject-field expertise needed to understand it, together with a fair proficiency in data management, to be sure that any dataset is relevant, reliable, and up to date.

This applies to the many projects aimed at harvesting, classifying, systematizing, and cleaning language data, and even more to the constantly increasing volume of synthetic data, which aggravates the unsolved challenge of selecting and using the right data sources.

In this respect, translation data can turn out to be extremely interesting and useful for quality assessment, process analysis and re-engineering, and market research. Yet it receives much less attention than language data, because the latter is supposed to be immediately profitable, whether by leveraging it through discount policies or by using it to train machine translation engines.

We often read and hear that there is a shortage of good talent in translation. The main reasons for being unwilling to work in the translation industry are weak career profiles and poor remuneration. To broaden their career opportunities, translation graduates should develop a set of new, additional skills that are going to be different every year.

Data management is going to be the primary task of our time. Are translation academic institutions ready and willing to deal with translation and language data management? Are they ready to prepare future translation professionals to make the most of their own data and the data of their customers and partners? What do future translators have to learn with regard to data?

The ability to exploit the power of data will be a crucial component of translation business and practice. Data is a precious resource to cherish and take advantage of, and new educational programs are needed to teach translation students how to manage it.


Author: Luigi Muzii
