The Entry of AI into Everyday Life and the AI Brain Drain
AI research dates back to 1956, and for over fifty years it was mostly confined to universities, with long-term projects aiming largely at serving science.
With the upsurge in computing power and the explosion of data, research in artificial neural networks has enjoyed a tremendous boost over the past decade, with AI eventually finding extensive practical application and commercial uses, mainly in computer vision and NLP, where rule-based AI performed very poorly.
Today, Gartner estimates that, in 2021, AI augmentation will recover 6.2 billion hours of worker productivity and create $2.9 trillion of business value. Figures like these have drawn the interest of VCs and spurred a growing hype around AI.
Indeed, tech giants have poured virtually unlimited financial and computing resources into AI, sweeping up the best talent. At the same time, venture investment in AI has soared, mostly into a handful of incredibly well-funded companies that can therefore afford to compete for brains.
Regrettably, the shortage of people with the skills and knowledge needed to run even basic AI projects is just as acute across industries. AI is therefore becoming more and more a thing for businesses with deeper pockets. This has led to an AI brain drain, with scientists and researchers leaving AI labs at universities and flocking to the commercial sector for more lucrative jobs.
Under these circumstances, it is going to get harder and harder for universities to hire and keep scholars, researchers and teachers to cultivate the new generations of AI scientists, and the AI gap will widen further.
As long as the quest for AI-related skills and talent keeps going, wages in this sector will remain high. And so will the flow of venture capital. Democratization might apply to technology, not to brains.
The VC Raid
VCs raided the translation industry too, although with notable differences in scope and focus. Most VCs picked tech-like companies; others went for LSPs with substantial cash flow that would guarantee some profitability at least in the medium term.
Tech-like companies are companies that do not focus exclusively on technology but rather take advantage of it to sustain basic, low-cost, human-based workflows.
Raising over $150 million across Series A to C rounds should also raise a few questions. On the other hand, the hot water some apparently more innovative companies may be in suggests that many private equity firms in the translation business are not naive investors: they may not really understand the business, but they most probably know the technology. Or, more simply, that investors just put their money where they could figure out how to use it.
This would explain why draining money might look easy to certain people, who think that the interested backing of a few mesmerizing figures, some fancy buzzwords, and the hype of the moment are enough to lure investors into putting money in a technology they do not understand, maybe through an opaque funding mechanism like an ICO, without being accountable.
If so, the Theranos case must have taught no one anything. Indeed, one of the key aspects of that case was the ability to raise huge amounts of money outside traditional funding channels without going public.
On the other hand, there are plenty of VCs searching for yield and chasing disruption, and as long as investors invest their own money, they can put it wherever they want. That’s ‘venture’.
The AI Scantiness
Indeed, taking at face value surveys from firms that do not exactly shine for scientific rigor, transparency, and ethics, and despite the persistent yet unfounded claim that the translation industry is a high-tech industry, only a slight majority of LSPs say they are using some form of MT. Moreover, those LSPs do not seem to be using MT in production, most probably because of the difficulty of integrating the technology into their processes.
This means, paradoxically, that LSPs leave MT to freelancers, who, when they do use it, treat it as a support tool to increase productivity and offset continuously shrinking compensation. It should come as no surprise, then, that customers rank on-time delivery and final quality as most important. Still, these go hand in hand with price, which means that something does not add up in these surveys; as Seth Stephens-Davidowitz would most likely put it, Everybody Lies. In fact, if quality, timeliness, and reliability (in one word, professionalism) are so important, it is because they are still frighteningly missing. And this is a major issue, given the sophistication of the tools and education available today. Yes, Gresham’s Law again.
On the other hand, the translation industry still relies on surveys, even though Seth Stephens-Davidowitz has shown how ineffective they are, demonstrating how data science can gather real-time insights into thoughts and behaviors that people may be unwilling to disclose openly.
Things do not get better with automation at large, and AI technologies are far from actual implementation at almost all industry players.
And yet, AI-related technologies are going to be more and more pivotal to keep up with the chain explosion of content and language pairs, not to mention the rate of updates. And all this will keep happening on a growing number of different platforms.
Term mining might be one of the most enticing applications of machine learning (ML) to language data. A typically, horribly boring job, the automatic extraction of terms from large datasets is exactly the kind of task that algorithms perform best: it is ‘just’ discovering patterns in text and deriving high-quality information from them. So far, term mining has leaned on two traditional approaches, the linguistic and the statistical one. The former tries to identify combinations of words that correspond to certain syntactic structures, and generally works on one language or on closely related languages. The latter identifies repeated sequences of strings and is therefore language-independent, but highly subject to noise.
Today, machine learning allows patterns to be learned directly from data, and can therefore make term mining far more accurate than linguistic rule-based applications and even faster than statistical ones.
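To make the contrast concrete, here is a minimal, illustrative sketch of the statistical approach: repeated word sequences are collected and ranked by raw frequency, with a toy stop-word list standing in for real noise filtering. The corpus, thresholds, and stop words are assumptions for illustration only; an ML-based extractor would instead learn from annotated term lists which candidates are worth keeping.

```python
# Minimal sketch of statistical term mining: repeated n-grams ranked by frequency.
# Stop words, thresholds and the sample text are illustrative assumptions.
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "with"}

def candidate_terms(text, max_n=3, min_freq=2):
    """Return repeated n-grams (n = 2..max_n) that do not start or end with a stop word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]

sample = (
    "Neural machine translation has replaced statistical machine translation. "
    "Machine translation output still needs post-editing, and neural machine "
    "translation engines require large amounts of training data."
)
print(candidate_terms(sample))
# e.g. [('machine translation', 4), ('neural machine translation', 2), ...]
```

Even this toy example shows why the statistical route is language-independent yet noisy: every frequent sequence is a candidate, whether or not it is an actual term.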
The Deceptive Quest for Interoperability
Some people in the translation industry insist on saying that interoperability is a major issue. They are right in pointing at the problem, but they are desperately wrong in identifying the causes and hinting at a possible solution.
Despite the many, huge differences in scale, in many otherwise comparable industries the problem of interoperability is not felt as acutely and is still addressed with some effectiveness, maybe because interoperability is generally good for business, whereas many translation technology companies impede it in order to lock customers in.
As Jack Welde said recently, in a strikingly clear and correct way, customers expect their business partners to help them improve their business through translation, by understanding what is improving it and how they can make their global content production more efficient. To do so, data is necessary, and it must be good enough for analytics. In short, customer focus is essential, beyond small talk at industry events (where, anyway, customers are harder and harder to find) and social media crap.
It is certainly true that most buyers allocate only a trifle of their budgets to language-related tasks, but it is very unlikely that this depends on overhead or on the variety and incompatibility of systems among LSPs.
Companies in other industries have been aware of how critical metadata is for a very long time now, at least since computers and the Internet became ubiquitous.
Translation industry players are generally lagging behind: while they produce massive amounts of metadata, they hardly even collect it, although applications for extracting business intelligence from workflows and processes (business analytics) are within everyone’s reach today. These applications can help gain crucial insights and infer estimates that customers value. Translation may be only a tiny fraction of a project, and yet no buyer is willing to put a whole project at stake on unknown variables. Therefore, to avoid guessing, buyers require factual data to assess their translation effort, to budget it, and to evaluate the product they will eventually receive. Unfortunately, when metadata is not accurate enough, or is scarce and partial, it is doomed to become rapidly irrelevant.
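As an illustration of the kind of basic business analytics mentioned above, the toy sketch below aggregates workflow metadata (word counts, turnaround times, costs) into estimates a buyer could actually use. The records, field names, and figures are hypothetical, invented purely for the example.

```python
# Toy business-analytics sketch: turning workflow metadata into buyer-facing estimates.
# All records, field names and figures below are hypothetical.
from statistics import mean

projects = [
    {"words": 12000, "days": 6,  "cost": 1800.0, "on_time": True},
    {"words": 4500,  "days": 2,  "cost": 720.0,  "on_time": True},
    {"words": 30000, "days": 15, "cost": 4200.0, "on_time": False},
]

throughput = mean(p["words"] / p["days"] for p in projects)               # words per day
cost_per_word = sum(p["cost"] for p in projects) / sum(p["words"] for p in projects)
on_time_rate = sum(p["on_time"] for p in projects) / len(projects)

def estimate(words):
    """Rough turnaround and budget estimate for a new job of a given size."""
    return {"days": round(words / throughput, 1), "cost": round(words * cost_per_word, 2)}

print(f"on-time rate: {on_time_rate:.0%}")
print(estimate(20000))
```

Nothing here is sophisticated; the point is that even this level of aggregation requires metadata that is collected consistently and kept accurate.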
Any LSP should then first focus on what makes its customers’ efforts more efficient, cost-effective, and insightful.
What has the industry produced so far, instead? A handful of dispensable standards.
The standards delusion peaked with yet another wasted opportunity, TAPICC. The definition of metadata for translation was a major goal. A basic model would have been enough, but three years after the launch of the initiative, the industry is still waiting. For how long?
Well, the industry has its own terminology exchange format (TBX), although it is still largely unsupported, especially for import, and an obsolete translation memory exchange format (TMX) that does not allow for storing important metadata. That’s it.
So it is at least inconceivable that, if a lack of interoperability leads to such great waste, no one in the industry is reasonably and seriously committed to a standard model for metadata and the relevant exchange format, or to breaking the stalemate that prevents the unification of the current language data standards.
The truth most probably is that no one is really interested in the release of any instrument that might lead to real competition, thereby unveiling the inadequacy of even the most celebrated tech pundits, not to mention the petty convenience of the leading industry players.
The Baloney of the Industry’s Constant Growth
Recently, TransPerfect CEO Philip Shawe brought up the case of a typical LSP that gives its employees a 5% raise each year with zero growth and constant efficiency. He said that some basic math would be enough to reckon that such an LSP would be out of business within five to seven years.
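That basic math is easy to reproduce with a back-of-the-envelope sketch. The cost structure below is purely an assumption (payroll at 50% of revenue, a 15% starting operating margin), not a figure from the interview; under those assumptions, compounding a 5% annual raise against flat revenue turns the margin negative around year six, squarely within the five-to-seven-year window.

```python
# Illustrative compounding of a 5% annual raise against flat revenue.
# Starting payroll share and margin are assumptions, not figures from the interview.
revenue = 100.0       # arbitrary units, held constant (zero growth)
payroll = 50.0        # assumed: payroll starts at 50% of revenue
other_costs = 35.0    # assumed fixed other costs, so the starting margin is 15%

for year in range(1, 11):
    payroll *= 1.05                          # 5% raise, constant efficiency
    margin = revenue - payroll - other_costs
    print(f"year {year}: operating margin = {margin:5.1f}")
    if margin < 0:
        print("margin exhausted: the LSP is out of business")
        break
```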
Apart from compensation having been plummeting for over a decade, one of the reasons some VCs have shown an interest in the language industry is its much-vaunted uninterrupted growth, often in double digits. So Shawe’s argument sounds just like a typical excuse for being unable to reduce overhead and inefficiency or to grant better compensation.
Also, any LSP can achieve substantial efficiency gains simply by implementing advanced automation (see above), which might even prove perfectly affordable.
However, no efficiency gain will ever replace growth, and every efficiency gain coming from process re-engineering takes some time to produce a tangible productivity increase, so that, even with the most advanced technology in place, any efficiency gain might eventually be lost.
Traditionally, growth may be either organic, i.e. through new product development and/or an increase in sales or an expansion of the customer base, or inorganic, i.e. through mergers or acquisitions that increase output and business reach. Any company should use a combination of the two. And technology can be equally helpful to gain insights and make better offerings to loyal customers, entice new ones, or find the best business to buy or merge with.
That Will Never Work
“That’s not going to work for us” is something every consultant has been told at least once. It is uttered especially when a new model is proposed for a change in the business, and when that model involves the extensive adoption of new technologies or processes.
Most often, the only kind of advice a businessperson is willing to accept is whatever looks simple and cheap, and can be labeled ‘common sense’, even though it may already have proved ineffective. This is the case, for example, of the broken windows theory that Shawe revamped in the same interview on growth.
That Will Never Work is also the title of a book recounting the story of Netflix as a scruffy start-up; it does away with the fancy origin story that Reed Hastings co-founded the company out of frustration over a $40 late fee charged by Blockbuster.
The Netflix case is interesting because disruption may come unexpectedly, but it is always due to poor customer focus, whether this shows in fees, availability, pricing options, terms of sale, etc., as a Twitter thread started by Salesforce’s Vala Afshar a few days ago made clear.
The disruption champions of recent years are all living proof that money is the prime driver: they thrive by enabling and exploiting cheaper access to products and services they do not make. Quality and timeliness are accessories, taken for granted. Just like in translation.
Finally, if the most influential factor affecting competition is the LSP concentration ratio, then the translation industry should be virtually unfragmented. Actually, it may well be so, since very few companies run the industry, with a much larger lot acting as sub-vendors or sub-sub-vendors. It’s not pulverization or fragmentation; it’s just inefficiency.