Doping and sauciness

Croyez ceux qui cherchent la vérité, doutez de ceux qui la trouvent.1
—André Gide

[Image: Spinach doping]

In Everybody Lies, Seth Stephens-Davidowitz shows plainly and clearly how traditional survey models are no longer reliable, even when highly experienced pollsters apply very sophisticated correctives to mitigate lies and wishful thinking. And yet these pollsters are utterly painstaking in forming poll samples and wording questions for interviewees.

How, then, can we expect a survey to be trusted when it is run over a small, voluntary and perhaps unrepresentative cluster of newsletter subscribers, with only an insignificant fraction of them answering poorly worded and misleading questions? And can we expect a survey to be trusted when it is run by a self-proclaimed industry reference body, without knowing how the sample was made up or, even worse, whether the data has been cleaned and tested?

It should come as no surprise, then, that the industry is not being taken seriously.

Also, no causal link exists, or has ever been unmistakably proven, between the size of the sales staff and a company’s increase in revenues and growth; in other words, nothing proves that “the larger the sales team, the more likely you are to bring in new revenue.” And even when more sales are achieved, they do not necessarily turn into more revenues, let alone into more profits. On the contrary, the growth gap between volumes and revenues means that an increase in sales does not automatically turn into an increase in revenues.

Nonetheless, the narrative of the industry’s eggheads, inside and outside of it, goes on in an endless upward spiral of grandiose baloney, in an enduring effort to drive people into an uninterrupted suspension of disbelief. In 1981, Apple’s Vice President of Software Technology Bud Tribble used Star Trek’s concept of reality distortion field to describe Steve Jobs’s ability to convince himself and others to believe almost anything with a mix of charm, charisma, bravado, hyperbole, marketing, appeasement and persistence. But there is no Steve Jobs in the translation industry. There has never been one.

Giving in to hypes

The translation industry is stuck in the middle of century-old paradigms, and yet “the next big thing” is announced every day.

There is no thinking outside the box here. This is just a way of eluding the same old issues and keeping on waiting for an outsider to find and implement the solution. And singularity is expected to come a little bit later every day.

In fact, AI is now reportedly predicted to outperform humans in translating by 2024, writing high-school essays by 2026, driving a truck by 2027, working in retail by 2031, writing a bestselling book by 2049, and working as a surgeon by 2053. Parity and singularity, then, will remain distinct for a long time to come. This should not be surprising, as a master algorithm like the one described by Pedro Domingos is not yet in sight, and many different machines will be needed to perform the infinite tasks all humans can perform on their own, although possibly imperfectly. In other words, an algorithm contains a finite number, however large, of instructions to run one specific task. And if its parameters are not adequately defined, the result might be puzzling, at the very least. The short clip below may hopefully clarify the problem with insufficient or inadequate parameters.

A major problem, even when just looking at AI applications, is the general tendency to take a set of predetermined, universally valid schemes for granted.

In the example above, Inspector Clouseau acts as a typical human while the receptionist seems to follow pure logic. In short, humans assume and guess.

Indeed, strangely enough, or maybe not, in over 60 years the approach to AI outside the circles of insiders has not changed much. When thinking about artificial intelligence, most people think, somewhere between awe and marvel, of machines taking over the world, but what is more and more often labeled as AI, to the point of making it seem commonplace, hardly corresponds to AI or to intelligence in a broader sense. For example, one of the reasons machines learn much faster from their mistakes than humans do is bias, the same kind as Clouseau’s: it is a typical product of human “intelligence,” while machines have no bias.

So, where does all the fuss around the new hypes such as AI and blockchain come from? First, hypes are created purposely to feed the suspension of disbelief that is necessary for a reality distortion field. Suspension of disbelief can be more easily achieved for accounts beyond general knowledge and understanding.

The ML, NNs, DL, and AI hypes—and NMT too

While machine learning (ML), neural networks (NNs), and deep learning (DL) have been on the slope of enlightenment of the hype cycle for some time now, MT has long since reached the plateau of productivity and become a general-purpose technology (GPT) thanks to its pervasiveness, in spite of the lack of recognition of translation.

Even though it is mostly known for its application to voice and video faking, picture identification and classification, and assisted driving, deep learning has become an essential toolbox and, more importantly, a toolbox enabler, thanks to the wealth of papers, source code, and tutorials available. At the same time, recurrent neural networks (RNNs) are now popular architectures that have proven effective in many natural language processing (NLP) tasks.

Today, in the effort to ‘democratize’ AI, the forefront in research is the use of synthetic data (i.e. algorithmically created data), in place of data generated by real-world events, to rapidly train machine learning algorithms, rather than ‘just’ to validate machine learning models. This might introduce bias, though. Also, GAFA have hoarded a conspicuous competitive advantage, and synthetic data, rather than helping smaller and more ordinary companies catch up, might prove too hard a nut to crack because, to be effective, it must be nearly identical to a real-world data set. Unfortunately, as technology gets better, the gap between synthetic data and real data diminishes, and those companies with long experience with real-world data will be able to exploit synthetic data even more efficiently and cost-effectively than before, thus increasing their advantage further.

This is the equivalent of the prospective downscaling from GPU-based to CPU-based NMT. When (and who knows when) NMT engines can be run on CPUs without compromising performance, the companies that have invested heavily in ‘older’ NMT might see their efforts spoiled or depleted, for the exclusive benefit of the big ones that set off earlier, with many more resources and possibly a huge side buffer, like a thriving services division with enough cash flow to sustain the tech division, in spite of any conflict of interest. Some may name this doping.

It’s the kind of doping that Mariana Mazzucato has dealt with in her latest book, The Value of Everything: The ‘captains’ of this industry, just like their reference models, call value what is extraction. They call themselves and their companies ‘innovators,’ but the tech behind their gizmos comes in one way or another from public funds.

In this respect, Kirti Vashee asked an intriguing question that deserves much more and better space than a blog post: “Why haven’t other large LSPs developed robust general use software?”

So, wondering whether NMT is just another hype or not is pointless, even when comparing it with the long history of disappointment with RBMT and SMT. NMT does make a difference, but it is still far from best human quality.

Also, successful NMT needs much more and much cleaner data than SMT, which already performs differently according to the cleanliness and quality of data. Dealing with data correctly requires more than basic skills, starting with those for evaluation, which is a huge task. Finally, post-editing of NMT might not yield the expected productivity increase, especially with beginner or inexperienced translators turned into post-editors. This is due to the overzealous revision approach of newly trained translators and to the deeply rooted habit of contrastive analysis. On the other hand, all post-editing courses currently available follow a traditional revision methodology based on contrastive analysis, rather than starting from monolingual post-editing to take full advantage of the fluency of NMT and maximize the writing skills of would-be post-editors. And this, in turn, is due to the same old red-pen syndrome and to a parceled rather than holistic approach to translation quality assessment.

In the end, the NMT hype is only the latest in a long series of NLP-tech-related hypes reflecting the anxiety to overcome the intrinsic limit of multilingual communication: trust. Indeed, the basic requirement for translation quality assessment is a solid grasp of the relevant language pair, and the lack of it makes translation just as unreliable as it has been for centuries, whatever the same old self-styled pundits and windbags may say, who do not even dare to admit they are Triste, solitario y final and scared to death.

The blockchain hype

Hypes are always as sneaky as they are misleading, and they might be even more untrustworthy when they address an industry about a technology that is alien to it, and the audience is unable to perceive the exact value of this technology and thus, which is even more important, to tell how much of it is marketing.

Praises of “technology as a means of transforming fragmentation to cohesion and community without losing diversity or control” are a posthumous endorsement of Baldur von Schirach’s infamous statement.2

Truth is, however, that this looks like nonsense, confidently worded only to baffle, and the blockchain hype follows this very same track. In fact, there would be too much to clarify, starting from the enormous amount of power required to run a blockchain system.

Many blockchain skeptics, for example, argue that private blockchains are just old technology masquerading as something new.

Don’t get fooled, then. Things are quite clearly a bit different from how they are depicted.

The first rule in business is: Never use your own money to launch and run your business; leverage other people’s. That’s what venture capitalists are for, with all due respect to Mariana Mazzucato. And that is also what ICOs are for.

In other words, rather than facing the risk of investing their own money, or listing the company, ICO proponents will have the crowd pay for their feat.

There is no real innovation in translation blockchains, and no ‘token economy’ will be created that benefits the industry, at least no more than the fabled sharing economy has done so far.

As a matter of fact, even though blockchain advocates have been repeatedly and strongly insisting that the technology is independent of cryptocurrencies, all public blockchain projects, and those in the translation industry are no exception, have been leveraging an ICO, a widespread instrument for raising funds through some cryptocurrency. Indeed, all ICO issuers (and blockchain companies) are actually cryptocurrency companies, selling a certain number of the virtual coins they create to finance their projects. Recently, there has been speculation that ICO issuers might liquidate their holdings to avoid increased regulatory scrutiny, after the SEC announced its first civil penalties against two cryptocurrency companies that didn’t register their ICOs as securities. In fact, ICOs can quickly attract huge investments, but they have often concealed scams.

The translation industry has perhaps grown accustomed to bullshitting over its short life, and the semantic wasteland associated with blockchain should come as no surprise, even though the emerging scenario looks even bleaker when peddled as innovation by people who are out to twist the long-established concept of token economy to their own ends.

There might not be any Theranos on the horizon, and no Bad Blood will be written a decade later, nor will there be another LangPie ICO scam (see also Slator’s first coverage of the ICO and, later, of the scam), but you’d better be suspicious of anyone who routinely uses the word ‘blockchain,’ especially if they are trying to sell you something, or possibly inviting you to join their ICO. Or if they are trying to posture as an expert in an industry which realistically has no experts. Again, there has always been only one man in black.

Side effects

The general uncertainty behind blockchain has contributed to its bubbliness together with the escalation in the number of blockchain projects and the many exaggerations over the capabilities of the technology.

Also, the SEC initiative might be only the first of many other blockchain-related regulations that may cause unpredictable problems in the future.

Hyping is just like doping, with similar side and after effects: Hyping your idea to get funding while concealing your true progress and hoping that reality will eventually catch up to the hype always backfires in the end.

The translation industry seems fascinated by the idea of cutting corners via an ICO. If a tablet, some hosting space, a fancy domain name and a social media presence are all it takes to start a translation business, a little more, mostly programming skills, is what it takes to exploit the capabilities of the Ethereum platform, set up just another translation blockchain and launch an ICO.

So, if you are thinking of joining an ICO, you will need to do a lot of homework first. The proponents of the ICO should prove they have done the same. Unfortunately, especially in this industry, judging by their pitches, they have not. And take enthusiastic reports with a grain of salt, especially if they look like plain copy & paste.

Blockchain or nothing

A question that many people interested in blockchain applications should ask is how to deal with the basic requirement of the EU’s GDPR that users must have control over their data.

Another question relating to the fundamental concept of trustworthiness should concern data accuracy. In fact, just because some data is in a blockchain this doesn’t mean it is accurate: Inaccurate data can still be validated in a blockchain. According to Victoria Lemieux, “The concept of trustworthiness—at least from an archival science perspective—goes far beyond what the blockchain can do, or even promises to do, in most cases.”
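The difference between a valid chain and accurate data can be sketched in a few lines of Python. This is only a toy hash chain written for illustration, not the mechanism of any real blockchain platform: the function names (`block_hash`, `append_block`, `is_valid`) and the sample records are hypothetical. The validity check only verifies that the hashes are internally consistent, so an obviously wrong record passes, while later tampering is detected.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block linked to the last one; nothing checks whether data is true."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )
    return chain


def is_valid(chain):
    """Check internal consistency only: hashes match and link to their predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
    return True


chain = []
append_block(chain, {"translator": "A", "words_delivered": 1000})
# A factually wrong record (hypothetical data) is accepted like any other.
append_block(chain, {"translator": "B", "words_delivered": -5})
chain_ok = is_valid(chain)

# Tampering after the fact is what the chain does detect.
chain[0]["data"]["words_delivered"] = 9999
tampered_ok = is_valid(chain)
```

Here `chain_ok` is true even though the second record is nonsense, and `tampered_ok` is false only because a stored record was changed afterwards. The chain protects integrity, not truth.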

Finally, the reliability of blockchain should also be assessed through its typical application. So far, all cryptocurrencies have failed in their stated goal, i.e. to become usable currencies. Bitcoin, for example, began life as a techno-anarchist project to create an online version of cash, a way for people to transact without the possibility of interference from malicious governments or banks. A decade on, it is barely used for its intended purpose. Users must wrestle with complicated software and give up all the consumer protections they are used to. Few vendors accept it. Security is generally poor (according to one estimate, around 14% of the supply of big cryptocurrencies has been compromised). The decentralized nature of blockchain inevitably makes it slow: It has been reckoned that on any given day, there are tens of thousands of transactions that get put on hold waiting for the network to catch up to confirm them; current wait times range from a few minutes to several hours for a transaction to be approved on the blockchain. Other cryptocurrencies are used even less.

Is it all crap?

Blockchain technology is still in its infancy, and it won’t be ready to solve many of the problems it is meant to address for 5 years at least. The crux of the matter is that blockchain is not useful yet. In fact, a few organizations have abandoned their blockchain projects, concluding that the costs outweigh the benefits.

Blockchain advocates still believe, though, that blockchains can help solve all sorts of problems. Those are big claims, and the advocates have yet to prove that blockchain can live up to them. Many of these claims are made by cryptocurrency speculators, who hope that stoking excitement around blockchains will boost the value of their related cryptocurrency holdings.

Truth is that the advantages of blockchains are oversold. Because of the overhead involved in shuffling data between all participants, blockchains are less efficient than centralized databases, a problem that gets worse as the number of users rises.
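A back-of-the-envelope sketch in Python can make the scaling argument concrete. It rests on a deliberately crude assumption that is not taken from any real system: a centralized database needs one write per transaction, while a fully replicated ledger must deliver each transaction to every other participant at least once. The function names are hypothetical, and consensus, proof of work and retransmissions, which would only add to the cost, are ignored.

```python
def deliveries_centralized(transactions):
    """Toy assumption: one write per transaction to a single database."""
    return transactions


def deliveries_replicated(transactions, participants):
    """Toy assumption: each transaction reaches every other participant once."""
    return transactions * (participants - 1)


def overhead_ratio(transactions, participants):
    """How many times more deliveries the replicated ledger needs."""
    return deliveries_replicated(transactions, participants) / deliveries_centralized(
        transactions
    )


# Ratio for 1,000 transactions as the number of participants grows.
ratios = {n: overhead_ratio(1000, n) for n in (2, 10, 100)}
```

Even under this optimistic lower bound, the number of deliveries grows linearly with the number of participants, while the centralized figure stays flat.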

However, the fact that blockchains have been overhyped does not mean they are useless. In this respect, a blockchain infrastructure aimed at creating projects and platforms without involving the creation of coins, like Hyperledger, could be a reliable alternative. Anyway, blockchain is no panacea against the usual dangers of large technology projects: cost, complexity and overcooked expectations.

Fixing

The ‘translation ecosystem’ (if any exists) will not be fixed with blockchain, NMT or any technology at hand, old or new. Fragmentation is due to the intrinsic nature of the translation business and to how easy it is to start one with virtually no capital and no specific skills or competences. This is the good, the bad and the ugly of this industry. And it is even taught in college today.

The knowledge gap is the hardest one to fill: The institutions that are supposed to be in charge of the task are secluded in their ivory towers in perennial, self-indulgent navel-gazing. Industry pundits are too interested and involved in finding new names for old tasks, issuing innovation awards or finding a new navel for their friends in academic and political circles to gaze at.

The operational gap depends almost entirely on the same century-old models that no one seems willing to challenge. “The difficulty lies not so much in developing new ideas, but in escaping from the old ones, which ramify, for those brought up as most of us have been, into every corner of our minds.”

So, the same old sermons are given every so often on the same old topics by the same old preachers.

The pressures that are alleged to fragment the translation industry cannot be removed by welcoming the usual suspects, politicians and industry ‘entrepreneurs,’ by endorsing the nonsense they peddle as innovation, or by basing a business on it to access and cash in on the funds that are generously and guiltily granted to any project that feeds the wishful thinking of being at the forefront of anything, or at least of keeping up, while the gap between the 1% and all the others widens.

I stuck around St. Petersburg
When I saw it was a time for a change


1 Follow the man who seeks the truth; run from the man who has found it.

2 Wenn ich Kultur höre … entsichere ich meinen Browning! (Whenever I hear [the word] ‘culture’… I remove the safety from my Browning!)