According to a new study, the human parity allegedly achieved in machine translation is attributable to flaws in the evaluation protocols used. The evidence: “professional translators showed a significant preference for human translation, while non-expert raters did not”. The researchers argue that although human evaluation remains the “gold standard”, it should be conducted on whole documents rather than on isolated sentences. Current best practice is a source-based direct assessment with non-expert annotators, using the data sets and the evaluation protocol of the Conference on Machine Translation (WMT), but updates are necessary to […]
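To illustrate the kind of analysis behind such a claim, here is a minimal sketch, not the study's actual method, of checking whether a rater group's preference for human translation is statistically significant. It assumes pairwise human-vs-machine judgments and uses an exact binomial test against the null hypothesis of no preference; all counts below are hypothetical.

```python
from scipy.stats import binomtest

def preference_significant(prefers_human: int, total_pairs: int,
                           alpha: float = 0.05) -> bool:
    """Two-sided exact binomial test against the null of no preference (p = 0.5).

    prefers_human: number of pairwise comparisons where the rater group
                   chose the human translation over the machine output.
    total_pairs:   total number of pairwise comparisons judged.
    """
    result = binomtest(prefers_human, total_pairs, p=0.5)
    return result.pvalue < alpha

# Hypothetical counts for illustration only (not figures from the study):
print(preference_significant(prefers_human=72, total_pairs=100))  # e.g. professionals -> True
print(preference_significant(prefers_human=54, total_pairs=100))  # e.g. non-experts   -> False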