An international research firm forecasts that the global machine translation market has the potential to grow by one billion US dollars during the period 2020–2024 (technavio.com 2019). Yet recent studies also indicate that despite the growing adoption of machine translation in workflow processes in industrial settings, many professional translators, freelancers in particular, still harbour negative feelings about machine translation (Fung 2018) and tend to resist and/or reject the technology (Sakamoto 2019, Moorkens 2020).
Most practising translators are, of course, not software programmers or computer scientists, and generally do not possess in-depth knowledge of intricate high-level computer programming. It is therefore not surprising that the average translator may view machine translation with some degree of confusion and/or suspicion, often due to misconceptions about what machine translation is and how it works. For some translators, there is also a pervasive sense of resentment, dismay and dread, stemming (rightly or wrongly) from the perceived threat that machine translation could replace them and deprive them of their livelihood.
That said, I personally believe that technology is the way of our future, and machine translation will increasingly become a significant part of translators’ professional lives. We would perhaps be well advised to future-proof our careers by making the effort to understand enough about machine translation so that we are able to adapt and capitalise on what such technology can offer. To this end, this short article touches very briefly on the basics of machine translation: its origins and the main types of systems in use nowadays.
At the risk of oversimplification, machine translation fundamentally works through algorithms – complex sets of cascading instructions crafted by humans – telling the computer what step(s) to execute when presented with different linguistic scenarios. The phenomenon is not exactly new, as operational machine translation has been around since the 1950s. At the start, machine translation was little more than the automated matching of a word in one language (the input) with a word of similar meaning in another language (the output) – akin to a computerised look-up in a bilingual dictionary – and then stringing the output words together.
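That early word-for-word approach can be sketched in a few lines of code. This is purely an illustration, not a reconstruction of any real system, and the tiny English–French glossary is invented for the example:

```python
# A toy illustration of early word-for-word machine translation:
# look up each source word in a small bilingual glossary and string
# the results together, with no reordering and no grammar.
glossary = {"the": "le", "cat": "chat", "eats": "mange", "fish": "poisson"}

def word_for_word(sentence: str) -> str:
    # Unknown words are simply passed through unchanged.
    return " ".join(glossary.get(word, word) for word in sentence.lower().split())

print(word_for_word("The cat eats fish"))  # -> "le chat mange poisson"
```

Even this trivial sketch makes the limitation obvious: the output follows the source word order slavishly, which is exactly why such systems could not handle languages whose grammars diverge.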
Nonetheless, the approach demonstrated the technical feasibility of machine translation, and triggered a period of intense research and development in the Western world lasting more than a decade. The aspiration at the time was to quickly develop fully automatic processes capable of producing human-quality translation. However, as the pursuit of this lofty goal did not yield significant results during the intervening period, large-scale funding was subsequently withdrawn and machine translation R&D receded into the background.
It was only in the mid-1970s that interest in machine translation revived, due largely to globalisation and the ensuing demand for multilingual communication, which human translation alone could not satisfy. Then, in the 1980s, the arrival of greater computing speed and memory capacity, the increasingly available and affordable personal computer, and the rapid growth of the Internet and the World Wide Web became the jet fuel that propelled the machine translation renaissance further and higher.
In more recent times, machine translation system design may be broadly categorised into rule-based and corpus-based paradigms. As the name implies, the former group covers systems guided by linguistic rules: for each language pair of interest, an immense hierarchy of rules is written to govern how automatic conversion between the two languages proceeds. Since these rules are specific to the particular language pair, extending rule-based machine translation to each additional language pair necessitates drawing up a whole new set of rules. Not only are such endeavours extremely labour- and resource-intensive, rule-based systems can also produce stilted and awkward output. A key advantage of such systems, however, is that the conversion process is well grounded in a robust framework of linguistic knowledge.
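The flavour of the rule-based idea can be conveyed with one hand-written rule. The single rule and the three-word lexicon below are invented for illustration; real systems chain thousands of such rules for morphology, agreement and word order:

```python
# A toy sketch of rule-based machine translation: hand-crafted rules,
# specific to one language pair, govern both word choice and word order.
# Illustrative rule: English adjective+noun becomes noun+adjective,
# as in French ("black cat" -> "chat noir").
lexicon = {"the": "le", "black": "noir", "cat": "chat"}
adjectives = {"black"}

def rule_based(sentence: str) -> str:
    words = sentence.lower().split()
    out = [lexicon.get(w, w) for w in words]
    # Apply the reordering rule: swap each adjective with the noun after it.
    for i in range(len(words) - 1):
        if words[i] in adjectives:
            out[i], out[i + 1] = out[i + 1], out[i]
    return " ".join(out)

print(rule_based("the black cat"))  # -> "le chat noir"
```

The point of the sketch is the cost structure described above: every rule is written by hand for one language pair, so none of this effort transfers to, say, English–Japanese.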
In contrast, the latter group consists of systems that rely on massive corpora of aligned source-language and target-language segments – in other words, existing translations. For many years, the dominant corpus-based model was statistical machine translation, which uses the frequencies of words and phrases occurring within a defined corpus to estimate, for a given input, which candidate output is the most probable translation. In statistical terms, the higher an output’s probability, the ‘better’ it is deemed to be.
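At its core, that frequency-based selection can be shown in miniature. The handful of English–German ‘aligned’ pairs below is an invented stand-in for the millions of segments a real statistical system would be trained on:

```python
from collections import Counter

# A toy sketch of the statistical idea: count how often each source word
# was paired with each target word in an (invented) aligned corpus, then
# pick the target with the highest observed probability.
aligned_pairs = [
    ("bank", "Bank"), ("bank", "Bank"), ("bank", "Ufer"),  # financial vs river bank
    ("house", "Haus"),
]
counts = Counter(aligned_pairs)

def most_probable(source: str):
    candidates = {tgt: n for (src, tgt), n in counts.items() if src == source}
    total = sum(candidates.values())
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / total

# "Bank" wins with probability 2/3 -- even in a sentence about rivers,
# where "Ufer" would be the correct choice.
print(most_probable("bank"))
```

The ambiguous word deliberately exposes the weakness noted later in this article: the most frequent translation in a corpus is not always the appropriate one for a given context.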
In the last four years or so, though, neural machine translation has been hailed by many as the new state of the art. Its keystone is the neural network: a self-learning technique that, trained on given corpora, draws on a number of inputs to predict outputs, functioning rather like a predictive text-completion device. Although corpus-based systems tend to yield fluent and stylistically natural output, since they draw on existing authentic translations, the most frequent translation in a corpus is not always the most correct or appropriate one for a given context. Moreover, the success of corpus-based systems depends entirely on having suitable, contamination-free corpora, which are often not readily available.
These brief system descriptions by no means purport to be a crash course in machine translation. Neither do I wish to trivialise the painstaking efforts many have contributed, and are contributing, towards researching, developing and improving machine translation systems. This synopsis aims simply to provide a glimpse of what lies behind machine translation – because, even to frequent users, it is often a ‘black box’ operation. Offering some idea, however rudimentary, of the relative strengths and weaknesses of popular designs – and of the different types of errors each produces that need human correction – may, I hope, plant a seed of consideration and reconciliation among machine translation detractors.
Without a doubt, machine translation quality has improved vastly over the years – so much so that there was recently a hyperbolic claim of parity with human translation, albeit for just one language pair and within one specific genre (Hassan et al. 2018). Whether such results are reproducible more widely remains very much to be seen. For now, and for the foreseeable future, my conviction is that machine translation output still needs human intervention to ensure its quality. The proverbial ball is therefore in the court of those practitioners who are willing and able to play the game of interacting meaningfully and cooperating with the technology, so as to stake a (small) claim in the billion-dollar machine translation opportunity just around the corner.
Fung, Yuen May. 2018. Post-Editing in the Wild: An Empirical Study of Chinese-to-English Professional Translators in New Zealand. PhD thesis, School of Cultures, Languages and Linguistics, The University of Auckland, Auckland.
Hassan, Hany, et al. 2018. Achieving Human Parity on Automatic Chinese to English News Translation. Microsoft AI & Research. Available https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf (accessed May 2020)
Moorkens, Joss. 2020. “Comparative Satisfaction among Freelance and Directly-Employed Irish-Language Translators”. Translation & Interpreting 12(1): 55-73.
Sakamoto, Akiko. 2019. “Why do Many Translators Resist Post-editing? A Sociological Analysis using Bourdieu’s Concepts”. The Journal of Specialised Translation 31 (January 2019): 201-216.
technavio.com. 2019. Machine Translation Market by Application and Geography – Forecast and Analysis 2020-2024. SKU: IRTNTR40205. Snapshot available https://www.technavio.com/report/machine-translation-market-industry-analysis (accessed May 2020)
By Yuen May Fung