Choose your own translation future
- Created on 14 January 2013
- Written by Jaap van der Meer
Technology arrived late in the translation services sector. Now that it has arrived, it is bound to change everything. In the not-too-distant future everyone in the world will be able to speak his or her own language and everyone else will understand. We are entering the Convergence era: translation will be a utility embedded in every app, device, signboard and screen.
Businesses will prosper by finding new customers in new markets. Governments and citizens will connect and communicate easily. Consumers will become world-wise, talking to everyone everywhere as if language barriers never existed.
Don’t get me wrong. It will not be perfect, but it will open doors and break down barriers. And it will give a boost to the translation industry, which will be chartered to constantly improve the technology and fill the gaps in global communications.
Is this picture too rosy? Not if you believe, as we do, in the power of translation data. Translation data is the fuel of machine translation technology. Data powers the engines. The engines may never match human language competence, but they will be good enough to help us converse as we see fit in languages we have never spoken and may never speak. Machine translation, according to Nicholas Ostler, will be the new lingua franca.
This is a vision that frightens many insiders in the translation industry. Machine translation was experimented with, tried and tested for a long time, but it never passed the test of usefulness. Automation of translation was believed to be a utopia, at least until the vox populi revolution spoke and millions of people started clicking the automatic translate button on their search pages. Even though the quality is often bad, even laughable, people simply like that the translation is under their control and delivered in real time. It’s a sign of the times. Users take charge and drive change.
Entering the Convergence era
As the evolution diagram below shows, the translation industry has undergone a paradigm shift every decade since 1980, but none was as big as the one we are facing now: the Convergence.
The volume of content is exploding to zettabytes (trillions of gigabytes) of information that can be relevant to billions of new users who click to translate as much as they like. While we make this journey from the 20th century export mentality to the 21st century’s open global society, the mix of language pairs will be shifting from today’s 7 source and 60 target languages to 200 source and 200 target languages in the next ten years. It is utterly clear that a human-driven translation process alone will not suffice in this new era.
In the current phase—the Integration era—enterprises and institutions are busy releasing the translation function from its isolated position. The focus is on integrating translation in enterprise applications such as content management systems. This will help organizations to scale up and translate a lot more than just the usual documents, instructions, brochures and software.
But the pressure will keep building to translate more and more content faster or even translate it in real-time. This opens up tremendous opportunities for innovators to seize the convergence instrument and offer solutions that did not exist before. (See the Agents of Change: Insiders and Invaders videos.)
Two types of convergence
We highlight two interconnected forms of convergence: pure technology convergence and functional convergence. Technology convergence means combining two or more technologies to create a new compelling product or service offering. Functional convergence means combining functions to create a new solution.
The best example of technology and functional convergence in our daily life is the mobile phone. It has become a camera, a PDA, a navigation tool and so much more, with thousands of new apps turning this simple handheld device (called a ‘Handy’ in German) into an indispensable, even life-saving, extension of our body.
In the physical world, the emergence of supermarkets was a form of convergence. The combined offering of coffee and music by Starbucks is also a good example of convergence. In the digital world functional convergence often has a give-and-take dimension: the user becomes part of the supply chain. Examples of this are restaurant review websites where users are requested to give ratings and share reviews of the restaurants. The service is free. The owner of the site makes money through advertisements.
More innovative examples of functional convergence are location-based apps (another form of localization). The user—often without knowing it—transmits his or her exact location and receives perfectly matching offers from a shop or restaurant in the neighborhood or an invitation to meet a friend who happens to be walking on the same street.
Convergence in the translation industry
We can start imagining what convergence can mean to the translation industry. In fact, convergence has already started, across technologies and across functions. We have seen the first demonstrations of integrated speech and machine translation technology. Imagine what happens if the technologists get this to work really well. Tapping tiny keys on your mobile will no longer be necessary: speech input in one language, speech output in another.
Of course, the best example of functional convergence in the translation industry is the combination of automatic translation with search. This innovation set off the vox populi revolution I mentioned above. Millions of end-users started clicking every day for real-time translations. They don’t pay, unless you count their viewing of advertisements on search pages as a form of payment. The owners of search engines then decided to extend the service to professional translators.
The business-model convergence went a step further: in return for sharing their translation data (translation memories), industry professionals received customized, higher-quality machine translations. Another recent example of functional convergence in the translation sector is Duolingo, an online gamified language training site. It’s free, but users help translate sentences matched to their skill level. In this way they return a service while feeding in translation data that helps improve the platform.
In the next ten years we will see numerous new examples of converging functions and technologies. Sometimes this convergence will address just one language pair, domain or market niche. Sometimes it will be applicable on a much wider scale. Together, this convergence is changing the translation industry completely. Translation will quickly become a utility embedded in everything we do. It will be as ubiquitous as electricity and the Internet.
More and more, it will be considered a basic necessity for humankind. Language communities not yet connected to this translation utility will make a special effort to join it by aggregating the required translation data and sharing it. This is what we call the Viral Effect: it accelerates the spread of language pairs and domains and drives the continuous performance improvement of the translation utility.
Crowd, Cloud and Big Data
Other trends that play along in the Convergence era are Crowd, Cloud and Big Data. The Crowd is part and parcel of functional convergence. Duolingo needs hundreds of thousands of users to make the platform really work well, because it is the voting on the best translations that will improve the overall performance of the system. The Cloud is the natural infrastructure environment to connect with the Crowd and to reach the required scalability and efficiency.
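As a toy illustration of this crowd mechanism, aggregating user votes over candidate translations can be as simple as a majority count. The function and sample data below are hypothetical, not Duolingo's actual algorithm:

```python
from collections import Counter

def best_translation(votes):
    """Return the candidate translation with the most crowd votes,
    together with its vote count."""
    tally = Counter(votes)  # one entry per user vote
    return tally.most_common(1)[0]

# Three users voted on candidate translations of the same sentence.
votes = [
    "Where is the station?",
    "Where is the train station?",
    "Where is the train station?",
]
print(best_translation(votes))  # ('Where is the train station?', 2)
```

Real platforms weight votes by user skill level and agreement history; a plain majority is only the simplest possible aggregation.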
Many innovative translation solutions will be characterized as SaaS (Software-as-a-Service), DaaS (Data-as-a-Service), IaaS (Infrastructure-as-a-Service) or PaaS (Platform-as-a-Service), all variants of Cloud-based solutions. But behind the Crowd and the Cloud is the secret power of Big Data, the biggest trend of all. When IBM’s Watson beat the best Jeopardy players in 2011, it was a milestone in natural language processing. It proved that a computer can resolve ambiguity and understand jokes and metaphors, as long as it is fed enough data.
The importance of Big Data for the translation industry should not be underestimated. Big Data will push the performance of automated translation forward. Big Data will address challenges in many different areas of natural language processing, including machine translation. The computer will be able to run automatic semantic clustering and genre identification processes, meaning that the computer will recognize the industry domain (for instance: medical and radiology) and the type of content (for instance: instruction text or patent application).
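A naive sketch of such domain identification can use nothing more than keyword overlap. The domains and keyword sets below are invented for illustration; real systems train statistical classifiers on large corpora:

```python
def identify_domain(text, domain_keywords):
    """Score a text against per-domain keyword sets and return the
    best-matching domain name."""
    tokens = set(text.lower().split())
    scores = {domain: len(tokens & keywords)
              for domain, keywords in domain_keywords.items()}
    return max(scores, key=scores.get)

domains = {
    "medical": {"patient", "dosage", "radiology", "scan"},
    "legal":   {"patent", "claim", "court", "plaintiff"},
}
print(identify_domain("The radiology scan showed the patient improved", domains))
# medical
```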
This is vital for the continuous improvement and customization of machine translation technology. Big Data technologies are becoming crucial because modern machine translation systems consume ever more parallel data, and we are reaching the point where processing such volumes with traditional database management techniques is no longer feasible. The computer will also be able to do terminology mining much better if it gets more data. It will identify synonyms, related terms, neologisms and jargon, and automatically generate syntactic classifications using parallel processing tools. Plain statistical translation models are evolving into hybrid models with hierarchical (syntax- or alignment-based) trees, allowing machine translation engines to perform long-range reordering and produce more fluent and correct translations, especially for more distant language pairs.
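As a toy stand-in for the statistical word alignment behind such terminology mining, candidate term pairs can be extracted from aligned segments by simple co-occurrence counting. The tiny corpus below is invented; production systems use proper alignment models over millions of segments:

```python
from collections import Counter
from itertools import product

def mine_term_pairs(parallel_segments, min_count=2):
    """Count how often each source/target word pair co-occurs in
    aligned segments and keep the frequent pairs as term candidates."""
    cooc = Counter()
    for src, tgt in parallel_segments:
        for pair in product(src.lower().split(), tgt.lower().split()):
            cooc[pair] += 1
    return [(pair, n) for pair, n in cooc.most_common() if n >= min_count]

# Tiny English-Dutch parallel corpus (invented for illustration).
corpus = [
    ("patent application", "octrooiaanvraag indienen"),
    ("file a patent", "een octrooiaanvraag"),
]
print(mine_term_pairs(corpus))  # [(('patent', 'octrooiaanvraag'), 2)]
```

The pair that recurs across segments surfaces as a term candidate, while one-off co-occurrences are filtered out; more data makes this signal sharper, which is exactly the point made above.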
Translation support matching the new content mix
In the Convergence era, the mix of content to be translated is shifting further away from documents and software releases to bits and pieces of text, voice and video published on multiple screens. The end-user, citizen or patient will be in control—even more than today—and they will drive a continuous stream of translation of official (corporate, public, legislation), social, shared, earned and also private information.
Translation memory software fits very well with updates of static documentation pushed by publishers, but it will not be very helpful for translating dynamic content pulled by users. Machine translation technology will mature quickly and take over as the primary tool of the translation service sector. New features will be added to MT platforms allowing professional users to add data (customer-specific or product-specific translation memories, glossaries and target-language texts) to train and customize the engine almost in real time.
This self-service, real-time training of MT engines may be applied to every single job. Personalization of MT is a far cry from the costly and lengthy process of developing an MT engine for a generic language pair that we were used to. It will drive the need for ever larger volumes of translation memory data. For every new job, translators will be looking for matching data to fine-tune the engine. The need for data will be insatiable.
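A minimal sketch of such job-specific data selection ranks TM segments by word overlap between their source side and the new job text. The Jaccard similarity used here is a placeholder; real engines use fuzzy matching or n-gram based data selection, and the segments are invented:

```python
def select_training_data(job_text, tm_segments, top_n=2):
    """Rank (source, target) TM segments by similarity of the source
    side to the job text and return the top_n closest matches."""
    job_tokens = set(job_text.lower().split())

    def score(segment):
        src_tokens = set(segment[0].lower().split())
        union = job_tokens | src_tokens
        return len(job_tokens & src_tokens) / len(union) if union else 0.0

    return sorted(tm_segments, key=score, reverse=True)[:top_n]

# Invented English-Dutch TM segments.
tm = [
    ("remove the battery from the camera", "verwijder de batterij uit de camera"),
    ("terms and conditions apply", "algemene voorwaarden van toepassing"),
    ("insert the memory card", "plaats de geheugenkaart"),
]
matches = select_training_data("insert the battery into the camera", tm)
print(matches[0][0])  # remove the battery from the camera
```

The closest segments would then be fed to the engine as job-specific training or tuning material, which is why the hunger for matching data keeps growing.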
So, where does that leave the entrepreneurs in the translation industry—the buyers and providers of translation?
Planning for an uncertain future
In 2010 TAUS organized a series of brainstorming sessions with translation buyer and provider executives in Copenhagen and Portland (OR), following the scenario-based planning methodology. The aim was to plan for an uncertain future: to minimize crisis-driven change and instead pursue opportunity-driven change. The participants agreed that certain drivers were indisputable (the content explosion, the shift to multimedia and mobile media, and the trend toward real-time delivery), but they were uncertain about the answers to three questions:
1. Will machine translation take a big role in the translation industry or not?
2. Do we have to fear that translation will become a free-for-all service?
3. Will the closed (competitive) or the open (collaborative) business models prevail?
Two of the three questions have been answered in the last couple of years. Yes, MT will play a major role in the translation industry. No, translation will not be free. There is a lot of elasticity in translation pricing, but somehow users always pay for translation. However, the third question is still haunting us. We have not seen clear indicators yet whether closed models or open models will prevail. Both seem to function very well.
Open or closed translation futures
The future of the translation industry could be closed (as it is more or less today) or it could be open and collaborative. In the closed translation future scenario, a few companies will have aggregated all of the world’s translation data needed to facilitate fast and efficient translation of the world’s information in 40,000 or more language pairs. Large and small translation operators, including corporate buyers, governments and institutions, will depend on the few data owners to keep their translation engines tuned for every job.
In today’s translation world, the translation memories operators own or manage for their customers may be sufficient to keep their translation operations running efficiently. But in the Convergence era, it will be harder to predict which content, in which domain or language pair, will need translation. New data will always be needed to make new translations possible.
In the open translation future scenario, data is shared in collaborative platforms. All translation operators have access to the data on an equal basis and may use the data to leverage and to develop derivative work, i.e., new machine translation engines. In the open translation future scenario, industry stakeholders agree on common interfaces to connect content, technologies and platforms to ensure a frictionless exchange of translation jobs and data. In the open translation future scenario, industry stakeholders agree on common metrics and benchmarking to measure and compare the performance of automatic translation engines and to track progress.
Both scenarios could be true. It is hard to tell today which one has a better chance to win out over the other. In both scenarios we see opportunities for growth. But unless you have a fairly good chance to own all the data that you possibly need in your translation future, your growth opportunities will be much greater in the open translation future scenario.
Fork in the road
In the coming two years, more than ever before, translation buyers and providers will have a decision to make: to open up or not; to collaborate and share or not. Rather than being taken by surprise, it is wise to make a conscious decision about your own translation future. We are at a fork in the road. Going one way or the other can make a big difference to the success and growth of your business.
Choosing the open translation future scenario means openly sharing your translation memories and convincing your customers and collaborators to do the same. Translation data, other than full translation memories, cannot easily be used to reconstruct the original individual source or target language documents. We should treat translation data the way the medical industry treats human genome data.
Every life sciences company, every university, in fact everyone in the world, has access to the descriptions of the roughly 3 billion chemical base pairs that constitute human DNA. Every company can use human DNA data to develop new medicines and new technologies. This stimulates innovation and growth and helps human civilization. Of course, if you choose to share your translation data, you remain free to withhold confidential data or non-released product information.
Choosing the open translation future scenario means collaborating in translation quality benchmarking and industry metrics. In today’s translation world, every operator has its own way of evaluating translation quality. We have no way to compare and benchmark quality with peers in the industry. To scale up and prepare for growth in the Convergence era, we need to be able to measure the performance of MT engines, as well as track and compare their progress across domains, language pairs and content types.
We need to be able to establish best practices—on an industry-wide scale—when and when not to use MT technology. We need to have industry agreement on acceptable scores, ratings and evaluation techniques. If we don’t have this, it will be harder to meet market expectations and scale up.
Force and counterforce
Finally, if the translation future still looks frightening to you, relax, because every force has a counterforce. The ubiquitous availability of imperfect automated translation will also drive growth in demand for high-quality (non-automated) translation, transcreation and personalization, where old-fashioned human language skills are unbeatable.
The future of translation looks good. It is your choice where you want to be.
References and further reading
- This article summarizes some of the analyses in the TAUS Translation Technology Landscape report (70 pages). The full report will be available at the end of January 2013.
- See videos of the Agents of Change: Insiders and Invaders sessions at the TAUS User Conference 2012 in Seattle.
- TAUS Planning for an Uncertain Future, report available from TAUS web site, October 2010.
- TAUS Dynamic Quality Framework and benchmarking platform. See knowledge base and tools on the TAUS Labs web site.
- MT as the new Lingua Franca, a review of the book “The Last Lingua Franca” by Nicholas Ostler, article on TAUS web site.
- Clarifying Copyright on Translation Data, article on TAUS web site.