The incidence of computers in translation ranges from the rather unsuccessful
attempts to attain the so-called Fully Automatic High Quality Machine
Translation to the current widespread usage of translation memories. Gone are
the years of vast government spending on research attempting to “help” computers
do the translator’s job; spending now comes from individuals who invest in
expensive software in the hope that machines will, in turn, help the translator.
Are there any better ways than the ones available to share that
spending and its benefits?
Glory and Shame of Machine Translation
The idea of trying to make numbers talk like words is an old one. Thinkers like Leibniz devised a mathematical system of language representation and translation as early as the late 17th century, and even Descartes sketched out what he called a “universal language” in the form of mathematical expressions, but we can go back as far as 1661 to trace one of the first fully developed attempts to work out a mathematical model for translation.
The precocious chemist, explorer and mathematician Johannes Becher produced a numeric system that was allegedly able to
translate from Latin into German and postulated a generic mechanism that could
be extended to all vernacular languages. It consisted of some 10,000 words,
each designated by a number, and it used additional numeric values for endings and
cases, together with some basic equations. By entering “word” values into the
calculations, new numbers would come out that could be checked against a new list
in German, eventually returning a translation of the original (Freigang 2001).
It could thus be said that the concept of computer-assisted translation, or even automated translation, predates the appearance of computers by several centuries. Had Becher been able to use a computer or a calculating machine, no one would hesitate to describe his invention as the first attempt to develop an automated translation system. And there were other similar attempts, curiously enough very close in time, like the one by Athanasius Kircher in 1663, or an even earlier one by Cave Beck in 1657 (Hutchins 1986, 2:1).
The idea of such “mechanical dictionaries” experienced a revival in the early 20th century, with
the “Mechanical Brain” of the French engineer Georges Artsruni, or the
invention by the Russian Petr Trojanskij,
which were the first truly mechanical translation devices (Freigang 2001).
The heyday of many other subsequent attempts started with a famous
in order to attain the so-called “Fully Automated
High Quality Machine Translation.”
final report of the Automatic Language Processing Advisory Committee (ALPAC) is almost as famous
nowadays: machine translation is mostly
useless without human intervention in the form of editing or rewriting.
Further attempts and approaches would provide new insights into the complexity of
the machine translation question, like the initiative in the former European
Community, called Eurotra, which not only retrieved
Descartes of developing what is called
an “interlingua,” or intermediate metalanguage, but
also provided richer analytical developments while establishing the bases
for current computer
The idea of
developing an input-controlled translation method is very much associated with
the Canadian system for bilingual weather reports, Météo,
which is still working
approach, which is effectively working also
for many multinational companies in the production of their internal
multilingual paperwork, memorandums and manuals, can be very well summarised by
outlining the features of the project called KANT, for “Knowledge-based
Accurate Natural-language Translation.”
by carefully controlling the input quality of the source text. Developed by
sense of R. Jakobson
(Jakobson 2000) what takes place first: the
original is conventionally translated into another simplified “original.”
Automated translation becomes more of a by-product rather than a real
science-fiction high-brow automated translation projects more or less at a
halt, down-to-earth translation professionals
benefiting from the advantages of computers.
From MT to TM
here to speculate about the pendulum-like movement that may articulate the
relationship between translation memories and machine translation which goes
beyond a simple swap of capital initials
although it may very well have to do with the swapping of full
Translation memories (TM) may be defined as a set of software applications devised to help translators in their activity by retrieving already translated terms or segments and recycling them, or by building up tentative translations from previously translated segments that share common traits. Segments that can be reused exactly as they are stored are called “perfect matches.” Tentative translations generated from analogous segments are called “fuzzy matches.”
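The distinction between perfect and fuzzy matches can be illustrated with a minimal sketch. It assumes that the memory is a plain dictionary of source and target segments and that similarity is measured with the character-based ratio of Python's standard difflib module. The function name lookup, the 0.75 threshold and the sample segments are hypothetical choices made for illustration; commercial TM software uses its own matching algorithms.

```python
from difflib import SequenceMatcher


def lookup(segment, memory, fuzzy_threshold=0.75):
    """Return (kind, source, target, score) for the best stored match, or None.

    `memory` maps source segments to their stored translations.
    The threshold is an illustrative assumption, not an industry constant.
    """
    best = None
    for source, target in memory.items():
        # Character-based similarity between the new segment and a stored one.
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[3]:
            best = ("", source, target, score)
    if best is None or best[3] < fuzzy_threshold:
        return None  # nothing similar enough: the translator starts from scratch
    # An identical stored segment is a perfect match; anything else is fuzzy.
    kind = "perfect" if best[1] == segment else "fuzzy"
    return (kind, best[1], best[2], best[3])


if __name__ == "__main__":
    memory = {
        "Press the red button.": "Pulse el botón rojo.",
        "Close the cover.": "Cierre la tapa.",
    }
    print(lookup("Press the red button.", memory))
    print(lookup("Press the green button.", memory))
```

In the second call the stored translation still says “rojo”: a fuzzy match is only a proposal that the translator has to revise.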
Leaving aside the particular mechanics of different software, there are more than a
dozen different translation memories on the market, ranging in price from the
Translation memories are optimal tools for texts that are
highly repetitive, belong to a larger corpus of specialized texts
to be translated, present a wide specialized terminology pool and belong to
multilingual localization projects. They help to guarantee a high degree of
terminological consistency, ease massive revision processes, speed up
productivity in large localization projects and efficiently cumulate
is easy to anticipate that they do not deal well with “stylistically rich”
originals and that they impose a segment-restricted optics instead of
general-text approaches. The so-called “
matches” may induce disastrous context-related misinterpretations. Furthermore,
there has been a traditional problem of low compatibility between different TM
software packages and, in most cases, they involve an expensive investment for translators
who may need to meet widely varying customer requirements.
Let us focus on the last two problems: how can the information contained in a translation memory be shared between users of different software? Why can that be useful and when could it be desirable?
Large localization projects are often undertaken by teams of translators who are required to use the same software. Their already translated segments are uploaded into a common repository that subsequently provides possible perfect or fuzzy matches not only to the translator who uploaded them, but also to the other members of the translation team.
the advantages of sharing one’s work with other project partners, by means of
increasing the size of the commonly
Finally, all translators must use the same software and versions. Thus, a professional may end up being excluded from a project because it may not be worthwhile for him or her to invest in new software that is needed exclusively for that project. Even if it is considered a long-term investment, by the time he or she needs the same software for a new project, new incompatible versions of the program may have been released.
Among the above problems, some are strictly workflow-related (these will not be discussed here) and some others are good old translation problems. Finally, some other problems are related to software standards.
TMX and New Paradigms in File Sharing
Translation Memory eXchange language (TMX) is an SGML/XML-based markup language, which makes for a fairly easy and compatible Internet implementation. It is a standard established by LISA (Localization Industry Standards Association, www.lisa.org) that translation memory makers are increasingly integrating into the export/import capabilities of their latest versions. There are several levels of compliance with the TMX norm, ranked from 1 to 3 depending on the amount of metadata, aside from purely textual information, that the system is able to convert into TMX. It becomes a powerful exchange tool when combined with TBX (TermBase eXchange Language), its counterpart for exchanging the contents of terminological databases. Ultimately, by using TMX, translators would not have to use the same TM software in order to take part in the same localization project.
Essentially, TMX works as a text-only markup language into which aligned text (the original and its translation or translations) is exported from a translation memory. No matter which TM software is being used, as long as it provides TMX import/export capabilities, the resulting tagged text-only file can be “read” by any other TM with the same capabilities, whatever internal encoding it uses to store the information.
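As a rough illustration of this export/import round trip, the following sketch writes a few aligned segments into a minimal TMX-like document and reads them back with Python's standard xml.etree.ElementTree module. The element names (tmx, header, body, tu, tuv, seg) follow the TMX format; the function names and all header attribute values are placeholders assumed for the example, and the result has not been validated against the official DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the standard xml:lang attribute with its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def export_tmx(units, srclang="en"):
    """Serialize aligned units (dicts of language code -> text) as TMX-like text."""
    tmx = ET.Element("tmx", version="1.4")
    # Header values below are illustrative placeholders, not real tool data.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch-exporter",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-dict",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned segment
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})  # one variant per language
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def import_tmx(tmx_text):
    """Read the aligned units back, whatever tool produced the file."""
    root = ET.fromstring(tmx_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        for tu in root.iter("tu")
    ]


if __name__ == "__main__":
    units = [{"en": "Close the cover.", "es": "Cierre la tapa."}]
    text = export_tmx(units)
    print(text)
    print(import_tmx(text))
```

The point of the sketch is that the importing side only needs to understand the tagged text, not the internal storage of the program that exported it.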
This is a very general picture of how far things have evolved
these days. How much further they can go is still an open question, but what
follows is a speculation on the potential of TMX when
combined with currently existing possibilities and software already running on
the Internet. What will be said from now on, however speculative, is not simple
science fiction and, should technical and human means be provided, an
interesting field of theoretical research and practical application may unfold.
New paradigms in Internet file sharing must be considered here. In the late 1990s,
a new way of sharing information and files shook the music industry and pushed
it to the brink of bankruptcy in some cases. Programs like Napster, Gnutella, Kazaa and others allow users to share
their files them
exchange freely. Several national branches
of large music companies were forced to close or to deeply restructure their
business philosophies because of the economic damage inflicted by
peer-to-peer Internet music sharing. As a result, a United States federal court
ruling in 2001 shut down Napster’s web page and all its
activities. This was one of the most widely reported direct interventions of the
authorities in the actual practices that take place on the Internet. But it
is not music or even the major financial consequences that may be interesting with
regard to translation memories: what matters instead is the fact that a network of
independent users may share their files so easily.
A program like Napster works as follows: a user places a series of music files on
his computer within a special “share” folder. The program sends the list of
filenames (song titles) to the server, which indexes it. Then the user sends a
query about any song he
on a one-to-one
basis. The bulk of the data (the comparatively huge music file) is only transmitted
in the final stage. All that happens before that is just listings of short
textual units (song titles) going to and fro.
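The division of labour in this centralized model can be sketched in a few lines. The class and method names below (IndexServer, Peer, announce, send_file) are hypothetical and the whole exchange runs in memory, with no networking, so it should be read as an illustration of the idea rather than as a description of how Napster was actually implemented.

```python
class IndexServer:
    """Central index: it stores titles and peer names, never the files."""

    def __init__(self):
        self.index = {}  # title -> set of names of peers sharing it

    def register(self, peer_name, titles):
        for title in titles:
            self.index.setdefault(title, set()).add(peer_name)

    def search(self, query):
        # Only short strings are compared and returned at this stage.
        q = query.lower()
        return {
            title: sorted(peers)
            for title, peers in self.index.items()
            if q in title.lower()
        }


class Peer:
    """A user whose "share" folder is modelled as a dict of title -> bytes."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared

    def announce(self, server):
        # Send the list of filenames to the server, which indexes it.
        server.register(self.name, list(self.shared))

    def send_file(self, title):
        # Final stage: the large file travels directly from peer to peer.
        return self.shared[title]


if __name__ == "__main__":
    server = IndexServer()
    provider = Peer("provider", {"Blue Moon.mp3": b"...audio bytes..."})
    provider.announce(server)
    print(server.search("blue"))
    print(len(provider.send_file("Blue Moon.mp3")))
```

The server never holds a single file: it only answers the question of who has what, which is why the traffic it handles stays small.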
The Gnutella system works in a slightly different way: it is more of a
and the provider get in touch directly, without a “middleman” this time, and
the file is transmitted, again on a one-to-one basis.
The question arising from this seems both obvious and compelling: Can a peer-to-peer exchange system be developed for translation memory sharing?
The use of TMX as a unifying standard would provide common
ground for the exchange. Once a translation project is finished, translators
usually return their final version to their client, while they keep the
resulting translation memory as a by-product of their work. A program would convert the contents of those
memories into TMX-tagged multilingual text and an
“exchanger” would expose the memories to the World Wide Web by placing them
into a share area open to public access.
The repetition of this action by several users would create a dense and sprawling network of interconnected computers, as happens with Napster and Gnutella, which could potentially become the largest pool of aligned text (translations and originals) ever assembled.
When a new translation project starts, users would connect to the network and their
“memory exchanger” would send queries for similar segments to the whole body of participants.
Slowly, in a way similar to that of the basic translation memories themselves,
pre-translated replies would travel back to the
some in the form of perfect matches, most of them in the form of fuzzy matches. This
would result in a pre-translated draft, whose production might require
the computer to be left working overnight (depending on factors such as length,
the actual degree of matches found, the level of requirements set by the user, etc.).
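The speculative scenario above can be mocked up as follows. This is only a thought experiment in code: the names MemoryPeer and pretranslate, the 0.75 threshold and the use of difflib for similarity are all assumptions, the peers are simple in-memory objects, and a real system would need networking, TMX parsing and some way of weighing the reliability of each reply.

```python
from difflib import SequenceMatcher


class MemoryPeer:
    """A participant exposing a translation memory as (source, target) pairs."""

    def __init__(self, name, units):
        self.name = name
        self.units = units

    def answer(self, segment, threshold):
        """Reply with every stored unit that is similar enough to the query."""
        replies = []
        for source, target in self.units:
            score = SequenceMatcher(None, segment, source).ratio()
            if score >= threshold:
                # The peer name is kept so the receiver knows where a proposal came from.
                replies.append({"peer": self.name, "source": source,
                                "target": target, "score": score})
        return replies


def pretranslate(segments, peers, threshold=0.75):
    """Query every peer for every segment and keep the best reply, if any."""
    draft = []
    for segment in segments:
        replies = [r for peer in peers for r in peer.answer(segment, threshold)]
        best = max(replies, key=lambda r: r["score"], default=None)
        draft.append((segment, best))  # None means no usable match was found
    return draft


if __name__ == "__main__":
    peers = [
        MemoryPeer("peer-a", [("Close the cover.", "Cierre la tapa.")]),
        MemoryPeer("peer-b", [("Press the red button.", "Pulse el botón rojo.")]),
    ]
    for segment, best in pretranslate(
            ["Close the cover.", "Press the green button."], peers):
        print(segment, "->", best)
```

Keeping the name of the answering peer with each proposal is a deliberate choice in this sketch: it is the minimum a translator would need in order to decide how far an anonymous contribution can be trusted.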
There are of course many questions arising from this, most of them far beyond the scope of this paper, and not a few immediate drawbacks. To start with, all the drawbacks of conventional, non-peer-to-peer sharing would still be there, unsolved. There would also be a higher risk of wrong translations coming from anonymous partners: received equivalences would need to be examined more critically, thus making the revision process even more demanding. The obviously wider variety of topics would add to the confusion, and metadata describing the subject field of each segment would be indispensable for the machine to “trust” one potential translation or another. Bandwidth requirements are unknown. There would also be legal and copyright issues concerning translated text as such versus equivalent segments, whose ownership is determined differently depending on national legislation.
With technical, legal and translational problems ahead, the possibility of implementing some peer-to-peer device for translation memory sharing appears both as a challenging enterprise and as a promising area of research. As a good old friend of mine says, “machines don’t have intuition, but they have memories” (Fustegueres 2001, my translation). I would add: maybe we can help them share.
See Hutchins (1996) for an enlightening description of the most frequent misinterpretations and misleading circumstances related to the ALPAC report.
Aligned texts are the main asset of a translation memory, and companies and institutions usually devote many resources to aligning texts that were translated before the implementation of TM software, in order to make subsequent translation work more productive.
Abaitua, Joseba. “TMX format,” 1998,
Davis, Paul C. Stone Soup Translation: The Linked Automata
Model. Doctoral dissertation.
Freigang, Karl Heinz. “Automation of Translation: Past, Presence, and Future” in Revista Tradumatica No. 0 (2001), http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm
Gow, Francie. Metrics for evaluating Translation Memory Software. Unpublished Thesis.
Hutchins, John. “The precursors and the
pioneers,” Machine Translation, past,
present and future.
- “ALPAC: the
MT News International Vol. 14, June
1996, pp. 9-12.
Jakobson, Roman. “On Linguistic Aspects of Translation.” In Baker, M., and Venuti, L., eds. The
Translation Studies Reader, 113-118.
Nyberg, Eric; Mitamura, Teruko. “The KANT system:
fast, accurate, high-quality translation in practical domains,” Proceedings of Coling
Sanchez-Gijon, Pilar. “Cataleg de sistemes de memories de traduccio,” Revista Tradumatica, No. 0 (2001).
SDL International. An Introduction to Computer Aided-Translation, http://www.sdl.com/products and http://tc.eserver.org/18490.html
Several authors. “CAT fight,” Proz, The Translators Workplace, http://www.proz.com/?sp=cat/compare
Fustegueres, Silvia. “Qui te por de les memories de traduccio?” Revista Tradumatica, No. 0 (2001).
Zerfass, Angelika. “Evaluating Translation Memory Systems,” First International Workshop on Language Resources for Translation Work and Research, Gran Canaria, 2002.