Abstract

The practice of using computers in translation *** Vague: “computers in translation.” *** ranges from the rather unsuccessful attempts to attain so-called Fully Automatic High Quality Machine Translation to the current widespread use of translation memories. Gone are the years of vast government spending on research attempting to “help” computers do the translator’s job; spending now comes from individuals who invest in expensive software in the hope of creating machines that help the translator. Are there better ways than the ones currently available to share that spending and its benefits? *** You may consider distinguishing what type of “benefits” you mean ***

 

The Glory and The Shame of Machine Translation

The idea of trying to make numbers talk like words is an old one. 1 While thinkers like Leibniz had already devised a mathematical system of language representation and translation as early as the late 17th century *** A citation is required here. ***, and even Descartes had sketched out what he called a “universal language” in the form of mathematical expressions *** Citation required ***, we can go back as far as 1661 to trace one of the first fully developed attempts to work out a mathematical model for translation.

 

That year, the precocious chemist, explorer and mathematician Johannes Becher produced a numeric system that was allegedly able to translate from Latin into German, and he postulated a generic mechanism that could be extended to all vernacular languages. *** It is unclear as to whether Becher or the numeric system did the “postulating.” *** The system consisted of some 10,000 words, *** You may want to include some of these words as an example. *** each designated by a number, *** One number for each word? It is unclear. *** and it used additional numeric values for endings and cases, together with some basic equations. *** Examples of some equations? *** By entering “word” values into the calculations, new numbers would come out that could be checked against a list in German, eventually returning a translation of the original (Freigang 2001).
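
To make the mechanism concrete, here is a purely illustrative toy sketch in Python of such a numeric dictionary. The word codes, ending values and the addition rule below are invented for demonstration; Becher's actual lists and equations are not reproduced in the source.

    # Toy illustration of a Becher-style numeric dictionary (all values invented).
    LATIN_WORD_CODES = {"ama-": 101, "puella": 202}       # hypothetical word numbers
    ENDING_CODES = {"-t (3rd sing.)": 3, "-m (acc.)": 7}   # hypothetical ending values
    GERMAN_LIST = {104: "er/sie liebt", 209: "das Maedchen (Akk.)"}

    def becher_translate(word, ending):
        # The "basic equation": combine the word value with the ending value,
        # then check the result against the German list.
        value = LATIN_WORD_CODES[word] + ENDING_CODES[ending]
        return GERMAN_LIST.get(value, "<no entry>")

    print(becher_translate("ama-", "-t (3rd sing.)"))   # -> er/sie liebt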

 

The concept of computer-assisted translation, or even automated translation, thus dates back several centuries before the appearance of computers. Had Becher been able to use a computer or a calculating machine, no one would hesitate to call his invention the first attempt to develop an automated translation system. *** You would most certainly need a citation here to back up your opinion that no one would doubt Becher. *** Moreover, there were other similar attempts, curiously enough very close in time, like the one by Athanasius Kircher in 1663, or an even earlier one by Cave Beck in 1657 *** These examples may require an endnote rather than a citation. *** (Hutchins 1986, 2:1).

 

Such mechanical dictionaries experienced a brief revival in the early 20th century with the “Mechanical Brain” of the French engineer Georges Artsruni and the invention of the Russian Petr Trojanskij. These were the first truly mechanical translation devices (Freigang 2001). *** You seem to give more credit to the attempts at translation than to the actual models. You don’t give enough credit to the “Mechanical Brain.” Please extend this paragraph. ***

 

The heyday of many subsequent attempts started with a famous (or infamous) memorandum addressed to the Rockefeller Foundation in 1949 by Warren Weaver. His well-known mathematical model of communication, developed together with Claude Shannon, would consolidate *** Not sure if “consolidate” (to merge, to strengthen) is the correct word to use, here. *** the idea of translation as a mere *** How is the question a “mere” one? *** question of “breaking the code.” This model would initiate two decades of frantic activity and huge investment aimed at attaining so-called “Fully Automated High Quality Machine Translation.”

 

The final report of the Automatic Language Processing Advisory Committee (ALPAC) *** You have this in your reference page, but this is not a proper citation. *** is almost as famous, or as infamous [1]: the year 1966 meant the end of government spending on machine translation research, and it established a certainty that lasts to this day, namely that machine translation is mostly useless without human intervention in the form of editing or rewriting. *** Are the words after the colon a quote or paraphrasing? This paragraph is unclear and not cited properly. ***

 

However, further attempts and approaches would provide new insights into the complexity of the machine translation question, such as the former European Community’s initiative in creating Eurotra, *** Endnote? *** which not only borrowed Descartes’s original idea of developing what is called an “interlingua,” or intermediate meta-language, but also provided richer analytical developments *** Example? *** while establishing the bases for current computer-assisted translation techniques.

 

An input-controlled translation method is very much associated with Météo, the Canadian system for bilingual weather reports, which is still in operation today. This approach, which also works for many multinational companies in the production of their internal multilingual paperwork, memoranda and manuals, can be well summarised by outlining the features of KANT, a project for “Knowledge-based Accurate Natural-language Translation.”

 

KANT works by carefully controlling the input quality of the source text. Developed at Carnegie Mellon University, the system monitors ambiguities in the original text, returns to the writer those segments considered incorrect by the machine’s internal grammar, and only when the text is considered “understandable” by the machine does automated translation take place *** It is still unclear, up until this point, what exactly is being translated. *** (Nyberg and Mitamura 1992). However, what takes place first is a fully human intralingual translation, in the sense Jakobson (113-118) recognizes: the original is translated into another, simplified “original.” Automated translation becomes a by-product rather than a real translation.
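
As an illustration of the controlled-input principle only (this is not KANT's actual grammar or vocabulary, which are not reproduced here), such a pre-translation check can be sketched in a few lines of Python: segments that trip the checker go back to the writer, and only segments that pass would be handed to the automatic translation stage.

    # Sketch of a controlled-language pre-check; the trigger words and the
    # length limit are invented placeholders, not KANT's real rules.
    AMBIGUOUS_WORDS = {"it", "this", "that", "may"}
    MAX_WORDS = 20

    def check_segment(segment):
        words = segment.lower().rstrip(".?!").split()
        problems = []
        if len(words) > MAX_WORDS:
            problems.append("segment too long")
        problems += ["ambiguous word: " + w for w in words if w in AMBIGUOUS_WORDS]
        return problems   # an empty list means the segment may go on to MT

    print(check_segment("This may damage the unit."))
    # -> ['ambiguous word: this', 'ambiguous word: may']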

 

But with high-brow, science-fiction automated translation projects more or less at a halt, *** Why are they coming to a halt? *** down-to-earth translation professionals have started benefiting from the advantages of modern-day computers. Computer Assisted Translation (CAT) *** CAT? Is this not the title of your article? It should be recognized here, now. *** is “the broadest term used to describe an area of computer technology applications that automates or assists the act of translating text from one language to another” (SDL International). Computer technologies covered by this definition include, but are not limited to: word processors, electronic dictionaries, terminological data banks, BBS and discussion groups, optical character recognition, spell and grammar checkers, e-mail, WWW documentation, desktop publishing, speech recognition, specific localization tools, and translation memories. *** It seems that the former list contains programs and systems that would be placed better at the introduction of your document—again, adding more focus to it. ***

 

From MT to TM

I intend here to speculate about the pendulum-like movement that may articulate the relationship between translation memories and machine translation, a relationship that goes beyond a simple swap of capital initials (from MT to TM), although it may very well have to do with the swapping of fully translated sentences. *** Also, watch your point-of-view. You suddenly jump from third person to first person. Be consistent. ***

 

Translation memories (TM) may be defined as a set of software applications devised to help translators in their activity by retrieving already translated terms or segments and recycling them, or by building up tentative translations from previously translated segments that share common traits. Perfectly duplicable segments are called “perfect matches”; tentative translations generated from analogous segments are called “fuzzy matches.” *** There should be examples or citations to support your points. *** 2
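
A minimal sketch may help illustrate the distinction. It assumes a simple character-similarity ratio and an arbitrary 75% threshold; commercial systems use their own, more elaborate scoring.

    # Illustrative perfect vs. fuzzy matching against a tiny memory.
    from difflib import SequenceMatcher

    memory = {
        "Press the red button to stop the machine.":
            "Pulse el boton rojo para detener la maquina.",
    }

    def lookup(segment, threshold=0.75):
        best_score, best_target = 0.0, None
        for source, target in memory.items():
            score = SequenceMatcher(None, segment, source).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score == 1.0:
            return "perfect match", best_target
        if best_score >= threshold:
            return "fuzzy match ({:.0%})".format(best_score), best_target
        return "no match", None

    print(lookup("Press the red button to stop the machine."))    # perfect match
    print(lookup("Press the green button to stop the machine."))  # fuzzy match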

 

Leaving aside the particular mechanics of different software, there are more than a dozen different translation memory products on the market, ranging in price from the twenty-dollar, amateurish “Alair II” to the highly professional, corporate and expensive 5,000-dollar “Alchemy Catalyst,” or other suites like “Trados,” a de facto standard, and its competitors “Déjà Vu,” “SDLX” or “Transit.” *** An endnote may be required for readers who may want to know more about the aforementioned programs. ***

 

TMs are optimal tools for texts that are highly repetitive, belong to a larger corpus of specialized texts to be translated, present a wide pool of specialized terminology, and belong to multilingual localization projects. They help to guarantee a high degree of terminological consistency, ease massive revision processes, speed up productivity in large localization projects and efficiently accumulate topic-related formulaic expressions. *** You seem to be losing more of your focus; keep to your topic. ***

 

However, it is easy *** Why is it “easy”? *** to anticipate that they do not deal well with “stylistically rich” originals and that they impose a segment-restricted perspective instead of a general-text approach. So-called “perfect matches” may induce disastrous context-related misinterpretations. Furthermore, there has been a traditional problem of low compatibility between different TM software packages and, in most cases, they involve an expensive investment for translators who may have to face very diverse customer requirements. *** Where are your citations? ***

 

Let us focus on the last two problems: how can the information contained in a translation memory be shared between users of different software, and how can this be useful, both practically and financially? *** You should bold face and capitalize (where appropriate) the previous lines and make them a subheading. ***

 

Large localization projects are often undertaken by teams of translators who are required to use the same software. Their already translated segments are uploaded into a common repository that subsequently provides perfect or fuzzy matches not only to the translator who uploaded them, but also to the other members of the translation team. *** Example? ***

 

The advantages of sharing one’s work with other project partners are clear and appealing: the commonly developed repository of paired sentences grows, and with it the overall amount of translated text that can be recycled. *** Why and how are they clear and appealing? *** Nevertheless, there are a few serious drawbacks to the current practice of TM sharing: translators may be faced with non-unanimous solutions; revisions and eventual changes affect other translators’ work; the search for group consensus may slow down the process; and while there is a heavier workload for veteran members, newer partners may be given only left-over material to work with.

 

Finally, all translators would have to use the same software type and version. Thus, a professional may end up being excluded from a project because it may not be worthwhile for him or her to invest in a particular piece of software needed exclusively for that project. Even if the purchase is regarded as a long-term investment, by the time he or she needs the same software for a new project, new and incompatible versions of the program may have been released. *** This paragraph seems to have lost some of the formality of language found in the rest of the text. ***

 

Among the above problems, some are strictly workflow-related (these will not be discussed here) and others are good old translation problems. Finally, some relate to software standards; for these, TMX offers a general solution, discussed in the next section.

 

TMX and New Paradigms in File Sharing

Translation Memory eXchange (TMX) provides a general solution for software standards that is becoming increasingly accepted and integrated by software developers. TMX is an SGML/XML-based *** What are these? *** markup language, which allows a fairly easy and compatible Internet implementation. It is a standard established by LISA (Localization Industry Standards Association, www.lisa.org *** Cite this properly. ***) that is increasingly being integrated by translation memory makers within the export/import capabilities of their latest versions. There are several levels of compliance with the TMX norm, ranging from 1 to 3 depending on the amount of metadata, beyond purely textual information, which the system is able to convert into TMX. TMX becomes a powerful exchange tool when combined with TermBase eXchange (TBX), its counterpart for exchanging terminological database contents. Ultimately, by using TMX, translators would not have to use the same TM software in order to participate in the same localization project.

 

Essentially, TMX works as a text-only markup language into which aligned text (an original and its translation or translations) is exported from a translation memory [2]. No matter which TM software is being used, as long as it offers TMX import/export capabilities, the resulting tagged, text-only file can be “read” by any other TM that shares the same capabilities, regardless of the internal codification system each uses to store the information.
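
By way of illustration, the hand-written fragment below is a deliberately simplified TMX unit (a real file carries a fuller header and, at higher compliance levels, more metadata); because it is plain text-based markup, any tool with an ordinary XML parser can read it.

    # Illustrative, simplified TMX fragment parsed with Python's standard library.
    import xml.etree.ElementTree as ET

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
    TMX_SAMPLE = """<tmx version="1.4">
      <header srclang="en" datatype="plaintext" segtype="sentence"
              creationtool="example" creationtoolversion="0.1"
              adminlang="en" o-tmf="example"/>
      <body>
        <tu>
          <tuv xml:lang="en"><seg>Insert the battery.</seg></tuv>
          <tuv xml:lang="es"><seg>Inserte la bateria.</seg></tuv>
        </tu>
      </body>
    </tmx>"""

    root = ET.fromstring(TMX_SAMPLE)
    for tu in root.iter("tu"):
        pair = {tuv.get(XML_LANG): tuv.find("seg").text for tuv in tu.iter("tuv")}
        print(pair)   # -> {'en': 'Insert the battery.', 'es': 'Inserte la bateria.'}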

 

This is a very general picture of how far things have evolved to date. How much further they can go is still an open question, but what follows is a speculation on the potential of TMX when combined with existing possibilities and software already running on the Internet. What will be said from now on, however speculative, is not simple science fiction and, should technical and human means be provided, an interesting field of theoretical research and practical application may unfold before us.

 

The new paradigms in Internet file sharing must be considered here. In the late 1990s a new wave of sharing information and files shook the music industry and pushed it, in some cases, to the fringe of bankruptcy. Programs like Napster, Gnutella, Kazaa and others allowed users to share their files, including music, and to exchange them freely. Several national branches of large music companies were forced to close or to deeply restructure their business philosophies because of the economic breakdown inflicted by peer-to-peer Internet music sharing. As a result, a court ruling in 2001 led to the shutting down of Napster’s service. This was one of the most widely echoed direct interventions by the authorities in actual Internet practices. 3 But it is not the music, or even the major financial consequences, that is interesting with regard to translation memories *** “Financial consequences” should be a major concern, as it is a major point stated in your abstract! And, if indeed, it is no longer a concern, then why place any focus at all on the financial demise of such large companies as Napster? ***: it is rather the fact that a network of independent users can share their files so easily that becomes important here.

 

Basically, a program like Napster works as follows: a user copies a series of music files on his or her computer into a “share” folder. The program sends the list of filenames (song titles) to the server, which indexes it. The user then sends a query about any song he or she may be interested in. Since many other users of the same software have also sent their shareable filenames to the server via Napster, the server locates the requested song title(s) in its indexed directory and tells the first user on which other computer the song is stored. Then each party’s computer connects directly with the other and file transmission takes place on a one-to-one basis. The bulk of the data (the comparatively huge music file) is transmitted only in the final stage. Everything that happens before that is just lists of short textual units *** Citation? *** (song titles) going to and fro. 4
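
This division of labour, short textual queries to a central index and the bulky transfer handled peer to peer, can be sketched as follows. Networking is omitted and the peer addresses and filenames are invented.

    # Toy sketch of a Napster-style central index (illustrative only).
    index = {}   # filename -> set of peer addresses that share it

    def register(peer, filenames):
        for name in filenames:
            index.setdefault(name, set()).add(peer)

    def query(filename):
        # Only short text (filenames, addresses) travels up to this point.
        return index.get(filename, set())

    register("peer_a:6699", ["song1.mp3", "song2.mp3"])
    register("peer_b:6699", ["song2.mp3"])

    print(query("song2.mp3"))   # -> the two peer addresses holding the file
    # The requesting peer would now connect directly to one of these addresses
    # and fetch the large file one-to-one.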

 

The Gnutella system works in a slightly different way: it is more of a “word-of-mouth” system (if such a bodily metaphor can be used when talking about computers), which is consequently slower but requires no central server. One user launches a request *** How? ***, which is directed to only two computers, the “closest ones” in the network of Gnutella users. The odds are that those particular computers are not able to satisfy the request for that particular filename *** Proof? ***, so the next thing they do is re-launch the same query to their own next two computers. After twenty such steps, more than one million computers will have received the request. Once the requested file is located, a response stating where the host is travels back along the chain, and finally the requestor and the provider connect directly, without a middleman this time, and the file is transmitted, again on a one-to-one basis. *** Citations? ***
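
A toy sketch of this word-of-mouth propagation follows, using an invented four-node network: each node that cannot satisfy a query forwards it to its two neighbours until a hop limit runs out. The arithmetic behind the "more than one million computers" figure is simply a fan-out of two over twenty hops, 2**20 = 1,048,576.

    # Illustrative Gnutella-style flooding over a tiny, invented network.
    def flood(node, filename, ttl, network, visited=None):
        """Return the first node found to hold the file, or None."""
        visited = visited if visited is not None else set()
        if node in visited or ttl < 0:
            return None
        visited.add(node)
        if filename in network[node]["files"]:
            return node
        for neighbour in network[node]["neighbours"][:2]:   # forward to two peers
            hit = flood(neighbour, filename, ttl - 1, network, visited)
            if hit:
                return hit
        return None

    network = {
        "A": {"files": [], "neighbours": ["B", "C"]},
        "B": {"files": [], "neighbours": ["D", "A"]},
        "C": {"files": ["memory.tmx"], "neighbours": ["A", "D"]},
        "D": {"files": [], "neighbours": ["B", "C"]},
    }
    print(flood("A", "memory.tmx", ttl=20, network=network))   # -> C
    print(2 ** 20)   # -> 1048576, the "more than one million computers"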

 

The question arising from this seems both obvious and compelling: Can a peer-to-peer exchange system be developed for translation memory sharing? *** This should be a bold-faced, capitalized sub-title. ***

 

Usage of TMX as a unifying *** Universal? *** standard would provide common ground for the exchange. Once a translation project is finished, translators would return the final version to their client, while keeping the resulting translation memory as a by-product of their work. *** This sentence is not clear. *** A program would convert the contents of those memories into TMX-tagged multilingual text, and an “exchanger” would expose the memories to the World Wide Web by placing them in a share area open to public access. *** It is not clear if you are finally giving the reader your “solution” to these problems, or if this is still a continuation of the TMX definition. ***

 

The repetition of this action by many users would create a dense and sprawling network of interconnected computers, as happens with Napster and Gnutella, which could potentially become the largest pool of aligned text (originals and translations) ever assembled.

 

Whenever a translation project starts, users would connect to the network and their “memory exchanger” would launch queries for similar segments to the bulk of participants. Slowly, in a way similar to that of basic translation memories themselves, pre-translated replies would travel back to the requestor, some in the form of perfect matches, most of them in the form of fuzzy matches. The result would be a pre-translated draft, whose production might require the computer to be left working overnight (depending on factors such as length, the actual degree of matches found, the level of requirements set by the user, etc.). *** This “solution” is the focus of your abstract, and should be the bulk of your article, yet you give it only one paragraph? You need to lessen the majority of your piecemeal definitions, which are scattered throughout, and give this “solution” much more attention and research. ***
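
A speculative sketch of that draft-building step, reusing the similarity scoring from the earlier example: for each segment of a new source text, the exchanger asks every reachable peer (simulated here as local dictionaries) for its best match and keeps whatever scores above a threshold. All names and example pairs are invented.

    # Speculative sketch of a peer-to-peer "memory exchanger" draft step.
    from difflib import SequenceMatcher

    peers = [   # each peer exposes TMX-derived pairs: source -> target
        {"Close the valve before maintenance.":
             "Cierre la valvula antes del mantenimiento."},
        {"Wear protective gloves.": "Use guantes protectores."},
    ]

    def best_match(segment, memory):
        scored = [(SequenceMatcher(None, segment, s).ratio(), t)
                  for s, t in memory.items()]
        return max(scored) if scored else (0.0, None)

    def pretranslate(source_segments, threshold=0.7):
        draft = []
        for seg in source_segments:
            score, target = max(best_match(seg, p) for p in peers)
            draft.append(target if score >= threshold else "<untranslated> " + seg)
        return draft

    print(pretranslate(["Close the valve before any maintenance.",
                        "Restart the pump."]))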

 

There are, of course, many questions arising from this, most of them far beyond the scope of this paper, *** Yet, these “questions” are what your focus should be—what you should be researching. *** and not a few immediate drawbacks. To start with, all the drawbacks of conventional, non peer-to-peer sharing would still be there, unsolved. In addition, there would be a higher risk of potentially wrong translations from anonymous partners: sharper criticism of received equivalences would be needed, making the revision process even more demanding. The obviously wider range of topics would add to the confusion, and metadata describing the thematic adscription of segments would be indispensable for the machine to “trust” one potential translation over another. Bandwidth requirements are unknown. There would also be legal and copyright issues concerning translated text as such versus equivalent segments, whose ownership is determined differently depending on national legislation. 5

 

With technical, legal and translational problems ahead, the possibility of implementing a peer-to-peer device for translation memory sharing appears to be both a challenging enterprise and a promising area of research. As a good old friend of mine says, “machines don’t have intuition, but they have memories” (Fustegueres 2001, my translation). I would add: maybe we can help them share.

 

End notes 6

[1] See Hutchins (1996) for an enlightening description of the most frequent misinterpretations and misleading circumstances related to the ALPAC report.

 

[2] Aligned texts are the main asset of a translation memory, and companies and institutions usually devote considerable resources to aligning texts that were translated before the implementation of TM software, in order to enhance subsequent translation activity. *** There should still be a reference link with this endnote. ***


 

References 7

Abaitua, Joseba. “TMX Format.” 1.1 (August 1998).

http://paginaspersonales.deusto.es/abaitua/konzeptu/ta/tmx.htm (Date of access). *** I do not see this reference in your paper. ***

 

Brain, Marshall. “How Gnutella Works.” How Stuff Works. (Date of publication.)

http://computer.howstuffworks.com/file-sharing3.htm (Date of access). *** I do not see this reference in your paper. *** *** This is not a user-friendly site; I can find no location for your article title. ***

 

Davis, Paul C. Stone Soup Translation: The Linked Automata Model. Doctoral dissertation, Ohio State University, 2002.

http://www.ling.ohio-state.edu/~pcdavis/papers/diss.pdf (Date of access). *** I do not see this reference in your paper. ***

 

Freigang, Karl Heinz. “Automation of Translation: Past, Presence, and Future.” Revista Tradumatica No. 0 (October 2001).

http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm (Date of access).

 

Gow, Francie. Metrics for Evaluating Translation Memory Software. Unpublished master’s thesis, University of Ottawa, 2003. *** I do not see this reference in your paper. ***

 

Hutchins, John. “The precursors and the pioneers.” Machine Translation: Past, Present and Future. New York: Halsted Press, 1986.

 

---. “ALPAC: the (in)famous report.” MT News International Vol. 14 (June 1996): 9-12. Reprinted in Readings in Machine Translation, ed. Sergei Nirenburg, Harold Somers, and Yorick Wilks (Cambridge, Mass.: The MIT Press, 2003), 131-135. Also available at: http://ourworld.compuserve.com/homepages/WJHutchins/Alpac.htm *** This reference is not cited properly in your text. You need only to include the reference that you actually cite. ***

 

Jakobson, Roman. “On Linguistic Aspects of Translation.” In The Translation Studies Reader, ed. M. Baker and L. Venuti, 113-118. London and New York: Routledge, 2000.

 

Nyberg, Eric, and Teruko Mitamura. “The KANT System: Fast, Accurate, High-Quality Translation in Practical Domains.” Proceedings of COLING-92. Nantes, 1992. *** This is an on-line abstract on PDF. ***

 

Sanchez-Gijon, Pilar. “Cataleg de sistemes de memories de traduccio.” Revista Tradumatica No. 0 (October 2001).

http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm (Date of access). *** I do not see this reference in your paper. ***

 

SDL International. An Introduction to Computer-Aided Translation.
http://www.sdl.com/products (Date of access).
http://tc.eserver.org/18490.html (Date of access). *** You need to separate both web site addresses, and give each one its own line; if this is not a significant source, but rather supplementary material, it should be cited as an end note. ***

 

Several authors. “CAT Fight.” 1999-2005. ProZ, The Translators Workplace. http://www.proz.com/?sp=cat/compare (Date of access). *** I do not see this reference in your paper. ***

 

Fustegueres, Silvia. “Qui te por de les memories de traduccio?” Revista Tradumatica No. 0 (October 2001).

http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm (Date of access). *** I do not see this reference in your paper. ***

 

Zerfass, Angelika. “Evaluating Translation Memory Systems.” First International Workshop on Language Resources for Translation Work and Research. Gran Canaria, 2002. *** I do not see this reference in your paper. ***

 

 

 

(Refer to corresponding coloured numbers in text)

 

1. This would have been a very interesting topic to read about. However, the bulk of your article also falls into the trap of discussing issues that are not new. There is no real new train of thought here, but rather a chaotic summary of terms and definitions.

 

2. All citations disappear from this point on, which happens to be the majority of your article. It is vital to source anything that could be attributed to someone else’s ideas and research. Pay special attention to making off-handed remarks or making judgments (like when you indicated the “cheap” vs. “expensive” translation memory programs).

 

3. The decision to close down the Napster website was made by the Lower Courts, and not the Supreme Court. Following are two links that may be helpful:

 

http://www.ce9.uscourts.gov/web/newopinions.nsf/4bc2cbe0ce5be94e88256927007a37b9/c4f204f69c2538f6882569f100616b06?OpenDocument

 

http://archives.cnn.com/2001/LAW/02/12/napster.decision/

 

4. The wording on directions for downloading music on the Napster site is a little awkward. The following site may be helpful: http://en.wikipedia.org/wiki/Napster

 

5. Mentioning these dilemmas is just and valid, but they seem to overshadow any solutions you may have already proposed, and may have become even more of an obstruction than the already existing models (see KANT and TMX).

 

6. Endnotes / Footnotes: www.aresearchguide.com/7footnot.html. Superscripting is a better way to indicate an endnote.

 

7. Referencing: you may consider purchasing the 1998 copy of MLA handbook (www.mla.org) if you haven’t already done so, or visit www.ccc.commnet.edu/mla/.

 

 

Your focus is unclear from the beginning. Work on an introduction that addresses the points you’re going to make. (John Hutchins and Harold L. Somers, An Introduction to Machine Translation: http://ourworld.compuserve.com/homepages/WJHutchins/IntroMT-TOC.htm). Your abstract seems to promise the following: old vs. newer translation programs, government vs. personal expenditures, alternative ways to share the wealth. Yet, in the actual article you jump from one point to another; your bold-faced titles do not follow any logical pattern. This could potentially be an interesting paper, but you throw too many ideas, definitions, and piecemeal examples into the pot, which makes it look more like you should be writing a research paper; you have too much going on to justify turning this in as just an article.

 

You may want to focus on one issue you bring up (like sharing files between differing peer-to-peer programs; why certain governments have stopped funding research in the translation field; or expensive vs. inexpensive translation programs and the good and not-so-good of each), and work with that. You cannot jump into such a huge topic and not give some background information on machine translation.

 

The following link may also be of interest regarding machine translation:

www.essex.ac.uk/linguistics/clmt/papers/mt/ (mostly .ps files)

 

mIRC and IRC are probably the grandfathers of on-line chat, which you do not mention, but they could also provide some interesting points. Take a look at Doug Robinson’s Cyborg Translation (it deals humorously with chat and translation from a sci-fi viewpoint).

 

Proxies and firewalls may also provide an interesting look at translation (since part of your discussion touched on file-sharing): www.mirc.co.uk/help/proxies.html.

 

Are you familiar with the Janice Walker style of Internet referencing? Whichever style you decide to use, just make sure you are consistent in your usage.

 

A – 1

B –  3 (2 with revision)

C – 3 (2 with revision)

D – 3 (2 with revision)

E – 2 (1 with revision)

F – 3 (2 with revision)

G – 3 (2 with revision)

H – 3 (1 with revision)

 

Reviewer: