Submissions/Seamless integration of machine translation as an aid to Wikipedia readers and editors

From Wikimania 2011 • Haifa, Israel


This is an open submission for Wikimania 2011.

Review no.


Title of the submission
Seamless integration of machine translation as an aid to Wikipedia readers and editors
Type of submission (workshop, tutorial, panel, presentation)
Author of the submission
Marek Blahuš
E-mail address or username (if username, please confirm email address in Special:Preferences)
Country of origin
Czech Republic
Affiliation, if any (organization, company etc.)
E@I (Education@Internet)
Personal homepage or blog
Abstract (please use no less than 300 words to describe your proposal)

Wikipedia article texts have often been used by authors of machine translation systems because they are easy to obtain and work with, fairly homogeneous in form yet heterogeneous in topic, and large in volume. This makes them a good corpus for developing and training computational linguistics applications, including machine translation systems. As a consequence, systems trained on Wikipedia texts are especially suitable for further applications in the same domain: they may be used to provide Wikipedia readers with machine translations of foreign-language Wikipedia articles in their own language (particularly if a similar article in that language is missing), as well as to simplify the translation of articles between two language versions of Wikipedia by editors.

WikiTrans, developed by the GramTrans company, is a machine translation system aimed specifically at Wikipedia articles. Its translation engine, based on the novel Constraint Grammar approach, has already translated all articles from the English Wikipedia into Esperanto, a pioneer language pair. All wiki markup is preserved during the translation process, which itself works with HTML internally, so that results are readily available for viewing without the need for parsing, while translated text in proper wiki syntax may be requested at any time. The machine translation of the English Wikipedia into Esperanto performed by WikiTrans is available online. Esperanto speakers generally agree that it is the best-quality machine translation produced so far, and it has been suggested that a turning point may have been reached at which it takes less time to revise such a machine translation (in terms of both language and wiki syntax) than to translate the same text manually.
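The idea of translating text while leaving markup untouched can be illustrated with a small sketch. This is not WikiTrans code: the class `MarkupPreservingTranslator`, the function `translate_html`, and the two-word toy lexicon are all hypothetical names introduced here, and the "translation" is a plain word lookup standing in for a real engine. The sketch only shows how HTML tags can pass through unchanged while text between them is translated.

```python
from html.parser import HTMLParser

# Toy stand-in for a real translation engine (hypothetical word list).
TOY_LEXICON = {"cat": "kato", "dog": "hundo"}


def toy_translate(text):
    """Replace known words; splitting on single spaces keeps the spacing intact."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split(" "))


class MarkupPreservingTranslator(HTMLParser):
    """Copies tags and character references verbatim, translates text nodes only."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=False)
        self.translate = translate
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written in the input.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(self.translate(data))

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def translate_html(html, translate=toy_translate):
    parser = MarkupPreservingTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `translate_html('<p>The <a href="/wiki/Cat">cat</a> sleeps</p>')` leaves the link target untouched and changes only the link text to `kato`.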

To connect WikiTrans and Wikipedia more closely, E@I (Education@Internet) has implemented a Wikipedia gadget that integrates the machine translation software seamlessly into the encyclopedia. When a user requests an article that is missing in their language version of Wikipedia (e.g. Esperanto), a machine translation of the article on the same topic from the other language (e.g. English) appears automatically. It comes with an appropriate notice of this fact, as well as additional tools that let the user quickly switch to another translation candidate (should the software not have identified the equivalent article correctly) and start writing a new article in the home Wikipedia based on this machine translation. The process of revising the machine translation into a new, real target-language article has been divided into two parts: revising the language and revising the syntax. This gives the editor full freedom to edit the wiki text (in the second part), while preserving the possibility of collecting valuable data from the human revision of language mistakes made by the translation software (in the first part). As a result, both parties profit from the integration: Wikipedia, which receives a new article more easily, and the machine translation software, which its author may teach not to make the same mistakes next time.
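The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the gadget's actual implementation: `PageView`, `view_article`, `find_candidates`, and `collect_corrections` are hypothetical names, both wikis are modelled as plain dictionaries, and the language-revision step is assumed to keep sentences aligned one-to-one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PageView:
    title: str
    text: str
    machine_translated: bool = False       # drives the notice shown to the reader
    source_title: Optional[str] = None     # article the translation came from
    other_candidates: List[str] = field(default_factory=list)


def view_article(
    title: str,
    home_wiki: Dict[str, str],
    source_wiki: Dict[str, str],
    find_candidates: Callable[[str], List[str]],  # hypothetical equivalence lookup
    translate: Callable[[str], str],              # hypothetical MT call
) -> Optional[PageView]:
    """Return the home article, or fall back to a machine translation."""
    if title in home_wiki:
        return PageView(title, home_wiki[title])
    candidates = [c for c in find_candidates(title) if c in source_wiki]
    if not candidates:
        return None
    best, rest = candidates[0], candidates[1:]
    return PageView(title, translate(source_wiki[best]), True, best, rest)


def collect_corrections(
    machine_sentences: List[str], revised_sentences: List[str]
) -> List[Tuple[str, str]]:
    """First revision part: keep (machine, human) pairs where the editor changed the language."""
    if len(machine_sentences) != len(revised_sentences):
        raise ValueError("this sketch assumes one-to-one sentence alignment")
    return [(m, r) for m, r in zip(machine_sentences, revised_sentences) if m != r]
```

Separating `collect_corrections` from any later wiki-syntax editing mirrors the two-part revision: only the language pass feeds data back to the translation software, so the editor's markup changes in the second pass do not pollute it.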

Track (People and Community/Knowledge and Collaboration/Infrastructure)
Will you attend Wikimania if your submission is not accepted?
Yes, thanks to the partial scholarship I have received.
Slides or further information (optional)
WikiTrans – the machine translation software involved

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Vibhijain 11:41, 6 May 2011 (UTC)
  2. CasteloBrancomsg 00:15, 20 May 2011 (UTC)
  3. Hindustanilanguage 06:40, 21 May 2011 (UTC)
  4. --Gomà 16:39, 7 June 2011 (UTC)
  5. Add your username here.