Vol 3, No 1 (2013)

Special Issue on Language Technologies for a Multilingual Europe

Guest editors: Georg Rehm, Felix Sasaki, Daniel Stein, and Andreas Witt

This Special Issue brings together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. The contributions of this issue originate from the workshop "Language Technology for a Multilingual Europe" held at the 2011 GSCL conference in Hamburg and co-organised by two GSCL working groups (Text Technology and Machine Translation) and META-NET, an EU-funded Network of Excellence dedicated to building the technological foundations of a multilingual European information society. The issue takes a broad view of multilingual language technologies, encompassing contributions that lay out research agendas, describe upcoming and future developments, and report on research performed within the wider circle of the META-NET alliance.



Georg Rehm, Felix Sasaki, Daniel Stein, Andreas Witt

Editorial to the Special Issue on Language Technologies for a Multilingual Europe, TC3, Volume 3, Number 1, June 2013.

Full Text: PDF


Machine Translation - Past, Present, and Future

Daniel Stein

The attempt to translate meaning from one language to another by formal means traces back to the philosophical schools of secret and universal languages originated by Ramon Llull or Johann Joachim Becher. To this day, machine translation (MT) is regarded as the crowning discipline of natural language processing. Due to current MT approaches, the time needed to develop new systems with similar power to the older ones has decreased enormously. However, when comparing current achievements to those of thirty years ago, only a minor difference in the number and type of errors can be observed. In this article, the history of MT, its difference from computer-aided translation, and the current approaches are discussed.

Keywords: Machine Translation

Full Text: PDF


The META-NET Strategic Research Agenda for Language Technology in Europe: An Extended Summary

Georg Rehm

Recognising Europe’s exceptional demand and opportunities for multilingual language technologies, 60 leading research centres in 34 European countries joined forces in META-NET, a European Network of Excellence. Working together with numerous additional organisations and experts from a variety of fields, META-NET has developed a Strategic Research Agenda (SRA) for multilingual Europe – the complex planning and discussion process took more than two years to complete. While the complete SRA has been published elsewhere (Rehm and Uszkoreit, 2013), this heavily condensed version provides an extended summary as an alternative mode of access and to enable interested parties to familiarise themselves with its key concepts in an efficient way.

Full Text: PDF


Metadata for the Multilingual Web

Felix Sasaki

We describe the Internationalization Tag Set (ITS) 2.0, an upcoming standard to foster the development of the multilingual Web. ITS 2.0 provides metadata to integrate workflows for content production, localization and language technology. The technical goal is to achieve better results in content creation and other language-related processes; the goal in terms of community building is to raise awareness of needs in multilingual workflows. This aim is also supported by providing re-usable software components for various use cases.

Keywords: annotation, standardization, Web technology

Full Text: PDF


State of the Art in Translation Memory Technology

Uwe Reinke

Commercial translation memory (TM) systems have been available on the market for over two decades now. They have become the major language technology to support the translation and localization industries. The following paper will provide an overview of the state of the art in TM technology, explaining the major concepts and looking at recent trends in both commercial systems and research. The paper will start with a short overview of the history of TM systems and a description of their main components and types. It will then discuss the relation between TM and machine translation (MT) as well as ways of integrating the two types of translation technologies. After taking a closer look at data exchange standards relevant to TM environments, the focus of the paper then shifts towards approaches to enhance the retrieval performance of TM systems, looking at both non-linguistic and linguistic approaches.

Keywords: translation technology; computer-assisted translation; translation memory; translation environment

Full Text: PDF


Authoring Support for Controlled Language and Machine Translation

Melanie Siegel

Automatic authoring support for controlled language and machine translation have previously been seen as two distinct tools in the text processing process. We describe methods for close integration of both, resulting in better written documents as well as machine translation output of higher quality.

Keywords: controlled language, machine translation

Full Text: PDF


Integration of Machine Translation in On-line Multilingual Applications - Domain Adaptation

Mirela Stefania Duma, Cristina Vertan

Large amounts of bilingual corpora are used in the training process of statistical machine translation systems. Usually a general domain is used as the training corpus. When the system is tested using data from the same domain, the obtained results are satisfactory, but if the test set belongs to a different domain, the translation quality decreases. This is due to insufficient lexical coverage, wrong choices in the case of polysemous words, and differences in discourse style between the two domains. Thus, the need to adapt the system is an ongoing research task in machine translation. Some challenges in performing domain adaptation are to decide which part of the system requires adaptation and to choose what method needs to be applied. In this paper, we used language model interpolation as a domain adaptation method and proved that it is a fast state-of-the-art method that can be used in building adapted translation systems even when only sparse domain-specific material is available (i.e. especially in the case of low-resourced language pairs). The best improvement was 15 BLEU points over the baseline system.

Keywords: domain adaptation; statistical machine translation; web content management system

Full Text: PDF


Disambiguate Yourself. Supporting Users in Searching Documents with Query Disambiguation Suggestions

Ernesto William De Luca, Christian Scheel

In this article we present a query-oriented semantic approach and the respective architecture for supporting users in searching and browsing documents in a retrieval framework. While users are typing their queries, a “meaning-oriented” analysis of each keystroke can provide different disambiguation suggestions (spelling correction, Named-Entity Recognition, WordNet- and Wikipedia-based suggestions) that can help users in formulating their queries for filtering relevant results. On the other hand, systems can better interpret the query, because users implicitly tag the queries with the related meaning by choosing the desired concept they had in mind. After the presentation of our architecture we show the results of two user studies, where users were asked to judge the support while typing their query and browsing documents. These results confirm that semantic support is important in both cases.

Keywords: Natural Language Processing, User Modeling, Personalization, Recommendation

Full Text: PDF


Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications

Michael Matuschek, Christian M. Meyer, Iryna Gurevych

Multilingual lexical-semantic resources play an important role in translation applications. However, multilingual resources with sufficient quality and coverage are rare as the effort of manually constructing such a resource is substantial. In recent years, the emergence of Web 2.0 has opened new possibilities for constructing large-scale lexical-semantic resources. We identified Wiktionary and OmegaWiki as two important multilingual initiatives where a community of users (“crowd”) collaboratively edits and refines the lexical information. They seem especially appropriate in the multilingual domain as users from all languages and cultures can easily contribute. However, despite their advantages such as open access and coverage of multiple languages, these resources have hardly been systematically investigated and utilized until now. Therefore, the goals of our contribution are threefold: (1) We analyze how these resources emerged and characterize their content and structure; (2) We propose an alignment at the word sense level to exploit the complementary information contained in both resources for increased coverage; (3) We describe a mapping of the resources to a standardized, unified model (UBY-LMF), thus creating a large freely available multilingual resource designed for easy integration into applications such as machine translation or computer-aided translation environments.

Keywords: lexical semantic resources; collaboratively constructed resources; standards for language resources; alignment of language resources; Lexical Markup Framework

Full Text: PDF


The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

The Basque language is both a minority language (only a small proportion of the population of the Basque Country speaks it) and also a less-resourced language (being spoken only in a small region by few speakers). Fortunately, the Basque regional government is committed to its recovery, and has adopted policies for funding, among other things, language technologies, a field which a language aiming to survive cannot dispense with. BerbaTek is a 3-year (2009-2011) strategic research project on language, speech and multimedia technologies for Basque carried out by a consortium of 5 members, all prominent local organizations dedicated to research in the above-mentioned areas, and partially funded by the Departments for Industry and Culture of the Basque Government. The collaboration on BerbaTek has allowed not only a great amount of basic research to be done, but also more applied research to be carried out and various prototypes to be developed to show the potential of integrating these technologies for the language industry sector, that is, companies working in the fields of translation, content management and learning.

Keywords: Natural Language Processing; Speech Processing; Less-Resourced Languages

Full Text: PDF

