Vol 2, No 1 (2012)

Special Issue on the Crossroads between Contrastive Linguistics, Translation Studies, and Machine Translation

Contrastive Linguistics (CL), Translation Studies (TS) and Machine Translation (MT) share common ground: they all work at the crossroads where two or more languages meet. Despite their inherent relatedness, methodological exchange between the three disciplines is rare. This special issue touches upon areas where the three fields converge. It results directly from a workshop at the 2011 German Association for Language Technology and Computational Linguistics (GSCL) conference in Hamburg where researchers from the three fields presented and discussed their interdisciplinary work.

While the studies contained in this volume draw from a wide variety of objectives and methods, and various areas of overlap between CL, TS and MT are addressed, the volume is by no means exhaustive with regard to this topic. Further cross-fertilisation is not only desirable, but almost mandatory in order to tackle future tasks and endeavours, and TC3 remains committed to bringing these three fields even closer together.



Oliver Čulo

Introduction to TC3, Vol. 2, No. 1, July 2012.

Methodological cross-fertilization: empirical methodologies in (computational) linguistics and translation studies

Erich Steiner

Recent years have seen attempts at improving empirical methodologies in contrastive linguistics and in translation studies through interdisciplinary collaboration with multi-layer corpus architectures in computational linguistics. At the same time, explanatory background for empirical results is increasingly sought in more sophisticated models of language contact in typologically based contrastive linguistics on the one hand, and in language processing in situations of multilinguality, including translation, on the other. Three attempts are discussed to narrow the significant gap between the high level of abstraction of such models, and data provided through shallow analysis and annotation of electronic corpora.
The first of these operationalizes the high-level terms “explicitness/explicitation” in terms of lexicogrammatical data available in a contrastive corpus, treating them as dependent variables and attempting to explain their variation in terms of the independent variables controlled for in the corpus architecture.
The second attempt starts from the same corpus architecture, yet includes annotations about textual cohesion in its operationalizations and develops increasingly fine-grained hypotheses to limit search space and variation between independent and dependent variables so as to get closer to causal explanations rather than explanations in terms of co-variation.
The third attempt intersects corpus data of the type outlined before with data from processing studies, aiming at an integration and mutual explanation of product and process data. Our focus here is on methodological issues involved in integrating data of such different types and granularity in an overall empirical research architecture.

Keywords: empirical methodologies, corpus architectures, processing studies

Text Structure in a Contrastive and Translational Perspective

Iørn Korzen, Morten Gylling

This paper argues that both human translators and machine translation systems can greatly benefit from contrastive studies of text structure. Due to the great terminological and definitional confusion regarding structures in texts, the paper first discusses the main viewpoints on these issues and then outlines the two most significant differences between Italian and Danish text structure. One regards the notion of information density: Italian tends to accumulate the same information in shorter text spans and to include a larger number of Elementary Discourse Units in each sentence than Danish. The other regards clause linkage: A higher percentage of Italian clauses is morpho-syntactically and rhetorically subordinated by means of non-finite and nominalised verb forms. Danish text structure, on the other hand, is more informationally linear and characterised by a higher number of finite verbs and topic shifts. These typological differences are transferred into some simple translation rules concerning the number of Elementary Discourse Units per sentence and their textualisation. Each rule is illustrated by a number of examples taken from the parallel part of the Europarl Corpus.

Keywords: text structure; information density; clause linkage; Danish; Italian

Abstract pronominal anaphors and label nouns in German and English: selected case studies and quantitative investigations

Heike Zinsmeister, Stefanie Dipper, Melanie Seiss

Abstract anaphors refer to abstract referents, such as facts or events. This paper presents a corpus-based comparative study of German and English abstract anaphors. Parallel bidirectional texts from the Europarl Corpus were annotated with functional and morpho-syntactic information, focusing on the pronouns ‘it’, ‘this’, and ‘that’, as well as demonstrative noun phrases headed by “label nouns”, such as ‘this event’, ‘that issue’, etc., and their German counterparts. We induce information about the cross-linguistic realization of abstract anaphors from the parallel texts. The contrastive findings are then controlled for translation-specific characteristics by examination of the differences between the original text and the translated text in each of the languages. In selected case studies, we investigate in detail “translation mismatches”, including changes in grammatical category (from pronouns to full noun phrases, and vice versa), grammatical function, or clausal position, addition or omission of modifying adjectives, changes in the lexical realization of head nouns, and transpositions of the demonstrative determiner. In some of these cases, the specificity of the abstract noun phrase is altered by the translation process.

Keywords: abstract anaphors; parallel corpora; comparable corpora; translation mismatches

An analysis of translational complexity in two text types

Martha Thunes

This article presents an empirical study where translational complexity is related to a notion of computability. Samples of English-Norwegian parallel texts have been analysed in order to estimate to what extent the given translations could have been produced automatically, assuming a rule-based approach to machine translation. The study compares two text types, fiction and law text, in order to see how these differ with respect to the question of automatisation. A central assumption behind the empirical method is that a specific translation of a given source expression can be predicted, or computed, provided that the linguistically encoded information in the original, together with information about source and target languages, and about their interrelations, provides the information needed to produce that specific target expression. The results of the investigation indicate that automatic translation tools may be helpful in the case of the law texts, and the study concurs with the view that the usefulness of such tools is limited with respect to fiction. Finally, an extension of the analysis method is proposed in order to make it relevant as a diagnostic tool for the feasibility of automatic translation in relation to specific text types.

Keywords: English-Norwegian parallel text, translational complexity, text types, computability

Statistical Machine Translation Support Improves Human Adjective Translation

Gerhard Kremer, Matthias Hartung, Sebastian Padó, Stefan Riezler

In this paper we present a study in computer-assisted translation, investigating whether non-professional translators can profit directly from automatically constructed bilingual phrase pairs. Our support is based on state-of-the-art statistical machine translation (SMT), consisting of a phrase table that is generated from large parallel corpora, and a large monolingual language model. In our experiment, human translators were asked to translate adjective–noun pairs in context in the presence of suggestions created by the SMT model. Our results show that SMT support results in an acceptable slowdown in translation time while significantly improving translation quality.

Keywords: Machine-Supported Human Translation, Web Experiment, Translation Quality, Translation Speed

Inside the Monitor Model: Processes of Default and Challenged Translation Production

Michael Carl, Barbara Dragsted

It has been the subject of debate in the translation process literature whether human translation is a sequential and iterative process of comprehension-transfer-production or whether and to what extent comprehension and production activities may occur in parallel. Tirkkonen-Condit (2005) suggests a “monitor model” according to which translators start with a literal default rendering procedure and where a monitor interrupts the default procedure when a problem occurs. This paper suggests an extension of the monitor model in which comprehension and production are processed in parallel by the default procedure. Deviations from this default behaviour are triggered through text production problems and involve conscious decision-making processes, related to text comprehension or to text production problems. In order to quantify this hypothesis, we compare text copying with translation activities under the assumption that text copying is a prototypical literal default rendering procedure. Both translation and text copying require decoding, retrieval and encoding of textual segments, but translation additionally requires a transfer step into another language. Comparing user behaviour obtained in copying and translation experiments, we observe surprisingly many similarities between these two activities. Copyists deviate from the default literal text reproduction into more effortful text understanding, and much of the translators’ behaviour resembles that of copyists. We observe that extended ST and TT comprehension is triggered through production problems, during translation as well as during text copying.

Keywords: human translation process research, key-logging, eye tracking, text copying

