Extraction of Translation Unit from Chinese-English Parallel Corpora
CHANG Baobao
Institute of Computational Linguistics
Peking University,
Beijing, P.R.China, 100871
chbb@pku.edu.cn
Pernilla DANIELSSON and
Wolfgang TEUBERT
Centre for Corpus Linguistics
Birmingham University,
Birmingham, B15 2TT United Kingdom
pernilla@ccl.bham.ac.uk
teubertw@hhs.bham.ac.uk
Abstract
More and more researchers have recognized
the potential value of parallel corpora in
research on Machine Translation and Machine
Aided Translation. This paper examines how
Chinese-English translation units can be
extracted from a parallel corpus. An iterative
algorithm based on the degree of word association is
proposed to identify multiword units in
Chinese and English. Chinese-English
translation equivalent pairs are then extracted from
the parallel corpus. We also compare
different statistical association measures.
Keywords: Parallel Corpus, Translation
Unit, Automatic Extraction of Translation
Units
Introduction
The field of machine translation has changed
remarkably little since its earliest days in the
fifties. So far, useful machine translation has
been obtained only in very restricted domains. We
believe one of the problems of traditional
machine translation lies in how it deals with the unit
of translation. Rule-based machine
translation systems normally take the word as the basic
translation unit. However, words are often
polysemous and therefore ambiguous, which
causes many difficulties in selecting proper
target equivalent words in machine translation,
especially in translation between unrelated
language pairs, such as Chinese and English. On
the other hand, human translation is rarely
word-based. Human translators typically translate
groups of words as a whole, which means that humans
do not view words as the basic translation units;
rather, they seem to treat language expressions that
can transfer meaning unambiguously as the basic
translation units.