
Publication:

Statistical Machine Translation in Low Resource Settings

Ann Irvine • North American Chapter of the Association for Computational Linguistics • 1 June 2013

TLDR: This thesis aims to reduce the dependence of modern SMT systems on expensive parallel data by augmenting components of the SMT framework with large monolingual and comparable corpora, thereby extending its applicability beyond the small handful of language pairs with large amounts of available parallel text.

Citations: 13
Abstract: My thesis will explore ways to improve the performance of statistical machine translation (SMT) in low resource conditions. Specifically, it aims to reduce the dependence of modern SMT systems on expensive parallel data. We define low resource settings as having only small amounts of parallel data available, which is the case for many language pairs. All current SMT models use parallel data during training for extracting translation rules and estimating translation probabilities. The theme of our approach is the integration of information from alternate data sources, other than parallel corpora, into the statistical model. In particular, we focus on making use of large monolingual and comparable corpora. By augmenting components of the SMT framework, we hope to extend its applicability beyond the small handful of language pairs with large amounts of available parallel text.
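To make the parallel-data dependence described in the abstract concrete, the sketch below illustrates (in generic terms, not as the thesis's own method) how a phrase-based SMT system typically estimates translation probabilities by relative frequency over phrase pairs extracted from a parallel corpus; the phrase pairs here are invented toy data standing in for the output of word alignment and phrase extraction.

```python
from collections import Counter, defaultdict

# Toy, hypothetical phrase pairs; a real SMT pipeline would extract these
# from a word-aligned parallel corpus.
extracted_phrase_pairs = [
    ("casa", "house"), ("casa", "house"), ("casa", "home"),
    ("perro", "dog"),  ("perro", "dog"),
]

pair_counts = Counter(extracted_phrase_pairs)
source_counts = Counter(src for src, _ in extracted_phrase_pairs)

# Relative-frequency estimate: p(target | source) = count(source, target) / count(source)
translation_prob = defaultdict(dict)
for (src, tgt), count in pair_counts.items():
    translation_prob[src][tgt] = count / source_counts[src]

print(translation_prob["casa"])  # {'house': 0.666..., 'home': 0.333...}
```

With only a handful of parallel sentences, these counts are sparse and the estimates unreliable, which is the low-resource problem the thesis targets by bringing in monolingual and comparable corpora.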
