
Machine Translation: Green, Yellow, and Red

by Prof. Aarne Ranta, Professor of Computer Science, Department of Computer Science and Engineering, University of Gothenburg

Humanities Lecture Series
Date: 10 November 2014
Time: 11:00am
Venue: CD309


The mainstream approach in machine translation is to build systems that can translate everything, but without any guarantee of quality. The alternative is systems that aim at precision but have limited coverage. Combining wide coverage with high precision is considered unrealistic. Most wide-coverage systems are based on statistics, whereas precision-oriented, domain-specific systems are typically based on grammars, which guarantee translation equality by some kind of formal semantics.

This talk introduces a technique that combines wide coverage with high precision, by embedding a high-precision semantic grammar inside a wide-coverage syntactic grammar, which in turn is backed up by a chunking grammar. The system can thus reach good quality whenever the input matches the semantics; but if it doesn't, the user will still get a rough translation. The levels of confidence can be indicated by using colours, whence the title of the talk. 
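The layered fallback described above can be illustrated with a minimal sketch. It is not the GF implementation: the three tiers are toy stand-ins (a phrasebook for the semantic grammar, an all-words-known check for the syntactic grammar, word-by-word lookup for chunking), and the names `PHRASEBOOK`, `LEXICON`, `Translation`, and `translate` are hypothetical. The assignment of yellow to the syntactic tier and red to the chunking tier is also an assumption; the abstract only implies that green marks the high-precision part.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Toy English-to-French data, invented for illustration only.
PHRASEBOOK = {"good morning": "bonjour", "thank you": "merci"}
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}


@dataclass(frozen=True)
class Translation:
    text: str
    colour: str  # confidence level: "green", "yellow", or "red"


def semantic_tier(sentence: str) -> Optional[str]:
    # Stand-in for the high-precision semantic grammar:
    # succeeds only on sentences it fully understands.
    return PHRASEBOOK.get(sentence)


def syntactic_tier(sentence: str) -> Optional[str]:
    # Stand-in for the wide-coverage syntactic grammar:
    # here "parsing" succeeds only if every word is known.
    words = sentence.split()
    if words and all(w in LEXICON for w in words):
        return " ".join(LEXICON[w] for w in words)
    return None


def chunk_tier(sentence: str) -> Optional[str]:
    # Stand-in for the chunking grammar: never fails,
    # unknown words are passed through untranslated.
    return " ".join(LEXICON.get(w, w) for w in sentence.split())


# Tiers are tried in order of decreasing confidence.
TIERS: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("green", semantic_tier),
    ("yellow", syntactic_tier),
    ("red", chunk_tier),
]


def translate(sentence: str) -> Translation:
    normalized = sentence.strip().lower()
    for colour, tier in TIERS:
        result = tier(normalized)
        if result is not None:
            return Translation(result, colour)
    raise ValueError("no tier produced a translation")


if __name__ == "__main__":
    for s in ["good morning", "the cat sleeps", "the cat purrs"]:
        print(s, "->", translate(s))
```

The point of the design is that the most precise tier is tried first and each later tier trades precision for coverage, so the user always gets some output together with an honest indication of how far it can be trusted.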

The talk will explain the main ideas behind this technique. It is based on GF (Grammatical Framework), inspired by statistical methods (probabilistic grammars) and the Apertium system (chunk-based translation), boosted by freely available dictionaries (WordNet, Wiktionary), and built by a community of over 50 active developers. The current system covers 11 languages: Bulgarian, Chinese, Dutch, English, Finnish, French, German, Hindi, Italian, Spanish, and Swedish. It is available both as a web service and as an Android application.

The translator is open-source software that has received contributions from more than 50 people around the world. The license (LGPL/BSD) also permits commercial applications, which typically involve specializing the green part, i.e. the high-precision semantic grammar, to a specific domain.

