Projects

Si-Ta - Machine-assisted translation system for official government documents written in Sinhala and Tamil

Si-Ta is a collaborative effort among the National Languages Processing Centre, the Department of Official Languages (DOL) and the Ministry of National Co-existence, Dialogue & Official Languages. Si-Ta currently supports Sinhala-to-Tamil and Tamil-to-Sinhala translation of short official documents.

A comprehensive part-of-speech (POS) tag set and a POS tagger for Sinhala

We have developed a new Sinhala POS tag set that overcomes the limitations of the previous Sinhala POS tag set. A corpus of 400,000 words has been manually annotated, and this corpus was in turn used to train several supervised classifiers.
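To illustrate how a manually annotated corpus can drive a supervised tagger, here is a minimal sketch of the simplest such model, a most-frequent-tag baseline. This is not one of the classifiers used in the project. The tiny training corpus, the romanized Sinhala tokens and the generic tag names (PRON, NOUN, VERB, CONJ) are all hypothetical placeholders and do not come from the project's tag set. The names `train_most_frequent_tag` and `tag` are likewise illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (token, tag) pairs. Tokens are romanized Sinhala;
# the tags are generic placeholders, NOT the project's actual POS tag set.
TRAIN = [
    [("mama", "PRON"), ("pota", "NOUN"), ("kiyavanava", "VERB")],
    [("oyaa", "PRON"), ("pota", "NOUN"), ("gannava", "VERB")],
    [("mama", "PRON"), ("pota", "NOUN"), ("saha", "CONJ"),
     ("pensala", "NOUN"), ("gannava", "VERB")],
]


def train_most_frequent_tag(sentences):
    """Learn, for each word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)  # word -> Counter of tags seen with it
    tag_totals = Counter()         # overall tag frequencies
    for sentence in sentences:
        for word, pos in sentence:
            counts[word][pos] += 1
            tag_totals[pos] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Words never seen in training fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Assign each token its most frequent training tag, or the default tag."""
    return [(t, lexicon.get(t, default_tag)) for t in tokens]


lexicon, default_tag = train_most_frequent_tag(TRAIN)
print(tag(["mama", "pota", "balanava"], lexicon, default_tag))
```

With this toy data, NOUN is the most frequent tag overall, so the unseen word "balanava" is tagged NOUN. Real supervised taggers add context features (neighbouring words, suffixes) on top of this kind of lexical evidence, which matters for a morphologically rich language such as Sinhala.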

Sinhala Morphological Synthesizer

Sinhala is a morphologically rich language, which makes morphological analysis of Sinhala challenging. We have currently developed a morphological synthesizer for Sinhala nouns that works with reasonable accuracy.