1. K Sarveswaran Project Title: A deep syntactic parser for the Tamil language. K. Sarveswaran is a doctoral student assistant at the National Languages Processing Centre, University of Moratuwa. He has been attached to the centre since its inception. Sarves has been developing a deep syntactic parser for the Tamil language. Despite its large speaker population around the world and its historical time depth, Tamil is under-researched and under-resourced from the perspective of Natural Language Processing. He therefore believes that developing a parser would be a useful contribution to the field, specifically to the development of a Machine Translation application. He has published a language processing tool stack for Tamil called ThamizhiLP, which so far consists of a pre-processor, a part-of-speech tagger, a morphological analyser/generator and a dependency parser. In addition, he works jointly with other members of the NLPC team. Sarves has published several research papers and a journal article. He has also organised an international summer school and workshops for the centre. Apart from these, he collaborates with members of the University of Konstanz and the University of Hyderabad on Tamil language resource development. In this connection, he spent a few months at the University of Konstanz in 2018 and 2019.
Supervisors: Prof. Gihan Dias (University of Moratuwa) and Prof. Miriam Butt (University of Konstanz, Germany).
|
|
|
2. Anosha Ignatius Project title: Speech Embedding with Segregation of Paralinguistic Information for the Tamil Language Project Description: The presence of paralinguistic information such as speaker characteristics, accent, and emotional expression degrades performance in speech processing applications where only the linguistic content is needed. This project aims to develop a speech embedding model for the Tamil language that disentangles the underlying paralinguistic information in the speech signal while preserving the linguistic content. Supervisor: Dr. Uthayasanker Thayasivam
|
3. Chathuri Jayaweera Project Description: Automatic Post Editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system by learning from human post-edits (PE). The objectives of this technique are to replace the time- and energy-consuming human post-editing process with a faster approach, and to improve MT quality without building new MT systems from scratch. APE systems assume the availability of source-language input text, MT output and target-language PE data. In early APE implementations, only the MT output and target-language PE were used as inputs, but it was later observed that adding source-language information to these two inputs helped convey context and improved APE performance. Several implementations with varying architectures have shown significant results on certain language pairs, such as English-German. Hence, my research focuses on applying these existing APE approaches to improve the quality of Sinhala-English translations. Achievements: A part of my study was submitted to and presented at the 6th annual Symposium on Natural Language Processing - 2020 (SNLP 2020), organised by the National Languages Processing Centre of the University of Moratuwa.
|
|
4. Kallindu Kumarasinghe Project title: Sinhala Grammar Error Corrector
Supervisor: Prof. Gihan Dias
|
|
5. Janarthanasarma Baskarakurukkal Project Title: Neural machine translation (NMT) system for the English-Tamil language pair. He is currently working on a neural machine translation (NMT) system for the English-Tamil language pair, with the aim of improving the efficiency of deep learning systems for machine translation involving Tamil, which is a low-resource and morphologically rich language. He focuses mainly on techniques such as subword segmentation for NMT. Bio: Janarthanasarma is currently a postgraduate research student at the University of Moratuwa. He earned his undergraduate degree in Computer Science and Engineering from the same university in 2018. He also has one and a half years of industry experience as a software engineer.
|
|
6. Koshiya Epaliyana
Supervisors: Dr. Surangika Ranathunga, Prof. Sanath Jayasena
|
|
7. Anushika Liyanage
Utilize the generated bilingual lexicon in a task-specific downstream system to enhance the results of the overall system.
|
|
8. Sarubi Thillainathan
Supervisors: Dr. Surangika Ranathunga, Prof. Sanath Jayasena
|
|
9. Rameela Azeez
Supervisor: Dr. Surangika Ranathunga
|
|
10. Aloka Fernando
A Neural Machine Translation (NMT) model produces reliable translations only for the most frequent words in the training corpus. Vocabulary that is not covered by the NMT model, referred to as Out-of-Vocabulary (OOV), leads to weaker translation output. Low-resource language pairs such as Sinhala-English have limited parallel corpora, and because Sinhala is a morphologically rich language, the OOV problem becomes severe. This research focuses on two techniques, data augmentation and subword-based encodings, to address the OOV problem.
Supervisors: Prof. Gihan Dias, Dr. Surangika Ranathunga
|