SigmaLaw: Legal Information Extraction

Lawyers and paralegals spend a significant amount of time and effort searching legal documents for the information they need for a given task or court case. Such legal documents, which present statutes (laws) and court cases, are generally available online. Our current focus is case law, the part of common law that consists of judgments given by higher (appellate) courts in interpreting the statutes applicable to the cases brought before them. Keyword-based search over these documents, as carried out by web search engines, is not sufficient because a more sophisticated understanding of the contents is needed. The absence of a system or research methodology that can represent legal information in an intuitive and well-structured manner is a major obstacle to assisting legal officials with an autonomous system. Our system and methodologies therefore aim to reduce the time and effort a lawyer has to put into finding court cases and arguments related to a new legal scenario. Within this project, we have already begun developing the research methodologies and resources required to organize the information available in court case transcripts (legal opinion texts) in a systematic manner. The tools and methodologies developed so far include domain-specific neural word embedding models, a legal ontology, legal information retrieval systems, a system to identify relationships between sentences in legal opinion texts, and a domain-specific sentiment annotator. The objective of this project is to develop an information extraction system for law that helps lawyers and paralegals in their work.
Future directions of the project include identifying the major parties in a court case, extracting the arguments brought forward by each party, detecting counterarguments to a particular argument, and identifying the party supported by a particular argument (or legal opinion).


Spatio-Temporal Analysis of Dengue Epidemic in Sri Lanka using Mobile Network Big Data based Mobility Models

Dengue is the most rapidly spreading mosquito-borne viral disease in the world. In Sri Lanka, 51,591 dengue cases were reported in 2018. Since there is no widely available vaccine or specific therapeutic protocol against dengue, outbreak preparedness is an important technical and operational element, as suggested by the WHO. Such a preparedness plan would help to deal with the inflow of patients, medical supplies and facilities, political issues, and vector control. Yet existing epidemic forecasting models do not provide an accurate, usable solution. Human mobility is considered a major factor in epidemiology and is crucial for successful epidemic forecasting. Because proper data sources are lacking in developing countries like Sri Lanka, CDR (Call Detail Record) data is a very useful source: the vast majority of people have access to mobile communication devices, so user mobility can be modeled from CDR data. Dengue epidemic forecasting requires multiple data sources because the spread of the disease depends on multiple factors, and human mobility is one such vital factor. Several human mobility models are used in the literature, such as the gravity model and the Exploration and Preferential Return (EPR) model, among others. Existing risk-based mobility models compute risk scores for a given location based on domain intuition. More precise risk scores might lead to improved forecasts from the epidemic model. A data-driven approach that finds precise values for the risk score from past data, using sensitivity analysis, formulas and/or penalty-based methods, can be explored.

Developing new human mobility models, or improving existing ones, using CDR data to derive a vector, in addition to other vectors, in order to successfully predict the spatio-temporal visibility of dengue outbreaks.
Deriving a pre-processing computational framework for CDR data to be used for risk-based mobility models.
Identifying the most suitable data-driven, risk-based mobility model for dengue forecasting.
Developing a platform that brings human mobility into state-of-the-art dengue forecasting technologies, automating data flows, analysis, visualization and alerting.
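
The gravity model named above can be illustrated with a minimal sketch. It assumes that each region is described by a centroid and a population, that flow between two regions is proportional to the product of their populations divided by a power of the distance, and that an "imported risk" score for a region is the flow-weighted sum of case rates in the other regions. The function names, the parameters k and beta, and the risk definition are illustrative assumptions, not the project's actual model.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def gravity_flow(pop_i, pop_j, dist_km, k=1.0, beta=2.0):
    """Gravity model: flow ~ k * P_i * P_j / d^beta (k and beta are assumed constants)."""
    return k * pop_i * pop_j / dist_km ** beta

def imported_risk(target, regions, cases, beta=2.0):
    """Hypothetical risk score: sum over other regions of flow * case rate.

    regions maps a region name to (lat, lon, population); cases maps a region
    name to its reported case count. Region centroids are assumed distinct.
    """
    lat_t, lon_t, pop_t = regions[target]
    risk = 0.0
    for name, (lat, lon, pop) in regions.items():
        if name == target:
            continue
        d = haversine_km((lat_t, lon_t), (lat, lon))
        flow = gravity_flow(pop, pop_t, d, beta=beta)
        risk += flow * cases.get(name, 0) / pop
    return risk
```

In this sketch a nearby region with a given number of cases contributes more risk than a distant region with the same number; a data-driven approach would fit k and beta (or replace the formula altogether) using mobility derived from CDR data.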

Keywords: Mobile Networks Big Data, Epidemic modeling

Developing a Retrieval-based Tamil Language Chatbot for Closed Domain

A chatbot is a conversational system that interacts with human users via natural language. Based on the development approach, chatbot systems can be categorised as retrieval-based or generative. Current research on chatbots for low-resourced languages focuses mainly on machine learning based retrieval models, since generative approaches require many language-related resources, tools and experts, which are normally not available for such languages. A retrieval-based system picks a response from a fixed set of responses based on the input and context, using either a rule-based approach or machine learning classifiers. High inflection and free word order pose key challenges for Tamil language chatbots. A practical challenge is that users may not express themselves entirely in Tamil but mix it with English. Currently available Tamil chatbots largely suffer from these challenges, even in a closed domain. Hence, a suitable approach to developing a Tamil language chatbot for a closed domain needs to be explored.
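
The retrieval step described above can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical set of question/response pairs and using character n-gram Jaccard similarity as the matching score; it is not the project's method. English placeholders are used for readability, but the same functions accept any Unicode strings. The names FAQ, respond and threshold are assumptions made for this example.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of the lowercased, whitespace-normalized text."""
    s = " " + " ".join(text.lower().split()) + " "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical closed-domain knowledge base: (known question, fixed response).
FAQ = [
    ("what are the opening hours", "We are open from 9 am to 5 pm."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
]

FALLBACK = "Sorry, I did not understand that."

def respond(query, faq=FAQ, threshold=0.2):
    """Return the response whose known question is most similar to the query."""
    q = char_ngrams(query)
    score, answer = max((jaccard(q, char_ngrams(p)), r) for p, r in faq)
    return answer if score >= threshold else FALLBACK
```

Character n-grams are used here instead of whole words because partial overlaps between inflected word forms still contribute to the score, and word order does not matter for a set-based measure. A machine learning classifier over the same features would be a natural next step.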

Travel Behaviour Analytics using GPS Probe Data for Public Transportation Services.

A Global Navigation Satellite System (GNSS) is a technology that provides positioning, navigation, and timing services on a global or regional basis. There are several GNSS systems, such as GPS, GLONASS, Galileo and BeiDou. GPS is a network of satellites orbiting the earth at an altitude of approximately 20,200 km and broadcasting signals to receivers on the ground. A receiver can determine its location using data from at least four satellites. GPS tracking data collected over a specific time period can be used to analyse mobility behaviour in the selected context. Properly collected GPS data can be used to analyse driver behaviour, traffic and route information, to infer transport mode, and to gain insights into public transport travel time variability. Public transport systems are put in place to provide a service to citizens, and their proper functioning is vital for a country. Monitoring the service provided can help to maintain and improve the Quality of Service. GPS data and other supplementary data obtained from public transport vehicles can be used for this monitoring. The underlying process of analysing mobility behaviour requires (1) a preprocessing pipeline that converts the data into transport-related indicators based on the sample size and the acquisition frequency, (2) a map matching step that matches the GPS data to a digital map, and (3) a data reduction subtask. Service quality can be monitored by visualizing the acquired data, by conducting descriptive and diagnostic analytics, and in certain cases by using machine learning techniques. This project addresses the computational challenges that need to be solved to achieve a usable visualization.
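
As an illustration of step (1), the sketch below converts raw GPS fixes into one simple transport-related indicator, the speed on each segment between consecutive fixes. The record layout (timestamp in seconds, latitude, longitude) and the 120 km/h outlier cut-off are assumptions made for this example, not values from the project.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

def segment_speeds(points, max_kmh=120.0):
    """Speeds in km/h between consecutive fixes.

    points is a list of (timestamp_s, lat, lon) tuples sorted by time.
    Segments with a non-positive time gap or an implausible speed
    (above max_kmh, an assumed threshold) are dropped as noise.
    """
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(points, points[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue
        kmh = haversine_m(la1, lo1, la2, lo2) / dt * 3.6
        if kmh <= max_kmh:
            speeds.append(kmh)
    return speeds
```

Indicators such as travel time variability on a route could then be computed from these segment speeds after map matching has assigned each fix to a road segment.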

Keywords: Public Transportation, Intelligent Transport Systems(ITS), GPS Data Mining, Travel Behaviour Analytics


On Demand High Capacity Ride Sharing for Mobility on Demand (MoD) Systems

Mobility-on-demand (MoD) systems are emerging as a novel mode of urban mobility that provides users with reliable transportation catered to their individual needs. The ride sharing services offered by these systems provide not only a highly personalized mobility experience but also immense potential for positive societal impact with respect to pollution, energy consumption and congestion. Ride sharing services are primarily concerned with picking up spatiotemporally distributed mobility demand and delivering it within a pre-specified time window, subject to different constraints. Large-scale ride sharing under more complex spatiotemporal demand distributions requires well-designed mathematical models and algorithms in order to match riders and vehicle fleets in real time. The aim of this research is to design and develop a dynamic ride sharing model that is reactive and anytime optimal, and that can perform dynamic vehicle assignment effectively and efficiently while scaling well with both sparse and dense spatio-temporal demand distributions.
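
To make the matching problem concrete, the sketch below assigns each request to the nearest vehicle that still has a free seat and can reach the pickup point within the rider's maximum waiting time. It is a deliberately simple greedy baseline, not the reactive anytime optimal model the research targets, and every name and parameter in it (Vehicle, Request, assign, the planar coordinates in km, the constant speed) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from math import hypot

@dataclass
class Vehicle:
    vid: str
    pos: tuple          # (x_km, y_km) on an assumed planar map
    capacity: int
    riders: list = field(default_factory=list)

@dataclass
class Request:
    rid: str
    pickup: tuple       # (x_km, y_km)
    max_wait_min: float

def assign(requests, vehicles, speed_km_per_min=0.5):
    """Greedy assignment: nearest vehicle with a free seat that meets the time window.

    Returns {request id: vehicle id}; requests with no feasible vehicle are omitted.
    Vehicle positions are not updated after an assignment (a simplification).
    """
    assignment = {}
    for req in requests:
        best, best_eta = None, None
        for v in vehicles:
            if len(v.riders) >= v.capacity:
                continue  # capacity constraint
            dist = hypot(v.pos[0] - req.pickup[0], v.pos[1] - req.pickup[1])
            eta = dist / speed_km_per_min
            if eta <= req.max_wait_min and (best is None or eta < best_eta):
                best, best_eta = v, eta
        if best is not None:
            best.riders.append(req.rid)
            assignment[req.rid] = best.vid
    return assignment
```

A greedy pass like this depends on the order in which requests arrive and can leave riders unserved that a joint optimization would match, which is one reason the matching is usually formulated as an optimization problem over all pending requests and vehicles.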

Keywords: Ride sharing, Human Mobility, Vehicle Routing, Smart Cities, Intelligent Transport Systems, Mobility on demand


Customer profiling to improve service and management of mobility on demand systems.

In the modern era of big data, many service-providing systems strive to gather data on customer-system interaction. Mobility on demand systems are one such type of system, gathering a massive amount of data on customers' mobility on a daily basis. Identifying and characterizing the different customer profiles within the customer base by analyzing this mobility data is important for strategic decision-making. Hence, this project explores algorithms and techniques suitable for modelling the customer-related data of mobility on demand systems, in order to profile customers and thereby support predictive management and service enhancement of the system.

Keywords: Data Mining, Mobility on demand systems, Customer profiling, Customer segmentation


Affect level opinion mining of Twitter Streams

Twitter is a social media platform used by millions of users to express their opinions freely. There are about 120,000 active Twitter users in Sri Lanka. Because of the rapidly increasing number of tweets, mining the opinions people express in tweets on topics of interest has attracted more and more attention. Mining these opinions manually is an impossible task, so automated methods have to be employed to summarize them. The opinion in a tweet can be summarized at the level of sentiment polarity or at the finer level of expressed emotion. In this research, our goal is to develop an emotion analysis algorithm that can accurately recognize the emotions in a given tweet, and to provide an approach for identifying the emotion intensity of a group of tweets related to a single topic.
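
The difference between per-tweet emotion recognition and topic-level emotion intensity can be shown with a small lexicon-based sketch. The lexicon below is a made-up toy example with arbitrary intensity values, and the scoring rules (maximum word intensity per tweet, mean over tweets per topic) are assumptions for illustration only; the research itself aims at a more accurate algorithm.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word -> (emotion, intensity in [0, 1]).
LEXICON = {
    "happy": ("joy", 0.8),
    "love": ("joy", 0.9),
    "angry": ("anger", 0.9),
    "sad": ("sadness", 0.7),
    "afraid": ("fear", 0.8),
}

def tweet_emotions(tweet, lexicon=LEXICON):
    """Map each emotion found in the tweet to its strongest word intensity."""
    scores = {}
    for word in re.findall(r"[a-z']+", tweet.lower()):
        if word in lexicon:
            emotion, intensity = lexicon[word]
            scores[emotion] = max(scores.get(emotion, 0.0), intensity)
    return scores

def topic_intensity(tweets, lexicon=LEXICON):
    """Mean per-emotion intensity over all tweets on one topic.

    Tweets that do not express an emotion count as intensity 0 for it.
    """
    totals = defaultdict(float)
    for tweet in tweets:
        for emotion, intensity in tweet_emotions(tweet, lexicon).items():
            totals[emotion] += intensity
    n = len(tweets)
    return {emotion: total / n for emotion, total in totals.items()} if n else {}
```

A fixed word list cannot handle negation, sarcasm, hashtags or emoji, which is part of why learned models are usually preferred for tweets.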

Keywords: data-mining, opinion-mining, affect, social-media, twitter


Developing a Trip Distribution model for Identified Mobility Groups using Big Data

Transport infrastructure is an important component of the economy and a common tool for development. A satisfactory outcome depends on the effectiveness and efficiency of infrastructure planning, which in current conventional approaches involves expensive and time-consuming human intervention. Ubiquitous mobile usage and the massive data it generates present new opportunities to assess the demand for this infrastructure, diagnose problems, and plan for the future. These data sources include passively collected data such as mobile phone network data (CDR and VLR data) and smartphone GPS data. Further, previous studies have shown that these newer data sources can complement conventional data. However, before these benefits can be realized, methods must be found to integrate such new data sources with existing transportation planning frameworks, including widely used travel demand models such as the four-step model and direct demand models. Therefore, the current research study aims to reformulate a comprehensive transport demand model based on new big data inputs.
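
The trip distribution step of the four-step model works with an origin-destination (OD) matrix. As a rough sketch of how passively collected mobile network data could feed that step, the code below counts zone-to-zone movements from time-ordered records. The record layout (user id, timestamp, zone) and the rule that every change of zone counts as one trip are simplifying assumptions; real CDR data would first need stay-point detection and the mapping of cell towers to traffic zones.

```python
from collections import Counter, defaultdict

def od_matrix(records):
    """Count trips between zones from (user_id, timestamp, zone) records.

    Records are grouped by user and sorted by time; each change of zone
    between consecutive records is counted as one trip (an assumption).
    Returns a Counter keyed by (origin_zone, destination_zone).
    """
    by_user = defaultdict(list)
    for user, ts, zone in records:
        by_user[user].append((ts, zone))

    trips = Counter()
    for events in by_user.values():
        events.sort()
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:
                trips[(origin, dest)] += 1
    return trips
```

The resulting counts could be compared with, or used to calibrate, the OD matrix produced by a conventional trip distribution model.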

Keywords: Big Data, Machine Learning, CDR


Forecasting Agricultural Crop Yield using Remote Sensing Data & Machine Learning

In recent years, the sustainability of the agriculture sector has been threatened by devastating environmental hazards and severe climate conditions that have occurred all over Sri Lanka. In current traditional approaches, gathering data on environmental catastrophes and on important environmental factors such as temperature, soil moisture and atmospheric humidity, and estimating their effects on agriculture, necessarily involve highly error-prone and time-consuming human intervention. Policies and decisions taken by government authorities and other stakeholders are therefore highly susceptible to the flaws of these approaches. Until recent years, the combined use of data science and remote sensing techniques in agriculture has been limited in developing countries due to the scarcity of remote sensing resources. The objective of this research is to explore the untapped synergy of remote sensing and machine learning methodologies in order to enable a data-driven policy and decision-making culture in the agriculture sector.

Keywords: Remote Sensing, Big Data, Machine Learning, Agriculture, Data Driven Decision Making