Unsupervised and Weakly-supervised Methods for Semantic Parsing

In recent years, there has been increasing interest in statistical approaches to semantic parsing. However, most of this research has focused on supervised methods that require large amounts of data labeled by human experts. Such annotated resources are scarce and expensive to create. In this project, we develop unsupervised and weakly supervised approaches to semantic parsing. First results of this work are presented in Titov and Klementiev (ACL, 2011) and Titov and Kozhevnikov (ACL, 2010). Members working on this project: Ivan Titov, Alexandre Klementiev, Mikhail Kozhevnikov.

Exploiting Linguistic Knowledge and Partially Labeled Data for Wide-Coverage Semantic Parsing

The research focuses on statistical methods for inducing structured semantic representations from labelled and unlabelled data together with different types of prior linguistic knowledge. The goal of this project is to develop a wide-coverage semantic parser useful for a variety of natural language processing tasks, including question answering, information extraction and textual entailment. The methods used in this project are based on recent advances in non-parametric Bayesian models for structured prediction and on techniques for injecting prior knowledge into models with latent variables. This work is done in collaboration with Dr. Caroline Sporleder and Dr. Alexis Palmer. Members working on this project: Ivan Titov and Ashutosh Modi.

Learning Latent Representations for Domain Adaptation

Most learning algorithms operate under the assumption that training and test data originate from the same distribution, though in practice this assumption is often violated. The difference between the training and test distributions typically results in a significant drop in accuracy. We propose to tackle this problem by inducing a common feature representation that generalizes across domains. We achieve this by learning a statistical model with a distributed latent representation while constraining the 'variability' of this representation across domains (more formally, by requiring that the marginal distribution of each latent feature not vary significantly across domains). First results on the standard sentiment classification dataset are reported in Titov (ACL, 2011). Members working on this project: Ivan Titov.
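To make the cross-domain constraint concrete, here is a minimal sketch, assuming latent features computed as sigmoid activations of a linear map and assuming a simple stand-in penalty: the squared difference between each latent feature's mean activation in the source and target data. This is not the formulation of Titov (ACL, 2011); the function names, the weight matrix and the trade-off parameter `lam` are all hypothetical and chosen only to illustrate the idea of penalizing per-feature marginal differences across domains.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def latent_features(x, weights):
    """Map one input vector x to latent feature activations in (0, 1).

    Hypothetical model: each latent feature is sigmoid(w_j . x).
    """
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(row, x))) for row in weights]


def marginal_means(data, weights):
    """Mean activation of each latent feature over one domain's examples,
    i.e. an estimate of that feature's marginal expectation in the domain."""
    acts = [latent_features(x, weights) for x in data]
    return [sum(a[j] for a in acts) / len(acts) for j in range(len(weights))]


def domain_discrepancy(source, target, weights):
    """Stand-in penalty: sum over latent features of the squared difference
    between source and target mean activations."""
    ms = marginal_means(source, weights)
    mt = marginal_means(target, weights)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def regularized_loss(task_loss, source, target, weights, lam=1.0):
    """Task loss (e.g. on labeled source data) plus the cross-domain penalty."""
    return task_loss + lam * domain_discrepancy(source, target, weights)


W = [[1.0, -1.0], [0.5, 0.5]]
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[2.0, 0.0], [0.0, 2.0]]
print(domain_discrepancy(source, target, W))
print(regularized_loss(0.3, source, target, W, lam=10.0))
```

When the source and target samples are identical the penalty is exactly zero; here the first latent feature happens to have the same mean in both domains while the second does not, so only the second contributes. A larger `lam` pushes training toward representations whose features behave alike in both domains, at some possible cost in task accuracy.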

Distributed Latent Representation for Learning Syntax and Semantics

Most natural language processing research focuses on linear models, and learning and inference with linear models are well understood from both practical and theoretical points of view. However, linear models are powerful enough to solve complex problems only if the feature representation is rich enough to include the important predictive features; consequently, such models require extensive feature engineering. One way to address this problem is to use (vectors of) latent variables to represent interactions between the elementary features. In this project, we develop such latent variable models for syntactic and semantic parsing tasks. Relevant previous and more recent publications include (Titov and Henderson, IWPT 2007), (Titov and Henderson, CoNLL 2007), (Titov and Henderson, ACL 2007), (Henderson et al., CoNLL 2008), (Titov et al., IJCAI 2009) and (Henderson and Titov, JMLR 2010). Members working on this project: Ivan Titov.
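The limitation of linear models and the role of latent vectors can be illustrated with the classic XOR case: with two binary elementary features, no linear model over those features alone can output 1 exactly when one (but not both) is active. A small vector of latent variables that captures their interactions makes the decision easy. This is a minimal sketch, assuming a deterministic feed-forward layer with hand-set weights; the cited models instead learn their latent vectors from data, and all weights, thresholds and names below are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-set latent units, each a soft combination of the two
# elementary binary features (x1, x2): (weights, bias).
HIDDEN = [
    ([20.0, 20.0], -10.0),  # active when x1 OR x2 is on
    ([20.0, 20.0], -30.0),  # active only when x1 AND x2 are both on
]
# Linear output layer over the latent vector (also hand-set).
OUTPUT = [1.0, -2.0]


def latent_vector(x):
    """Compute the latent representation of an elementary feature vector."""
    return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in HIDDEN]


def score(x):
    """A linear model, but over latent features rather than raw features."""
    h = latent_vector(x)
    return sum(v * hj for v, hj in zip(OUTPUT, h))


def predict(x):
    return int(score(x) > 0.5)


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(x))
```

The output layer stays linear; all of the non-linear "feature engineering" (here the conjunction of x1 and x2) is absorbed into the latent vector, which is the role latent variables play in the models described above.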