Natural Language Processing University of Melbourne, 10-14 July 2006
Statistical approaches dominate the field of coreference resolution. However, the performance of coreference resolution classifiers is far from satisfactory. This course attempts to address some of the issues that must be resolved to overcome the performance plateau observed in the literature.
We will begin with a brief overview of non-statistical approaches and then turn to issues related to data, annotation and evaluation. A baseline system (Soon et al., CL 2001) will be used to introduce the paradigm of pairwise binary classification and different strategies for instance generation (Ng & Cardie, ACL 2002). This baseline describes the state of the art; we then turn to research addressing its problems. Yang et al. (ACL 2003) and Luo et al. (ACL 2004) both address the problem of representation. Kehler et al. (NAACL 2004) and Yang et al. (ACL 2005) use information about predicate argument structure, while Ponzetto & Strube (NAACL 2006) use semantic role labels to improve pronoun resolution. Common noun resolution can be improved by mining WordNet (Harabagiu et al., NAACL 2001) and Wikipedia (Ponzetto & Strube, NAACL 2006). We will conclude with an overview of related work on bridging and metonymy resolution, and of applications that use coreference resolution to determine coherence.
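To make the pairwise paradigm concrete, the following is a minimal sketch of training-instance generation in the style commonly attributed to Soon et al.: each anaphor is paired with its closest preceding coreferent mention as a positive instance, and with every mention in between as a negative instance. The function name, the input encoding (a list of chain ids in document order, with None for non-coreferent mentions) and the output triples are assumptions made for illustration, not part of the course material.

```python
from typing import List, Optional, Tuple


def generate_training_instances(
    chain_ids: List[Optional[int]],
) -> List[Tuple[int, int, int]]:
    """Return (antecedent_index, anaphor_index, label) triples.

    chain_ids[i] is the coreference chain of the i-th mention in document
    order, or None if the mention is not coreferent with anything.
    This encoding is a hypothetical simplification for illustration.
    """
    instances = []
    for j, chain in enumerate(chain_ids):
        if chain is None:
            continue
        # Find the closest preceding mention that belongs to the same chain.
        closest = None
        for i in range(j - 1, -1, -1):
            if chain_ids[i] == chain:
                closest = i
                break
        if closest is None:
            continue  # first mention of its chain: nothing to pair it with
        # Positive instance: closest antecedent paired with the anaphor.
        instances.append((closest, j, 1))
        # Negative instances: every mention between antecedent and anaphor.
        for k in range(closest + 1, j):
            instances.append((k, j, 0))
    return instances


if __name__ == "__main__":
    # Five mentions: chains 0 and 1 interleaved, plus one singleton.
    for triple in generate_training_instances([0, None, 1, 0, 1]):
        print(triple)
```

A binary classifier would then be trained on feature vectors computed for each mention pair; alternative instance-generation strategies change which pairs are emitted here.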
Michael Strube is group leader of the NLP group at EML Research, a privately funded research institute in Heidelberg, Germany. The NLP group focuses on the areas of semantics, pragmatics, and discourse and applications such as summarization and information extraction.
Michael Strube completed his Ph.D. on reference resolution at the end of 1996 at the University of Freiburg, Germany. From 1997 to 1999 he was a postdoctoral fellow at the University of Pennsylvania; in 2000 he joined EML Research. Since 1998 he has also worked on spoken dialogue, and since 2000 on multimodal dialogue. His research interests also include natural language generation, lexical semantics and issues related to the annotation of corpora.