ACL/HCSNet Advanced Program in
Natural Language Processing

University of Melbourne, 10-14 July 2006

Stephen Clark: Statistical Parsing with Combinatory Categorial Grammar

Abstract

This tutorial will describe how to build a statistical parser using the grammar formalism Combinatory Categorial Grammar (CCG). It will begin with an introduction to CCG and the motivation for using this formalism for parsing. I will also describe some of the difficulties that arise, including the notion of "spurious" ambiguity.

The second part of the tutorial will consider the statistical modelling problem, focusing on log-linear parsing models. I will introduce a dependency parsing model for CCG which uses all of CCG's additional derivations, including the "spurious" ones, and which uses a novel decoding algorithm to find the highest-scoring dependency structure. The parsing models are estimated using discriminative estimation techniques, and I will show how a parallel implementation of a general-purpose numerical optimisation algorithm, run on a Beowulf cluster, can be used to estimate the models.

An important component of the parser is a CCG "supertagger", which assigns lexical categories to the words in a sentence. The supertagger also uses log-linear models and, since supertagging is "almost parsing", leads to a highly efficient parser.

Common themes running throughout the tutorial will be the use of log-linear modelling and the use of dynamic programming to perform model estimation and decoding.

Biographical Sketch

Stephen Clark is a lecturer in Computer Science at the Oxford University Computing Laboratory and a Fellow of Keble College. He moved to Oxford in 2004 from the University of Edinburgh's School of Informatics, where he spent four years as a postdoctoral researcher working with Professor Mark Steedman. He obtained his DPhil in Artificial Intelligence from the University of Sussex in 2001, and has a first degree in Philosophy from the University of Cambridge. His research interests are in data-driven approaches to Natural Language Processing, including statistical parsing and tagging, statistical and example-based machine translation, and distributional approaches to semantic similarity.

