Natural Language Processing
University of Melbourne, 10-14 July 2006
This tutorial will lay the groundwork for the remainder of the statistical parsing theme. I will begin with a brief introduction to parsing and its applications, and then review context-free grammars (CFGs) and parsing algorithms (bottom-up, top-down, Earley and CKY). I will also describe standard datasets and evaluation techniques. This will motivate the need for statistical approaches to parsing.
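As a concrete illustration of the bottom-up CKY algorithm mentioned above, here is a minimal recognizer sketch in Python. The toy grammar, the `lexicon`/`binary_rules` data layout and all identifiers are my own assumptions for illustration; the algorithm assumes a grammar in Chomsky normal form.

```python
from itertools import product

def cky_recognise(words, lexicon, binary_rules, start="S"):
    """Bottom-up CKY recognition for a grammar in Chomsky normal form.

    lexicon: maps each word to the set of nonterminals rewriting to it.
    binary_rules: maps (B, C) pairs to the set of parents A with A -> B C.
    """
    n = len(words)
    # chart[i][j] holds the nonterminals that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, ()))
    for width in range(2, n + 1):          # span length, shortest first
        for i in range(n - width + 1):     # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]

# Hypothetical toy grammar: S -> NP VP, VP -> V NP
lexicon = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
rules = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
```

Filling the chart shortest-spans-first is what makes the algorithm O(n^3) in sentence length, and the same chart structure carries over directly to the probabilistic version discussed below.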
I will then discuss statistical parsing with Probabilistic CFGs, including how these generative models are estimated and smoothed, and how the most probable parse is determined using dynamic programming. I will then discuss the types of information we can include in the model: head lexicalisation and the Charniak/Collins models; lexical information and whether it is necessary; and parent and grandparent nodes. I will finish by mentioning agenda-based parsing algorithms (e.g. A* parsing) and parse re-ranking techniques.
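Finding the most probable parse under a PCFG can be sketched as a Viterbi variant of CKY: each chart cell keeps, for each nonterminal, the single best-scoring derivation over that span plus a backpointer. The sketch below assumes a CNF grammar; the probability-table layout and all names are illustrative, not a specific published implementation.

```python
import math

def viterbi_cky(words, lexical_probs, rule_probs, start="S"):
    """Most probable parse under a PCFG in CNF, via dynamic programming.

    lexical_probs: {(A, word): P(A -> word)}
    rule_probs:    {(A, B, C): P(A -> B C)}
    Returns (log probability, bracketed tree) or None if no parse.
    """
    n = len(words)
    # best[(i, j)][A] = (log prob, backpointer) for the best A over words[i:j]
    best = {}
    for i, w in enumerate(words):
        best[(i, i + 1)] = {A: (math.log(p), w)
                            for (A, word), p in lexical_probs.items()
                            if word == w}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):          # try every split point
                for (A, B, C), p in rule_probs.items():
                    left = best.get((i, k), {}).get(B)
                    right = best.get((k, j), {}).get(C)
                    if left and right:
                        score = math.log(p) + left[0] + right[0]
                        if A not in cell or score > cell[A][0]:
                            cell[A] = (score, (k, B, C))
            best[(i, j)] = cell

    def build(i, j, A):
        """Follow backpointers to recover the bracketed tree."""
        _, back = best[(i, j)][A]
        if isinstance(back, str):
            return f"({A} {back})"
        k, B, C = back
        return f"({A} {build(i, k, B)} {build(k, j, C)})"

    if start not in best.get((0, n), {}):
        return None
    return best[(0, n)][start][0], build(0, n, start)
```

Working in log probabilities avoids numerical underflow on long sentences; replacing the `max` bookkeeping with a sum over derivations gives the inside probability instead of the Viterbi parse.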
James Curran is an ARC Australian Postdoctoral Fellow in the School of Information Technologies at the University of Sydney, Australia.
His research interests include statistical approaches to Natural Language Processing (NLP) ranging from theoretical and low-level component development through to high-level systems development in Question Answering and Information Extraction. He is also interested in developing techniques for scaling language technology to massive corpora.
James received his PhD from the School of Informatics, University of Edinburgh in 2004, supervised by Marc Moens. His thesis was on calculating semantic similarity using distributional similarity techniques on large corpora.