Natural Language Processing University of Melbourne, 10-14 July 2006
The distributional hypothesis, that similar words appear in similar contexts, can be exploited computationally to calculate the semantic distance between terms. Many approaches to calculating distributional similarity have been explored in the literature. These approaches differ in their definition of 'context' and in their method of calculating distances between distributions. The recent explosion in the amount of raw text available has opened new opportunities and challenges for distributional similarity research.
This tutorial will present an overview of distributional similarity research, review and compare the main approaches, discuss evaluation and applications, and speculate on areas for future research.
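As a rough illustration of the two design choices mentioned above (a definition of 'context' and a measure for comparing distributions), the following is a minimal sketch, not the method presented in the tutorial. It assumes a fixed-size word window as the context and cosine similarity as the measure; the function names `context_vectors` and `cosine`, the window size, and the toy corpus are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it.

    The window-based definition of 'context' is an assumption of this sketch;
    other definitions (e.g. grammatical relations) are equally possible.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat~dog:  {sim_cat_dog:.3f}")
print(f"cat~milk: {sim_cat_milk:.3f}")
```

On this toy corpus, "cat" and "dog" share more contexts than "cat" and "milk", so the first similarity comes out higher. Swapping in a different context definition or a different distance measure changes only `context_vectors` or `cosine` respectively, which mirrors the two axes along which the approaches differ.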
James Curran is an ARC Australian Postdoctoral Fellow in the School of Information Technologies at the University of Sydney, Australia.
His research interests include statistical approaches to Natural Language Processing (NLP) ranging from theoretical and low-level component development through to high-level systems development in Question Answering and Information Extraction. He is also interested in developing techniques for scaling language technology to massive corpora.
James received his PhD from the School of Informatics, University of Edinburgh in 2004, supervised by Marc Moens. His thesis was on calculating semantic similarity using distributional similarity techniques on large corpora.