ACL/HCSNet Advanced Program in
Natural Language Processing

University of Melbourne, 10-14 July 2006

Alistair Moffat: Information Retrieval Nuts and Bolts

Abstract

This tutorial describes the key implementation issues associated with IR systems, including index construction, index representation, and similarity computations, with particular attention paid to efficient techniques that can be scaled to collections in the terabyte range. Practical efficiency is a key theme, and where appropriate, tradeoffs between efficiency and effectiveness are evaluated and recommendations are provided. The key components described in this tutorial have been used in practice in TREC (Text Retrieval Conference) experimentation over many years, and continue to support active research into information retrieval techniques. Topics covered: inverted file indexing; index coding using bit-aligned and byte-aligned codes; types of queries; fundamentals of similarity computations; implementation of similarity computations; reducing memory requirements; increasing query throughput; handling phrase queries; distributed retrieval systems; web retrieval techniques.

Biographical Sketch

Professor Alistair Moffat has been a member of the academic staff at the University of Melbourne since 1986. His current research interests focus on efficient coding, index structures and representations, and the implementation of large-scale retrieval systems. Alistair has published more than 140 refereed papers in these and a range of other research areas, and is an author of three books: Managing Gigabytes: Compressing and Indexing Documents and Images (second edition, Morgan Kaufmann, 1999); Compression and Coding Algorithms (Kluwer, 2002); and Programming, Problem Solving, and Abstraction with C (Pearson SprintPrint, 2003). Alistair was a program chair for the SIGIR conference in 2005, and serves on a range of editorial and advisory boards.

