An Intelligent Search Infrastructure
for Language Resources on the Web
Special Research Initiatives - E-Research SR0567353
Amount(s): 2005: AU$49,018; 2006: AU$49,018
Summary
Language occupies a central role on the web: most content is expressed in language, and
most access takes place via natural language search. Today, investigation of human language
depends on access to this vast store of language data. This project will develop new
infrastructure for accessing language resources, namely a language-aware search engine.
Language technologies will be employed to classify web content, and a special search
keyword 'lang:' will constrain search results to be in the specified language. The system
will be integrated with major language archives in Australia and overseas, and deployed
on the high-performance computing infrastructure at Melbourne University's Advanced Research
Computing Centre.
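As an illustration of the 'lang:' keyword described above, the following is a minimal sketch, not the project's implementation. It assumes that each document already carries a language label assigned by an upstream classifier, and that the keyword takes a two- or three-letter language code (for example lang:cy). The names Document, parse_query and search are hypothetical and introduced only for this example.

```python
import re
from dataclasses import dataclass

# Assumed syntax: 'lang:' followed by a 2- or 3-letter language code.
LANG_TOKEN = re.compile(r"\blang:([A-Za-z]{2,3})\b")


@dataclass
class Document:
    url: str
    text: str
    language: str  # assumed to come from an upstream language classifier


def parse_query(query):
    """Split a raw query into (search terms, language code or None)."""
    match = LANG_TOKEN.search(query)
    language = match.group(1).lower() if match else None
    # Remove the lang: token so it is not treated as a search term.
    terms = LANG_TOKEN.sub(" ", query).lower().split()
    return terms, language


def search(query, documents):
    """Return documents containing every term, restricted to the
    requested language when a lang: keyword is present."""
    terms, language = parse_query(query)
    results = []
    for doc in documents:
        if language is not None and doc.language != language:
            continue
        words = doc.text.lower().split()
        if all(term in words for term in terms):
            results.append(doc)
    return results
```

With this sketch, a query such as "dictionary lang:cy" would return only documents labelled cy that contain the word "dictionary", while the same query without the keyword would match documents in any language.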
Chief Investigators
Timothy Baldwin
Steven Bird
Baden Hughes
Technical Consultant
Gary Simons (SIL International)
Student Interns
Peter Lee (December 2005-February 2006)
Administrative Materials
- Original Proposal (contact Baden Hughes)
- Project Description [PDF]
- Project Activities
Software and Services
- LangGator
- OLAC Search API
- OLAC-Dot
Papers and Presentations
Papers
- Baden Hughes, to appear in 2006. Towards Effective and Robust Strategies for Finding Web Resources for Lesser Used Languages. Proceedings of Lesser Used Languages and Computer Linguistics 2005.
- Baden Hughes, 2006. A Web Search Service for Minority Language Communities. Proceedings of OpenRoad 2006. VicNET / State Library of Victoria.
- Baden Hughes, Timothy Baldwin, Steven Bird, Jeremy Nicholson and Andrew MacKinlay, 2006. Reconsidering Language Identification for Written Language Resources. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006). pp.485-488. [Paper]
- Baden Hughes, 2006. Searching for Language Resources on the Web: User Behaviour in the Open Language Archives Community. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006). pp.601-604.
- Timothy Baldwin, Steven Bird and Baden Hughes, 2006. Collecting Low-Density Language Materials on the Web. Proceedings of the 12th Australasian Web Conference (AusWeb06). Southern Cross University Press. [Paper]
- Mike Maxwell and Baden Hughes, to appear in 2006 (July). Frontiers in Linguistic Annotation for Lower-Density Languages. Proceedings of COLING/ACL2006 Workshop on Frontiers in Linguistically Annotated Corpora. Association for Computational Linguistics.
- Baden Hughes, to appear in 2006 (September). The Linguistic Diversity of the Web's Content: Recent Evidence. Proceedings of Internet Research 7.0: Internet Convergences (AOIR 2006). Association of Internet Researchers.
Talks
- HCSNet Data Workshop (21/10/2005) [Slides]
- ARC/ARIIC Workshop (2-3/11/2005) [Poster]
- Global Information Infrastructure Laboratory / Language Observatory Project visit (19-20/3/2006)
Related Events
Related Projects
Last Updated: Mon Jan 23 13:17:34 EST 2006