TEL-Forum Meeting

Tuesday 21 October, 2003     2.00 - 5.15pm
Dept Computer Science & Software Engineering
ICT Building, University of Melbourne, 111 Barry St, Carlton

In attendance

Steven Bird, Baden Hughes, Cathy Bow, Nick Nicholas, Dafydd Gibbon, Dagmar Jung, Nick Thieberger, Leila Behrens, Simon Musgrave, Georgina Heydon.

GOAL
The goal of this group is to build collaboration between Melbourne-based researchers interested in technology for language documentation.

AGENDA

1. Each attendee gave a brief introduction to their own interest or involvement in technology for language documentation (TLD). This was followed by discussion of the agenda for the meeting.

2. Nick Thieberger gave an update on the Paradisec project, whose funding was not renewed this year. The project is seeking further funding, since it has developed a useful workflow which the team would like both to continue and to pass on to others. Much of the work proposed for the one-year funding has been completed, with 300 hours of tapes 'ingested' and OLAC-compliant metadata registered. They are looking for old photographs of researchers working with tape recorders; contact Nick directly if you can help.

3. Nick Nicholas reported on his work with Thesaurus Linguae Graecae, a digital library of classical Greek texts, where he is working on lemmatising corpora. He also briefly described his work with John Hajek on a phonological and typological survey of Pacific languages.

4. Georgina Heydon described her project to work with hundreds of hours of recorded interviews, specifically looking at discourse features. She would like to store the data in such a way that researchers with different interests can also access the resources, and was seeking advice on which tools might be appropriate for transcription and annotation. Audiamus, SyncWriter and TASX Annotator were all suggested, as was the use of ANDOSL for population data for Australian English, which is required for forensic work on voice recognition and identification. She is seeking an IT partner, possibly through Arts-IT funding at Monash, with other connections to King's College London.

5. Leila Behrens raised a question regarding database management.  The issue was postponed for later discussion.

6. Dafydd Gibbon gave a short presentation about the Ega language, available here, and demonstrated the use of a PalmPilot in fieldwork.

7. Steven Bird introduced the need for lightweight processing of legacy data, using the example of Shoebox's difficulties with character encoding. He presented a simple solution in Python which adds a new line to a Shoebox record, giving the Unicode correspondences of the legacy character codes. The program should be extended to cover the most commonly used IPA fonts (such as IPAKiel and the SILDoulos fonts).
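The approach described in item 7 might look roughly like the sketch below. This is not the program that was presented; it is a minimal illustration, assuming a record is a block of backslash-marked lines. The mapping table, the choice of the \ph field and the new \ph_u marker are all hypothetical, and the character pairs do not reflect the actual layout of any IPA font.

```python
# Hypothetical mapping from legacy font character codes to Unicode.
# These pairs are illustrative placeholders, not a real font layout.
LEGACY_TO_UNICODE = {
    "N": "\u014b",  # LATIN SMALL LETTER ENG
    "S": "\u0283",  # LATIN SMALL LETTER ESH
    "E": "\u025b",  # LATIN SMALL LETTER OPEN E
}


def to_unicode(text, mapping=LEGACY_TO_UNICODE):
    """Replace each legacy character with its Unicode counterpart, if any."""
    return "".join(mapping.get(ch, ch) for ch in text)


def annotate_record(record, marker="\\ph", new_marker="\\ph_u"):
    """Return the record with a Unicode line added after each `marker` line.

    The original line is kept untouched, so the legacy data is not lost.
    """
    out = []
    for line in record.splitlines():
        out.append(line)
        if line == marker or line.startswith(marker + " "):
            value = line[len(marker):].strip()
            out.append(f"{new_marker} {to_unicode(value)}")
    return "\n".join(out)


if __name__ == "__main__":
    record = "\\lx sing\n\\ph sIN\n\\ge sing"
    print(annotate_record(record))
```

Keeping the legacy line and adding the Unicode form alongside it means the conversion can be checked, or redone with a corrected mapping, without touching the original data. Supporting a further font would amount to supplying another mapping table.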

After a short break:
8. Simon Musgrave gave a brief presentation based on his contribution to the Digital Resources for the Humanities conference in the UK last month, available here. Some discussion ensued regarding the dangers of format conversion, as well as the availability of native XML databases.

9. Baden Hughes reported on the Indigenous Communications Forum in Canberra, explaining the criteria set for government funding in this area, some of which are not evident from the tender documents. He said funding will be available for future projects in which linguistic applications fit into a wider social context of community development in certain regions.

10. Cathy Bow briefly advertised the upcoming ALTA Summer School and Workshop, to be held in the CSSE department from 8-12 December, and encouraged people to publicise it in their departments and to interested postgraduates.

DISCUSSION ISSUES
11. The first item was the organisation of linguistic databases, as raised by Leila. The question was how to model databases so that they can incorporate changing information. There seems to be a gulf between the linguistic approach (enter all data into some kind of database and worry about the structure later) and the computer science approach, which involves working out a data model before building a database. XML is a useful construct as it supports semi-structured data, and tools need to support flexible entry as well as different views and queries (e.g. discourse, phonology).
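The point about semi-structured data and multiple views can be illustrated with a small sketch using Python's standard XML library. The element and attribute names below are invented for illustration and do not correspond to any schema discussed at the meeting. The entries deliberately carry different kinds of information, and each "view" simply selects the entries that have the relevant layer.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon: entries need not share the same set of fields.
XML = """
<lexicon>
  <entry id="e1">
    <form>sing</form>
    <phonology ipa="sɪŋ"/>
  </entry>
  <entry id="e2">
    <form>well</form>
    <discourse function="hesitation marker"/>
  </entry>
</lexicon>
"""

root = ET.fromstring(XML)


def view(root, layer, attribute):
    """Return {form: value} for the entries that carry the requested layer."""
    result = {}
    for entry in root.findall("entry"):
        node = entry.find(layer)
        if node is not None:
            result[entry.findtext("form")] = node.get(attribute)
    return result


if __name__ == "__main__":
    print(view(root, "phonology", "ipa"))        # phonological view
    print(view(root, "discourse", "function"))   # discourse view
```

New layers can be added to individual entries later without restructuring the rest of the data, which is the flexibility the linguistic approach relies on; a fixed data model would instead require every field to be decided up front.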

12.  Training needs and opportunities were discussed, as it was noted that some linguists are unnecessarily intimidated by computers, yet many people request advice and/or training on new ways to deal with linguistic data ("I've got all this stuff, what do I do with it?").  Ideally, training resources should be designed to be reusable, possibly creating on-line resources which codify and document what we know.  There is a need for ongoing exchange between designers and users of software tools, and a feedback loop should be both timely and responsive.  With the passing of the ALI as we know it, there is no obvious forum for offering short training courses, so it was suggested that we attach training opportunities to existing events (such as ALS, FEL ...).
Simon reported on Monash's plans to establish an MA/Grad Dip in Language Technology Studies in Endangered Languages. Margaret Florey is preparing this course, which will be based on existing components and should be available for 2006. Simon will be offering a course in data management for linguists in semester 2, 2004. He also advertised two Honours scholarships for next year for the Maluku project.
Baden raised the issue of indigenous training, as AIATSIS supports ventures of this type, and these could be centralised (e.g. Batchelor) or regional.  Georgina suggested some training could also expand to other fields, such as historians, geographers, anthropologists, educators, etc. The Australian Academy of Humanities is a potential source of sponsorship for training events.
Dafydd asked whether there was a market for speech technology projects, but it was felt that the small size of indigenous language groups makes this unviable.
As the discussion raised many interesting issues but produced no clear outcomes, it was suggested that a smaller group reconvene at a later date for further discussion. Those interested were Nick T, Simon, Nick N, Baden, Cathy and Steven.

The meeting formally closed around 5.15, with some informal discussion ensuing.

While no further meeting was scheduled, it was suggested that we meet again around February/March next year. TEL-Forum now maintains a website at http://www.cs.mu.oz.au/research/lt/tel-forum/. Any other names to be added to the mailing list should be referred to Cathy Bow.