TEL-Forum Meeting
Tuesday 21 October, 2003 2.00 - 5.15pm
Dept Computer Science & Software Engineering
ICT Building, University of Melbourne, 111 Barry St, Carlton
In attendance
Steven Bird, Baden Hughes, Cathy Bow, Nick Nicholas, Dafydd
Gibbon, Dagmar Jung, Nick Thieberger, Leila Behrens, Simon Musgrave,
Georgina Heydon.
GOAL
The goal of this group is to build collaboration between
Melbourne-based researchers interested in technology for language documentation.
AGENDA
1. Each attendee gave a brief introduction to their own interest
or involvement in technology for language documentation (TLD). This was
followed by discussion of the agenda for the meeting.
2. Nick Thieberger gave an update on the Paradisec project, whose funding
was not renewed this year. They are seeking further funding, since
they have developed a useful workflow which they would like to both continue
and pass on to others. Much of the work proposed for the one-year funding
has been completed, with 300 hours of tapes 'ingested' and OLAC-compliant
metadata registered. They are looking for old photographs of researchers
working with tape recorders - contact Nick directly if you can help.
3. Nick Nicholas reported on his work with Thesaurus Linguae Graecae, a digital
library of classical Greek texts, where he is working on lemmatising corpora.
He also briefly described his work with John Hajek on a phonological
and typological survey of Pacific languages.
4. Georgina Heydon described her project to work with hundreds of hours
of recorded interviews, specifically looking at discourse features. She
would like to store the data in such a way that researchers with different
interests could also access the resources, and was seeking advice on what
tools may be appropriate for transcription and annotation. Audiamus,
SyncWriter and TASX Annotator were all suggested, as well as the use of
ANDOSL as a source of population data for Australian English, as required
for forensic work on voice recognition and identification. She is seeking an IT
partner, possibly through Arts-IT funding at Monash, with other connections
to King's College London.
5. Leila Behrens raised a question regarding database management. The
issue was postponed for later discussion.
6. Dafydd Gibbon gave a short presentation about the Ega language
(available online) and demonstrated the use of a PalmPilot in fieldwork.
7. Steven Bird introduced the need for lightweight processing of legacy
data, using the example of Shoebox's difficulties with character encoding.
He presented a simple solution in Python which adds a new line to
a Shoebox record giving the Unicode correspondences for its character codes. The
program should be extended to cover the most commonly used IPA fonts
(such as IPAKiel and SILDoulos).
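The kind of lightweight processing described in item 7 might look something like the sketch below. It is not the program that was presented: the field markers, the name of the added field (`\lx_u`) and above all the mapping table are assumptions made for illustration, and the three mapping entries are placeholders, not a real IPAKiel or SILDoulos table.

```python
# Hypothetical mapping from legacy font character codes to Unicode.
# These pairs are placeholders for illustration only; a real table
# would have to be built for each legacy font.
LEGACY_TO_UNICODE = {
    "N": "\u014b",  # LATIN SMALL LETTER ENG
    "S": "\u0283",  # LATIN SMALL LETTER ESH
    "E": "\u025b",  # LATIN SMALL LETTER OPEN E
}


def to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy character by its Unicode counterpart, if any."""
    return "".join(table.get(ch, ch) for ch in text)


def add_unicode_lines(record, marker="\\lx", new_marker="\\lx_u",
                      table=LEGACY_TO_UNICODE):
    """Return the record with a new field added after each `marker` field.

    The original field is left untouched, so the legacy data is preserved
    alongside the converted form.
    """
    out = []
    for line in record.splitlines():
        out.append(line)
        field, _, value = line.partition(" ")
        if field == marker:
            out.append(f"{new_marker} {to_unicode(value, table)}")
    return "\n".join(out)


# A made-up three-field record in backslash-marker format.
record = "\\lx SiN\n\\ps n\n\\ge example"
print(add_unicode_lines(record))
```

Keeping the original field and adding the converted form as a separate line means nothing in the legacy record is overwritten; supporting another font would only require another mapping table.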
After a short break:
8. Simon Musgrave gave a brief presentation based on his contribution to
the Digital Resources for the Humanities conference in the UK last month
(available online). Some discussion ensued regarding the dangers of conversion,
as well as the availability of native XML databases.
9. Baden Hughes reported on the Indigenous Communications Forum in
Canberra, explaining the criteria set for government funding in this area,
some of which are not evident from tender documents. He says there
will be funding available for projects in the future, where linguistic
applications fit into a wider social context of community development in
certain regions.
10. Cathy Bow briefly advertised the upcoming ALTA Summer School and
Workshop to be held in the CSSE department from December 8-12, and
encouraged people to publicise it in their departments and to interested
postgraduates.
DISCUSSION ISSUES
11. The first item was discussion of the organisation of linguistic
databases, as raised by Leila. The question was how to model databases to
incorporate changing information. There seems to be a gulf between
the linguistic approach - enter all data into some kind of database and
worry about the structure later - and the computer science approach, which
involves working out a data model before building a database. XML
is useful here because it supports semi-structured data, but tools also need
to support flexible data entry and different views and queries (e.g. discourse,
phonology...).
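As a rough illustration of the point about semi-structured data, the sketch below stores entries that do not all carry the same fields and then pulls out different "views" of them. The element names, attribute names and entry contents are all invented for the example and do not come from any project discussed at the meeting.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: entries need not share the same structure.
LEXICON = """
<lexicon>
  <entry id="e1">
    <form>form-a</form>
    <gloss>gloss-a</gloss>
  </entry>
  <entry id="e2">
    <form>form-b</form>
    <gloss>gloss-b</gloss>
    <phonology stress="initial"/>
  </entry>
  <entry id="e3">
    <form>form-c</form>
    <discourse function="emphasis"/>
  </entry>
</lexicon>
"""

root = ET.fromstring(LEXICON)


def view(root, layer):
    """Return the forms of all entries that carry the given annotation layer."""
    return [
        entry.findtext("form")
        for entry in root.findall("entry")
        if entry.find(layer) is not None
    ]


print(view(root, "phonology"))  # entries with phonological annotation
print(view(root, "discourse"))  # entries with discourse annotation
```

New layers of annotation can be added to individual entries later without restructuring the rest, which is one way of accommodating the "enter data first, structure later" habit described above.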
12. Training needs and opportunities were discussed, as it was
noted that some linguists are unnecessarily intimidated by computers,
yet many people request advice and/or training on new ways to deal with
linguistic data ("I've got all this stuff, what do I do with it?"). Ideally,
training resources should be designed to be reusable, possibly creating
on-line resources which codify and document what we know. There is
a need for ongoing exchange between designers and users of software tools,
and a feedback loop should be both timely and responsive. With the
passing of the ALI as we know it, there is no obvious forum for offering
short training courses, so it was suggested that we attach training opportunities
to existing events (such as ALS, FEL ...).
Simon reported on Monash's plans to establish an MA/Grad Dip in Language
Technology Studies in Endangered Languages. Margaret Florey is preparing
this course, which will be based on existing components, and should be available
for 2006. Simon will be offering a course in data management for linguists
in semester 2, 2004. He also advertised two Honours scholarships for
next year for the Maluku project.
Baden raised the issue of indigenous training, as AIATSIS supports
ventures of this type, and these could be centralised (e.g. Batchelor)
or regional. Georgina suggested some training could also be extended
to people in other fields, such as historians, geographers, anthropologists
and educators. The Australian Academy of the Humanities is a potential source
of sponsorship for training events.
Dafydd asked if there was a market for speech technology projects,
but the small size of indigenous language groups makes this unviable.
As the discussion raised many interesting issues but reached no clear
outcomes, it was suggested that a smaller group reconvene to discuss them
further at a later date. Those interested were Nick T, Simon, Nick
N, Baden, Cathy and Steven.
The meeting formally closed around 5.15, with some informal discussion
ensuing.
While no further meeting was scheduled, it was suggested that
we meet again around February/March next year. TEL-Forum now
maintains a website at http://www.cs.mu.oz.au/research/lt/tel-forum/.
Any other names to be added to the mailing list should be referred to
Cathy Bow.