Grammatical inference is the process of inferring languages from example sentences. This talk will briefly describe a development environment for spoken dialog systems (speech recognition applications) known as the Lyrebird system. The key feature of the Lyrebird system is that developers are not required to program the spoken dialog system explicitly. Instead, they provide a high-level description of the application, together with examples of the language that speakers would use when interacting with it. From these example sentences the Lyrebird system infers a grammar that describes both the syntax and the semantics of that language.
In addition to the Lyrebird development environment, this talk will describe a second-generation grammatical inference algorithm that will be integrated into the product. This algorithm, known as the Boisdale algorithm, is guaranteed to learn any grammar of a particular class exactly, after some finite time, when presented with an infinite stream of example sentences generated from the corresponding language. This property is known as "identification in the limit". The class of grammars inferred by the Boisdale algorithm is a subclass of context-free unification grammars. Unification grammars can transform data structures representing the meaning of sentences into the sentences themselves, and vice versa. The Boisdale algorithm requires no prior knowledge of the structure of the language; instead, each sentence in the training set is tagged with its meaning, using any data structure desired.
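To make the notion of identification in the limit concrete, the following is a minimal sketch of a much simpler learner than the one described above. It is not the Boisdale algorithm and it does not infer a unification grammar: it only memorizes tagged sentences, which is enough to identify any finite language in the limit. The sentences, the meaning dictionaries, and the names TARGET, memorizing_learner and convergence_point are all hypothetical and chosen purely for illustration.

```python
from itertools import cycle, islice

# Hypothetical finite target language: each sentence is tagged with its meaning.
# The meaning is an arbitrary data structure (here, a small dictionary).
TARGET = {
    "book a flight": {"action": "book", "item": "flight"},
    "book a hotel": {"action": "book", "item": "hotel"},
    "cancel my flight": {"action": "cancel", "item": "flight"},
}


def memorizing_learner(stream):
    """Yield a guess after every example: all tagged sentences seen so far."""
    hypothesis = {}
    for sentence, meaning in stream:
        hypothesis[sentence] = meaning
        yield dict(hypothesis)  # copy, so that earlier guesses are not mutated


def convergence_point(stream, target, limit):
    """Return the 1-based index of the first guess equal to the target.

    Only the first `limit` examples are inspected; None means no match yet.
    """
    guesses = islice(memorizing_learner(stream), limit)
    for step, guess in enumerate(guesses, start=1):
        if guess == target:
            return step
    return None


# Stand-in for an infinite presentation: repeat the target's sentences forever.
presentation = cycle(TARGET.items())
steps = convergence_point(presentation, TARGET, limit=100)
print(steps)
```

Because the target is finite and every sentence eventually appears in the stream, the guess becomes exactly correct after finitely many examples and never changes afterwards. A learner for an infinite language cannot simply memorize, which is why a guarantee of this kind for a class of context-free unification grammars is a much stronger result.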