From Human Communication to Intelligent User Interfaces: Corpora of Spoken Estonian

We argue for the necessity of studying human-human spoken conversations of various kinds in order to create user interfaces to databases. An efficient user interface benefits from a well-organized corpus that can be used for investigating the strategies people use in conversations in order to be efficient and to handle the spoken communication problems. For modeling the natural behaviour and testing the model we need a dialogue corpus where the roles of participants are close to the roles of the dialogue system and its user. For that reason, we collect and investigate the Corpus of the Spoken Estonian and the Estonian Dialogue Corpus as the sources for human-human interaction investigation. The transcription conventions and annotation typology of spoken human-human dialogues in Estonian are introduced. For creating a user interface the corpus of one institutional conversation type is insufficient, since we need to know what phenomena are inherent for the spoken language in general, what means are used only in certain types of the conversations and what are the differences
Published in 2008