Shallow Discourse Genre Annotation in CallHome Spanish

The classification of speech genre is not yet an established task in language technologies. However, we believe that it will become fairly important as large amounts of audio (and video) data become widely available. The technological capability to easily transmit and store all human interactions in audio and video could have a radical impact on our social structure. The major open question is how this information can be used in practical and beneficial ways. As a first approach to this question, we are looking at issues involving information access to databases of human-human interactions. Classification by genre is a first step in the process of retrieving a document out of a large collection. In this paper we introduce a local notion of speech activities that exist side by side in conversations that belong to a speech genre: while the genre of CallHome Spanish is personal telephone calls between family members, the actual instances of these calls contain activities such as storytelling, advising, interrogation and so forth. We present experimental work on the detection of those activities using a variety of features. We have also observed that a limited number of distinguished activities can be defined that describe most of the activities in this database in a precise way.
Published in 2000