An Environment for Dialogue Corpora Collection (ENDIACC)

In this paper we present an environment for dialogue corpora collection (ENDIACC), which is part of our long-term program to develop a methodology and tools for designing systems with Emulated Language Competence (ELC systems). ELC systems are systems able to communicate interactively with their human users in human language. The key point of our research is the development of a methodology for systematic studies of the human user interacting with a machine or with another human. The methods for acquiring the initial linguistic knowledge necessary at the early stages of ELC system design are being systematically implemented for the Polish language at the Adam Mickiewicz University. The element we focus on in this presentation is an open experimental setting for generating empirical data about natural language (NL) dialogues in the form of dialogue corpora. The main problem with the corpus-based empirical approach is the absence of an easy and inexpensive way of collecting naturally generated dialogue recordings. The problem may be partially solved by designing experiments in which natural dialogues can be recorded. The novelty of this paper lies in proposing a free, easily accessible, language-independent software platform, ENDIACC (ENvironment for DIAlogue Corpora Collection), that provides an experimental setting for collecting corpora of written (keyboard) text-mode dialogues. This platform is particularly well adapted to the collection of corpora of chat-like dialogues in text mode combined with MMS-like technologies. The system requires a graphical operating system (e.g. Windows or Linux) with a Java interpreter. It will be freely accessible for research purposes from http://main.amu.edu.pl/~zlisi
Published in 2004

Authors