Online Evaluation of Coreference Resolution

This paper presents the design of an online evaluation service for coreference resolution in texts. We argue that coreference, as an equivalence relation between referring expressions (RE) in texts, should be properly distinguished from anaphora and has therefore to be evaluated separately. The annotation model for coreference is based on links between REs. The program presented in this article compares two such annotations, which may be the output of coreference resolution tools or of human judgement. In order to evaluate the agreement between the two annotations, the evaluator first converts the input annotation format into a pivot format, then abstracts equivalence classes from the links and provides five scores representing in different ways the similarity between the two partitions: MUC, B3, Kappa, Core-discourse-entity, and Mutual-information. Although we consider that the identification of REs (i.e. the elements of the partition) should not be part of coreference resolution properly speaking, we propose several solutions for the frequent case when the input files do not agree on the elements of the text to consider as REs
Published in 2004