Reusable Lexical Representations for Idioms

In this paper I introduce (1) a technically simple and highly theory-independent way of lexically representing flexible idiomatic expressions, and (2) a procedure for incorporating these lexical representations into a wide variety of NLP systems. The method is based on Structural EQuivalence Classes for Idioms and is therefore called the SEQCI method. I illustrate the approach using the Rosetta MT system as an example of an NLP system. I discuss the advantages of the method and some possible objections to it. I conclude that the method is a good candidate for a standard for the lexical representation of idioms. The method also has the potential to be used for multi-word expressions other than idioms.
Published in 2004