Exploring Social Annotations for Information Retrieval

Social annotation has recently gained popularity in many Web-based applications, giving rise to an emerging research area in text analysis and information retrieval. This paper develops probabilistic models and computational algorithms for social annotations. We propose a unified framework that combines the modeling of social annotations with language modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) we discover topics in the contents and annotations of documents while categorizing the users by domain; and (2) we enhance document and query language models by incorporating user domain interests as well as topical background models. Differences in user domain expertise are also taken into account when combining the discovered user domain interests. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. We then apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over alternative approaches that ignore topical information, social annotations, or user expertise, or that incorporate topic analysis only in a simple way.
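
To make the idea of smoothing a document language model with topical information concrete, the following is a minimal sketch and not the model proposed in this paper. It assumes a plain linear (Jelinek-Mercer-style) interpolation of three unigram models: the document itself, a topical model estimated here simply from the document's annotation tags, and a collection background model. The function names, the toy data, and the weights `lam` and `mu` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_doc_model(doc_tokens, topic_model, collection_model,
                       lam=0.6, mu=0.3):
    """Interpolate document, topical, and collection unigram models.

    lam weights the document model, mu the topical model, and the
    remaining 1 - lam - mu the collection model (illustrative values).
    """
    doc_model = mle(doc_tokens)
    rest = 1.0 - lam - mu
    vocab = set(doc_model) | set(topic_model) | set(collection_model)
    return {
        w: lam * doc_model.get(w, 0.0)
        + mu * topic_model.get(w, 0.0)
        + rest * collection_model.get(w, 0.0)
        for w in vocab
    }


def query_log_likelihood(query_tokens, model, floor=1e-12):
    """Log-probability of the query under a unigram model.

    Words the model has never seen receive a tiny floor probability
    so that the logarithm stays defined.
    """
    return sum(math.log(max(model.get(w, 0.0), floor)) for w in query_tokens)


# Toy data (hypothetical): document text, its annotation tags, a collection.
doc = ["web", "search", "ranking"]
tags = ["search", "retrieval", "retrieval", "web"]
collection = doc + tags + ["music", "photo"]

model = smoothed_doc_model(doc, mle(tags), mle(collection))
unsmoothed = mle(doc)
query = ["retrieval"]
```

In this toy setup the query word "retrieval" never occurs in the document text, yet the smoothed model still assigns it probability mass because it appears in the tags. That is the intuition behind enriching language models with annotation-derived topical information; the paper's own models additionally account for user domain interests and expertise, which this sketch omits.
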
Published in 2008