Query-Sets: Using Implicit Feedback and Query Patterns to Organize Web Documents

In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve better results in non-supervised tasks, such as clustering and labeling, through the incorporation of usage data obtained from search engine queries. This type of model allows us to discover the motivations of users when visiting a certain document. The terms used in queries can provide a better choice of features, from the user's point of view, for summarizing the Web pages that were clicked from these queries. In this work we extend and formalize as "query model" an existing but not very well known idea of "query view" for documents representation. Furthermore, we create a novel model based on "frequent query patterns" called the "query-set model". Our evaluation shows that both "query-based" models outperform the vector-space model when used for organizing into topics documents in a website. In our experiments, the query-set model reduces by more than 90% the number of features needed to represent a set of documents and improves by over 90% the quality of the results. We believe that this can be explained because our model chooses better features and provides more accurate labels according to the user's expectations.
Published in 2008