Title: Query classification using Wikipedia

Authors: Richard Khoury

Addresses: Department of Software Engineering, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada

Abstract: Identifying the intended topic that underlies a user's query can benefit a large range of applications, from search engines to question-answering systems. However, query classification remains a difficult challenge due to the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query. In this paper, we develop a new query classification system that accounts for these three challenges. Our system relies on the freely available online encyclopedia Wikipedia as a natural-language knowledge base, and exploits Wikipedia's structure to infer the correct classification of any given query. We present two variants of this query classification system, and demonstrate their reliability compared to each other and to the literature benchmarks using the query sets from the KDD CUP 2005 and TREC 2007 competitions.

Keywords: natural language processing; NLP; query classification; database systems; information retrieval; intelligent information systems; knowledge-based systems; web-based information systems; Wikipedia.

DOI: 10.1504/IJIIDS.2011.038969

International Journal of Intelligent Information and Database Systems, 2011 Vol.5 No.2, pp.143 - 163

Received: 26 Feb 2010
Accepted: 20 Jun 2010

Published online: 21 Oct 2014
