Title: Document clustering based on web search hit counts

Authors: Masaya Kaneko; Shusuke Okamoto; Masaki Kohana; You Inayoshi

Addresses: Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan ' Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan ' Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan ' Graduate School of Science and Technology, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino-shi, Tokyo, 180-8633, Japan

Abstract: This paper describes a web mining method for automatically clustering research documents. The feature vector of each document is formed from the web hit counts returned by AND-searches on pairs of words. The target documents are then clustered by applying the k-means method twice, with the distance between vectors computed from their cosine similarity.
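The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that each document is represented by one term, that the feature vector holds the AND-search hit count of that term paired with each word in a fixed reference list, and that the hit counts come from a hypothetical `hit_count` function backed here by made-up numbers rather than a real search engine. The abstract does not say how the two k-means passes are combined, so only a single pass is shown.

```python
import math

# Made-up hit counts for illustration only; a real system would query a search engine.
TOY_HITS = {
    ("clustering", "database"): 900, ("clustering", "network"): 100,
    ("indexing", "database"): 800, ("indexing", "network"): 150,
    ("routing", "database"): 50, ("routing", "network"): 950,
    ("latency", "database"): 120, ("latency", "network"): 700,
}

def hit_count(word_a, word_b):
    """Hypothetical stand-in for the hit count of the query 'word_a AND word_b'."""
    return TOY_HITS.get((word_a, word_b), 0)

def feature_vector(term, reference_words):
    """One hit count per reference word (an assumed vector layout)."""
    return [float(hit_count(term, ref)) for ref in reference_words]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def normalize(v):
    """Scale a vector to unit length so that averaging is not dominated by large counts."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)

def kmeans_cosine(vectors, initial_centroids, max_iter=100):
    """k-means where each point joins the centroid with the highest cosine similarity."""
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda j: cosine_similarity(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

terms = ["clustering", "indexing", "routing", "latency"]
reference_words = ["database", "network"]
vectors = [feature_vector(t, reference_words) for t in terms]

# Deterministic initialisation for the example: seed with the first and third vectors.
labels = kmeans_cosine(vectors, [vectors[0], vectors[2]])
print(dict(zip(terms, labels)))
```

With these toy counts, the two terms that co-occur mostly with "database" end up in one cluster and the two that co-occur mostly with "network" in the other. Because cosine similarity ignores vector length, a term with uniformly higher hit counts is not pushed away from terms with the same co-occurrence profile.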

Keywords: document clustering; web mining; web hit counts; business intelligence; data mining; web search; search hit counts; information retrieval; research documents; k-means clustering.

DOI: 10.1504/IJBIDM.2013.055787

International Journal of Business Intelligence and Data Mining, 2013 Vol.8 No.1, pp.61 - 73

Published online: 28 Jun 2014
