Title: Multi term based co-term frequency method for term weighting in information retrieval

Authors: M. Santhanakumar; C. Christopher Columbus; K. Jayapriya

Addresses: Department of Computer Science and Engineering, PSN College of Engineering and Technology, Melathediyoor, Tirunelveli, 627152 Tamilnadu, India; Department of Computer Science and Engineering, PSN College of Engineering and Technology, Melathediyoor, Tirunelveli, 627152 Tamilnadu, India; Department of Computer Science and Information Technology, Nadar Saraswathi College of Arts and Science, Theni, 625531 Tamilnadu, India

Abstract: The World Wide Web (WWW) has become a dominant source of information of all kinds, and retrieving the web pages relevant to a user query from it is a demanding task. Term frequency-inverse document frequency (TF-IDF) is the most frequently used method for term weighting; it weights a term by how often it occurs within a document and by how many documents contain it. Documents retrieved on the basis of a single query term may not relate to the user's search, which leaves the user to process unwanted information. This paper therefore proposes a new term weighting method named co-term frequency (CTF), in which the weight is assigned according to multiple terms that commonly occur across all documents. Measured by precision, recall and F-score, the proposed framework retrieves more relevant web pages than the other term weighting methods it was compared against.
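
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classic TF-IDF weighting. It does not reproduce the paper's co-term frequency method, which the abstract does not define. The function name `tf_idf`, the whitespace tokenizer and the toy corpus are illustrative assumptions, and the unsmoothed formula tf × log(N / df) is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (classic TF-IDF).

    Assumptions for illustration: lowercase whitespace tokenization,
    relative term frequency, and unsmoothed idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not data from the paper.
docs = [
    "web search retrieves web pages",
    "term weighting ranks pages",
    "query term weighting",
]
weights = tf_idf(docs)
```

In this toy corpus, "web" outweighs "pages" in the first document because it occurs twice there and in no other document. A term that appears in every document receives weight zero under this variant.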

Keywords: inverse document frequency; IDF; co-term frequency; CTF; term weighting; World Wide Web; WWW; precision; term frequency; F-score; recall.

DOI: 10.1504/IJBIS.2018.091164

International Journal of Business Information Systems, 2018 Vol.28 No.1, pp.79 - 94

Received: 03 Aug 2016
Accepted: 09 Oct 2016

Published online: 13 Apr 2018
