Title: A promising combination of approaches for solving complex text classification tasks: application to the classification of scientific papers into patents classes

Authors: Kafil Hajlaoui; Jean-Charles Lamirel; Pascal Cuxac

Addresses: INIST CNRS, Vandœuvre-lès-Nancy 54500, France; INRIA team SYNALP-LORIA, Vandœuvre-lès-Nancy 54500, France; INIST CNRS, Vandœuvre-lès-Nancy 54500, France

Abstract: This paper focuses on a subtask of the QUAERO research program, a major innovative research project on the automatic processing of multimedia and multilingual content. The objective is to propose a new method for the classification of scientific papers, developed in the context of an international patent classification plan related to the same field. The practical purpose of this work is to provide experts with an assistance tool for evaluating the originality and novelty of a patent, by offering them the most relevant scientific citations. This issue raises new challenges in categorisation research: the patent classification plan is not directly adapted to the structure of scientific documents, the classes have highly citing or cited topics, and the available examples are not always evenly distributed across the learning classes.

Keywords: supervised classification; patents; KNN; K-nearest-neighbour; association rules; feature selection; feature maximisation metrics; text classification; scientific papers; patent classes; automatic processing; multimedia content; multilingual content; scientific citations.

DOI: 10.1504/IJKL.2014.067187

International Journal of Knowledge and Learning, 2014 Vol.9 No.1/2, pp.142 - 163

Received: 12 Jun 2013
Accepted: 16 Jun 2014

Published online: 31 Jan 2015
