Title: A promising combination of approaches for solving complex text classification tasks: application to the classification of scientific papers into patents classes
Authors: Kafil Hajlaoui; Jean-Charles Lamirel; Pascal Cuxac
Addresses: INIST CNRS, Vandœuvre-lès-Nancy 54500, France; INRIA team SYNALP-LORIA, Vandœuvre-lès-Nancy 54500, France; INIST CNRS, Vandœuvre-lès-Nancy 54500, France
Abstract: This paper focuses on a subtask of the QUAERO research program, a major innovative research project on the automatic processing of multimedia and multilingual content. Its objective is to propose a new method for classifying scientific papers into the classes of an international patent classification plan covering the same field. The practical purpose of this work is to provide a tool that assists experts in evaluating the originality and novelty of a patent by offering them the most relevant scientific citations. This task raises new challenges for categorisation research: the patent classification plan is not directly adapted to the structure of scientific documents, classes have high citation or cited topic, and the available examples are not always evenly distributed across the different learning classes.
Keywords: supervised classification; patents; KNN; K-nearest-neighbour; association rules; feature selection; feature maximisation metrics; text classification; scientific papers; patent classes; automatic processing; multimedia content; multilingual content; scientific citations.
International Journal of Knowledge and Learning, 2014 Vol.9 No.1/2, pp.142-163
Received: 12 Jun 2013
Accepted: 16 Jun 2014
Published online: 31 Jan 2015