Title: Keyphrase extraction from single textual documents based on semantically defined background knowledge and co-occurrence graphs

Authors: Mauro Dalle Lucca Tosi; Julio Cesar Dos Reis

Addresses: Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg; Institute of Computing and NIED, University of Campinas, Campinas, Brazil

Abstract: Keyphrase extraction is a fundamental and challenging task that aims to extract a set of keyphrases from textual documents. Keyphrases are short phrases, composed of one or more terms, that represent a textual document and its main topics. They help publishers index documents and help readers identify the most relevant ones. In this article, we extend our research on C-Rank, an unsupervised approach that automatically extracts keyphrases from single documents. C-Rank uses concept linking to connect concepts shared between a single document and an external background knowledge base. We advance our study of C-Rank by evaluating it with different concept-linking approaches: Babelfy and DBpedia Spotlight. We evaluated C-Rank on data sets composed of academic articles, academic abstracts, and news articles. Experimental comparison with existing unsupervised approaches indicates that C-Rank achieves state-of-the-art results in extracting keyphrases from scientific documents.

Keywords: keyphrase extraction; complex networks; semantic annotation; keywords; concept linking; entity linking; entity ranking; natural language processing; graph.

DOI: 10.1504/IJMSO.2021.120284

International Journal of Metadata, Semantics and Ontologies, 2021 Vol.15 No.2, pp.121 - 132

Received: 08 Jul 2020
Accepted: 08 Jun 2021

Published online: 13 Jan 2022
