Title: Improving named entity recognition and disambiguation in news headlines

Authors: Jayendra Barua; Rajdeep Niyogi

Addresses: Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India; Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India

Abstract: We present a framework for extracting and disambiguating hyphenated and partial named entity mentions in news headlines. Applying state-of-the-art named entity detection and disambiguation approaches directly to news headlines significantly degrades performance for three reasons: headline formatting differs from that of regular text, mentions may be hyphenated, and entities may be mentioned only partially. This novel framework assists existing named entity recognition (NER) and named entity disambiguation (NED) systems in handling these challenges, focusing on hyphenated and partial entity mentions. We modify each hyphenated or partial mention in a way that increases the probability that it is disambiguated to the correct entity in the knowledge base, leveraging headlines from the recent past to improve the mentions. Experimental results show that the framework improves the F1-score of mention detection by 12% and 9% for the state-of-the-art Stanford and Illinois NER systems respectively, and the F1-score of disambiguation by 9%, 12%, 7% and 5% for the state-of-the-art AIDA, Wikifier, TagMe and YODIE NED systems respectively.
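
The mention modification summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_hyphenated` and `expand_partial`, the capitalized-phrase heuristic, and the sample headlines are all hypothetical, chosen only to show the general idea of splitting a hyphenated mention and expanding a partial mention using headlines from the recent past.

```python
import re

# Hypothetical heuristic: a candidate entity is a run of capitalized words.
CAPITALIZED_PHRASE = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def split_hyphenated(mention):
    """Split a hyphenated mention such as 'India-Pakistan' into its parts."""
    return [part for part in re.split(r"\s*-\s*", mention) if part]


def expand_partial(mention, recent_headlines):
    """Return the longest capitalized phrase in recent headlines that
    contains the mention as a whole word; fall back to the mention itself."""
    best = mention
    for headline in recent_headlines:
        for candidate in CAPITALIZED_PHRASE.findall(headline):
            if mention in candidate.split() and len(candidate) > len(best):
                best = candidate
    return best


if __name__ == "__main__":
    # Illustrative headlines only, not data from the paper.
    recent = [
        "Narendra Modi addresses rally in Delhi",
        "markets fall as oil prices rise",
    ]
    print(split_hyphenated("India-Pakistan"))
    print(expand_partial("Modi", recent))
```

A naive split like this would also break legitimately hyphenated names, so a real system would need to decide which hyphenated mentions to split; the abstract does not describe how the framework makes that decision.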

Keywords: information retrieval; named entity disambiguation; mention detection; mention modification; news headlines; natural language processing.

DOI: 10.1504/IJIIDS.2019.104530

International Journal of Intelligent Information and Database Systems, 2019 Vol.12 No.4, pp.279 - 303

Received: 20 Feb 2018
Accepted: 26 Mar 2019

Published online: 17 Jan 2020
