Title: Identifying journalistically relevant social media texts using human and automatic methodologies

Authors: Nuno Guimarães; Filipe Miranda; Álvaro Figueira

Addresses: CRACS - INESC TEC & University of Porto, Porto, Portugal; Faculty of Engineering, University of Porto, Porto, Portugal; INESC TEC & University of Porto, Porto, Portugal

Abstract: Social networks have provided the means for constant connectivity and fast information dissemination. In addition, real-time posting enables a new form of citizen journalism, in which users report events from a witness perspective. As a result, information propagates through the network faster than traditional media can report it. However, relevant information makes up only a small percentage of the content shared. Our goal is to develop and evaluate models that can automatically detect journalistic relevance. Doing so requires solid and reliable ground truth data with a large number of annotated posts, so that the models can learn to detect relevance across the whole spectrum. In this article, we present and compare two different methodologies: an automatic approach and a human approach. Results on a test data set labelled by experts show that the models trained with the automatic methodology tend to perform better than those trained on human-annotated data.

Keywords: relevance detection; machine learning; text mining; crowdsourcing task; data mining; human annotation; automatic labelling; natural language processing; supervised models; event detection.

DOI: 10.1504/IJGUC.2020.103971

International Journal of Grid and Utility Computing, 2020 Vol.11 No.1, pp.72 - 83

Received: 20 Jun 2018
Accepted: 05 Nov 2018

Published online: 03 Dec 2019
