Title: Streaming trend detection in Twitter

Authors: James Benhardus; Jugal Kalita

Addresses: Centre for Cognitive Science, University of Minnesota, Minneapolis, MN 55455, USA; Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA

Abstract: As social media continue to grow, the zeitgeist of society is increasingly found not in the headlines of traditional media institutions, but in the activity of ordinary individuals. The identification of trending topics utilises social media (such as Twitter) to provide an overview of the topics and issues that are currently popular within the online community. In this paper, we outline methodologies for detecting and identifying trending topics from streaming data. Data from Twitter's streaming API were collected and grouped into documents of equal duration, using data collection procedures that allow for analysis over multiple timespans, including those not currently associated with Twitter-identified trending topics. Term frequency-inverse document frequency (tf-idf) analysis and relative normalised term frequency analysis were performed on the documents to identify the trending topics. Relative normalised term frequency analysis identified unigrams, bigrams, and trigrams as trending topics, while tf-idf analysis identified only unigrams. Application of these methodologies to streaming data resulted in F-measures ranging from 0.1468 to 0.7508.
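The two scoring approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the code assumes the textbook tf-idf definition, assumes that "relative normalised term frequency" means a term's normalised frequency in the current time-slice document divided by its smoothed normalised frequency in the other documents, and handles unigrams only. The function names, the whitespace tokeniser, and the `smoothing` parameter are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokeniser (an assumption; real tweets need more care)."""
    return text.lower().split()


def tf_idf_scores(documents, current_index):
    """Textbook tf-idf for unigrams in documents[current_index].

    Each document stands for all tweets collected in one time slice.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of time slices containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    counts = Counter(tokenized[current_index])
    total = sum(counts.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


def relative_normalised_tf(documents, current_index, smoothing=1.0):
    """Assumed formulation: normalised frequency in the current slice divided
    by the smoothed normalised frequency in all other slices."""
    tokenized = [tokenize(d) for d in documents]
    current = Counter(tokenized[current_index])
    current_total = sum(current.values())
    background = Counter()
    for i, tokens in enumerate(tokenized):
        if i != current_index:
            background.update(tokens)
    background_total = sum(background.values())
    scores = {}
    for term, count in current.items():
        current_rate = count / current_total
        # Smoothing avoids division by zero for terms unseen in the background.
        background_rate = (background[term] + smoothing) / (background_total + smoothing)
        scores[term] = current_rate / background_rate
    return scores


def f_measure(detected, reference):
    """Balanced F-measure (harmonic mean of precision and recall) over term sets."""
    true_positives = len(detected & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy time slices (invented for illustration, not data from the paper).
slices = [
    "the game is on tonight",
    "the weather is nice today",
    "the earthquake earthquake hit the coast",
]
tfidf = tf_idf_scores(slices, current_index=2)
relative = relative_normalised_tf(slices, current_index=2)
top_tfidf = max(tfidf, key=tfidf.get)
top_relative = max(relative, key=relative.get)
```

In this toy input, a term that occurs in every slice (such as "the") receives a tf-idf score of zero, while a term concentrated in the current slice rises to the top under both scores. Evaluating a set of detected terms against a reference list of trending topics with `f_measure` yields a single figure comparable in kind to the F-measures reported in the abstract.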

Keywords: microblogs; streaming data; trend detection; trending topics; tf-idf; human language processing; web based communities; online communities; virtual communities; social media; Twitter; social networking.

DOI: 10.1504/IJWBC.2013.051298

International Journal of Web Based Communities, 2013 Vol.9 No.1, pp.122 - 139

Published online: 30 Jan 2014
