Authors: James Benhardus; Jugal Kalita
Addresses: Centre for Cognitive Science, University of Minnesota, Minneapolis, MN 55455, USA; Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA
Abstract: As social media continue to grow, the zeitgeist of society is increasingly found not in the headlines of traditional media institutions, but in the activity of ordinary individuals. Trending topic identification uses social media (such as Twitter) to provide an overview of the topics and issues that are currently popular within the online community. In this paper, we outline methodologies for detecting and identifying trending topics from streaming data. Data from Twitter's streaming API were collected and grouped into documents of equal duration, using collection procedures that allow analysis over multiple timespans, including timespans not currently associated with Twitter-identified trending topics. Term frequency-inverse document frequency (tf-idf) analysis and relative normalised term frequency analysis were performed on the documents to identify the trending topics. Relative normalised term frequency analysis identified unigrams, bigrams, and trigrams as trending topics, while tf-idf analysis identified only unigrams. Applying these methodologies to streaming data yielded F-measures ranging from 0.1468 to 0.7508.
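The abstract names two scoring schemes applied to documents of equal duration. The Python sketch below illustrates the general idea on a toy stream; it is not the authors' implementation. Several details are assumptions not stated in the abstract: tweets arrive as `(timestamp, text)` pairs, tokenisation is lower-cased whitespace splitting, idf is `log(N / df)` over the time windows, and relative normalised term frequency is modelled as the ratio of a term's share of the current window to its add-one-smoothed share of all other windows pooled together. All function names are hypothetical.

```python
import math
from collections import Counter


def bucket_by_duration(tweets, duration):
    """Group (timestamp, text) pairs into documents of equal duration.

    Document k holds the tweets whose timestamp falls in
    [start + k * duration, start + (k + 1) * duration); each tweet is kept
    as its own token list so that n-grams never span two tweets.
    Timestamps are assumed to be numbers such as seconds.
    """
    start = min(t for t, _ in tweets)
    buckets = {}
    for t, text in tweets:
        k = int((t - start) // duration)
        buckets.setdefault(k, []).append(text.lower().split())
    # Empty windows become empty documents so that window indices stay aligned.
    return [buckets.get(k, []) for k in range(max(buckets) + 1)]


def count_ngrams(doc, n):
    """Count the n-grams (space-joined strings) in one document."""
    counts = Counter()
    for tokens in doc:
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def tf_idf(docs, target, n=1):
    """Score each n-gram of docs[target] as tf * log(N / df) over all documents."""
    counts = [count_ngrams(d, n) for d in docs]
    N = len(docs)
    scores = {}
    for term, tf in counts[target].items():
        df = sum(1 for c in counts if term in c)
        scores[term] = tf * math.log(N / df)
    return scores


def relative_normalized_tf(docs, target, n=1):
    """HYPOTHETICAL relative normalised term frequency (an assumed formulation).

    A term's share of all n-grams in docs[target], divided by its
    add-one-smoothed share of the n-grams pooled from every other document.
    """
    counts = [count_ngrams(d, n) for d in docs]
    reference = Counter()
    for i, c in enumerate(counts):
        if i != target:
            reference.update(c)
    vocab_size = len(set().union(*counts))
    target_total = sum(counts[target].values())
    reference_total = sum(reference.values())
    scores = {}
    for term, tf in counts[target].items():
        normalized = tf / target_total
        baseline = (reference[term] + 1) / (reference_total + vocab_size)
        scores[term] = normalized / baseline
    return scores


def top_k(scores, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy stream (invented for illustration): one-minute windows,
# the second window is treated as the "current" one.
example_tweets = [
    (0, "the weather is nice"),
    (10, "the game is tonight"),
    (70, "game tonight"),
    (75, "big game tonight"),
    (80, "the game was great"),
]

if __name__ == "__main__":
    docs = bucket_by_duration(example_tweets, duration=60)
    print(top_k(tf_idf(docs, target=1)))
    print(top_k(relative_normalized_tf(docs, target=1), k=1))
    print(top_k(relative_normalized_tf(docs, target=1, n=2), k=1))
```

On this toy stream, tf-idf gives "game" a score of zero because the term also occurs in the earlier window, so only terms unique to the current window rank highly. The smoothed ratio instead ranks "game" first among unigrams and "game tonight" first among bigrams, which illustrates in miniature how a ratio-based score can surface multi-word topics as well as single words.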
Keywords: microblogs; streaming data; trend detection; trending topics; tf-idf; human language processing; web based communities; online communities; virtual communities; social media; Twitter; social networking.
International Journal of Web Based Communities, 2013 Vol.9 No.1, pp. 122–139
Available online: 04 Jan 2013