Title: Modelling content lifespan in online social networks using data mining

Authors: John W. Gibbons; Arvin Agah

Addresses: Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA; Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA

Abstract: Online social networks (OSNs) are integrated into business, entertainment, politics, and education, and into nearly every other facet of everyday life. They have played essential roles in events ranging from milestones for humanity, such as the social revolutions in certain countries, to day-to-day activities, such as streaming entertaining or educational materials. Not surprisingly, social networks are studied not only by computer scientists but also by economists, sociologists, political scientists, and psychologists, among others. In this paper, we build a model that classifies content on four OSNs (Reddit, 4chan, Flickr, and YouTube) according to the types of lifespan their content has and the popularity tiers that the content reaches. The proposed model is evaluated using ten-fold cross-validation with four data mining techniques: sequential minimal optimisation (based on support vector machines), decision table, Naïve Bayes, and random forest. The run times and accuracies are compared across OSNs, models, and data mining algorithms.

Keywords: online social networks; data mining; social media; content modelling; content lifespan; Reddit; 4chan; Flickr; YouTube; social network content; support vector machines; SVM; decision tables; naive Bayes; random forest.

DOI: 10.1504/IJWBC.2015.072131

International Journal of Web Based Communities, 2015 Vol.11 No.3/4, pp.234 - 263

Received: 23 Mar 2015
Accepted: 01 Apr 2015

Published online: 01 Oct 2015
