Authors: Shaozhi Ye; S. Felix Wu
Addresses: Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA; Department of Computer Science, University of California, One Shields Ave., Davis, CA 95616, USA
Abstract: The huge size of online social networks (OSNs) makes it prohibitively expensive to precisely measure any property that requires knowledge of the entire graph. To estimate the size of an OSN, i.e., the number of users it has, this paper introduces three estimators that use widely available OSN functionalities and services. The first estimator is a maximum likelihood estimator (MLE) based on uniform sampling. An O(log n) algorithm is developed to solve this estimator; in our experiments, it is 70 times faster than the naive linear probing algorithm. The second estimator is mark and recapture (MR), which we employ to estimate the number of Twitter users behind its public timeline service. The third estimator is based on random walkers (RW) and is generalised to estimate other graph properties. In-depth evaluations on six real OSNs show the bias and variance of these estimators. Our analysis addresses the challenges and pitfalls of developing and implementing such estimators for OSNs.
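To make the first two estimators concrete, here is a minimal sketch under stated assumptions; it is not the authors' code, and the function names are hypothetical. It assumes n user IDs are drawn uniformly at random with replacement and k distinct IDs are observed, so the likelihood of a population of size N is proportional to N(N-1)...(N-k+1)/N^n. That likelihood is unimodal in N, so the MLE is the first N at which L(N+1) < L(N). A linear probe walks N upward one step at a time; an exponential-then-binary search finds the same N in a logarithmic number of likelihood-ratio evaluations. For mark and recapture, the standard Lincoln-Petersen estimator is swapped in as an illustration; the paper's exact MR variant may differ.

```python
import math


def _check(n: int, k: int) -> None:
    # With k == n (no repeated IDs) the likelihood grows without bound: no finite MLE.
    if not (1 <= k < n):
        raise ValueError("need 1 <= k < n (at least one repeated sample)")


def _likelihood_rises(N: int, n: int, k: int) -> bool:
    """True if L(N+1) >= L(N) for the uniform-sampling-with-replacement model.

    log(L(N+1)/L(N)) = -log(1 - k/(N+1)) - n*log(1 + 1/N), computed with
    log1p for numerical stability when N is large.
    """
    log_ratio = -math.log1p(-k / (N + 1)) - n * math.log1p(1 / N)
    return log_ratio >= 0


def mle_size_linear(n: int, k: int) -> int:
    """Hypothetical naive linear probe: step N upward until the likelihood drops."""
    _check(n, k)
    N = k
    while _likelihood_rises(N, n, k):
        N += 1
    return N


def mle_size_fast(n: int, k: int) -> int:
    """Hypothetical logarithmic search for the same MLE (exponential + binary search)."""
    _check(n, k)
    lo = k
    if not _likelihood_rises(lo, n, k):
        return lo
    hi = 2 * k
    # Invariant: likelihood still rises at lo; grow hi until it no longer rises.
    while _likelihood_rises(hi, n, k):
        lo, hi = hi, hi * 2
    # Now rises(lo) is True and rises(hi) is False; narrow to the first False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if _likelihood_rises(mid, n, k):
            lo = mid
        else:
            hi = mid
    return hi


def lincoln_petersen(first_sample, second_sample) -> float:
    """Classic mark-and-recapture estimate: |S1| * |S2| / |S1 & S2|."""
    s1, s2 = set(first_sample), set(second_sample)
    recaptured = len(s1 & s2)
    if recaptured == 0:
        raise ValueError("no recaptured IDs; estimate is undefined")
    return len(s1) * len(s2) / recaptured
```

For example, with n = 3 draws yielding k = 2 distinct IDs, both `mle_size_linear(3, 2)` and `mle_size_fast(3, 2)` return 2. The fast version matters when k is close to n, where the estimate (and hence the number of linear steps) becomes large.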
Keywords: online social networks; OSNs; maximum likelihood estimator; MLE; mark and recapture; random walkers; network size.
International Journal of Social Computing and Cyber-Physical Systems, 2011 Vol.1 No.2, pp.160-179
Available online: 11 Dec 2011