Title: Graphical models based hierarchical probabilistic community discovery in large-scale social networks

Authors: Haizheng Zhang, Ke Ke, Wei Li, Xuerui Wang

Addresses: Microsoft Corporation, 1 Microsoft Way, Redmond, WA 98052, USA; College of Business, Central Washington University, 2400 S. 240th St., Des Moines, WA 98198, USA; Contextual and Display Advertising Sciences Department, Yahoo! Labs, 4401 Great America Parkway, Santa Clara, CA 95054, USA; Yahoo! Labs, 701 First Avenue, Sunnyvale, CA 94089, USA

Abstract: Real-world social networks, while disparate in nature, often consist of a set of loose clusters (a.k.a. communities), in which members are better connected to each other than to the rest of the network. In addition, such communities are often hierarchical, reflecting the fact that some communities are composed of smaller sub-communities. Discovering this complicated hierarchical community structure yields a deeper understanding of the networks and their communities. This paper describes a scheme based on a hierarchical Bayesian model, namely the hierarchical social network-pachinko allocation model (HSN-PAM), for discovering probabilistic, hierarchical communities in social networks. The scheme is powered by a previously developed hierarchical Bayesian model. In this scheme, communities are classified into two categories: super-communities and regular communities. Two different network encoding approaches are explored to evaluate the scheme on research collaboration networks, including CiteSeer. The experimental results demonstrate that HSN-PAM is effective for discovering hierarchical community structures in large-scale social networks.

Keywords: community discovery; hierarchical community structure; probabilistic discovery; social networks; graphical models; data mining; clusters; Bayesian modelling; online communities; web based communities; virtual communities.

DOI: 10.1504/IJDMMM.2010.032144

International Journal of Data Mining, Modelling and Management, 2010 Vol.2 No.2, pp.95-116

Published online: 11 Mar 2010
