Authors: Fan Xing; Chengbing Zhou; Xiafei Suo; Tinghuai Ma; Yuanfeng Jin
Addresses: School of Computer and Software, Nanjing University of Information Science and Technology, Jiangsu, Nanjing, China ' Jiangsu Information Center, Jiangsu, Nanjing, China ' School of Computer and Software, Nanjing University of Information Science and Technology, Jiangsu, Nanjing, China ' Jiangsu Engineering Centre of Network Monitoring, Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China ' Department of Mathematics, Yan Bian University, Yanji 133002, Jilin, China
Abstract: The increasing popularity of social networks, such as online communities and collaboration networks, has given rise to interesting knowledge discovery and data mining problems. Different methods of knowledge discovery and data mining require different types of social network datasets, such as datasets including node and edge information or datasets with added labels or weights. In this paper, we review social network datasets in different formats and applications. Then, we propose a data processing method to add weights and labels for a Chinese collaboration network (C-DBLP). The weights reflect the affinity between two authors, and the labels contain information about the authors. This dataset can be used for privacy protection in social networks and social network community detection. Finally, we analyse the new data that we compiled and draw conclusions about the importance of the weights in the C-DBLP dataset in terms of social network research.
Keywords: social networks; C-DBLP; datasets; weights; labels.
International Journal of Communication Networks and Distributed Systems, 2017 Vol.18 No.3/4, pp.312 - 328
Received: 25 Nov 2015
Accepted: 10 Aug 2016
Published online: 18 Apr 2017