Title: Browser simulation-based crawler for online social network profile extraction

Authors: Suhail Iqbal Bhat; Tasleem Arif; Majid Bashir Malik; Aijaz Ahmad Sheikh

Addresses: Research Lab, Department of Information Technology, Baba Ghulam Shah Badshah University, Jammu and Kashmir, 185234, India; Department of Information Technology, Baba Ghulam Shah Badshah University, Jammu and Kashmir, 185234, India; Department of Computer Sciences, Baba Ghulam Shah Badshah University, Jammu and Kashmir, 185234, India; Research Lab, Department of Information Technology, Baba Ghulam Shah Badshah University, Jammu and Kashmir, 185234, India

Abstract: The rapid proliferation and extensive use of online social networks (OSNs) such as Facebook, Twitter and Instagram have attracted the attention of academia and industry, since these networks store massive amounts of information. However, acquiring data from these OSNs, which is a prerequisite for conducting any research on them, is a daunting task, owing to privacy concerns on the one hand and the complexity of the underlying technologies of these networks on the other. This paper presents the design and implementation of a crawler based on browser simulation for extracting Facebook users' profile data while preserving privacy. A breadth-first search (BFS) approach was adopted for sampling around 0.235 million Facebook users. Although the main purpose of this work is the design of the crawler, the results are also briefly presented in terms of various social network metrics and analysed from different aspects of privacy.
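
As an illustration of the breadth-first sampling mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The function name `bfs_sample`, the `get_friends` callback and the toy graph are all hypothetical. In a real browser-simulation crawler, `get_friends` would be backed by pages fetched through an automated browser; here an in-memory dictionary stands in for those pages.

```python
from collections import deque


def bfs_sample(seed, get_friends, max_profiles):
    """Return up to max_profiles profile IDs in breadth-first order from seed.

    get_friends is a hypothetical callback that maps a profile ID to an
    iterable of neighbouring profile IDs.
    """
    visited = {seed}       # IDs already queued, so none is queued twice
    order = []             # IDs in the order they were sampled
    queue = deque([seed])  # FIFO queue gives breadth-first order
    while queue and len(order) < max_profiles:
        current = queue.popleft()
        order.append(current)
        for friend in get_friends(current):
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)
    return order


# Toy friendship graph (assumed data, for illustration only).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

sample = bfs_sample("a", lambda uid: toy_graph.get(uid, []), max_profiles=4)
print(sample)
```

The `max_profiles` cap plays the role of a sampling budget: the traversal stops once that many profiles have been visited, even if the queue still holds unvisited neighbours.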

Keywords: online social networks; OSNs; Facebook; crawler; Python; selenium; data analytics.

DOI: 10.1504/IJWBC.2020.111377

International Journal of Web Based Communities, 2020 Vol.16 No.4, pp.321 - 342

Received: 28 Feb 2019
Accepted: 17 Feb 2020

Published online: 23 Nov 2020
