Title: A distributed big data analytics model for people re-identification based dimensionality reduction

Authors: Abderrahmane Ez-Zahout

Addresses: Faculty of Sciences, Computer Science Department, IPSS Team, Mohamed 5 University, Rabat, 10 000, Morocco

Abstract: Big data analytics is a vast domain that includes intelligent processing systems. Intelligent video surveillance generates a huge volume of data, and this unstructured data requires fast processing. In big data analytics, most of the data involved in processing comes from closed-circuit television (CCTV) and is unstructured. Such a large volume of data therefore requires efficient and advanced processing. Video surveillance systems operate in four phases: detection, tracking, profile analysis and re-identification. In this work, re-identification is based on real-time dimensionality reduction with the SparkMlLib library to speed up feature extraction. In practice, the Minkowski distance and the Kmeans algorithm are used for this purpose. To improve the effectiveness of our model, principal component analysis (PCA), the cumulative match curve (CMC) and the cumulative distribution function (CDF) have been used. These functions measure the re-identification errors and provide more re-identifications in a real-time context.
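
As an illustration of two of the measures named in the abstract, the following minimal sketch computes a Minkowski distance between feature vectors and derives a cumulative match curve from the resulting rankings. It is not the paper's implementation: it uses plain NumPy rather than SparkMlLib, and the function names (minkowski_distance, cmc_curve), the two-dimensional toy features and the identity labels are all assumptions made for the example.

```python
import numpy as np


def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two feature vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def cmc_curve(query_feats, query_ids, gallery_feats, gallery_ids, p=2.0):
    """Cumulative match curve (hypothetical helper).

    cmc[k - 1] is the fraction of queries whose true identity appears
    among the k gallery entries closest to the query.
    """
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(len(gallery_ids))
    for feat, true_id in zip(query_feats, query_ids):
        # Rank every gallery entry by its distance to the query.
        dists = np.array([minkowski_distance(feat, g, p) for g in gallery_feats])
        ranked_ids = gallery_ids[np.argsort(dists, kind="stable")]
        # Position of the first correct match; it counts for that rank
        # and for every larger rank.
        match_positions = np.flatnonzero(ranked_ids == true_id)
        if match_positions.size:
            hits[match_positions[0]:] += 1
    return hits / len(query_ids)


# Toy data (assumed): three gallery identities, three query observations.
gallery_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
gallery_ids = ["A", "B", "C"]
query_feats = np.array([[0.1, 0.0], [0.9, 0.2], [4.0, 4.5]])
query_ids = ["A", "A", "C"]

cmc = cmc_curve(query_feats, query_ids, gallery_feats, gallery_ids)
print(cmc)
```

In this toy setting the second query is closer to identity B than to its true identity A, so it is only matched at rank 2. The rank-1 value of the curve is therefore below 1, which is the kind of re-identification error the CMC makes visible.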

Keywords: SparkMlLib; re-identification; ROI; regions of interest; features similarity; Kmeans; Minkowski distance; CMC; cumulative match curve; CDF; cumulative distribution function; PCA; principal component analysis.

DOI: 10.1504/IJHPSA.2021.119147

International Journal of High Performance Systems Architecture, 2021 Vol.10 No.2, pp.57 - 63

Received: 11 Sep 2020
Accepted: 26 Dec 2020

Published online: 25 Nov 2021
