Authors: Yanping Ma; Qiming Liu; Cuifeng Li; Yi Tang; Hongtao Xie
Addresses: School of Information and Electrical Engineering, Ludong University, Yantai, China ' School of Information and Electrical Engineering, Ludong University, Yantai, China ' School of Information and Electrical Engineering, Ludong University, Yantai, China ' Department of Mathematics, Guangzhou University, China ' National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Abstract: Locality sensitive hashing (LSH) is a popular algorithm for approximate nearest neighbour (ANN) search. As LSH partitions vector space uniformly and the distribution of vectors is usually non-uniform, it poorly fits real datasets and has limited efficiency. In this paper, we propose a novel data-dependent locality sensitive hashing (DP-LSH) algorithm, which has a two-level structure. In the first level, we train a number of cluster centres, and use the centres to divide the dataset. So the vectors in each cluster have near uniform distribution. In the second level, we construct LSH tables for each cluster. Given a query, we first determine a few clusters that it belongs to, and perform ANN search in the corresponding LSH tables. Furthermore, we present an optimised distributed scheme and a distributed DP-LSH algorithm. Experimental results on the reference datasets show that the search speed of DP-LSH can be increased by 48 times compared to E2LSH, while keeping high search precision; and the distributed DP-LSH can further improve search efficiency.
Keywords: locality sensitive hashing; LSH; approximate nearest neighbour; ANN; data-dependent; distributed high dimensional search.
International Journal of High Performance Computing and Networking, 2019 Vol.13 No.3, pp.304 - 311
Received: 24 Mar 2016
Accepted: 11 Sep 2016
Published online: 13 Mar 2019 *