Distributed data-dependent locality sensitive hashing
Online publication date: Thu, 28-Mar-2019
by Yanping Ma; Qiming Liu; Cuifeng Li; Yi Tang; Hongtao Xie
International Journal of High Performance Computing and Networking (IJHPCN), Vol. 13, No. 3, 2019
Abstract: Locality sensitive hashing (LSH) is a popular algorithm for approximate nearest neighbour (ANN) search. Because LSH partitions the vector space uniformly while the distribution of vectors is usually non-uniform, it fits real datasets poorly and has limited efficiency. In this paper, we propose a novel data-dependent locality sensitive hashing (DP-LSH) algorithm with a two-level structure. In the first level, we train a number of cluster centres and use them to divide the dataset, so that the vectors in each cluster have a near-uniform distribution. In the second level, we construct LSH tables for each cluster. Given a query, we first determine the few clusters it belongs to, and then perform ANN search in the corresponding LSH tables. Furthermore, we present an optimised distributed scheme and a distributed DP-LSH algorithm. Experimental results on the reference datasets show that DP-LSH can increase search speed by 48 times compared to E2LSH while maintaining high search precision, and that the distributed DP-LSH can further improve search efficiency.
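The two-level structure described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not say how the cluster centres are trained or which hash family is used, so plain k-means and an E2LSH-style hash of the form floor((a·v + b) / w) are assumed here. All names (`kmeans`, `LSHTable`, `TwoLevelIndex`, `n_probe`) and all parameter values are hypothetical, and the distributed scheme is not covered.

```python
import numpy as np


def kmeans(data, k, iters=10, seed=0):
    """Plain k-means (assumed; the abstract does not name the training method)."""
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    # Final assignment against the final centres.
    dist = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, dist.argmin(axis=1)


class LSHTable:
    """One hash table using an assumed E2LSH-style hash: floor((a.v + b) / w)."""

    def __init__(self, dim, n_hashes, width, rng):
        self.a = rng.normal(size=(n_hashes, dim))
        self.b = rng.uniform(0, width, size=n_hashes)
        self.width = width
        self.buckets = {}

    def key(self, v):
        return tuple(np.floor((self.a @ v + self.b) / self.width).astype(int))

    def add(self, idx, v):
        self.buckets.setdefault(self.key(v), []).append(idx)

    def candidates(self, v):
        return self.buckets.get(self.key(v), [])


class TwoLevelIndex:
    """Level 1: cluster centres. Level 2: separate LSH tables per cluster."""

    def __init__(self, data, n_clusters=4, n_tables=4, n_hashes=4,
                 width=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        self.centres, labels = kmeans(data, n_clusters, seed=seed)
        self.tables = []
        for j in range(n_clusters):
            tabs = [LSHTable(data.shape[1], n_hashes, width, rng)
                    for _ in range(n_tables)]
            for idx in np.flatnonzero(labels == j):
                for t in tabs:
                    t.add(int(idx), data[idx])
            self.tables.append(tabs)

    def query(self, q, n_probe=2):
        """Return the index of the closest candidate, or None if none is found."""
        centre_dist = ((self.centres - q) ** 2).sum(axis=1)
        cand = set()
        for j in np.argsort(centre_dist)[:n_probe]:  # the few nearest clusters
            for t in self.tables[j]:
                cand.update(t.candidates(q))
        if not cand:
            return None
        cand = np.fromiter(cand, dtype=int)
        dist = ((self.data[cand] - q) ** 2).sum(axis=1)
        return int(cand[dist.argmin()])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(200, 8))
    index = TwoLevelIndex(data)
    print(index.query(data[17]))
```

Querying with a vector that is already in the index returns that vector's own position, since it hashes to the same bucket it was inserted into. For other queries the result is approximate and may be `None` when no bucket matches; probing more clusters (`n_probe`) trades speed for recall.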