Title: Improved Hamming-space-based similarity search algorithm

Authors: Vikram Singh; Chandradeep Kumar

Addresses: National Institute of Technology, Kurukshetra, India; National Institute of Technology, Kurukshetra, India

Abstract: In the modern context, similarity is driven by the quality features of data objects and steered by content-preserving stimuli, both in retrieving relevant 'nearest neighbourhood' objects and in how similar objects are pursued. Current Hamming-space-based similarity search strategies find all data objects within a threshold Hamming distance of a user query. However, the number of Hamming-distance computations and the cost of candidate generation have remained key concerns for several years. The Hamming-space paradigm widens the range of alternatives for an optimised search experience. A novel 'counting-based' similarity search strategy is proposed, with an a priori and improved Hamming-space estimation, such as optimised candidate generation and verification functions. The strategy adapts to a smaller subset of the user query's dimensions and then constrains the Hamming-space computations against each data object, guided by generated statistics. An extensive evaluation indicates that the proposed counting-based approach can be combined with any pigeonhole principle-based similarity search to further improve its performance.
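For context, the following is a minimal sketch of the kind of pigeonhole principle-based threshold search the abstract refers to. It is not the authors' counting-based method, which the abstract does not specify. It assumes binary codes of fixed length stored as Python ints. Each code is split into tau + 1 disjoint parts. If two codes differ in at most tau bits, at least one part must match exactly, so exact lookups on the parts yield a complete candidate set, which is then verified with a full Hamming-distance check. The names `hamming`, `split` and `PigeonholeIndex` are hypothetical.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Hamming distance between two equal-length bit codes stored as ints."""
    return bin(a ^ b).count("1")


def split(code: int, n_bits: int, m: int) -> list[int]:
    """Cut an n_bits-long code into m disjoint contiguous parts (as ints)."""
    bounds = [n_bits * i // m for i in range(m + 1)]
    parts = []
    for i in range(m):
        width = bounds[i + 1] - bounds[i]
        parts.append((code >> bounds[i]) & ((1 << width) - 1))
    return parts


class PigeonholeIndex:
    """Hypothetical exact-match pigeonhole filter for threshold Hamming search."""

    def __init__(self, codes, n_bits: int, tau: int):
        self.codes = list(codes)
        self.n_bits = n_bits
        self.tau = tau
        # tau + 1 parts: at most tau differing bits cannot touch every part,
        # so any true answer shares at least one whole part with the query.
        self.m = tau + 1
        if self.m > n_bits:
            raise ValueError("tau + 1 must not exceed the code length")
        # One inverted table per part position: part value -> object ids.
        self.tables = [defaultdict(list) for _ in range(self.m)]
        for obj_id, code in enumerate(self.codes):
            for i, part in enumerate(split(code, n_bits, self.m)):
                self.tables[i][part].append(obj_id)

    def candidates(self, query: int) -> set[int]:
        """Candidate generation: ids sharing at least one exact part with query."""
        cands = set()
        for i, part in enumerate(split(query, self.n_bits, self.m)):
            cands.update(self.tables[i].get(part, ()))
        return cands

    def search(self, query: int) -> list[int]:
        """Verification: keep candidates within Hamming distance tau."""
        return sorted(
            i for i in self.candidates(query)
            if hamming(self.codes[i], query) <= self.tau
        )


codes = [0b10110010, 0b10110011, 0b01001101, 0b10111010]
index = PigeonholeIndex(codes, n_bits=8, tau=1)
print(index.search(0b10110010))  # [0, 1, 3]
```

The filter never misses a true answer. Its cost lies in the candidate set and the verification step. Those two quantities are exactly what the abstract identifies as the key concerns, and they are the target of the proposed counting-based improvement.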

Keywords: Hamming-space; information retrieval; similarity search.

DOI: 10.1504/IJIIDS.2023.128275

International Journal of Intelligent Information and Database Systems, 2023 Vol.16 No.1, pp.20 - 38

Received: 28 Sep 2021
Accepted: 16 Jul 2022

Published online: 16 Jan 2023
