Improved LSH-driven string similarity join filtering-verification framework
by Jingwei Zhang; Ru Chen; Qing Yang
International Journal of Intelligent Internet of Things Computing (IJIITC), Vol. 1, No. 2, 2020

Abstract: Similarity join is a basic data analysis operation that is widely used in similarity search, data cleaning, and recommendation applications. The filtering-verification framework is one of the main approaches to implementing similarity join. To handle high-dimensional data and high edit distance thresholds, a filtering-verification framework based on locality-sensitive hashing (LSH) is proposed. It adopts a dual filtering mode to balance the numbers of false positives and false negatives, thereby improving the efficiency and accuracy of similarity join. Experimental results show that the LSH-based similarity join filtering-verification framework effectively reduces the number of false positives and is significantly more efficient than the traditional method based on edit distance.
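
The general filter-then-verify idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the hash family or what the two filters are, so the sketch assumes MinHash signatures over q-grams with banding as the LSH filter and a simple length check as a stand-in second filter. All function names and parameters (`qgrams`, `minhash_signature`, `lsh_similarity_join`, `bands`, `rows`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(s, q=2):
    """Return the set of q-grams of s, padded so short strings still yield grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def minhash_signature(grams, num_hashes):
    """MinHash signature: for each seeded hash, keep the minimum value over the set."""
    return [
        min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16) for g in grams)
        for seed in range(num_hashes)
    ]


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def lsh_similarity_join(strings, threshold, q=2, bands=8, rows=2):
    """Self-join: return pairs of strings whose edit distance is <= threshold.

    Filter 1 (assumed): MinHash banding; strings sharing any band become candidates.
    Filter 2 (assumed stand-in): a length difference above the threshold rules a pair out.
    Verification: exact edit distance on the surviving candidates.
    """
    sigs = [minhash_signature(qgrams(s, q), bands * rows) for s in strings]
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(idx)
        for ids in buckets.values():
            candidates.update(combinations(ids, 2))

    results = []
    for i, j in sorted(candidates):
        if abs(len(strings[i]) - len(strings[j])) > threshold:
            continue  # cheap filter: the edit distance is at least the length difference
        if edit_distance(strings[i], strings[j]) <= threshold:
            results.append((strings[i], strings[j]))
    return results


if __name__ == "__main__":
    data = ["similarity join", "similarity joins", "locality hashing", "edit distance"]
    for pair in lsh_similarity_join(data, threshold=2):
        print(pair)
```

In this sketch, verification removes every false positive that the filters let through, so the filter settings only affect cost and recall. More bands with fewer rows each admit more candidates (fewer false negatives, more verification work), while fewer bands with more rows do the opposite.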

Online publication date: Mon, 12-Oct-2020
