
Title: Adaptable address parser with active learning

Authors: You-Xuan Lin

Addresses: National Center for Research on Earthquake Engineering, No. 200, Sec. 3, Xinhai Rd., Da'an Dist., Taipei City 106219, Taiwan

Abstract: Address parsing, the decomposition of address strings into semantically meaningful components, converts unstructured or semi-structured address data into structured data. The flexibility and variability of real-world address formats make parser development a non-trivial task. Even after considerable time and effort have been invested in obtaining a capable parser, out-of-domain data require updating or even re-training the model, which incurs extra costs. To minimise the cost of model building and updating, this study experiments with active learning for model training and adaptation. Models composed of character-level embeddings and recurrent neural networks are trained to parse addresses in Taiwan. Results show that, with active learning, adding 420 instances to the training data is sufficient for a model to adapt to unfamiliar data while retaining its competence in the original domain. This suggests that active learning is helpful for model adaptation when data labelling is expensive and restricted.
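
As a rough illustration of the pool-based active learning idea mentioned in the abstract, the sketch below picks the unlabelled addresses that a tagger is least confident about, so that only those are sent for manual labelling. The abstract does not state which query strategy the study uses; least-confidence sampling, the function names `sequence_confidence` and `select_queries`, and the toy probabilities are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def sequence_confidence(char_probs: Sequence[Sequence[float]]) -> float:
    """Confidence of one parse: product of each character's top label probability."""
    confidence = 1.0
    for probs in char_probs:
        confidence *= max(probs)
    return confidence


def select_queries(pool: Dict[str, Sequence[Sequence[float]]], budget: int) -> List[str]:
    """Return the `budget` addresses with the lowest parse confidence."""
    ranked = sorted(pool, key=lambda address: sequence_confidence(pool[address]))
    return ranked[:budget]


# Hypothetical per-character label distributions produced by a tagger
# (two characters per address, two candidate labels per character).
pool = {
    "addr_a": [[0.9, 0.1], [0.8, 0.2]],
    "addr_b": [[0.6, 0.4], [0.5, 0.5]],
    "addr_c": [[0.99, 0.01], [0.95, 0.05]],
}

# The two least confident addresses are queued for labelling.
queries = select_queries(pool, budget=2)
```

In a full loop, the selected addresses would be labelled, added to the training data, and the model retrained before the next round of selection.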

Keywords: address parsing; record linkage; active learning; model adaptation; recurrent neural network; RNN; address in Taiwan.

DOI: 10.1504/IJDMMM.2023.129991

International Journal of Data Mining, Modelling and Management, 2023 Vol.15 No.1, pp.79 - 101

Received: 18 Nov 2021
Accepted: 28 Jan 2022

Published online: 04 Apr 2023
