Adaptable address parser with active learning
Online publication date: Tue, 04-Apr-2023
by You-Xuan Lin
International Journal of Data Mining, Modelling and Management (IJDMMM), Vol. 15, No. 1, 2023
Abstract: Address parsing, decomposing address strings into semantically meaningful components, converts unstructured or semi-structured address data into structured data. The flexibility and variability of real-world address formats make parser development a non-trivial task. Even after all the time and effort spent obtaining a capable parser, updating or even re-training is required for out-of-domain data, incurring extra costs. To minimise the cost of model building and updating, this study experiments with active learning for model training and adaptation. Models composed of character-level embeddings and recurrent neural networks are trained to parse addresses in Taiwan. Results show that, with active learning, adding 420 instances to the training data is sufficient for a model to adapt to unfamiliar data while retaining its competence in the original domain. This suggests that active learning is helpful for model adaptation when data labelling is expensive and restricted.
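The core of the active-learning loop the abstract describes is selecting, from an unlabelled pool, the instances the current model is least certain about and sending only those for manual labelling. A minimal sketch of one common selection strategy, least-confidence sampling, is shown below; the function names and the toy stand-in model are illustrative assumptions, not taken from the paper:

```python
def least_confidence(probs):
    """Least-confidence score: 1 minus the highest predicted probability.
    Higher scores mean the model is less certain about the instance."""
    return 1.0 - max(probs)

def select_for_labelling(pool, predict_proba, k):
    """Pick the k pool instances the model is least confident about.
    `predict_proba` maps an instance to a probability distribution over
    labels (a stand-in here for the RNN tagger's softmax output)."""
    ranked = sorted(pool,
                    key=lambda x: least_confidence(predict_proba(x)),
                    reverse=True)
    return ranked[:k]

# Toy stand-in model: pretend confidence drops with address length.
def toy_predict_proba(address):
    p = max(0.5, 1.0 - 0.05 * len(address))
    return [p, 1.0 - p]

pool = ["台北市中正區", "No. 7, Sec. 5, Xinyi Rd., Taipei", "高雄市"]
picked = select_for_labelling(pool, toy_predict_proba, k=1)
```

In each round, the instances returned by `select_for_labelling` would be hand-labelled and appended to the training set before the model is updated, which is how a small budget (such as the 420 instances reported) can cover the most informative out-of-domain examples.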