Chinese address standardisation via hybrid approach combining statistical and rule-based methods
Online publication date: Tue, 22-Oct-2019
by Xi Chen; Cheng Fang; Jasmine Chang; Yanjiang Yang; Yuan Hong; Haibing Lu
International Journal of Internet and Enterprise Management (IJIEM), Vol. 9, No. 2, 2019
Abstract: This paper is derived from a research project on cleansing customer address data for the State Grid Corporation of China (SGCC), which is the largest electric utility company in the world and was ranked 2nd in the 2016 Fortune Global 500. Address standardisation involves developing a standard address format for data integration, de-duplication, and automatic address correction/completion, and is widely considered a very challenging data cleansing task. It is critical for routine business tasks, customer relationship management, business intelligence for customer-oriented corporations, and more. Address standardisation is particularly difficult for the Chinese language. The underlying reasons include: 1) the current address standard in place in China is only realised at the city/town level; 2) for a number of reasons, many hand-written addresses are incomplete or contain errors; 3) the characteristics of the Chinese language make it difficult to process by machine. To tackle these challenges, we propose a hybrid approach combining statistical and rule-based methods, which are the two mainstream address standardisation approaches. Our hybrid approach exploits the merits of both methods and can complete the address standardisation task with little human effort and computational time, while achieving high accuracy.
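The abstract does not describe how the two methods are combined, so the following is only a hypothetical sketch of the general idea, not the authors' approach. It assumes a rule-based step that splits an address on common administrative suffixes (省 province, 市 city, 区/县 district or county, 路/街 road, 号 house number), followed by a simple statistical step that fills in a missing higher-level component with the parent most frequently observed alongside the child in already-clean records. All names (`parse`, `learn_parents`, `standardise`, `min_support`) and the sample addresses are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical component order, coarsest to finest.
LEVELS = ["province", "city", "district", "road", "number"]

# Rule-based step (assumed): one optional group per level, keyed on common
# administrative suffixes. A real rule set would be far larger and would
# also handle towns, villages, buildings, abbreviations, etc.
PATTERN = re.compile(
    r"(?P<province>[^省]+?省)?"
    r"(?P<city>[^市]+?市)?"
    r"(?P<district>[^区县]+?[区县])?"
    r"(?P<road>[^路街]+?[路街])?"
    r"(?P<number>\d+号)?"
)

# Child -> parent relations used to complete missing components.
# Order matters: district->city runs before city->province.
HIERARCHY = [("district", "city"), ("city", "province")]


def parse(address):
    """Split an address into components; missing levels map to None."""
    cleaned = re.sub(r"\s+", "", address)  # drops ASCII and full-width spaces
    return PATTERN.match(cleaned).groupdict()  # all groups optional, so always matches


def learn_parents(clean_records):
    """Statistical step (assumed): count child -> parent co-occurrences."""
    counts = defaultdict(Counter)
    for rec in clean_records:
        for child, parent in HIERARCHY:
            if rec.get(child) and rec.get(parent):
                counts[(child, rec[child])][rec[parent]] += 1
    return counts


def standardise(address, counts, min_support=2):
    """Parse with rules, then fill each missing parent with the most frequent
    parent seen in clean data, but only if it was seen >= min_support times."""
    parts = parse(address)
    for child, parent in HIERARCHY:
        if parts[parent] is None and parts[child]:
            candidates = counts.get((child, parts[child]))
            if candidates:
                value, n = candidates.most_common(1)[0]
                if n >= min_support:
                    parts[parent] = value
    return parts


def to_string(parts):
    """Reassemble the components in canonical order."""
    return "".join(parts[level] for level in LEVELS if parts[level])


if __name__ == "__main__":
    history = [parse(a) for a in ["浙江省杭州市西湖区文三路1号",
                                  "浙江省杭州市西湖区天目山路2号"]]
    counts = learn_parents(history)
    print(to_string(standardise("西湖区 文三路100号", counts)))
    # prints 浙江省杭州市西湖区文三路100号
```

The `min_support` threshold is one simple way to keep a single noisy record from driving a completion; the rules handle well-formed structure cheaply, while the counts supply the missing context that the rules alone cannot infer.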