Statistics and linguistic rules in multiword extraction: a comparative analysis
Online publication date: Sat, 22-Nov-2014
by Shaishav Agrawal; Ratna Sanyal; Sudip Sanyal
International Journal of Reasoning-based Intelligent Systems (IJRIS), Vol. 6, No. 1/2, 2014
Abstract: A hybrid methodology is proposed for extracting multiword expressions using both linguistic and statistical information. In the proposed methodology, N-grams are first extracted using linguistic patterns, and various statistical measures are then applied to classify these N-grams as multiword expressions. To address the problem of choosing the cut-off threshold in the statistical filtering phase, a novel method for calculating this threshold is designed. The proposed methodology is compared with a baseline method in which the order is reversed: N-grams are first filtered by statistical measures, and linguistic filtering is applied afterwards. Precision, recall and F-score are calculated on a manually annotated corpus. The results show that the proposed methodology performs well for certain types of multiword expressions, such as compound nouns, verb-particle constructions and verb-verb combinations.
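The hybrid ordering described in the abstract (linguistic patterns first, statistical filtering second, then evaluation by precision, recall and F-score) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it restricts N-grams to bigrams, uses three hypothetical POS patterns (`PATTERNS`, written with coarse universal-style tags) in place of the paper's pattern set, scores candidates with pointwise mutual information as one representative association measure, and substitutes a simple mean-score cut-off (`mean_threshold`) for the paper's threshold method, which the abstract does not describe. The toy corpus and gold set are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical POS patterns (assumption, not the paper's rule set), using
# coarse universal-style tags: compound noun, verb-particle, verb-verb.
PATTERNS = {("NOUN", "NOUN"), ("VERB", "PRT"), ("VERB", "VERB")}


def linguistic_candidates(tagged):
    """Stage 1: keep adjacent bigrams whose tag sequence matches a pattern."""
    return {
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1, t2) in PATTERNS
    }


def pmi_scores(tagged, candidates):
    """Stage 2: pointwise mutual information, log2 P(x,y) / (P(x) P(y))."""
    words = [w for w, _ in tagged]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n = len(words)
    scores = {}
    for w1, w2 in candidates:
        p_xy = bigrams[(w1, w2)] / (n - 1)  # n - 1 adjacent bigram slots
        p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def mean_threshold(scores):
    """Placeholder cut-off: the mean score. NOT the paper's threshold method."""
    return sum(scores.values()) / len(scores)


def hybrid_extract(tagged):
    """Linguistic filtering first, then statistical filtering (hybrid order)."""
    scores = pmi_scores(tagged, linguistic_candidates(tagged))
    if not scores:  # no pattern matched, so nothing to threshold
        return set()
    cutoff = mean_threshold(scores)
    return {bigram for bigram, s in scores.items() if s >= cutoff}


def prf(predicted, gold):
    """Precision, recall and F-score against a manually annotated gold set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy POS-tagged corpus (invented for illustration).
TAGGED = [
    ("the", "DET"), ("stock", "NOUN"), ("market", "NOUN"), ("fell", "VERB"),
    ("and", "CONJ"), ("rates", "NOUN"), ("went", "VERB"), ("up", "PRT"),
    ("as", "CONJ"), ("rates", "NOUN"), ("data", "NOUN"), ("showed", "VERB"),
    ("new", "ADJ"), ("data", "NOUN"),
]
candidates = linguistic_candidates(TAGGED)
scores = pmi_scores(TAGGED, candidates)
accepted = hybrid_extract(TAGGED)
gold = {("stock", "market"), ("went", "up")}
print(sorted(accepted), prf(accepted, gold))
# prints: [('stock', 'market'), ('went', 'up')] (1.0, 1.0, 1.0)
```

Here the patterns admit three candidates, and the cut-off drops "rates data" because both words also occur apart from each other, which lowers its PMI. A statistics-first baseline could be approximated by scoring all adjacent bigrams with `pmi_scores`, applying the cut-off, and only then filtering the survivors by `PATTERNS`; the two orders can disagree because the cut-off is computed over a different population of scores.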