Authors: K. Sundarakantham, N. Sheena, S. Mercy Shalinie
Addresses: CSE Department, Thiagarajar College of Engineering, Madurai – 625015 Tamilnadu, India (all authors)
Abstract: Grammatical inference, also known as grammar induction, is the problem of learning structural models from data. For decades, researchers have been trying to devise formal and detailed grammars that capture the observed regularities of language. This paper presents a comprehensive solution for efficient language acquisition in the form of a novel semi-supervised algorithm that learns a streamlined representation of linguistic structures from a plain natural-language corpus. The input datasets are the ATIS dataset and sentences from children's literature. The proposed algorithm generates rules from the given corpora and uses the learned rules to generate new sentences. Performance is evaluated on two measures, recall and precision: the recall was 0.935 and the precision was 0.916. These results were better than those of other algorithms, such as EMILE, ADIOS and GCS. The running time of the algorithm was tested by varying the size of the dataset and was found to increase linearly with dataset size.
Keywords: grammatical inference; clause boundary; language acquisition; semi-supervised learning; natural language processing; grammar induction; linguistic structures.
International Journal of Computer Applications in Technology, 2010 Vol.38 No.4, pp.259-263
Published online: 07 Aug 2010