Title: Information retrieval by mining text and image
Authors: R. Seethalakshmi; K.S. Ravichandran; P. Swaminathan; A.N. Alagappan
Addresses: School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India
Abstract: Tamil is a language rich in ancient scripts, many of which are still waiting to be digitised. Optical character recognition (OCR) for Tamil is used to digitise these scripts. The OCR process consists of a scanning phase, a preprocessing phase, a segmentation phase and a recognition phase. The recognised text is stored as an archive in a database, together with the original images. The front-end GUI provides a search engine into which a keyword is entered. A crawler then searches the database and retrieves the matching pages and images based on context. The retrieved pages are displayed in order of contextual relevance, and the user clicks the appropriate page to fetch it.
Keywords: optical character recognition; OCR; information retrieval; ancient Tamil literature; database storage; search engines; data mining; text mining; image mining; Tamil script; digitisation.
DOI: 10.1504/IJAIP.2016.080199
International Journal of Advanced Intelligence Paradigms, 2016 Vol.8 No.4, pp.451 - 459
Received: 15 Nov 2014
Accepted: 16 Oct 2015
Published online: 07 Nov 2016
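
The archive-and-search flow described in the abstract (recognised text stored alongside the original page image, then keyword lookup with results ordered by relevance) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the table layout, the function names `build_archive` and `search`, and the use of a plain occurrence count as a stand-in for the paper's context-based ranking are all assumptions.

```python
import sqlite3


def build_archive(pages):
    """Create an in-memory archive from (page_id, recognised_text, image_path) tuples.

    The schema is hypothetical; the abstract only says that the text and the
    original images are stored in a database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archive ("
        "page_id INTEGER PRIMARY KEY, "
        "text TEXT NOT NULL, "
        "image_path TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", pages)
    conn.commit()
    return conn


def search(conn, keyword):
    """Return (page_id, image_path, hits) for pages containing the keyword.

    Pages are ordered by number of keyword occurrences (assumed relevance
    measure), with ties broken by page_id.
    """
    if not keyword:
        return []
    rows = conn.execute(
        "SELECT page_id, text, image_path FROM archive WHERE instr(text, ?) > 0",
        (keyword,),
    ).fetchall()
    ranked = [(pid, img, text.count(keyword)) for pid, text, img in rows]
    ranked.sort(key=lambda r: (-r[2], r[0]))
    return ranked


if __name__ == "__main__":
    # Made-up sample pages standing in for OCR output.
    pages = [
        (1, "தமிழ் இலக்கியம் தமிழ்", "page1.png"),
        (2, "இலக்கியம்", "page2.png"),
        (3, "தமிழ்", "page3.png"),
    ]
    conn = build_archive(pages)
    for page_id, image_path, hits in search(conn, "தமிழ்"):
        print(page_id, image_path, hits)
```

Keeping the image path in the same row as the recognised text means that a hit on the text can return the original scanned page directly, which matches the abstract's description of retrieving both the page and its image.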