Title: Information retrieval by mining text and image

Authors: R. Seethalakshmi; K.S. Ravichandran; P. Swaminathan; A.N. Alagappan

Addresses: School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India ' School of Computing, SASTRA University, Tirumalaisamudram, Thanjavur – 613 401, Tamil Nadu, India

Abstract: Many valuable ancient scripts in Tamil are still waiting to be digitised. Tamil is a language rich in ancient scripts, and optical character recognition (OCR) is applied to Tamil in order to digitise them. The OCR process consists of a scanning phase, a preprocessing phase, a segmentation phase and a recognition phase. The text recovered by OCR is stored as an archive in a database, and the archive also holds the original images. The front-end GUI contains a search engine into which a keyword is entered. A crawler then searches the database and retrieves the matching pages and their images based on context. The retrieved pages are displayed in order of contextual relevance, and the user clicks the appropriate page to fetch it.
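
The storage and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Page`, `Archive` and `tokenize`, the in-memory index standing in for the database, the whitespace tokenisation and the term-count ranking are all assumptions made for illustration, and the OCR phases themselves are taken as already done.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import string


@dataclass
class Page:
    page_id: int
    text: str        # text produced by the OCR recognition phase
    image_path: str  # location of the original scanned image (hypothetical)


def tokenize(text):
    # Assumption: split on whitespace and strip ASCII punctuation.
    # A real system for Tamil would likely need a language-aware tokeniser.
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


class Archive:
    """In-memory stand-in for the database of recognised text and images."""

    def __init__(self):
        self.pages = {}
        self.index = defaultdict(Counter)  # term -> {page_id: occurrences}

    def add(self, page):
        self.pages[page.page_id] = page
        for term in tokenize(page.text):
            self.index[term][page.page_id] += 1

    def search(self, query):
        # Score each page by how often the query terms occur in it
        # (an assumed relevance measure), then return pages best-first.
        scores = Counter()
        for term in tokenize(query):
            for page_id, count in self.index.get(term, {}).items():
                scores[page_id] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [self.pages[page_id] for page_id, _ in ranked]
```

Each result carries both the recognised text and the path to the original image, so a front end could list the hits in ranked order and open the scanned page when one is selected.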

Keywords: optical character recognition; OCR; information retrieval; ancient Tamil literature; database storage; search engines; data mining; text mining; image mining; Tamil script; digitisation.

DOI: 10.1504/IJAIP.2016.080199

International Journal of Advanced Intelligence Paradigms, 2016 Vol.8 No.4, pp.451 - 459

Received: 15 Nov 2014
Accepted: 16 Oct 2015

Published online: 07 Nov 2016
