Inderscience Publishers

Article Abstract

Title: OntoMiner: automated metadata and instance mining from news websites
  Author: Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nagarajan
  Address: Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-8809, USA; Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-8809, USA; Convera Corporation, 1808 Aston Avenue, Carlsbad, CA 92008, USA
  Journal: International Journal of Web and Grid Services, 2005, Vol. 1, No. 2, pp. 196-221
  Abstract: RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. To make the Semantic Web widely usable, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and their instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant, overlapping websites. We develop algorithms that detect and utilise HTML regularities in web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances, annotated with their labels whenever these are available. Finally, we report an experimental evaluation in the news, travel and shopping domains that demonstrates the efficacy of our algorithms.
  Keywords: instance ontology; metadata mining; instance mining; news websites; semantic web; domain ontologies; bootstrapping; web information retrieval; data mining; document information retrieval; web search; travel websites; shopping websites; web documents; taxonomy-directed websites.
  DOI: 10.1504/IJWGS.2005.008320
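The abstract only summarises the tree-mining step. As a rough illustration of one way key concepts and taxonomic links could be surfaced from overlapping websites, the Python sketch below counts how many sites contain each node label and each parent-child label pair, and keeps those that reach a support threshold. This is a simple frequent-label heuristic chosen for illustration, not the OntoMiner algorithm described in the paper. The `(label, children)` tree format, the `mine_taxonomy` and `labels_and_edges` functions, the lower-casing normalisation and the `min_support` parameter are all hypothetical, and the sketch assumes each site has already been segmented into a labelled hierarchy.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical tree format: (label, [child nodes]).
# Assumption: HTML pages were already segmented into such hierarchies upstream.
Node = Tuple[str, list]


def labels_and_edges(tree: Node) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect the distinct labels and parent->child label pairs of one site tree."""
    labels: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        key = label.strip().lower()  # naive label normalisation (assumption)
        labels.add(key)
        for child in children:
            edges.add((key, child[0].strip().lower()))
            stack.append(child)
    return labels, edges


def mine_taxonomy(site_trees: List[Node], min_support: int):
    """Keep labels and parent->child pairs found in at least min_support sites.

    Support is counted once per site, so agreement across overlapping
    websites matters, not repetition within a single site.
    """
    label_support: Counter = Counter()
    edge_support: Counter = Counter()
    for tree in site_trees:
        labels, edges = labels_and_edges(tree)
        label_support.update(labels)
        edge_support.update(edges)
    concepts = {lab for lab, n in label_support.items() if n >= min_support}
    taxonomy = {
        (parent, child)
        for (parent, child), n in edge_support.items()
        if n >= min_support and parent in concepts and child in concepts
    }
    return concepts, taxonomy


# Usage with three toy news-site hierarchies (made-up example data).
sites = [
    ("Home", [("World", [("Europe", []), ("Asia", [])]), ("Sports", [("Tennis", [])])]),
    ("Front Page", [("World", [("Europe", [])]), ("Sports", [("Soccer", [])])]),
    ("Home", [("World", [("Asia", [])]), ("Business", [])]),
]
concepts, taxonomy = mine_taxonomy(sites, min_support=2)
print(sorted(concepts))  # ['asia', 'europe', 'home', 'sports', 'world']
print(sorted(taxonomy))  # [('home', 'world'), ('world', 'asia'), ('world', 'europe')]
```

In this toy run, "sports" qualifies as a concept but its link to "home" does not, because one site roots its hierarchy at "Front Page"; this hints at why a real system would need label normalisation well beyond lower-casing.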