Title: Context aware discovery in web data through anomaly detection

Authors: Ruman Tambe; George Karabatis; Vandana P. Janeja

Addresses: Department of Information Systems, University of Maryland, Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250, USA (all three authors)

Abstract: Context enables more accurate searches over the enormous volume of information available on the web by setting the boundaries within which we can transition from data to relevant information. This paper describes a technique to analyse data extracted from the web and generate a contextual model that seamlessly combines the data elements of a domain to provide the most accurate information to the user. The discovery of anomalies is of particular interest, since they may not be evident without context information for a specific domain. A generic system design for extracting web data and generating a contextual model for any domain is presented. Contextual information and semantic techniques are used in a prototype system that identifies potential threats associated with cargo shipments from the contextual perspective of relevant US federal agencies. An experimental evaluation shows that this technique increases the precision of results.

Keywords: semantics; anomaly detection; web data extraction; web engineering; databases; context aware discovery; data discovery; contextual models; potential threats; cargo shipments; US federal agencies; USA; United States.

DOI: 10.1504/IJWET.2015.069348

International Journal of Web Engineering and Technology, 2015 Vol.10 No.1, pp.3-30

Published online: 13 May 2015
