International Journal of Intelligent Information and Database Systems (4 papers in press)
QoS Management in Real-Time Spatial Big Data using a Feedback Control Scheduling
by Sana Hamdi
Abstract: A Geographic Information System (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of spatial data. Spatial data, whether captured through remote sensors or large-scale simulations, becomes big and heterogeneous. As a result, structured data and unstructured content are accessed simultaneously via an integrated user interface. Handling real-time constraints and heterogeneity is extremely important for effective decision making. Thus, heterogeneous real-time spatial data management is a very active research domain. We consider a real-time spatial Big Data system that processes a large amount of heterogeneous data accessed simultaneously by two types of transactions: Update transactions and User transactions (continuous requests). In these applications, it is desirable to execute transactions within their deadlines using real-time spatial data. However, the real-time spatial Big Data system can be overloaded, so many transactions may miss their deadlines, or real-time spatial data can be violated. To address these problems, we proposed, as a first contribution, a new architecture called FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time Spatial Big Data) (Hamdi et al., 2015). The main objectives of this architecture are to take into account the heterogeneity of data, guarantee data freshness, reduce the deadline miss ratio even in the presence of conflicts, and satisfy user requirements by improving the quality of service (QoS). In real-time spatial Big Data, performance can be increased by allowing concurrent execution of transactions. Managing this concurrent execution is called concurrency control. A concurrency control algorithm must be used to ensure serializability of transaction scheduling and to maintain data consistency. Several works address this area, but without taking into account the existence of a huge volume of data. As a solution, we propose, as a second contribution, an improvement of the existing Two-Shadow Speculative Concurrency Control (SCC-2S) with priority, proposed in (Yee et al., 2013), that uses imprecise real-time spatial transactions. Finally, a simulation study using the TPC-DS benchmark (TPC, 2014) shows that our contributions can achieve a significant performance improvement.
Keywords: Heterogeneous Real-Time Geospatial Data; Update Transaction; User Transaction; Feedback Control Scheduling; Quality of Service; Nested Transaction; Speculative Concurrency Control; Imprecise Computation; Simulation.
A Best-Effort Integration Framework for Imperfect Information Spaces
by Ashraf Jaradat, Ahmed Abu Halimeh, Aziz Deraman, Fadi Safiedinne
Abstract: Information integration (II) is the general process of producing a unified repository from a set of heterogeneous sources that may hold (semi-)structured or unstructured data. This process is significant in a variety of situations, including commercial and academic research. The need to integrate information arises with increasing frequency as the volume and heterogeneity of existing information, and the need to share it, grow. Entity Resolution (ER) with imperfection management is accepted as a major aspect of integrating heterogeneous information sources that exhibit entities with varied identifiers, abbreviated names, and multi-valued attributes. A review of the literature shows examples of novel integration applications that are inherently complex, such as personal information management and Web-scale information management. Many of these applications require the ability to represent and manipulate imperfect data, as the data items are inevitably imprecise, inconsistent, uncertain, error-prone and redundant. The process spans the issues from starting with imperfect data to producing a probabilistic database. However, the classical data integration (CDI) framework fails to cope with the new requirements of explicitly managing imperfect or uncertain information. This paper introduces an alternative integration framework based on the best-effort perspective to support the automation of instance integration. The new framework explicitly incorporates probabilistic management into the ER tasks. The probabilistic management includes a new probabilistic global entity, a new pair-wise source-to-target ER process, and probabilistic decision model logic as alternatives. The paper presents how these processes operate together to support the integration requirements and challenges of current heterogeneous information sources.
Keywords: data integration; information integration; uncertainty management; best-effort integration framework; probabilistic instance integration; data quality.
An ensemble of multi-model regression framework based on Fuzzy clustering using Inference System architecture for Reservoir Permeability prediction
by Van Huan Nguyen, Truong Duy Pham, Trong Hai Duong
Abstract: One of the critical engineering problems in optimising reservoir development is petroleum reservoir description and characterisation. Successful applications of fuzzy inference systems (FIS) and ensemble learning methods in reservoir characterisation have been reported. In this study, we propose an ensemble of multi-model regression framework based on the FIS architecture to tackle the challenge of permeability prediction using well log data properties. The study demonstrates the capability of the ensemble model when tested on well log properties, which are practical data of Oligocene geological types from the Cuu Long basin. Empirical results indicate that our proposed framework is efficient and achieves a significant improvement compared to each existing standard single model.
Keywords: neural networks; ANFIS; reservoir permeability prediction; multimodel regression.
Special Issue on: Big Data and Decision Sciences in Management and Engineering
Semantic Role Labeling of English Tweets Through Sentence Boundary Detection
by Dwijen Rudrapal, Amitava Das
Abstract: Social media services like Twitter have become a trendy communication medium for online users to share quick and up-to-date information. However, tweets are extremely noisy and full of spelling and grammatical mistakes, which pose unique challenges for semantic information extraction. One prospective solution to this problem is semantic role labeling (SRL), which focuses on unifying variations in the surface syntactic forms of semantic relations. SRL for tweets plays a central role in a wide range of tweet-related applications associated with semantic information extraction. In this paper, we propose an automatic SRL system for English tweets that identifies sentences and uses sequential minimal optimization (SMO). We conducted experiments on our SRL-annotated dataset to evaluate the proposed approach and report better performance than existing state-of-the-art SRL systems for English tweets.
Keywords: Tweet Stream; Semantic Role Labeling; Tweet Summarization; Machine Learning Algorithm.