International Journal of Intelligent Information and Database Systems (7 papers in press)
Gain Parameter and Dropout Based Fine Tuning of Deep Networks
by M. Arif Wani, Saduf Afzal
Abstract: Dealing with high dimensional data is one of the major current challenges for many classical classification algorithms. While shallow architectures are well suited to small datasets with many features, they can be relatively inefficient at modeling variation in high dimensional datasets. Deep architectures such as deep neural networks can express more complex relationships among variables than shallower ones. Training a deep neural network can involve two learning phases: unsupervised pretraining and supervised fine tuning. Unsupervised pretraining is used to learn the initial parameter values of the deep network, while supervised fine tuning improves upon what has been learned in the pretraining stage. The backpropagation algorithm can be used for supervised fine tuning of deep neural networks. In the field of shallow neural networks, however, researchers have applied a number of modifications to the backpropagation algorithm that improve the performance of the trained model. One such variant is backpropagation with a gain parameter. In this paper we evaluate the use of the backpropagation with gain parameter algorithm for fine tuning of deep networks. We further propose a modification in which the backpropagation with gain parameter algorithm is integrated with the dropout technique, and evaluate its performance in fine tuning of deep networks. The effectiveness of the fine tuning done by the proposed technique is also compared with other variants of the backpropagation algorithm on benchmark datasets. The experimental results show that fine tuning of deep networks using the proposed technique yields the most promising results among all the studied methods on the tested datasets.
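The combination described above can be illustrated with a small sketch: a sigmoid with a gain parameter g changes both the forward activation and every gradient term, while dropout zeroes a random subset of hidden units on each presentation. All values below (gain, dropout rate, learning rate, network size, the AND target function) are illustrative assumptions, not the paper's settings:

```python
import math
import random

random.seed(0)

GAIN = 1.5    # gain parameter g in f(x) = 1 / (1 + exp(-g * x)) (illustrative value)
DROP_P = 0.5  # probability of dropping a hidden unit (illustrative value)
LR = 0.5      # learning rate

def act(x):
    return 1.0 / (1.0 + math.exp(-GAIN * x))

def act_deriv(y):
    # derivative of the gain-parameter sigmoid w.r.t. its input, written via output y
    return GAIN * y * (1.0 - y)

# tiny 2-2-1 network learning logical AND; each weight row ends with a bias term
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

def forward(x, mask=(1, 1)):
    h = [mask[j] * act(w_hid[j][0] * x[0] + w_hid[j][1] * x[1] + w_hid[j][2])
         for j in range(2)]
    return h, act(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])

def total_loss():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

loss_before = total_loss()
for _ in range(2000):
    for x, t in data:
        mask = tuple(int(random.random() > DROP_P) for _ in range(2))  # dropout mask
        h, o = forward(x, mask)
        d_out = (t - o) * act_deriv(o)  # the gain enters every gradient term
        for j in range(2):
            d_hid = d_out * w_out[j] * act_deriv(h[j]) if mask[j] else 0.0
            w_hid[j][0] += LR * d_hid * x[0]
            w_hid[j][1] += LR * d_hid * x[1]
            w_hid[j][2] += LR * d_hid
        w_out[0] += LR * d_out * h[0]
        w_out[1] += LR * d_out * h[1]
        w_out[2] += LR * d_out
loss_after = total_loss()
```

For brevity the sketch omits the test-time weight rescaling by (1 - DROP_P) that the full dropout technique prescribes.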
Keywords: Deep Learning; Deep Neural Networks; Fine Tuning; Dropout Technique; Gain Parameter and Dropout Technique.
QoS Management in Real-Time Spatial Big Data using Feedback Control Scheduling
by Sana Hamdi
Abstract: A Geographic Information System (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of spatial data. Spatial data, whether captured through remote sensors or large-scale simulations, has become big and heterogeneous. As a result, structured data and unstructured content are accessed simultaneously via an integrated user interface. The issues of real-time processing and heterogeneity are extremely important for making effective decisions. Thus, heterogeneous real-time spatial data management is a very active research domain nowadays. We are dealing with real-time spatial Big Data that processes a large amount of heterogeneous data accessed simultaneously by two types of transactions: update transactions and user transactions (continuous requests). In these applications, it is desirable to execute transactions within their deadlines using real-time spatial data. However, a real-time spatial Big Data system can become overloaded, so that many transactions miss their deadlines or the freshness of real-time spatial data is violated. To address these problems, we proposed, as a first contribution, a new architecture called FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time Spatial Big Data) (Hamdi et al., 2015). The main objectives of this architecture are the following: take into account the heterogeneity of data, guarantee data freshness, improve the deadline miss ratio even in the presence of conflicts, and finally satisfy the requirements of users by improving the quality of service (QoS). In real-time spatial Big Data, performance can be increased by allowing concurrent execution of transactions, an activity called concurrency control. A concurrency control algorithm must be used to ensure serializability of transaction scheduling and to maintain data consistency. Several works have been done in this area, but without taking into account the existence of a huge volume of data. As a solution, we propose, as a second contribution, an improvement of the existing Two-Shadow Speculative Concurrency Control (SCC-2S) with priority proposed in (Yee et al., 2013), using imprecise real-time spatial transactions. Finally, a simulation study is presented to show that our contributions achieve a significant performance improvement on the TPC-DS (TPC, 2014) benchmark.
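The feedback control scheduling idea can be made concrete as a simple admission-control loop: each sampling period, the measured deadline miss ratio is compared against a target, and the fraction of admitted update transactions is adjusted proportionally. Everything below (the target, the controller gain, and the toy "plant" mapping load to misses) is an illustrative assumption, not the FCSA-RTSBD design itself:

```python
import random

random.seed(1)

TARGET_MISS = 0.05   # desired deadline miss ratio (assumption)
KP = 0.5             # proportional gain of the controller (assumption)

admit_rate = 1.0     # fraction of update transactions currently admitted

def sample_miss_ratio(rate):
    # toy plant: more admitted work -> more deadline misses, plus noise (assumption)
    return max(0.0, min(1.0, 0.3 * rate + random.uniform(-0.02, 0.02)))

history = []
for period in range(50):
    miss = sample_miss_ratio(admit_rate)
    error = TARGET_MISS - miss
    # proportional feedback: admit more when under target, shed load when over
    admit_rate = max(0.1, min(1.0, admit_rate + KP * error))
    history.append(miss)
```

After a few periods the loop settles near the target miss ratio; a real architecture would also account for data freshness and transaction priorities in the controlled variable.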
Keywords: Heterogeneous Real-Time Geospatial Data; Update Transaction; User Transaction; Feedback Control Scheduling; Quality of Service; Nested Transaction; Speculative Concurrency Control; Imprecise Computation; Simulation.
A Best-Effort Integration Framework for Imperfect Information Spaces
by Ashraf Jaradat, Ahmed Abu Halimeh, Aziz Deraman, Fadi Safiedinne
Abstract: Information integration (II) is the general process of producing a unified repository from a set of heterogeneous sources that may hold (semi-)structured or unstructured data. This process is significant in a variety of situations, including commercial and academic research. Information integration appears with increasing frequency as the volume and heterogeneity of existing information, and the need to share it, grow. Entity resolution (ER) with imperfection management is accepted as a major aspect of integrating heterogeneous information sources that represent entities with varied identifiers, abbreviated names, and multi-valued attributes. A review of the literature shows examples of novel integration applications that are inherently complex, such as personal information management and Web-scale information management. Many of these applications require the ability to represent and manipulate imperfect data, as the data items are inevitably imprecise, inconsistent, uncertain, error-prone, and redundant. The process spans the full path from imperfect input data to the production of a probabilistic database. However, classical data integration (CDI) frameworks fail to cope with the new requirement of explicit imperfect or uncertain information management. This paper introduces an alternative integration framework, based on the best-effort perspective, to support instance integration automation. The new framework explicitly incorporates probabilistic management into the ER tasks. The probabilistic management includes a new probabilistic global entity, a new pair-wise source-to-target ER process, and probabilistic decision model logic. The paper presents how these components operate together to support the requirements and challenges of integrating today's heterogeneous information sources.
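A minimal sketch of a pair-wise source-to-target ER step with probabilistic output might look as follows. The records, attribute weights, and threshold are all hypothetical, and the weighted string similarity standing in for a match probability is far simpler than the paper's decision model logic:

```python
from difflib import SequenceMatcher

# hypothetical source and target records (all names and values are illustrative)
source = [{"id": "s1", "name": "J. Smith", "city": "Boston"},
          {"id": "s2", "name": "Mary Jones", "city": "Austin"}]
target = [{"id": "t1", "name": "John Smith", "city": "Boston"},
          {"id": "t2", "name": "M. Jones", "city": "Dallas"}]

def attr_sim(a, b):
    # string similarity in [0, 1] via difflib's ratio
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

WEIGHTS = {"name": 0.7, "city": 0.3}  # attribute weights (a modelling assumption)

def match_probability(r1, r2):
    # weighted attribute similarity used as a match probability
    return sum(w * attr_sim(r1[k], r2[k]) for k, w in WEIGHTS.items())

# pair-wise source-to-target ER: retain candidate pairs with their probabilities
THRESHOLD = 0.5
matches = [(s["id"], t["id"], round(match_probability(s, t), 2))
           for s in source for t in target
           if match_probability(s, t) >= THRESHOLD]
```

Instead of forcing a single hard decision, the retained (pair, probability) tuples can seed a probabilistic global entity, which is the spirit of the framework described above.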
Keywords: data integration; information integration; uncertainty management; best-effort integration framework; probabilistic instance integration; data quality.
An ensemble multi-model regression framework based on fuzzy clustering using an inference system architecture for reservoir permeability prediction
by Van Huan Nguyen, Truong Duy Pham, Trong Hai Duong
Abstract: One of the critical engineering problems in optimising reservoir development is petroleum reservoir description and characterisation. Successful applications of fuzzy inference systems (FIS) and ensemble learning methods in reservoir characterisation have been reported. In this study, we propose an ensemble multi-model regression framework based on a FIS architecture to tackle the challenge of permeability prediction from well log properties. The study demonstrates the capability of the ensemble model when tested on well log properties, which are practical data of Oligocene geological types from the Cuu Long basin. Empirical results indicate that our proposed framework is efficient and achieves a significant improvement compared to each existing standard single model.
Keywords: neural networks; ANFIS; reservoir permeability prediction; multi-model regression.
Impacts of Feature Selection on Classification of Individual Activity Recognitions for Prediction of Crowd Disasters
by Ali Selamat, Fatai Sadiq, Ondrej Krejcar, Roliana Ibrahim
Abstract: We examined the possibility of feature selection using Statistical Based Time Frequency Domain (SBTFD) features extracted for human activity recognition. The aim is to reduce the dimension of the feature space and remove redundant features, in order to improve accuracy and minimize false negative alarms for crowd disasters. For this, we analyzed and classified 54 SBTFD features obtained from 22,350 instances comprising climb down, climb up, peak shake while standing, standing, still, and walking, as classes V1 to V8, respectively. The individual activity recognition dataset (D1) was collected from 20 students at a well-known institution in Malaysia. In addition, a similar dataset (D2) from a repository was used; it contains 250,936 instances of smartphone accelerometer signals from 9 users. Both datasets were subjected to Minimum Redundancy Maximum Relevance (MRMR), correlation, and chi-square techniques to filter the relevant SBTFD features and reduce the dimension. Based on the selected features, we applied 10-fold cross validation in WEKA with Random Forest (RF), J48, Sequential Minimal Optimization (SMO), and Naive Bayes (NB) classifiers to classify and predict the abnormal behaviour classes V1 to V8. Using 7 features selected by MRMR with RF, we achieved excellent accuracy and a reduced false negative rate, helping to save human lives in crowd disasters.
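As a sketch of the chi-square filter step (one of the three selection techniques named above), the snippet below scores toy binary features against a binary activity label and keeps the highest-scoring ones. The data and feature count are invented for illustration; the paper works with 54 real-valued SBTFD features:

```python
def chi_square(feature, labels):
    # chi-square statistic of a categorical feature against the class labels
    classes, values = sorted(set(labels)), sorted(set(feature))
    n = len(labels)
    stat = 0.0
    for v in values:
        for c in classes:
            observed = sum(1 for f, l in zip(feature, labels) if f == v and l == c)
            expected = feature.count(v) * labels.count(c) / n
            if expected:
                stat += (observed - expected) ** 2 / expected
    return stat

# toy activity data: rows of 3 binary features, label 1 = "walking" (illustrative)
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0],
     [1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 1, 0, 0]

# score every feature column, then keep the 2 most relevant ones
scores = [chi_square([row[j] for row in X], y) for j in range(3)]
top = sorted(range(3), key=lambda j: -scores[j])[:2]
```

Here feature 0 perfectly tracks the label (maximal score) while feature 1 is independent of it (score 0), so the filter keeps features 0 and 2; the reduced feature set would then be passed to a classifier such as RF.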
Keywords: Statistical Based Time Frequency Domain (SBTFD); human activity recognitions; Minimum Redundancy Maximum Relevance (MRMR); chi-square; dimensional reductions.
Special Issue on: Big Data and Decision Sciences in Management and Engineering
Semantic Role Labeling of English Tweets Through Sentence Boundary Detection
by Dwijen Rudrapal, Amitava Das
Abstract: Social media services like Twitter have become a trendy communication medium for online users to share quick and up-to-date information. However, tweets are extremely noisy and full of spelling and grammatical mistakes, which poses unique challenges for semantic information extraction. One prospective solution to this problem is semantic role labeling (SRL), which focuses on unifying variations in the surface syntactic forms of semantic relations. SRL for tweets plays a central role in a wide range of tweet-related applications involving semantic information extraction. In this paper, we propose an automatic SRL system for English tweets that first identifies sentence boundaries and then labels roles using sequential minimal optimization (SMO). We conducted experiments on our SRL-annotated dataset to evaluate the proposed approach, and report better performance than existing state-of-the-art SRL systems for English tweets.
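Since the pipeline hinges on finding sentence boundaries in noisy tweets, a rule-based baseline helps make the task concrete. The regex heuristic below is purely illustrative; the system described above learns boundaries with a trained SMO classifier rather than relying on such rules:

```python
import re

def split_tweet_sentences(tweet):
    # naive boundary heuristic: split after ., ! or ? when followed by whitespace
    # and a capital letter, @mention, or #hashtag; noisy tweets (missing
    # punctuation, lowercase starts) defeat rules like this, motivating a
    # learned classifier instead
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z@#])', tweet.strip())
    return [p for p in parts if p]
```

For example, splitting "omg best day ever! Heading home now." yields two sentences, but a tweet like "no caps no dots just vibes" exposes the heuristic's limits.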
Keywords: Tweet Stream; Semantic Role Labeling; Tweet Summarization; Machine Learning Algorithm.
Special Issue on: Evolutionary Algorithms in Intelligent Systems
Object tracking using the particle filter optimized by the improved artificial fish swarm algorithm
by Zhi-Gao Zeng, Haixing Bao, Zhiqiang Wen, Wenqiu Zhu
Abstract: In the particle filter algorithm, the weight values of particles gradually decrease as the number of iterations increases, while the variance of the particle weights grows. This leads to an increasing deviation between the estimated state and the true state. To deal with this problem, an improved particle filter algorithm is proposed in this paper, in which an improved artificial fish swarm optimization algorithm is used to optimize the traditional particle filter. In the improved algorithm, the resampled particles are driven toward the region of high likelihood, which increases the weight values of the particles. Thus, the estimated state is closer to the real state. Experimental results show the advantage of our new algorithm over a range of existing algorithms.
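The effect described above — resampling and then driving particles toward the high-likelihood region — can be sketched in one dimension. The "swarm" step below is a deliberately simplified stand-in for the improved artificial fish swarm algorithm (a single nudge toward the best particle plus jitter), and all constants are illustrative:

```python
import math
import random

random.seed(3)

TRUE_POS = 2.0   # state to track (illustrative)
N = 200          # number of particles

def likelihood(p, obs, sigma=0.5):
    # Gaussian observation model (an assumption of this sketch)
    return math.exp(-((p - obs) ** 2) / (2 * sigma ** 2))

particles = [random.uniform(-5.0, 5.0) for _ in range(N)]

for _ in range(20):
    obs = TRUE_POS + random.gauss(0, 0.2)       # noisy measurement
    weights = [likelihood(p, obs) for p in particles]
    total = sum(weights)
    cum, acc = [], 0.0
    for w in weights:
        acc += w / total
        cum.append(acc)
    # stratified resampling: one draw per equal-probability stratum
    resampled, j = [], 0
    for i in range(N):
        u = (i + random.random()) / N
        while j < N - 1 and cum[j] < u:
            j += 1
        resampled.append(particles[j])
    # swarm-inspired step: nudge each particle toward the current best, plus jitter,
    # so particles concentrate in the high-likelihood region and keep higher weights
    best = max(resampled, key=lambda p: likelihood(p, obs))
    particles = [p + 0.3 * (best - p) + random.gauss(0, 0.05) for p in resampled]

estimate = sum(particles) / N
```

After a few iterations the particle cloud concentrates around the true state; the full fish swarm algorithm replaces the single-nudge step with prey, swarm, and follow behaviours.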
Keywords: object tracking; particle filter; artificial fish swarm algorithm.