International Journal of Intelligent Information and Database Systems (14 papers in press)
QoS Management in Real-Time Spatial Big Data using Feedback Control Scheduling
by Sana Hamdi
Abstract: A Geographic Information System (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of spatial data. Spatial data, whether captured through remote sensors or large-scale simulations, become big and heterogeneous. As a result, structured data and unstructured content are accessed simultaneously via an integrated user interface. Real-time processing and heterogeneity are extremely important for making effective decisions, so heterogeneous real-time spatial data management is a very active research domain nowadays. Such systems constitute real-time spatial Big Data, which process a large amount of heterogeneous data accessed simultaneously by two types of transactions: update transactions and user transactions (continuous requests). In these applications, it is desirable to execute transactions within their deadlines using fresh real-time spatial data. However, real-time spatial Big Data can become overloaded, so many transactions may miss their deadlines or the freshness of the real-time spatial data may be violated. To address these problems, we proposed, as a first contribution, a new architecture called FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time Spatial Big Data) (Hamdi et al., 2015). The main objectives of this architecture are the following: take into account the heterogeneity of data, guarantee data freshness, improve the deadline miss ratio even in the presence of conflicts and, finally, satisfy the requirements of users by improving the quality of service (QoS). In real-time spatial Big Data, performance can be increased by allowing concurrent execution of transactions, an activity called concurrency control. A concurrency control algorithm must be used to ensure the serializability of transaction scheduling and to maintain data consistency. Several works have been done in this area, but without taking into account the existence of a huge volume of data. As a second contribution, we propose an improvement of the existing Two-Shadow Speculative Concurrency Control (SCC-2S) with priority proposed in (Yee et al., 2013), using imprecise real-time spatial transactions. Finally, a simulation study is presented to prove that our contributions can achieve a significant performance improvement using the TPC-DS (TPC, 2014) benchmark.
Keywords: Heterogeneous Real-Time Geospatial Data; Update Transaction; User Transaction; Feedback Control Scheduling; Quality of Service; Nested Transaction; Speculative Concurrency Control; Imprecise Computation; Simulation.
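The abstract does not give the controller's internal details; as a rough illustration of feedback control scheduling in this spirit, the sketch below uses a simple proportional controller on the deadline miss ratio. All class names, gains and numbers are hypothetical assumptions, not the FCSA-RTSBD design.

```python
# A proportional controller keeps the deadline miss ratio near a set
# point by adjusting the admitted transaction rate each sampling period.
class MissRatioController:
    def __init__(self, target_miss_ratio=0.05, kp=200.0,
                 min_rate=10.0, max_rate=1000.0):
        self.target = target_miss_ratio   # desired miss-ratio set point
        self.kp = kp                      # proportional gain
        self.rate = max_rate              # admitted transactions/second
        self.min_rate, self.max_rate = min_rate, max_rate

    def update(self, missed, completed):
        """Adjust the admitted rate from one period's statistics."""
        total = missed + completed
        miss_ratio = missed / total if total else 0.0
        error = self.target - miss_ratio  # negative when overloaded
        self.rate = min(self.max_rate,
                        max(self.min_rate, self.rate + self.kp * error))
        return self.rate

ctrl = MissRatioController()
ctrl.update(missed=20, completed=80)  # high miss ratio: rate is lowered
ctrl.update(missed=2, completed=98)   # low miss ratio: rate rises again
```

The clamping to `[min_rate, max_rate]` keeps the admitted load bounded even under sustained overload; a real deployment would tune the gain against the workload.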
A systematic approach to efficiently managing the effects of retroactive updates of time-varying data in multiversion XML databases
by Hind Hamrouni, Zouhaier Brahmia, Rafik Bouaziz
Abstract: A retroactive update is an update that changes past data. It is a common operation in both conventional and temporal databases. However, in temporal databases, a retroactive update is challenging since it could lead to data inconsistencies if the retroactively updated data were used to create other data (like social contributions and taxes, which are calculated based on employees' salaries). Such data inconsistencies must be repaired in order to preserve database consistency. In this paper, we extend our previous approach for automatically detecting and repairing data inconsistencies that result from retroactive updates of multiversion temporal XML databases. The extension consists of: 1) providing an enhanced version of the architecture of our approach and explaining the process of handling a retroactive update; 2) showing how to extract data dependencies and how to use them to repair detected inconsistencies; 3) proposing a new log structure ensuring a complete and useful history of the executed transactions; 4) presenting a tool, named Retro-Update-Manager, that we have developed to technically validate our approach.
Keywords: XML database; temporal database; schema versioning; retroactive update; inconsistency period; data inconsistency; data dependency; transaction log.
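As a rough illustration of the inconsistency the abstract describes (derived data such as taxes computed from salaries), the following sketch shows a retroactive salary update propagating along its data dependency. The dictionaries and the flat 20% tax rule are hypothetical examples, not the paper's Retro-Update-Manager or its XML data model.

```python
# Toy illustration of a retroactive update and its repair.
salaries = {2021: 3000.0, 2022: 3200.0}           # time-varying base data
tax = {y: 0.20 * s for y, s in salaries.items()}  # data derived from salaries

def retroactive_update(year, new_salary):
    """Change past data, then repair the dependent value it invalidated."""
    salaries[year] = new_salary
    # The derived tax for that year is now inconsistent: recompute it
    # along the (salary -> tax) data dependency.
    tax[year] = 0.20 * new_salary

retroactive_update(2021, 3500.0)  # correct a past salary
assert tax[2021] == 700.0         # dependent data repaired automatically
```

The paper's contribution is detecting such dependencies automatically inside a multiversion XML database; here the dependency is hard-coded to keep the example minimal.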
K-means** - a fast and efficient K-means algorithm
by Cuong Duc Nguyen, Trong Hai Duong
Abstract: K-means often converges to a local optimum. Among improved versions of K-means, k-means++ is well known for achieving a near-optimal solution with its cluster initialisation strategy and high computational efficiency. Incremental K-means is recognised for converging to the empirically global optimum, but it has a high complexity due to its stepping of the number of clusters K. This paper introduces K-means**, which uses a doubling strategy on K. Additional techniques, including doubling only big enough clusters, stepping K for the last few values and searching other candidates for the last K, help K-means** attain a complexity of O(K log K), which is lower than that of incremental K-means, while still converging to the empirically global optimum. On a set of synthetic and real datasets, K-means** achieves the minimum results in almost all test cases. K-means** is much faster than incremental K-means and comparable in speed to k-means++.
Keywords: data clustering; K-means; k-means++; incremental K-means; IKM; data mining.
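The doubling strategy can be pictured as follows; this sketch is only a minimal interpretation of the abstract (the splitting rule, the perturbation and the Lloyd refinement details are assumptions, not the paper's exact K-means**).

```python
import numpy as np

def lloyd(X, centers, iters=20):
    """Standard Lloyd refinement: assign points, then move centres."""
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                  # nearest-centre assignment
        for k in range(len(centers)):
            pts = X[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)      # recompute centre
    return centers, labels

def kmeans_doubling(X, K, rng=np.random.default_rng(0)):
    """Grow the number of centres by doubling instead of stepping by one."""
    centers = X.mean(0, keepdims=True).copy()     # start from K = 1
    labels = np.zeros(len(X), dtype=int)
    while len(centers) < K:
        n_new = min(len(centers), K - len(centers))
        # Split existing centres by adding small perturbed copies.
        noise = 0.05 * rng.standard_normal((n_new, X.shape[1]))
        centers = np.vstack([centers, centers[:n_new] + noise])
        centers, labels = lloyd(X, centers)       # refine after each doubling
    return centers, labels
```

Because K grows geometrically, only O(log K) rounds of refinement are needed to reach the target K, which is the source of the complexity advantage over stepping K by one.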
OB-tree: a new write optimisation index on out-of-core column-store databases
by Feng Yu, Tyler J. Matacic, Brandon J. Latronica, Wen-Chi Hou
Abstract: The column-store database is representative of next-generation databases featuring high read speed. Write optimisation in the out-of-core column-store database remains a well-known challenge. The timestamped binary association table (TBAT) and the asynchronous out-of-core update (AOC update) have shown improvements in write performance. However, a common restriction shared by timestamp-based approaches is that, after a period of updates, search performance gradually decreases. In this work, we introduce a new index, called the offset B+-tree (OB-tree), to further improve data retrieval speed after many updates have taken place. The OB-tree is a flexible and robust index that employs a special pointer elimination strategy to reduce storage costs. Succinctly designed, the OB-tree can be easily integrated into existing timestamp-based column-store databases. Extensive experiments show that the OB-tree can be constructed efficiently and significantly improves data retrieval speed on the TBAT even after a large number of updates have occurred.
Keywords: column-store database; write optimisation; index; B+-tree.
Personality modelling and sentiment analysis on Chinese micro-blog posts
by Kai Gao, Chongyang Yue, Siyu Li, Duoxing Liu, Erliang Zhou, Herbert Daly
Abstract: Intelligent information processing tasks such as opinion mining and sentiment analysis on social media remain an ongoing challenge and are also useful in public opinion surveillance. Analysing micro-blog posts is often hampered by their very brief content as well as the use of misspelled or abbreviated words. This paper focuses on personality modelling and sentiment analysis on Chinese micro-blog posts. Pre-processing social media data to identify named entities and disambiguate word senses is essential. The proposed pre-processing includes double-array trie based segmentation and Viterbi-based word sense disambiguation, together with co-occurrence probability based processing of unknown words. The personality modelling procedure vectorises micro-blog posts into high-dimensional eigenvectors. For sentiment analysis, this paper proposes a multi-convolutional neural method to solve the sentiment tendency determination problem. The experimental results show the feasibility of the approach; existing problems and future work are also presented at the end.
Keywords: intelligent information; opinion mining; sentiment analysis; social media; micro-blog; personality modelling; pre-processing; double-array trie; segmentation; word sense disambiguation; neural; tendency.
Intensional FOL for reasoning about probabilities and probabilistic logic programming
by Zoran Majkić, Bhanu Prasad
Abstract: It is important to have a logic, both for computing probabilities and for reasoning about probabilities, with well-defined syntax and semantics. The current approaches to reasoning about probabilities, which are based on Nilsson's probability structures/logics as well as linear inequalities, have some deficiencies. In this research, we present a complete revision of those approaches and show that the logic for reasoning about probabilities can be naturally embedded into a 2-valued intensional first-order logic (FOL) with intensional abstraction, avoiding the current ad hoc system composed of two different 2-valued logics: one for classical propositional logic at a lower level and a new one, at a higher level, for probabilistic constraints with probabilistic variables. The theoretical results obtained are applied to probabilistic logic programming.
Keywords: probabilities; 2-valued intensional first-order logic; Nilsson's probability structures; linear inequalities.
Special Issue on: Big Data and Decision Sciences in Management and Engineering
Semantic Role Labeling of English Tweets Through Sentence Boundary Detection
by Dwijen Rudrapal, Amitava Das
Abstract: Social media services like Twitter have become a trendy communication medium for online users to share quick and up-to-date information. However, tweets are extremely noisy and full of spelling and grammatical mistakes, which pose unique challenges to semantic information extraction. One prospective solution to this problem is semantic role labeling (SRL), which focuses on unifying variations in the surface syntactic forms of semantic relations. SRL for tweets plays a central role in a wide range of tweet-related applications associated with semantic information extraction. In this paper, we propose an automatic SRL system for English tweets that identifies sentence boundaries and uses sequential minimal optimization (SMO). We conducted experiments on our SRL-annotated dataset to evaluate the proposed approach and report better performance than existing state-of-the-art SRL systems for English tweets.
Keywords: Tweet Stream; Semantic Role Labeling; Tweet Summarization; Machine Learning Algorithm.
Special Issue on: Data Security, Privacy and Trust
Modelling behaviour of Cyber-Physical Systems and verifying their safety based on an algebra of events
by Mingfu Tuo
Abstract: In a Cyber-Physical System (CPS), computing units and physical processes are usually deeply integrated. This brings great difficulty to modelling a CPS and verifying its properties. We propose an algebra of events (AOE) to describe the processing of composite events in complex event processing. We then present an extended hybrid automaton based on AOE, which can better describe the transitions among several states through actuators in a CPS. Finally, we model a lunar rover with the extended hybrid automaton. A simulation based on this model is introduced to verify its correctness and performance. The simulation results show that the lunar rover can walk autonomously and safely.
Keywords: Cyber-Physical System; Event-driven; Modelling; Verification; Lunar rover.
An improved trusted method for global congestion price computing based on software defined networking in data-centred networks
by Shan Chun, Chen Xiaolong
Abstract: Previous methods of computing the link status price have difficulty meeting the needs of dynamic networks. This paper takes full advantage of the fact that global link status information can be grasped by the central controller of a Software Defined Networking (SDN) architecture and proposes a two-tier trusted method for global congestion price computing (GCPC). The ratio argument vector B is produced by the upper-tier method with a machine learning algorithm and the fuzzy C-means clustering algorithm, according to the global network operating status information. The link status price is then computed according to the ratio argument vector B. The simulation results show that the link status price calculated by the GCPC method is trusted and that this improved method can efficiently increase the throughput of the bisection bandwidth.
Keywords: data-centred; software defined networking; the trusted price; global link status.
Edge Computing Based Security Authentication Algorithm for Multiple RFID tags
by He XU, Jie Ding, Peng Li, Ruchuan Wang
Abstract: With the development of the Internet of Things (IoT), Radio Frequency IDentification (RFID) and Cloud computing technology are widely used in many areas. However, RFID systems generate so much data that Cloud-based systems process it slowly. Edge computing has emerged to improve the performance of Cloud-based systems. In this paper, a multi-tag authentication algorithm based on edge computing is proposed. The algorithm regards RFID readers and tags as edge computing nodes, and uses the computing ability of the tag and the reader to process and streamline the security authentication information. The authentication server can perform certification of multiple tags and identify false tags in RFID systems. The algorithm has the following advantages: (1) edge computing is applied to the RFID authentication process, which reduces the pressure on the authentication server; (2) the amount of data exchanged between the tag and the reader is reduced, thus avoiding signal collisions in wireless channels; (3) the tag itself has computing ability equivalent to a node in an edge computing system and processes its own ID so that it can be detected by the reader as a 1-bit signal, which prevents excessive information interaction between the tag and the reader and therefore protects the privacy of tags.
Keywords: RFID; Security; Authentication; Edge computing; Privacy.
A Publicly Verifiable Network Coding Scheme With Null-Space HMAC
by Chen Yonghui
Abstract: The encode-and-forward mechanism of a Network Coding (NC) system not only provides increased network throughput but is also seriously vulnerable to pollution attacks. How to design a secure, efficient and publicly verifiable homomorphic NC scheme has been an interesting and challenging topic. The existing cryptography-based NC schemes are grouped into either Public Key Cryptosystems (PKC) or Symmetric Key Cryptosystems (SKC). NC schemes based on PKC naturally have public verifiability, but imply much more computation cost and longer operation delay. NC schemes based on SKC have cheaper computation costs, but face a dilemma over how to share the secret key with intermediate nodes that might be malicious. Therefore, in this paper, we provide a new NC scheme based on null-space HMACs with hierarchically shared keys. The inner shared keys allow the destination nodes to verify the integrity of the messages; the outer shared keys allow the intermediate nodes to verify the integrity of the received packets. Our scheme shows a way to balance computation efficiency and public verifiability for an NC system with an SKC scheme.
Keywords: network coding; publicly verifiable; Null-Space HMAC.
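The null-space idea underlying such schemes can be illustrated in a few lines: every valid coded packet is a linear combination of the source vectors, so it lies in their span, and a secret vector orthogonal to that span acts as a check. The sketch below (plain Python over a small prime field, with a fixed check vector instead of HMAC-derived keys) is an assumption-laden toy, not the proposed scheme.

```python
# Toy null-space check for network coding over a small prime field.
P = 7919  # illustrative prime modulus

def dot(a, b):
    """Inner product mod P."""
    return sum(u * v for u, v in zip(a, b)) % P

# Two source (augmented) packets spanning the valid subspace V.
v1 = [1, 0, 3, 5]
v2 = [0, 1, 2, 4]

# A check vector z in the null space of V: z.v1 == z.v2 == 0 (mod P).
# In the paper z would be derived from shared HMAC keys; here it is fixed.
z = [(-8) % P, (-6) % P, 1, 1]

def verify(packet):
    """Accept a packet iff it is orthogonal to the check vector."""
    return dot(z, packet) == 0

# Any linear combination of v1 and v2 verifies; a polluted packet fails.
good = [(2 * a + 5 * b) % P for a, b in zip(v1, v2)]
bad = list(good)
bad[2] = (bad[2] + 1) % P  # simulate pollution of one symbol
```

Because verification is a single inner product, intermediate nodes can filter polluted packets cheaply; the security of a real scheme rests on keeping the check vectors secret and refreshing them per generation.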
Image Super-resolution via Gaussian Scale Patch Group Sparse Representation
by Minghu Wu, Yaqi Lu, Nan Zhao, Min Liu, Cong Liu
Abstract: This paper puts forward a Gaussian scale patch group sparse representation method to address the shortcomings of traditional image super-resolution restoration schemes. Our image reconstruction method focuses on optimising the sparse representation model, which improves both the method and the performance of sparse image reconstruction. The overall framework of our approach is as follows. First, we utilise nonlocal similar patches to extract patch groups, and then we use simultaneous sparse coding to develop a nonlocal extension of the Gaussian scale mixture model. Finally, we integrate the patch group model and the Gaussian scale sparsity model into the encoding framework. Experimental simulation results show that the proposed framework can both maintain the clarity of edges and inhibit bad artifacts. Our method provides better recovery performance than the framework using the original algorithm at low peak signal-to-noise ratios. More importantly, our method often provides higher subjective/objective quality of reconstructed images than other competing methods. For the simulation images used in this paper, the proposed algorithm outperforms the advanced PGPD and NCSR image reconstruction methods: analysing the PSNR values, in the best case our method improves by 0.55 dB over the PGPD method and by 0.58 dB over the NCSR method.
Keywords: Sparse Representation; image super-resolution; patch grouping; Gaussian scale mixture.
On the rotation Boolean permutation
by Zhou Yu
Abstract: The Internet of Things needs encryption algorithms with small area and small scale. This paper obtains some rotation Boolean permutations from a matrix of linear expressions, and constructs three methods of building rotation nonlinear Boolean permutations. The sub-functions of the three permutations have some desirable properties: three monomials, high degree and 2-algebraic immunity. Finally, we derive the disjoint spectra Boolean functions.
Keywords: Stream cipher; Boolean function; Rotation Boolean permutation.
Multi-hypothesis compressed video sensing by two-step iterative thresholding
by Rui Chen, Ying Tong, Jie Yang, Minghu Wu
Abstract: Traditional distributed video coding schemes achieve compression by using a single reference frame for side information. To improve the quality of decoded video, we propose a novel scheme that decodes video using more than one frame as reference. Considering the different features of video sequences and their temporal and spatial correlation, multi-hypothesis predictions of the current frame are applied to refine the side information for non-key frame reconstruction. Three side information candidates are obtained by applying multi-hypothesis prediction and bi-directional motion estimation to the non-key frame and its bi-directional reference frames. The correlation coefficients between the non-key frame and the three candidates are then calculated, and the most similar side information is chosen to recover the non-key frame. Finally, the reconstruction algorithm BCS-SPL (block compressed sensing with smoothed projected Landweber reconstruction) is improved by adopting two-step iterative thresholding to further enhance the recovery quality of the non-key frames. Experimental results demonstrate that the proposed scheme outperforms the original MH-BCS-SPL based schemes in refining the side information and improving the non-key frames' recovery performance.
Keywords: compressed sensing; multi-hypothesis prediction; distributed video coding; wireless sensor network.
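Two-step iterative thresholding can be sketched as below: each iterate mixes the previous two estimates with a soft-thresholded gradient step (with alpha = beta = 1 it reduces to plain iterative shrinkage). The parameters and problem setup are illustrative assumptions, not the paper's BCS-SPL integration.

```python
import numpy as np

def soft(x, lam):
    """Componentwise soft-thresholding (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def twist(A, y, iters=200, alpha=1.0, beta=1.0, lam=0.05):
    """Two-step iterative thresholding for min 0.5||y - Ax||^2 + lam||x||_1.

    Assumes the spectral norm of A is at most 1. With alpha = beta = 1
    this is plain iterative shrinkage; other (alpha, beta) choices mix
    the previous two iterates, which is the two-step form.
    """
    x_prev = np.zeros(A.shape[1])
    x = soft(A.T @ y, lam)                 # one shrinkage step from zero
    for _ in range(iters):
        grad_step = x + A.T @ (y - A @ x)  # gradient step on the data term
        x_new = ((1 - alpha) * x_prev      # two-step update: combine the
                 + (alpha - beta) * x      # previous two iterates with the
                 + beta * soft(grad_step, lam))  # thresholded gradient step
        x_prev, x = x, x_new
    return x
```

In the paper's setting the measurement operator is block-based and the thresholding is applied in a transform domain; the convergence of the two-step recursion depends on choosing alpha and beta to match the spectrum of A.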