International Journal of Intelligent Information and Database Systems (12 papers in press)
A Systematic Approach to Efficiently Managing the Effects of Retroactive Updates of Time-varying Data in Multiversion XML Databases
by Hind Hamrouni, Zouhaier Brahmia, Rafik Bouaziz
Abstract: A retroactive update is an update that changes past data. It is a
common operation in both conventional and temporal databases. However, in
temporal databases, a retroactive update is challenging since it can lead to
data inconsistencies when the retroactively updated data were used to create
other data (for example, social contributions and taxes, which are calculated
from employees' salaries). Such inconsistencies must be repaired in order
to preserve database consistency. In this paper, we extend our previous
approach for automatically detecting and repairing data inconsistencies that
result from retroactive updates in multiversion temporal XML databases. The
extension consists in: 1) providing an enhanced version of the architecture of
our approach and explaining the process of handling a retroactive update;
2) showing how to extract data dependencies and how to use them to
repair detected inconsistencies; 3) proposing a new log structure that ensures a
complete and useful history of the executed transactions; and 4) presenting a tool,
named Retro-Update-Manager, that we have developed to technically validate our approach.
Keywords: XML database; temporal database; schema versioning; retroactive update; inconsistency period; data inconsistency; data dependency; transaction log.
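The dependency-driven repair the abstract describes can be pictured with a toy valid-time example (a hedged Python sketch, not the authors' Retro-Update-Manager; the `TAX_RATE` dependency and the wholesale-recompute repair are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A valid-time fact holding over the period [start, end) (end exclusive)."""
    value: float
    start: int
    end: int

TAX_RATE = 0.2  # hypothetical dependency: tax = 0.2 * salary

def retroactive_update(salaries, taxes, new_value, start, end):
    """Record a retroactive salary change, then repair dependent tax facts.
    Deliberately simplified: any tax fact overlapping the inconsistency
    period [start, end) is recomputed wholesale."""
    salaries.append(Fact(new_value, start, end))
    for i, t in enumerate(taxes):
        if t.start < end and start < t.end:      # periods overlap
            taxes[i] = Fact(new_value * TAX_RATE, t.start, t.end)

salaries = [Fact(3000.0, 0, 100)]
taxes = [Fact(600.0, 0, 100)]                    # derived: 3000 * 0.2
retroactive_update(salaries, taxes, 3500.0, 10, 20)
```

A real repair would split the affected periods instead of overwriting whole facts; the point here is only the detect-overlap-then-recompute pattern.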
K-means** - A fast and efficient K-means algorithm
by Cuong Duc Nguyen, Trong Hai Duong
Abstract: K-means often converges to a local optimum. Among improved versions of K-means, k-means++ is well known for reaching a near-optimal solution, thanks to its cluster initialization strategy, with high computational efficiency. Incremental K-means is recognized for converging to the empirically global optimum, but has high complexity because it increases the number of clusters K one step at a time. This paper introduces K-means**, which uses a doubling strategy on K. Additional techniques, including doubling only sufficiently large clusters, stepping K for the last few values, and searching over other candidates for the last K, help K-means** achieve a complexity of O(K log K), lower than that of Incremental K-means, while still converging to the empirically global optimum. On a collection of synthetic and real data sets, K-means** achieves the minimum results in almost all test cases. K-means** is much faster than Incremental K-means and comparable in speed with k-means++.
Keywords: Data Clustering; K-means; k-means++; Incremental K-means.
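The doubling strategy on K can be sketched as follows (an illustrative Python sketch under our own assumptions; it omits K-means**'s refinements such as doubling only large-enough clusters and the final stepping/search on K, and the 5% stopping tolerance `tol` is a made-up parameter):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns centers and within-cluster SSE."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    sse = ((X - centers[labels]) ** 2).sum()
    return centers, sse

def kmeans_doubling(X, k_max, tol=0.05):
    """Double K (1, 2, 4, ...) until the relative SSE improvement falls below tol."""
    best, k = None, 1
    while k <= k_max:
        _, sse = kmeans(X, k)
        if best is not None and best[1] - sse < tol * best[1]:
            break
        best = (k, sse)
        k *= 2
    return best   # (chosen K, its SSE)
```

Doubling visits only O(log K) candidate values of K, which is the source of the complexity advantage over stepping K by one.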
OB-Tree: A New Write Optimization Index on Out-of-Core Column-Store Databases
by Feng Yu, Tyler Matacic, Brandon Latronica, Wen-Chi Hou
Abstract: The column-store database is a representative of next-generation databases featuring high read speed. Write optimization in out-of-core column-store databases remains a well-known challenge. The Timestamped Binary Association Table (TBAT) and the Asynchronous Out-of-Core Update (AOC Update) have shown improvements in write performance. However, a common restriction shared by timestamp-based approaches is that, after a period of updates, search performance gradually decreases. In this work, we introduce a new index, called the Offset B+ Tree (OB-Tree), to further improve data retrieval speed after many updates have taken place. The OB-Tree is a flexible and robust index that employs a special pointer elimination strategy to reduce storage costs. Succinctly designed, the OB-Tree can be easily integrated into existing timestamp-based column-store databases. Extensive experiments show that the OB-Tree can be constructed efficiently and significantly improves data retrieval speed on the TBAT even after a large number of updates have occurred.
Keywords: Column-Store Database; Write Optimization; Index; B+ Tree.
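The timestamp-based update pattern whose read-side degradation motivates the OB-Tree can be illustrated with a toy append-only column (our own simplified sketch, not the paper's TBAT layout): writes are constant-time appends, but each read must scan a growing update list, which is the slowdown an index is meant to remove.

```python
class TimestampedColumn:
    """Illustrative append-only column: updates are appended with a timestamp,
    so writes are fast, but reads must scan ever more update records."""

    def __init__(self, values):
        self.base = list(values)   # initial column, indexed by row id
        self.updates = []          # (rowid, timestamp, new_value), append-only

    def write(self, rowid, ts, value):
        self.updates.append((rowid, ts, value))   # O(1) sequential append

    def read(self, rowid):
        # latest timestamped update wins; fall back to the base value
        latest = max((u for u in self.updates if u[0] == rowid),
                     key=lambda u: u[1], default=None)
        return latest[2] if latest else self.base[rowid]
```

An index over `updates` (which is what the OB-Tree provides on the TBAT, per the abstract) would replace the linear scan in `read` with a lookup.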
Personality Modeling and Sentiment Analysis on Chinese Micro-blog Posts
by Kai Gao, Siyu Li, Herbert Daly
Abstract: Mining opinions and analyzing sentiment in social media content remains an ongoing challenge. It is also useful for public opinion surveillance. Analysis of Chinese micro-blog posts is often hampered by their very brief content as well as the use of misspelled or abbreviated words. In this domain, pre-processing of social media data, to identify named entities and disambiguate word senses, is essential. This paper focuses on personality modeling and sentiment analysis of Chinese micro-blog posts. The pre-processing method includes double-array-trie-based segmentation and Viterbi-based word sense disambiguation, together with co-occurrence-probability-based handling of unknown words. Compared with traditional algorithms, the proposed approach enhances performance. On the basis of these techniques, the paper also demonstrates their application to Chinese micro-blog sentiment analysis. The experimental results show the feasibility of the approach; existing problems and future work are also presented at the end.
Keywords: sentiment analysis; segmentation; word sense disambiguation; personalized modeling.
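Viterbi-style segmentation of the kind mentioned above can be sketched with a unigram word lattice (an illustrative Python sketch; the toy dictionary and the 8-character word-length cap are assumptions, and the paper's double-array trie lookup is replaced by a plain dict):

```python
import math

def viterbi_segment(text, word_prob):
    """Dynamic program over the word lattice:
    best[i] = max log-probability of any segmentation of text[:i]."""
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - 8), i):        # cap word length at 8 chars
            w = text[j:i]
            if w in word_prob and best[j] + math.log(word_prob[w]) > best[i]:
                best[i] = best[j] + math.log(word_prob[w])
                back[i] = j
    # backtrack from the end to recover the best word sequence
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With a toy dictionary such as `{"中": 0.1, "国": 0.1, "中国": 0.3, "人": 0.2}`, the lattice prefers the two-word split `["中国", "人"]` over three single characters.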
Intensional FOL for Reasoning About Probabilities and Probabilistic Logic Programming
by Zoran Majkic, Bhanu Prasad
Abstract: It is important to have a logic, both for computing probabilities and for reasoning about probabilities, with well-defined syntax and semantics. The current approaches to reasoning about probabilities, which are based on Nilsson's probability structures/logics and on linear inequalities, have some deficiencies. In this research, we present a complete revision of those approaches and show that the logic for reasoning about probabilities can be naturally embedded into a 2-valued intensional First-Order Logic (FOL) with intensional abstraction, avoiding the current ad hoc system composed of two different 2-valued logics: classical propositional logic at the lower level and a new logic, at a higher level, for probabilistic constraints with probabilistic variables. The theoretical results obtained are applied to probabilistic logic programming.
Keywords: Probabilities; 2-valued intensional first-order logic; Nilsson's probability structures; Linear inequalities.
Special Issue on: Data Security, Privacy and Trust
Modelling behaviour of Cyber-Physical Systems and verifying their safety based on algebra of events
by Mingfu Tuo
Abstract: In Cyber-Physical Systems (CPS), computing units and physical processes are usually deeply integrated. This makes modelling a Cyber-Physical System and verifying its properties difficult. We propose an algebra of events (AOE) to describe the composition of events in complex event processing. We then present an extended hybrid automaton based on AOE, which better describes the transitions among states driven by actuators in a CPS. Finally, we model a lunar rover with the extended hybrid automaton. A simulation based on this model is used to verify correctness and performance. The simulation results show that the lunar rover can walk autonomously and safely.
Keywords: Cyber-Physical System; Event-driven; Modelling; Verification; Lunar rover.
An improved trusted method for global congestion price computing based on software defined networking in data-centred networks
by Shan Chun, Chen Xiaolong
Abstract: Previous methods for computing the link status price have difficulty meeting the needs of dynamic networks. This paper takes full advantage of the fact that the central controller in a Software Defined Networking (SDN) architecture can grasp global link status information, and proposes a two-tier trusted method for global congestion price computing (GCPC). The upper tier produces the ratio argument vector B with a machine learning algorithm and the Fuzzy C-means clustering algorithm, according to global network operating status information. The link status price is then computed from the ratio argument vector B. Simulation results show that the link status price calculated by the GCPC method is trusted and that the improved method can efficiently increase bisection bandwidth throughput.
Keywords: data-centred; software defined networking; the trusted price; global link status.
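The Fuzzy C-means step of the upper tier can be sketched as follows (a generic FCM implementation in Python for illustration; the farthest-point initialization and fuzzifier `m` are our assumptions, and the mapping from memberships to the ratio argument vector B is not shown because the abstract does not specify it):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100):
    """Generic Fuzzy C-means: soft memberships U (rows sum to 1) and centers."""
    # greedy farthest-point initialization (kmeans++-style, deterministic)
    centers = [X[0]]
    for _ in range(c - 1):
        d_min = np.min([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[d_min.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # squared distances to each center, floored to avoid division by zero
        d = np.maximum(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), 1e-12)
        # membership update: u_ik proportional to d_ik^(-1/(m-1))
        U = d ** (-1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
        # center update: fuzzily weighted mean of the data
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U
```

Unlike hard K-means, each link's status contributes to every cluster in proportion to its membership, which is what makes cluster-derived quantities vary smoothly as network conditions change.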
Edge Computing Based Security Authentication Algorithm for Multiple RFID tags
by He XU, Jie Ding, Peng Li, Ruchuan Wang
Abstract: With the development of the Internet of Things (IoT), Radio Frequency IDentification (RFID) and Cloud Computing technologies are widely used in many areas. However, RFID systems generate so much data that Cloud-based systems process it slowly. Edge computing has emerged to improve the performance of Cloud-based systems. In this paper, a multi-tag authentication algorithm based on edge computing is proposed. The algorithm regards RFID readers and tags as edge computing nodes, and uses the computing ability of the tag and the reader to process and streamline the security authentication information. The authentication server can certify multiple tags and identify false tags in RFID systems. The algorithm has the following advantages: (1) Edge computing is applied to the RFID authentication process, which reduces the load on the authentication server. (2) The amount of data exchanged between the tag and the reader is reduced, avoiding signal collisions in wireless channels. (3) The tag itself has computing ability equivalent to a node in an edge computing system, and handles its own ID so that it can be detected by the reader as a 1-bit signal; this prevents excessive information interaction between the tag and the reader and therefore protects the privacy of tags.
Keywords: RFID; Security; Authentication; Edge computing; Privacy.
A Publicly Verifiable Network Coding Scheme With Null-Space HMAC
by Chen Yonghui
Abstract: The encode-and-forward mechanism of a Network Coding (NC) system not only provides increased network throughput but also can be seriously vulnerable to pollution attacks. How to design a secure, efficient, and publicly verifiable homomorphic NC scheme has been an interesting and challenging topic. Existing cryptography-based NC schemes are grouped into either Public Key Cryptosystems (PKC) or Symmetric Key Cryptosystems (SKC). NC schemes in PKC naturally have public verifiability, but imply much higher computation cost and longer operation delay. NC schemes in SKC have cheaper computation cost, but face a dilemma about how to share the secret key with intermediate nodes that might be malicious. Therefore, in this paper, we provide a new NC scheme based on null-space HMACs with hierarchically shared keys. The inner shared keys allow the destination nodes to verify the integrity of the messages; the outer shared keys allow the intermediate nodes to verify the integrity of the received packets. Our scheme shows a way to balance computation efficiency and public verifiability for an NC system with an SKC scheme.
Keywords: network coding; publicly verifiable; Null-Space HMAC.
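The null-space idea behind such verification can be illustrated in a few lines (a hedged toy sketch, not the paper's scheme: a real construction uses keyed HMACs and hierarchically shared keys rather than a single shared vector, and works over a larger field). The source pads each message so it lies in the null space of a secret key vector; any honest linear combination stays in that null space, while a polluted packet fails the check with high probability.

```python
import numpy as np

P = 10007  # prime field for the sketch

def make_key(n, rng):
    """Secret key vector of length n+1 with an invertible last entry."""
    return rng.integers(1, P, size=n + 1)

def tag(m, z):
    """Append one pad symbol so the augmented packet is orthogonal to z mod P."""
    pad = (-int(np.dot(m, z[:-1])) * pow(int(z[-1]), -1, P)) % P
    return np.append(m, pad)

def verify(pkt, z):
    """Null-space check: an unpolluted (combination of) packet(s) passes."""
    return int(np.dot(pkt, z)) % P == 0
```

Because `verify` is linear, intermediate nodes can re-encode packets freely (any mod-P linear combination of valid packets still verifies) without being able to forge new valid packets unless they know `z`.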
Image Super-resolution via Gaussian Scale Patch Group Sparse Representation
by Minghu Wu, Yaqi Lu, Nan Zhao, Min Liu, Cong Liu
Abstract: This paper puts forward a Gaussian scale patch group sparse representation method to address the shortcomings of traditional image super-resolution restoration schemes. Our image reconstruction method focuses on optimizing the sparse representation model, which improves both the method and the performance of sparse image reconstruction. The overall framework of our approach is as follows. First, we utilize nonlocal similar patches to extract patch groups; then we use simultaneous sparse coding to develop a nonlocal extension of the Gaussian scale mixture model. Finally, we integrate the patch group model and the Gaussian scale sparsity model into the encoding framework. Experimental simulation results show that the proposed framework can both maintain edge clarity and suppress artifacts. Our method provides better recovery performance than the framework using the original algorithm at low peak signal-to-noise ratio. More importantly, our method often provides higher subjective/objective quality of reconstructed images than other competing methods. For the simulation images used in this paper, the proposed algorithm outperforms the advanced PGPD and NCSR image reconstruction methods; in terms of PSNR, in the best case, our method improves by 0.55 dB over the PGPD method and by 0.58 dB over the NCSR method.
Keywords: Sparse Representation; image super-resolution;patch grouping; gaussian scale mixture.
On rotation Boolean permutations
by Zhou Yu
Abstract: The Internet of Things needs encryption algorithms with a small hardware area and small scale. This paper obtains some rotation Boolean permutations via a matrix of linear expressions, and constructs three methods for building rotation nonlinear Boolean permutations. The sub-functions of the three permutations have three monomials, high degree, and algebraic immunity 2. Finally, we derive disjoint spectra Boolean functions.
Keywords: Stream cipher; Boolean function; Rotation Boolean permutation.
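Rotation symmetry itself can be made concrete with a minimal sketch (illustrative only; the paper's constructions via matrices of linear expressions are not reproduced here): cyclic rotation of an n-bit vector is itself a Boolean permutation, and a rotation Boolean permutation F is one commuting with it, F(rot(x)) = rot(F(x)).

```python
def rotate(x, k, n):
    """Cyclically rotate the n-bit value x left by k positions."""
    k %= n
    return ((x << k) | (x >> (n - k))) & ((1 << n) - 1)
```

For example, `rotate(0b0011, 1, 4)` gives `0b0110`, and the map `x -> rotate(x, 1, n)` is a bijection on n-bit values, i.e. a (linear) Boolean permutation; the paper's interest is in nonlinear permutations with this symmetry.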
Multi-hypothesis compressed video sensing by two-step iterative thresholding
by Rui Chen, Ying Tong, Jie Yang, Minghu Wu
Abstract: Traditional distributed video coding schemes achieve compression by using a single reference frame for side information. To improve the quality of decoded video, we propose a novel scheme that decodes video using more than one frame as reference. Considering the different features of video sequences and their temporal and spatial correlation, multi-hypothesis predictions of the current frame are applied to refine the side information for non-key frame reconstruction. Three side information candidates are obtained by applying multi-hypothesis prediction and bi-directional motion estimation to the non-key frame and its bi-directional reference frames. Furthermore, the correlation coefficients between the non-key frame and the three candidates are calculated, and the most similar side information is chosen to recover the non-key frame. Finally, the BCS-SPL (Block Compressed Sensing with Smoothed Projected Landweber) reconstruction algorithm is improved by adopting two-step iterative thresholding to further enhance the recovery quality of the non-key frames. Experimental results demonstrate that the proposed scheme outperforms the original MH-BCS-SPL-based schemes in refining the side information and improving the recovery performance of non-key frames.
Keywords: compressed sensing; multi-hypothesis prediction; distributed video coding; wireless sensor network.
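The iterative thresholding at the heart of the improved reconstruction can be sketched on a generic sparse-recovery problem (an illustrative Python sketch, not the paper's BCS-SPL pipeline; with `alpha = beta = 1` it reduces to plain iterative soft-thresholding, and a two-step TwIST-style scheme would pick `alpha`, `beta` greater than 1 from spectral bounds on A):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def iterative_thresholding(y, A, lam=0.01, alpha=1.0, beta=1.0, n_iter=500):
    """Two-step iterative thresholding for min 0.5*||y - Ax||^2 + lam*||x||_1.
    Each step mixes the two previous iterates with the thresholded
    gradient step: x_{t+1} = (1-a) x_{t-1} + (a-b) x_t + b T(x_t)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x_prev = x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x_new = soft(x + A.T @ (y - A @ x) / L, lam / L)
        x, x_prev = (1 - alpha) * x_prev + (alpha - beta) * x + beta * x_new, x
    return x
```

On compressible signals (few nonzero coefficients), the thresholded iterations recover the signal from far fewer measurements than its dimension, which is the mechanism the scheme above exploits for non-key frames.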