International Journal of Intelligent Information and Database Systems (6 papers in press)
A Systematic Approach to Efficiently Managing the Effects of Retroactive Updates of Time-varying Data in Multiversion XML Databases
by Hind Hamrouni, Zouhaier Brahmia, Rafik Bouaziz
Abstract: A retroactive update is an update that changes past data. It is a
common operation in both conventional and temporal databases. However, in
temporal databases, a retroactive update is challenging since it could lead to
data inconsistencies if the retroactively updated data were used for creating
other data (like social contributions and taxes which are calculated based on the
salaries of the employees). Such data inconsistencies must be repaired in order
to preserve the database consistency. In this paper, we extend our previous
approach to automatically detecting and repairing data inconsistencies that
result from retroactive updates of multiversion temporal XML databases. The
extension consists in: 1) providing an enhanced version of the architecture of
our approach and explaining the process of handling a retroactive update;
2) showing how to extract data dependencies and how to use them in order to
repair detected inconsistencies; 3) proposing a new log structure ensuring a
complete and useful history of the executed transactions; 4) presenting a tool,
named Retro-Update-Manager, that we have developed to prove technically our
Keywords: XML database; temporal database; schema versioning; retroactive update; inconsistency period; data inconsistency; data dependency; transaction log.
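The salary and tax example in the abstract above can be made concrete with a small sketch. It is not the authors' Retro-Update-Manager and does not use XML or schema versions. It only illustrates, under simplifying assumptions, how a retroactive update of source data leaves derived data inconsistent until a repair step recomputes it. The class `PayrollStore`, the flat rate `TAX_PERCENT` and the log tuple layout are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

TAX_PERCENT = 20  # hypothetical flat tax rate, for illustration only


def tax_for(salary):
    """Derived value: tax computed from a salary (integer arithmetic)."""
    return salary * TAX_PERCENT // 100


@dataclass
class PayrollStore:
    """Toy store: salaries are source data, taxes are derived from them."""
    salaries: dict = field(default_factory=dict)  # (employee, month) -> salary
    taxes: dict = field(default_factory=dict)     # (employee, month) -> tax
    log: list = field(default_factory=list)       # history of executed operations

    def record_salary(self, employee, month, amount):
        """Normal insert: store the salary and the tax derived from it."""
        key = (employee, month)
        self.salaries[key] = amount
        self.taxes[key] = tax_for(amount)
        self.log.append(("insert", key, amount))

    def retroactive_update(self, employee, month, new_amount):
        """Change a past salary; the derived tax is deliberately left stale."""
        key = (employee, month)
        old_amount = self.salaries[key]
        self.salaries[key] = new_amount
        self.log.append(("retro_update", key, old_amount, new_amount))

    def find_inconsistencies(self):
        """Return the keys whose stored tax no longer matches the salary."""
        return [key for key, salary in self.salaries.items()
                if self.taxes[key] != tax_for(salary)]

    def repair(self):
        """Recompute every stale derived value and log each repair."""
        stale = self.find_inconsistencies()
        for key in stale:
            self.taxes[key] = tax_for(self.salaries[key])
            self.log.append(("repair", key, self.taxes[key]))
        return stale
```

In this sketch the dependency between salaries and taxes is hard-coded in `tax_for`; the paper instead describes extracting such dependencies and recording them in a dedicated log structure.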
K-means** - A fast and efficient K-means algorithms
by Cuong Duc Nguyen, Trong Hai Duong
Abstract: K-means often converges to a local optimum. Among the improved versions of K-means, k-means++ is well known for reaching a near-optimal solution through its cluster initialization strategy while keeping high computational efficiency. Incremental K-means is recognized for converging to the empirically global optimum, but it has a high complexity because it steps through the number of clusters K. The paper introduces K-means**, which applies a doubling strategy to K. Additional techniques, including Only doubling big enough clusters, Stepping K for the last few values and Searching on other candidates for the last K, help K-means** achieve a complexity of O(K logK), which is lower than that of Incremental K-means, while still converging to the empirically global optimum. On a set of synthetic and real data sets, K-means** achieves the minimum results in almost all test cases. K-means** is much faster than Incremental K-means and comparable in speed to k-means++.
Keywords: Data Clustering; K-means; k-means++; Incremental K-means.
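The idea of growing K by doubling rather than one cluster at a time can be illustrated with a minimal sketch. This is not the authors' K-means** and it omits the additional techniques named in the abstract. It assumes a simple splitting rule (the farthest point of a cluster becomes a new centroid) and splits the largest clusters first when fewer than K new centroids are needed. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Group points by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def lloyd(points, centroids, iters=50):
    """Standard K-means (Lloyd) iterations starting from given centroids."""
    for _ in range(iters):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def doubling_kmeans(points, k_target):
    """Grow K from 1 by (at most) doubling it, refining after each growth."""
    centroids, clusters = lloyd(points, [mean(points)])
    while len(centroids) < k_target:
        # Assumed rule: split the largest clusters first, never exceed k_target.
        order = sorted(range(len(centroids)), key=lambda i: -len(clusters[i]))
        n_split = min(len(centroids), k_target - len(centroids))
        grown = list(centroids)
        for i in order[:n_split]:
            if len(clusters[i]) < 2:
                continue  # too small to split
            far = max(clusters[i], key=lambda p: sq_dist(p, centroids[i]))
            grown.append(far)
        if len(grown) == len(centroids):
            break  # nothing left to split
        centroids, clusters = lloyd(points, grown)
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared distances of points to their cluster centroid."""
    return sum(sq_dist(p, c) for c, pts in zip(centroids, clusters) for p in pts)
```

Because K at most doubles per round, the number of refinement rounds grows with log K instead of K, which is the intuition behind the lower cost compared with stepping K by one.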
OB-Tree: A New Write Optimization Index on Out-of-Core Column-Store Databases
by Feng Yu, Tyler Matacic, Brandon Latronica, Wen-Chi Hou
Abstract: The column-store database is a representative of next-generation databases featuring a high reading speed. Write optimization in the out-of-core column-store database remains a well-known challenge. Timestamped Binary Association Table (or TBAT) and Asynchronous Out-of-Core Update (or AOC Update) have shown improvements in write performance. However, a common restriction shared by the timestamp-based approaches is that, after a period of updates, the search performance gradually decreases. In this work, we introduce a new index, called Offset B+ Tree (or OB-Tree), to further improve the data retrieval speed after many updates have taken place. OB-tree is a flexible and robust index that employs a special pointer elimination strategy to reduce the storage costs. Succinctly designed, OB-tree can be easily integrated into existing timestamp-based column-store databases. Extensive experiments show that OB-tree can be efficiently constructed and significantly improves the data retrieval speed on the TBAT even when a large number of updates have occurred.
Keywords: Column-Store Database; Write Optimization; Index; B+ Tree.
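The trade-off described in the abstract above can be sketched roughly as follows. The sketch assumes that a timestamp-based column stores rows of (timestamp, oid, value) and handles an update by appending a new row, so that an unindexed read must scan for the newest row of an object. The class `TimestampedColumn` is hypothetical, and a plain dictionary mapping an oid to the offset of its newest row stands in for the index; it is not the OB-Tree and has no pointer elimination.

```python
class TimestampedColumn:
    """Append-only column of (timestamp, oid, value) rows (assumed layout)."""

    def __init__(self):
        self.rows = []     # physical storage, in append order
        self.latest = {}   # oid -> offset of its newest row (index stand-in)
        self._clock = 0    # logical timestamp source

    def write(self, oid, value):
        """Insert or update by appending; earlier rows are never rewritten."""
        self._clock += 1
        self.rows.append((self._clock, oid, value))
        self.latest[oid] = len(self.rows) - 1

    def read_scan(self, oid):
        """Unindexed read: scan every row and keep the newest timestamp."""
        best = None
        for ts, row_oid, value in self.rows:
            if row_oid == oid and (best is None or ts > best[0]):
                best = (ts, value)
        return None if best is None else best[1]

    def read_indexed(self, oid):
        """Indexed read: jump straight to the offset of the newest row."""
        offset = self.latest.get(oid)
        return None if offset is None else self.rows[offset][2]
```

Writes stay cheap because they only append, while `read_scan` slows down as updates accumulate; an offset index such as the one the paper proposes targets that read cost.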
Personality Modeling and Sentiment Analysis on Chinese Micro-blog Posts
by Kai Gao, Siyu Li, Herbert Daly
Abstract: Mining opinions and analyzing the sentiment information in social media content remains an ongoing challenge. It is also useful for public opinion surveillance. Analysis of Chinese micro-blog posts is often hampered by their very brief content as well as the use of misspelled or abbreviated words. For social media data in this domain, pre-processing that identifies named entities and performs word sense disambiguation is essential. This paper focuses on personality modeling and sentiment analysis on Chinese micro-blog posts. The pre-processing method includes double-array trie based segmentation and Viterbi-based word sense disambiguation, together with co-occurrence probability based processing of unknown words. Compared with traditional algorithms, the proposed approach improves performance. Building on these techniques, the paper also demonstrates their application to Chinese micro-blog sentiment analysis. The experimental results show the feasibility of the approach; existing problems and future work are presented at the end.
Keywords: sentiment analysis; segmentation; word sense disambiguation; personalized modeling.
Special Issue on: Data Security, Privacy and Trust
Modelling behaviour of Cyber-Physical System and verifying its safety based on algebra of event
by Mingfu Tuo
Abstract: In a Cyber-Physical System (CPS), the computing units and physical processes are usually deeply integrated. This makes it difficult to model a Cyber-Physical System and verify its properties. We propose an algebra of events (AOE) to describe the handling of composite events in complex event processing. Then, we present an extended hybrid automaton based on AOE, which better describes the transitions among several states through actuators in a CPS. Finally, we model a lunar rover with the extended hybrid automaton. A simulation based on this model is used to verify correctness and performance. The simulation results show that the lunar rover can walk autonomously and safely.
Keywords: Cyber-Physical System; Event-driven; Modelling; Verification; Lunar rover.
An improved trusted method for global congestion price computing based on software defined networking in data-centred network
by Shan Chun, Chen Xiaolong
Abstract: Earlier methods for computing the link status price struggle to meet the needs of dynamic networks. This paper takes full advantage of the fact that the central controller of the Software Defined Networking (SDN) architecture can obtain global link status information, and proposes a two-tier trusted method for global congestion price computing (GCPC). The upper-tier method derives the ratio argument vector B from the global network operating status information using a machine learning algorithm and the Fuzzy C-means clustering algorithm. The link status price is then computed from the ratio argument vector B. The simulation results show that the link status price calculated by the GCPC method is trusted and that the improved method can efficiently increase the throughput of bisection bandwidth.
Keywords: data-centred; software defined networking; the trusted price; global link status.