International Journal of Intelligent Information and Database Systems (8 papers in press)
A Systematic Approach to Efficiently Managing the Effects of Retroactive Updates of Time-varying Data in Multiversion XML Databases
by Hind Hamrouni, Zouhaier Brahmia, Rafik Bouaziz
Abstract: A retroactive update is an update that changes past data. It is a common operation in both conventional and temporal databases. However, in temporal databases, a retroactive update is challenging, since it can lead to data inconsistencies if the retroactively updated data were used to create other data (such as social contributions and taxes, which are calculated from employees' salaries). Such inconsistencies must be repaired in order to preserve database consistency. In this paper, we extend our previous approach for automatically detecting and repairing data inconsistencies that result from retroactive updates in multiversion temporal XML databases. The extension consists in (i) providing an enhanced version of the architecture of our approach and explaining the process of handling a retroactive update, (ii) showing how to extract data dependencies and how to use them to repair detected inconsistencies, (iii) proposing a new log structure that ensures a complete and useful history of the executed transactions, and (iv) presenting a tool, named Retro-Update-Manager, that we have developed to demonstrate the technical feasibility of our approach.
Keywords: XML database; temporal database; schema versioning; retroactive update; inconsistency period; data inconsistency; data dependency; transaction log.
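As an illustration of how extracted data dependencies can drive repair, the following minimal sketch collects every item that is derived, directly or transitively, from a retroactively updated item. It is not the authors' algorithm or the Retro-Update-Manager tool: the dictionary-based dependency map, the function name `affected_items`, and the `net_pay` item are assumptions made for this example; only salaries, social contributions and taxes come from the abstract.

```python
from collections import deque


def affected_items(dependencies, updated):
    """Return every item derived, directly or transitively, from `updated`.

    `dependencies` maps a source item to the items computed from it.
    (Hypothetical representation, chosen only for this sketch.)
    """
    seen = set()
    queue = deque([updated])
    while queue:
        item = queue.popleft()
        for derived in dependencies.get(item, ()):
            if derived not in seen:  # visit each derived item once
                seen.add(derived)
                queue.append(derived)
    return seen


# Hypothetical dependency map: taxes and social contributions are computed
# from salaries (as in the abstract); `net_pay` is an invented extra level.
DEPENDENCIES = {
    "salary": ["social_contribution", "tax"],
    "social_contribution": ["net_pay"],
    "tax": ["net_pay"],
}

# Items to re-check after a retroactive change to a salary.
to_repair = affected_items(DEPENDENCIES, "salary")
```

In this sketch, a retroactive salary change flags the social contribution, the tax and the invented `net_pay` item for repair, whereas a change to `net_pay` flags nothing because no other item is derived from it.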
K-means** - A fast and efficient K-means algorithms
by Cuong Duc Nguyen, Trong Hai Duong
Abstract: K-means often converges to a local optimum. Among improved versions of K-means, k-means++ is well known for achieving a near-optimal solution through its cluster initialization strategy, with high computational efficiency. Incremental K-means is recognized for converging to the empirical global optimum, but it has high complexity because it steps through the number of clusters K. The paper introduces K-means**, which applies a doubling strategy to K. Additional techniques (doubling only sufficiently large clusters, stepping K for the last few values, and searching other candidates for the last K) give K-means** a complexity of O(K log K), which is lower than that of Incremental K-means, while still converging to the empirical global optimum. On a collection of synthetic and real data sets, K-means** achieves the minimum results in almost all test cases. K-means** is much faster than Incremental K-means and comparable in speed to k-means++.
Keywords: Data Clustering; K-means; k-means++; Incremental K-means.
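For readers unfamiliar with the k-means++ initialization that the abstract uses as a baseline, the sketch below shows its seeding step: each new centre is drawn with probability proportional to the squared distance from a point to its nearest already-chosen centre. This is standard k-means++ seeding only, not the K-means** doubling strategy, whose details the abstract does not give. The function names and the fixed random seed are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng=None):
    """Choose k initial centres from `points` by k-means++ (D^2) sampling."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    centres = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen centre.
    d2 = [sq_dist(p, centres[0]) for p in points]
    while len(centres) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centre; fall back to uniform.
            new_centre = rng.choice(points)
        else:
            # Points far from all current centres are more likely to be picked.
            new_centre = rng.choices(points, weights=d2, k=1)[0]
        centres.append(new_centre)
        d2 = [min(old, sq_dist(p, new_centre)) for old, p in zip(d2, points)]
    return centres


points = [(0, 0), (0, 0), (10, 10), (10, 10), (20, 0)]
seeds = kmeans_pp_seeds(points, 3)
```

Because a point that already coincides with a centre has weight zero, the three seeds in this small example land on three distinct locations.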
Specific K-mean clustering-based perceptron for dengue prediction
by Hoang Long Nguyen, Trong Hai Duong, Cuong Phan Nguyen, Duc Cuong Nguyen, Thach Phat Chiem, Manh Hung Nguyen, Thi Nhu Mai Nguyen, Hung Vi Nguyen
Abstract: Traditional neural networks have a drawback related to choosing the number of nodes in each layer. This paper proposes a novel adaptive network fuzzy inference system (ANFIS) to overcome this problem. In particular, we use incremental k-mean to pre-identify the number of nodes in the adaptive network. Each node includes a set of samples from the training set. For each sample, we identify a fuzzy value expressing the degree to which that sample belongs to each node in the network. The perceptron learning algorithm is also investigated as a means of adjusting weights by learning from real output data. In this study, the novel ANFIS model is applied to dengue prediction, and its performance is evaluated on a real dataset of dengue disease in Tien Giang, Vietnam. The results show that our proposed ANFIS model achieves better accuracy than linear regression, multiple linear regression, time series and neural network models.
Keywords: perceptron; adaptive network fuzzy inference system; ANFIS; K-mean clustering; neural network; epidemic prediction.
Semi-active learning to rank algorithms for document retrieval
by Faiza Dammak, Hager Kammoun, Sawssen Ben Hmid, Abdelmajid Ben Hamadou
Abstract: Recently, several search engine applications have been using learning-to-rank technologies to train their ranking models, whose performance is strongly affected by the number of labelled examples in the training set. Since labels are usually scarce and expensive to acquire, active learning and semi-supervised learning technologies aim to reduce the manual labelling workload. In this paper, we propose two alternative inductive learning-to-rank strategies that combine active and semi-supervised learning to assign relevance scores to an unlabelled set of document-query pairs, using selectively sampled and automatically labelled data. These proposals make it possible to exploit all the collected data and to avoid some of the problems caused by employing only active or only semi-supervised learning. We show, using different ranking measures, that the proposed algorithms yield competitive results compared with some other semi-supervised and active ranking algorithms on collections from the standard benchmark LETOR.
Keywords: learning to rank; active learning; semi-supervised learning; supervised learning; document retrieval.
Special Issue on: Model and Data Engineering
Security-aware elasticity for NoSQL databases in multi-cloud environments
by Athanasios Naskos, Anastasios Gounaris, Haralambos Mouratidis, Panagiotis Katsaros
Abstract: We focus on horizontally scaling NoSQL databases in a cloud environment in order to meet performance requirements while respecting security constraints. The performance requirements refer to strict latency limits on the query response time. The security requirements are derived from the need to address two specific kinds of threats that exist in cloud databases, namely data leakage, mainly due to malicious activities of actors hosted on the same physical machine, and data loss after one or more node failures. A key feature of our approach is that we account for multiple cloud providers offering resources with different characteristics. We explain that there is usually a trade-off between performance and security requirements, and we derive a model checking approach to drive runtime decisions that strike a user-defined balance between them, taking into account the infrastructure heterogeneity. Finally, we evaluate our proposal using real traces to demonstrate its effectiveness in configuring the trade-offs.
Keywords: security-aware elasticity; horizontal scaling; multi-clouds.
Intelligent system for cultural objects identification, damage assessment and restoration
by Evangelos Sakkopoulos, Erion-Vasilis Pikoulis, Emmanouil Viennas, Nikolaos Nodarakis, Eleni Cheilakou, Amani-Christiana Saint, Maria Koui, Athanasios Tsakalidis
Abstract: Cultural objects and artworks need ongoing conservation interventions in order to remain available for future generations. The most object-friendly analysis approaches are based on non-destructive techniques (NDTs) that allow both material characterisation and decay detection in cultural artefacts. Non-destructive testing and evaluation employs several methods, such as the well-established technique of diffuse reflectance spectroscopy with fibre optics (FORS). Such techniques produce multiple series of data for the many different pigments used in objects. In this work, we present a data management solution that contributes: 1) a library of known reference pigments/colours; 2) a proposed pattern matching technique that allows the automatic classification of any new pigment. The experimental evaluation results show that the proposed data processing is effective. Feedback is particularly encouraging, as the solution allows automation and therefore radically decreases the time needed for pigment/colour matching and identification.
Keywords: fibre optics diffuse reflectance spectroscopy; FORS; intelligent management systems; non-destructive techniques; NDT; NDT image analysis.
An interoperable open data framework for discovering popular tours based on geo-tagged tweets
by Gloria Bordogna, Alfredo Cuzzocrea, Luca Frigerio, Giuseppe Psaila, Maurizio Toccu
Abstract: In this paper, we introduce an original approach that exploits timestamped geo-tagged messages posted by Twitter users through their smartphones when they travel to trace their trips. A clustering approach is applied to group similar trips to identify tours, and an interoperable framework is used to share the popular tours on the web, in order to analyse them in relation to local geo-located territorial resources. Tools developed to reconstruct and mine the tours of tourists within a region are described, which identify, track, and group the tourists' trips through a knowledge-based approach, exploiting timestamped geo-tagged information associated with Twitter messages sent by tourists while travelling. The collected tracks are managed and shared on the web in compliance with OGC standards, so that the characteristics of the localities visited by the tourists can be analysed by spatially overlaying the tracks with other open geo-spatial data, such as maps of points of interest (POIs) of distinct types. The result is a novel interoperable framework, based on web-service technology.
Keywords: big data analytics; knowledge discovery from geo-located tweets; intelligent systems.
Cloud patterns for mobile collaborative applications
by Nadir Guetmi, Abdessamad Imine
Abstract: Deploying collaborative applications (e.g., group editors) over mobile devices is problematic because these devices will always be resource-poor, with unstable connectivity and constrained energy. To overcome these limitations, one straightforward solution is to leverage mobile collaboration via the cloud. This emerging model relies on virtualisation for efficient and flexible use of hardware assets and software services over a network without requiring user intervention. However, designing collaborative applications with flexibility and reusability has become a hot topic in mobile cloud computing, as no mature models have been proposed yet. In this paper, we describe cloud patterns (i.e., extensions of classic design patterns) focusing on the description of mobile real-time data sharing through the cloud. Our design model consists of two levels. The first level provides a self-protocol to create clones of mobile devices, manage users' groups and recover failed clones in the cloud. The second level supports group collaboration mechanisms for data sharing between mobile users via their clones. Our patterns have been used as a basis for the design of: 1) MidBox, a platform for supporting mobile collaboration over a private cloud; 2) OptiCloud, a cloud service for scalable real-time editing work.
Keywords: collaboration; mobile data sharing; mobile cloud computing; MCC; cloud pattern; cloning middleware; synchronisation.