# Forthcoming articles

International Journal of Computational Science and Engineering

These articles have been peer-reviewed and accepted for publication in IJCSE, but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.


International Journal of Computational Science and Engineering (206 papers in press)

Regular Issues

Cricket chirping algorithm: an efficient meta-heuristic for numerical function optimisation
by Jonti Deuri, Siva Sathya Sundaram
Abstract: Nature-inspired meta-heuristic algorithms have proved to be very powerful in solving complex optimisation problems in recent times. The literature reports several inspirations from nature, exploited to solve computational problems. This paper is yet another step in the journey towards the use of natural phenomena for seeking solutions to complex optimisation problems. In this paper, a new meta-heuristic algorithm based on the chirping behaviour of crickets is formulated to solve optimisation problems. It is validated against various benchmark test functions and then compared with popular state-of-the-art optimisation algorithms, such as genetic algorithm, particle swarm optimisation, bat algorithm, artificial bee colony algorithm and cuckoo search algorithm for performance efficiency. Simulation results show that the proposed algorithm outperforms its counterparts in terms of speed and accuracy. The implication of the results and suggestions for further research are also discussed.
Keywords: optimisation; meta-heuristic algorithm; numerical function; cuckoo search; artificial bee colony; particle swarm optimisation; genetic algorithm; cricket chirping algorithm; calling chirp; aggressive chirp

Optimising the stiffness matrix integration of n-noded 3D finite elements
by J.C. Osorio, M. Cerrolaza, M. Perez
Abstract: The integration of the stiffness and mass matrices in finite element analysis is a time-consuming task. When dealing with large problems having very fine discretisations, the finite element mesh becomes very large and several thousands of elements are usually needed.
Moreover, when dealing with nonlinear dynamic problems, the CPU time required to obtain the solution increases dramatically because of the large number of times the global matrix should be computed and assembled. This is the reason why any reduction in computer time (even a small one) when evaluating the problem matrices is of concern for engineers and analysts. The integration of the stiffness matrix of n-noded high-order hexahedral finite elements is carried out by taking advantage of some mathematical relations found among the nine terms of the nodal stiffness matrix, previously found for the simpler brick element. Significant time savings were obtained in the 20-noded finite element example case.
Keywords: stiffness matrix; finite elements; n-noded hexahedral elements; saving CPU time

A cost-effective graph-based partitioning algorithm for a system of linear equations
by Hiroaki Yui, Satoshi Nishimura
Abstract: There are many techniques for reducing the number of operations in directly solving a system of sparse linear equations. One such method is nested dissection (ND). In numerical analysis, the ND algorithm heuristically divides and conquers a system of linear equations, based on graph partitioning. In this article, we present a new algorithm for the first level of such graph partitioning, which splits a graph into two roughly equal-sized subgraphs. The algorithm runs in almost linear time. We evaluate and discuss the solving costs by applying the proposed algorithm to various matrices.
Keywords: sparse matrix; nested dissection; graph partitioning; graph algorithm; Kruskal’s algorithm; Gaussian elimination; bit vector; adjacent list; refinement; system of equations.

A based-on-set-partitioning exact approach to multi-trip of picking up and delivering customers to airports
by Wei Sun, Yang Yu, Jia Li
Abstract: Picking up and delivering customers to airports (PDCA) is a new service provided in China.
The multi-trip mode of PDCA (MTM-PDCA) service is a promising measure to reduce operation costs. To obtain the exact solution, we propose a novel modelling approach including two stages. In the first stage, all feasible trips of each subset of the customer point set are produced, and then the two local optimum trips of each subset can be obtained easily. Subsequently, using the local optimum trips obtained in the first stage, we establish the novel trip-oriented set-partitioning (TO-SP) model to formulate MTM-PDCA. The MTM-PDCA based on the TO-SP model can be solved exactly by CPLEX. By testing extensive instances, we summarise several managerial insights that can be used to successfully reduce the costs of PDCA by using multi-trip mode.
Keywords: multi-trip; single-trip; set-partitioning; exact approach.

Reliability prediction and QoS selection for web service composition
by Liping Chen, Weitao Ha
Abstract: Web service composition is a distributed model to construct new web services on top of existing primitive or other composite web services. The key issues in the development of web service composition are the dynamic and efficient reliability prediction and the appropriate selection of component services. However, the reliability of the service-oriented systems heavily depends on the remote web services as well as the unpredictable internet. Thus, it is hard to predict the system reliability. In addition, there are many reliable functionally equivalent partner services for the same composite service which have different Quality of Service (QoS). It is important to identify the best QoS candidate web services from a set of functionally equivalent services. But efficient selection from the large numbers of candidate web services brings challenges to the existing methods. In this paper, we discuss web service composition in two ways: reliability prediction and QoS optimal selection. First, we propose a reliability prediction model based on Petri net.
For atomic services, a staged reliability model is provided which predicts reliability from network environment availability, hermit equipment availability, discovery reliability and binding reliability. To address the complex connecting relationship among subservices, places of basic Petri net for input and output are extended to some subtypes for multi-source input place and multiuse output place. Secondly, we use a new skyline algorithm based on an R-tree index. The index tree is traversed to judge whether it is dominated by the candidate skyline sets. The leaf points store optimal component services. Experimental evaluation of real and synthetic data shows the effectiveness and efficiency of the proposed approach. The approach has been implemented and has been used in the context of travel process mining. Although the results are presented in the context of Petri nets, the approach can be applied to any process modelling language with executable semantics.
Keywords: web service composition; atomic services; reliability prediction; QoS; skyline; optimisation

Cost-sensitive ensemble classification algorithm for medical image
by Minghui Zhang, Haiwei Pan, Niu Zhang, Xiaoqin Xie, Zhiqiang Zhang, Xiaoning Feng
Abstract: Medical image classification is an important part of domain-specific application image mining. In this paper, we quantify the domain knowledge about medical images for feature extraction. We propose a cost-sensitive ensemble classification algorithm (CEC), which uses a new training method and adopts a new method to acquire parameters. In the weak classifier training process, we mark the samples that are wrongly classified in the former iteration, use the method of re-sampling in the samples that are correctly classified, and put all the wrongly classified samples into the next training. The classification can pay more attention to those samples that are hard to classify.
The weight parameters of weak classifiers are determined not only by the error rates, but also by their abilities to recognise the positive samples. Experimental results show that our algorithm is more efficient for medical image classification.
Keywords: medical image; domain knowledge; cost-sensitive learning; ensemble classification.

Mining balanced API protocols
by Deng Chen, Yanduo Zhang, Wei Wei, Rongcun Wang, Huabing Zhou, Xun Li, Binbin Qu
Abstract: API protocols can be used in many aspects of software engineering, such as software testing, program validation, and software documentation. Mining API protocols based on probabilistic models has proved to be an effective approach to obtain protocols automatically. However, it always yields unbalanced protocols, that is, protocols described using probabilistic models have unexpected extremely high and low probabilities. In this paper, we discuss the unbalanced probability problem and propose to address it by preprocessing method call sequences used for training. Our method first finds tandem arrays in method call sequences based on the suffix tree. Then, it substitutes each tandem array with a tandem repeat. Since repeated sub method call sequences are eliminated, balanced API protocols may be achieved. In order to investigate the feasibility and effectiveness of our approach, we implemented it in our previous prototype tool ISpecMiner, and used the tool to perform a comparison test based on several real-world applications. Experimental results show that our approach can achieve more balanced API protocols than existing approaches, which is essential for mining valid and precise API protocols.
Keywords: mining API protocol; suffix tree; probability balance; method call sequence; Markov model; tandem array

Advanced DDOS detection and mitigation technique for securing cloud
by Masoumeh Zareapoor, Pourya Shamsolmoali, M. Afshar Alam
Abstract: Distributed Denial of Service (DDoS) attacks have become a serious problem for internet security and cloud computing. This kind of attack is the most complex form of DoS (Denial of Service) attacks. This type of attack can simply duplicate its source address, such as spoofing attack, which disguises the real location of the attack. Therefore, DDoS attack is the most significant challenge for network security. In this paper, we present a model to detect and mitigate DDOS attacks in cloud computing. The proposed model requires very small storage and it has the ability of fast detection. The experiment results show that the system is able to mitigate most of the attacks. Detection accuracy and processing time were the metrics used to evaluate the performance of the proposed model. From the results, it is evident that the system achieves high detection accuracy (97%) with some minor false alarms.
Keywords: distributed denial-of-service; DDOS; information divergence; cloud security; filtering

Global and local optimisation based hybrid approach for cloud service composition
by Jyothi Shetty, Demian Antony D'Mello
Abstract: The goal of service composition is to find the best set of services to meet the user's requirements. The efficient local optimisation methods may fail to satisfy the user's end-to-end requirements. Global optimisation methods are popular when the user's end-to-end requirements are to be satisfied. Optimal composition to end-to-end requirements consumes exponential time, in the case of a large search space. Metaheuristic methods are being used to solve this problem, which give near-optimal solutions. This paper proposes an approach in which both local and global optimisations are used.
In order to avoid local optimums during local optimisation, the proposed work selects a set of best services from each task and then uses a global optimisation method on the smaller search space to select the best composition. In order to reduce the communication costs, the optimal solution identifies the minimum number of clouds for composition.
Keywords: cloud service; service composition; task level selection; global optimisation; local optimisation; exact algorithm.

Homomorphisms between the covering information systems
by Zengtai Gong, Runli Chai, Yongping Guo
Abstract: The system of information is an important mathematical model in many fields, such as data mining, artificial intelligence, and machine learning. The relation or mapping is a popular method for exploring the communication between two systems of information. In this paper, we first introduce the concepts of the covering relation or mapping and the inverse covering relation or mapping between two covering systems of information and investigate their properties. Then, we propose the view of homomorphism of covering systems of information that are based on covering relation. Moreover, we prove that attribute reductions in the image system and original system are equivalent to each other under the conditions of homomorphism given in this paper.
Keywords: covering-based rough sets; homomorphism of information; attribute reductions.

The remote farmland environment monitoring system based on ZigBee sensor network
by Yongfei Ye, Xinghua Sun, Minghe Liu, Zhisheng Zhao, Xiao Zhang, Hongxi Wu
Abstract: In order to change the traditional management of agricultural production, ZigBee technology is used in short distance wireless transmission to design an intelligent farmland environment remote monitoring system, which integrates communication, computing and all aspects of network technology.
The real-time, accurate collection of farmland soil pH value, the temperature and humidity surrounding the plants, illumination intensity and crop chlorophyll content provides reliable data for intelligent agricultural production, thereby increasing the level of intelligence of agricultural management. Based on precision guidance, irrigation will become intelligent, which can avoid the waste of water resources and low use rate caused by free operation. At the same time, it will promote modernisation of agricultural production processes.
Keywords: farmland environment; remote monitoring; ZigBee technology; sensor network; intelligent farmland environment; precise agriculture; agricultural information; data collection; data transmission; real time; agricultural knowledge; computational science; computational engineering.

Optimising order selection algorithm based on online taxi-hailing applications
by Tian Wang, Wenhua Wang, Yongxuan Lai, Diwen Xu, Haixing Miao, Qun Wu
Abstract: Nowadays, with the widespread use of smart devices and networking technologies, the application of taxi-hailing servers is becoming more and more popular in our daily life. However, the drivers' behaviour of robbing orders while driving brings great potential traffic security problems. Considering the characteristics and deficiencies of the mainstream taxi-hailing apps in smart devices, this paper studies the order selection problem from the drivers' end. According to different customers' requirements, an order auto-selection algorithm is proposed. Moreover, it adopts a time buffer mechanism to avoid time conflicts among the orders, and a new concept of 'efficiency value of orders' is proposed to evaluate the profits of orders. This algorithm can auto-select orders for the driver according to their qualities, which can not only improve the safety, but also maximise the drivers' revenue. Extensive simulations validate the performance of the proposed method.
Keywords: taxi-hailing application; order selection algorithm; biggest profit; greedy algorithm; safety; efficiency value of orders.

Towards UNL based machine translation for Moroccan Amazigh language
by Imane Taghbalout, Fadoua Ataa Allah, Mohamed El Marraki
Abstract: Amazigh languages, also called Berber, belong to the Afro-Asiatic language (Hamito-Semitic) family. They are a family of similar and closely related languages and dialects indigenous to North Africa. They are spoken in Morocco, Algeria, and some populations in Libya, Tunisia, northern Mali, western and northern Niger, northern Burkina Faso, Mauritania, and in the Siwa Oasis of Egypt. Large Berber-speaking migrant communities have been living in Western Europe since the 1950s. In this paper, we study the Standard Moroccan Amazigh. It became a constitutionally official language of Morocco in 2011. However, it is still considered a less-resourced language. So, it is time to develop linguistic resources and applications for automatically processing this language, in order to ensure its survival and promotion by integrating it into the new information and communication technologies (NICT). In this context, and with a view to producing a Universal Networking Language (UNL) based machine translation system for this language, we have undertaken the creation of the Amazigh-UNL dictionary, as a first step of linguistic resources development required by the UNL system to achieve translation. Thus, this paper is focused on presenting linguistic features implementation, such as morphological, syntactical and semantic information of the Amazigh languages.
Keywords: Amazigh language; machine translation; Universal Networking Language; Amazigh-UNL dictionary; inflectional paradigm; subcategorisation frame; Universal Word.
DOI: 10.1504/IJCSE.2016.10009693

Population diversity of particle swarm optimisation algorithms for solving multimodal optimisation problems
by Shi Cheng, Junfeng Chen, Quande Qin, Yuhui Shi
Abstract: The aim of multimodal optimisation is to locate multiple peaks/optima in a single run and to maintain these found optima until the end of a run. In this paper, seven variants of particle swarm optimisation (PSO) algorithms, which include PSO with star structure, PSO with ring structure, PSO with four clusters structure, PSO with Von Neumann structure, social-only PSO with star structure, social-only PSO with ring structure, and cognition-only PSO, are used to solve multimodal optimisation problems. The population diversity, or more specifically, the position diversity, is used to measure the candidate solutions during the search process. Our goal is to measure the performance and effectiveness of variants of PSO algorithms and investigate why an algorithm performs effectively from the perspective of population diversity. The experimental tests are conducted on eight benchmark functions. Based on the experimental results, it can be concluded that the PSO with ring structure and social-only PSO with ring structure perform better than the other PSO variants on multimodal optimisation. From the population diversity measurement, it is shown that to obtain good performances on multimodal optimisation problems, an algorithm needs to balance its global search ability and solutions maintenance ability, which means that the population diversity should be converged to a certain level quickly and be kept during the whole search process.
Keywords: swarm intelligence algorithm; multimodal optimisation; particle swarm optimisation; population diversity; nonlinear equation systems.

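The abstract above refers to the position diversity of a swarm without giving a formula. As a rough illustration only, the Python sketch below uses one L1-based definition (the mean absolute deviation of particle positions from the swarm centre, averaged over dimensions). The choice of definition, the function name `position_diversity` and the sample swarm are assumptions made for illustration and are not taken from the article.

```python
def position_diversity(positions):
    """Mean absolute deviation from the swarm centre, averaged over dimensions.

    `positions` is a list of particles, each a list of coordinates.
    This L1-based measure is an assumed definition, not the article's.
    """
    if not positions:
        raise ValueError("positions must contain at least one particle")
    n = len(positions)
    dims = len(positions[0])
    total = 0.0
    for j in range(dims):
        # Centre of the swarm along dimension j.
        mean_j = sum(p[j] for p in positions) / n
        # Average distance of the particles from that centre.
        total += sum(abs(p[j] - mean_j) for p in positions) / n
    return total / dims


# Hypothetical swarm of two particles in two dimensions.
swarm = [[0.0, 0.0], [2.0, 2.0]]
print(position_diversity(swarm))
```

A value near zero means the particles have collapsed onto one point; a value that stays above zero during the run is the kind of maintained diversity the abstract associates with good multimodal performance.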
A pseudo nearest centroid neighbour classifier
by Hongxing Ma, Jianping Gou, Xili Wang
Abstract: In this paper, we propose a new reliable classification approach, called the pseudo nearest centroid neighbour rule, which is based on the pseudo nearest neighbour rule (PNN) and nearest centroid neighbourhood (NCN). In the proposed PNCN, the nearest centroid neighbours rather than nearest neighbours per class are first searched by means of NCN. Then, we calculate k categorical local mean vectors corresponding to k nearest centroid neighbours, and assign the weight to each local mean vector. Using the weighted k local mean vectors for each class, PNCN designs the corresponding pseudo nearest centroid neighbour and decides the class label of the query pattern according to the closest pseudo nearest centroid neighbour among all classes. The classification performance of the proposed PNCN is evaluated on real and artificial datasets in terms of the classification accuracy. The experimental results demonstrate the effectiveness and robustness of PNCN over the competing methods in many practical classification problems.
Keywords: K-nearest neighbour rule; nearest centroid neighborhood; pseudo nearest centroid neighbour rule; local mean vector; pattern classification.

A comparative study of mixed least-squares FEMs for the incompressible Navier-Stokes equations
by Alexander Schwarz, Masoud Nickaeen, Serdar Serdas, Abderrahim Ouazzi, Jörg Schröder, Stefan Turek, Carina Nisters
Abstract: In the present contribution we compare (quantitatively) different mixed least-squares finite element methods (LSFEMs) with respect to computational costs and accuracy. In detail, we consider an approach for Newtonian fluid flows, which are described by the incompressible Navier-Stokes equations. Various first-order systems are derived based on the residual forms of the equilibrium equation and the continuity condition.
From these systems L^2-norm least-squares functionals are constructed, which are the basis for the associated minimisation problems. The first formulation under consideration is a div-grad first-order system resulting in a three-field formulation with total stresses, velocities, and pressure (S-V-P) as unknowns. Here, the variables are approximated in H(div) x H^1 x L^2 on triangles and in H^1 x H^1 x L^2 on quadrilaterals. In addition, a reduced stress-velocity (S-V) formulation is derived and investigated. An advantage of this formulation is a smaller system matrix due to the absence of the pressure degree of freedom, which is eliminated in this approach. S-V-P and S-V formulations are promising approaches when the stresses are of special interest, e.g. for non-Newtonian, multiphase or turbulent flows. Furthermore, since in the total stress approach the pressure is approximated instead of its gradient, the proposed S-V-P formulation could be used in formulations with discontinuous pressure interpolation. For comparison the well-known first-order vorticity-velocity-pressure (V-V-P) formulation is investigated. Here, all unknowns are approximated in H^1 on quadrilaterals. Besides some numerical advantages, e.g. an inherent symmetric structure of the system of equations and a directly available error estimator, it is known that least-squares methods have a drawback concerning mass conservation, especially when lower-order elements are used. Therefore, the main focus of the work is on performance and accuracy aspects, on the one side for finite elements with different interpolation orders and on the other side for the usage of efficient solvers, for instance of Krylov-space or multigrid type. Finally, two well-known benchmark problems are presented and the results are compared for different first-order formulations.
Keywords: least-squares FEM; V-V-P formulation; S-V-P formulation; S-V formulation; Navier-Stokes; multigrid.
DOI: 10.1504/IJCSE.2016.10006921

Enhanced differential evolution with modified parent selection technique for numerical optimisation
by Xiang Li
Abstract: Differential evolution (DE) is considered to be one of the most prominent evolutionary algorithms for numerical optimisation. However, it may suffer from a slow convergence rate, especially in the late stage of the evolution progress. The reason might be that the parents in the mutation operator are randomly selected from the parent population. To remedy this limitation and to enhance the performance of DE, in this paper, a modified parent selection technique is proposed, where the parents in the mutation operator are chosen based on their previous successful experiences. The major advantages of the proposed parent selection technique are its simplicity and generality. It does not destroy the simple structure of DE, and it can be used in most DE variants. To verify the performance of the proposed technique, it is integrated into the classical DE algorithm and three advanced DE variants. Thirteen widely used benchmark functions are used as the test suite. Experimental results indicate that the proposed technique is able to enhance the performance of the classical DE and advanced DE algorithms in terms of both the quality of final solutions and the convergence rate.
Keywords: differential evolution; parent selection; mutation operator; numerical optimisation.

Intelligent selection of parents for mutation in differential evolution
by Meng Zhao, Yiqiao Cai
Abstract: In most DE algorithms, the parents for mutation are randomly selected from the current population. As a result, all vectors involved in mutation are equally selected as parents without any selective pressure. Although such a mutation strategy is easy to use, it is inefficient for solving complex problems.
To address this issue, we present an intelligent parents selection strategy (IPS) for DE. The new algorithmic framework is named DE with IPS-based mutation (IPSDE). In IPSDE, the neighbourhood of each individual is firstly constructed with a population topology. Then, all the neighbours of each individual are partitioned into two groups based on their fitness values, and a probability value for each neighbour being selected as the parents in the respective groups is calculated based on its distance from the current individual. With the probability values, IPS selects the parents from the neighbourhood of the current individual to guide the mutation process of DE. To evaluate the effectiveness of the proposed approach, IPSDE is applied to several original DE algorithms and advanced DE variants. Experimental results have shown that IPSDE is an effective framework to enhance the performance of most DE algorithms studied.
Keywords: differential evolution; mutation operator; neighbourhood information; intelligent parents selection.
DOI: 10.1504/IJCSE.2016.10002299

Modelling method of dynamic business process based on pi-calculus
by Yaya Liu, Jiulei Jiang, Weimin Li
Abstract: The formal modelling of a dynamic business process is to make the collaborative relationship between organisations more detailed and explicit. It is convenient for people to analyse the structure and interaction of cross-organisational business processes, especially dynamic business processes, and assure the optimisation of the system architecture. Based on the channel mobility of pi-calculus, a new modelling method of the dynamic business process is proposed by combining with the extended directed acyclic graph. It is mainly discussed from three aspects: the selection of the interactive paths, the transition of business objects and the validation of accuracy. Meanwhile, a concrete example with multiple roles is presented to assist in the implementation of the method.
It concludes that the method can effectively distinguish the collaborative relationship between organisations, and also be used to build formal models of complicated and dynamic business processes with the mature technology.
Keywords: dynamic business process; cross-organisational business process; channel mobility; pi-calculus; extended directed acyclic graph.

Unsupervised metric learning for person re-identification by image re-ranking
by Dengyi Zhang, Qian Wang, Xiaoping Wu, Yu Cao
Abstract: In a multi-camera video surveillance system with non-overlapping areas, the same person may appear different according to different cameras; also, different people may look the same. This makes person re-identification an important and challenging problem. Most of the current person re-identification methods are based on supervised distance metric learning, which labels the same person from many cameras as positive samples for distance metric learning; this can hardly be done manually for large numbers of cameras. Thus, this paper describes an unsupervised distance metric learning method based on image re-ranking, calculating the original distance matrix for person samples from two cameras using the original distance metric function, and re-ranking the distance matrix by the image re-ranking method to acquire a better distance function, then using it to calculate the new distance rank matrix. This matrix is used to label positive and negative samples automatically, using unsupervised distance metric learning, and thus to acquire a better Mahalanobis distance metric function, without the need to manually label person samples according to different cameras. Experiments were performed on public datasets VIPeR, i-LIDS, GRID and CAVIAR4REID, and the results compared with current distance learning methods.
The results are evaluated by CMC, which indicates this algorithm could overcome the difficulties for labelling large numbers of person samples from cameras in distance metric learning, with a better re-identification rate.
Keywords: video surveillance; non-overlapping area; person re-identification; unsupervised metric learning; image re-ranking.

Discovery of continuous coherent evolution biclusters in time series data
by Meihang Li, Yun Xue, Haolan Zhang, Bo Ma, Jie Luo, WenSheng Chen, Zhengling Liao
Abstract: Most traditional biclustering algorithms focus on the biclustering model of non-continuous columns, which is unsuitable for analysis of time series gene expression data. We propose an effective and exact algorithm that can be used to mine biclusters with coherent evolution on contiguous columns, as well as complementary and time-lagged biclusters in time series gene expression matrices. Experimental results show that the algorithm can detect biclusters with statistical significance and strong biological relevance. The algorithm is also applied to currency data analysis, in which meaningful results are obtained.
Keywords: time series data; bicluster; coherent evolution; complementary; time-lagged.

Empirical rules based views abstraction for distributed model-driven development
by Yucong Duan, Jiaxuan Li, Qiang Duan, Lixin Luo, Liang Huang
Abstract: UML view integration has been extensively studied in the area of model transformation in model-driven engineering. Empirical processing rules are among the most widely employed approaches for processing view abstraction, which can support model simplification, consistency checking, and management complexity reduction. However, empirical rules face some challenges, such as completeness validation, consistency among rules, and composition priority arrangement.
The challenge of rule composition is enlarged in the environment of distributed model-driven development for web service-based systems, where redundant information/data is emphasised. The same redundant information can be expressed in different forms that comprise different topological structures for entity relationship network representing the same part of the system. Such variation will result in choosing different compositions of the rules executed in different orders, which will increase the severity of the current non-determinism from the empirical probability of some rules. In this paper, we investigate the effect of redundancy on rule application through designing a simulated distributed storage for an example diagram model. We propose a formal solution for addressing this challenge through constructing a finite-state automaton to unify empirical abstraction rules while relieving the side effects caused by redundancy. We also show the results obtained from a prototype implementation.
Keywords: UML; model transformation; view abstraction; finite-state automaton.

Populating parameters of web services by automatic composition using search precision and WSDL weight matrix
by Sumathi Pawar, Niranjan Chiplunkar
Abstract: Web service composition is meant for connecting different web services according to the requirement. The absence of public Universal Description, Discovery, and Integration (UDDI) made it difficult to get QoS information of the web services unless checked by execution. This research implements a system for invoking and composing web services according to the user requirements by searching required web services dynamically using the Bingo search engine. The user may not know the value of input parameters of the required web services, and these unknown parameters are populated by composing available web services automatically and dynamically.
The methodology comprises searching for the requested web services according to the functional word, computing the search precision from the support and confidence values of the search results, computing a Web Service Description Language (WSDL) weight matrix to select suitable web services for user satisfaction, and populating unknown input parameter values by composing the web services. Composable web services are found by intra-cluster search and inter-cluster search among different operation elements of community web services. A composition rule is framed for composable web services according to the order of composition. Pre-condition and effect elements are checked before execution of the composition plan. Finally, web services are invoked according to the composition rule. Keywords: service composition; WSDL; match-making algorithm; service discovery; WSDL-S. DOI: 10.1504/IJCSE.2016.10007953  Fast elliptic curve scalar multiplication for resisting against SPA   by Shuanggen Liu Abstract: This paper analyses the computation of the Symbolic Ternary Form (STF) elliptic curve scalar multiplication algorithm and the binary scalar multiplication algorithm. Compared with the binary scalar multiplication algorithm, the efficiency of the STF scalar multiplication algorithm is increased by 5.9% on average and has a corresponding advantage. For this reason, we improve the structure of the STF scalar multiplication algorithm and make the performance more "smooth" by constructing an indistinguishable operation between point addition (A) and point tripling (T), and thus resist simple power analysis (SPA) attacks. At the same time, we propose the Highest-weight Symbolic Ternary Form (HSTF), which transforms a scalar k into the highest-weight form. Thus, every cycle has a fixed pattern to resist SPA attacks. Compared with the binary scalar multiplication algorithm with anti-SPA protection, the average efficiency is enhanced by 17.7%. 
Keywords: elliptic curve scalar multiplication; simple power analysis; highest-weight symbolic ternary form. DOI: 10.1504/IJCSE.2016.10008630  Predicting rainfall using neural nets   by Kyaw Kyaw Htike Abstract: Successfully predicting rainfall is one of the most crucial factors in strategic decision-making and planning in countries that rely on agriculture. Despite its clear importance, forecasting rainfall remains a big challenge owing to the highly dynamic nature of the climate process and its associated seemingly random fluctuations. A wide variety of models have been proposed to predict rainfall, among which statistical models have been among the most successful. In this paper, we propose a novel rainfall forecasting model using Focused Time-Delay Neural Networks (FTDNNs). In addition, we compare rainfall forecasting performances, using FTDNNs, for different prediction time scales, namely: monthly, quarterly, bi-annually and yearly. We present the optimal neural network architecture parameters automatically found for each of the aforementioned time scales. Our models are trained to perform one-step-ahead predictions and we demonstrate and evaluate our results, measured by mean absolute percentage error, on the rainfall dataset obtained from Malaysian Meteorological Department (MMD) for close to a thirty year period. For test data, we found that the most accurate result was obtained by our method on the yearly rainfall dataset (94.25%). For future work, dynamic meteorological parameters such as sunshine data, air pressure, cloudiness, relative humidity and wet bulb temperature can be integrated as additional features into the model for even higher prediction performance. Keywords: rainfall prediction; forecasting; statistical prediction models; artificial neural networks, focused time-delay networks. 
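As an illustration of the evaluation set-up mentioned in the rainfall abstract above, the following minimal Python sketch shows how a series can be cut into time-delay windows for one-step-ahead prediction and how mean absolute percentage error (MAPE) is computed. The function names, the window length and the sample numbers are hypothetical; this is not the author's implementation, and no neural network is trained here.

```python
def make_delay_windows(series, delay):
    """Pair each run of `delay` consecutive values with the value that follows it.

    This mirrors the tapped-delay input of a time-delay network: the model
    sees the last `delay` observations and predicts the next one.
    """
    if delay < 1 or delay >= len(series):
        raise ValueError("delay must be in [1, len(series) - 1]")
    count = len(series) - delay
    inputs = [series[i:i + delay] for i in range(count)]
    targets = [series[i + delay] for i in range(count)]
    return inputs, targets


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical yearly rainfall totals (mm), for illustration only.
    rainfall = [2400.0, 2550.0, 2300.0, 2700.0, 2600.0]
    windows, next_values = make_delay_windows(rainfall, delay=2)
    print(windows, next_values)
    print(mape([100.0, 200.0], [90.0, 220.0]))
```

A lower MAPE indicates a better forecast; how the percentage reported in the abstract relates to MAPE is not specified there.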
Overview of information visualisation in science education   by Chun Hua Wang, Dong Han, Wen-Kuang Chou Abstract: Developed as computer-assisted instruction, visual education is a new teaching method based on computer-aided visual design aimed at education. Based on an overview of previous studies, this paper expounds the main features of education visualisation, outlines the theoretical basis of education visualisation, summarises the empirical study of science education visualisation, and refines the application scenarios and points requiring attention in science education visualisation, using static and dynamic visualisation as the basis for classification. The paper concludes that the success of education visualisation depends on the students' knowledge background, visual perception and comprehension ability. Therefore, the design of education visualisation must ensure that the objects and contents of visualisation can adapt to the specific conditions and instructional objectives. Keywords: science education visualisation; static visualisation; dynamic visualisation. DOI: 10.1504/IJCSE.2016.10005643  An automation approach for architecture discovery in software design using genetic algorithm   by Sushama C, A Rama Mohan Reddy Abstract: Software architectures are treated as valuable artifacts in software engineering. The functionality of the software is dependent on the software architectures. The software architectures provide high-level analysis whenever the architects need to analyse the dynamic structure of the design. The modifications to the designs are made manually; it is a very complicated process and sometimes it will not solve the problem completely. This paper presents a genetic algorithm for discovery of underlying architectures of software design. The genetic algorithm is carried out with different modules like encoding, fitness function, and mutation. 
The algorithm was tested on real-time projects and the complete experimental study is reported. Keywords: genetic algorithm, components, interactions, relations, search-based software engineering. A modified electromagnetism-like mechanism algorithm with pattern search for global optimization   by Qing Wu, Chunjiang Zhang, Liang Gao Abstract: The solution space of most global optimisation problems is very complex, which places high demands on the search performance of algorithms. The electromagnetism-like mechanism (EM) algorithm is an emerging global optimisation method. However, the intensification and the diversification of the original EM are not very efficient. This paper proposes a modified EM algorithm. To improve the intensification ability, a more effective variable step-size pattern search has been applied to replace the original random line search at the local search stage. Meanwhile, a perturbing point is used to increase the diversity. In addition, the formula for calculating the total force is simplified to accelerate the algorithm's search process. Numerical experiments are conducted to compare the proposed algorithm with other variants of EM algorithms and different variants of particle swarm optimisation algorithms. The results show that the approach is competitive. Keywords: electromagnetism-like mechanism algorithm; pattern search; global optimisation; meta-heuristic algorithm; local search. DOI: 10.1504/IJCSE.2016.10009976  Various GPU memory utilisation exploration for large RDF search   by Chantana Chantrapornchai Abstract: Graphic Processing Units (GPUs) are important accelerators in desktop computers nowadays. There are thousands of processing units that can simultaneously run the program and there are various memory types, with different sizes and access times, which are connected in a hierarchy. 
However, the GPUs have a much smaller internal memory size than a typical computer, which can be an obstacle to performing big data processing. In this paper, we study the use of various memory types (global, texture, constant, and shared memories) in simultaneously searching, on GPUs, large Resource Description Framework (RDF) data, which are commonly used on the internet to link WWW data. Using suitable memory types and properly managing the data transfer can lead to a better performance when processing such data. The results show that the parallel search in 45-Gigabyte RDF data on multiple GPUs, using the global memory to store large texts and the shared memory to store multiple keywords, can run about 14 times faster than the sequential search on a low-cost desktop. Keywords: graphic processing units; large RDF; parallel string search Hough transform-based cubic spline recognition for natural shapes   by Cheng-Huang Tung, Wei-Jyun Syu, Wei-Cheng Huang Abstract: A two-stage GHT-based cubic spline recognition method is proposed for recognising flexible natural shapes. First, the proposed method uses cubic splines to interpolate a flexible natural shape, and a sequence of connected boundary points is generated from the cubic splines. Each such point has accurate tangent and curvature features. At the first recognition stage, the proposed method uses the modified GHT to adjust the scale and orientation factors of the input shape with respect to each reference model. At the second recognition stage, the proposed point-based matching technique calculates the difference between each specific reference model and its corresponding adjusted input shape at the point level. In experiments on recognising 15 categories of natural shapes, including fruits and vegetables, the recognition rate of the proposed two-stage method is 97.3%, much higher than the 79.3% measured for the standard GHT. 
Keywords: Hough transform, GHT, cubic spline, natural shape, curvature, tangent, point-based matching, recognition method, template database, boundary point. Personalised service recommendation process based on service clustering   by Xiaona Xia Abstract: Personalised service recommendation is the key technology for service platforms, and the demand preferences of users are the important factors for personalised recommendation. First, in order to improve the accuracy and adaptability of service recommendation, services are needed to be initialised before being recommended and selected, then they are classified and clustered according to demand preferences, and service clusters are defined and demonstrated. For sparse problems of the service function matrix, historical and potential preferences are expressed as double matrices. Second, a service cluster is viewed as the basic business unit, and we optimise the graph summarisation algorithm and construct service recommendation algorithm SCRP. Helped by the experiments about variety parameters, SCRP has more advantages than other algorithms. Third, we select fuzzy degree and difference to be the two key indicators, and use some service clusters to complete the simulation and analyse the algorithm performance. The results show that our service selection and recommendation method is better than others, which might effectively improve the quality of service recommendation. Keywords: service clustering; service recommendation; graph summarisation algorithm; personalisation; preference matrix Power-aware high level evaluation model of interconnect length of on-chip memory network topology   by XiaoJun Wang, Feng Shi, Yi-Zhuo Wang, Hong Zhang, Xu Chen, Wen-Fei Fu Abstract: Interconnect power is the factor that dominates the power consumption on the on-chip memory architecture. 
Almost all dedicated wires and buses are replaced with packet switching interconnection networks which have become the standard approach to on-chip interconnection. Unfortunately, rapid advances in technology are making it more difficult to assess the interconnect power consumption of NoC. To resolve this problem, a new evaluating methodology Interconnect Power Evaluation based on Topology of On-chip Memory (IP-ETOM) is proposed in this paper. To validate this method, two multicore architectures 2D-Mesh and Triplet based Architecture (TriBA) are evaluated in this research work. The on-chip memory network model is evaluated based on characteristics of on-chip architecture interconnection. Matlab is used for conducting the experiment that evaluates the interconnection power of TriBA and 2D-Mesh. Keywords: power evaluation; on-chip memory network topology; NoC interconnects; IPETOM Optimising data access latencies of virtual machine placement based on greedy algorithm in datacentre   by Xinyan Zhang, Keqiu Li, Yong Zhang Abstract: The total completion time of a task is also the major bottleneck in the big data processing applications based on parallel computation, since the computation and data are distributed on more and more nodes. Therefore, the total completion time of a task is an important index to evaluate the cloud performance. The access latency between the nodes is one of the key factors affecting task completion time for cloud datacentre applications. Additionally, minimising total access time can reduce the overall bandwidth cost of running the job. This paper proposes an optimisation model focused on optimising the placement of virtual machines (VM) so as to minimise the total data access latency where the datasets have been located. According to the proposed model, our optimising VMs problem is linear programming. Therefore, we obtain the optimum solution of our model by the branch-and-bound algorithm that its time complexity is O(2^{NM}). 
Simultaneously, we also present a greedy algorithm, which has O(NM) of time complexity, to solve our model. Finally, the simulation results show that all of the solutions of our model are superior to existing models and close to the optimal value. Keywords: datacentre; cloud environment; access latency; virtual machine placement; greedy algorithm An empirical study of disclosure effects in listed biotechnology and medicine industry using MLR model   by Chiung-Lin Chiu, You-Shyang Chen Abstract: This research employs the multiple linear regression model to investigate the relationship between voluntary disclosure and firm performance in biotechnology and medicine industry in Taiwan. Using 44 firm-year observations collected from Information Transparency and Disclosure Ranking System and Taiwan Economic Journal financial database for companies listed in the Taiwan Stock Exchange and Taipei Exchange Market, the regression results reveal that there is a positive and significant relationship between voluntary disclosure and firm performance. Firms with better voluntary disclosure have better performance than do firms without voluntary disclosure. The results suggest that companies should pay more attention to voluntary disclosure as additional information. It is also considered by investors as valuable information when making their investment decision. Keywords: voluntary disclosure; firm performance; investment decision; MLR; multiple linear regression model, biotechnology and medicine industry; TSE; Taiwan Stock Exchange; ITDRS; information transparency and disclosure ranking system A static analytical performance model for GPU kernel   by Jinjing Li Abstract: Graphics processing units (GPUs) have shown increased popularity and play an important role as a kind of coprocessor in heterogeneous co-processing environments. Heavily data parallel problems can be solved efficiently by tens of thousands of threads collaboratively working in parallel in GPU architecture. 
The achieved performance, therefore, depends on the capability of multiple threads in parallel collaboration, the effectiveness of latency hiding, and the use of multiprocessors. In this paper, a static analytical kernel performance model (SAKP) is proposed, based on this performance principle, to estimate the execution time of the GPU kernel. Specifically, a set of kernel and device features for the target GPU is generated in the proposed model. We determine the performance-limiting factors and generate an estimation of the kernel execution time with this model. Matrix Multiplication (MM) and Histogram Generation (HG) were run on an NVIDIA GTX680 GPU card to verify our proposed model, and showed an absolute error in prediction of less than 6.8%. Keywords: GPU; co-processing; static analytical kernel performance model; kernel and device features; absolute error. Syntactic parsing of clause constituents for statistical machine translation   by Jianjun Ma, Jiahuan Pei, Degen Huang, Dingxin Song Abstract: The clause is considered to be the basic unit of grammar in linguistics, which is a structure between a chunk and a sentence. Clause constituents, therefore, are an important kind of linguistically valid syntactic phrase. This paper adopts the CRFs model to recognise English clause constituents with their syntactic functions, and tests their effect on machine translation by applying this syntactic information to an English-Chinese PBSMT system, evaluated on a business-domain corpus. Clause constituents are mainly classified into six kinds: subject, predicator, complement, adjunct, residues of predicator, and residues of complement. Results show that our rich-feature CRFs model achieves an F-measure of 93.31%, a precision of 93.26%, and a recall of 93.04%. This syntactic knowledge in the source language is further combined with the NiuTrans phrasal SMT system, which slightly improves the English-Chinese translation accuracy. 
Keywords: syntactic parsing; clause constituents; PBSMT. DOI: 10.1504/IJCSE.2016.10004598  A universal compression strategy using sorting transformation   by Bo Liu, Xi Huang, Xiaoguang Liu, Gang Wang, Ming Xu Abstract: Although traditional universal compression algorithms can effectively use repetition located within a sliding window, they cannot take advantage of message sources in which similar messages are distributed uniformly. In this paper, we propose a universal segmenting-sorting compression algorithm to solve this problem. The key idea is to reorder the message source before compressing it with the Lz77 algorithm. We design transformation methods for two common data types, corpus of webpages and access log. The experimental results show that segmenting-sorting transformation is truly beneficial to the compression ratio. Our new algorithm is able to make the compression ratio 20% to 50% lower than the naive Lz77 algorithm does and takes almost the same decompression time. For some read-heavy sources, segmenting-sorting compression can reduce space cost while guaranteeing throughput. Keywords: segmenting; sorting; Lz77; compression; universal compression method. Executing time and cost-aware task scheduling in hybrid cloud using a modified DE algorithm   by Yuanyuan Fan, Qingzhong Liang, Yunsong Chen Abstract: Task scheduling is one of the basic problems in cloud computing. In a hybrid cloud, task scheduling faces new challenges. In this paper, we propose a GaDE algorithm, based on a differential evolution algorithm, to improve the single objective scheduling performance of a hybrid cloud. In order to better deal with the multi-objective task scheduling optimisation in hybrid clouds, on the basis of the GaDE and Pareto optimum of the quick sorting method, we present a multi-objective algorithm, named NSjDE. This algorithm also reduces the frequency of evaluation. 
Experiments comparing the Min-Min, GaDE and NSjDE algorithms show that, for single-objective task scheduling, the GaDE and NSjDE algorithms perform better in finding an approximately optimal solution. The optimisation speed of the multi-objective NSjDE algorithm is faster than the single-objective jDE algorithm, and NSjDE can produce more than one non-dominated solution meeting the requirements, in order to provide more options to the user. Keywords: hybrid cloud; task scheduling; executing time-aware; cost-aware A dynamic cold-start recommendation method based on incremental graph pattern matching   by Yanan Zhang, Guisheng Yin, Deyun Chen Abstract: In order to give accurate recommendations for a cold-start user who has few records, researchers find similar users for a cold-start user according to social networks. However, these efforts assume that the cold-start user's social relationships are static and ignore the fact that updating social relationships in large-scale social networks is time consuming. In social networks, cold-start users and other users may change their social relationships as time goes by. In order to give accurate and timely recommendations for cold-start users, it is necessary to continuously update the users similar to the cold-start user according to his latest social relationships. In this paper, an incremental graph pattern matching based dynamic cold-start recommendation method (IGPMDCR) is proposed, which updates similar users for a cold-start user based on the topology of social networks, and gives recommendations according to the latest similar users. The experimental results show that IGPMDCR could give accurate and timely recommendations for cold-start users. 
Keywords: dynamic cold-start recommendation; social network; incremental graph pattern matching; topology of social network. DOI: 10.1504/IJCSE.2016.10006198  Modelling and simulation research of vehicle engines based on computational intelligence methods   by Ling-ge Sui, Lan Huang Abstract: We assess the feasibility of two kinds of widely used artificial neural network (ANN) technologies applied in the field of transient emission simulation. In this work, the back-propagation feedforward neural network (BPNN) is shown to be more suitable than the radial basis function neural network (RBFNN). Considering the transient change rule of a transient operation, the composite transient rate, which is composed of the torque transient rate and the air-fuel ratio (AFR) transient rate, is innovatively adopted as an input variable to the BPNN transient emission model. Thus, a whole process transient simulation platform based on the multi-soft coupling technology of a test diesel engine is established. Through a transient emission simulation, the veracity and generalisation ability of the simulation platform is confirmed. The simulation platform can correctly predict the change trends and establish a peak value difference within 8%. Our findings suggest that the simulation platform can be applied to a control strategies study of typical transient operations. Keywords: transient emission; simulation; back-propagation feedforward neural network; radial basis function neural network; diesel engine. DOI: 10.1504/IJCSE.2018.10006094  Institution-based UML activity diagram transformation with semantic preservation   by Amine Achouri, Yousra Bendaly Hlaoui, Leila Jemni Ben Ayed Abstract: This paper presents a specific tool, called MAV-UML-AD, allowing the specification and the verification of workflow models using UML Activity Diagrams (UML AD) and Event-B, based on institutions. 
The developed tool translates an activity diagram model into an equivalent Event-B specification according to a mathematical semantics. The transformation approach of UML AD models is based on the theory of institutions. In fact, each of UML AD and Event-B specification is defined by an instance of its corresponding institution. The transformation approach is represented by an institution co-morphism, which is defined between the two institutions. Institution theory is adopted as the theoretical framework of the tool essentially for two reasons. First, it gives a locally mathematical semantics for UML AD and Event-B. Second, it allows a semantics-preserving mapping to be defined between a UML AD specification and an Event-B machine. Thanks to the B theorem prover, functional properties such as liveness and fairness can be formally checked. This paper highlights the core of the model transformation approach and shows how institution concepts such as category, co-morphism and signature are presented in the two formalisms used. This paper will also illustrate the use of the developed tool MAV-UML-AD through an example of specification and verification. Keywords: formal semantics; model-driven engineering; institution theory; Event-B; UML activity diagram; formal verification The analysis of evolutionary optimisation on the TSP(1,2) problem   by Xiaoyun Xia, Xinsheng Lai, Chenfu Yi Abstract: The TSP(1,2) problem is a special case of the travelling salesperson problem, which is NP-hard. Many heuristics including evolutionary algorithms (EAs) are proposed to solve the TSP(1,2) problem. However, we know little about the performance of the EAs for the TSP(1,2) problem. This paper presents an approximation analysis of the (1+1) EA on this problem. It is shown that both the (1+1) EA and $(\mu+\lambda)$ EA can obtain a $3/2$ approximation ratio for this problem in expected polynomial runtime $O(n^3)$ and $O\left((\frac{\mu}{\lambda})n^3+n\right)$, respectively. 
Furthermore, we prove that the (1+1) EA can provide a much tighter upper bound than a simple ACO on the TSP(1,2) problem. Keywords: evolutionary algorithms; TSP(1,2); approximation performance; analysis of algorithm; computational complexity. DOI: 10.1504/IJCSE.2016.10007955  A novel rural microcredit decision model and solving via binary differential evolution algorithm   by Dazhi Jiang, Jiali Lin, Kangshun Li Abstract: Generally, as an economic means of lifting people out of poverty, microcredit has been accepted as an effective method for empowering both individuals and communities. However, risk control is still a core part of the implementation of agriculture-related loans business for microcredit companies. In this paper, a rural microcredit decision model is presented based on maximising the profit while minimising the risk. Then, a binary differential evolution algorithm is applied to solve the decision model. The result shows that the proposed method and model are scientific and easy to operate, and can also provide a reference solution for decision management in microcredit companies. Keywords: risk control; microcredit; decision model; binary differential evolution Q-grams-imp: an improved q-grams algorithm aimed at edit similarity join   by Zhaobin Liu, Yunxia Liu Abstract: Similarity join is increasingly important and has attracted widespread attention from scholars and communities. It has been used in many applications, such as spell checking, copy detection, entity linking, pattern recognition and so on. In many web and enterprise scenarios, where typos and misspellings often occur, an efficient algorithm is needed to handle these situations. In this paper, we propose an improved algorithm on q-grams called q-grams-imp that is aimed at solving edit similarity join. 
We use this algorithm in order to reduce the number of tokens and thus reduce space costs; it is best suited to strings of the same size. Strings of different sizes need to be pre-processed in order to fit the algorithm. Finally, the results show that our proposed algorithm is better than the traditional method. Keywords: similarity join; q-grams algorithm; edit distance. DOI: 10.1504/IJCSE.2016.10008631  An algorithm based on differential evolution for satellite data Transmission Scheduling   by Qingzhong Liang, Yuanyuan Fan, Xuesong Yan, Ye Yan Abstract: Data transmission task scheduling is one of the important problems in satellite communication. It can be considered as a combinatorial optimisation problem among satellite data transmission demand, visible time window and ground station resource, which is an NP-complete problem. In this paper, we propose a satellite data transmission task scheduling algorithm that searches for an optimised solution based on a differential evolution algorithm framework. During evolution, the procedure for evaluating individuals is improved by a modified 0/1 knapsack based method. Extensive experiments are conducted to examine the effectiveness and performance of the proposed scheduling algorithm. Experimental results show that the scheduling results generated from the algorithm satisfy scheduling constraints and are consistent with the expectation. Keywords: data transmission; task scheduling; differential evolution; knapsack problem Dynamic load balance strategy for parallel rendering based on deferred shading   by Mingqiang Yin, Dan Sun, Hui Sun Abstract: To solve the problem of low efficiency in rendering of large scenes with a complex illumination model, a new deferred shading method is proposed, which is applied to the parallel rendering system. 
In order to make the rendering times of slave nodes in the parallel rendering system equal to each other, an algorithm for rendering task assignment is designed. For the deferred shading method, the process of rendering every frame is divided into two phases. The first, called the geometrical process, is responsible for visibility detection. In this phase, the primitives are distributed to each rendering node evenly and are rendered without illumination. The pixels which should be shaded and their corresponding primitives are found. The second, called pixel shading, is responsible for colouring the pixels which have been found in the first phase. The pixels are assigned to the rendering nodes evenly according to the image of the last frame. As the rendering tasks in the two phases are assigned evenly, the rendering times of the nodes in the cluster system are roughly equal to each other. Experiments show that this method can improve the rendering efficiency of the parallel rendering system. Keywords: parallel rendering; deferred shading; load balance. Big data automatic analysis system and its applications in rockburst experiment   by Yu Zhang Abstract: In 2006, the State Key Laboratory for GeoMechanics and Deep Underground Engineering, GDLab for short, successfully reconstructed the rockburst procedure indoors. Since then, a series of valuable research results has been gained in the area of rockburst mechanism. At the same time, there are some dilemmas, such as the data storage dilemma, the data analysis dilemma and the prediction accuracy dilemma. GDLab has accumulated more than 500 TB of data from rockburst experiments. But so far, the amount of analysed data is less than 5%. The primary cause of these dilemmas is that the study of rockburst produces a large amount of experimental data. In this paper, a novel big data automatic analysis system for rockburst experiment is proposed. Various modules and algorithms were designed and realised. 
Theoretical analysis and experimental research show that the system can improve the existing research mechanism of rockburst. It also can make many impossible things become possible. The work of this paper has laid a theoretical foundation for rockburst mechanism research. Keywords: rock burst; experiment data; big data; automatic analysis Training auto-encoders effectively via eliminating task-irrelevant input variables   by Hui Shen, Dehua Li, Zhaoxiang Zang, Hong Wu Abstract: Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalisation performance of the network. In this paper, via dropping the task-irrelevant input variables the performance of auto-encoders can be obviously improved. Specifically, an importance-based variable selection method is proposed to aim at finding the task-irrelevant input variables and dropping them. The paper first estimates the importance of each variable, and then drops the variables with importance value lower than a threshold. In order to obtain better performance, the method can be employed for each layer of stacked auto-encoders. Experimental results show that when combined with our method the stacked denoising auto-encoders achieve significantly improved performance on three challenging datasets. Keywords: feature learning; deep learning; neural network; auto-encoder; stacked auto-encoders; variable selection; feature selection; unsupervised training Model-checking software product lines based on feature slicing   by Mingyu Huang, Yumei Liu Abstract: Feature model is a popular formalism for describing the commonality and variability of software product line in terms of features. Feature models symbolise a presentation of the possible application configuration space, and can be customised based on specific domain requirements and stakeholder goals. 
As feature models are becoming increasingly complex, it is desired to provide automatic support for customised analysis and verification based on the specific goals and requirements of stakeholders. This paper first presents feature model slicing based on the requirements of the users. We then introduce three-valued abstraction of behaviour models based on the slicing unit. Finally, based on a multi-valued model checker, a case study was conducted to illustrate the effectiveness of our approach. Keywords: feature model; slicing; three-valued model; model checking Decomposition-based multi-objective comprehensive learning particle swarm optimisation   by Xiang Yu, Hui Wang, Hui Sun Abstract: This paper proposes decomposition-based comprehensive learning particle swarm optimisation (DCLPSO) for multi-objective optimisation. DCLPSO uses multiple swarms, with each swarm optimising a separate objective. Two sequential phases are conducted: independent search and then cooperative search. Important information related to extreme points of the Pareto front often can be found in the independent search phase. In the cooperative search phase, a particle randomly learns from its personal best position or an elitist on each dimension. Elitists are non-dominated solutions and are stored in an external repository shared by all the swarms. Mutation is applied to each elitist in this phase to help escaping from local Pareto fronts. Experiments conducted on various benchmark problems demonstrate that DCLPSO is competitive in terms of convergence and diversity of the resulting non-dominated solutions. Keywords: particle swarm optimisation; comprehensive learning; decomposition; multi-objective optimisation. Applicability evaluation of different algorithms for daily reference evapotranspiration model in KBE system   by Yubin Zhang, Zhengying Wei, Lei Zhang, Jun Du Abstract: An irrigation decision-making system based on Knowledge-based Engineering (KBE) is reported in this paper. 
It can accurately predict water and fertiliser requirements and achieve intelligent irrigation diagnosis and decision support. However, the basis of the KBE was knowledge of reference crop evapotranspiration (ET0). Therefore, the research examined the accuracy of support vector machines (SVMs) in modelling ET0. The main obstacles to computing ET0 with the Penman-Monteith model were the complicated nonlinear process and the many climate variables required; furthermore, these were calculated based on the original meteorological data, and the calculation standard was not unique. Thus, the SVM models are applied with the original or limited data, especially in developing countries. The flexibility of the SVMs in ET0 modelling was assessed using the original meteorological data (Tmax, Tm, Tmin, n, Uh, RHm, φ, Z) of the years 1990-2014 in five stations of Shaanxi, China. Those eight parameters were used as the input, while the reference evapotranspiration values were the output. In the first part of the study, the SVMs were compared with the FAO-24, Hargreaves, McCloud, Priestley-Taylor and Makkink models. The comparison results indicated that the SVMs performed better than the other models. In the second part, the total ET0 estimation of the SVMs was compared with the other models in the validation. It was found that the SVM models were superior to the others in terms of relative error. A further assessment of the SVMs was conducted, and confirmed that the models could provide a powerful tool in KBE irrigation with a lack of meteorological data. This research could provide a reference for accurate ET0 estimation for decision-making in KBE irrigation systems based on collecting data from humidity sensors and weather stations in the field. Keywords: reference evapotranspiration; support vector machines; knowledge-based engineering; original meteorological data. 
Multi hidden layer extreme learning machine optimised with batch intrinsic plasticity   by Shan Pang, Xinyi Yang Abstract: Extreme learning machine (ELM) is a novel learning algorithm where the training is restricted to the output weights to achieve a fast learning speed. However, ELM tends to require more neurons in the hidden layer and sometimes leads to ill-condition problem owing to random selection of input weights and hidden biases. To address these problems, we propose a multi hidden layer ELM optimised with batch intrinsic plasticity (BIP) scheme. The proposed algorithm has a deep structure and thus learns features more efficiently. The combination with the BIP scheme helps to achieve better generalisation ability. Comparisons with some state-of-the-art ELM algorithms on both regression and classification problems have verified the performance and effectiveness of our proposed algorithm. Keywords: neural network; extreme learning machine; batch intrinsic plasticity; multi hidden layers. Chaotic artificial bee colony with elite opposition-based learning strategy   by Zhaolu Guo, Jinxiao Shi, Xiaofeng Xiong, Xiaoyun Xia, Xiaosheng Liu Abstract: Artificial bee colony (ABC) algorithm is a promising evolutionary algorithm inspired by the foraging behaviour of honey bee swarms, which has obtained satisfactory solutions in diverse applications. However, the basic ABC demonstrates insufficient exploitation capability in some cases. To address this issue, a chaotic artificial bee colony with elite opposition-based learning strategy (CEOABC) is proposed in this paper. During the search process, CEOABC employs the chaotic local search to promote the exploitation ability. Moreover, the elite opposition-based learning strategy is used to exploit the potential information of the exhausted solution. Experimental results compared with several ABC variants show that CEOABC is a competitive approach for global optimisation. 
Keywords: artificial bee colony; chaotic local search; opposition-based learning; elite strategy. Numerical simulations of electromagnetic wave logging instrument response based on self-adaptive hp finite element method   by L.I. Hui, Zhu Xifang, Liu Changbo Abstract: Numerical simulation of instrument response is an important method to calibrate instrument parameters, evaluate detection performance, and verify complex system theory. Measurement results of electrical well logging are important for the interpretation of measurement data and characterisation of oil reservoirs, especially in horizontal directional drilling and shale gas and oil development. In this paper, a self-adaptive hp finite element method has been used to investigate the electrical well logging instrument responses, such as the electromagnetic wave resistivity logging-while-drilling (LWD) tool and the through-casing resistivity logging (TCRL) tool. Measurement results illustrate the efficiency of the methods, and provide physical interpretation of resistivity measurements obtained with the LWD and TCRL tools. Numerical simulation examples are provided to show the validity, accuracy, and efficiency of the self-adaptive hp finite element method. The high accuracy simulation results have great importance for electrical well logging tools calibration and logging data interpretation. Keywords: numerical simulation; parameters calibration; electromagnetic wave resistivity logging-while-drilling; through-casing resistivity logging; self-adaptive hp finite element method. Upgrading event and pattern detection to big data   by Soumaya Cherichi, Rim Faiz Abstract: One of the marvels of our time is the unprecedented development and use of technologies that support social interaction. 
Social mediating technologies have engendered radically new ways of information and communication, particularly during events; in cases of natural disaster, such as earthquakes and tsunami, and the American presidential election. This paper is based on data obtained from Twitter because of its popularity and sheer data volume. This content can be combined and processed to detect events, entities and popular moods to feed various new large-scale data-analysis applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. Taking into account all the difficulties, we propose a new event detection approach combining linguistic features and Twitter features. Finally, we present our event detection system from microblogs that aims (1) to detect new events, (2) to recognise temporal markers pattern of an event, and (3) to classify important events according to thematic pertinence, author pertinence and tweet volume. Keywords: microblogs; event detection; temporal markers; patterns; social network analysis. A security ensemble framework for securing a file in cloud computing environments   by Sharon Moses J, Nirmala M Abstract: Scalability and on-demand features of cloud computing have revolutionised the IT industry. Cloud computing provides flexibility to the user in several aspects, including pay as you use. The entire burdens of computing, managing resources and file storage are moved to the cloud service provider end. File storage in clouds is an important issue for both service providers and the end users. Securing the file stored in cloud storage from internal and external attacks has become a primary concern for cloud storage providers. Owing to the accumulation of enormous amounts of personal and confidential information in cloud storage, it draws hackers and data-pirates to steal the information at any cost. 
Once a file gets stored in cloud storage, the user has no authority over the file as well as any knowledge of its physical location. In this paper, the threats involved in file storage and a secure way of protecting the stored files using a novel ensemble of security strategies is presented. An encryption ensemble module is incorporated over an OpenStack cloud infrastructure for protecting the file. Five symmetric block ciphers are used in the encryption module to encrypt and decrypt the file without disturbing existing security measures provided to a file. This proposed strategy helps service providers as well as users to secure the file in cloud storage more efficiently. Keywords: Cloud Storage; File Privacy; File Security; Swift storage; OpenStack security; Security ensemble. Virtual guitar: using real-time finger tracking for musical instruments   by Noorkholis Luthfil Hakim, Shih-Wei Sun, Mu-Hsen Hsu, Timothy K. Shih, Shih-Jung Wu Abstract: Kinect, a 3D sensing device from Microsoft, invokes the Human Computer Interaction (HCI) research evolution. Kinect has been implemented in many areas, including music. One implementation was in a Virtual Musical Instrument (VMI) system, which uses natural gestures to produce synthetic sounds similar to a real musical instrument. From related work, we found that the use of a large joint, such as hand, arm or leg, is inconvenient and limits the way of playing VMI. Thus this study proposed a fast and reliable finger tracking algorithm suitable for VMI playing. In addition, a virtual guitar system application was developed as an implementation of the proposed algorithm. Experimental results show that the proposed method can be used to play a variety of tunes with an acceptable quality. Furthermore, the proposed application could be used by a beginner who does not have any experience in music or playing a real musical instrument. 
Keywords: virtual guitar; finger tracking; musical instrument; human computer interaction; HCI; hand detection; hand tracking; hand gesture recognition; virtual musical instrument; VMI; depth camera. DOI: 10.1504/IJCSE.2016.10008449  A cloud computing price model based on virtual machine performance degradation   by Dionisio Machado Leite, Maycon Peixoto, Carlos Ferreira, Bruno Batista, Danilo Costa, Marcos Santana, Regina Santana Abstract: This paper reports the interference effects on the performance of virtual machines running higher workloads, with the aim of improving resource payment in cloud computing. The objective is to produce an acceptable pay-as-you-go model to be used by cloud computing providers. Presently, the price in a pay-as-you-go model is based on the virtual machine used per time. However, this scheme does not consider the interference caused by virtual machines running concurrently, which may cause performance degradation. In order to obtain a fair charging model, this paper proposes an approach considering a recovery over the initial price considering the virtual machine performance interference. Results showed benefits of a fair pay-as-you-go model, ensuring the effective user requirement. This novel model contributes to cloud computing in a fair and transparent price composition. Keywords: cloud computing; pay-as-you-go; virtualisation; quality of service. Designing scrubbing strategy for memories suffering MCUs through the selection of optimal interleaving distance   by Wei Zhou, Hong Zhang, Hui Wang, Yun Wang Abstract: As technology scales, multiple cell upsets (MCUs) have shown prominent effect, thus affecting the reliability of memory to a great extent. Ideally, the interleaving distance (ID) should be chosen as the maximum expected MCU size. In order to mitigate MCU errors, interleaving schemes together with single error correction (SEC) codes can be used to provide the greatest protection. 
In this paper, we propose the use of scrubbing sequences to improve memory reliability. The key idea is to exploit the locality of the errors caused by an MCU to make scrubbing more efficient. The single error correction, double error detection, and double adjacent error correction (SEC-DED-DAEC) codes have also been used. A procedure is presented to determine a scrubbing sequence that maximizes reliability. An algorithm of scrubbing strategy, which keeps the area overhead and complexity as low as possible without compromising memory reliability, is proposed for the optimal interleaving distance, which should be maximized under some conditions. The approach is further applied to a case study and results show a significant increase in the Mean Time To Failure (MTTF) compared with traditional scrubbing. Keywords: interleaving distance; memory; multiple cell upsets (MCUs); soft error; reliability; scrubbing; radiation. DOI: 10.1504/IJCSE.2016.10004753  A model of mining approximate frequent itemsets using rough set theory   by Yu Xiaomei, Wang Hong, Zheng Xiangwei Abstract: Datasets can be described by decision tables. In real-life applications, data are usually incomplete and uncertain, which poses big challenges for mining frequent itemsets in imprecise databases. This paper presents a novel model of mining approximate frequent itemsets using the theory of rough sets. With a transactional information system constructed on the dataset under consideration, a transactional decision table is put forward, then lower and upper approximations of support are available that can be easily computed from the indiscernibility relations. Finally, in a divide-and-conquer way, the approximate frequent itemsets are discovered, taking into consideration the support-based accuracy and coverage defined. The evaluation of the novel model is conducted on both synthetic datasets and real-life applications. The experimental results demonstrate its usability and validity. 
Keywords: rough set theory; data mining; decision table; approximate frequent itemsets; indiscernibility relation. Improved predicting algorithm of RNA pseudoknotted structure   by Zhendong Liu, Daming Zhu, Qionghai Dai Abstract: The prediction of RNA structure with pseudoknots is an NP-hard problem. According to minimum free energy models and computational methods, we investigate the RNA pseudoknotted structure. The paper presents an efficient algorithm for predicting RNA structure with pseudoknots, and the algorithm takes O(n^3) time and O(n^2) space. The experimental tests in Rfam10.1 and PseudoBase indicate that the algorithm is more effective and precise, and the algorithm can predict arbitrary pseudoknots. In addition, there exists a 1+ε (ε>0) polynomial time approximation scheme for searching the maximum number of stackings, and we give the proof of the approximation scheme in RNA pseudoknotted structure. Keywords: RNA pseudoknotted structure; predicting algorithm; PTAS; pseudoknots; minimum free energy. DOI: 10.1504/IJCSE.2016.10010413  An efficient algorithm for modelling and dynamic prediction of network traffic   by Wenjie Fan Abstract: Network node degradation is an important problem in the internet of things, given the ubiquitous high number of personal computers, tablets, phones and other equipment present nowadays. In order to verify the network traffic degradation as one or multiple nodes in a network fail, this paper proposes an algorithm based on Product Form Results (PFR) for the Fractionally Auto Regressive Integrated Moving Average (FARIMA) model, namely PFRF. In this algorithm, the prediction method is established by the FARIMA model, through equations for queuing situation and average queue length in steady state derived from queuing theory. Experimental simulations were conducted to investigate the relationships between average queue length and service rate. 
Results demonstrated that the algorithm not only has good adaptability, but also achieved a promising standard deviation of 9.87, which shows its high prediction accuracy, given the low-magnitude difference between the original values and the algorithm's predictions. Keywords: prediction; product form results; FARIMA model; average length of queue. DOI: 10.1504/IJCSE.2016.10008908  Reversible image watermarking based on texture analysis of grey level co-occurrence matrix   by Shu-zhi Li, Qin Hu, Xiao-hong Deng, Zhaoquan Cai Abstract: Embedding the watermark in the complex area of the image can effectively improve concealment. However, most methods simply use the mean squared error (MSE) and some simple methods to judge the texture complexity. In this paper, we propose a new texture analysis method based on grey level co-occurrence matrix (GLCM) and provide an in-depth discussion on how to accurately choose a complex region. This new method is applied to reversible image watermarking. Firstly, the original host image is divided into 128 × 128 sub-blocks. Then, the mean squared error is used to assign the weight of the four texture feature parameters to establish the relationship between the characteristic parameters and the complexity of each image sub-block. Applying this formulaic series, we can calculate the complexity of each sub-block and select the sub-block with the maximum texture complexity. If the embedding position is insufficient, then we select the second sub-block to be embedded in the watermark, until a satisfactory embedding capacity is reached. Pairwise prediction error extend (PPEE) is used to hide the data. Keywords: grey level co-occurrence matrix; image sub block; texture complexity; reversible image watermarking. 
A semantic recommender algorithm for 3D model retrieval based on deep belief networks   by Li Chen, Hong Liu, Philip Moore Abstract: Interest in 3D modelling is growing; however, the retrieval results achieved for semantic-based 3D model retrieval systems have been disappointing. In this paper we propose a novel semantic recommendation algorithm based on a Deep Belief Network (DBN-SRA) to implement semantic retrieval with potential semantic correlations [between models] being achieved using deep learning from known model samples. The algorithm uses the feature correlation [between the models] as the conditions to enable semantic matching of 3D models to obtain the final recommended retrieval result. Our proposed approach has been shown to improve the effectiveness of 3D model retrieval, in terms of both retrieval time and, importantly, accuracy. Additionally, our study and our reported results suggest that our posited approach will generalise to recommender systems in other domains that are characterised by multiple feature relationships. Keywords: deep belief network; 3D model retrieval; recommender algorithm; cluster analysis. Differential evolution with spatially neighbourhood best search in dynamic environment   by Dingcai Shen, Longyin Zhu Abstract: In recent years, there has been a growing interest in applying differential evolution (DE) to optimisation problems in a dynamic environment. The ability to track a changing optimum over time is the main concern in dynamic optimisation problems (DOPs). In this study, an improved niching-based scheme, named spatially neighbourhood best search DE (SnDE), for DOPs is proposed. The SnDE adopts DE with the DE/best/1/bin scheme. The best individual in the selected scheme is searched around the considered individual in a predefined neighbourhood size, thus keeping a balance between exploitation ability and exploration ability. 
A comparative study with several algorithms with different characteristics on a common platform by using the moving peaks benchmark (MPB) and various problem settings is presented in this paper. The results indicate that the proposed algorithm can track the changing optimum in each circumstance effectively on the selected benchmark function. Keywords: differential evolution; dynamic optimisation problem; neighbourhood search; niching. Optimal anti-interception orbit design based on genetic algorithm   by Yifang Liu Abstract: The space defence three-player problem with impulsive thrust is studied in this work. Interceptor spacecraft and anti-interceptor spacecraft have only one chance to manoeuvre, while target spacecraft just keeps running in the target orbit without the ability to manoeuvre. Based on the Lambert theorem, the space defence three-player problem is modelled and divided into two layers. The internal layer is an interception problem in which the interceptor spacecraft tries to intercept the target spacecraft. The external layer is an anti-interception problem in which the anti-interceptor spacecraft tries to defend against the interceptor spacecraft. Because it can get the global solution and does not need the gradient information that is required in traditional optimisation methods, the genetic algorithm is employed to solve the resulting parameter optimisation problem in the interception/anti-interception problem. A numerical simulation is provided to verify the availability of the obtained solution, and the results show that this work is useful for some practical applications. 
Keywords: space three-player problem; anti-interception orbit design; impulsive thrust; parameter optimisation problem; genetic algorithm. DOI: 10.1504/IJCSE.2016.10009742  Detecting sparse rating spammer for accurate ranking of online recommendation   by Hong Wang, Xiaomei Yu, Yuanjie Zheng Abstract: Ranking for online recommendation systems is challenging owing to rating sparsity and spam rating attacks. The former can cause the well-known cold start problem while the latter complicates the recommendation task by detecting these unreasonable or biased ratings. In this paper, we treat the spam ratings as 'corruptions', which spatially distribute in a sparse pattern, and model them with an L1 norm and an L2,1 norm. We show that these models can characterise the property of the original ratings by removing spam ratings and help to resolve the cold start problem. Furthermore, we propose a group reputation-based method to re-weight the rating matrix and an iteratively programming-based technique for optimising the ranking for online recommendation. We show that our optimisation methods outperform other recommendation approaches. Experimental results on four famous datasets show the superior performances of our methods. Keywords: ranking; group-based reputation; sparsity; spam rating; collaborative recommendation. Differential evolution with dynamic neighborhood learning strategy based mutation operators   by Guo Sun, Yiqiao Cai Abstract: As the core operator of differential evolution (DE), mutation is crucial for guiding the search. However, in most DE algorithms, the parents in the mutation operator are randomly selected from the current population, which may lead to DE being slow to exploit solutions when faced with complex problems. In this study, a dynamic neighborhood learning (DNL) strategy is proposed for DE to alleviate this drawback. The new proposed DE framework is named DE with DNL-based mutation operators (DNL-DE). 
Unlike the original DE algorithms, DNL-DE uses DNL to dynamically construct a neighborhood for each individual during the evolutionary process and intelligently select parents for mutation from the defined neighborhood. In this way, the neighborhood information can be effectively used to improve the performance of DE. Furthermore, two instantiations of DNL-DE with different parent selection methods are presented. To evaluate the effectiveness of the proposed algorithm, DNL-DE is applied to the original DE algorithms, as well as several advanced DE variants. The experimental results demonstrate the high performance of DNL-DE when compared with other DE algorithms. Keywords: differential evolution; dynamic neighborhood; learning strategy; mutation operator; numerical optimisation. DOI: 10.1504/IJCSE.2016.10005940  A word-frequency-preserving steganographic method based on synonym substitution   by Lingyun Xiang, Xiao Yang, Jiahe Zhang, Weizheng Wang Abstract: Text steganography is a widely used technique to protect communication privacy but it still suffers from a variety of challenges. One of these challenges is that a synonym substitution based method may change the statistical characteristics of the content, which may be easily detected by steganalysis. In order to overcome this disadvantage, this paper proposes a synonym substitution based steganographic method taking the word frequency into account. This method dynamically divides the synonyms appearing in the text into groups, and substitutes some synonyms to alter the positions of the relatively low frequency synonyms in each group to encode the secret information. By maintaining the number of relatively low frequency synonyms across the substitutions, it preserves some characteristics of the synonyms with various frequencies in the stego and the original cover texts. 
The experimental results illustrate that the proposed method can effectively resist attack from the detection using relative frequency analysis of synonyms. Keywords: synonym substitution; steganography; word-frequency-preserving; multiple-base coding; steganalysis. A personalised ontology ranking model based on analytic hierarchy process   by Jianghua Li, Chen Qiu Abstract: Ontology ranking is one of the important functions of ontology search engines, which ranks searched ontologies based on the ranking model applied. A good ranking method can help users to efficiently acquire exactly the required ontology from a considerable number of search results. Existing approaches that rank ontologies take only a single aspect into consideration, and ignore users' personalised demands, hence produce unsatisfactory results. It is believed that the factors that influence ontology importance and the users' demands both need to be considered comprehensively in ontology ranking. A personalised ontology ranking model based on the hierarchical analysis approach is proposed in this paper. We build a hierarchically analytical model and apply an analytic hierarchy process to quantify ranking indexes and assign weights to them. The experimental results show that the proposed method can rank ontologies effectively and meet users' personalised demands. Keywords: hierarchical analysis approach; ontology ranking; personalised demands; weights assignment. Deploying parallelised ciphertext-policy attributed-based encryption in clouds   by Hai Jiang Abstract: In recent years, cloud storage has become an attractive solution owing to its elasticity, availability and scalability. However, security issues have started to prevent public clouds from becoming more popular. Traditional encryption algorithms (both symmetric and asymmetric ones) fail to support effective secure cloud storage owing to severe issues such as complex key management and heavy redundancy. 
The Ciphertext-Policy Attribute Based Encryption (CP-ABE) scheme overcomes the aforementioned issues and provides fine-grained access control as well as deduplication features. CP-ABE has become a possible solution to cloud storage. However, its high complexity has prevented it from being widely adopted. This paper parallelises CP-ABE where issues to ensure secured cloud storage are considered and deployed in cloud storage environments. Major performance bottlenecks, such as key management and encryption/decryption process, are identified and accelerated, and a new AES encryption operation mode is adopted for further performance gains. Experimental results have demonstrated the effectiveness and promise of such a design. Keywords: CP-ABE; cloud storage; parallelisation; authentication. Collective intelligence value discovery based on citation of science article   by Yi Zhao, Zhao Li, Bitao Li, Keqing He, Junfei Guo Abstract: One of the tasks of scientific paper writing is to recommend. When the number of references is increased, there is no clear classification and the similarity measure of the recommendation system will show poor performance. In this work, we propose a novel recommendation research approach using classification, clustering and recommendation models integrated into the system. In an evaluation of the ACL Anthology papers network data, we effectively use a complex network of knowledge tree node degrees (refer to the number of papers) to enhance the accuracy of recommendation. The experimental results show that our model generates better recommended citations, achieving 10% higher accuracy and 8% higher F-score than the keyword match method when the data is big enough. We make full use of the collective intelligence to serve the public. Keywords: citation recommendation; classification; clustering; similarity; citation network. 
Differential evolution with k-nearest-neighbour-based mutation operator   by Gang Liu, Cong Wu Abstract: Differential evolution (DE) is one of the most powerful global numerical optimisation algorithms in the evolutionary algorithm family, and it is popular for its simplicity and effectiveness in solving numerous real-world optimisation problems in real-valued spaces. The performance of DE depends on its mutation strategy. However, the traditional mutation operators have difficulty in balancing the exploration and exploitation. To address these issues, in this paper, a k-nearest-neighbour-based mutation operator is proposed for improving the search ability of DE. This operator is used to search in the areas in which the vector density distribution is sparse. This method enhances the exploitation of DE and accelerates the convergence of the algorithm. In order to evaluate the effectiveness of our proposed mutation operator on DE, this paper compares other state-of-the-art evolutionary algorithms with the proposed algorithm. Experimental verifications are conducted on the CEC05 competition and two real-world problems. Experimental results indicate that our proposed mutation operator is able to enhance the performance of DE and can perform significantly better than, or at least comparably with, several state-of-the-art DE variants. Keywords: differential evolution; unilateral sort; k-nearest-neighbour-based mutation; global optimisation. Topic-specific image indexing and presentation for MEDLINE abstract   by Lan Huang, Ye Wang, Leiguang Gong, Tian Bai Abstract: MEDLINE is one of the largest databases of biomedical literature. The search results from MEDLINE for medical terms are in the form of lists of articles with PubMed IDs. To further explore and select articles that may help to identify potentially interesting interactions between terms, users need to navigate through the lists of URLs to retrieve and read actual articles to find relevancies among these terms. 
Such work becomes extremely time consuming and unbearably tedious when each query returns tens of thousands of results with an uncertain recall rate. To overcome this problem, we develop a topic-specific image indexing and presentation method for discovering interactions or relatedness of medical terms from MEDLINE, based on which a prototype tool is implemented to help discover interactions between terms of types of disease. The merits of the method are illustrated by search examples using the tool and MEDLINE abstract dataset. Keywords: MEDLINE; data visualisation; customised retrieval. Simultaneous multiple low-dimensional subspace dimensionality reduction and classification   by Lijun Dou, Rui Yan, Qiaolin Ye Abstract: Fisher linear discriminant (FLD) for supervised learning has recently emerged as a computationally powerful tool for extracting features for a variety of pattern classification problems. However, it works poorly with multimodal data. Local Fisher linear discriminant (LFLD) is proposed to reduce the dimensionality of multimodal data. Through experiments tried out on the multimodal but binary data sets created from several multi-class datasets, it has been shown to be better than FLD in terms of performance. However, LFLD has a serious limitation, which is that it is limited to use on small-scale datasets. In order to address the above disadvantages, in this paper we develop a Multiple low-dimensionality Dimensionality Reduction Technique (MSDR) of performing the dimensionality reduction (DR) of input data. In contrast to FLD and LFLD finding an optimal low-dimensional subspace, the new algorithm attempts to seek multiple optimal low-dimensional subspaces that best make the data sharing the same labels more compact. Inheriting the advantages of NC, MSDR reduces the dimensionality of data and directly performs classification tasks without the need to train the model. 
Experiments comparing MSDR with existing traditional approaches on UCI datasets show the effectiveness and efficiency of MSDR. Keywords: Fisher linear discriminant; local FLD; dimensionality reduction; multiple low-dimensional subspaces. Using Gaussian mixture model to fix errors in SFS approach based on propagation   by Huang WenMin Abstract: A new Gaussian mixture model is used to improve the quality of the propagation method for SFS in this paper. The improved algorithm can overcome most difficulties of the method, including slow convergence, interdependence of propagation nodes and error accumulation. To address slow convergence and interdependence of propagation nodes, a stable propagation source and integration path are used to make sure that the reconstruction work of each pixel in the image is independent. A Gaussian mixture model based on prior conditions is proposed to fix the error of integration. Good results have been achieved in the experiment for the Lambert composite image of front illumination. Keywords: shape from shading; propagation method; silhouette; Gaussian mixture model; surface reconstruction. Sign fusion of multiple QPNs based on qualitative mutual information   by Yali Lv, Jiye Liang, Yuhua Qian Abstract: In the era of big data, the fusion of uncertain information from different data sources is a crucial issue in various applications. In this paper, a sign fusion method of multiple Qualitative Probabilistic Networks (QPNs) with the same structure from different data sources is proposed. Firstly, the definition of parallel path in multiple QPNs is given and the problem of fusion ambiguity is described. Secondly, the fusion operator theorem is introduced in detail, including its proof and algebraic properties. Further, an efficient sign fusion algorithm is proposed. Finally, experimental results demonstrate that our fusion algorithm is feasible and efficient. 
Keywords: qualitative probabilistic reasoning; QPNs; Bayesian networks; sign fusion; qualitative mutual information. Estimation of distribution algorithms based on increment clustering for multiple optima in dynamic environments   by Bolin Yu Abstract: Aiming to locate and track multiple optima in dynamic multimodal environments, an estimation of distribution algorithm based on incremental clustering is proposed. The main idea of the proposed algorithm is to construct several probability models based on incremental clustering, which improves performance in locating multiple local optima and helps to find the global optimal solution quickly for dynamic multimodal problems. Meanwhile, a policy of diffusion search is introduced to enhance the diversity of the population in a guided fashion when the environment changes. The policy uses both the current population information and part of the historical information on the optimal solutions available. Experimental studies on the moving peaks benchmark are carried out to evaluate the performance of the proposed algorithm in comparison with several state-of-the-art algorithms from the literature. The results show that the proposed algorithm is effective for functions with a moving optimum and can adapt to dynamic environments rapidly. Keywords: EDAs; dynamic multimodal problems; diffusion policy; incremental clustering. DOI: 10.1504/IJCSE.2017.10010004  A blind image watermarking algorithm based on amalgamation domain method   by Qingtang Su Abstract: Combining the spatial domain and the frequency domain, a novel blind digital image watermarking algorithm is proposed in this paper to resolve the copyright protection problem. 
For embedding a watermark, the generation principle and distribution features of direct current (DC) coefficient are used to directly modify the pixel values in the spatial domain, then four different sub-watermarks are embedded into different areas of the host image for four times, respectively. When extracting the watermark, the sub-watermarks are extracted in a blind manner according to the DC coefficients of the watermarked image and the key-based quantisation step, and then a statistical rule and a 'first select, then combine' rule are proposed to form the final watermark. Hence, the proposed algorithm not only has the simple and quick performance of the spatial domain but also has the high robustness of the DCT domain. Many experimental results have shown that the proposed watermarking algorithm has good watermark invisibility and strong robustness against many added attacks, e.g., JPEG compression, cropping, adding noise, etc. Comparison results also show the superiority of the proposed algorithm. Keywords: information security; digital watermarking; combine domain; direct current. A data cleaning method for heterogeneous attribute fusion and record linkage   by Huijuan Zhu, Tonghai Jiang, Yi Wang, Li Cheng, Bo Ma, Fan Zhao Abstract: In the big data era, when massive heterogeneous data are generated from various data sources, the cleaning of dirty data is critical for reliable data analysis. Existing rule-based methods are generally developed in a single data source environment, so issues such as data standardisation and duplication detection for different data-type attributes are not fully studied. In order to address these challenges, we first introduce a method based on dynamic configurable rules which can integrate data detection, modification and transformation together. Secondly, we propose a type-based blocking and a varying window size selection mechanism based on a classic sorted-neighborhood algorithm. 
We present a reference implementation of our method in a real-life data fusion system and validate its effectiveness and efficiency using recall and precision metrics. Experimental results indicate that our method is suitable in the scenario of multiple data sources with heterogeneous attribute properties. Keywords: big data; varying window; data cleaning; record linkage; record similarity; SNM; type-based blocking. Chinese question speech recognition integrated with domain characteristics   by Shengxiang Gao, Dewei Kong, Zhengtao Yu, Jianyi Guo, Yantuan Xian Abstract: Aiming at domain adaptation in speech recognition, we propose a speech recognition method for Chinese question sentences based on domain characteristics. Firstly, by virtue of the syllable association characteristics implied in domain terms, syllable feature sequences of domain terms are used to construct the domain acoustic model. Secondly, in the decoding process of domain-specific Chinese question speech recognition, we use a domain knowledge relationship to optimise and prune the speech decoding network generated by the language model, to improve continuous speech recognition. The experiments on the tourist domain corpus show that the proposed method has an accuracy of 80.50% on Chinese question speech recognition and of 91.50% on domain term recognition. Keywords: Chinese question speech recognition; speech recognition; domain characteristic; acoustic model library; domain terms; language model; domain knowledge library. DOI: 10.1504/IJCSE.2017.10008632  Original image tracing with image relational graph for near-duplicate image elimination   by Fang Huang, Zhili Zhou, Ching-Nung Yang, Xiya Liu Abstract: This paper proposes a novel method for near-duplicate image elimination, by tracing the original image of each near-duplicate image cluster. For this purpose, image clustering based on the combination of global feature and local feature is firstly achieved in a coarse-to-fine way. 
To accurately eliminate redundant images of each cluster, an image relational graph is constructed to reflect the contextual relationship between images, and the PageRank algorithm is adopted to analyse this contextual relationship. Then the original image will be correctly traced with the highest rank, while other redundant near-duplicate images in the cluster will be eliminated. Experiments show that our method achieves better performance both in image clustering and redundancy elimination, compared with the state-of-the-art methods. Keywords: near-duplicate image clustering; near-duplicate image elimination; image retrieval; image search; near-duplicate image retrieval; partial-duplicate image retrieval; image copy detection; local feature; contextual relationship. IFOA: an improved forest algorithm for continuous nonlinear optimisation   by Borong Ma, Zhixin Ma, Dagan Nie, Xianbo Li Abstract: The Forest Optimisation Algorithm (FOA) is a new evolutionary optimisation algorithm which is inspired by seed dispersal procedure in forests, and is suitable for continuous nonlinear optimisation problems. In this paper, an Improved Forest Optimisation Algorithm (IFOA) is introduced to improve convergence speed and the accuracy of the FOA, and four improvement strategies, including the greedy strategy, waveform step, preferential treatment of best tree and new-type global seeding, are proposed to solve continuous nonlinear optimisation problems better. The capability of IFOA has been investigated through the performance of several experiments on well-known test problems, and the results prove that IFOA is able to perform global optimisation effectively with high accuracy and convergence speed. Keywords: forest optimisation algorithm; evolutionary algorithm; continuous nonlinear optimisation; scientific decision-making. 
A location-aware matrix factorisation approach for collaborative web service QoS prediction   by Zhen Chen, Limin Shen, Dianlong You, Chuan Ma, Feng Li Abstract: Predicting the unknown QoS is often required because most users would have invoked only a small fraction of web services. Previous prediction methods benefit from mining neighborhood interest from explicit user QoS ratings. However, the implicitly existing but significant location information that would potentially tackle the data sparsity problem is overlooked. In this paper, we propose a unified matrix factorisation model that fully capitalises on the advantages of both location-aware neighborhood and latent factor approach. We first develop a multiview-based neighborhood selection method that clusters neighbours from the views of both geographical distance and rating similarity relationships. Then a personalised prediction model is built up by transforming the wisdom of neighborhoods. Experimental results have demonstrated that our method can achieve higher prediction accuracy than other competitive approaches and also better alleviate the concerned data sparsity issue. Keywords: service computing; web service; QoS prediction; matrix factorisation; location awareness. Pairing-free certificateless signature with revocation   by Sun Yinxia, Shen Limin Abstract: How to revoke a user is an important problem in public key cryptosystems. Free of costly certificate management and key escrow, certificateless public key cryptography (CLPKC) is advantageous over the traditional public key system and the identity-based public key system. However, there are few solutions to the revocation problem in CLPKC. In this paper, we present an efficient revocable certificateless signature scheme. This new scheme can revoke a user with high efficiency. We also give a method to improve the scheme to be signing-key-exposure-resilient. Based on the discrete logarithm problem, our scheme is provably secure. 
Keywords: revocation; certificateless signature; without pairing; discrete logarithm problem. DOI: 10.1504/IJCSE.2017.10009399  Large universe multi-authority attribute-based PHR sharing with user revocation   by Enting Dong, Jianfeng Wang, Zhenhua Liu, Hua Ma Abstract: In the patient-centric model of health information exchange, personal health records (PHRs) are often outsourced to third parties, such as cloud service providers (CSPs). Attribute-based encryption (ABE) can be used to realise flexible access control on PHRs in the cloud environment. Nevertheless, the issues of scalability in key management, user revocation and flexible attributes remain to be addressed. In this paper, we propose a large-universe multi-authority ciphertext-policy ABE system with user revocation. The proposed scheme achieves scalable and fine-grained access control on PHRs. In our scheme, there are a central authority (CA) and multiple attribute authorities (AAs). When a user is revoked, the system public key and the other users' secret keys need not be updated. Furthermore, because our scheme supports a large attribute universe, the number of attributes is not polynomially bounded and the public parameter size does not linearly grow with the number of attributes. Our system is constructed on prime order groups and proven selectively secure in the standard model. Keywords: attribute-based encryption; large universe; multi-authority; personal health record; user revocation. A multi-objective optimisation multicast routing algorithm with diversity rate in cognitive wireless mesh networks   by Zhufang Kuang Abstract: Cognitive Wireless Mesh Networks (CWMNs) were developed to improve the usage ratio of the licensed spectrum. Since the spectrum opportunities for users vary over time and location, enhancing the spectrum effectiveness is a goal and also a challenge for CWMNs. 
Multimedia applications have recently generated much interest in CWMNs supporting Quality-Of-Service (QoS) communications. Multicast routing and spectrum allocation is an important challenge in CWMNs. In this paper, we design an effective multicast routing algorithm based on diversity rate with respect to load balancing and the number of transmissions for CWMNs. A Load Balancing wireless links weight computing function and computing algorithm based on Diversity Rate (LBDR) are proposed, and a load balancing Channel and Rate Allocating algorithm based on Diversity Rate (CRADR) is proposed. On this basis, a Load balancing joint Multicast Routing, channel and Rate allocation algorithm based on Diversity rate with QoS constraints for CWMNs (LMR2D) is proposed. Balancing the load of nodes and channels, and minimising the number of transmissions of the multicast tree, are the objectives of LMR2D. Firstly, LMR2D computes the weight of wireless links using LBDR and the Dijkstra algorithm for constructing the load balancing multicast tree step by step. Secondly, LMR2D uses CRADR, which is based on the Wireless Broadcast Advantage (WBA), to allocate channels and rates to links. Simulation results show that LMR2D can achieve the expected goal. Not only can it balance the load of nodes and channels, but it also needs fewer transmissions for the multicast tree. Keywords: cognitive wireless mesh networks; multicast routing; spectrum allocation; load balanced; diversity rate. Online multi-label learning with cost-sensitive budgeted SVM   by Jing Liu, Zhongwen Guo, Ling Jian, Like Qiu, Xupeng Wang Abstract: Multi-label learning deals with data associated with multiple labels simultaneously. It has been extensively studied in diverse areas such as information retrieval, bioinformatics, image annotation, etc. Explosive growth of multi-label related data has brought challenges of how to efficiently learn these labelled data and automatically label the unlabelled data. 
In this paper, we propose an online learning algorithm which processes the data arriving in streaming fashion. It is space-saving and scalable to large-scale problems. Specifically, to tackle the class imbalance problem, we exploit label priors to construct a cost-sensitive function for each sub-classification problem. Experimental studies corroborate the performance of our approaches on datasets drawn from diverse domains, and demonstrate that our proposed algorithm is an ideal candidate to process streaming data and deal with online multi-label learning tasks. Keywords: online learning; budgeted SVM; multi-label learning; cost-sensitive; stochastic gradient descent. DOI: 10.1504/IJCSE.2017.10009977  Context discriminative dictionary construction for topic representation   by Shufang Wu Abstract: The construction of a discriminative topic dictionary is important for describing the topic and increasing the accuracy of topic detection and tracking. In this method, we rank the mutual information of words, and the top few words with the maximum mutual information are selected to construct the discriminative topic dictionaries. Considering that context words can provide a more accurate expression of the topic, during word selection we consider both the differences between different topics and the context words that appear in the stories. Since the news topic is dynamic over time, it is not reasonable to keep the topic dictionary unchanged, so a dictionary updating method is also proposed. Experiments were carried out on the TDT4 corpus, and we adopt miss probability and false alarm probability as evaluation criteria to compare the performance of incremental TF-IDF and the proposed method. Extensive experiments are conducted to show that our method can provide better results. Keywords: discriminative dictionary; context word; topic representation; word selection. 
Demystifying echo state network with deterministic simple topologies   by Duaa Elsarraj, Maha Al Qisi, Ali Rodan, Nadim Obeid, Ahmad Sharieh, Hossam Faris Abstract: Echo State Networks (ESN) are a special type of Recurrent Neural Networks (RNN) with distinct performance in the field of Reservoir computing. The state space of the ESN is initially randomised and the reservoir weights are fixed with training done only on the state readout. Besides the advantages of ESN, there remains some opacity in the dynamic properties of the reservoir owing to the presence of randomisation. Our aims in this paper are to demystify the model of ESN in a completely deterministic structure with the use of different proposed reservoir structures (topologies) and to compare their performance with the random ESN on different benchmark datasets. All applied topologies maintain the simplicity of random ESN computation complexity. Most of the topologies showed comparable or even better performance. Keywords: echo state network; reservoir computing; reservoir structure topology; memory capacity; echo state network algorithm; complexity. A state space distribution approach based on system behaviour   by Imene Bensetira, Djamel Eddine Saidouni, Mahfud Al-la Alamin Abstract: In this paper, we propose a novel approach to deal with the state space explosion problem occurring in model checking. We propose an off-line algorithm for distributed state space construction. This is carried out by reviewing the behaviour of the constructed system and redistributing the state space according to the accumulated information about the optimal considered behaviour. Therefore, the distribution will be guided by the system's behaviour. The proposed policy maintains the spatial-time balance. The simulation and implementation of our system are based on a multi-agent technique which fits the development of distributed systems very well. 
The experimental measures performed on a cluster of machines have shown very promising results for both workload balance and communication overhead. Keywords: model checking; combinatorial state space explosion; distributed state space construction; graph distribution; system behaviour; distributed algorithms; reachability analysis. Consensus RNA secondary structure prediction using information of neighbouring columns and principal component analysis   by Tianhang Liu, Jianping Yin, Long Gao, Wei Chen, Minghui Qiu Abstract: RNA is a family of biological macromolecules. It is important to all kinds of biological processes. RNA structures are closely related to their functions. Hence, determining the structure is invaluable in understanding genetic diseases and creating drugs. Nowadays, RNA secondary structure prediction is a field yet to be researched. In this paper, we present a novel method using an RNA sequence alignment to predict a consensus RNA secondary structure. In essence, the goal of the method is to give a prediction about whether any two columns of an alignment correspond to a base pair or not, using the information provided by the alignment. The information includes the covariation score, the fraction of complementary nucleotides and the consensus probability matrix of the column pair and those of its neighbours. Then principal component analysis is applied to overcome the problem of over-fitting. A comparison of our method and other consensus RNA secondary structure prediction methods, including NeCFold, ELMFold, KnetFold, PFold and RNAalifold, in 47 families from Rfam (version 11.0), is performed. Results show that our method surpasses the other methods in terms of Matthews correlation coefficient, sensitivity and selectivity. Keywords: RNA secondary structure prediction; comparative sequence analysis; principal component analysis; information of neighbouring columns. 
Research on RSA and Hill hybrid encryption algorithm   by Hongyu Yang, Yuguang Ning, Yue Wang Abstract: An RSA-Hill hybrid encryption algorithm model based on random division of plaintext is proposed. First, the key of the Hill cipher is replaced by a Pascal matrix. Secondly, the session key of the model is replaced by random numbers of plaintext division, and encrypted by the RSA cipher. Finally, the dummy problem in the Hill cipher can be solved, and the model can achieve the one-time pad. Security analysis and experimental results show that our method has better encryption efficiency and stronger anti-attack capacity. Keywords: hybrid encryption; plaintext division; Pascal matrix; RSA cipher; Hill cipher. An auction mechanism for cloud resource allocation with time-discounting values   by Yonglong Zhang Abstract: Group-buying has emerged as a new trading paradigm and has become more attractive. Both sides of the transaction benefit from group-buying: buyers enjoy a lower price and sellers receive more demanding orders. In this paper, we investigate an auction mechanism for cloud resource allocation with time discounting values via group-buying, called TDVG. TDVG consists of two steps: winning seller and buyer selection, and pricing. In the first step, we choose winning seller and buyer in a greedy manner according to some criterion, and calculate the payment for each winning seller and buyer in the second step. Rigorous proof demonstrates that TDVG satisfies the properties of truthfulness, budget balance and individual rationality. Our experiment results show that TDVG achieves better total utility, matching rate and commodities use than the existing works. Keywords: cloud resource allocation; auction; time discounting values; group-buying. 
Study on data sparsity in social network-based recommender system   by Ru Jia, Ru Li, Meng Gao Abstract: With the development of information technology and the expansion of information resources, it is more difficult for people to get the information that they are really interested in, a problem known as information overload. Recommender systems are regarded as an important approach to deal with information overload, because they can predict users' preferences according to users' records. Matrix factorisation is very successful in recommender systems, but it faces the problem of data sparsity. This paper deals with the sparsity problem by adding more kinds of information from social networks, such as friendships and tags, into the recommending model. The paper also validates the impacts of users' friendships, tags and neighbours of items on reducing the sparseness of the data and improving the accuracy of recommendation through experiments using a real-life dataset. Keywords: social network-based recommender systems; matrix factorisation; data sparsity. A novel virtual disk bandwidth allocation framework for data-intensive applications in cloud environments   by Peng Xiao, Changsong Liu Abstract: Recently, cloud computing has become a promising distributed processing paradigm to deploy various kinds of non-trivial applications. Most of those applications are considered data-intensive and therefore require the cloud system to provide massive storage space as well as desirable I/O performance. As a result, the virtual disk technique has been widely applied in many real-world platforms to meet the requirements of these applications. Therefore, how to efficiently allocate the virtual disk bandwidth has become an important issue that needs to be addressed. 
In this paper, we present a novel virtual disk bandwidth allocation framework, in which a set of virtual bandwidth brokers are introduced to make allocation decisions by playing two game models. Theoretical analysis and solution are presented to prove the effectiveness of the proposed game models. Extensive experiments are conducted on a real-world cloud platform, and the results indicate that the proposed framework can significantly improve the use of virtual disk bandwidth compared with other existing approaches. Keywords: cloud computing; bandwidth reservation; quality of service; queue model; gaming theory. Academic research trend analysis based on big data technology   by Weiwei Lin, Zilong Zhang, Shaoliang Peng Abstract: Big data technology can well support the analysis of academic research trends, which requires the ability to process an enormous amount of metadata efficiently. On this point, we propose an academic trend analysis method that exploits a popular topic model for paper feature extraction and an influence propagation model for field influence evaluation. We also propose a parallel association rule mining algorithm based on Spark to accelerate trend analysis process. Experimentally, a vast amount of paper metadata was collected from four popular digital libraries: ACM, IEEE, Science Direct and Springer, serving as the raw data for our final feature dataset. Focusing on the hotspot of cloud computing, our result demonstrates that the most relevant topics to cloud computing have been changing these years from basic research to applied research, and from a microscopic point of view, the development of cloud computing related fields presents a certain periodicity. Keywords: big data; associate rule mining; Spark; Apriori; technology convergence. 
The discovery in uncertain-social-relationship communities of opportunistic network   by Xu Gang, Wang Jia-Yi, Jin Hai-He, Mu Peng-Fei Abstract: In the current studies of community division of the opportunistic network, we always take certain social relations as the input. In the practical application scenarios, because communications are always disturbed and the movements of nodes are random, the social relations are in uncertain states. Therefore, the result of the community division based on the certain social relations is impractical. To solve the problem that accurate communities cannot be obtained under uncertain social relations, we propose an uncertain-social-relation model of the opportunistic network in this paper. Meanwhile, we analyse the probability distribution of the uncertain social relation and propose an algorithm of community division based on social cohesion, and then we divide communities by the uncertain social relations of the opportunistic network. The experimental result shows that the Clique_detection_Based_SoH algorithm of community division, which is based on social cohesion, is more in accord with practical communities than the traditional K-clique algorithm of community division. Keywords: opportunistic network; uncertain social relations; k-clique algorithm; social cohesion; key node. Tag recommendation based on topic hierarchy of folksonomy   by Han Xue, Bing Qin, Ting Liu, Shen Liu Abstract: As a recommendation problem, tag recommendation has been receiving increasing attention from both the business and academic communities. Traditional recommendation methods are inappropriate for folksonomy because the basis of such mechanisms is not updated in time owing to the bottleneck of knowledge acquisition. Therefore, we propose a novel method of tag recommendation based on the topic hierarchy of folksonomy. 
The method applies the topic tag hierarchy constructed automatically from folksonomy to tag recommendation using the proposed strategy. The method can improve the quality of folksonomy and can evaluate the topic tag hierarchy through tag recommendation. The precision of tag recommendation reaches 0.892. The experimental results show that the proposed method significantly outperforms state-of-the-art methods (t-test, p-value <0.0001) and demonstrates effectiveness with respect to data sources on tag recommendation. Keywords: tag recommendation; topic hierarchy; folksonomy. Collective entity linking via greedy search and Monte Carlo calculation   by Lei Chen, Chong Wu Abstract: Owing to the large amount of entities appearing on the web, entity linking has become popular recently. It assigns an entrance of a resource to one entity to help users grasp the meaning of this entity. Apparently, the entities that usually co-occur are related and can be considered together to find their best assignments. This approach is called collective entity linking and is often conducted based on entity graphs. However, traditional collective entity linking methods either consume much time owing to the large scale of entity graph or obtain low accuracy owing to simplifying graph to boost speed. To improve both accuracy and efficiency, this paper proposes a novel collective entity linking method based on greedy search and Monte Carlo calculation. Experimental results show that our linking algorithm can obtain both accurate results and low running time. Keywords: collective entity linking; relationship calculation; Monte Carlo calculation; greedy search. DOI: 10.1504/IJCSE.2017.10006975  Incremental processing for string similarity join   by Cairong Yan, Bin Zhu Abstract: String similarity join is an essential operation of data quality management and a key step to find the value of data. Now in the era of big data, the existing methods cannot meet the demands of incremental processing. 
By using the string partition technique, an incremental processing framework for string similarity join is proposed in this paper. This framework treats the inverted index of strings as a state that will be updated after each operation of a string similarity match. Compared with the batching processing model, such framework can avoid the heavy time cost and the space cost brought by the duplicate similarity computation among historical strings and is suitable for processing data streams. We implement two algorithms, Inc-join and Inp-join. Inc-join runs on a stand-alone machine while Inp-join runs on a cluster with Spark environment. The experimental results show that this incremental processing framework can reduce the number of string matchings without affecting the join accuracy and improve the response time for the streaming data join compared with the batch computation model. When the data quantity becomes large, Inp-join can make full use of the advantage of parallel processing and obtain a better performance than Inc-join. Keywords: string similarity join; incremental processing; parallel processing; string matching. A hybrid filtering-based network document recommendation system in cloud storage   by Wu Yuezhong, Liu Qin, Li Changyun, Wang Guojun Abstract: Since the key requirement of users is to efficiently obtain personalised services from mass network document resources, a hybrid filtering-based network document recommendation system is designed with the method of incorporating the content-based recommendation and collaborative filtering recommendation based on the powerful and extensible storage and computing power in cloud storage. The proposed system realises the main service module on Hadoop and Mahout platform, and processes the documents containing the information of user interests by applying AHP-based attribute weighted fusion method. 
Based on the network interaction, the proposed system not only has advantages on the extensible storage space and high recommendation precision but also has an essential role in realizing network resources sharing and personalised recommendation. Keywords: user interest model; collaborative filtering; recommendation system; cloud storage. DOI: 10.1504/IJCSE.2017.10008648  Multiobjective evolutionary algorithm on simplified biobjective minimum weight minimum label spanning tree problems   by Xinsheng Lai, Xiaoyun Xia Abstract: As general purpose optimisation methods, evolutionary algorithms have been efficiently used to solve multiobjective combinatorial optimisation problems. However, few theoretical investigations have been conducted to understand the efficiency of evolutionary algorithms on such problems, and even fewer theoretical investigations have been conducted on multiobjective combinatorial optimisation problems coming from the real world. In this paper, we analyse the performance of a simple multiobjective evolutionary algorithm on two simplified instances of the biobjective minimum weight minimum label spanning tree problem, which comes from the real world. This problem is to find spanning trees that simultaneously minimise the total weight and the total number of distinct labels in a connected graph where each edge has a label and a weight. Though these two instances are similar, the analysis results show that the simple multiobjective evolutionary algorithm is efficient for one instance, but it may be inefficient for the other. According to the analysis on the second instance, we think that the restart strategy may be useful in making the multiobjective evolutionary algorithm more efficient for the biobjective problem. Keywords: multiobjective evolutionary algorithm; biobjective; spanning tree problem; minimum weight; minimum label. 
High dimensional Arnold inverse transformation for multiple images scrambling   by Weigang Zou, Wei Li, Zhaoquan Cai Abstract: The traditional scrambling technology based on the low dimensional Arnold transformation (AT) is not able to assure the security of images during the transmission process, since the key space of the low dimensional AT is small and the scrambling period is short. Actually, the Arnold inverse transformation (AIT) is also a good image scrambling technique. The high-dimension AIT used in image scrambling can solve the shortcomings of low dimensional geometric transformation, have good image scrambling effect, and achieve the purpose of image encryption, which enriches the theory and application of image scrambling. Taking into account that an image has location space and colour space, the high dimensional AIT for image scrambling improves the anti-attack ability of image scrambling since the combination of the location space coordinates and the colour space component is very flexible. We investigated the property and application of AIT with five or six dimensions in the digital images scrambling. Specifically, we propose the theory of n dimensional AIT. Our investigations show that the technology in larger key space has a good effect on scrambling and has a certain application value. Keywords: information hiding; image scrambling; high dimensional transformation; Arnold transformation; Arnold inverse transformation; periodicity. CAT: a context-aware teller for supporting tourist experiences   by Francesco Colace, Massimo De Santo, Saverio Lemma, Marco Lombardi, Mario Casillo Abstract: The aim of this paper is the introduction of a methodology for the dynamic creation of an adaptive generator of stories related to a tourist context. The proposed approach selects the most suitable contents for the user and builds a context-aware teller that can support them during the exploration of the context, making it more appealing and immersive. 
The tourist can use the system by a hybrid app. The dynamic context-aware telling engine grabs the contents from a knowledge base that contains data coming both from the knowledge base and from the web. The user profile is updated thanks to information obtained during the visit and from social networks. A case study and some experimental results are presented and discussed. Keywords: context-aware; storyteller; social content; pervasive systems. Saving energy consumption for mixed workloads in cloud platforms   by Dongbo Liu, Peng Xiao, Yongjian Li Abstract: Virtualisation technology has been widely applied in cloud systems; however, it also introduces many energy-efficiency losses, especially where the I/O virtualisation mechanism is concerned. In this paper, we present an energy-efficiency enhanced virtual machine (VM) scheduling policy, namely Share-Reclaiming with Collective I/O (SRC-I/O), with the aim of reducing the energy-efficiency losses caused by I/O virtualisation. The SRC-I/O scheduler allows running VMs to reclaim extra CPU shares in certain conditions so as to increase CPU use. Meanwhile, the SRC-I/O policy separates I/O-intensive VMs from CPU-intensive ones and schedules them in a batch manner, so as to reduce the context-switching costs of scheduling mixed workloads. Extensive experiments are conducted on various platforms by using different benchmarks to investigate the performance of the proposed policy. The results indicate that when the virtualisation platform is in the presence of mixed workloads, the SRC-I/O scheduler outperforms existing VM schedulers in terms of energy efficiency and I/O responsiveness. Keywords: cloud computing; virtual machine; energy efficiency; mixed workload; task scheduling. The extraction of security situation in heterogeneous log based on Str-FSFDP density peak cluster   by Chundong Wang, Tong Zhao, Xiuliang Mo Abstract: Log analysis has been widely developed for identifying intrusion at the host or network. 
In order to reduce the false alarm rate in the process of security events extraction and discover a wide range of anomalies by scrutinising various logs, an improvement of Str-FSFDP (a fast search and find of peak density based data stream) clustering algorithm in heterogeneous log analysis is presented. Because of the advantages in data attribute relationship analysis for mixed attributes data, this algorithm can classify log data into two types whose corresponding distance measure metrics are designed. In order to apply Str-FSFDP in various logs, 12 attributes are defined in the unified XML format for clustering in this paper. These attributes are divided by the characteristics of each type of log and the importance of expressing a security event. To match the new micro cluster characteristic vector mentioned in the Str-FSFDP algorithm, this paper uses time gap to improve the UHAD (unsupervised anomaly detection model) framework. The time gap is designed as a threshold value based on micro cluster strategy. Experimental results reveal that the framework using Str-FSFDP clustering algorithm with time threshold can improve the aggregation rate of the log events and reduce the false alarm rate. As the algorithm has an analysis of attributes correlation, the connections between different IP addresses have been tested in the experiment. This helps us to look for the same attacker's exploitation traces even if the attacker fakes the IP addresses. It can increase the degree of aggregation in the same event. According to our analysis of each cluster, some serious attacks in the experiment have been summarised through the time line. Keywords: heterogeneous log; micro cluster; mixed attributes; unsupervised anomaly detection. An improved KNN text classification method   by Fengfei Wang, Zhen Liu, Chundong Wang Abstract: A text classification method based on improved SOM and KNN is introduced in this paper. 
In order to overcome the shortcomings of KNN in the text space model, this paper uses the SOM neural network to optimise the text classification. Based on this, this paper presents an improved SOM combined with KNN algorithm model. The SOM neural network weights of each dimension of the vector space model are calculated, using the SOM neural network in an unsupervised and no prior knowledge state of the sample to execute self-organisation and self-learning, to achieve evaluation and classification of the sample. This characteristic, using the SOM neural network combined with the KNN algorithm, effectively reduces the dimension of the vector, improves the clustering accuracy and speed and can effectively improve the efficiency of text classification. Keywords: text classification; KNN; SOM; neural network. On the evaluation of machine-learned network traffic classifiers   by Junhao Xu, Yu Wang Abstract: The recent years have seen extensive work on using machine learning techniques to classify network traffic based on flow and packet level characteristics. Previous studies reported promising results where the machine-learned classifiers generally achieved highly accurate predictions. However, some properties of the classifiers remain unexplored, of which the most critical one is the ability to identify unknown traffic. In this paper, we present an evaluation study on the issue. We show that most of the training and testing schemes in previous work are unrealistic as they assume that all classes are known a priori and sufficient training data for each class is available. Thus the fact that these classifiers were incapable of dealing with unseen traffic was over-looked. Experimental results obtained in two real-world internet traffic datasets are presented to illustrate the whole picture of the effectiveness of machine learning traffic classifiers. Keywords: machine learning; traffic analysis; data sharing; performance evaluation. 
Privacy-preserving location-based service protocols with flexible access   by Shuyang Tang, Shengli Liu, Xinyi Huang, Zhiqiang Liu Abstract: We propose an efficient privacy-preserving, content-protecting Location-based Service (LBS) scheme. Our proposal gives refined data classification and uses generalised ElGamal to support flexible access to different data classes. We also make use of Pseudo-Random Function (PRF) to protect users' position query. Since PRF is a light-weighted primitive, our proposal enables the cloud server to locate position efficiently while preserving the privacy of the queried position. Keywords: location-based services; outsourced cloud; security; privacy preserving. On providing on-the-fly resizing of the elasticity grain when executing HPC applications in the cloud   by Rodrigo Righi, Cristiano Costa, Vinicius Facco, Luis Cunha Abstract: Today, we observe that cloud infrastructures are gaining more and more space to execute HPC (High Performance Computing) applications. Unlike clusters and grids, the cloud offers elasticity, which refers to the ability of enlarging or reducing the number of resources (and consequently, processes) to support as close as possible the needs of a particular moment of the execution. To the best of our knowledge, current initiatives explore the elasticity and HPC duet by always handling the same number of resources at each scaling in or scaling out operation. This fixed elasticity grain commonly reveals a stair-shaped behaviour, where successive elasticity operations take place to address the load curve. In this context, this article presents GrainElastic: an elasticity model to execute HPC applications with the capacity to adapt the elasticity grain to the requirements of each elasticity operation. Its contribution concerns a mathematical formalism that uses historical execution traces and ARIMA time series model to predict the required number of resources (in our case, VMs) to address a reconfiguration point. 
Based on the proposed model, we developed a prototype that was compared with two other scenarios: (i) non-elastic application and (ii) elastic middleware with a fixed grain. The results showed gains of up to 30% in favour of GrainElastic, showing us the relevance of adapting the elasticity grain to enhance system reactivity and performance. Keywords: elasticity; resource management; HPC; cloud computing; elasticity grain; adaptivity. Can the hybrid colouring algorithm take advantage of multi-core architectures?   by João Fabrício Filho, Luis Gustavo Araujo Rodriguez, Anderson Faustino Da Silva Abstract: Graph colouring is a complex computational problem that focuses on colouring all vertices of a given graph using a minimum number of colours. However, adjacent vertices are restricted from receiving the same colour. Over recent decades, various algorithms have been proposed and implemented to solve such a problem. An interesting algorithm is the Hybrid Coloring Algorithm (HCA), which was developed in 1999 by Philippe Galinier and Jin-Kao Hao. The HCA was widely regarded at the time as one of the best performing algorithms for graph colouring. Nowadays, high-performance out-of-order multi-cores have emerged that execute applications faster and more efficiently. Thus, the objective of this paper is to analyse whether the HCA can take advantage of multi-core architectures in terms of performance. For this purpose, we propose and implement a parallel version of the HCA that takes advantage of all hardware resources. Several experiments were performed on a machine with two Intel(R) Xeon(R) CPU E5-2630 processors, thus having a total of 24 cores. The experiment proved that the parallel HCA, using multi-core architectures, is a significant improvement over the original because it achieves enhancements of up to 40% in terms of the distance to the best chromatic number found in the literature. 
The expected contribution of this paper is to encourage developers to take advantage of high performance out-of-order multi-cores to solve complex computational problems. Keywords: metaheuristics; hybrid colouring algorithm; graph colouring problem; architecture of modern computers. Learning pattern of hurricane damage levels using semantic web resources   by Quang-Khai Tran, Sa-kwang Song Abstract: This paper proposes an approach for hurricane damage level prediction using semantic web resources and matrix completion algorithms. Based on the statistical unit node set framework, streaming data from five hurricanes and damage levels from 48 counties in the USA were collected from the SRBench dataset and other web resources, and then trans-coded into matrices. At a time t, the pattern of possible highest damage levels at 6 hours into the future was estimated using a multivariate regression procedure based on singular value decomposition. We also applied the Soft-Impute algorithm and k-nearest-neighbours concept to improve the statistical unit node set framework in this research domain. Results showed that the model could deal with inaccurate, inconsistent and incomplete streaming data that were highly sparse, to learn future damage patterns and perform forecasting in near real time. It was able to estimate the damage levels in several scenarios even if two-thirds of the relevant weather information was unavailable. The contributions of this work will be able to promote the applicability of the semantic web in the context of climate change. Keywords: hurricane damage; statistical unit node set; matrix completion; SRBench dataset; streaming data. CUDA GPU libraries and novel sparse matrix-vector multiplication implementation and performance enhancement in unstructured finite element computations   by Richard Haney, Ram V. 
Mohan Abstract: The efficient solution to systems of linear and non-linear equations arising from sparse matrix operations is a ubiquitous challenge for computing applications that can be exacerbated by the employment of heterogeneous architectures such as CPU-GPU computing systems. There is a common need for efficient implementation and computational performance of solution of sparse system of linear equations in many unstructured finite element-based computations of physics based modeling problems. This paper presents our implementation of a novel sparse matrix-vector multiplication (a significant compute load operation in the iterative solution via pre-conditioned conjugate gradient based methods) employing LightSpMV with Compressed Sparse Row (CSR) format, and the resulting performance characteristics. An unstructured finite element-based computational simulation involving multiple calls to iterative pre-conditioned conjugate gradient algorithm for the solution to a linear system of equations employing a single CPU-GPU computing system using NVidia Compute Unified Device Architecture libraries is employed for the results discussed in the present paper. The matrix-vector product implementation is examined within the context of a resin transfer molding simulation code. Results from the present work can be applied without loss of generality to many other unstructured, finite element-based computational modeling applications in science and engineering that employ solutions to sparse linear and non-linear system of equations using CPU-GPU architecture. Computational performance analysed indicates that LightSpMV can provide an asset to boost performance for these computational modelling applications. This work also investigates potential improvements in the LightSpMV algorithm using CUDA 35 intrinsic, which results in an additional performance boost by 1%. 
While this may not be significant, it supports the idea that LightSpMV can potentially be used for other full-solution finite element-based computational implementations. Keywords: general purpose GPU computing; sparse matrix-vector; finite element method; CUDA; performance analysis. Rational e-voting based on network evolution in the cloud   by Tao Li, Shaojing Li Abstract: Physically distributed voters can vote online through an electronic voting (e-voting) system. It can outsource the counting work to the cloud when the system is overloaded. However, this kind of outsourcing may lead to some security problems such as anonymity, privacy, fairness etc. If servers in the cloud have no incentives to deviate from the e-voting system, these security problems can be effectively solved. In this paper, we assume that servers in the cloud are rational, and try to maximise their utilities. We look for incentives for rational servers not to deviate from the e-voting system. Here, no deviation means rational servers prefer to cooperate in the e-voting system. Simulation results of our evolution model show that the cooperation level is high after certain rounds. Finally, we put forward a rational e-voting protocol based on the above results and prove that the system is secure under proper assumptions. Keywords: electronic voting; utility; cloud computing; rational secret sharing. Water contamination monitoring system based on big data: a case study   by Gaofeng Zhang, Yingnan Yan, Yunsheng Tian, Yang Liu, Yan Li, Qingguo Zhou, Rui Zhou, Kuan-Ching Li Abstract: Water plays a vital role in people's lives, and individuals cannot survive without it. However, water contamination has become a serious issue with the development of industry and agriculture, and has become a threat to people's daily life. Moreover, the amount of data people need to process becomes excessively complex and huge in the big data era. Hence, data management is increasingly a difficult task. 
There is an urgent need to develop a system to identify major changes of water quality through monitoring and managing these water quality variables. In this paper, we develop a data monitoring system named Monitoring and Managing Data Center (MMDC) for monitoring, downloading, sharing, and time-series analysis based on big data technology. In order to reflect the real hydrological ecosystem, water quality variable data collected from Taihu Lake in China is used to verify the effectiveness of MMDC. Results show that MMDC is effective for monitoring and management of massive data. Although this investigation is focused on Taihu Lake, it is applicable as a general monitoring system for other similar natural resources. Keywords: water contamination; big data; MMDC; monitoring; data analysis. Passive image autofocus by using direct fuzzy transform   by Ferdinando Di Martino, Salvatore Sessa Abstract: We present a new passive autofocusing algorithm based on fuzzy transforms. In a previous work a localised variation of the variance operator was proposed based on the concept of fuzzy subspaces of the image: fuzzy C-means and conditional fuzzy C-means algorithms are applied for detecting the fuzzy subspaces. The direct fuzzy transform is used for extracting the mean values of the image intensity in a fuzzy subspace, then a weighted sum of the local variance operators obtained in each subspace is calculated as well. We propose a new approach based on the fuzzy generalised fuzzy C-means algorithm, where the number of fuzzy subspaces is obtained by using the partition coefficient and exponential separation validity indexes. Comparisons show that our method is more robust with respect to the localised variation of the variance operator. Keywords: image autofocusing; image contrast; variance; FCM; fuzzy transform. 
Arrhythmia recognition and classification through deep learning based approach   by Rui Zhou, Xue Li, Binbin Yong, Zebang Shen, Chen Wang, Qingguo Zhou, Yunshan Cao, Kuan-Ching Li Abstract: Arrhythmia is a cardiac condition caused by abnormal electrical activity of the heart, which can be life-threatening. Electrocardiogram (ECG) is the principal diagnostic tool used to detect arrhythmias or heart abnormalities. It contains information about the different types of arrhythmia. However, owing to the complexity and non-linearity of ECG signals, such as the presence of noise, the time dependence of ECG signals and the irregularity of the heartbeat, it is troublesome to analyse ECG signals manually. Moreover, the interpretation of ECG signals is subjective and might vary among experts in the field. Therefore, an automatic, high-precision ECG recognition method is important to arrhythmia detection. To this end, this paper proposes a method for arrhythmia classification based on a deep learning approach called Long Short-Term Memory (LSTM), in which five classes of arrhythmia as recommended by the Association for Advancement of Medical Instrumentation (AAMI) are analysed. The method has been tested on the MIT-BIH Arrhythmia Database with a number of useful performance evaluation measures, showing that it has a promising and better performance than other artificial intelligence methods used. Keywords: electrocardiogram signal; long short-term memory; arrhythmia classification; artificial intelligence; deep learning. Publicly verifiable function secret sharing   by Qiang Wang, Fucai Zhou, Su Peng, Jian Xu Abstract: Function Secret Sharing (FSS) allows a dealer to split a secret function into n sub-functions, described by n evaluation keys, such that only a combination of all of these keys could reconstruct the secret function. However, it is impossible to recover the secret correctly if some sharers deviate from the intended behaviour. 
To settle this problem, we propose a new primitive called Publicly Verifiable Function Secret Sharing (PVFSS), in which any client could verify the validity of secret in constant time. Furthermore, we define three important properties: public delegation, public verification and high efficiency, which are an essential part in our scheme. Finally, we construct a PVFSS scheme for point function, then we prove its security and make performance analysis in two major directions: key length and algorithm efficiency. The analysis validates that our proposed scheme is asymptotic to FSS. It would be applicable to cloud computing. Keywords: PVFSS; cloud computing; high efficiency; public delegation; public verification. Parallel context-aware multi-agent tourism recommender system   by Richa Singh, Punam Bedi Abstract: The presence of millions and millions of users and items makes real-time filtering a time-consuming process in recommender systems. In context-aware recommender systems, the choices of users depend on the contextual information as well as available items. This helps to reduce the user item data to some extent, but the rapid change in the interests of a user under different contexts puts an extra load on recommender systems. To address this problem, we present a parallel approach for context-aware recommender systems using a multi-agent system that greatly accelerates the processing time. General Purpose Graphic Processing Unit (GPGPU) is used to exploit the parallel behaviour of the system along with CUDA (Compute Unified Device Architecture) and JCuda. The proposed algorithm works in both offline and online phases. Contextual filtering and multi-agent environment help to keep the system updated with the context of the user. A prototype of the system is developed using JCuda, JADE and Java technologies for the tourism domain. 
The performance of the presented system is compared with the context-aware recommender system without parallel processing with respect to processing time and scalability, as well as precision, recall and F-measure. The results show a significant speedup for the presented system over the non-parallel context-aware recommender system. Keywords: multi-agent system; recommender system; context aware; parallel processing; tourism. DOI: 10.1504/IJCSE.2017.10010189  Graph databases for openEHR clinical repositories   by Samar El Helou, Shinji Kobayashi, Goshiro Yamamoto, Naoto Kume, Eiji Kondoh, Shusuke Hiragi, Kazuya Okamoto, Hiroshi Tamura, Tomohiro Kuroda Abstract: The archetype-based approach has now been adopted by major EHR interoperability standards. Soon, owing to an increase in EHR adoption, more health data will be created and frequently accessed. Previous research shows that conventional persistence mechanisms such as relational and XML databases have scalability issues when storing and querying archetype-based datasets. Accordingly, we need to explore and evaluate new persistence strategies for archetype-based EHR repositories. To address the performance issues expected to occur with the increase of data, we proposed an approach using labelled property graph databases for implementing openEHR clinical repositories. We implemented the proposed approach using Neo4j and compared it with an Object Relational Mapping (ORM) approach using Microsoft SQL Server. We evaluated both approaches over a simulation of a pregnancy home-monitoring application in terms of required storage space and query response time. The results show that the proposed approach provides a better overall performance for clinical querying. Keywords: openEHR; graph database; EHR; database; performance; archetypes; reference model; EHR repository; archetype-based storage; query response time. 
Kernel-based tensor discriminant analysis with fuzzy fusion for face recognition   by Xiaozhang Liu, Hangyu Ruan Abstract: This paper proposes a novel kernel-based image subspace learning method for face recognition, by encoding a face image as a tensor of second order (matrix). First, we propose a kernel-based discriminant tensor criterion, called kernel bilinear fisher criterion (KBFC), which is designed to simultaneously pursue two projection vectors to maximise the interclass scatter and at the same time minimise the intraclass scatter in its corresponding subspace. Then, a score level fusion method is presented to combine two separate projection results to achieve classification tasks. Experimental results on the ORL and UMIST face databases show the effectiveness of the proposed approach. Keywords: kernel; tensor discriminant; bilinear discriminant; matrix representation; face recognition. Modelling of advanced persistent threat attack monitoring based on the artificial fish swarm algorithm   by Biaohan Zhang Abstract: In recent years, Advanced Persistent Threat (APT) has become one of the important factors that threaten network security. Aiming at the APT attack defence problem, this paper proposes an APT attack monitoring method based on the principle of artificial fish swarm algorithm. The attack monitoring model is established by imitating the behaviour of the artificial fish swarm. The model is used to dynamically monitor the environment, and the APT attack index is simulated with the food consistence to monitor the position of the highest APT attack index. The experimental results show that the monitoring model designed by this method can effectively monitor and forecast the attack target, and also has good expansibility and practicability. Keywords: artificial fish swarm algorithm; advanced persistent threat attack; monitoring model. 
Multilayer ensemble of ELMs for image steganalysis with multiple feature sets   by Punam Bedi, Veenu Bhasin Abstract: A multilayer ensemble of Extreme Learning Machines (ELM) for multi-class image steganalysis is proposed in this paper. The proposed ensemble consists of three levels and uses multiple feature sets extracted from images. The first two layers form sub-ensembles, one sub-ensemble for each of the feature sets. Each feature set is partitioned and used with multiple ELMs at level-1. These feature sets along with the output of the ELMs at level-1 are used by different ELMs at level-2 to classify images into multiple classes. To combine these results from sub-ensembles a stacking technique is used. Results of level-2 ELMs are used as input for the last level ELM. The fast learning process of ELM aids the speedy execution of the proposed method. Performance of the proposed method is compared with existing steganalysis methods based on individual feature sets and on 2-level ensemble. The experimental study demonstrates that the proposed method classifies images into multiple classes with higher accuracy and this has been confirmed using t-test with 99% confidence. Keywords: steganalysis; extreme learning machine; Markov random process; ensemble of ELMs. DOI: 10.1504/IJCSE.2017.10010576  An anchor node selection mechanism-based node localisation for mines using wireless sensor networks   by Kangshun Li, Hui Wang, Ying Huang Abstract: To tackle the low localisation accuracy problem in wireless sensor network (WSN) nodes in mines, a localisation algorithm is proposed to improve the localisation accuracy of Received Signal Strength Indication (RSSI) using an anchor node selection mechanism. This localisation mainly includes three phases. First, the anchor node RSSI values received from an unknown node are sorted from high to low. Second, the four anchor nodes with the highest RSSI values are selected by a Gaussian elimination method. 
These nodes are not in the same plane and form a prismatic shape, and the distance from any one node to the plane formed by the other three nodes is not less than a certain threshold value. Finally, the least squares method is used to estimate the coordinates of the unknown nodes to realise the precise localisation of the unknown nodes. The simulation results show that the proposed algorithm has greatly improved the localisation accuracy compared with other traditional localisation algorithms. Keywords: underground tunnel; received signal strength indication; anchor node selection; least squares method; Gauss elimination method. A malware variants detection methodology with an opcode-based feature learning method and a fast density-based clustering algorithm   by Hui Yin, Jixin Zhang, Zheng Qin Abstract: Malware, which can be defined as any type of malicious code designed to harm a computer or network, is one of the major security threats facing the internet today. As malware variants may be equipped with sophisticated mechanisms to bypass traditional detection systems, in this paper, we propose a malware variant detection approach that can automatically, quickly, and accurately detect malware variants. In our approach, we present an asynchronous architecture for automated training and detection. Under this architecture, to improve the detection speed while retaining the accuracy, we propose an information entropy-based feature extraction method to extract a few but very useful features and a distance-based weight learning method to weight these features. To further improve the detection speed, we propose a fast density-based clustering algorithm. We evaluate our approach with a number of Windows-based malware instances that belong to six large families, and our experiments demonstrate that our automated malware variant detection method is able to achieve high accuracy with a significant speedup in comparison with other state-of-the-art approaches. 
Keywords: distance-based weight learning; fast density-based clustering; information entropy; malware variants. Optimised tags with time attenuation recommendation algorithm based on tripartite graphs network   by Ming Zhang, Wei Chen Abstract: Social recommendation has attracted increasing attention in recent years owing to the potential value of social relations in recommender systems. Social tags play an important role in improving the recommendation accuracy. However, garbage tags may lead to the issue of data matrix sparseness and affect the accuracy and performance of the recommendation system. To optimise the social tags in the recommendation system, the tags are sorted by a popularity ranking method with the time attenuation model in order to remove the garbage tags. The time attenuation model is used to account for the variation of tags over time. Then a novel recommendation algorithm with the optimised social tags is proposed, based on the complete tripartite graph network. This method considers the preference information of users and items, and generates the recommendation items for users on the basis of collaborative filtering. Experimental results show that the proposed algorithm predicts the recommendation items more accurately than other existing approaches. Keywords: tags optimisation; tripartite graphs network; time attenuation model; social recommendation. Probabilistic rough-set-based band selection method for hyperspectral data classification   by Li Min, Wang Lei, Deng Shaobo Abstract: This paper proposes an innovative band selection algorithm called the probabilistic rough-set-based band selection (PRSBS) algorithm. The proposed algorithm is an efficient supervised band selection algorithm because it needs to calculate only the first-order significance measure. The main novelty of the proposed PRSBS algorithm lies in its criterion function, which measures the effectiveness of the considered band. 
The algorithm uses a probabilistic distribution dependency as the relevance measure between the bands and class labels, which can effectively measure the uncertainty of both the positive and the boundary samples in a dataset. We compared the proposed PRSBS with the most relevant band selection algorithm, RSBS, on three different hyperspectral datasets; the experimental results show that the PRSBS has better results than the RSBS. Moreover, the PRSBS algorithm runs significantly faster than the RSBS algorithm, which makes it a good choice for band selection in hyperspectral image datasets. Keywords: band selection; probabilistic rough set; hyperspectral image; classification. A universal designated multi verifiers content extraction signature scheme   by Min Wang, Yuexin Zhang, Jinhua Ma, Wei Wu Abstract: A notion to combine the content extraction signature and the universal designated verifier signature was put forth by Lin in 2012. Specifically, it allows an extracted signature holder to designate the signature to a prospective verifier. However, existing designs become inefficient when multi verifiers are involved. To improve the efficiency, in this paper, we extend the notion to the Universal Designated Multi Verifiers Content Extraction Signature ($\mathrm{UDMVCES}$). Implementing our new scheme, the extracted signature holder can efficiently designate the signature to multi verifiers. Additionally, we provide the security notions and prove the security of the proposed scheme in the random oracle model. To illustrate the efficiency of our $\mathrm{UDMVCES}$ scheme, we analyse its performance. The analysis shows that the computation costs and signature lengths of the new scheme are independent of the number of verifiers. Keywords: content extraction signature; universal designated multi verifiers signature; extracted signature; random oracle model. 
Dynamic input domain reduction for test data generation with iterative partitioning   by Esmaeel Nikravan, Saeed Parsa Abstract: A major difficulty concerning test data generation for white box testing is to detect the domain of input variables covering a certain path. To this aim, a new concept, domain coverage, is introduced in this article. In search of appropriate input variable subdomains covering a desired path, the domains are randomly partitioned until subdomains whose boundaries satisfy the path constraints are found. When partitioning, priority is given to those subdomains whose boundary variables do not satisfy the path constraints. By representing the relation between the subdomains and their parents as a directed acyclic graph, an Euler/Venn reasoning system can be applied to select the most appropriate subdomains. To evaluate our proposed path-oriented test data generation method, the results of applying the method to six known benchmark programs, Triangle, GCD, Calday, Shellsort, Quicksort and Heapsort, are presented. Keywords: random testing; test data generation; Euler/Venn diagram; directed acyclic graph. Multi-class instance-incremental framework for classification in fully dynamic graphs   by Hardeo Kumar Thakur, Anand Gupta, Ritvik Shrivastava, Sreyashi Nag Abstract: Existing work in the area of graph classification is mostly restricted to static graphs. These static classification models prove ineffective in several real-life scenarios that require an approach capable of handling data of a dynamic nature. Further, the limited work in the domain of dynamic graphs has mainly focused on solely incremental graphs, which fail to accommodate Fully Dynamic Graphs (FDG). Hence, in this paper, we propose a comprehensive framework targeting multi-class classification in fully dynamic graphs by using the efficient Weisfeiler-Lehman graph kernel (W-L) with a multi-class Support Vector Machine (SVM). 
The framework iterates through each update using the instance-incremental method while retaining all historical data in order to ensure higher accuracy. Reliable validation metrics are used for the model parameter selection and output verification. Experimental results over four case studies on real-world data demonstrate the efficacy of our approach. Keywords: fully dynamic graph; dynamic graph; graph classification; multi-class classification. Assessment of nested-parallel task model under real-time scheduling on multi-core processors   by Mahesh Lokhande, Mohammad Atique Abstract: Real-time applications contain numerous time-bound parallel tasks with enormous computations. Parallel models, not sequential models, have the capability to handle intra-task parallelism and accomplish such tasks in a specific time or before. Previous researchers presented the task models for parallel tasks, but not for the nested-parallel tasks. This paper deals with the real-time scheduling of periodic nested-parallel tasks with an implicit-deadline on multi-core processors. Initially, the focus is on a nested-parallel task model. Next, a novel task disintegration technique is studied where the MAMs ratio is defined to categorise the segments. It is theoretically proved that the discussed disintegration technique achieved a speedup factor of 4.30 and 3.40 when the tasks, after disintegration, were scheduled under partitioned DM (Deadline Monotonic) and global EDF (Earliest Deadline First) scheduling, respectively. Further, considering the overhead factor (β) for non-preemptive global EDF scheduling, the disintegration technique was analysed and achieved a speedup factor of 3.73 (for β=1). The proposed disintegration technique is assessed through the simulations thereby indicating the adequacy of derived speedup factors. Keywords: nested-parallel tasks; real-time scheduling; partitioned DM scheduling; EDF scheduling; multi-core processors; task disintegration; speedup factor. 
Recognition of landslide disasters with extreme learning machine   by Guanyu Chen, Xiang Li, Wenyin Gong Abstract: The landslides induced by the Wenchuan earthquake are great in number, so landslide disaster recognition and investigation must be conducted in the early stage of large construction planning in the disaster area. In recent years, studies on image recognition have focused on the extreme learning machine (ELM) algorithm. Based on the preprocessing of remote sensing images, this paper conducts landslide recognition with remote sensing images through ELM classification combined with colour and texture features of ground objects. Comparison experiments on landslide recognition with the support vector machine (SVM) algorithm show that the recognition accuracy of the ELM algorithm is not much different from that of the SVM algorithm, but the ELM has a clear advantage in its much shorter training time. Keywords: geological disaster; remote sensing image; extreme learning machine; landslide recognition. Graffiti-writing recognition with fine-grained information   by Jiashuang Xu, Jiashuang Zhangjie Fu Abstract: Contactless HCI (Human-Computer Interaction) has become a new trend owing to the emergence of novel intelligent terminals. The existing interaction systems usually adopt depth cameras, motion controllers, and radiofrequency devices. The common drawback of the above approaches is that all the participants are required to obey the unistroke writing standard for data acquisition. The uniformity of the writing rule simplifies the data acquisition stage, but it breaks the integrity of the handwriting system. In practice, writing habits vary among people. It is observed that eight capitalised letters of the alphabet possess more than one writing pattern. Thus, we are motivated to propose a more adaptive, contactless graffiti-writing recognition system with CSI (Channel State Information) derived from Wi-Fi signals. 
The discrete wavelet transform is used for denoising. We choose a sliding window to calculate the MAD (Mean Absolute Deviation) to detect the start and end points. We extract the unique CSI waveform caused by the writing action to represent each letter. To cater for more users' writing habits and improve the universality of the system, we train separate HMMs (Hidden Markov Models) for the eight letters and conduct cross-validation for testing. The average detection accuracy reaches 94.5%. The average recognition accuracy for the 26-letter model is 85.96% when the number of training samples is 100 from five subjects. The real-time recognition efficiency measured by characters per minute is 11.97 (= 31/155.24 s). Keywords: air-write recognition; wireless sensing; channel state information. A new neural architecture for feature extraction of remote sensing data   by Mustapha Si Tayeb, Hadria Fizazi Abstract: The paper presents a novel method for the classification of remote sensing data. The proposed approach comprises two main steps: 1) an Extractor Multi-Layer Perceptron (EMLP) is used for feature extraction of the remote sensing data; then 2) the data resulting from the EMLP are classified using a Support Vector Machine (SVM) algorithm. The contribution of this work is mainly in the creation of the EMLP method based on the Multi-Layer Perceptron (MLP) method, which has the role of creating a dataset more representative of the classes from the original dataset. To better situate and evaluate our proposed approach, we applied it to three datasets, namely, Statlog Landsat Satellite, Urban Land Cover and Landsat TM Oran, and several measures were used, for example, classification rate, classification error, precision, recall and F-measure. The experimental results show that the proposed approach (EMLP-SVM) is more efficient and powerful than the basic methods (MLP and SVM) and the existing state-of-the-art classification methods. 
Keywords: classification methods; feature extraction; remote sensing data; extractor multi-layer perceptron; support vector machine; supervised learning. Zero knowledge proof for secure two-party computation with malicious adversaries in distributed networks   by Xiaoyi Yang, Meijuan Huang Abstract: Distributed networks remarkably enhance the convenience of network connectivity. How to achieve efficient cooperative computation while preserving data privacy is a challenge in the scenario of distributed networks. Secure computation, as the key technology of information security and privacy protection in distributed networks, attracts more and more attention. The paper concerns protocols for secure two-party computation in the presence of malicious adversaries, which are constructed with a homomorphic probabilistic cryptosystem, and proposes four honest-verifier zero-knowledge proof protocols to detect two cheating behaviours of the malicious adversary. The proposed protocols are more targeted than the existing work. The analysis shows that the proposed protocols are complete, sound and zero-knowledge. As an application, we show how to use our protocols in a secure two-party protocol to detect cheating, which can make it secure in the presence of malicious adversaries. Keywords: distributed networks; secure two-party computation; malicious adversaries; homomorphic encryption; zero knowledge proof. Parallelisation of practical shared sampling alpha matting with OpenMP   by Tien-Hsiung Weng, Chi-Ching Chiu, Huimin Lu Abstract: In the modern filmmaking industry, image matting has been one of the common tasks in video side effects and a necessary intermediate step in computer vision. It pulls the foreground object from the background of an image by estimating the alpha values. However, matting high resolution images can be significantly slow owing to its complexity and to computation that is proportional to the size of the unknown region. 
In order to improve the performance, we implement a parallel alpha matting code with OpenMP from existing sequential code for running on multicore servers. We present and discuss the algorithm and experimental results from the parallel application developer's perspective. The development takes little effort, yet the results show a significant performance improvement of the entire program. Keywords: OpenMP; image matting; multicores; parallel programming. A novel coverless text information hiding method based on double-tags and twice-send   by Xiang Zhou, Xianyi Chen, Fasheng Zhang, Ningning Zheng Abstract: Recently, coverless text information hiding (CTIH) has attracted the attention of an increasing number of researchers because of its high security. However, there are still many problems to be solved, for example the efficiency of retrieval and the hiding capacity. In the existing CTIH methods, the secret information is embedded to be one carrier with one label to ensure the success rate of hiding. In this paper, we propose a novel CTIH method based on double tags and twice-send, in which the double tags in a text are obtained by designing an odd-even adjudgement. A reverse index is first created to improve retrieval efficiency; characters are then transformed into binary numbers, which are employed as the location tags to determine the secret information in the received texts. Finally, the success rate of hiding is improved by sending the document twice. The experimental results show that the proposed method improves the hiding capacity and efficiency compared with existing text CIH algorithms. Keywords: coverless information hiding; double tags; twice-send. Secure search service based on Word2vec in the public cloud   by Yangen Liu, Zhangjie Fu Abstract: Accompanied by the continuous development of information technology, many information domains have grown explosively. 
The users of cloud servers continue to grow as the cloud is flexible and useful to the economy. For data privacy, data owners will encrypt their private data before outsourcing them to the public cloud. How to query the data quickly becomes a new problem when a large amount of encrypted data is stored in the public cloud. Most existing solutions generate index vectors based on dictionaries, but these vectors do not reflect the semantic information of the articles. In this paper, we propose two secure retrieval schemes based on Word2vec (SSSW-1 and SSSW-2). By establishing a model through the Word2vec training method, we generate index vectors directly for keywords extracted from data documents. At the same time, the index vector can also reflect the semantic relationship of the document so as to improve the accuracy of the retrieval. Subsequently, we outsource the encrypted data and index to the public cloud. The cloud server returns documents in the order of similarity scores according to the search request. The cosine measure is used in this paper to calculate the similarity scores. Experiments based on real data show that the two schemes are effective and feasible. Keywords: encrypted cloud data; Word2vec; secure search; cloud computing. Fast CU size decision based on texture-depth relationship for depth map encoding in 3D-HEVC   by Liming Chen, Zhaoqing Pan, Xiaokai Yi, Yajuan Zhang Abstract: Because many advanced encoding techniques are introduced into 3D-HEVC, it achieves higher encoding efficiency than HEVC. However, the encoding complexity of 3D-HEVC increases significantly as these high complexity coding techniques are used. In this paper, a new fast CU size decision algorithm based on texture depth is proposed to reduce the depth map encoding complexity, because there is a strong correlation between the texture and depth maps, including motion characteristics and background regions. 
Both kinds of map tend to choose the same CU depth as their best depth level. By building a one-to-one match for the collocated largest coding unit (LCU), the information of texture encoding can be used to predict the depth level of the depth map. Experimental results have shown that the proposed method can achieve 41.89% time saving on average, with a negligible drop of 0.04 dB on BDPSNR and a small increase of 2.29% on BDBR. Keywords: 3D-HEVC; early termination; CU split; PU mode; depth map. Special Issue on: High-Performance Information Technologies for Engineering Applications Parallel data processing approaches for effective intensive care units with the internet of things   by N. Manikandan, S Subha Abstract: Computerisation in health care is becoming more general, and monitoring Intensive Care Units (ICU) is especially significant and life-critical. Accurate monitoring in an ICU is essential. Failing to take the right decisions at the right time may prove fatal. Similarly, a timely decision can save people's lives in various critical situations. In order to increase the accuracy and timeliness of ICU monitoring, two major technologies can be used, namely parallel processing through vectorisation of ICU data and data communication through the Internet of Things (IoT). With our approach, we can improve efficiency and accuracy in data processing. This paper proposes a parallel decision tree algorithm on ICU data to take faster and more accurate decisions on data selection. The use of parallelised algorithms optimises the process of collecting large sets of patient information. A decision tree algorithm is used for examining and extracting knowledge-based data from large databases. Finalised information will be transferred to the concerned medical experts in cases of medical emergency using the IoT. The decision tree algorithm is parallelised with threads, and output data is stored in local IoT tables for further processing. 
Keywords: medical data processing; internet of things; ICU data; vectorisation; multicore architecture; parallel data processing. Study of runtime performance for Java-multithread PSO on multiCore machines   by Imed Bennour, Monia Ettouil, Rim Zarrouk, Abderrazak Jemai Abstract: Optimisation meta-heuristics, such as Particle Swarm Optimization (PSO), require high-performance computing (HPC). The use of software parallelism and hardware parallelism is mandatory to achieve HPC. Thread-level parallelism is a common software solution for programming on multicore systems. The Java language, which includes important aspects such as its portability and architecture neutrality, its multithreading facilities and its distributed nature, is an interesting language for parallel PSO. However, many factors may impact the runtime performance: the coding styles, the threads-synchronisation levels, the harmony between the software parallelism injected into the code and the available hardware parallelism, the Java networking APIs, etc. This paper analyses the Java runtime performance in handling multithread PSO over general purpose multicore machines and networked machines. Synchronous, asynchronous, single-swarm and multi-swarm PSO variants are considered. Keywords: high-performance computing, particle swarm optimisation, multicore, multithread, performance, simulation. Execution of scientific workflows on IaaS cloud by PBRR algorithm   by S.A. Sundararaman, T. SubbuLakshmi Abstract: Job scheduling of scientific workflow applications in the IaaS cloud is a challenging task. Optimal resource mapping of jobs to virtual machines is calculated considering schedule constraints such as timeline and cost. Determining the required number of virtual machines to execute the jobs is key in finding the optimal schedule makespan with minimal cost. In this paper, the VMPROV algorithm is proposed to find the required virtual machines. 
The priority-based round robin (PBRR) algorithm is proposed for finding the job-to-resource mapping with minimal makespan and cost. The execution times of four real-world scientific application jobs under the PBRR algorithm are compared with those of the MINMIN, MAXMIN, MCT, and round robin algorithms. The results show that the proposed PBRR algorithm can predict the mapping of tasks to virtual machines in a better way than the other classic algorithms. Keywords: cloud job scheduling; virtual machine provisioning; IaaS. Development and evaluation of the cloudlet technology within the Raspberry Pi   by Nawel Kortas, Anis Ben Arbia Abstract: Nowadays, communication devices, such as laptops, computers, smartphones and personal media players, have extensively increased in popularity thanks to the rich set of cloud services that they allow users to access. This paper focuses on setting solutions of network latency for communication devices by the use of cloudlets. This work also proposes a conception of a local datacentre that allows users to connect to their data from any point and through any device by the use of the Raspberry. We also display the performance demonstration results of the resource utilisation rate, the average execution time, the latency, the throughput and the lost packets that provide the big advantage of cloudless application from local and distant connections. Furthermore, we display an evaluation of cloudless by comparing it with similar services and by setting simulation results through the CloudSim simulator. Keywords: cloudlets; cloud computing; cloudless; Raspberry Pi; datacentre; device communication; file-sharing services. DOI: 10.1504/IJCSE.2016.10008320  Special Issue on: Technologies and Applications in the Big Data Era Research on implementation of digital forensics in cloud computing environment   by Hai-Yan Chen Abstract: Cloud computing is a promising next-generation computing paradigm which integrates multiple existing and new technologies. 
With the maturing and wide application of cloud computing technology, more and more crimes are occurring in the cloud computing environment, so effective investigation of evidence against these crimes is extremely important and urgently needed. Because of the characteristics of the virtual computing environment (mass storage and distribution of data, and multi-tenancy), cloud computing sets an extremely hard condition for the investigation of evidence. For this purpose, in this paper, we propose a digital forensics reference model in the cloud environment. First, we divide cloud forensics into four steps and give the implementation scheme for each. Secondly, a cloud platform trusted evidence collection mechanism based on a trusted evidence collection agent is put forward. Finally, methods of using various data mining algorithms in evidence analysis are introduced. The experiment and simulation on real data show the accuracy and effectiveness of the proposed method. Keywords: cloud computing; digital forensics; cloud environment; digital evidence. Special Issue on: Advanced Information Processing in Communication Hybrid genetic, variable neighbourhood search and particle swarm optimisation based job scheduling for cloud computing   by Rachhpal Singh Abstract: In a Cloud Computing Environment (CCE), many scheduling mechanisms have been proposed to balance the load between the given set of distributed servers. The Genetic Algorithm (GA) has been verified to be the best technique to reduce the energy consumed by distributed servers, but it fails to strengthen the exploration in the rising areas. The performance of Particle Swarm Optimisation (PSO) depends on initially selected random particles, i.e. wrongly selected particles may produce poor results. The Variable Neighbourhood Search (VNS) can be used to set the stability of non-local searching and local utilisation for an evolutionary processing period. 
Therefore, this paper proposes a hybrid VNS, GA and PSO, called HGVP, in order to overcome the constraint of a poorly selected initial amount of particles in the case of PSO-based scheduling for CCE. The simulation results of the proposed technique have shown effective results over the available techniques, especially in terms of energy consumption. Keywords: cloud computing environment; job scheduling; particle swarm optimisation; genetic algorithm; variable neighbourhood search. Secured image compression using AES in bandelet domain   by S.P. Raja, A. Suruliandi Abstract: Compression and encryption are jointly used in network systems to improve efficiency and security. A secure and reliable means for communicating images and video is, consequently, indispensable for networks. In this paper, a new methodology is proposed for secure image compression. Initially, a bandelet transform is applied to the input image to obtain coefficients, and kernel matching pursuits (KMP) are used to choose the key bandelet coefficients. The coefficients obtained from the KMP are encrypted using the advanced encryption standard (AES) and encoded using the listless set partitioning embedded block (listless SPECK) image compression encoding technique. For performance evaluation, the peak signal to noise ratio (PSNR), mean square error (MSE), structural similarity index (SSIM) and image quality index (IQI) are used. The experimental results and performance evaluation show that the proposed approach produces high PSNR values and compresses images securely. Keywords: bandelet transform; KMP; AES; listless SPECK. A semantic layer to improve collaborative filtering systems   by Sahraoui Kharroubi, Youcef Dahmani, Omar Nouali Abstract: According to IBM statistics, the internet generates 2.5 trillion items of heterogeneous data on a daily basis. Known as big data, this degrades the performance of search engines and reduces their ability to satisfy requests. 
Filtering systems such as Netflix, eBay, iTunes and others are widely used on the web to select and distribute interesting resources to users. Most of these systems recommend only one kind of resource, which limits the ambitions of their users. In this paper, we propose a hybrid recommendation system that includes a variety of resources (books, films, music, etc.). A similarity process was applied to group users and resources on the basis of appropriate metadata. We have also used a graph data model known as the Resource Description Framework (RDF) to represent the different modules of the system. RDF syntax allows for perfect integration and data exchange via the SPARQL query language. Real data sets are used to perform the experiments, showing promising results in terms of performance and accuracy. Keywords: big data, namespace, rating, relevant item, RDF vocabulary, sparsity, user’s relationship. QoS-aware web service selection based on self-organising migrating algorithm and fuzzy dominance.   by Amal Halfaoui, Fethallah Hadjila, Fedoua Didi Abstract: Web service composition consists of creating a new complex web service by combining existing ones. The selection of composite services is a very complex and challenging task, especially with the increasing number of services offering the same functionality. Web service selection can be considered as a combinatorial problem that focuses on delivering the optimal composition that satisfies the user's requirements (functional and non-functional needs). Several optimisation algorithms have been proposed in the literature to tackle web service selection. In this work, we propose an approach that adapts a recent stochastic optimisation algorithm called the Self Organising Migrating Algorithm (SOMA) for QoS web service selection. Furthermore, we propose a fuzzification of the Pareto dominance and use it to improve SOMA by comparing the services within the local search. 
The proposed approach is applicable to any combinatorial workflow with parallel, choice and loop patterns. We test our algorithm with a set of synthetic datasets and compare it with the most recently used algorithm (PSO). The comparative study shows that SOMA produces promising results and is therefore able to select the user's composition in an efficient manner. Keywords: web service selection; SOMA; fuzzy dominance; swarm-based optimisation algorithms. Fault detection and behavioural prediction of a constrained complex system using cellular automata   by Priya Radha, Elizabeth Sherly Abstract: Functionality-based failure analysis and validation during the design process in a constrained complex system is challenging. In this paper, we advocate a model to validate the functionality of a constrained complex control system with its structural behaviour. An object-constrained model is proposed for validation of any component of a complex system with constraints, and its state of safeness is predicted using cellular automata. The model consists of two sub-systems: an inference engine that functions based on a rule-based expert system, and a failure forecast engine based on cellular automata. The system is tested against a thermal power plant for early detection of failure in the system, which enhances the process efficiency of power generation. Keywords: complex system, constrained objects, cellular automata, control system, prediction engine, failure forecast engine. Distributed diagnosis based on distributed probability propagation nets   by Yasser Moussa Berghout, Hammadi Bennoui Abstract: This paper addresses the problem of modelling uncertainty in the distributed context. It is situated in the field of diagnosis; more precisely, model-based diagnosis of distributed systems. A special focus is given to modelling uncertainty and probabilistic reasoning. 
Thus, this work is based on a probabilistic modelling formalism called "probability propagation nets" (PPNs), which are designed for centralised systems. Hence, an extension of this model is proposed to suit the distributed context. Distributed probability propagation nets (DPPNs), the proposed extension, were conceived to consider the distributed systems' particularities. The system we consider is thus a set of interacting subsystems, each of which is modelled by a DPPN. The interaction among the subsystems is modelled through the firing of common transitions belonging to more than one subsystem. All of that is logically supported by means of probabilistic Horn abductions (PHAs). Furthermore, the diagnostic process is done by exploiting transition-invariants, a diagnostic technique developed for Petri nets. The proposed extension is illustrated through a real-life example. Keywords: model-based diagnosis; distributed systems; probabilistic reasoning; probability propagation nets; probabilistic Horn abduction; Petri nets. Novel automatic seed selection approach for mass detection in mammograms   by Ahlem Melouah, Soumai Layachi Abstract: The success of mass detection using seeded region growing segmentation depends on the seed point selection operation. The seed point is the first point from which the process of aggregation starts. This point must be inside the mass; otherwise, the segmentation fails. There are two principal ways to perform the seed point selection. The first one is manual, performed by a medical expert who manually outlines the point of interest using a pointer device. The second one is automatic; in this case the whole process is performed without any user interaction. This paper proposes a novel approach to automatically select the seed point for further region growing expansion. Firstly, suspicious regions are extracted by a thresholding technique. 
Secondly, the suspicious region whose features match the predefined mass features is identified as the region of interest. Finally, the seed point is placed inside the region of interest. The proposed method is tested using the IRMA database and the MIAS database. The experimental results show the performance and robustness of the proposed method. Keywords: breast cancer; masses detection; mammograms; segmentation; seeded region growing; automatic seed selection; region of interest; features; thresholding. Combining topic-based model and text categorisation approach for utterance understanding in human-machine dialogue   by Mohamed Lichouri, Rachida Djeradi, Amar Djeradi Abstract: In the present paper, we propose an implementation of a system for automatic statement understanding in human-machine communication. The architecture we adopt is based on a stochastic approach that assumes that the understanding of a statement is nothing but a simple theme identification process. Therefore, we present a new theme identification method based on a document retrieval technique, namely text (document) classification [1]. The method we suggest was validated on a basic platform that gives information related to university schooling management (querying a student database), taking into consideration a textual input in French. This method has achieved a theme identification rate of 95% and a correct utterance understanding rate of about 91.66%. Keywords: communication; human-machine dialogue; understanding; utterance; thematic; text classification; topic model. A Manhattan distance based binary bat algorithm vs integer ant colony optimisation for intrusion detection in audit trails.   by Wassila Guendouzi, Abdelmadjid Boukra Abstract: An intrusion detection system (IDS) monitors and analyses security activities occurring in computer or network systems. 
The detection method is the brain of an IDS and it can perform either anomaly-based or misuse-based detection. The misuse mechanism aims to detect predefined attack scenarios in the audit trails, whereas the anomaly detection mechanism aims to detect deviations from normal user behaviour. In this paper, we deal with misuse detection. We propose two approaches to solve the NP-hard security audit trail analysis problem. Both rely on the Manhattan distance measure to improve the intrusion detection quality. The first proposed method, named Enhanced Binary Bat Algorithm (EBBA), is an improvement of the Bat Algorithm (BA) that uses a binary coding and a fitness function defined as the global attack risk. This fitness function is used in conjunction with the Manhattan distance. In this approach, new operators are adapted to the problem at hand: solution transformation, vertical permutation and horizontal permutation. The second proposed approach, named Enhanced Integer Ant Colony Optimisation (EIACS), is a combination of two metaheuristics: Ant Colony System (ACS), which uses a new pheromone update method, and Simulated Annealing (SA), which uses a new neighbourhood generation mechanism. This approach uses an integer coding and a new fitness function based on the Manhattan distance measure. Experiments on different problem sizes (small, medium and large) are carried out to evaluate the effectiveness of the two approaches. The results indicate that for small and medium sizes the two algorithms have similar performance in terms of detection quality. For large problem sizes, EIACS performs better than EBBA. Keywords: intrusion detection; security audit trail analysis; combinatorial optimisation problem; NP-hard; Manhattan distance; bat algorithm; ant colony system; simulated annealing. 
An approach for managing the dynamic reconfiguration of software architectures   by Abdelfetah Saadi, Mourad Chabane Oussalah, Abderrazak Henni Abstract: Currently, most software systems have a dynamic nature and need to evolve at runtime. The dynamic reconfiguration of software systems is a mechanism that must be dealt with to enable the creation and destruction of component instances and their links. To reconfigure a software system, it must be stopped, patched and restarted; this causes unavailability periods which are always a problem for highly available systems. In order to address these problems, this paper presents an approach called software architecture reconfiguration approach (SAREA). We define for this approach a set of intelligent agents, each of which has a precise role in the functioning and control of the software. Our approach implements a mechanism for restoring the software architecture to a fully functional state after the failure of one or more reconfiguration operations; it also proposes a reconfiguration mechanism which describes the execution process of reconfigurations. Keywords: software architecture; dynamic reconfiguration; evolution; intelligent agents; component model; model driven architecture; MDA; meta-model. Special Issue on: New Techniques for Secure Internet and Cloud Computation Self and social network behaviours of users in cultural spaces   by Angelo Chianese, Salvatore Cuomo, Pasquale De Michele, Francesco Piccialli Abstract: Many cultural spaces offer their visitors the use of ICT tools to enhance their visit experience. Data collected within such spaces can be analysed in order to discover hidden information related to visitors' behaviours and needs. In this paper, a computational model inspired by neuroscience simulating the personalised interactions of users with cultural heritage objects is presented. 
We compare a strengthened validation approach for neural networks based on classification techniques with a novel one based on clustering strategies. Such approaches allow us to identify natural user groups in data and to verify the model responses in terms of user interests. Finally, the presented model has been extended to simulate social behaviours in a community, through the sharing of interests and opinions related to cultural heritage assets. This data propagation has been further analysed in order to reproduce applicative scenarios on social networks. Keywords: social network; clustering techniques; cultural heritage; internet of things; user behaviours A perspective on applications of in-memory and associative approaches supporting cultural big data analytics   by Francesco Piccialli, Angelo Chianese Abstract: Business intelligence, advanced analytics, big data, in-memory database and associative technologies are the key enablers for enhanced business decision-making. In this paper, we provide a perspective on applications of in-memory approaches supporting analytics in the field of Cultural Heritage (CH), applied to information resources including structured and unstructured contents, geo-spatial and social network data, multimedia, multiple domain vocabularies, classifiers and ontologies. The proposed approach is implemented in an information system exploiting associative in-memory technologies in a cloud context, as well as integrating semantic technologies for merging and analysing information coming from heterogeneous sources. We analyse and describe the application of this system to trace a behavioural and interest profile of users and visitors for cultural events (exhibitions, museums, etc.) and territories (tourist areas and routes including cultural resources, historical downtowns, archaeological sites). 
The results of ongoing experimentation encourage a business intelligence approach that is suitable for supporting CH asset crowdsourcing, promotion, publication, management and usage. Keywords: in-memory database systems, big data, social analytics, business intelligence, cultural heritage, internet of things. Data security and privacy information challenges in cloud computing   by Weiwei Kong, Yang Lei, Jing Ma Abstract: Cloud computing has become a hotspot in the area of information technology. However, while enjoying its convenience and strong data processing ability, we also find that great challenges arise in terms of data security and privacy information protection. In this paper, a summary of the current security and privacy information challenges is presented. The current security measures are summarised as well. Keywords: cloud computing; data security; privacy information; cloud computing provider TERS: a traffic-efficient repair scheme for repairing multiple losses in erasure-coded distributed storage systems   by Zheng Liming Abstract: Erasure coding has received considerable attention owing to its better trade-off between space efficiency and reliability. However, the high repair traffic and the long repair time of erasure coding have posed a new challenge: how to minimise the amount of data transferred among nodes and reduce the repair time when repairing the lost data. Existing schemes are mostly designed for single node failures, which incur high network traffic and result in low efficiency. In this paper, we propose a traffic-efficient repair scheme (TERS) suitable for repairing data losses when multiple nodes fail. TERS reduces the repair traffic by using the overlap of data accessing and computation between node repairs. To reduce the repair time, TERS uses multiple threads during the computation, and pipelines the data transmission during the repair. 
To evaluate the repair cost and the repair time, we provide an implementation that integrates TERS into HDFS-RAID. The numerical results confirm that TERS reduces the repair traffic by 44% on average compared with the traditional erasure codes and regenerating codes. Theoretical analysis shows that TERS effectively reduces the repair time. Moreover, the experimental results show that compared with current typical repair methods, such as TEC, MSR and TSR, the repair time of TERS is reduced by 25%, 20% and 16%, respectively. Keywords: distributed storage; erasure coding; repair traffic; repair time; multiple losses. A sound abstract memory model for static analysis of C programs   by Yukun Dong Abstract: The abstract memory model plays an important role in the static analysis of programs. This paper proposes a region-based symbolic three-valued logic (RSTVL) to guarantee the soundness of static analysis, which uses abstract regions to simulate blocks of the concrete memory. RSTVL applies symbolic expressions to express the value of memory objects, and the interval domain to describe the value of each symbol of symbolic expressions. Various operations for memory objects can be mapped to operations about regions. RSTVL can describe the shape information of data structures in memory, the storage state of memory objects, and a variety of associative addressable expressions, including points-to relations, hierarchical relations and valued logic relations. We have built a prototype tool DTSC_RSTVL that detects code-level defects in C programs. Five popular C programs are analysed, and the results indicate that the analysis is sufficiently sound to detect code-level defects with a zero false negative rate. Keywords: software quality; static analysis; abstract memory model; memory object; defect detection. 
Load balancing algorithm based on multiple linear regression analysis in multi-agent systems   by Xiao-hui Zeng Abstract: With the increase in the number of agents involved in applications of multi-agent systems (MAS), the problem of load balancing is increasingly prominent. This paper proposes a novel load balancing algorithm based on multiple linear regression analysis (LBAMLR). By using parallel computing on all servers and using partial information about agent communication, our algorithm can effectively choose the optimal agents' set and the suitable destination servers. The simulation results show our proposed algorithm can shorten the computing time and increase the total performance in MAS. Keywords: distributed computing; multi-agent systems; load balancing; multiple linear regression analysis. DOI: 10.1504/IJCSE.2016.10008246  Special Issue on: Computational Imaging and Multimedia Processing Underwater image segmentation based on fast level set method   by Yujie Li, Huiliang Xu, Yun Li, Huimin Lu, Seiichi Serikawa Abstract: Image segmentation is a fundamental process in image processing that has found application in many fields, such as neural image analysis and underwater image analysis. In this paper, we propose a novel fast level set method (FLSM)-based underwater image segmentation method to improve the traditional level set methods by avoiding the calculation of the signed distance function (SDF). The proposed method can speed up the computation without re-initialisation. We also provide a fast semi-implicit additive operator splitting (AOS) algorithm to improve the computational complexity. The experiments show that the proposed FLSM performs well in selecting local or global segmentation regions. 
Keywords: underwater imaging; level set; image segmentation Pseudo Zernike moments based approach for text detection and localisation from lecture videos   by Soundes Belkacem, Larbi Guezouli, Samir Zidat Abstract: Text information embedded in videos is an important clue for retrieval and indexation of images and videos. Scene text presents challenging characteristics mainly related to acquisition circumstances and environmental changes, resulting in low-quality videos. In this paper, we present a scene text detection algorithm based on Pseudo Zernike Moments (PZMs) and stroke features from low resolution lecture videos. The algorithm mainly consists of three steps: slide detection, text detection and segmentation, and non-text filtering. In lecture videos, the slide region is a key object carrying almost all the important information; hence the slide region has to be extracted and segmented from other scene objects considered as background for later treatments. Slide region detection and segmentation is done by applying PZMs based on RGB frames. Text detection and extraction is performed using PZM segmentation over the V channel of the HSV colour space, and then the stroke feature is used to filter out non-text regions and remove false positives. PZMs are powerful shape descriptors; they present several strong advantages such as robustness to noise, rotation invariance, and multilevel feature representation. The PZM-based segmentation process consists of two steps: feature extraction and clustering. First, a video frame is partitioned into equal size windows, then the coordinates of each window are normalised to a polar system, then PZMs are computed over the normalised coordinates as region descriptors. Finally, a clustering step using K-means is performed in which each window is labelled as a text or non-text region. The algorithm is shown to be robust to illumination, low resolution and uneven luminance from compressed videos. 
The effectiveness of the PZM description leads to very few false positives compared with other approaches. Moreover, resultant images can be used directly by OCR engines and no further processing is needed. Keywords: text localisation, text detection, pseudo Zernike moments, slide region detection. Special Issue on: Advanced Cooperative Computing Towards optimisation of replicated erasure codes for efficient cooperative repair in cloud storage systems   by Guangping Xu, Qunfang Mao, Huan Li Abstract: The study of erasure codes in distributed storage systems has two aspects: one is to reduce the data redundancy and the other is to save bandwidth cost during the repair process. Repair-efficient codes are investigated to improve the repair performance. However, the research is mostly at the theoretical stage and has hardly been applied in practical distributed storage systems such as cloud storage. In this paper, we present a unified framework to describe some repair-efficient regenerating codes in order to reduce the bandwidth cost in regenerating the lost data. We build an evaluation system to measure the performance of these codes during file encoding, file decoding and individual failure repairing with given feasible parameters. Through experimental comparison and analysis, we validate that the repair-efficient regenerating codes can save significantly more repair time than traditional erasure codes during the repair process at the same storage cost; in particular, some replication-based erasure codes can perform better than others in some cases. Our experiments can help researchers to decide which kind of erasure codes to use in building distributed storage systems. 
Keywords: erasure codes; distributed storage systems; data recovery; repair-efficient codes A new model of vehicular ad hoc networks based on artificial immune theory   by Yizhe Zhou, Depin Peng Abstract: Vehicular ad hoc networks (VANETs) are highly mobile and wireless networks intended to aid vehicular safety and traffic monitoring. To achieve these goals, we propose a VANET model based on immune network theory. Our model outperforms the Delay Tolerant Mobility Sensor Network (DTMSN) model over a range of node numbers in terms of data packet arrival delay, arrival ratio, and throughput. These findings held true for the on-demand distance vector and connection-based restricted forwarding routing protocols. The model performed satisfactorily on a real road network. Keywords: networking model; vehicular ad hoc networks; artificial immune theory; real-time capacity. Feature binding pulse-coupled neural network model using a double color space   by Hongxia Deng, Han Li, Sha Chang, Jie Xu, Haifang Li Abstract: The feature binding problem is one of the central issues in cognitive science and neuroscience. To implement a bundled identification of colour and shape of one colour image, a double-space vector feature binding PCNN (DVFB-PCNN) model was proposed based on the traditional pulse-coupled neural network (PCNN). In this model, the method of combining RGB colour space with HSI colour space successfully solved the problem that all colours can not always be separated completely. Through the first pulse emission time of the neurons, the different characteristics were separated successfully. Through the colour sequence produced by this process, the different characteristics belonging to the same perceived object were bound together. Experiments showed that the model can successfully achieve separation and binding of image features and will be a valuable tool for PCNN in the feature binding of colour images. Keywords: feature binding; double-space; pulse emission time. 
Signal prediction based on boosting and decision stump   by Lei Shi Abstract: Signal prediction has attracted more and more attention from the data mining and machine learning communities. A decision stump is a one-level decision tree, and it classifies instances by sorting them based on feature values. Boosting is a powerful ensemble method and can improve the performance of prediction significantly. In this paper, boosting and the decision stump algorithm are combined to analyse and predict the signal data. An experimental evaluation is carried out on the public signal dataset and the experimental results show that the boosting and decision stump-based algorithm improves the performance of signal prediction significantly. Keywords: decision stump; boosting; signal prediction. DOI: 10.1504/IJCSE.2016.10006637  A matching approach to business services and software services   by Junfeng Zhao Abstract: Recent studies have shown that Service-Oriented Architecture (SOA) has the potential to revive enterprise legacy systems [1-10], making their continued service in the corporate world viable. In the process of reengineering legacy systems to SOA, some software services extracted from the legacy system can be reused to implement business services in target systems. In order to achieve efficient reuse of software services, a matching approach is proposed to extract the software services related to specified business services, where service semantics and structure similarity measures are integrated to evaluate the similarity degree between business service and software services. Experiments indicate that the approach can efficiently map business services to relevant software services, and then legacy systems can be reused as much as possible. 
Keywords: software service; business service; matching approach; semantics similarity measure; structure similarity measure. DOI: 10.1504/IJCSE.2016.10009239  Using online dictionary learning to improve Bayer pattern image coding   by Tingyi Zheng, Li Wang Abstract: Image quality is a fundamental concern in image compression. Noise arises in the image compression process and may prevent users from identifying image content precisely. It has therefore generally been neglected in past image compression research. In fact, noise plays a beneficial role in image reconstruction. In this paper, we take noise into consideration and propose a coding method for Bayer pattern images based on online dictionary learning. Investigations show that the proposed method can improve the rate-distortion performance of Bayer pattern image coding at any rate. Keywords: Bayer pattern image; online dictionary learning; rate distortion. Special Issue on: ICNC-FSKD'15 Machine Learning, Data Mining and Knowledge Management An improved ORNAM representation of gray images   by Yunping Zheng, Mudar Sarem Abstract: An efficient image representation can save space and facilitate the manipulation of the acquired images. In order to further enhance the reconstructed image quality and reduce the number of the homogeneous blocks of the overlapping rectangular non-symmetry and anti-packing model (ORNAM) representation, in this paper we propose an improved overlapping rectangular non-symmetry and anti-packing model representation (IORNAM) of gray images. Compared with most of the up-to-date and the state-of-the-art hierarchical representation methods, the new IORNAM representation is characterised by two properties. (1) It adopts a ratio parameter of the length and the width of a homogeneous block to improve the reconstructed image quality. 
(2) It uses a new expansion method to anti-pack the subpatterns of gray images to further decrease the number of homogeneous blocks, which is important for improving the compression ratios of image representation and reducing the complexities of many image manipulation algorithms. The experimental results presented in this paper demonstrate that (1) the new IORNAM representation is able to achieve high representation efficiency for gray images and (2) the new IORNAM representation outperforms most of the up-to-date and the state-of-the-art hierarchical representation methods of gray images. Keywords: gray image representation; extended Gouraud shading approach; overlapping rectangular NAM; ORNAM; spatial data structures; S-Tree coding; spatial- and DCT-based. Genetic or non-genetic prognostic factors for colon cancer classification   by Meng Pan, Jie Zhang Abstract: Many studies have addressed patient classification using prognostic factors or gene expression profiles (GEPs). This study tried to identify whether a prognostic factor was genetic by using GEPs. If a significant GEP difference was observed between two statuses of a factor, the factor might be genetic. If the GEP difference was largely less significant than the survival difference, the survival difference might not be due to the genes; then, the factor might be non-genetic or partly non-genetic. A case study was carried out using the public dataset GSE40967, which contains GEP data of 566 colon cancer patients, along with tumor-node-metastasis (TNM) staging information, etc. Prognostic factors T, N, M, and TNM were observed to be non-genetic or partly non-genetic, and should therefore complement future gene expression classifiers. 
Keywords: gene expression profiles; prognostic factor; colon cancer; classification; survival A medical training system for the operation of heart-lung machine   by Ren Kanehira Abstract: There has been a strong tendency to use Information Communication Technology (ICT) to construct various education/training systems to help students or other learners master necessary skills more easily. Among such systems, those that offer operational practice are particularly welcome, in addition to conventional e-learning systems that mainly provide textbook-like knowledge. In this study, we propose a medical training system for the operation of a heart-lung machine. Two training contents, one for basic operation and another for troubleshooting, are constructed in the system and their effects are tested. Keywords: computer-aided training; skill science; medical training; heart-lung machine; operation supporting; e-learning; clinic engineer. Special Issue on: BDA 2014 and 2015 Conferences and DNIS 2014 and 2015 Workshops Data Modelling and Information Infrastructure in Big Data Analytics Automatic identification and classification of Palomar Transient Factory astrophysical objects in GLADE   by Weijie Zhao, Florin Rusu, John Wu, Peter Nugent Abstract: Palomar Transient Factory is a comprehensive detection system for the identification and classification of transient astrophysical objects. The central piece in the identification pipeline is represented by an automated classifier that distinguishes between real and bogus objects with high accuracy. The classifier consists of two components, namely real-time and offline. Response time is the critical characteristic of the real-time component, whereas accuracy is representative for the offline in-depth analysis. In this paper, we make two significant contributions. 
First, we present an experimental study that evaluates a novel implementation of the real-time classifier in GLADE, a parallel data processing system that combines the efficiency of a database with the extensibility of Map-Reduce. We show how each stage in the classifier - candidate identification, pruning, and contextual realbogus - maps optimally into GLADE tasks by taking advantage of the unique features of the system: range-based data partitioning, columnar storage, multi-query execution, and in-database support for complex aggregate computation. The result is an efficient classifier implementation capable of processing a new set of acquired images in a matter of minutes, even on a low-end server. For comparison, an optimised PostgreSQL implementation of the classifier takes hours on the same machine. Second, we introduce a novel parallel similarity join algorithm for advanced transient classification. This algorithm operates offline and considers the entire candidate dataset consisting of all the objects extracted over the lifetime of the Palomar Transient Factory survey. We implement the similarity join algorithm in GLADE and execute it on a massive supercomputer with more than 3000 threads. We achieve more than three orders of magnitude improvement over the optimised PostgreSQL solution. Keywords: parallel databases; multi-query processing; scientific data analysis; similarity join; astronomical surveys; transient identification Trust and reputation based multi-agent recommender system   by Punam Bedi, Sumit Agarwal, Richa Singh Abstract: User profile modelling for the domain of tourism is different from most of the other domains, such as books or movies. The structure of a tourist product is more complex than a movie or a book. Moreover, the frequency of activities and ratings in the tourism domain is also smaller than the other domains. To address these challenges, this study proposes a Trust and Reputation based Collaborative Filtering (TRbCF) algorithm. 
It adds a notion of dynamic trust between users and reputation of items to an existing collaborative approach for generating relevant recommendations. A Multi-Agent Recommender System for e-Tourism (MARST) for recommending tourism services using the TRbCF algorithm is designed and a prototype is developed. TRbCF also helps to handle the new user cold-start problem. The developed system can generate recommendations for hotels, places to visit and restaurants at a single place, whereas most of the existing recommender systems focus on one service at a time. Keywords: multi-agent system, recommender system, e-tourism, trust, reputation Anomaly-free search using multi-table entity attribute value data model   by Shivani Batra, Shelly Sachdeva Abstract: This paper proposes a principled extension of Dynamic Tables (DT). It is termed the Multi-Table Entity Attribute Value (MTEAV) model, which offers a search-efficient avenue for storing a database. The paper presents precise semantics of MTEAV and demonstrates the following aspects: (1) MTEAV possesses consistency and availability; (2) MTEAV outperforms other existing models (Entity Attribute Value Model, Dynamic Tables, Optimized Entity Attribute Value and Optimized Column Oriented Model) under various query scenarios and varying dataset sizes; (3) MTEAV retains the flavour of EAV in terms of handling sparseness and self-adapting schema-changing capability. To heighten the adaptability of MTEAV, a translation layer is implemented over an existing SQL engine in a non-intrusive way. The translation layer transforms a conventional SQL query (as per horizontal row representation) to a new SQL query (as per MTEAV structure) to maintain user friendliness. The translation layer makes users feel as if they are interacting with the conventional horizontal row approach. The paper also critically analyses the maximum percentage of non-null density appropriate for choosing MTEAV as a storage option. 
Keywords: database, dynamic tables, entity attribute value model, optimised entity attribute value, optimised column-oriented model, search efficiency, storage efficiency. Secure k-objects selection for a keyword query based on MapReduce skyline algorithm   by Asif Zaman, Md. Anisuzzaman Siddique, Annisa, Yasuhiko Morimoto Abstract: The keyword query interface has become a de facto standard in information retrieval and such systems have been used by the community for decades. The user gives a keyword, and objects that are closely related to that keyword are returned to the user. The process of selecting necessary objects for a keyword query has been considered as one of the most important query problems. Top-k query is one of the popular methods to select important objects from a large number of candidates. A user specifies scoring functions and k, the number of objects to be retrieved. Based on the user's scoring function, k objects are then selected by the top-k query. However, users' scoring functions may not be identical, which implies that the top-k objects are valuable only for users whose scoring functions are similar. Meanwhile, the privacy of data during the selection processing is also a burning issue. In some cases, especially in multi-party computing, parties may not want to disclose any information during the processing. In this paper, we propose a k-object selection procedure that selects k varied objects that are preferable for all users whose scoring functions are not identical. During the selection of k-objects, the proposed method prevents disclosures of sensitive values. The idea of skyline and top-k query along with perturbed cipher has been used to select the k objects securely. We propose such efficient secure computation by using the MapReduce framework. Keywords: skyline query; top-k query; data privacy; MapReduce; mobile phone interface. 
High performance adaptive traffic control for efficient response in vehicular ad hoc networks   by Vinita Jindal, Punam Bedi Abstract: Nowadays, with the invention of CUDA, a parallel computing platform and programming model, there is a dramatic increase in computing performance by harnessing the power of the GPU. GPU computing with CUDA can be used to find efficient solutions for many real-world complex problems. One such problem is the traffic signal control problem, which takes care of conflicting movements at the intersections to avoid accidents and ensure smooth flow of traffic in a safe and efficient manner. The Adaptive Traffic Control (ATC) algorithm is used in the literature to reduce the average queue length at the intersections. This algorithm has a serial implementation on a single CPU and hence takes a long computation time. In this paper, we propose a high performance ATC for providing efficient responses and hence reducing average queue length, which results in a decrease in the overall waiting time at the intersections. We tested our proposed approach with varying numbers of vehicles for two real world networks. The performance of the proposed algorithm is compared with its serial counterpart. Keywords: VANETs; GPU; CUDA; adaptive control; traffic signals. Smart city workflow patterns for qualitative aggregate information retrieval from distributed public information resources   by Wanming Chu Abstract: We examine a workflow pattern system for public information from multiple resources. This system aggregates timetable information from bus companies, city information from the internet, and the public facilities map of the city, to generate geographic data. Multiple query methods are used to obtain the target information. For example, one of the search results can be set as the origin or destination of a bus route. Next, the shortest bus route with the minimum number of bus stops between the origin and destination can be found by using the bus routing function. 
The query results and the shortest bus route are visualised on the embedded map. The detailed search information is shown in the side-bar. This system finds city information and transportation routes. It is helpful for residents and visitors. They can use the city public transportation more efficiently for their daily life, business, and travel planning. Keywords: GIS; query interface; routing query over heterogeneous information resources. Computational intelligence methods for data mining of causality extent in time series   by Lukas Pichl, Taisei Kaizoji Abstract: Data mining of causality extent in the time series of economic data is an important area of computational intelligence research with direct applications to algorithmic trading or risk diversification strategies. Based on the particular market and the time scale employed, the causal rates are expected to vary widely. In this work we adopt the Support Vector Machine (SVM) and Artificial Neural Network (ANN) for causality rate extraction. The dataset records all details of the futures contracts on the commodity of gasoline traded in Japan. By sampling the tick data at 1 min, 5 min, 10 min, 30 min, 1 hour and 1 day scales, we derive time series of varying causal degree. Trend predictions are computed by using the SVM binary classifier trained on 66.6% of the data using a five-step-back moving window, which samples the log returns as the predictor data. From the testing data we extract varying rates of causality degree, starting from the borderline of 50% up to the order of 60% in rare cases. The trend prediction analysis is complemented by the ANN method with four hidden layers. We find that whereas the SVM outperforms the ANN in most cases, the opposite may also be true on occasions. In general, whereas considerable causality rates are observed at some high-frequency sampled data segments, returns at the longer time scales are predictable to a lesser extent. 
Overall, the market of the gasoline futures in Japan is found to be rather close to the efficient market hypothesis in comparison with other commodities markets. Keywords: financial futures; artificial neural network; support vector machine; trend prediction; causality extraction. A dataflow platform for applications based on linked data   by Miguel Ceriani, Paolo Bottoni Abstract: Modern software applications increasingly benefit from accessing the multifarious and heterogeneous Web of Data, thanks to the use of web APIs and linked data principles. In previous work, the authors proposed a platform to develop applications consuming linked data in a declarative and modular way. This paper describes in detail the functional language the platform gives access to, which is based on SPARQL (the standard query language for linked data) and on the dataflow paradigm. The language features interactive and meta-programming capabilities so that complex modules/applications can be developed. By adopting a declarative style, it favours the development of modules that can be reused in various specific execution contexts. Keywords: linked data; Semantic Web; SPARQL; RDF; dataflow; declarative programming. Special Issue on: ICICS 2016 Next Generation Information and Communication Systems Is a picture worth a thousand words? A computational investigation of the modality effect   by Naser Al Madi, Javed Khan Abstract: The modality effect is a term that refers to differences in learning performance in relation to the mode of presentation. It is an interesting phenomenon that impacts education, online-learning, and marketing, among many other areas of life. In this study, we use Electroencephalography (EEG Alpha, Beta, and Theta) and computational modelling of comprehension to study the modality effect in text and multimedia. First, we provide a framework for evaluating learning performance, working memory, and emotions during learning. 
Second, we apply these tools to investigate the modality effect computationally focusing on text in contrast to multimedia. This study is based on a dataset that we have collected through a human experiment involving 16 participants. Our results are important for future learning systems that incorporate learning performance, working memory, and emotions in a continuous feedback system that measures and optimises learning during and not after learning. Keywords: modality effect; comprehension; electroencephalography; learning; education; text; multimedia; semantic networks; recall; emotions. Automated labelling and severity prediction of software bug reports   by Ahmed Otoom, Doaa Al-Shdaifat, Maen Hammad, Emad Abdallah, Ashraf Aljammal Abstract: We target two research problems that are related to bug tracking systems: bug severity prediction and automated bug labelling. Our main aim is to develop an intelligent classifier that is capable of predicting the severity and label (type) of a newly submitted bug report through a bug tracking system. For this purpose, we build two datasets that are based on 350 bug reports from the open-source community (Eclipse, Mozilla, and Gnome). These datasets are characterised by various textual features that are extracted from the summary and description of bug reports of the aforementioned projects. Based on this information, we train a variety of discriminative models that can be used for automated labelling and severity prediction of a newly submitted bug report. A boosting algorithm is also implemented for an enhanced performance. The classification performance is measured using accuracy and a set of other measures including: precision, recall, F-measure and the area under the Receiver Operating Characteristic (ROC) curve. For automated labelling, the accuracy reaches around 91% with the AdaBoost algorithm and cross-validation test. 
On the other hand, for severity prediction, our results show that the proposed feature set has proved successful with a classification performance accuracy of around 67% with the AdaBoost algorithm and cross-validation test. Experimental results with the variation of training set size are also presented. Overall, the results are encouraging and show the effectiveness of the proposed feature sets. Keywords: severity prediction; software bugs; machine learning; bug labelling. Special Issue on: Recent Innovations in Cloud Computing and Big Data Allocation of energy-efficient tasks in cloud using dynamic voltage frequency scaling   by Sambit Kumar Mishra, Md Akram Khan, Sampa Sahoo, Bibhudatta Sahoo, Sanjay Kumar Jena Abstract: Nowadays, the expanding computational capabilities of the cloud system rely on the minimisation of the absorbed power to make them sustainable and economically productive. Power management of cloud data centres has received great attention from industry and academia since their operational cost is expensive owing to their high energy consumption. Issues about the distribution of energy in the system, such as energy saving and energy consumption, have been found to be crucial. One of the core approaches for the conservation of energy in the cloud data centres is task scheduling. This task allocation in a heterogeneous environment is a well-known NP-hard problem, which has led researchers to propose various heuristic techniques for it. In this paper, a technique is proposed based on dynamic voltage frequency scaling (DVFS) for optimising the energy consumption in the cloud environment. The basic idea is to strike a compromise between energy consumption and the configuration of different types of hosts or virtual machines. Here, we formally describe a model that includes various subsystems and assess the implementation of the algorithm in the heterogeneous environment. 
The resulting analysis is discussed after comparing the proposed method with some standard algorithms. Keywords: cloud computing; big data; dynamic voltage frequency scaling; task allocation; energy consumption; virtual machine; virtualisation. Special Issue on: ISTA'16 Intelligent Systems Technologies and Applications FS-CARS: fast and scalable context-aware news recommender system using tensor factorisation   by Anjali Gautam, Punam Bedi Abstract: Matrix factorisation is a widely adopted approach to collaborative filtering that factorises the user-item rating matrix to generate recommendations. The user-item rating matrix can be extended to incorporate users' context, resulting in a rating tensor that can be factorised to generate better quality context-aware recommendations. Tensor factorisation is a computationally intensive task, and computational time can be significantly reduced using a distributed and scalable framework. This paper proposes a context-aware news recommender system that classifies news items into different categories and incorporates the users' context, resulting in a rating tensor that is then factorised to generate recommendations. The news items are highly dynamic and are generated in large numbers, which can further greatly increase the computational time. To stabilise the computation time of the process, the proposed system is implemented on a distributed and scalable framework of Apache Spark using the MLlib library. The proposed recommender system is evaluated for performance and computational time. Keywords: context-aware RS; tensor factorisation; matrix factorisation; Apache Spark. Breast abnormality detection using combined texture and vascular features   by Sourav Pramanik, Debotosh Bhattacharjee, Mita Nasipuri Abstract: This work presents an integrated approach that combines texture and vascular features for distinguishing malignancy and benignity of breast abnormalities using thermal breast images. 
Typically, the asymmetric isothermal pattern between right and left breasts in the thermal breast image is an indicator of the presence of an abnormality. Therefore, we have investigated the potential of the proposed integrated feature set in asymmetry analysis. A local texture descriptor, called block variance (BV), is used here to extract the texture features. Block variance (BV) uses the variation of intensities in a local region to identify the contrast-texture in the thermal breast image. On the other hand, thermo-vascular pattern based features are identified by using a series of morphological operations. Then, these two feature sets are fused together to make a final feature vector. A five-layer, feed forward, back propagation neural network (FBNN) has been implemented here as a classifier. A dataset containing 60 benign and 40 malignant cases of DMR-IR database is used here for the evaluation of the system performance. The effectiveness of the proposed fused feature set is compared against the feature sets used by Acharya et al. (2012) and Sathish et al. (2016) in terms of classification accuracy, sensitivity, specificity, PPV, and NPV. We have also investigated the potential of the lateral view breast thermal images in conjunction with a frontal view for the diagnosis of the breast abnormalities. Experimental results have shown that the proposed method detected malignant cases with 94% accuracy, while benign cases are detected with 100% accuracy. The overall system accuracy is obtained as 97.2%, which is comparatively better than other existing methods. Keywords: thermal breast image; texture feature; vascular feature; FBNN; lateral view breast thermogram. 
Missing value imputation in DNA microarray gene expression data: a comparative study of an improved collaborative filtering method with decision tree based approach   by Sujay Saha, Anupam Ghosh, Saikat Bandopadhyay, Kashi Nath Dey Abstract: A DNA microarray is used to study the expression levels of thousands of genes under various conditions simultaneously. Gene expression profiles generated by the high-throughput microarray experiments are usually in the form of large matrices with high dimensions. Unfortunately, microarray experiments can generate datasets with multiple missing values, which significantly affects the performance of subsequent statistical analysis and machine learning algorithms. Several algorithms already exist to estimate those missing values. In this work, we first propose a modification to the existing imputation approach named Collaborative Filtering Based on Rough-Set Theory (CFBRST) (Wang and Tseng, 2012). This proposed approach (CFBRSTFDV) uses a Fuzzy Difference Vector (FDV) along with rough set based collaborative filtering to analyse historical interactions and helps to estimate the missing values. This is a suggestion-based system that works on the principle of how the suggestion of items or products arrives at an individual while using Facebook, Twitter or looking for books on Amazon. Later on, we also propose a decision tree based approach combined with a genetic algorithm (GADTreeImpute) to impute the same missing values. We have applied our proposed algorithms on three benchmark datasets: the yeast gene expression dataset of Spellman et al. (1998), the human tumour cell dataset (GDS2932) and the human prostate cancer dataset (GDS4824). We have first compared the performances of these two proposed approaches along with some existing state-of-the-art methods by using an RMSE measure. 
Later, the estimation is also validated using a classification process, and the performance is measured by metrics such as the percentage of classification accuracy, precision, recall, etc. Experiments show that the proposed approaches outperform those existing methods, particularly when we increase the number of missing values. Keywords: missing value estimation; DNA microarray; collaborative filtering; fuzzy set theory; rough set theory; decision tree; genetic algorithm. Influence maximisation in social networks   by P.V. Bindu, V. Tejaswi, P. Santhi Thilagam Abstract: Social networks have become a strong means of communication in the past decade owing to the large number of mobile users and easily accessible internet connectivity. Social network analysis deals with studying the structure, relationship and other attributes of the network that help to provide solutions to real world problems. Some of the significant research areas under social network analysis include recommendation systems, link prediction, community detection, and influence maximisation. Influence maximisation helps in finding a few influential entities from large social networks that can be used in marketing, election campaigns, outbreak detection, and so on. Influence maximisation deals with the problem of finding a subset of nodes, called seeds, in the given social network such that it will eventually spread maximum influence in the network. This is an NP-hard problem. The aim of this paper is to provide a complete understanding of the influence maximisation problem. This paper focuses on providing a complete survey on the influence maximisation problem and covers three major aspects: i) different types of input required; ii) influence propagation models that map the spread of influence in the network; and iii) the approximation algorithms suggested for seed set selection. We also provide the state of the art and describe the open problems in this domain. 
Keywords: social networks; influence maximisation; information diffusion; approximation algorithms. Special Issue on: Novel Strategies for Programming Accelerators Evaluating attainable memory bandwidth of parallel programming models via BabelStream   by Tom Deakin, James Price, Matt Martineau, Simon McIntosh-Smith Abstract: Many scientific codes consist of memory bandwidth bound kernels - the dominating factor of the runtime is the speed at which data can be loaded from memory into the arithmetic logic units, before results are written back to memory. One major advantage of many-core devices such as General Purpose Graphics Processing Units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievable in practice and so benchmarks are required to measure a practical upper bound on expected performance. We augment the standard set of STREAM kernels with a dot product kernel to investigate the performance of simple reduction operations on large arrays. Such kernels are usually present in scientific codes and are still memory-bandwidth bound. The choice of one programming model over another should ideally not limit the performance that can be achieved on a device. BabelStream (formerly GPU-STREAM) has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such this tool can be used as a kind of 'Rosetta Stone' that provides both a cross-platform and cross-programming model array of results of achievable memory bandwidth. Keywords: performance portability; many-core; parallel programming models; memory bandwidth benchmark. 
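To make the kernel set mentioned in the abstract above concrete, here is a rough NumPy sketch of the four usual STREAM kernels plus a dot product, with a crude bandwidth estimate for each. It is not BabelStream itself: the function name `stream_kernels`, the array size, the initial values and the scalar are assumptions chosen for illustration, and interpreted NumPy timings say little about what a tuned implementation can reach.

```python
import time
import numpy as np


def stream_kernels(n: int = 1_000_000, scalar: float = 0.4):
    """Run copy, mul, add, triad and dot once each on arrays of length n.

    Returns (bandwidths, dot_value), where bandwidths maps each kernel
    name to an estimated GB/s figure based on the arrays it must touch.
    """
    # Assumed initial values; any non-trivial constants would do.
    a = np.full(n, 0.1)
    b = np.full(n, 0.2)
    c = np.zeros(n)
    dot_value = 0.0

    def copy():   # c = a
        np.copyto(c, a)

    def mul():    # b = scalar * c
        np.multiply(c, scalar, out=b)

    def add():    # c = a + b
        np.add(a, b, out=c)

    def triad():  # a = b + scalar * c, done in two passes to avoid a temporary
        np.multiply(c, scalar, out=a)
        np.add(a, b, out=a)

    def dot():    # reduction: sum(a * b)
        nonlocal dot_value
        dot_value = float(np.dot(a, b))

    # Kernel -> number of arrays read or written in the ideal case.
    kernels = [(copy, 2), (mul, 2), (add, 3), (triad, 3), (dot, 2)]
    bandwidths = {}
    for kernel, arrays_moved in kernels:
        start = time.perf_counter()
        kernel()
        elapsed = max(time.perf_counter() - start, 1e-12)
        bytes_moved = arrays_moved * n * a.itemsize
        bandwidths[kernel.__name__] = bytes_moved / elapsed / 1e9
    return bandwidths, dot_value


bandwidths, dot_value = stream_kernels(n=1000)
```

The dot kernel differs from the other four in that it reduces two arrays to a single scalar, which is the simple reduction pattern the abstract singles out.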
Array streaming for array programming   by Mads Kristensen, James Avery Abstract: A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations completely without loops, while most efficient on small input, can lead to explosions in memory use. This paper presents a solution to this problem using array streaming, implemented in the automatic parallelisation high-performance framework Bohrium. This makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, since the automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and use of GPGPU-cores. The streaming-enabled Bohrium effortlessly runs programs on input sizes well beyond those at which pure NumPy crashes by exhausting system memory. Keywords: JIT-compilation; high productivity; Python; OpenCL; OpenMP; Bohrium; NumPy; GP-GPU. Applicability of the software cost model COCOMO II to HPC projects   by Julian Miller, Sandra Wienke, Michael Schlottke-Lakemper, Matthias Meinke, Matthias S. Müller Abstract: The complexity of parallel computer architectures continuously increases with the pursuit of exaflop computing, which makes accurate development effort estimation and modelling more important than ever. While sophisticated cost models are widely used in traditional software engineering, they have rarely been investigated for the performance-oriented HPC domain. Therefore, we evaluate the fit and accuracy of the popular COCOMO II model to HPC setups. 
We lay out a general methodology to evaluate HPC projects with COCOMO II and analyse its cost parameters for the investigated parallelisation projects with OpenACC on NVIDIA GPUs. Further, we evaluate the accuracy of the model in comparison with the reported efforts of the projects, and investigate the impact of inaccuracies in the cost parameter ratings by means of a global sensitivity analysis. Keywords: COCOMO; OpenACC; GPU; development effort; effort estimation; sensitivity analysis. Porting the MPI-parallelized LES model PALM to multi-GPU systems and many integrated core processors: an experience report   by Helge Knoop, Tobias Gronemeier, Matthias Sühring, Peter Steinbach, Matthias Noack, Florian Wende, Thomas Steinke, Christoph Knigge, Siegfried Raasch, Klaus Ketelsen Abstract: The computational power and availability of graphics processing units (GPUs), such as the Nvidia Tesla, and Many Integrated Core (MIC) processors, such as the Intel Xeon Phi, on high performance computing (HPC) systems is rapidly evolving. However, HPC applications need to be ported to take advantage of such hardware. This paper is a report on our experience of porting the MPI+OpenMP parallelised large-eddy simulation model (PALM) to multi-GPU as well as to MIC processor environments using the directive-based high level programming paradigm OpenACC and OpenMP, respectively. PALM is a Fortran-based computational fluid dynamics software package, used for the simulation of atmospheric and oceanic boundary layers to answer questions linked to fundamental atmospheric turbulence research, urban modelling, aircraft safety and cloud physics. Development of PALM started in 1997, the project currently entails 140 kLOC and is used on HPC farms of up to 43,200 cores. The main challenges we faced during the porting process are the size and complexity of the PALM code base, its inconsistent modularisation and the complete lack of a unit-test suite. 
We report the methods used to identify performance issues as well as our experiences with state-of-the-art profiling tools. Moreover, we outline the required porting steps in order to properly execute our code on GPUs and MIC processors, describe the problems and bottlenecks that we encountered during the porting process, and present separate performance tests for both architectures. These performance tests, however, do not provide any benchmark information that compares the performance of the ported code between the two architectures. Keywords: computational fluid dynamics; graphics processing unit; many integrated core processors; Xeon Phi; high performance computing; large-eddy simulation; MPI; OpenMP; OpenACC; porting. Task-based Cholesky decomposition on Xeon Phi architectures using OpenMP   by Joseph Dorris, Asim YarKhan, Jakub Kurzak, Piotr Luszczek, Jack Dongarra Abstract: The increasing number of computational cores in modern many-core processors, as represented by the Intel Xeon Phi architectures, has created the need for an open-source, high-performance and scalable task-based dense linear algebra package that can efficiently use this type of many-core hardware. In this paper, we examine the design modifications necessary when porting PLASMA, a task-based dense linear algebra library, to run effectively on two generations of Intel's Xeon Phi architecture, known as Knights Corner (KNC) and Knights Landing (KNL). First, we modified PLASMA's tiled Cholesky decomposition to use OpenMP tasks for its scheduling mechanism to enable Xeon Phi compatibility. We then compared the performance of our modified code with that of the original dynamic scheduler running on an Intel Xeon Sandy Bridge CPU. Finally, we looked at the performance of our OpenMP tiled Cholesky decomposition on Knights Corner and Knights Landing processors. We detail the optimisations required to obtain performance on these platforms and compare with the highly tuned Intel MKL math library. 
Keywords: Task-based programming; tile algorithms; Xeon Phi Knights Landing; KNL; Cholesky decomposition; linear algebra; OpenMP. Special Issue on: IEEE ISPA-16 Parallel and Distributed Computing and Applications Method of key node identification in command and control networks based on level flow betweenness   by Wang Yunming, Pan Cheng-Sheng, Chen Bo, Zhang Duo-Ping Abstract: Key node identification in command and control (C2) networks is an appealing problem that has attracted increasing attention. Owing to the particular nature of C2 networks, the traditional algorithms for key node identification have problems with high complexity and unsatisfactory adaptability. A new method of key node identification based on level flow betweenness (LFB) is proposed, which is suitable for C2 networks. The proposed method first proved the definition of LFB by analysing the characteristics of a C2 network. Then, this method designs algorithms for key node identification based on LFB, and theoretically derives the complexity of this algorithm. Finally, a number of numerical simulation experiments are carried out, and the results demonstrate that this method reduces algorithm complexity, improves identification accuracy and enhances adaptability for C2 networks. Keywords: command and control network; complex network; key node identification; level flow betweenness. CODM: an outlier detection method for medical insurance claims fraud   by Yongchang Gao, Haowen Guan, Bin Gong Abstract: Data is high dimensional in medical insurance claims management, and there are both dense and sparse regions in these datasets, so traditional outlier detection methods are not suitable for these data. In this paper, we propose a novel method to detect the outliers for abnormal medical insurance claims. 
Our method consists of three core steps: feature bagging to reduce the dimensions of data, calculating the core of the object's k-nearest neighbours, and computing the outlier score for each object by measuring the amount of movement of the core by sequentially increasing k. Experimental results demonstrate that our method is promising for tackling this problem. Keywords: data mining; outlier detection; medical insurance claims fraud. DOI: 10.1504/IJCSE.2017.10008174  Special Issue on: Advanced Computer Science and Information Technology MigrateSDN: efficient approach to integrate OpenFlow networks with STP-enabled networks   by Po-Wen Chi, Ming-Hung Wang, Jing-Wei Guo, Chin-Laung Lei Abstract: Software defined networking (SDN) is a paradigm-shifting technology in networking. However, in current network infrastructures, removing existing networks to build pure SDN networks or replacing all operating network devices with SDN-enabled devices is impractical because of the time and cost involved in the process. Therefore, SDN migration, which implies the use of co-existing techniques and a gradual move to SDN, is an important issue. In this paper, we focus on how SDN networks can be integrated with legacy networks that use spanning tree protocol (STP). Our approach demonstrates three advantages. First, our approach does not require an SDN controller to apply the STP exchange on all switches but only on boundary switches. Second, our approach enables legacy networks to concurrently use multiple links that used to be blocked except one for avoiding loops. Third, our approach decreases bridge protocol data unit (BPDU) frames used in STP construction and topology change. Keywords: software defined networking; spanning tree protocol; network migration. 
Special Issue on: PDCAT 2016 Parallel and Distributed Algorithms and Applications Data grouping scheme for multi-request retrieval in MIMO wireless communication   by Ping He, Zheng Huo Abstract: The multi-antenna data retrieval problem refers to finding an access pattern (to retrieve multiple requests by using multiple antennae, where each request has multiple data items) such that the access latency of some requests retrieved by each antenna is minimised and the total access latency of all requests retrieved by all antennae remains balanced. It is therefore very important that these requests are divided into multiple groups so that each antenna can retrieve them in MIMO wireless communication; this is called the data grouping problem. Few studies have focused on data grouping schemes applied to the data retrieval problem when clients equipped with multiple antennae send multiple requests. Therefore, this paper proposes two data grouping algorithms (HOG and HEG) that are applied in data retrieval such that the requests can be reasonably classified into multiple groups. Experiments show that the proposed schemes are more efficient than some existing schemes. Keywords: mobile computing; data broadcast; indexing; data scheduling; data retrieval; data grouping. Improved user-based collaborative filtering algorithm with topic model and time tag   by Liu Na, Lu Ying, Tang Xiao-jun, Li Ming-xia, Chunli Wang Abstract: Collaborative filtering algorithms make use of interaction rates between users and items for generating recommendations. Similarity among users is mostly calculated based on ratings, without considering explicit properties of the users involved. Considering that the number of tags of a user can directly reflect the user's preference to some extent, we propose a collaborative filtering algorithm using the topic model called UITLDA in this paper. The UITLDA model consists of two parts. The first part is active user with its item. 
The second part is active user with its tag. We form the topic model from these two parts. The two topics constrain and integrate into a new topic distribution. This model not only increases the user's similarity, but also reduces the density of the matrix. In prediction computation, we also introduce a time delay function to increase the precision. The experiments showed that the proposed algorithm achieved better performance compared with the baseline on the MovieLens datasets. Keywords: collaborative filtering; LDA; topic model; time tag. Special Issue on: Smart X 2016 Smart Everything A New Wolf Colony Search Algorithm Based on Search Strategy for Solving Traveling Salesman Problem   by Yang Sun, Shoulin Yin, Hang Li, Lin Teng Abstract: Generally, the wolf colony search algorithm is abstracted from the behaviour feature of the wolf pack, which shows wonderful skills and amazing strategies. However, there are some disadvantages in traditional wolf colony search algorithms, such as slow convergence, a tendency to fall into local optima, and low efficiency and accuracy. Though many intelligence algorithms are used for the travelling salesman problem (TSP), the main objective of this paper is to execute a new approach to obtain significant improvements. To overcome the shortcomings of the classic wolf colony search algorithm, this paper proposes an improved wolf colony search algorithm based on search strategy. First, we introduce an interaction strategy into the travel behaviour and calling behaviour to promote the communication between artificial wolves, which can improve the information acquirement for wolves and enhance the exploring ability of wolves. Second, we present an adaptive siege strategy for siege behaviour, which guarantees that the new algorithm can obtain better collaborative search features. Therefore, the range of wolf siege constantly decreases and the mining ability of wolf algorithm increases with the new strategy. 
Finally, experiments are carried out to verify the effectiveness and performance of our new method by comparing it with other swarm intelligence algorithms on some TSP problems in the TSP library (TSPLIB) database. The results show that the improved wolf colony search algorithm has higher solving accuracy and faster convergence speed. Furthermore, it has advantages over other wolf colony search algorithms in accuracy rate, computational robustness and iteration number. Keywords: wolf algorithm; search strategy; interaction strategy; adaptive siege strategy; siege behaviour; travelling salesman problem. New intelligent interface study based on K-means gaze tracking   by Jing Yu, Hang Li, Shoulin Yin Abstract: The user interface (UI) is an interaction and information exchange medium between a system and its users. It is designed for communication between them, enabling users to operate hardware easily and effectively to achieve bidirectional interaction. Traditional UIs have difficulty satisfying the requirements of users. Therefore, this paper proposes a new intelligent interface scheme based on K-means gaze tracking. First, it uniformly describes the user, interface and system in an intelligent interface interaction framework based on a visual attention selection mechanism. Second, it uses the K-means method to calculate the attention degree value on the interface on the basis of the mapping relation of user, interface and system. Third, it adopts a visual attention allocation strategy to predict the user's interest degree on the interface. Experiments are conducted to verify the performance of the new scheme. The results show that the accuracy of the intelligent interface in predicting user decision intention is very high. This method is an alternative solution for automatically marking the user's goal of interest on an intelligent interface based on K-means gaze tracking. What is more, it can effectively improve the quality of gaze tracking. 
Keywords: UI; K-means method; gaze tracking; attention degree; mapping relation. The wisdom of the few: a provable approach   by Xiao-Yu Huang, Xian-Hong Xiang Abstract: In recent years, the Wisdom Of the Few (WOF) model has attracted substantial research interest. WOF refers to the finding that in some collaborative prediction tasks, e.g., Collaborative Filtering (CF), the ratings from a small set of expert users nearly suffice to predict the unobserved ratings of a much larger number of other users. In this paper, we propose a WOF algorithm for the CF problem, and prove that under some mild statistical assumptions, the algorithm can predict the users' missing ratings correctly with high probability. We also conduct CF experiments with the proposed algorithm on real datasets, and the results show that our algorithm is competitive with the conventional CF algorithm. Keywords: collaborative filtering; crowdsourcing; expert systems; wisdom of the crowd. Robust and graph regularised non-negative matrix factorisation for heterogeneous co-transfer clustering   by Yu Ma, Zhikui Chen, Xiru Qiu, Liang Zhao Abstract: Transfer learning is proposed to tackle the problem where target instances are too scarce to train an accurate model. Most existing transfer learning algorithms are designed for supervised learning and cannot obtain transfer results on multiple heterogeneous domains simultaneously. Moreover, the performance of transfer learning can be seriously degraded by the presence of noise and corruption. In this paper, a robust non-negative collective matrix factorisation model is proposed for heterogeneous co-transfer clustering, which introduces error matrices to capture sparsely distributed noise. The heterogeneous clustering tasks are handled simultaneously and graph regularisation is enforced on the collective matrix factorisation model to keep the intrinsic geometric structure of the different domains. 
Experimental results on a real-world dataset show that the proposed algorithm outperforms the baselines. Keywords: transfer learning; non-negative matrix factorisation; error matrix; graph regularisation; clustering. A risk analysis and prediction model of electric power GIS based on deep learning   by Jianyong Xue, Kehe Wu, Yan Zhou Abstract: In the distribution and supply of electric power, regional grids and users are diverse and complicated, so the operation of power systems is much more closely associated with their geographic information. Geographic Information Systems (GIS) are becoming an indispensable part of the Power Information Management System (PIMS). With the aid of dynamic equipment analysis in GIS and deep learning with a nonlinear network structure, complex functional models are able to simulate the situation of power grid equipment more efficiently. Based on these models, we are able to predict the risk of the entire power grid and provide decision support for grid management. We have collected multiple sets of historical grid-runtime data from provincial power grid systems as the input of the model, and combined them with the prior standard training data to improve the accuracy of the risk prediction model. The results demonstrate that the model has high prediction accuracy and is capable of achieving better results than other modern optimisation algorithms. Keywords: electric power GIS; risk analysis; deep learning; prediction model. CDLB: a cross-domain load balancing mechanism for software-defined networks in cloud data centres   by Weiyang Wang, Mianxiong Dong, Kaoru Ota, Jun Wu, Jianhua Li, Gaolei Li Abstract: Currently, cross-domain load balancing is one of the core issues for software-defined networks (SDN) in cloud data centres, as it can optimise resource allocation. 
In this paper, we propose a cross-domain load balancing mechanism, CDLB, based on the Extensible Messaging and Presence Protocol (XMPP) for SDN in cloud data centres. Unlike the polling method, an XMPP-based push model is introduced in the proposed scheme, which can avoid wasting network and computing resources in large-scale distributed network environments. The proposed scheme enables all the controllers in the flat distributed control plane to share the same consistent global-view network information in real time through XMPP and the XMPP publish/subscribe extension. Thus, the problem of non-real-time information synchronisation can be resolved, and cross-domain load balancing can be realised. The simulations show the efficiency of the proposed scheme. Keywords: cloud data centre; XMPP; push model. Logistic regression for imbalanced learning based on clustering   by Huaping Guo, Tao Wei Abstract: Class imbalance is very common in the real world. Because of the imbalanced class distribution, traditional state-of-the-art classifiers do not work well on imbalanced datasets. In this paper, we apply the well-known statistical model logistic regression to the imbalanced learning problem and, in order to improve its performance, we use clustering algorithms as the data pre-processing approach to partition majority class data into clusters. Then the logistic regression is learned on the corresponding rebalanced datasets. Experimental results show that, compared with other state-of-the-art methods, the proposed method shows significantly better performance on measures of recall, g-mean, f-measure, AUC and accuracy. Keywords: class imbalance; logistic regression; clustering. An adaptive feature combination method based on ranking order for 3D model retrieval   by Qiang Chen, Bin Fang, Yinong Chen, Yan Tang Abstract: Directly combining several complementary features may increase the retrieval precision for 3D models. However, in most cases, we need to set the weights manually and empirically. 
In this paper, we propose a new schema for automatically choosing the proper weights for different features on each database. The proposed schema uses the ranking order of the retrieval results, and it is invariant to magnitude scaling. We choose the best feature as the standard one, and the relevance values between the standard and other features are the weights for feature combination. Furthermore, we propose an improved re-ranking algorithm for further improving the retrieval performance. Experiments show that the proposed method can automatically choose the proper weights for different features, and the experimental results on the existing features exceed the benchmark. Keywords: 3D retrieval; re-ranking; ranking order; feature combination. Chinese entity attributes extraction based on bidirectional LSTM networks   by Zhonghe He, Zhongcheng Zhou Abstract: To address the low performance of the slot filling methods currently applied to Chinese entity-attribute extraction, this paper presents a distant supervision relation extraction method based on a bidirectional long short-term memory (LSTM) neural network. First, we obtain the Infobox of Baidu Baike and use the relation triples of the Infobox to collect the training corpus from the internet, and then we train the classifier based on bidirectional LSTM networks. Compared with classical methods, this method is fully automatic in the aspect of data annotation and feature extraction. Experimental results show that the proposed method is effective and suitable for information extraction in high dimensional space. Compared with the SVM algorithm, the accuracy rate is significantly improved. Keywords: LSTM; information extraction; deep learning; entity relation extraction.