International Journal of High Performance Systems Architecture (13 papers in press)
Big Data Analytics in the Context of Internet of Things and the Emergence of Real-Time Systems: A Systematic Literature Review
by Tahereh Saheb
Abstract: The advent of IoT devices generating massive volumes of real-time event and stream data, known as Big Data, has disrupted the way modern enterprises and industries operate. This paper is a systematic review of the literature on IoT Big Data Analytics (IoT BDA), with a focus on the real-time features of IoT systems. It shows that IoT BDA has challenged relational databases in several respects, including flexibility, anomaly detection, real-time response, the next generation of hardware-software deployment, and the interoperability of heterogeneous systems. The review also examines the defining capability of IoT BDA, real-time analysis of events and streams, and surveys real-world applications of these new forms of analysis to illustrate the insights generated by predictive and real-time analysis of IoT big data. It further reviews the security, privacy and interoperability challenges of IoT BDA systems, along with IoT BDA platforms, new and advanced analytical methods, and the system architectures and frameworks designed and developed in the reviewed papers. Two main applications of a mobile sensor and app in an IoT BDA system are explored: real-time ECG monitoring and real-time tracking of objects and dangerous behaviours. The paper concludes by outlining a future research agenda for real-time analysis of IoT Big Data in both practice and theory.
Keywords: big data analytics; internet of things; real time analysis; streaming analysis.
Soft Skills Requirements in Mobile Applications Development Employment Market
by Jingdong Jia, Zupeng Chen, Xi Liu
Abstract: The soft skills of developers have a major influence on the quality of software products and projects. However, which soft skills are important for mobile applications development remains unknown, and the differences in soft-skill requirements between traditional software development and mobile applications development deserve examination. In this article, using text mining techniques including word segmentation, similarity calculation and clustering analysis, we analyse a large number of job advertisements and extract 13 categories of soft-skill requirements for mobile applications development. We also compare these categories with those for traditional software development. We find that communication and teamwork remain the two most important soft skills; however, fast learning is more important for mobile developers, and we identify four soft skills not previously reported. Additionally, the season of posting has only a minor impact on the soft-skill requirements of mobile applications development.
Keywords: soft skill; mobile application development; job advertisement; text mining; cluster analysis.
Energy Optimized Cryptography (EOC) for Low Power Devices in Internet of Things
by RAJESH G, Vamsi Krishna C, Christopher Selvaraj B, Roshan Karthik S, Arun Kumar Sangaiah
Abstract: The Internet of Things (IoT) comprises a plethora of devices, ranging from high-capacity servers to low-powered devices that communicate over Bluetooth, ZigBee, GPRS, RFID, WiFi and similar technologies. These low-power devices are constrained by security, power-management, reliability and privacy limitations. Existing traditional security algorithms cannot be applied to them because of their high processing and battery-power requirements. We propose Energy Optimized Cryptography (EOC) for low-power devices in IoT. Security is provided by two lightweight techniques: R2CV, a sub-key generation method, and the Optimized Message Authentication Code Generation Function (OMGF), which maintains security without compromising energy or processing-power consumption. The proposed algorithms reduce the computational requirements of sub-key generation and MAC generation on low-power devices. Experimental results, compared against existing security algorithms such as RC5 and SHA, show that R2CV and OMGF reduce the time consumed and increase battery life, which in turn extends the network lifetime.
Keywords: IoT Security; low-power devices; Message authentication code; Energy efficiency; Internet of Things.
Special Issue on: Data Streams Mining and Processing Methodologies, Architectures, and Applications
Evaluation of Dispersed Effect based on Social Force Based Vehicle Model and Emotional Infection Model: A Data Simulation Approach
by Ling Lu, Hongwei Zhuang, Zhiqiang Gao
Abstract: At present, police often use tear-gas bombs to disperse crowds at illegal gatherings. To reduce the number of tear-gas bombs used while still achieving the dispersal goal, the stimulant gas must be used as efficiently as possible, which reduces both environmental pollution and unnecessary harm to people. When people are strongly stimulated, they react in different ways, one of which is to run away to avoid the stimulus. The trajectories of this avoidance motion differ according to the forces people are subjected to. Trajectory data are large and continuous, so they can be treated as a data stream whose exact values vary with the forces people are exposed to. This paper therefore discusses the trajectory data stream from the perspective of people's physiological and psychological factors. The major stimulants considered are the strong sound and smoke produced by an exploding tear-gas bomb. Combining crowd psychology, reaction-behaviour characteristics and the environment, we analyse the psychological and physiological dispersal forces of an exploding tear-gas bomb, and establish an explosion model combined with the social force model of traffic flow. Comparing the data streams from the physical and simulated experiments, the error between simulated and measured values is less than 6%. The dispersal model thus built can reflect the actual dispersal effect of exploding tear-gas bombs.
Keywords: Non-lethal weapons; Exploding Tear Gas; modelling; Physiological effects; Psychological effects.
Research and Analysis of Video Image Target Tracking Algorithm Based on Significance
by Heshuai Shao
Abstract: A target tracking algorithm based on joint probabilistic data association encounters the problems of target loss and combinatorial explosion in cluttered and target-intensive tracking environments. In order to address these challenges, this paper proposes a video target tracking algorithm based on significance joint probabilistic data association. This algorithm detects the moving target in a video sequence through significance calculation. The detection results are classified, and the returned classes are then used as valid echoes for data association. Meanwhile, the validation matrix and joint probability in the joint probabilistic data association algorithm are redefined based on the significance information, and a validation matrix and association probability based on significance are proposed. The association results are compensated and corrected using the color association probability. Experiments in real scenes demonstrate that the proposed algorithm can suppress clutter in the background, simplify interconnected events, and solve the problem of computational combinatorial explosion.
Keywords: Target Tracking; Data association; significance; validation matrix; association probability; color association probability.
Type-2 Fuzzy Logic Based Multi-threaded Time Sequence Analysis
by Lu YANG, Zhi-Qiang Liu, Jian-Feng Yan
Abstract: In big data parallel processing, parallel defects such as data races and deadlocks are common causes of unreliable programs. Uncertainty in parallel processing characterises these defects, so fuzziness plays an important role in time sequence analysis. To improve the performance of big data processing, we propose in this paper a multi-threaded time sequence analysis approach based on Type-2 Fuzzy Logic and a hidden Markov model. First, we collect a sample set of training data by carrying out extensive experiments on the target multi-threaded program with given observations. Second, we establish a time sequence analysis model that describes the inner relationship between the observations and the time sequence of the target multi-threaded program. Third, using this model we estimate the probability of each state sequence at all target defect positions, from which we estimate the probability of defects for the corresponding observation sequence. To demonstrate scalability in a big data environment, we also apply our approach to a real concurrency defect in large-scale, real-world multi-threaded programs. Our experimental results show that the average deviation using Type-2 Fuzzy Logic is less than one quarter of that using Type-1 Fuzzy Logic.
Keywords: time sequence analysis; Type-2 Fuzzy Logic; Hidden Markov Model; big data.
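The hidden-Markov-model step the abstract describes (estimating the probability of an observation sequence from state sequences) can be illustrated with the standard forward algorithm. This is a generic sketch with toy parameters, not the authors' Type-2 fuzzy model; the state/symbol counts and probabilities are invented for illustration.

```python
def forward(obs, pi, A, B):
    """Forward algorithm: likelihood of an observation sequence under an HMM.

    pi : initial state distribution, length n
    A  : n x n transition matrix, A[i][j] = P(next state j | state i)
    B  : n x m emission matrix, B[i][k] = P(symbol k | state i)
    """
    n = len(pi)
    # alpha[j] = P(observations so far, current state = j)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical model: 2 hidden states, 2 observation symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], pi, A, B))
```

In the paper's setting, the observation sequence would come from instrumented runs of the multi-threaded program, and the emission probabilities would be fuzzified rather than crisp.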
Analysis of Physico-chemical Variables and their Influence on Water Quality of the Bogota River using Data Mining
by Jairo Rojas, Julian Forero, Paulo Gaona, Carlos Montenegro Marin, Ruben Gonzalez Crespo
Abstract: Variations in flow rate and in the concentration of different elements within a river's water flow are important factors for discovering behavioural patterns and building predictive models in space and time. On this basis, this article presents a data analysis based on a historical compendium of measurements of the Bogota River between 2008 and 2015, drawn from the monitoring campaigns of the Regional Autonomous Corporation (CAR). Using the Weka data-mining tool and the J48 algorithm to generate decision trees, we establish the influence of physical and chemical variables on the water quality of this source, within a process of identifying and interpreting these factors at the environmental level.
Keywords: Water Quality Indicator; WQI; Data Analysis.
Research on data mining technology for the connotation and measurement of uncertainty for reassembly dimensions
by Conghu Liu, Kang He
Abstract: The uncertainty of remanufactured parts is a key factor in the stability of remanufacturing systems. The purpose of this paper is therefore to identify these uncertainties and measure them, so as to improve the optimisation and management of the remanufacturing production process. Contrasting ideal dimensional accuracy, manufactured dimensional accuracy and remanufactured dimensional accuracy, we analyse the connotation of uncertainty for reassembly dimensions. We then construct an uncertainty measurement model for reassembly dimensions that realises quantitative measurement through entropy. The coupling mechanism of uncertainty for reassembly dimensions is studied, and the corollary conforms with reality, so data mining technology can be used to optimise remanufacturing process management. Finally, the feasibility and effectiveness of the model are verified in the grading selection of parts at a remanufacturing enterprise. This research supports uncertainty-aware optimisation decisions for lean remanufacturing, from both theoretical and practical perspectives, through uncertain data mining techniques.
Keywords: remanufacturing; data mining; uncertainty; entropy.
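The entropy-based quantification of reassembly-dimension uncertainty that the abstract mentions can be sketched with Shannon entropy over a grade distribution. The grading classes and sample values below are hypothetical; the paper's actual model may weight or condition the entropy differently.

```python
import math
from collections import Counter

def dimension_entropy(grades):
    """Shannon entropy (bits) of the grade distribution of measured
    part dimensions; higher entropy means more reassembly uncertainty."""
    n = len(grades)
    counts = Counter(grades)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical grading of 10 remanufactured parts into classes A/B/C
grades = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]
print(round(dimension_entropy(grades), 4))
```

A batch in which every part falls into a single grade yields entropy 0, the most predictable reassembly case, while an even spread across grades maximises the entropy.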
Extending the Common Information Model for Smart Grids operational computations based on bus-branch models
by Mariacristina Gallo, Antonio Celotto, Massimo De Falco, Alfredo Vaccaro
Abstract: In modern power systems, known as Smart Grids (SGs), the heterogeneity of elements makes interoperability difficult. The goal is to integrate the different types of elements into a common remote control system that allows the various parts to interact. To this end, the International Electrotechnical Commission (IEC) has introduced standards (e.g., IEC 61970, IEC 61968, CIM, SCL) that define a common language for communication among the elements of a power system. Without harmonisation of these standards, however, developing and implementing systems and applications results in a large number of one-off engineering design schemes. An ontology-based approach can collect knowledge from different applications, bridging the harmonisation gap among such models. This paper deals with the implementation of a CIM ontology based on a bus-branch model, adopting the Ontology Development Cycle (ODC) process, and aims to support the power system state estimation problem. A bus-branch model is a logical representation of the connections among the elements of the grid that can also support analysis of its data stream.
The resulting ontology is instantiated with a case study on a real power flow problem and evaluated using several well-known metrics.
Keywords: Ontology; Common Information Model (CIM); Bus-branch model; Smart Grid; State Estimation.
Resource scheduling optimisation algorithm for containerised microservice architecture in cloud computing
by Peng Li, Jinquan Song, He Xu, Lu Dong, Yang Zhou
Abstract: The containerised microservice architecture has recently attracted considerable attention. A single application is developed as a suite of small services to facilitate deployment, expansion and management. Traditional microservice scheduling tends to focus on cluster load balancing while ignoring quality of service (QoS). This paper therefore proposes a prediction model of component relevance and applies an optimised artificial bee colony algorithm (OABC) to containerised microservice scheduling. Different assessment strategies are adopted according to differences in the correlation among components. A two-point crossover operator is introduced to improve the exploration ability of the algorithm, and a mutation operator is added to enhance local search, with the mutation probability set to a dynamic value that varies with the number of iterations to speed up convergence. Experimental results show that OABC outperforms the artificial bee colony algorithm (ABC) and the greedy algorithm in terms of cluster load balancing and service response time.
Keywords: cloud computing; microservice; container; load balancing; artificial bee colony; ABC; quality of service; QoS.
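The two operators the abstract highlights, two-point crossover and an iteration-dependent mutation probability, can be sketched as below. The linear decay schedule and the probability bounds are assumptions for illustration; the paper does not specify its exact schedule here.

```python
import random

def two_point_crossover(p1, p2):
    """Exchange the middle segment of two placement vectors
    (index = service, value = cluster node it is scheduled on)."""
    i, j = sorted(random.sample(range(len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutation_probability(t, t_max, p_start=0.3, p_end=0.01):
    """Hypothetical dynamic schedule: mutation probability decays
    linearly with the iteration count, shifting the search from
    exploration toward exploitation."""
    return p_start + (p_end - p_start) * t / t_max

def mutate(solution, n_nodes, p):
    """Reassign each service to a random node with probability p."""
    return [random.randrange(n_nodes) if random.random() < p else g
            for g in solution]

random.seed(0)
a, b = two_point_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])
print(a, b)
print(mutation_probability(0, 100), mutation_probability(100, 100))
```

In a full OABC loop, employed and onlooker bees would apply these operators to candidate placements and keep the ones that improve the load-balancing/QoS fitness.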
Dynamic Bayesian Network Threat Assessment for Warship Formation: a Data Analysis Method
by Haiwen SUN
Abstract: In the target threat assessment of maritime formation air defence, observation data are often missing, and existing data analysis methods struggle to carry out dynamic assessment over time series. To solve these problems, a data analysis method for threat assessment is proposed, based on Discrete Dynamic Bayesian Networks (DDBN) and utility theory. First, the data characteristics of target threat assessment are analysed, and a two-stage dynamic Bayesian network evaluation system is constructed. Second, the continuous variables in the network structure are transformed into discrete variables, which avoids the repeated calculation caused by small continuous changes in node threat-attribute values. Then, a prior probability on the credibility of uncertain nodes is introduced to make the Bayesian network parameters more realistic, and utility theory is applied to rank the threats. Finally, simulation results show that the data analysis method agrees well with human judgment. The proposed method has practical significance, realising the data processing of dynamic threat assessment.
Keywords: DDBN; Data analysis; Discrete variable; Credibility; Utility theory; Threat assessment.
Functional Encryption with Efficient Verifiable Outsourced Decryption for Secure Data Access Control in Social Networks
by Li Cong, Yang Xiaoyuan, Liu Yudong, Cao Yunfei
Abstract: In social networks, users' data streams need to be shared securely. Attribute-based functional encryption (ABFE) implements fine-grained access control for sensitive data and supports functional encryption systems with multiple access policies. As a new type of encryption scheme, in which the user's private key and the ciphertext are associated with attributes, it is well suited to secure data sharing and fine-grained access control in social network environments. Its main disadvantage, however, is that ciphertext size and decryption time grow with the complexity of the access formula. To reduce the decryption burden on users, this paper proposes outsourcing the computation of functional encryption to a cloud server: we show how a user can provide a single transformation key to the cloud, enabling the cloud to convert any ABE ciphertext into a (constant-size) ElGamal-style ciphertext, while efficient verification ensures the correctness of the outsourced computation. This saves users considerable bandwidth and decryption time without increasing the number of data-stream transfers.
Keywords: Functional Encryption; Outsourced Decryption; Verifiability; ABE; Cloud Computing; Social Network.
Weighting schemes based on EM algorithm for LDA
by Yaya Ju, Jianfeng Yan, Zhiqiang Liu, Lu Yang
Abstract: Latent Dirichlet allocation (LDA) is a popular probabilistic topic modelling method, which automatically finds latent topics in a corpus. LDA users often encounter two major problems. First, LDA treats each word equally, so common words tend to scatter across almost all topics without reason, leading to poor topic interpretability and consistency and to topic overlap. Second, an appropriate way to distinguish low-dimensional topic features for better classification performance is lacking. To overcome these two shortcomings, we propose two novel weighting schemes: a word-weighted scheme, realised by introducing a weight factor during the iterative process, and a topic-weighted scheme, realised by combining the Jensen-Shannon (JS) distance and the entropy of the generated low-dimensional topic features as a weight coefficient, using expectation-maximisation (EM). Experimental results show that the word-weighted scheme finds better topics and improves clustering performance effectively, and that the topic-weighted scheme has a larger effect on text classification than traditional methods.
Keywords: latent Dirichlet allocation; LDA; expectation-maximisation; word-weighted scheme; topic-weighted scheme.
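The two quantities the topic-weighted scheme combines, Jensen-Shannon divergence and entropy of topic-feature vectors, can be computed as below. The 4-topic vectors are invented for illustration, and how the paper combines the two values into a single weight coefficient is not specified here.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two topic distributions;
    symmetric and bounded by 1 when using log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return kl(p, m) / 2 + kl(q, m) / 2

# Two hypothetical 4-topic feature vectors produced by LDA
p = [0.7, 0.1, 0.1, 0.1]
q = [0.1, 0.1, 0.1, 0.7]
print(round(js_divergence(p, q), 4), round(entropy(p), 4))
```

Intuitively, a topic vector with low entropy is a sharp, discriminative feature, and a large JS divergence between two documents' vectors signals they belong to different classes, which is why the scheme uses both as weighting signals.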