International Journal of High Performance Computing and Networking (16 papers in press)
Wavelet-based arrhythmia detection of ECG signals and performance measurement using diverse classifiers
by Ritu Singh, Rajesh Mehta, Navin Rajpal
Abstract: The diagnosis of cardiovascular arrhythmias needs accurate predictive models to test abnormalities in the functioning of the heart. The proposed work presents a comparative analysis of different classifiers, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back Propagation Neural Network (BPNN), Feed Forward Neural Network (FFNN) and Radial Basis Function Neural Network (RBFNN), combined with the Discrete Wavelet Transform (DWT) to assess an electrocardiogram (ECG). The ECG record sets of the MIT-BIH dataset are employed to test the efficacy of the different classifiers. For the DWT, different wavelets such as Daubechies, Haar, symlet, biorthogonal, reverse biorthogonal and coiflet are used for feature extraction, and their performances are compared. The foremost of these, the Daubechies wavelet, is demonstrated in detail in this paper. SVM and RBFNN have shown 100% accuracy with reduced dataset testing time of 0.0025 s and 0.0174 s, respectively, whereas BPNN, FFNN and KNN provided 95.5%, 97.7% and 84.0% accuracy with 0.0176 s, 0.0189 s and 0.0033 s of testing time, respectively. The proposed scheme enables an efficient selection of the wavelet and the best-suited classifier for timely detection of cardiac disturbances.
Keywords: ECG; MIT-BIH; DWT; BPNN; FFNN; KNN; RBFNN; SVM.
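The wavelet feature-extraction step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the Haar wavelet (the simplest member of the Daubechies family) rather than the higher-order Daubechies wavelet the paper details, it uses sub-band energies as features (an assumption, since the abstract does not name the features), and the sample values are made up rather than taken from MIT-BIH. The names `haar_dwt` and `wavelet_features` are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_features(signal, levels=2):
    """Energy of the detail band at each level, then the final approximation energy.

    Using sub-band energies as features is an assumption made for this sketch.
    """
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features

# Made-up samples standing in for one ECG beat (not MIT-BIH data).
beat = [0.0, 0.1, 0.9, 0.2, -0.1, 0.0, 0.05, 0.0]
features = wavelet_features(beat, levels=2)
print(features)
```

Because the Haar transform is orthonormal, the feature values sum to the energy of the input signal, which is a convenient sanity check before passing them to any classifier.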
A novel fuzzy convolutional neural network for recognition of handwritten Marathi numerals
by Deepak Mane, Uday Kulkarni
Abstract: Pattern classification is the approach of designing a method to map the inputs to the matching output classes. A novel Fuzzy Convolutional Neural Network (FCNN) is proposed in this paper for recognition of handwritten Marathi numerals. FCNN uses fuzzy set hypersphere as a pattern classifier to map inputs to classes represented by the combination of the fuzzy set hypersphere. Given labelled classes, the model designed proved efficient with 100% accuracy on the training set. The two major factors that improve the learning algorithm of FCNN are: first, extract the dominant features from numeral image patterns using a customised Convolutional Neural Network (CCNN); second, use supervised clustering to create a new fuzzy hypersphere based on the distance measurement learning rules of the Fuzzy Hypersphere Neural Network (FHSNN), with pattern classification done by the fuzzy membership function. Performance evaluation of the model is done on large datasets of Marathi numerals, and its performance is found to be superior to the traditional CNN model. The obtained results demonstrate that FCNN learning rules can be used as a useful representation for different pattern classification problems.
Keywords: fuzzy hypersphere neural network; convolutional neural network; pattern classification; supervised clustering.
ContraMax: accelerating maximum-flow calculation in large and sparse graph
by Wei Wei, Yongxin Zhang
Abstract: The maximum flow (max-flow) problem is important in graph theory, and the corresponding max-flow algorithms have many applications in cyberspace security, but their acceleration on large-scale graphs is still an open issue. Existing acceleration methods mainly evolve in two directions: one is preprocessing-based problem reduction and the other is algorithm parallelisation. However, existing preprocessing methods provide nearly no support for subsequent parallel computation, and thus cannot take full advantage of the underlying parallel computing infrastructure. We propose a novel acceleration method that combines graph preprocessing and parallel computing, where the bi-connected component is used to preprocess the large-scale graph, which results in well-divided sub-graphs that facilitate algorithm parallelisation. Different from existing methods, the sub-problem in each sub-graph is solved on demand, so as to further save computation time. Experiments using benchmark graphs show that in large and sparse graphs, the proposed method can reduce the computation time of the fastest max-flow implementation by up to five orders of magnitude, and also outperforms existing preprocessing methods significantly.
Keywords: maximum-flow; graph contraction; bi-connected component; graph shrink; parallel algorithms.
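The bi-connected component preprocessing mentioned in the abstract above rests on finding cut vertices (articulation points), the vertices at which bi-connected components meet. The following is a minimal sketch of the standard depth-first-search method for finding them, not the ContraMax algorithm itself; the function name `articulation_points` and the example graph are hypothetical.

```python
def articulation_points(graph):
    """Return the cut vertices of an undirected graph given as an adjacency dict."""
    disc, low, cut = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in graph[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root vertex is a cut vertex if some child subtree
                # cannot reach above it without passing through it.
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex only if it has more than one DFS child.
        if parent is None and children > 1:
            cut.add(u)

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return cut

# Two triangles joined by the single edge c-d (made-up example graph).
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
cuts = articulation_points(graph)
print(sorted(cuts))
```

Any flow between a source and a sink in different components must pass through the cut vertices separating them, which is why such a decomposition yields sub-problems that can be handled separately. The recursive form is only suitable for small graphs; a large-scale implementation would need an iterative traversal.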
Congestion management in overlay networks
by Fouad Benamrane, Ali Sanhaji, Philippe Niger, Philippe Cadro
Abstract: Network congestion is a problem that can seriously affect network performance if it is not taken into consideration, especially in overlay networks used in cloud environments. While congestion in these environments happens in the underlay network, the source of the congestion may be virtual machines located in the overlay. Our contribution is to provide real-time mechanisms to manage congestion from overlay networks. The steps of congestion reaction are as follows. First, we monitor and identify a congestion event using the Explicit Congestion Notification (ECN) mechanism. Second, we decide to react to congestion when the congestion estimated during monitoring exceeds a tolerated threshold. Last, we react to the congestion to improve network performance by limiting the throughput of virtual machines in the overlay network. Through a proof of concept, we show the efficiency of our implementation in reacting to congestion when the estimated congestion exceeds the defined threshold.
Keywords: overlays; performance; ECN; congestion; Openstack.
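The three steps in the abstract above (monitor ECN marks, compare the estimate with a threshold, limit throughput) can be sketched as a simple control loop. This is an illustration under stated assumptions, not the authors' implementation: the congestion estimate is assumed to be the fraction of ECN-marked packets in a monitoring window, and the threshold, multiplicative backoff factor, rate floor and function names are all hypothetical.

```python
def congestion_ratio(ecn_marked, total_packets):
    """Fraction of packets carrying an ECN congestion mark in one window."""
    return ecn_marked / total_packets if total_packets else 0.0

def next_rate_limit(current_mbps, ratio, threshold=0.1, backoff=0.5, floor_mbps=10.0):
    """Halve the VM rate limit when the estimate exceeds the threshold.

    threshold, backoff and floor_mbps are hypothetical tuning values.
    """
    if ratio > threshold:
        return max(floor_mbps, current_mbps * backoff)
    return current_mbps

# Made-up monitoring windows: (ECN-marked packets, total packets).
windows = [(5, 1000), (200, 1000), (300, 1000)]
rate = 1000.0
for marked, total in windows:
    rate = next_rate_limit(rate, congestion_ratio(marked, total))
print(rate)
```

In this run the first window stays under the threshold and the next two each trigger a backoff. A real deployment would also need a recovery rule for raising the limit once congestion clears, which the abstract does not describe.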
Comparative analysis of real-time messages in big data pipeline architecture
by Thandar Aung, Hla Yin Min, Aung Htein Maw
Abstract: Real-time messaging systems are essential for enabling time-critical decision making in many applications where real-time requirements and reliability requirements must be met simultaneously. For dependability reasons, we intend to maximise the reliability of real-time messaging systems. To develop a real-time messaging system, we create a real-time big data pipeline using Apache Kafka and Apache Storm. This paper focuses on analysing the performance of the producer and consumer in Apache Kafka processing. Apache Kafka is the most popular framework used to ingest data streams into processing platforms. The comparative analysis of Kafka processing helps to obtain reliable data on the pipeline architecture. An experiment is then conducted on the processing time of the producer and consumer across various partitions and multiple servers. The performance analysis of Kafka can impact the messaging system in a real-time big data pipeline architecture.
Keywords: messaging; real-time processing; Apache Kafka; Apache Storm.
Checkpointing distributed computing systems: an optimisation approach
by Houssem Mansouri, Al-Sakib Khan Pathan
Abstract: The intent of this paper is to propose an optimisation approach for a new coordinator blocking type checkpointing algorithm to ensure reliability and fault tolerance in distributed computing systems. More precisely, we have undertaken an exhaustive study of the reference coordinator blocking checkpointing algorithms proposed in the literature. This study enables us to characterise them in order to benefit from their positive aspects and guides us to put forward a new optimisation approach based on dependency matrices offering the advantage of distribution. Therefore, we can optimise the checkpointing execution/blocking time to the strict necessity compared with the message computation overhead. The simulation studies prove the effectiveness of our optimisation compared with other referenced algorithms.
Keywords: distributed computing; reliability; fault tolerance; checkpointing algorithm; consistent global checkpoint; optimisation.
Modelling shared resource competition for multicores using adapted Tilman model
by Preeti Jain, Sunil Surve
Abstract: The need to meet the high computing demands under constraints of power, low latency, inter-process interference, etc., has led to a shift in paradigm from uni-processor systems to multicore systems. The challenge in these multicore systems arises from the fact that these cores are not independent in functioning; rather, they share a few on-chip and off-chip resources. This resource sharing also degrades performance due to cross-core interference. Different workloads running on these cores demand different resources for their growth in performance. In this work, we examine the contention due to resource sharing amongst co-runners using the adapted multi-species Tilman model. The effect of two shared resources, cache and DRAM bus bandwidth, on co-runners is investigated on the basis of the limiting resource for each of the applications. Alternative solutions, such as application scheduling and resource partitioning, are simulated using the built model. Based on the simulation results, a comparison between solo and co-running systems of different application programs under both solution regimes is conducted at various resource levels. The performance of co-runners under two conditions is analysed: first, when workloads are scheduled together using common resources concurrently, and then when each workload is allocated static resources based on its consumption characteristics. The outcomes depict the performance resulting from pairing various classes of workloads. The observed phenomenon can be used for prior study or prediction of the performance of applications when co-run, to mitigate contention due to shared resources.
Keywords: multicore; constrained resources; Tilman model; competition; LLC; bandwidth use; scheduling; resource segregation.
Distributed privacy-preserving technology in dynamic networks
by Zhuolin Li, Xiaolin Zhang, Haochen Yuan, Yongping Wang, Jian Li
Abstract: With the development of information technology, large-scale social network graph data has been produced, while traditional network privacy protection technology does not meet the actual requirements. In this paper, we address the privacy risks of link disclosure in the sequential release of a dynamic network. To prevent privacy breaches, we propose the privacy model km-Number of Mutual Friend, where k indicates the privacy level and m is a time period during which an adversary can monitor a victim to collect the attack knowledge. We present a distributed algorithm to generate releases by adding nodes in parallel. Further, in order to improve the availability of anonymous graphs, a distributed greedy merge noise node algorithm (DGMNNA) is designed to reduce the number of nodes added under the premise of satisfying the anonymous model. The experimental results show that the proposed algorithm can efficiently handle large-scale social network data while ensuring the availability of anonymous data.
Keywords: dynamic; large scale graph; link disclosure; distributed; anonymization; availability.
A survey of load balancing in distributed systems
by Abderraziq Semmoud, Mourad Hakem, Badr Benmammar
Abstract: With technological progress, distributed systems are widely deployed for parallel processing of computationally intensive applications with heterogeneous computing needs. Such environments require effective load balancing strategies that consider both algorithmic and architectural constraints. Indeed, the efficient load balancing of applications is crucial in order to reach high performance in parallel and distributed systems. By and large, the objective of load balancing is to find a judicious and a suitable workload distribution in order to reduce as much as possible the load difference between the computational resources of the network. The proposed work presents a rigorous survey of the relevant existing load balancing techniques in several types of distributed system. Using a detailed classification, the strengths and weaknesses of these techniques have been investigated according to the general characteristics of the underlying systems. We also present the main issues and features of fault tolerance and reliability for load balancing in distributed systems.
Keywords: load balancing; distributed systems; cloud computing; grid computing; wireless sensor networks; dependability.
Distributed software defined information centric networking
by Rihab Jmal, Lamia Chaari Fourati
Abstract: Recently, a new trend has emerged based on combining Software Defined Networking (SDN) and Information Centric Networking (ICN) as a promising approach for the future Internet. More serious control plane problems related to scalability, fault-tolerance and consistency may confront Software Defined Information Centric Networking (SD-ICN) compared with the traditional SDN environment, owing to new augmented features such as content name based communication and in-network caching. In this paper, we propose a Distributed Software Defined Information Centric Networking (DSD-ICN) that provides ICN features over an SDN network with multiple controllers. We address in our design the fault tolerance and strong consistency of the control plane, which allows the transparent distribution of the content over different network domains.
Keywords: software defined networking; information-centric networking; multiple controllers; inter-domain; distributed.
Designing a new job scheduling model for grid computing environment based on jobs' categorical variables and linear regression model
by Hazem Al-najjar, Syed Alhady, Junita Saleh
Abstract: This paper presents two linear regression prediction models for the run time of jobs, namely a continuous predictor and a categorical predictor. User ID, group ID and executable ID are used to build the categorical predictor, whereas the number of CPUs, average CPU speed and memory size are used to build the continuous predictor. The results show that the prediction rates for the continuous and categorical predictors are equal to 1% and 61%, respectively. This is an improvement equal to 60 times compared with the previous models, which considered continuous variables as a basic model to be used to calculate the weight and the complexity of the job. After that, the categorical predictor is used with three proposed job scheduling algorithms to check the efficiency of the predictor in improving job scheduling. The proposed algorithms use combined metrics to choose the smallest jobs; those metrics are the predicted run time, the waiting time of the job and the resource requirements of the job. The results indicate that Algorithm 3 (which uses the predicted run time and the resource requirements) outperforms previous models in both performance metrics, in which the improvement is between 1.14 and 1.76 in total execution time and between 1.21 and 4.5 in average waiting time. Algorithms 1 and 2 show better performance in all cases except one case of average waiting time compared with LJF. This indicates that using a categorical linear regression predictor can improve the performance of job scheduling models. Besides that, the categorical variables can be used as indicators of the job's weight.
Keywords: job scheduling; grid computing; linear regression; categorical variables; prediction model.
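The idea in the abstract above of predicting run time from categorical variables and then scheduling the smallest predicted jobs first can be sketched in a simplified form. This is not the authors' model or any of their three algorithms: it assumes a single categorical predictor (user ID), for which a least-squares fit on one-hot encoded categories without an intercept reduces to the mean run time per category. The job history, the default value for unseen users and the function names are hypothetical.

```python
from collections import defaultdict

def fit_category_means(history):
    """Fit run time on one categorical variable.

    With one-hot encoding and no intercept, the least-squares coefficient for
    each category is simply the mean run time observed for that category.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for category, run_time in history:
        totals[category][0] += run_time
        totals[category][1] += 1
    return {category: s / n for category, (s, n) in totals.items()}

def schedule_shortest_predicted_first(jobs, model, default):
    """Order jobs by predicted run time; unseen users get a default estimate."""
    return sorted(jobs, key=lambda job: model.get(job["user"], default))

# Made-up job history: (user ID, observed run time in seconds).
history = [("u1", 100.0), ("u1", 140.0), ("u2", 20.0), ("u2", 40.0), ("u3", 600.0)]
model = fit_category_means(history)

jobs = [{"id": "j1", "user": "u3"}, {"id": "j2", "user": "u1"}, {"id": "j3", "user": "u2"}]
order = [job["id"] for job in schedule_shortest_predicted_first(jobs, model, default=300.0)]
print(order)
```

Combining several categorical variables, or adding waiting time and resource requirements to the ordering key as the abstract describes, would require a full regression fit and a weighting scheme that the abstract does not specify.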
Multi-model coupling method for imbalanced network traffic classification based on clustering
by Zhengzhi Tang
Abstract: The identification of network traffic is of great significance for traffic management, billing and security detection. However, the imbalance between traffic categories in a network poses a challenge to current identification methods based on machine learning, because the unbalanced data structure affects the performance of machine learning algorithms. In this paper, we propose a multi-model coupling approach to address the imbalanced data problem in network traffic classification. In the training state, we use a clustering algorithm to divide the major class into several clusters. Then, we use each of these clusters together with the minor class to form a training dataset for a machine learning model, so that a separate trained model is obtained for each cluster. In the test state, the test dataset is input into the previously trained models, and the identification results of the respective models are coupled to obtain the final identification result. We tested our method on two well-known network traffic datasets, and the results show that the proposed method achieves better performance in a shorter time than recently proposed methods for handling the imbalance problem in network traffic classification when the ratio of minor to major classes is very small.
Keywords: machine learning; imbalanced network traffic classification; clustering algorithm; multi-model coupling.
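The training and test procedure in the abstract above (cluster the major class, train one model per cluster against the minor class, couple the outputs) can be sketched with deliberately simple components. The abstract names neither the clustering algorithm, the base classifier nor the coupling rule, so everything concrete here is an assumption: k-means with a deterministic initialisation, a nearest-centroid classifier, and a rule that labels a sample "minor" only if every sub-model agrees. The data points and function names are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=10):
    """Plain k-means; the first k points seed the centroids (sketch only)."""
    centroids = points[:k]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters

def train_models(major, minor, k):
    """One nearest-centroid model per major-class cluster, each paired with the minor class."""
    minor_centroid = mean(minor)
    return [(mean(c), minor_centroid) for c in kmeans(major, k) if c]

def predict(models, x):
    """Coupling rule (assumption): 'minor' only if every sub-model votes minor."""
    votes = [dist2(x, minor_c) < dist2(x, major_c) for major_c, minor_c in models]
    return "minor" if all(votes) else "major"

# Made-up 2-D feature vectors: a large class with two groups and a small class.
major = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
minor = [(5, 5), (5, 6)]
models = train_models(major, minor, k=2)
print(predict(models, (5, 5)), predict(models, (0.5, 0.5)), predict(models, (10, 10)))
```

Each sub-model sees a training set in which the major cluster and the minor class are closer in size, which is the motivation given in the abstract for splitting the major class before training.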
Special Issue on: Recent Advances in Security and Privacy for Big Data
A mathematical model for intimacy-based security protection in social networks without violation of privacy
by Hui Zheng, Jing He, Yanchun Zhang, Junfeng Wu
Abstract: Protection against spam, fraud and phishing is becoming increasingly important in the applications of social networks. Online social network providers such as Facebook and MySpace collect data from users, including their relationship and education statuses. While these data are used to provide users with convenient services, improper use of these data, such as spam advertisement, can be annoying and even harmful. Even worse, if these data are somehow stolen or illegally gathered, the users might be exposed to fraud and phishing. To further protect individual privacy, we employ an intimacy algorithm without the violation of privacy. Also, we identify spammers by detecting unusual intimacy phenomena. We therefore propose a mathematical model for intimacy-based security protection in a social network without the violation of privacy. Moreover, the feasibility and the effectiveness of our model are verified theoretically and experimentally.
Keywords: social network; privacy protection; intimacy; spam detection.
Special Issue on: CloudTech'17 Advances in Big Data and Cloud Computing
Adaptive and concurrent negotiation for an efficient cloud provisioning
by Aya Omezzine, Narjès Bellamine, Said Tazi, Gene Cooperman
Abstract: Business providers offer highly scalable applications to end-users. To run the users' requests efficiently, business providers must take the right decision about request placement on virtual resources. An efficient provisioning that satisfies users and optimises the provider's profit becomes a challenging task owing to the dynamicity of the cloud. Efficient provisioning becomes harder when considering inflexible take-it-or-leave-it service level agreements. Negotiation-based approaches are promising solutions when dealing with conflicts. Using negotiation, the users and providers may find a satisfactory schedule. However, reaching a compromise between the two parties is a cumbersome task owing to workload constraints at negotiation time. The majority of elaborated approaches reject the users' requests when negotiation fails. In this paper, we propose a novel adaptive negotiation approach that keeps renegotiating concurrently with those users based on workload changes. Experiments show that our approach maximises the provider's profit, increases the number of accepted users, and improves customer satisfaction.
Keywords: cloud computing; cloud provisioning; service level agreement; user satisfaction; adaptive negotiation; renegotiation.
Special Issue on: CSS 2018 Smart Monitoring and Protection of Data-Intensive Cyber-Physical Critical Infrastructures
Security in the internet of things: botnet detection in software-defined networks by deep learning techniques
by Ivan Letteri, Giuseppe Della Penna, Giovanni De Gasperis
Abstract: The diffusion of the Internet of Things (IoT) is making cyber-physical smart devices an element of everyone's life, but also exposing them to malware designed for conventional web applications, such as botnets. Botnets are one of the most widespread and dangerous forms of malware, so their detection is an important task. Many works in this context exploit general malware detection techniques and rely on old or biased traffic samples, making their results not completely reliable. Moreover, software-defined networking (SDN), which is increasingly replacing conventional networking, especially in the IoT, limits the features that can be used to detect botnets. We propose a botnet detection methodology based on deep learning techniques, tested on a new, SDN-specific dataset with a high (up to 97%) classification accuracy. Our algorithms have been implemented on two state-of-the-art frameworks, i.e., Keras and TensorFlow, so we are confident that our results are reliable and easily reproducible.
Keywords: cyber-physical devices; internet of things; software-defined networking; botnet detection; machine learning; neural networks; deep learning; network security.
Special Issue on: Advances in Information Security and Networks
Dynamic combined with static analysis for mining network protocols' hidden behaviour
by YanJing Hu
Abstract: The hidden behaviour of unknown protocols is becoming a new challenge in network security. This paper takes both the captured messages and the binary code that implements the protocol as the objects of study. Dynamic taint analysis combined with static analysis is used for protocol analysis. First, we monitor and analyse how the protocol program parses messages on HiddenDisc, a virtual-platform prototype system that we developed, and record the protocol's public behaviour. Then, based on our proposed hidden behaviour perception and mining algorithm, we perform static analysis of the protocol's hidden behaviour trigger conditions and hidden behaviour instruction sequences. According to the hidden behaviour trigger conditions, new protocol messages with the sensitive information are generated, and the hidden behaviours are executed by dynamic triggering. The HiddenDisc prototype system can sense, trigger and analyse the protocol's hidden behaviour. According to the statistical analysis results, we propose an evaluation method for protocol execution security. The experimental results show that the present method can accurately mine the protocol's hidden behaviour and can evaluate an unknown protocol's execution security.
Keywords: protocol reverse analysis; protocols' hidden behaviour; protocol message; protocol software.