Forthcoming articles

International Journal of High Performance Computing and Networking (IJHPCN)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

International Journal of High Performance Computing and Networking (22 papers in press)

Regular Issues

  • Greedily assemble tandem repeats for next generation sequences
    by Yongqing Jiang, Jinhua Lu, Jingyu Hou, Wanlei Zhou 
    Abstract: Eukaryotic genomes contain high volumes of intronic and intergenic regions in which repetitive sequences are abundant. These repetitive sequences make it difficult to assign the short reads generated by next generation sequencing to genomic positions, and they are often excluded from analysis, so valuable genomic information is lost. Here we present a method, known as TRA (Tandem Repeat Assembler), for the assembly of repetitive sequences by constructing contigs directly from paired-end reads. Using an experimentally acquired dataset for human chromosome 14, tandem repeats longer than 200 bp were assembled. Alignment of the contigs to the human genome reference (GRCh38) revealed that 84.3% of tandem repetitive regions were correctly covered. For tandem repeats, this method outperformed state-of-the-art assemblers, generating contigs with a correct N50 of up to 512 bp.
    Keywords: tandem repeat; assembly; NGS.

  • GeaBase: a high-performance distributed graph database for industry-scale applications
    by Zhisong Fu, Zhengwei Wu, Houyi Li, Yize Li, Xiaojie Chen, Xiaomeng Ye, Benquan Yu, Xi Hu 
    Abstract: Graph analytics has been gaining traction rapidly in the past few years. It has a wide array of application areas in industry, ranging from e-commerce, social networks and recommendation systems to fraud detection and virtually any problem that requires insights into data connections, not just the data itself. In this paper, we present GeaBase, a new distributed graph database that provides the capability to store and analyse graph-structured data in real time at massive scale. We describe the details of the system and its implementation, including a novel update architecture, called Update Center (UC), and a new language that is suitable for both graph traversal and analytics. We also compare the performance of GeaBase with that of Titan, a widely used open-source graph database. Experiments show that GeaBase is up to 182x faster than Titan in our testing scenarios, and that it achieves 22x higher throughput than Titan on social network workloads.
    Keywords: graph database; distributed database; high performance.

  • Parallel big image data retrieval by conceptualised clustering and un-conceptualised clustering
    by Ja-Hwung Su, Chu-Yu Chin, Jyun-Yu Li, Vincent S. Tseng 
    Abstract: Content-based image retrieval is a hot topic that has been studied for a few decades. Although a number of recent studies have addressed this topic, it is still hard to achieve high retrieval performance for big image data. To address this issue, in this paper we propose a parallel content-based image retrieval method that efficiently retrieves relevant images using un-conceptualised clustering and conceptualised clustering. In un-conceptualised clustering, un-conceptualised image data is automatically divided into a number of sets, while in conceptualised clustering, conceptualised image data is divided into multiple sets. Based on the resulting clustering index, a depth-first search strategy is applied to retrieve the relevant images through parallel comparisons. Experimental evaluations on a large image dataset show that the proposed approach substantially improves the efficiency of content-based image retrieval.
    Keywords: content-based image retrieval; un-conceptualised clustering; conceptualised clustering; big data; parallel computation.

  • Fault-tolerant flexible lossless cluster compression method for monitoring data in smart grid
    by Zhijian Qu, Hanlin Wang, Xiang Peng, Ge Chen 
    Abstract: Big data in smart grid dispatch monitoring systems is susceptible to interference from processing delays and slow response times. Hence, a new fault-tolerant flexible lossless cluster compression method is proposed. This paper presents a five-tuple (S, D, O, T, M) model and builds a monitoring data processing platform based on Hive. The dispatch host and monitoring servers are deployed in a cloud computing environment, where the data nodes are compressed with the Deflate, Gzip, BZip2 and LZO lossless compression methods, respectively. Taking the power dispatch automation system of the Long-hai line as an example, experimental results show that the cluster lossless compression ratio of BZip2 is greater than 81%; when data records reach twelve million, the compression ratio can be further improved to a certain extent by using Hive's RCFile storage format, which offers significant flexibility. Therefore, the new method proposed in this paper can improve the flexibility and fault tolerance of big monitoring data processing in smart grids.
    Keywords: cloud computing; smart grid; cluster lossless compression; fault-tolerant.

  • Combined bit map representation and its applications to query processing of resource description framework on GPU
    by Chantana Chantrapornchai, Chidchanok Choksuchat 
    Abstract: Resource Description Framework (RDF) is a common representation in the semantic web context, describing web data sources and their relations in URI form. With the growth of data accessible on the internet, RDF datasets now contain millions of relations. Thus, answering a semantic query requires going through large amounts of data relations, which is time consuming. In this work, we present a representation framework, Combined Bit Map (CBM) representation, which compactly represents RDF data while helping to speed up semantic query processing using Graphics Processing Units (GPUs). Since GPUs have limited memory, RDF data cannot be stored entirely in GPU memory without compaction; the CBM structure enables more RDF data to reside in GPU memory. Since GPUs have many processing elements, using them in parallel speeds up RDF query processing. The experimental results show that the proposed representation can reduce the size of RDF data by 70%. Furthermore, search on this representation using the GPU is 60% faster than with a conventional implementation.
    Keywords: graphics processing unit; semantic web; query processing; parallel processing; bit map.

  • A DSL for elastic component-based cloud application
    by Saddam Hocine Hiba, Meriem Belguidoum 
    Abstract: The deployment of component-based applications in cloud system environments is becoming more and more complex. Such deployments are expected to provide elasticity, allowing a deployed application to scale dynamically to meet variations in demand while ensuring a certain level of Quality of Service (QoS). However, there are still some open issues associated with elasticity management. What is needed is a conceptual model of elasticity management that enables the automatic description of deployment and application constraints, properties and elasticity strategies at different levels (depending on the internal application architecture or on the cloud infrastructure and platform). In this paper, we propose a domain-specific language (DSL) based on a metamodel, which precisely specifies three main views: the cloud service models, the automatic elasticity management strategies and the internal cloud application architecture. Through a case study, we illustrate the MAPE-K based approach using different scenarios of automatic elasticity management.
    Keywords: cloud computing; elasticity management; component-based application; MDA; DSL; MAPE-K.

  • Selection of effective probes for an individual to identify P300 signal generated from P300 BCI speller
    by Weilun Wang, Goutam Chakraborty 
    Abstract: P300 is a strong Event Related Potential (ERP) generated in the brain and observed on the scalp when an unusual event happens. To decipher the P300 signal, its properties must be used to distinguish P300 signals from non-P300 signals. In this work, we used data collected from a P300 BCI speller with 128 probes. A conventional BCI speller uses eight probes at predefined locations on the skull. Though P300 is strong in the parietal region of the brain, the location of the strongest signal varies from person to person. The idea is that, if we optimise probe locations for an individual, we could reduce the number of probes required. The way the raw brain-wave signals are processed also affects classification accuracy, so we designed an algorithm to analyse the raw signals. We achieved over 81% classification accuracy on average with only three probes, using only one target stimulus and one non-target stimulus.
    Keywords: event related potential; probes reduction; P300 amplitude; brain computer interface.

  • An efficient approach to optimise I/O cost in data-intensive applications using inverted indexes on HDFS splits
    by Narinder Seera, S. Taruna 
    Abstract: Hadoop is prominent for its scalable and distributed computing capabilities coupled with the Hadoop Distributed File System (HDFS). The Hadoop MapReduce framework is extensively used for exploratory big data analytics by business-intelligence applications and machine learning tools. The analytic queries executed by these applications often include multiple ad hoc queries and aggregate queries with selection predicates. The cost of executing these queries grows sharply as the size of the dataset grows. The most effective strategy to improve query performance in such applications is to process only relevant data and set irrelevant data aside, which can be done using index structures. This strategy reduces the overall cost of running applications, which stems from the amount of I/O to be processed and the amount of data to be transferred among the nodes of the cluster. This paper attempts to improve query performance by avoiding full scans of data files, which is done by creating custom indexes on HDFS data. The algorithms used in this paper create inverted indexes on HDFS input splits. We show how MapReduce jobs can benefit in terms of query performance from these custom indexes. The experiments demonstrate that queries executed on indexed data run 1.5 times faster than traditional queries that do not use any index structures.
    Keywords: inverted index; MapReduce; I/O cost; HDFS; input splits.

  • Generic data storage-based dynamic mobile app for standardised electronic health records database
    by Shivani Batra, Shelly Sachdeva, Subhash Bhalla 
    Abstract: Standardisation plays an important role in making healthcare applications adaptable worldwide. It uses archetypes to achieve semantic interoperability. In addition to interoperability, a mechanism to handle future evolution is the primary concern for market sustainability. To build a future-proof system, an application should be dynamic in both its front end (user interface) and its back end (database). The current research extends the functionality of prior work on Healthsurance with search-efficient generic storage and validation support. At the application level, the graphical user interface is built dynamically using knowledge provided by standards in the form of archetypes. At the database level, a generic storage structure is provided with improved searching capabilities to support faster access, to capture dynamic knowledge evolution and to handle sparseness. A standardised format and content help to improve the credibility of data and maintain a uniform and specific set of constraints used to evaluate users' health. The architecture proposed in this research enables the implementation of a mobile app based on an archetype paradigm that can avoid reimplementation of the systems, supports migrating databases and allows the creation of future-proof systems.
    Keywords: standardised electronic health records; generic database; sparseness; frequent evolution; mobile application.

  • A novel ECC-based lightweight authentication protocol for internet of things devices
    by Aakanksha Tewari, Brij Gupta 
    Abstract: Although the internet of things (IoT) is a promising technology that will make our lives a lot easier, we cannot ignore the fact that it is not safe from online threats and attacks. Thus, along with the growth of IoT, its security also needs to be addressed. Taking into account the limited resources of these devices, it is important that security mechanisms be lightweight and not hinder the actual functionality of the device. In this paper, we propose an ECC-based lightweight authentication protocol for IoT devices that deploy RFID tags at the physical layer. Elliptic curve cryptography (ECC) is a very efficient public key cryptography mechanism, as it provides privacy and security with lower computational overhead. We also present a security and performance analysis to verify the strength of our proposed approach. We have verified the security and the authentication session execution of our protocol using a Promela model and the SPIN tool.
    Keywords: security; authentication; internet of things; RFID.

  • ContraMax: accelerating maximum-flow calculation in large and sparse graph
    by Wei Wei, Yongxin Zhang 
    Abstract: The maximum flow (max-flow) problem is important in graph theory, and max-flow algorithms have many applications in cyberspace security, but accelerating them on large-scale graphs is still an open issue. Existing acceleration methods mainly evolve in two directions: preprocessing-based problem reduction and algorithm parallelisation. However, existing preprocessing methods provide almost no support for subsequent parallel computation, and thus cannot take full advantage of the underlying parallel computing infrastructure. We propose a novel acceleration method that combines graph preprocessing and parallel computing: bi-connected components are used to preprocess the large-scale graph, yielding well-divided sub-graphs that facilitate algorithm parallelisation. Unlike existing methods, the sub-problem in each sub-graph is solved on demand, so as to further save computation time. Experiments using benchmark graphs show that in large and sparse graphs, the proposed method can reduce the computation time of the fastest max-flow implementation by up to five orders of magnitude, and also significantly outperforms existing preprocessing methods.
    Keywords: maximum-flow; graph contraction; bi-connected component; graph shrink; parallel algorithms.

  • Congestion management in overlay networks
    by Fouad Benamrane, Ali Sanhaji, Philippe Niger, Philippe Cadro 
    Abstract: Network congestion can seriously affect network performance if it is not taken into consideration, especially in the overlay networks used in cloud environments. While congestion in these environments happens in the underlay network, its source may be virtual machines located in the overlay. Our contribution is to provide real-time mechanisms to manage congestion from overlay networks. Congestion reaction proceeds in three steps. First, we monitor and identify a congestion event using the Explicit Congestion Notification (ECN) mechanism. Second, we decide to react to congestion when the congestion estimated during monitoring exceeds a tolerated threshold. Last, we react to the congestion to improve network performance by limiting the throughput of virtual machines in the overlay network. Through a proof of concept, we show that our implementation reacts efficiently to congestion when the estimated congestion exceeds the defined threshold.
    Keywords: overlays; performance; ECN; congestion; OpenStack.

  • Comparative analysis of real-time messages in big data pipeline architecture
    by Thandar Aung, Hla Yin Min, Aung Htein Maw 
    Abstract: Real-time messaging systems are essential for enabling time-critical decision making in many applications where real-time requirements and reliability requirements must be met simultaneously. For dependability reasons, we aim to maximise the reliability of real-time messaging systems. To develop a real-time messaging system, we build a real-time big data pipeline using Apache Kafka and Apache Storm. This paper focuses on analysing the performance of producers and consumers in Apache Kafka processing. Apache Kafka is the most popular framework used to ingest data streams into processing platforms, and a comparative analysis of Kafka processing helps to obtain reliable data in the pipeline architecture. Experiments are conducted on the processing time of the producer and consumer across various numbers of partitions and servers. The performance analysis of Kafka can inform the design of the messaging system in a real-time big data pipeline architecture.
    Keywords: messaging; real-time processing; Apache Kafka; Apache Storm.

  • Checkpointing distributed computing systems: an optimisation approach
    by Houssem Mansouri, Al-Sakib Khan Pathan 
    Abstract: This paper proposes an optimisation approach for a new coordinator blocking type checkpointing algorithm to ensure reliability and fault tolerance in distributed computing systems. More precisely, we have undertaken an exhaustive study of the reference coordinator blocking checkpointing algorithms proposed in the literature. This study enables us to characterise them in order to benefit from their positive aspects, and guides us towards a new optimisation approach based on dependency matrices, which offer the advantage of distribution. As a result, the checkpointing execution/blocking time can be reduced to what is strictly necessary relative to the message computation overhead. Simulation studies demonstrate the effectiveness of our optimisation compared with other reference algorithms.
    Keywords: distributed computing; reliability; fault tolerance; checkpointing algorithm; consistent global checkpoint; optimisation.

  • Modelling shared resource competition for multicores using adapted Tilman model
    by Preeti Jain, Sunil Surve 
    Abstract: The need to meet high computing demands under constraints of power, low latency, inter-process interference, etc., has led to a paradigm shift from uni-processor systems to multicore systems. The challenge in these multicore systems arises from the fact that the cores do not function independently; rather, they share a few on-chip and off-chip resources. This resource sharing also degrades performance due to cross-core interference. Different workloads running on these cores demand different resources to improve their performance. In this work, we examine the contention due to resource sharing amongst co-runners using the adapted multi-species Tilman model. The effect of two shared resources, cache and DRAM bus bandwidth, on co-runners is investigated on the basis of the limiting resource for each application. Alternative solutions, such as application scheduling and resource partitioning, are simulated using the built model. Based on the simulation results, solo and co-running systems of different application programs are compared under both solution regimes at various resource levels. The performance of co-runners is analysed under two conditions: first, when workloads are scheduled together and use common resources concurrently, and then when each workload is allocated static resources based on its consumption characteristics. The outcomes show the performance resulting from pairing various classes of workloads. The observed behaviour can be used to study or predict in advance the performance of co-running applications, in order to mitigate contention for shared resources.
    Keywords: multicore; constrained resources; Tilman model; competition; LLC; bandwidth use; scheduling; resource segregation.

  • Distributed privacy-preserving technology in dynamic networks
    by Zhuolin Li, Xiaolin Zhang, Haochen Yuan, Yongping Wang, Jian Li 
    Abstract: With the development of information technology, large-scale social network graph data has been produced, while traditional network privacy protection technology does not meet the actual requirements. In this paper, we address the privacy risks of link disclosure in the sequential release of a dynamic network. To prevent privacy breaches, we propose the privacy model km-Number of Mutual Friend, where k indicates the privacy level and m is the time period over which an adversary can monitor a victim to collect attack knowledge. We present a distributed algorithm that generates releases by adding nodes in parallel. Further, to improve the availability of anonymous graphs, a distributed greedy merge noise node algorithm (DGMNNA) is designed to reduce the number of nodes added while still satisfying the anonymity model. The experimental results show that the proposed algorithm can efficiently handle large-scale social network data while ensuring the availability of anonymous data.
    Keywords: dynamic; large scale graph; link disclosure; distributed; anonymization; availability.
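
The split-level inverted index summarised in the Seera and Taruna abstract above can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not the authors' algorithm or any Hadoop API: each HDFS input split is represented as an in-memory list of records, the helper names (build_inverted_index, indexed_select, full_scan_select) and the example attribute are hypothetical, and the index simply maps each attribute value to the IDs of the splits that contain it, so a selection query reads only those splits instead of scanning every one.

```python
from __future__ import annotations

from collections import defaultdict

# Toy model (assumption): a "split" is an in-memory list of record dicts standing
# in for an HDFS input split; no Hadoop API is involved.


def build_inverted_index(splits: list[list[dict]], field: str) -> dict:
    """Map each value of `field` to the set of split IDs that contain it."""
    index = defaultdict(set)
    for split_id, split in enumerate(splits):
        for record in split:
            index[record[field]].add(split_id)
    return dict(index)


def indexed_select(splits: list[list[dict]], index: dict, field: str, value):
    """Read only the splits the index points to; return (matches, splits_read)."""
    split_ids = sorted(index.get(value, ()))
    matches = [r for sid in split_ids for r in splits[sid] if r[field] == value]
    return matches, len(split_ids)


def full_scan_select(splits: list[list[dict]], field: str, value):
    """Baseline without an index: every split is read."""
    matches = [r for split in splits for r in split if r[field] == value]
    return matches, len(splits)


if __name__ == "__main__":
    # Hypothetical sample data: three splits of sales records.
    splits = [
        [{"region": "north", "sales": 10}, {"region": "east", "sales": 7}],
        [{"region": "south", "sales": 3}],
        [{"region": "north", "sales": 5}, {"region": "west", "sales": 1}],
    ]
    idx = build_inverted_index(splits, "region")
    rows, read = indexed_select(splits, idx, "region", "north")
    print(rows, read)  # 2 matching rows, read from 2 of the 3 splits
```

In this toy run the indexed query reads two of the three splits while the full scan reads all three; on real HDFS data the saving would depend on how selective the predicate is and how values are distributed across splits.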

Special Issue on: Recent Advances in Security and Privacy for Big Data

  • A mathematical model for intimacy-based security protection in social networks without violation of privacy
    by Hui Zheng, Jing He, Yanchun Zhang, Junfeng Wu 
    Abstract: Protection against spam, fraud and phishing is becoming increasingly important in social network applications. Online social network providers such as Facebook and MySpace collect data from users, including their relationship and education statuses. While these data are used to provide users with convenient services, improper use of these data, such as spam advertisement, can be annoying and even harmful. Even worse, if these data are stolen or illegally gathered, users might be exposed to fraud and phishing. To further protect individual privacy, we employ an intimacy algorithm that does not violate privacy. We also detect spammers by identifying unusual intimacy phenomena. We therefore propose in this paper a mathematical model for intimacy-based security protection in a social network without violation of privacy. Moreover, the feasibility and effectiveness of our model are verified theoretically and experimentally.
    Keywords: social network; privacy protection; intimacy; spam detection.

Special Issue on: CloudTech'17 Advances in Big Data and Cloud Computing

  • Adaptive and concurrent negotiation for an efficient cloud provisioning
    by Aya Omezzine, Narjès Bellamine, Said Tazi, Gene Cooperman 
    Abstract: Business providers offer highly scalable applications to end-users. To run users' requests efficiently, business providers must make the right decisions about request placement on virtual resources. Efficient provisioning that satisfies users and optimises the provider's profit is a challenging task owing to the dynamicity of the cloud. It becomes harder still under inflexible take-it-or-leave-it service level agreements. Negotiation-based approaches are promising solutions when dealing with conflicts. Using negotiation, users and providers may find a satisfactory schedule. However, reaching a compromise between the two parties is a cumbersome task owing to workload constraints at negotiation time. Most existing approaches reject users' requests when negotiation fails. In this paper, we propose a novel adaptive negotiation approach that keeps renegotiating concurrently with those users as the workload changes. Experiments show that our approach maximises the provider's profit, increases the number of accepted users, and improves customer satisfaction.
    Keywords: cloud computing; cloud provisioning; service level agreement; user satisfaction; adaptive negotiation; renegotiation.

Special Issue on: ICCIDS 2018 High-Performance Computing for Computational Intelligence

  • Wavelet-based arrhythmia detection of ECG signals and performance measurement using diverse classifiers
    by Ritu Singh, Rajesh Mehta, Navin Rajpal 
    Abstract: The diagnosis of cardiovascular arrhythmias needs accurate predictive models to detect abnormalities in the functioning of the heart. The proposed work presents a comparative analysis of different classifiers, such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back Propagation Neural Network (BPNN), Feed Forward Neural Network (FFNN) and Radial Basis Function Neural Network (RBFNN), combined with the Discrete Wavelet Transform (DWT), to assess electrocardiograms (ECGs). The ECG records of the MIT-BIH dataset are employed to test the efficacy of the different classifiers. For DWT, different wavelets, such as Daubechies, Haar, symlet, biorthogonal, reverse biorthogonal and coiflet, are used for feature extraction, and their performances are compared. The foremost of these, the Daubechies wavelet, is demonstrated in detail in this paper. SVM and RBFNN achieved 100% accuracy with reduced-dataset testing times of 0.0025 s and 0.0174 s, respectively, whereas BPNN, FFNN and KNN provided 95.5%, 97.7% and 84.0% accuracy with testing times of 0.0176 s, 0.0189 s and 0.0033 s, respectively. The proposed scheme identifies an efficient combination of wavelet and best-suited classifier for the timely analysis of cardiac disturbances.

  • A novel fuzzy convolutional neural network for recognition of handwritten Marathi numerals
    by Deepak Mane, Uday Kulkarni 
    Abstract: Pattern classification is the approach of designing a method to map inputs to the matching output classes. A novel Fuzzy Convolutional Neural Network (FCNN) is proposed in this paper for recognition of handwritten Marathi numerals. FCNN uses fuzzy set hyperspheres as a pattern classifier to map inputs to classes, each represented by a combination of fuzzy set hyperspheres. Given labelled classes, the model achieved 100% accuracy on the training set. Two major factors improve the learning algorithm of FCNN: first, dominant features are extracted from numeral image patterns using a customised Convolutional Neural Network (CCNN); second, supervised clustering is used to create new fuzzy hyperspheres based on the distance-measurement learning rules of the Fuzzy Hypersphere Neural Network (FHSNN), with pattern classification performed by the fuzzy membership function. The model is evaluated on large datasets of Marathi numerals, and its performance is found to be superior to that of the traditional CNN model. The results demonstrate that FCNN learning rules can serve as a useful representation for different pattern classification problems.
    Keywords: fuzzy hypersphere neural network; convolutional neural network; pattern classification; supervised clustering.
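
The DWT feature-extraction step in the Singh, Mehta and Rajpal abstract can be illustrated with the simplest wavelet it lists, Haar. The Python sketch below is an illustration under assumptions rather than the authors' pipeline: it computes a multi-level orthonormal Haar decomposition, returning coefficients in the order [cA_n, cD_n, ..., cD_1] (the same ordering PyWavelets' wavedec uses), and summarises each sub-band by its mean, standard deviation and energy, a feature choice made here only for illustration. The helper names are hypothetical. A feature vector of this kind is what a classifier such as SVM or KNN would then receive.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level decomposition returning [cA_n, cD_n, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)  # coarser detail bands go to the front
    coeffs.insert(0, approx)
    return coeffs


def subband_features(coeffs):
    """Mean, standard deviation and energy of each sub-band (illustrative features)."""
    features = []
    for band in coeffs:
        n = len(band)
        mean = sum(band) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in band) / n)
        energy = sum(x * x for x in band)
        features.extend([mean, std, energy])
    return features


if __name__ == "__main__":
    # Toy samples for illustration only, not MIT-BIH data.
    ecg_like = [0.0, 0.1, 0.0, 1.2, -0.3, 0.0, 0.1, 0.0]
    coeffs = haar_dwt(ecg_like, levels=2)
    features = subband_features(coeffs)
    print(len(features))  # 9: 3 sub-bands x 3 statistics
```

Because this Haar transform is orthonormal, for inputs whose length is divisible by 2^levels the total energy of the coefficients equals that of the input signal, which is a quick sanity check when swapping in other wavelets.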

Special Issue on: CSS 2018 Smart Monitoring and Protection of Data-Intensive Cyber-Physical Critical Infrastructures

  • Security in the internet of things: botnet detection in software-defined networks by deep learning techniques
    by Ivan Letteri, Giuseppe Della Penna, Giovanni De Gasperis 
    Abstract: The diffusion of the Internet of Things (IoT) is making cyber-physical smart devices part of everyone's life, but is also exposing them to malware designed for conventional web applications, such as botnets. Botnets are among the most widespread and dangerous forms of malware, so detecting them is an important task. Many works in this context exploit general malware detection techniques and rely on old or biased traffic samples, which makes their results not completely reliable. Moreover, software-defined networking (SDN), which is increasingly replacing conventional networking, especially in the IoT, limits the features that can be used to detect botnets. We propose a botnet detection methodology based on deep learning techniques, which achieves high classification accuracy (up to 97%) on a new, SDN-specific dataset. Our algorithms have been implemented on two state-of-the-art frameworks, namely Keras and TensorFlow, so we are confident that our results are reliable and easily reproducible.
    Keywords: cyber-physical devices; internet of things; software-defined networking; botnet detection; machine learning; neural networks; deep learning; network security.

Special Issue on: Advances in Information Security and Networks

  • Dynamic combined with static analysis for mining network protocols' hidden behaviour
    by YanJing Hu 
    Abstract: The hidden behaviour of unknown protocols is becoming a new challenge in network security. This paper studies both the captured messages and the binary code that implements the protocol. Dynamic taint analysis combined with static analysis is used to analyse the protocol. First, we monitor and analyse how the protocol program parses messages on HiddenDisc, a virtual-platform prototype system that we developed, and record the protocol's public behaviour. Then, based on our proposed hidden behaviour perception and mining algorithm, we perform static analysis of the protocol's hidden behaviour trigger conditions and hidden behaviour instruction sequences. According to these trigger conditions, new protocol messages containing the sensitive information are generated, and the hidden behaviours are executed by dynamic triggering. The HiddenDisc prototype system can sense, trigger and analyse the protocol's hidden behaviour. Based on the statistical analysis results, we propose a method for evaluating protocol execution security. The experimental results show that the proposed method can accurately mine protocols' hidden behaviour and evaluate the execution security of unknown protocols.
    Keywords: protocol reverse analysis; protocols' hidden behaviour; protocol message; protocol software.
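
Dynamic taint analysis, which the HiddenDisc abstract combines with static analysis, marks the bytes of an input message as tainted and tracks how that taint propagates through the instructions that process them. The minimal Python sketch below illustrates the general technique on a hypothetical three-address instruction trace; the Instr format and the run_taint helper are invented for illustration and are unrelated to HiddenDisc's implementation. It reports which comparisons depend on message bytes, the kind of information used to locate trigger conditions for hidden behaviour.

```python
from dataclasses import dataclass


# Hypothetical three-address instruction format, for illustration only.
@dataclass
class Instr:
    op: str           # "load_msg", "const", "add", "xor" or "cmp"
    dst: str = ""     # destination register (unused by "cmp")
    srcs: tuple = ()  # source registers
    offset: int = -1  # message byte offset read by "load_msg"


def run_taint(trace):
    """Propagate taint (sets of message-byte offsets) through an instruction trace.

    Returns (index, offsets) for every "cmp" whose operands depend on message
    bytes; such comparisons are candidate trigger conditions."""
    taint = {}  # register name -> set of message offsets it depends on
    tainted_cmps = []
    for i, ins in enumerate(trace):
        deps = set().union(*(taint.get(r, set()) for r in ins.srcs))
        if ins.op == "load_msg":
            taint[ins.dst] = {ins.offset}  # taint source: a byte of the input message
        elif ins.op == "const":
            taint[ins.dst] = set()  # constants carry no taint
        elif ins.op in ("add", "xor"):
            taint[ins.dst] = deps  # result depends on all of its operands
        elif ins.op == "cmp" and deps:
            tainted_cmps.append((i, sorted(deps)))
    return tainted_cmps


if __name__ == "__main__":
    trace = [
        Instr("load_msg", "r1", offset=0),
        Instr("load_msg", "r2", offset=3),
        Instr("const", "r3"),
        Instr("xor", "r4", ("r1", "r3")),
        Instr("cmp", srcs=("r4", "r3")),  # depends on byte 0
        Instr("cmp", srcs=("r3", "r3")),  # untainted, not reported
        Instr("add", "r5", ("r1", "r2")),
        Instr("cmp", srcs=("r5", "r3")),  # depends on bytes 0 and 3
    ]
    print(run_taint(trace))  # [(4, [0]), (7, [0, 3])]
```

A real taint engine works on machine instructions and memory as well as registers; this sketch only tracks register-level data flow to show the propagation rule.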