International Journal of Big Data Intelligence (6 papers in press)
A survey of legible Arabic fonts for young readers
by Anoual El Kah, Abdelhak Lakhouaja
Abstract: Reading is an interconnected cognitive process involving recognition and comprehension. The goal of reading cannot be achieved unless the text is legible enough to interpret; legibility is therefore crucial to the reading mechanism, affecting both reading speed and the correct recognition of graphemes. Since fonts and the way text is presented influence children's reading performance and fluency, this paper investigates different Arabic fonts in order to determine the optimal font for fluent, low-error reading by children, in both printed and on-screen texts. The study recruited 33 third-grade students from Moroccan primary schools and measured reading fluency and error rates for five Arabic font types. Based on the results, the paper recommends the Simplified Arabic font to reduce grapheme-related reading errors in both printed and on-screen texts.
Keywords: reading; legibility; Arabic language; fonts; primary schools; Simplified Arabic.
Optimized Parallel Implementation with Dynamic Programming Technique for Multiple Sequence Alignment
by Gururaj T, Siddesh G M
Abstract: Gene sequencing techniques are very useful in analyzing various diseases, especially cancer. Various techniques have been applied to gene sequences for effective analysis; these techniques also help reduce computation time. Most existing methods are inefficient at gene sequence alignment because they lack a proper technique for reducing the gap penalty. In this research, the Optimized Needleman-Wunsch (ONW) algorithm is applied to Multiple Sequence Alignment (MSA). The ONW technique runs the Needleman-Wunsch (NW) algorithm in a parallel implementation over multiple genes. A dynamic programming technique, the backtracking algorithm, is applied to reduce the gap penalty in the gene alignment. The proposed ONW algorithm is applied to a case study and its performance is analyzed, showing that it outperforms the existing MSA methods. The proposed method achieves an average similarity of 88.85%, while the existing method achieves 60.23%.
Keywords: Backtracking Algorithm; Gap Penalty; Multiple Sequence Alignment; Optimized Needleman-Wunsch Algorithm; Parallel Implementation.
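The dynamic programming with backtracking named in the abstract can be sketched as a minimal pairwise Needleman-Wunsch alignment. This is only an illustration: the paper's parallel ONW variant and its optimized gap penalties are not reproduced here, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
# Minimal Needleman-Wunsch global alignment with backtracking.
# Scores (match=+1, mismatch=-1, gap=-1) are illustrative, not the
# paper's optimized penalties.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # Fill the score matrix by dynamic programming.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Backtrack from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b)), score[n][m]
```

A parallel MSA implementation would run many such pairwise alignments concurrently; that orchestration is beyond this sketch.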
Modeling the dynamics of acoustic gaps between speakers during Business-to-Business sales calls
by Anat Lerner, Vered Silber-Varod, Nehoray Carmi, Yonathan Guttel, Omri Allouche
Abstract: The value of Conversation Intelligence as a means of deepening insights into authentic conversations is now common ground between researchers and the business community. The rapid development of big-data processing algorithms and technology enables companies to process massive amounts of data and metadata about conversation flow, combining content, vocal features and even body gestures.
This study is based on the analysis of 358 Business-to-Business (B2B) sales calls at the Discovery stage. We propose a model that captures the dynamics of acoustic gaps between sales representatives and customers by relying solely on the acoustic signal. To model the conversations, we extract a basic set of features from the acoustic signal: speech proportion, fundamental frequency (F0), intensity, harmonics-to-noise ratio (HNR), jitter and shimmer. We focus on the differences between four groups of conversations defined by the speakers' gender pairing (Female-Female, Male-Male, Male-Female, Female-Male), and we find significant differences in the behavioral patterns of the dynamics between these four groups. The study demonstrates that using delta metrics to assess the interactions leads to new insights.
Keywords: Conversation Intelligence; conversation modelling; acoustic features; speech data; sales calls.
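The delta metrics mentioned in the abstract can be illustrated as per-feature gaps between the two speakers' aggregated acoustic features. The feature names and values below are hypothetical placeholders, not data from the study.

```python
# Illustrative delta-metric computation between a sales representative
# and a customer. Feature names and values are hypothetical; the study
# extracts speech proportion, F0, intensity, HNR, jitter and shimmer
# from the acoustic signal.

def delta_metrics(rep_features, customer_features):
    """Per-feature acoustic gap: representative minus customer."""
    return {name: rep_features[name] - customer_features[name]
            for name in rep_features}

# Hypothetical per-call mean features for each speaker.
rep = {"f0_mean_hz": 180.0, "intensity_db": 62.0, "hnr_db": 14.5}
cust = {"f0_mean_hz": 120.0, "intensity_db": 58.0, "hnr_db": 11.0}
gaps = delta_metrics(rep, cust)
```

Grouping such gap vectors by gender pairing would then allow the between-group comparisons the study reports.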
Distributed Log Management for Dynamically Changing Computing Environments on Cloud
by Takayuki Kushida
Abstract: A cloud logging service is a core component of the operation and management of a production system. Such a service is usually a central-server deployment in which dedicated central servers accept all log messages from leaf computing nodes. As the number of applications and solutions on the cloud changes dynamically, the volume of log messages forwarded to the logging service changes as well. This paper proposes a Distributed Logging Service (DLS) that distributes log messages across multiple leaf computing nodes, with no central server managing the logging service. DLS also provides the alert notification, authentication, lifetime management and resilience required for a production system. Evaluation results on an emulated environment show that DLS provides a logging service suitable for applications and solutions in production use.
Keywords: Distributed Logging; Logging Service; Cloud Management; Distributed Hash Table.
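The Distributed Hash Table in the keywords suggests hash-based routing of log messages to leaf nodes without a central server. The sketch below is an assumption-laden illustration using a simple consistent-hash ring; node names, the key scheme and the ring layout are invented for the example and are not the paper's design.

```python
# Sketch: route a log message to a leaf node via a hash ring, in the
# spirit of the Distributed Hash Table keyword. Node names and the key
# scheme are illustrative assumptions, not the paper's actual design.

import bisect
import hashlib

def _h(key):
    # Stable position on the ring derived from a SHA-1 digest.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sort nodes by their hash position on the ring.
        self._ring = sorted((_h(n), n) for n in nodes)
        self._keys = [k for k, _ in self._ring]

    def node_for(self, message_key):
        # First node clockwise from the message's hash position.
        idx = bisect.bisect(self._keys, _h(message_key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["leaf-a", "leaf-b", "leaf-c"])
owner = ring.node_for("app-42/2024-01-01T00:00:00Z")
```

Because every node can compute the same mapping locally, no central server is needed to decide where a log message is stored.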
Transaction Sampling Algorithms for Real-time Crypto Block Dependability
by Abhilash Kancharla, Hye-Young Kim, Nohpill Park
Abstract: This paper presents various transaction sampling algorithms for the proposed real-time crypto computing, along with an analytical model to assure their dependability. The analytical model is employed to guide the design of an Ethereum blockchain-based crypto computing system under stringent real-time requirements. Based on the preliminary analytical results, a prototype has been built to demonstrate the efficacy of the transaction sampling algorithms, assessed in terms of block-dependability. Block-dependability expresses the probability that pending transactions are posted within the current or target block delay, namely, that their expected execution time falls within the deadline imposed on the transaction. Various algorithms for prioritizing and sampling transactions in the pending-transaction pool so that they execute within their deadline requirements, namely the normal, random, sorted and stratified algorithms, are proposed and simulated numerically as a baseline. A set of performance variables, such as the number of pending transactions in the pool, the average speed of the transactions, gas fees, deadlines and the number of miners, is identified and incorporated into the block-dependability model to reveal the influence of those variables. Extensive parametric simulation results are presented and discussed for the random and sorted transaction sampling algorithms. Finally, a prototype of the series of transaction sampling algorithms is built on the Ethereum open source code, and a demonstration of each algorithm is presented.
Keywords: Blockchain; Ethereum; real-time; dependability; crypto computing.
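The random, sorted and stratified sampling strategies named in the abstract can be sketched as toy selectors over a pending-transaction pool. The field names (gas_fee, deadline), the stratification rule and the pool contents are illustrative assumptions; the paper's actual algorithms and Ethereum prototype are not reproduced here.

```python
# Toy versions of transaction sampling strategies: choose k pending
# transactions for the next block. Field names and the stratification
# rule are illustrative assumptions, not the paper's definitions.

import random

def sample_random(pool, k, seed=0):
    # Uniform random pick from the pending pool.
    return random.Random(seed).sample(pool, k)

def sample_sorted(pool, k):
    # Highest gas fee first, a miner-revenue heuristic.
    return sorted(pool, key=lambda tx: tx["gas_fee"], reverse=True)[:k]

def sample_stratified(pool, k, strata=2):
    # Partition by deadline into strata, then take transactions from
    # each stratum in turn so urgent deadlines are not starved.
    by_deadline = sorted(pool, key=lambda tx: tx["deadline"])
    size = (len(by_deadline) + strata - 1) // strata
    groups = [by_deadline[i:i + size]
              for i in range(0, len(by_deadline), size)]
    picked = []
    while len(picked) < k and any(groups):
        for g in groups:
            if g and len(picked) < k:
                picked.append(g.pop(0))
    return picked
```

Comparing how often each selector posts transactions before their deadlines, over many simulated blocks, mirrors the block-dependability evaluation described in the abstract.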
DeepICU: Imbalanced Classification by using Deep Neural Networks for Network Intrusion Detection
by Allen Yang, Boxiang Dong, Dawei Li, Weifeng Sun, Bharath K. Samanthula
Abstract: Cyber intrusions are becoming more commonplace, more dangerous and more sophisticated. These offensive maneuvers disrupt the work of hospitals, banks and governments around the world. There is therefore a pressing need for a robust intrusion detection system that can accurately distinguish network attacks from normal traffic. However, this is an extremely challenging task: in a healthy network environment, the majority of connections are initiated by benign behavior, and despite a wide variety of attacks, attack traffic occupies only a limited fraction of the observed network traffic. This imbalanced class distribution implicitly biases conventional classifiers toward the majority/benign class, thus leaving many attack incidents undetected. In this paper, we design a new intrusion detection system named DeepICU based on deep neural networks. To address the class imbalance issue, we design two novel loss functions, the attack-sharing loss and the attack-discrete loss, which effectively move the decision boundary towards the attack classes. In particular, the attack-sharing loss takes into account the discrepancy in penalty between different types of misclassification, while the attack-discrete loss absorbs class-level recognition error for every minority class. Extensive experimental results on three benchmark datasets demonstrate the high detection accuracy of DeepICU; in particular, compared with eight state-of-the-art approaches, DeepICU always provides the best class-balanced accuracy.
Keywords: Intrusion detection; Deep learning; Imbalanced classification; Hard sample mining.
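The general idea behind loss functions that move the decision boundary toward minority classes can be illustrated with a class-weighted cross-entropy. This is only a generic sketch under assumed weights; it is not the paper's attack-sharing or attack-discrete loss, whose exact definitions appear in the full text.

```python
# Illustrative class-weighted cross-entropy: errors on rare (attack)
# classes cost more than errors on the benign majority, nudging the
# decision boundary toward the attack classes. The weights and the loss
# form are assumptions, not the paper's attack-sharing/discrete losses.

import math

def weighted_cross_entropy(probs, label, class_weights):
    """probs: predicted class distribution; label: true class index."""
    return -class_weights[label] * math.log(probs[label])

# Benign class 0 weighted 1.0; rare attack classes weighted higher.
weights = [1.0, 5.0, 5.0]
attack_loss = weighted_cross_entropy([0.1, 0.8, 0.1], 1, weights)
benign_loss = weighted_cross_entropy([0.8, 0.1, 0.1], 0, weights)
```

With equal predicted confidence, the attack-class loss is five times the benign loss here, so gradient descent pushes the model harder to recognize the minority classes.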