Forthcoming articles

International Journal of Big Data Intelligence (IJBDI)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Register for our alerting service, which notifies you by email when new issues are published online.

Articles marked with this Open Access icon are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.
We also offer RSS feeds, which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Big Data Intelligence (10 papers in press)

Regular Issues

  • A Survey of Computation Techniques on Time Evolving Graphs   Order a copy of this article
    by Shalini Sharma, Jerry Chou 
    Abstract: A time evolving graph (TEG) is a graph whose topology or attribute values change over time through update events, including edge addition/deletion, vertex addition/deletion and attribute changes on vertices or edges. Driven by the Big Data paradigm, the ability to process and analyze TEGs in a timely fashion is critical in many application domains, such as social networks, web graphs, road network traffic, etc. Recently, many research efforts have been made to address the volume and velocity challenges of dealing with such datasets. However, it remains an active and challenging research topic. Therefore, in this survey, we summarize the state-of-the-art computation techniques for TEGs. We collect these techniques from three different research communities: i) the data mining community, for graph analysis; ii) the theory community, for graph algorithms; and iii) the computation community, for graph computing frameworks. Based on our study, we also propose our own computing framework for TEGs, DASH. We have also performed experiments comparing DASH with the Graph Processing System (GPS). We are optimistic that this paper will help researchers understand the various dimensions of TEG problems and continue developing the techniques necessary to resolve these problems more efficiently.
    Keywords: Big Data; Time evolving graphs; Computing framework; Algorithm; Data Mining.
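
    A TEG can be pictured concretely as a base graph plus a log of timestamped update events. The following minimal Python sketch is our own illustration of that idea (the event names and snapshot-by-replay design are assumptions for exposition, not the DASH framework's API):

        import collections

        class TimeEvolvingGraph:
            """A graph materialized by replaying timestamped update events."""
            def __init__(self):
                self.adj = collections.defaultdict(set)  # vertex -> neighbours
                self.events = []                         # (time, op, u, v) log

            def apply(self, time, op, u, v=None):
                self.events.append((time, op, u, v))
                if op == 'add_edge':
                    self.adj[u].add(v); self.adj[v].add(u)
                elif op == 'del_edge':
                    self.adj[u].discard(v); self.adj[v].discard(u)
                elif op == 'add_vertex':
                    self.adj.setdefault(u, set())
                elif op == 'del_vertex':
                    for w in self.adj.pop(u, set()):
                        self.adj[w].discard(u)

            def snapshot_at(self, t):
                """Rebuild the graph state as of time t by replaying the log."""
                g = TimeEvolvingGraph()
                for (time, op, u, v) in self.events:
                    if time <= t:
                        g.apply(time, op, u, v)
                return g.adj

    Replaying the log yields any historical snapshot at the cost of recomputation; maintaining analysis results incrementally as events arrive is the usual alternative.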

  • Uncovering data stream behavior of automated analytical tasks in edge computing   Order a copy of this article
    by Lilian Hernandez, Monica Wachowicz, Robert Barton, Marc Breissinger 
    Abstract: Massive volumes of data streams are expected to be generated by the Internet of Things (IoT). Due to their dispersed and mobile nature, they need to be processed using automated analytical tasks. The research challenge is to uncover whether the data streams being generated by billions of IoT devices actually conform to the data flow required to perform streaming analytics. In this paper, we apply the process discovery and conformance checking techniques of process mining in order to expose the flow dependency of IoT data streams between automated analytical tasks running at the edge of a network. To this end, we have developed a Petri Net model to ensure the optimal execution of analytical tasks by finding path deviations, bottlenecks and parallelism. A real-world scenario in smart transit is used to evaluate the full advantage of our proposed model. Uncovering the actual behavior of data flows from IoT devices to edge nodes has allowed us to detect discrepancies that have a negative impact on the performance of automated analytical tasks.
    Keywords: streaming analytics; process mining; Petri Net; smart transit; Internet of Things; edge computing.
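
    Conformance checking against a Petri Net can be illustrated by token replay: fire a transition for each observed event and count the events the net cannot explain. The sketch below is a minimal illustration under our own assumptions (a toy three-task net, not the authors' smart-transit model):

        # Minimal Petri Net: a marking maps each place to its token count.
        def enabled(marking, inputs):
            return all(marking.get(p, 0) >= 1 for p in inputs)

        def fire(marking, inputs, outputs):
            for p in inputs:
                marking[p] -= 1
            for p in outputs:
                marking[p] = marking.get(p, 0) + 1

        # Toy net for a linear analytics flow: ingest -> filter -> aggregate.
        NET = {
            'ingest':    (['start'], ['p1']),
            'filter':    (['p1'],    ['p2']),
            'aggregate': (['p2'],    ['end']),
        }

        def replay(trace):
            """Token replay: count observed events the net cannot explain."""
            marking, deviations = {'start': 1}, 0
            for event in trace:
                inputs, outputs = NET[event]
                if enabled(marking, inputs):
                    fire(marking, inputs, outputs)
                else:
                    deviations += 1
            return deviations

        print(replay(['ingest', 'filter', 'aggregate']))  # 0: conforming
        print(replay(['ingest', 'aggregate']))            # 1: path deviation

    A non-zero count signals the kind of path deviation the conformance check is designed to surface.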

  • Combining the Richness of GIS Techniques with Visualization Tools to Better Understand the Spatial Distribution of Data - A Case Study of Chicago City Crime Analysis   Order a copy of this article
    by Omar Bani Taha, M. Omair Shafiq 
    Abstract: This study aims to achieve the following objectives: (1) to explore the benefits of adding a spatial GIS layer of analysis to other existing visualization techniques; (2) to identify and evaluate patterns in selected crime data by analysing Chicago's open dataset and examining the related existing literature on crime trends in this city. The motivations for this study include the magnitude and scale of crime incidents across the world, as well as the need for a better understanding of patterns and prediction of crime trends within the selected geographical location. We conclude that Chicago seems to be on course to have both the lowest violent crime rate since 1972 and the lowest murder frequency since 1967. Chicago has witnessed a sharp drop in most crime types over the last few years compared with previous crime index data. Also, crime in Chicago naturally surges during the summer months and declines during the winter months. Our results align with several previous decades of studies and analyses of Chicago crime, in which the same communities with the highest crime rates still experience the majority of crime; one may compare these patterns with those reported in studies from the 1930s and find them very similar. The present study confirmed the effectiveness of geographic information systems and other visualization techniques as tools for scrutinizing crime in the city of Chicago.
    Keywords: spatial analysis; geographic information system (GIS); human-centred data science; visualization tools; traditional qualitative techniques; data visualization; spatial and crime mapping.

  • Improving collaborative filtering's rating prediction coverage in sparse datasets by exploiting the friend-of-a-friend concept   Order a copy of this article
    by Dionisis Margaris, Costas Vassilakis 
    Abstract: Collaborative filtering computes personalized recommendations by taking into account ratings expressed by users. Collaborative filtering algorithms first identify people with similar tastes by examining the likeness of already entered ratings. Users with highly similar tastes are termed near neighbours, and recommendations for a user are based on her near neighbours' ratings. However, for a number of users no near neighbours can be found, a problem termed the gray sheep problem. This problem is more intense in sparse datasets, i.e. datasets with a relatively small number of ratings compared to the number of users and items. In this work, we propose an algorithm that alleviates this problem by exploiting the friend-of-a-friend (FOAF) concept. The proposed algorithm, CFfoaf, has been evaluated against eight widely used sparse datasets under two widely used collaborative filtering correlation metrics, namely the Pearson correlation coefficient and cosine similarity. It has proven particularly effective in increasing the percentage of users for which personalized recommendations can be formulated in the context of sparse datasets, while at the same time improving rating prediction quality.
    Keywords: collaborative filtering; recommender systems; sparse datasets; friend-of-a-friend; Pearson correlation coefficient; cosine similarity; evaluation.
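
    The FOAF idea admits a compact sketch: when a user has no near neighbours of her own, candidates are drawn from the near neighbours of users she is at least weakly linked to. The Python below is our own minimal illustration (the similarity threshold, the relaxed-threshold fallback and all names are assumptions for exposition, not the CFfoaf algorithm itself):

        import math

        def pearson(ra, rb):
            """Pearson similarity of two users' rating dicts (item -> rating)."""
            common = set(ra) & set(rb)
            if len(common) < 2:
                return 0.0                      # too few co-rated items
            ma = sum(ra[i] for i in common) / len(common)
            mb = sum(rb[i] for i in common) / len(common)
            num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
            den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common) *
                            sum((rb[i] - mb) ** 2 for i in common))
            return num / den if den else 0.0

        def near_neighbours(user, ratings, threshold):
            return {v for v in ratings
                    if v != user and pearson(ratings[user], ratings[v]) >= threshold}

        def foaf_neighbours(user, ratings, threshold=0.5):
            """Fall back to friends of friends when no direct neighbours exist."""
            direct = near_neighbours(user, ratings, threshold)
            if direct:
                return direct
            foaf = set()
            for v in near_neighbours(user, ratings, threshold / 2):  # weak links
                foaf |= near_neighbours(v, ratings, threshold)
            return foaf - {user}

    Predictions are then formed from the FOAF set exactly as they would be from direct neighbours, which is what raises coverage for gray sheep users.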

  • Improving collaborative filtering's rating prediction accuracy by considering users' dynamic rating variability   Order a copy of this article
    by Dionisis Margaris, Costas Vassilakis 
    Abstract: Users who populate ratings databases follow different marking practices, in the sense that some are stricter while others are more lenient. Similarly, users' rating practices may also differ in rating variability, in the sense that some users may enter ratings close to their mean, while other users may enter more extreme ratings, close to the limits of the rating scale. While this aspect has recently been addressed through the computation and exploitation of an overall rating variability measure per user, the fact that a user's rating practices may vary along her rating history time axis may render the overall rating variability measure inappropriate for performing the rating prediction adjustment. In this work, we: 1) propose an algorithm that considers two variability metrics per user, the global (overall) one and the local one, with the latter representing the user's variability at prediction time; 2) present alternative methods for computing a user's local variability; 3) evaluate the performance of the proposed algorithm in terms of rating prediction quality and compare it against the state-of-the-art algorithm that employs a single variability metric in the rating prediction computation process.
    Keywords: collaborative filtering; users’ ratings dynamic variability; Pearson correlation coefficient; cosine similarity; evaluation; prediction accuracy.
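
    The global/local distinction is easy to make concrete: global variability is the dispersion of a user's entire rating history, while local variability is the same statistic over only the most recent ratings. Below is a minimal sketch under our own assumptions (standard deviation as the variability measure and a fixed-size recency window; the paper's alternative local methods may differ):

        import statistics

        def global_variability(history):
            """Standard deviation over a user's entire rating history."""
            return statistics.pstdev(r for _, r in history)

        def local_variability(history, window=20):
            """Standard deviation over the user's most recent ratings only."""
            recent = sorted(history)[-window:]   # sort by timestamp, keep tail
            return statistics.pstdev(r for _, r in recent)

        # history: (timestamp, rating) pairs; late ratings swing more widely
        # here, so the local measure exceeds the global one.
        history = [(1, 4.0), (2, 4.5), (3, 4.0), (4, 1.0), (5, 5.0), (6, 1.5)]
        print(global_variability(history), local_variability(history, window=3))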

  • Towards a systematic collect data process   Order a copy of this article
    by Iman Tikito, Nissrine Souissi 
    Abstract: Big Data has become a well-known topic among a large number of researchers in different areas. Efforts to improve the data lifecycle in the Big Data context have been conducted in different phases and have focused mainly on problems such as storage, security, analysis and visualization. In this paper, we focus on improving the collection phase, which makes the other phases more efficient and effective. We propose a process to follow in order to resolve the problem of collecting a huge amount of data and, as a result, to optimize the data lifecycle. To do this, we analyze the different data collection processes present in the literature and identify their similarity to the Systematic Literature Review (SLR) process. We apply our process by mapping the seven characteristics of Big Data onto the sub-processes of the proposed data collection process. This mapping provides a guide that lets the customer reach a clear decision on the need to use the proposed process by answering a set of questions.
    Keywords: Big Data; Data Collect; Data Lifecycle; Systematic Literature Review; Process; SLR.

  • Real-time Maritime Anomaly Detection: Detecting intentional AIS switch-off   Order a copy of this article
    by Ioannis Kontopoulos, Konstantinos Chatzikokolakis, Dimitrios Zissis, Konstantinos Tserpes, Giannis Spiliopoulos 
    Abstract: Today, most maritime surveillance systems rely on the Automatic Identification System (AIS), which vessels of specific categories are required to carry. Anomaly detection typically refers to the problem of finding patterns in data that do not conform to expected behaviour. AIS switch-off is one such pattern: many vessels turn off their AIS transponders in order to hide their whereabouts when travelling in waters with frequent piracy attacks or potential illegal activity, thus deceiving either the authorities or pirate vessels. Furthermore, fishing vessels switch off their AIS transponders so that other fishing vessels do not fish in the same area. To the best of our knowledge, limited work has focused on detecting AIS switch-off in real time. We present a system that detects such cases in real time and can handle high-velocity, high-volume streams of AIS messages received from terrestrial base stations. We evaluate the proposed system on a real-world dataset collected from AIS receivers and show the achieved detection accuracy.
    Keywords: distributed stream processing; big data; AIS vessel monitoring; anomaly detection.
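
    At its core, switch-off detection flags a vessel whose silence exceeds what its expected reporting interval allows. The sketch below is our own single-process illustration (fixed gap threshold, in-memory state keyed by MMSI), not the authors' distributed stream-processing implementation:

        MAX_GAP_S = 600.0   # assumed threshold: flag silences over 10 minutes
        last_seen = {}      # MMSI -> timestamp of the vessel's last message

        def on_ais_message(mmsi, timestamp):
            """Process one AIS position report; report suspected switch-offs."""
            prev = last_seen.get(mmsi)
            if prev is not None and timestamp - prev > MAX_GAP_S:
                print(f"vessel {mmsi}: suspected AIS switch-off "
                      f"({timestamp - prev:.0f}s silent)")
            last_seen[mmsi] = timestamp

        # Example stream: vessel 123 goes silent for 15 minutes.
        for t, mmsi in [(0, 123), (60, 123), (120, 456), (960, 123)]:
            on_ais_message(mmsi, float(t))

    A real deployment would also have to account for terrestrial base station coverage gaps, which produce silences indistinguishable from deliberate switch-off at this level of detail.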

  • A survey about legible Arabic fonts for young readers   Order a copy of this article
    by Anoual El Kah, Abdelhak Lakhouaja 
    Abstract: Reading is an interconnected cognitive process involving recognition and comprehension. The objective of the reading act cannot be achieved unless the text is legible enough to interpret. For that reason, legibility is crucial to the reading mechanism: it affects reading speed and the correct recognition of graphs. Based on the fact that fonts and the way a text is presented influence children's reading performance and fluency, the current paper investigates different Arabic fonts in order to determine the optimal font for fluent reading by children, with a low rate of errors, in both printed and on-screen texts. The study recruits 33 third-grade Moroccan primary school students and investigates the reading fluency and error rates for five Arabic font types. As a result, this paper recommends the use of the Simplified Arabic font for reducing reading errors due to graph presentation, for either printed or on-screen texts.
    Keywords: reading; legibility; Arabic language; fonts; primary schools; Simplified Arabic.

  • A Survey on Context-Aware Monitoring in Healthcare with Big Data   Order a copy of this article
    by Reeja S R, Murthuza Ali, Rino Cherian 
    Abstract: Context-aware monitoring is attracting rapt attention in healthcare, helped by emerging technologies such as Big Data, cloud computing and the IoT, through which healthcare has reached an advanced level. Alongside these emerging technologies, wireless sensor networks and body sensor networks play a prominent role in healthcare: through them, data is collected and sent to the cloud for better analysis. Context-aware services are built into mobile services and applications so that they can offer contextually needed data to the developer. Context recognition identifies information automatically, responds as anticipated according to needs, and helps people be aware of their social surroundings based on contextual information. As mobile devices, technologies, applications and networks grow in huge numbers, the effort to make usable applications becomes more important for achieving success in the industry. As context awareness spreads across technologies such as Big Data, the cloud, the IoT and machine learning, it is gaining momentum in the market, which helps context-aware technology develop further, make people's lives easier and find solutions to the appropriate problems.
    Keywords: Context-aware monitoring; Big Data; Cloud Computing; IoT; Machine Learning.

  • To beat or not to beat: uncovering the world social battles in Wikipedia   Order a copy of this article
    by Massimo Marchiori, Enrico Bonetti Vieno 
    Abstract: The online world has deeply changed the rules by which we engage with information. We have at our disposal a huge amount of information, growing every single day, and with it an increasing need to access it wisely. Because of this evolution, a few selected systems have emerged as information centralizers, providing easy and seamless access to information: on one side we have the search engines, which try to compress the number of pages to interact with, and on the other side we have systems like Wikipedia, which try to compress information itself into an online encyclopaedia. These two systems (search engines and Wikipedia) have together enjoyed enormous success, simplifying the process of information foraging. However, success also brings problems. In the case of Wikipedia, these problems are due to its distributed nature: everybody can access and contribute. As such, the Wikipedia system, playing a primary role in information distribution, has been subject to "attacks", in the form of attempts to manipulate information for the most varied reasons. This extra layer of information manipulation is, however, practically invisible to the general public, which only sees the final outcome, usually taking it as a reference source. In this paper we describe the Negapedia system, an attempt to provide the general public with a more complete picture of what is actually going on with information. We describe the challenges and choices that had to be made, arising not only from big data analysis but also, and foremost, from the problem of potential information overload, given the general target audience. Along this journey, we also provide some novel insights on the important issue of Wikipedia categorization, analysing the problem of presenting general users with easy and meaningful category information, thus helping users (and scholars) to better tame the multitude of information topics present nowadays in Wikipedia.
    Keywords: Social data; big data analysis; Wikipedia; categorization; online information; bias; data science.