Forthcoming articles

International Journal of Data Science (IJDS)

These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Open Access: Articles marked with this Open Access icon are freely available and openly accessible to all, without any restriction except those stated in their respective CC licenses.

International Journal of Data Science (12 papers in press)

Regular Issues

  • Fake News and Misinformation Detection on Headlines of COVID-19 Using Deep Learning Algorithms
    by Xin Wang, Peng Zhao, Xi Chen 
    Abstract: This article proposes a deep learning system for detecting fake news and misinformation in COVID-19 related headlines. LSTM, CNN, and DBN models are compared in order to determine the optimal algorithm. Based on model performance measures such as accuracy, AUC score, and F1 score, this study identifies CNN and LSTM as the optimal models for COVID-19 fake news detection, with an accuracy of up to 94%. Finally, this paper provides an algorithm-based method for ranking the credibility of mainstream media. The results indicate that mainstream media channels in the U.S. are reliable for reporting COVID-19 related news and information.
    Keywords: COVID-19; fake news detection; deep learning algorithms; big data analytics; mainstream media credibility.

  • Application of nonlinear stochastic single source of error state space models in the forecasting of mobile subscribers in India
    by Prabir Kumar Das 
    Abstract: The nonlinear stochastic single source of error state space model with error, trend, and seasonality (ETS) was employed and found appropriate for modeling mobile subscriber time series data for individual metro cities, total mobile subscribers in all metro cities, and subscribers in all of India, using monthly data from March 1997 to December 2018. Of the different ETS models, the multiplicative error, additive trend, and no seasonality (M, A, N) model was appropriate for all series. These models were compared to the autoregressive integrated moving average model. The final model was identified based on the Diebold-Mariano test and time series cross-validation. The performance of the final model was compared to the long short-term memory (LSTM) model. The mean absolute error and root mean squared error showed that ETS (M, A, N) outperformed the standard LSTM. The ETS (M, A, N) model was used to compute the point forecasts and 95% confidence intervals of the forecast values for the next 24 months. The subscribers of Delhi, Mumbai, Kolkata, and India are projected, at 95% probability, to reach a high of 100 million, 70 million, 60 million, and 2000 million, respectively, by December 2020.
    Keywords: Mobile subscribers; State space model; ETS; LSTM; Time series cross-validation.

  • Managing Employee Turnover: Machine Learning to the Rescue
    by Owen Hall 
    Abstract: Organizations continue to face ongoing employee retention and recruiting challenges, which have become even more acute due to the COVID-19 pandemic. In today's unstable economy, employee retention is still one of the hot-button issues facing many HR managers. Employee turnover costs organizations billions of dollars each year. The empirical results from the current study, which included employee demographic, preference, and performance data, suggest that machine learning-based predictive models can provide automatic and timely employee assessments, which allow for both the identification of employees who may be planning to leave and the implementation of appropriate amelioration initiatives. Job engagement, work satisfaction, experience, and compensation are but four of the factors found to be closely aligned with an employee's decision to leave. The primary purpose of this article is to highlight how machine learning can reduce employee turnover through early detection and intervention.
    Keywords: Machine learning; human resource management; employee turnover; actionable knowledge discovery; intervention strategies.

  • Bayesian Survival Analysis of Under-Five Pneumonia Patients in Tercha General Hospital, Dawro Zone, South West Ethiopia
    by Lema Abate, Megersa Tadesse 
    Abstract: Pneumonia is among the major killer diseases of under-five children in the world. In developing countries, 3 million children die each year due to pneumonia. Ethiopia is one of the 15 countries with a high pneumonia burden. The aim of this study was to examine the risk factors for the survival time of under-five pneumonia patients using a Bayesian approach. A total of 281 under-five pneumonia patients were included in this study. Parametric survival models with Weibull, lognormal, and log-logistic baseline distributions were fitted to the dataset by introducing prior distributions. The DIC value was used to compare the baseline distributions, and on this basis the Weibull baseline distribution was selected as a good model for the under-five pneumonia dataset. The results obtained from the Weibull survival model showed that urban residence and admission during a period with a minimum patient-to-nurse ratio (PNR) prolonged the time to death of under-five pneumonia patients, while admission during the spring and summer seasons, co-morbidity, and severe acute malnutrition (SAM) shortened the time to death. Factors such as sex, residence, season of diagnosis, comorbidity, SAM, patient referral status, and PNR were associated with the survival time of under-five pneumonia patients in this study. The concerned bodies should pay attention to the factors identified in this study to prevent the mortality of under-five children due to pneumonia.
    Keywords: Pneumonia; Under-Five; Parametric Models; Risk Factors; Bayesian approach; WinBUGS.

Special Issue on: Scalable Provision of Semantically Relevant Web Content on Big Data Frameworks

  • Dynamic sorting and average skyline method for query processing in Spatial-Temporal Data
    by John A, Shubham Kumar Singh, Adimoolam M, Ananth Kumar T 
    Abstract: With the continuous advancement of mobile computing and the development of positioning in devices, querying moving objects on road networks has become an important task in the internet world. As a result of this development, managing and querying huge amounts of data plays a vital role in spatial and temporal applications. A large amount of data coupled with different kinds of query processing requires efficient indexing. The main problems in spatio-temporal data are managing data indexing, updates, and query processing. This work addresses query processing in spatio-temporal data to update the different dynamic queries of users. Previous query processing approaches do not support all end users. The proposed method, dynamic sorting and average skyline (DSAS), supports different kinds of queries and provides effective query processing for different users at different locations. Compared with other query processing techniques, the skyline query processing technique produces results for the dominating objects.
    Keywords: Spatial-Temporal Data; Query Processing; Skyline Query Processing.

    by T. Ananth Kumar, R. Raj Mohan, M. Adithya, R. Sunder 
    Abstract: A wireless ad hoc network (WANET) is a self-organizing, self-configuring confederation of wireless systems. WANET devices join and leave the network asynchronously and freely, and there are no predefined clients or servers. The dynamic topologies, mobile communication structure, decentralized control, and anonymity create many challenges for the security of systems and network infrastructure in a WANET environment. Therefore, this extreme form of dynamic and distributed model requires a re-evaluation of traditional approaches to security implementation. The aim is to eliminate the leakage that occurs when two or more wireless devices communicate via an optimal relay with decentralized wireless nodes in a wireless ad hoc network. The overall delay is reduced and throughput is increased. We propose a Deep Learning based Low Overhead Localized Flooding (DL-LOLF) method based on a query localization technique. Routing packets that propagate back to the source are discarded to avoid unnecessary rebroadcasting. This project studies wireless communication under an attacker scheme, in which the dropper (leakage) node is identified, and describes how security in wireless communication networks can be improved using a network layer algorithm. Simulation results show that the proposed method can reduce routing overhead and the MAC collision rate without sacrificing packet delivery ratio compared with existing protocols.
    Keywords: WANET; Deep learning; Bait message; LOLF; Dropper node.

  • Proficient Approaches for Scalability and Security in IoT through Edge/Fog/Cloud Computing: A Survey
    by SURESH KUMAR K, Radhamani A.S, Sundaresan S 
    Abstract: Cloud computing has become an advanced computing standard, driven by the introduction of technologies such as 5G and the Internet of Things (IoT). For data warehousing, cloud computing plays an important role in processing and implementation. Security-related issues arise when information is stored in the cloud. The data now available in the cloud is enormous, so it is difficult to access and analyse the generated data because of problems such as limited bandwidth, limited resources, inactivity, and additional security challenges. To solve these kinds of issues, other technologies such as fog computing and edge computing have been introduced alongside cloud computing. Recently, security has become a more challenging task when accessing real-time data. In this paper, IoT secrecy is analysed in the context of its use with Edge/Cloud/Fog computing. The various algorithms used, their objectives, the proposed methodologies, and their advantages are discussed.
    Keywords: Internet of Things (IoT); 5G; Data Warehousing; Security Challenges; Cloud Computing; Fog Computing; Edge Computing.

    by S. Theetchenya, Ramasubbareddy Somula, S. Sankar, Syed Muzamil Basha 
    Abstract: Content-based image retrieval (CBIR), also known as query by image content, is one of the vital research areas in image processing. It addresses the problem of searching for similar digital images in a large database. Existing content-based image retrieval systems used to retrieve relevant images lack accuracy. To improve the accuracy of content-based image retrieval, the proposed system introduces an unsupervised hybrid approach. The proposed system takes color images as input. Preprocessing is performed using a median filter. The system extracts features such as color, texture, brightness distribution, and Euclidean distance in the hybrid approach using the Color Histogram Algorithm, Texture Feature Detection Algorithm, Brightness Distribution Algorithm, and Euclidean Distance Algorithm. All the algorithms run in parallel, and the extracted features are stored in a database. When the user submits a query to the system, a real-time comparison is made with the stored feature database. Finally, the proposed system retrieves the related images from the database. The proposed system was compared on various datasets, Corel 10000 and IMAGENET1M, and provides improved results.
    Keywords: Content Based Image Retrieval; Query by Image Content; Color Histogram; Texture Detection Algorithm; Brightness Distribution Algorithm.

Special Issue on: ETMS2018 and ETMS2019 Data Analytics in Engineering and Management

    by Selim COREKCIOGLU, Bekir POLAT 
    Abstract: Small and medium-sized enterprises (SMEs) have an important place in the economy, since 99.8% of businesses in Turkey are SMEs. Survival is important for SMEs, especially newly founded enterprises. In order to help SMEs survive, KOSGEB, the SME development organization in Turkey, provides entrepreneurs with three years of support. However, supported entrepreneurship projects still fail, wasting the resources allocated to them. This study aimed to prevent this waste of resources by estimating the success and failure of proposed entrepreneurship projects with data mining algorithms. In this way, the accuracy of the estimates increased and decisions about the projects were based on a scientific approach. As data for the study, the projects evaluated by the KOSGEB Gaziantep Directorate between 2012 and 2014 were analyzed, taking into consideration features such as age, gender, experience, education, partnership structure, market, location, sector, personnel, and capital. The analysis examined whether the entrepreneurial projects were successful or not. The data obtained from the entrepreneurship projects were pre-processed and adapted to WEKA 3.9.2 software. The dataset was classified using 10-fold cross-validation with the C4.5, Naive Bayes, Logistic Regression, Random Forest, and Support Vector algorithms. The classification results were compared, and the C4.5 algorithm was found to be the most successful, with 70.75% prediction accuracy. According to the C4.5 algorithm, the features affecting the tree were capital, partner, location, and age, in that order. The features that did not affect the tree were gender, education, market, sector, and personnel.
    Keywords: Entrepreneurship; SME; Data Mining; Classification.

Special Issue on: Healthcare Evolution in Big Data Analytics Challenges, Trends and Applications

  • A top-down Outlook on Artificial Intelligence applied to Healthcare Systems and possible Advantage of an Unsupervised Learning Tool to medical Issues
    by Vicente Gonzalez-Prida, Jesús P. Zamora 
    Abstract: The objective of this paper is to discuss the implementation of AI in a sector such as healthcare, providing a methodology based on Machine Learning for pattern recognition in the behavior of a specific outbreak. To this end, the paper starts by defining a conceptual map of Artificial Intelligence, distinguishing that area from the Machine Learning field. The document then delves into Machine Learning concepts, linking them with the usual learning processes and their corresponding applications, and also provides a view of Artificial Neural Networks in order to show the advantages of this useful tool. Subsequently, possible applications to medical issues are described. A case study then proposes an unsupervised learning process for pattern recognition applying the Jensen-Shannon Divergence. This example is based on the behavior of the reproductive number of a specific outbreak across different areas. Some conclusions are presented at the end, along with possible future research lines in this field.
    Keywords: Artificial Intelligence; Healthcare System; Jensen-Shannon Divergence; Machine Learning; Medical Issues; Neural Network; Unsupervised Learning.

  • Mining the Irish Hip Fracture Database: Learning Factors Contributing to Care Outcomes
    by Mahmoud Elbattah, Owen Molloy 
    Abstract: Data Analytics has opened the door for improving many aspects pertaining to the delivery of healthcare. This study avails of unsupervised Machine Learning to extract knowledge from the Irish Hip Fracture Database. The dataset under consideration contained patient records over three years (2013-2015). The process of knowledge discovery included data clustering and rule mining. With cluster analysis, possible correlations were explored related to patient characteristics, care-related factors, or outcomes. Further, association rules were discovered to learn the potential factors leading to a prolonged length of stay (LOS). In essence, our results highlight the significant impact of the pre-surgery waiting time on the LOS. The cluster analysis and association rules consistently emphasised that patients who experienced longer pre-surgery waiting times tended to have longer LOS periods. The insights delivered are believed to have practical implications for the treatment of hip fractures, especially in the case of elderly patients.
    Keywords: Machine Learning; Unsupervised Learning; Clustering; Rule Mining; Hip Fracture Care.

  • Intelligent Technique for Human Authentication using Hand vein
    by Mona Abdel Aziz, Mohamed Roushdy, Abdel-Badeeh M. Salem 
    Abstract: In this paper, we propose a new intelligent technique to authenticate humans using the dorsal hand vein (DHV) pattern. Recently, authentication has been adopted by smart hospitals in many countries as an intelligent tool for patient identification, to prevent insurance fraud and to connect patients securely with their medical records. We developed an image analysis technique to extract the region of interest (ROI) from the DHV image. After extracting the ROI, we designed a sequence of preprocessing steps to enhance hand vein images using a median filter, a Wiener filter, and Contrast Limited Adaptive Histogram Equalization (CLAHE). Our technique is based on the following algorithms: principal component analysis (PCA) for feature extraction and a k-Nearest Neighbors (K-NN) classifier for matching. The technique has been applied to the Bosphorus Hand Vein Database. The experimental results show a CRR of 91.2%.
    Keywords: Biometric; dorsal hand vein; computational intelligence; feature extraction; PCA; K-NN; machine learning.