International Journal of Data Science (12 papers in press)
Fake News and Misinformation Detection on Headlines of COVID-19 Using Deep Learning Algorithms
by Xin Wang, Peng Zhao, Xi Chen
Abstract: This article proposes a deep learning system for detecting fake news and misinformation in COVID-19 related headlines. LSTM, CNN, and DBN models are trained and compared in order to determine the optimal algorithm. Based on model performance measures such as accuracy, AUC score, and F1 score, the study identifies CNN and LSTM as the optimal models for COVID-19 fake news detection, with an accuracy of up to 94%. Finally, the paper provides an algorithm-based method for ranking the credibility of mainstream media. The results indicate that mainstream media channels in the U.S. are reliable for reporting COVID-19 related news and information.
Keywords: COVID-19; fake news detection; deep learning algorithms; big data analytics; mainstream media credibility.
Application of nonlinear stochastic single source of error state space models in the forecasting of mobile subscribers in India
by Prabir Kumar Das
Abstract: The nonlinear stochastic single source of error state space model with error, trend, and seasonality (ETS) was employed and found appropriate for modeling mobile subscriber time series data for individual metro cities, total mobile subscribers in all metro cities, and subscribers in all of India using monthly data from March 1997 to December 2018. Out of the different ETS models, the multiplicative error, additive trend, and no seasonality (M, A, N) models were appropriate for all series. These models were compared to the autoregressive integrated moving average model. The final model was identified based on the Diebold-Mariano test and time series cross-validation. The performance of the final model was compared to the long short-term memory (LSTM) model. The mean absolute error and root mean squared error showed that the ETS (M, A, N) model outperformed the standard LSTM. The ETS (M, A, N) model was used for computing the point forecast and 95% confidence intervals of the forecast values for the next 24 months. The subscribers of Delhi, Mumbai, Kolkata, and India are projected, at 95% probability, to reach highs of 100 million, 70 million, 60 million, and 2000 million subscribers, respectively, by December 2020.
Keywords: Mobile subscribers; State space model; ETS; LSTM; Time series cross-validation.
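For reference, the ETS (M, A, N) model named in the abstract is usually written in the following single source of error state space form. This is the standard textbook notation, which may differ from the notation used in the paper: $\ell_t$ is the level, $b_t$ the trend, $\alpha$ and $\beta$ are smoothing parameters, and $\varepsilon_t$ is the single error term.

```latex
\begin{aligned}
y_t    &= (\ell_{t-1} + b_{t-1})\,(1 + \varepsilon_t) \\
\ell_t &= (\ell_{t-1} + b_{t-1})\,(1 + \alpha\,\varepsilon_t) \\
b_t    &= b_{t-1} + \beta\,(\ell_{t-1} + b_{t-1})\,\varepsilon_t
\end{aligned}
```

The error enters multiplicatively through $(1 + \varepsilon_t)$, the trend is added to the level, and there is no seasonal state.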
Managing Employee Turnover: Machine Learning to the Rescue
by Owen Hall
Abstract: Organizations continue to face ongoing employee retention and recruiting challenges, which have become even more acute due to the COVID-19 pandemic. In today's unstable economy, employee retention is still one of the hot button issues facing many HR managers. Employee turnover costs organizations billions of dollars each year. The empirical results from the current study, which included employee demographic, preference, and performance data, suggest that machine learning-based predictive models can provide automatic and timely employee assessments, which allow for both the identification of employees who may be planning to leave and the implementation of appropriate amelioration initiatives. Job engagement, work satisfaction, experience, and compensation are but four of the factors found to be closely aligned with an employee's decision to leave. The primary purpose of this article is to highlight how machine learning can reduce employee turnover through early detection and intervention.
Keywords: Machine learning; human resource management; employee turnover; actionable knowledge discovery; intervention strategies.
Bayesian Survival Analysis of Under-Five Pneumonia Patients in Tercha General Hospital, Dawro Zone, South West Ethiopia
by Lema Abate, Megersa Tadesse
Abstract: Pneumonia is among the leading causes of death in children under five worldwide. In developing countries, 3 million children die each year due to pneumonia. Ethiopia is one of the 15 countries with a high pneumonia burden. The aim of this study was to examine the risk factors for the survival time of under-five pneumonia patients using a Bayesian approach. A total of 281 under-five pneumonia patients were included in the study. Parametric survival models with Weibull, lognormal, and log-logistic baseline distributions were fitted to the data by introducing prior distributions. The deviance information criterion (DIC) was used to compare the baseline distributions, and on that basis the Weibull baseline distribution was selected as a good fit for the under-five pneumonia dataset. The results from the Weibull survival model showed that urban residence and admission during periods with a low patient-to-nurse ratio (PNR) prolonged the time to death, while admission during the spring and summer seasons, co-morbidity, and severe acute malnutrition (SAM) shortened it. Sex, residence, season of diagnosis, co-morbidity, SAM, patient referral status, and PNR were associated with the survival time of under-five pneumonia patients in this study. The bodies concerned should give attention to the factors identified in this study to prevent under-five mortality due to pneumonia.
Keywords: Pneumonia; Under-Five; Parametric Models; Risk Factors; Bayesian approach; WinBUGS.
Special Issue on: Scalable Provision of Semantically Relevant Web Content on Big Data Frameworks
Dynamic sorting and average skyline method for query processing in Spatial-Temporal Data
by John A, Shubham Kumar Singh, Adimoolam M, Ananth Kumar T
Abstract: With the continuous advancement of mobile computing and of positioning capabilities in devices, querying moving objects on road networks has become an important task. As a result, managing large volumes of data and processing queries over them plays a vital role in spatial and temporal applications. Large amounts of data coupled with diverse query processing require efficient indexing. The main problems in spatio-temporal data management are indexing, updates, and query processing. This work addresses query processing in spatio-temporal data, updating the different dynamic queries of users. Previous query processing approaches do not support all end users. The proposed dynamic sorting and average skyline (DSAS) method supports different kinds of queries and produces effective query processing for different users at different locations. Compared with other query processing techniques, the skyline query processing technique returns results for the dominating objects.
Keywords: Spatial-temporal data; query processing; skyline query processing.
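For readers unfamiliar with skyline queries, the dominance relation mentioned in the abstract can be sketched as follows. This is a generic, naive skyline computation and not the authors' DSAS method; the function names `dominates` and `skyline` and the sample points (for example, distance and travel time, where smaller is better) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is no worse than b in every
    dimension and strictly better in at least one (smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (distance, travel time), both to be minimized.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 1.0), (5.0, 6.0)]
result = skyline(candidates)
```

Here `(3.0, 5.0)` and `(5.0, 6.0)` are both dominated by `(2.0, 4.0)`, so they are left out of `result`. Practical systems typically replace the quadratic scan with index-based or sort-based algorithms.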
A NOVEL SECURITY SCHEME USING DEEP LEARNING BASED LOW OVERHEAD LOCALIZED FLOODING ALGORITHM FOR WIRELESS SENSOR NETWORKS
by T. Ananth Kumar, R. Raj Mohan, M. Adithya, R. Sunder
Abstract: A wireless ad hoc network (WANET) is a self-organizing, self-configuring confederation of wireless systems. WANET devices join and leave the network asynchronously at will, and there are no predefined clients or servers. The dynamic topologies, mobile communication structure, decentralized control, and anonymity create many challenges for the security of systems and network infrastructure in a WANET environment. This extreme form of dynamic, distributed model therefore requires a re-evaluation of traditional approaches to security implementation. The aim is to eliminate the leakage that occurs when two or more wireless devices communicate via an optimal relay with decentralized wireless nodes in a wireless ad hoc network. The overall delay is reduced and throughput is increased. We propose a Deep Learning based Low Overhead Localized Flooding (DL-LOLF) method based on a query localization technique. Routing packets that propagate back to the source are discarded to alleviate unnecessary rebroadcasting. This work studies wireless communication under an attacker scheme in which the dropper (leakage) node is detected, and provides information about improving security in wireless communication networks using a network layer algorithm. Simulation results show that the proposed method can reduce routing overhead and the MAC collision rate without sacrificing packet delivery ratio compared with existing protocols.
Keywords: WANET; Deep learning; Bait message; LOLF; Dropper node.
Proficient Approaches for Scalability and Security in IoT through Edge/Fog/Cloud Computing: A Survey
by SURESH KUMAR K, Radhamani A.S, Sundaresan S
Abstract: Cloud computing has become an advanced computing standard that has come to the fore with the introduction of technologies such as 5G and the Internet of Things (IoT). For data warehousing, cloud computing plays an important role in processing and implementation. Security-related issues arise when information is stored in the cloud. The data now available in the cloud is enormous, so accessing and analysing the generated data is difficult because of problems such as limited bandwidth, limited resources, inactivity, and further security challenges. To solve these kinds of issues, other technologies such as fog computing and edge computing have been introduced alongside cloud computing. Recently, security has become an even more challenging task when accessing real-time data. In this paper, IoT security is analysed in the context of its use with Edge/Cloud/Fog computing. The various algorithms used, their objectives, the proposed methodologies, and their advantages are discussed.
Keywords: Internet of Things (IoT); 5G; Data Warehousing; Security Challenges; Cloud computing; Fog computing; Edge computing.
HYBRID APPROACH BASED CONTENT BASED IMAGE RETRIEVAL
by S. Theetchenya, Ramasubbareddy Somula, S. Sankar, Syed Muzamil Basha
Abstract: Content-based image retrieval (CBIR), also known as query by image content, is one of the vital research areas in image processing; it addresses the problem of searching for similar digital images in a large database. Existing CBIR systems used to retrieve relevant images lack accuracy. To improve the accuracy of content-based image retrieval, the proposed system introduces an unsupervised hybrid approach. The system takes color images as input, and preprocessing is performed using a median filter. The system extracts color, texture, brightness distribution, and Euclidean distance features using a color histogram algorithm, a texture feature detection algorithm, a brightness distribution algorithm, and a Euclidean distance algorithm. The algorithms run in parallel, and the extracted features are stored in a database. When the user submits a query, a real-time comparison is made against the stored features, and the system retrieves the related images from the database. The proposed system was evaluated on the Corel 10000 and IMAGENET1M datasets and provides improved results.
Keywords: Content Based Image Retrieval; Query by Image Content; Color Histogram; Texture Detection Algorithm; Brightness Distribution Algorithm.
Special Issue on: ETMS2018 and ETMS2019 Data Analytics in Engineering and Management
ESTIMATION OF SUCCESS OF ENTREPRENEURSHIP PROJECTS WITH DATA MINING
by Selim COREKCIOGLU, Bekir POLAT
Abstract: Small and medium-sized enterprises (SMEs) have an important place in the economy, given that 99.8% of businesses in Turkey are SMEs. Survival is important for SMEs, especially newly founded enterprises. In order to help SMEs survive, KOSGEB, the SME development organization in Turkey, provides entrepreneurs with three years of support. However, supported entrepreneurship projects still fail and cause the waste of the resources allocated to them. This study aimed to prevent this waste of resources and to estimate the success or failure of proposed entrepreneurship projects with data mining algorithms, so that the accuracy of the estimates increases and decisions about the projects rest on a scientific approach. The data for the study were the projects evaluated by the KOSGEB Gaziantep Directorate between 2012 and 2014, analyzed with respect to features such as age, gender, experience, education, partnership structure, market, location, sector, personnel, and capital. The analysis examined whether the entrepreneurial projects were successful or not. The data obtained from the entrepreneurship projects were pre-processed and adapted to the WEKA 3.9.2 software. The dataset was classified using 10-fold cross-validation with the C4.5, Naive Bayes, Logistic Regression, Random Forest, and Support Vector algorithms. The classification results were compared, and the C4.5 algorithm was found to be the most successful, with 70.75% prediction accuracy. In the C4.5 tree, the features affecting the result were capital, partner, location, and age, in that order. The features that did not affect the tree were gender, education, market, sector, and personnel.
Keywords: Entrepreneurship; SME; Data Mining; Classification.
Special Issue on: Healthcare Evolution in Big Data Analytics Challenges, Trends and Applications
A top-down Outlook on Artificial Intelligence applied to Healthcare Systems and possible Advantage of an Unsupervised Learning Tool to medical Issues
by Vicente Gonzalez-Prida, Jesús P. Zamora
Abstract: The objective of this paper is to discuss the implementation of AI in a sector like healthcare, providing a methodology based on Machine Learning for pattern recognition in the behavior of a specific outbreak. To this end, the paper starts by defining a conceptual map of Artificial Intelligence, distinguishing that area from the Machine Learning field. It then delves into Machine Learning concepts, linking them with the usual learning processes and their corresponding applications, and also gives an overview of Artificial Neural Networks in order to show the advantages of this useful tool. Subsequently, possible applications to medical issues are described. A case study then proposes an unsupervised learning process for pattern recognition that applies the Jensen-Shannon Divergence. This example is based on the behavior of the reproductive number of a specific outbreak across different areas. Some conclusions are presented at the end, along with possible future lines of research in this field.
Keywords: Artificial Intelligence; Healthcare System; Jensen-Shannon Divergence; Machine Learning; Medical Issues; Neural Network; Unsupervised Learning.
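As a brief illustration of the Jensen-Shannon Divergence mentioned in the abstract, the sketch below compares two discrete probability distributions. It is a generic implementation of the standard definition and not the authors' pipeline; the function names and the two example histograms (imagined here as binned reproductive-number values for two areas) are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits.
    Terms with p_i == 0 contribute nothing by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: the average KL divergence of p and q
    from their midpoint distribution m. With base-2 logs it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical normalized histograms for two areas (each sums to 1).
area_a = [0.1, 0.4, 0.5]
area_b = [0.3, 0.4, 0.3]
distance = js_divergence(area_a, area_b)
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when one distribution has zero mass in a bin, which makes it convenient as a pairwise dissimilarity for clustering.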
Mining the Irish Hip Fracture Database: Learning Factors Contributing to Care Outcomes
by Mahmoud Elbattah, Owen Molloy
Abstract: Data Analytics has opened the door for improving many aspects pertaining to the delivery of healthcare. This study avails of unsupervised Machine Learning to extract knowledge from the Irish Hip Fracture Database. The dataset under consideration contained patient records over three years 2013-2015. The process of knowledge discovery included using data clustering and Rule Mining. With cluster analysis, possible correlations were explored related to patient characteristics, care-related factors or outcomes. Further, association rules were discovered to learn the potential factors leading to a prolonged length of stay (LOS). In essence, our results highlight the significant impact of the pre-surgery waiting time on the LOS. The cluster analysis and association rules consistently emphasised that patients who experienced longer periods of pre-surgery waiting time tended to have longer LOS periods. The insights delivered are believed to yield practical implications to be considered within the treatment of hip fractures, especially in the case of elderly patients.
Keywords: Machine Learning; Unsupervised Learning; Clustering; Rule Mining; Hip Fracture Care.
Intelligent Technique for Human Authentication using Hand vein
by Mona Abdel Aziz, Mohamed Roushdy, Abdel-Badeeh M. Salem
Abstract: In this paper, we propose a new intelligent technique for authenticating humans using the dorsal hand vein (DHV) pattern. Recently, authentication has been adopted by smart hospitals in many countries as an intelligent tool for patient identification, to prevent insurance fraud and to connect patients securely with their medical records. We developed an image analysis technique to extract the region of interest (ROI) from the DHV image. After extracting the ROI, we apply a sequence of preprocessing steps to enhance the hand vein image using a median filter, a Wiener filter, and Contrast Limited Adaptive Histogram Equalization (CLAHE). Our technique is based on two algorithms: principal component analysis (PCA) for feature extraction and a k-Nearest Neighbors (K-NN) classifier for matching. The technique has been applied to the Bosphorus Hand Vein Database. The experimental results show a correct recognition rate (CRR) of 91.2%.
Keywords: Biometric; dorsal hand vein; computational intelligence; feature extraction; PCA; K-NN; machine learning.