Template-Type: ReDIF-Article 1.0
Author-Name: Abasat Mirzaei
Author-X-Name-First: Abasat
Author-X-Name-Last: Mirzaei
Author-Name: Fatemeh Hoseini
Author-X-Name-First: Fatemeh
Author-X-Name-Last: Hoseini
Author-Name: Mehrshad Lalinia
Author-X-Name-First: Mehrshad
Author-X-Name-Last: Lalinia
Title: An optimisation approach for determining the efficiency of vital medical devices in intensive care units with COVID-19 patients using Apriori algorithm
Abstract:
Understanding consumer behaviour and needs plays an important role in improving strategic management in hospitals, including the preparation and equipping of intensive care units (ICUs) and the availability of medical devices. This cross-sectional study was performed in the ICU of Farhikhtegan Hospital, Tehran, Iran, over a period of six months, during which ten medical devices were used 5,497 times. These devices are: ventilator, oxygen cylinder, infusion pump, electrocardiography machine, vital signs monitor, oxygen flowmeter, wavy mattress, ultrasound sonography machine, ultrasound echocardiography machine, and dialysis machine. The Apriori algorithm showed that four devices (ventilator, oxygen cylinder, vital signs monitoring device, and oxygen flowmeter) are the most frequently used by patients. These devices are positively correlated with each other, with confidence above 80% and support of 73%. To validate the results, we also applied the equivalence class clustering and bottom-up lattice traversal (ECLAT) algorithm to our dataset.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 154-168
Issue: 2
Volume: 15
Year: 2023
Keywords: medical equipment; COVID-19; hospital; Apriori algorithm; technology management; healthcare equipment; medical devices; data mining; medical data; association rule; ECLAT algorithm.
File-URL: http://www.inderscience.com/link.php?id=131377
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:154-168

Template-Type: ReDIF-Article 1.0
Author-Name: Moustafa Sadek Kahil
Author-X-Name-First: Moustafa Sadek
Author-X-Name-Last: Kahil
Author-Name: Abdelkrim Bouramoul
Author-X-Name-First: Abdelkrim
Author-X-Name-Last: Bouramoul
Author-Name: Makhlouf Derdour
Author-X-Name-First: Makhlouf
Author-X-Name-Last: Derdour
Title: Big data visual exploration as a recommendation problem
Abstract:
Big data visual exploration can be considered a recommendation problem. The two are close mainly in their purpose: both consist of selecting, from a huge amount of data, the items that are most valuable according to specific criteria, and eventually presenting them to users. Meanwhile, recommendation problems have recently been solved mostly with neural networks (NNs). The present paper proposes three alternative solutions to improve recommendation-based big data visual exploration using matrix factorisation (MF): conventional, alternating least squares (ALS)-based, and NN-based methods. The approach involves generating the implicit data used to build recommendations and providing the most valuable data patterns according to user profiles. The first two solutions were developed using Apache Spark, while the third was developed using TensorFlow 2. The three solutions are compared on their results to identify the most efficient one, and the results show their applicability and effectiveness.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 133-153
Issue: 2
Volume: 15
Year: 2023
Keywords: big data visualisation; recommendation systems; collaborative filtering; content-based filtering; matrix factorisation; alternating least squares; machine learning; neural networks.
File-URL: http://www.inderscience.com/link.php?id=131378
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:133-153

Template-Type: ReDIF-Article 1.0
Author-Name: Isha Gupta
Author-X-Name-First: Isha
Author-X-Name-Last: Gupta
Author-Name: Indranath Chatterjee
Author-X-Name-First: Indranath
Author-X-Name-Last: Chatterjee
Author-Name: Neha Gupta
Author-X-Name-First: Neha
Author-X-Name-Last: Gupta
Title: Identification of relevant features influencing movie reviews using sentiment analysis
Abstract:
Sentiment analysis is a systematic text mining technique that examines individuals' behaviour, attitudes, and viewpoints. This paper analyses viewers' sentiments towards movies released during the pandemic. The study applies sentiment analysis techniques to movie reviews accessed in real time from the Internet Movie Database (IMDb). The paper's main objective is to identify the words that contribute to the biases of the reviews and influence viewers overall. The proposed methodology employs the valence aware dictionary for sentiment reasoning to analyse the sentiment of the overall reviews, and then applies it to various movie genres. Finally, we applied Pearson's correlation analysis to find the association between the words across genres. The paper also calculates the sentiment scores of reviews using different sentiment analysis models. Our results show that at least 17% of features are common across genres. The analysis also reveals sets of the most distinctive influential words, which may be vital for understanding the nature of the language used for a particular kind of movie.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 169-183
Issue: 2
Volume: 15
Year: 2023
Keywords: sentiment analysis; feature selection; sentiment scores; internet movie database; IMDb reviews; adjectives and adverbs features.
File-URL: http://www.inderscience.com/link.php?id=131395
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:169-183

Template-Type: ReDIF-Article 1.0
Author-Name: Ayşe Şenyürek
Author-X-Name-First: Ayşe
Author-X-Name-Last: Şenyürek
Author-Name: Selçuk Alp
Author-X-Name-First: Selçuk
Author-X-Name-Last: Alp
Title: Churn prediction in telecommunication sector with machine learning methods
Abstract:
The aim of this study is to construct a model that predicts which subscribers in the telecommunication sector are likely to cancel their subscriptions. To this end, the study covers data selection, data preprocessing, the application of machine learning methods, and the definition of performance criteria and measurement processes. Potential churning subscribers were estimated using logistic regression, an artificial neural network, random forest, and a boosting method. The results show that the boosting method gives more accurate and successful results than the other methods. The most important factors causing customer churn were the time remaining until the end of the contract, tenure, the operator preferred by close relatives, and the quality of the network.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 184-202
Issue: 2
Volume: 15
Year: 2023
Keywords: churn analysis; telecommunication; customer relationship management; CRM; machine learning.
File-URL: http://www.inderscience.com/link.php?id=131396
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:184-202

Template-Type: ReDIF-Article 1.0
Author-Name: Makhlouf Ledmi
Author-X-Name-First: Makhlouf
Author-X-Name-Last: Ledmi
Author-Name: Mohammed El Habib Souidi
Author-X-Name-First: Mohammed El Habib
Author-X-Name-Last: Souidi
Author-Name: Michael Hahsler
Author-X-Name-First: Michael
Author-X-Name-Last: Hahsler
Author-Name: Abdeldjalil Ledmi
Author-X-Name-First: Abdeldjalil
Author-X-Name-Last: Ledmi
Author-Name: Chafia Kara-Mohamed
Author-X-Name-First: Chafia
Author-X-Name-Last: Kara-Mohamed
Title: Mining association rules for classification using frequent generator itemsets in arules package
Abstract:
Mining frequent itemsets is an attractive research activity in data mining whose main aim is to discover useful relationships among data. Consequently, several open-source development platforms are continuously being developed to help users exploit new data mining tasks. Among these platforms, the R language is one of the most popular tools. In this paper, we propose an extension of the arules package that adds the option of mining frequent generator itemsets. We discuss in detail how generators can be used for a classification task through an application example related to COVID-19.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 203-221
Issue: 2
Volume: 15
Year: 2023
Keywords: frequent generator itemsets; FGIs; classification; association rules; data mining; R language.
File-URL: http://www.inderscience.com/link.php?id=131399
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:203-221

Template-Type: ReDIF-Article 1.0
Author-Name: Amina Madani
Author-X-Name-First: Amina
Author-X-Name-Last: Madani
Author-Name: Fatima Boumahdi
Author-X-Name-First: Fatima
Author-X-Name-Last: Boumahdi
Author-Name: Anfel Boukenaoui
Author-X-Name-First: Anfel
Author-X-Name-Last: Boukenaoui
Author-Name: Mohamed Chaouki Kritli
Author-X-Name-First: Mohamed Chaouki
Author-X-Name-Last: Kritli
Author-Name: Asma Ghribi
Author-X-Name-First: Asma
Author-X-Name-Last: Ghribi
Author-Name: Fatma Limani
Author-X-Name-First: Fatma
Author-X-Name-Last: Limani
Author-Name: Hamza Hentabli
Author-X-Name-First: Hamza
Author-X-Name-Last: Hentabli
Title: An ABC approach for depression signs on social networks posts
Abstract:
Mental health disorders are considered one of the most prominent plagues of today's world. In this paper, we aim to tackle one of the biggest mental health issues: depression. Leveraging the potential of social media platforms, our ABC approach combines three deep learning models: an autoencoder, a BiLSTM, and a CNN. We test our approach and discuss our experiments on three datasets of Reddit posts provided by the 2019, 2020, and 2021 editions of the Conference and Labs of the Evaluation Forum (CLEF).
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 275-296
Issue: 3
Volume: 15
Year: 2023
Keywords: depression signs; social networks; deep learning; convolutional neural network; CNN; BiLSTM; autoencoder.
File-URL: http://www.inderscience.com/link.php?id=132972
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:275-296

Template-Type: ReDIF-Article 1.0
Author-Name: Mohammed El Amine Laghzaoui
Author-X-Name-First: Mohammed El Amine
Author-X-Name-Last: Laghzaoui
Author-Name: Yahia Lebbah
Author-X-Name-First: Yahia
Author-X-Name-Last: Lebbah
Title: A constraint programming approach for quantitative frequent pattern mining
Abstract:
Itemset mining is the first pattern mining problem studied in the literature. Most itemset mining studies have considered only Boolean datasets, where each item is either present or absent in a transaction. In practical applications, however, items appear in transactions with associated quantities. In this paper, we extend an existing efficient constraint programming approach for itemset mining to take quantitative items into account, so that patterns and their quantities are found directly on the original quantitative dataset. The contribution is twofold. First, we simplify the modelling of mining problems through a new constraint. Second, we propose a new filtering algorithm to handle the frequency and closedness constraints. Experiments performed on standard benchmark datasets with numerous mining constraints show that our approach finds more informative quantitative patterns and runs faster than quantitative approaches based on classical Boolean patterns.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 297-311
Issue: 3
Volume: 15
Year: 2023
Keywords: itemset mining; quantitative database; closed itemset mining; constraint programming.
File-URL: http://www.inderscience.com/link.php?id=132973
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:297-311

Template-Type: ReDIF-Article 1.0
Author-Name: Nouha Arfaoui
Author-X-Name-First: Nouha
Author-X-Name-Last: Arfaoui
Title: A new process for healthcare big data warehouse integration
Abstract:
The healthcare domain generates huge amounts of data from different, heterogeneous clinical data sources and devices, which are used to ensure good hospital management and performance. Because of the volume and structural complexity of these data, we use a big healthcare data warehouse, first for storage and later for decision making. To achieve this goal, we propose a new process for this type of data. It first unifies the different data, then extracts them, loads them into the big healthcare data warehouse, and finally performs the necessary transformations. For the first step, an ontology is used, as it is the best solution to the problem of data source heterogeneity. We also use Hadoop and its ecosystem, including Hive, MapReduce, and HDFS, to accelerate processing through parallelism, exploiting the performance of ELT to ensure 'schema-on-read', in which the data are stored before the transformation tasks are performed.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 240-254
Issue: 3
Volume: 15
Year: 2023
Keywords: big healthcare data warehouse; BHDW; Hive; Hadoop; MapReduce; ontology; big data; ELT; ETL.
File-URL: http://www.inderscience.com/link.php?id=132974
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:240-254

Template-Type: ReDIF-Article 1.0
Author-Name: Wallace Anacleto Pinheiro
Author-X-Name-First: Wallace Anacleto
Author-X-Name-Last: Pinheiro
Author-Name: Ana Bárbara Sapienza Pinheiro
Author-X-Name-First: Ana Bárbara Sapienza
Author-X-Name-Last: Pinheiro
Title: Hierarchical++: improving the hierarchical clustering algorithm
Abstract:
Hierarchical grouping is a widely used grouping strategy. However, this technique often gives poorer results than other approaches, such as K-means clustering. In addition, many algorithms try to correct hierarchical failures by refactoring intermediate cluster-combination steps, which may worsen performance. In this work, we propose a new set of procedures that alter the hierarchical technique to improve its results. The idea is to do it right the first time, avoiding the refactoring of previous steps. These modifications involve the concept of golden boxes, based on initial points named seeds, which indicate groups that must remain disconnected. To assess our strategy, we compare the results of several approaches: traditional hierarchical clustering (single-link, complete-link, average, weighted, centroid, and median), K-means, K-means++, and the proposed method, named Hierarchical++. An experimental evaluation indicates that our proposal far surpasses the compared strategies.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 223-239
Issue: 3
Volume: 15
Year: 2023
Keywords: clustering; grouping; similarity; golden boxes; complex distributions; dendrograms; hierarchical; K-means; seed; centroid.
File-URL: http://www.inderscience.com/link.php?id=132975
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:223-239

Template-Type: ReDIF-Article 1.0
Author-Name: Behnam Khamoushpour
Author-X-Name-First: Behnam
Author-X-Name-Last: Khamoushpour
Author-Name: Abbas Sheikh Aboumasoudi
Author-X-Name-First: Abbas Sheikh
Author-X-Name-Last: Aboumasoudi
Author-Name: Arash Shahin
Author-X-Name-First: Arash
Author-X-Name-Last: Shahin
Author-Name: Shakiba Khademolqorani
Author-X-Name-First: Shakiba
Author-X-Name-Last: Khademolqorani
Title: Designing a model for selecting, ranking and optimising service quality indicators using meta-heuristic algorithms
Abstract:
The purpose of this study is to select and rank the indicators affecting service quality and to minimise the service quality gap. To this end, two well-known meta-heuristic algorithms, a genetic algorithm and particle swarm optimisation, are combined with a support vector machine, yielding 'GA-SVM' and 'PSO-SVM'. Two macro quality indicators are considered, comprising five performance indicators and five service quality gap indicators from the SERVQUAL model. The GA-SVM algorithm is used to select the effective service quality indicators, and PSO-SVM is used to rank them. The efficiency and accuracy of the presented approach were confirmed through an implementation in a manufacturing company. According to the data obtained, two performance indicators, the final time of service level and the level of response, do not play an important role in measuring and improving the quality of the services provided by the company.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 255-274
Issue: 3
Volume: 15
Year: 2023
Keywords: service quality; information technology service management; ITSM; genetic algorithm; particle swarm optimisation; PSO; support vector machine; SVM; optimisation.
File-URL: http://www.inderscience.com/link.php?id=132981
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:255-274

Template-Type: ReDIF-Article 1.0
Author-Name: Neha Gupta
Author-X-Name-First: Neha
Author-X-Name-Last: Gupta
Title: Optimising data quality of a data warehouse using data purgation process
Abstract:
The rapid growth of data collection and storage services has affected the quality of data. The data purgation process helps maintain and improve data quality when the data are subjected to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information in the form of dummy values, cryptic values, or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data, implements the DBSCAN method with the Gower metric to handle dummy values, applies Ward's algorithm with the Minkowski distance to improve the results on contradicting data, and applies the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data to be loaded into a data warehouse. The proposed algorithms help maintain the accuracy, integrity, consistency, and non-redundancy of data in a timely manner.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 102-131
Issue: 1
Volume: 15
Year: 2023
Keywords: data warehouse; DW; data quality; DQ; extract; transform and load; ETL; data purgation; DP.
File-URL: http://www.inderscience.com/link.php?id=129961
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:102-131

Template-Type: ReDIF-Article 1.0
Author-Name: Alfredo Cuzzocrea
Author-X-Name-First: Alfredo
Author-X-Name-Last: Cuzzocrea
Author-Name: Fabio Martinelli
Author-X-Name-First: Fabio
Author-X-Name-Last: Martinelli
Author-Name: Francesco Mercaldo
Author-X-Name-First: Francesco
Author-X-Name-Last: Mercaldo
Title: A deep-learning approach to game bot identification via behavioural features analysis in complex massively-cooperative environments
Abstract:
In so-called massively multiplayer online role-playing games (MMORPGs), malicious players can obtain gains in competitions through easy victories achieved by introducing game bots into the games. To maintain fairness among players, it is important to detect game bots during play so that they can be expelled from the games. This paper describes an approach to distinguish human players from game bots based on behavioural analysis. This is implemented via supervised machine learning (ML) and deep learning (DL) algorithms. To detect game bots, the algorithms considered are first trained on labelled features and then used to classify previously unseen features. The performance of our game bot detection approach is evaluated experimentally. The dataset we use for training and classification is extracted from logs generated during online matches of a real-life MMORPG.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 1-29
Issue: 1
Volume: 15
Year: 2023
Keywords: game bot detection; complex massively-cooperative environments; machine learning; deep learning; massively multiplayer online role-playing games; MMORPGs.
File-URL: http://www.inderscience.com/link.php?id=129963
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:1-29

Template-Type: ReDIF-Article 1.0
Author-Name: Sima Hadadian
Author-X-Name-First: Sima
Author-X-Name-Last: Hadadian
Author-Name: Zahra Naji-Azimi
Author-X-Name-First: Zahra
Author-X-Name-Last: Naji-Azimi
Author-Name: Nasser Motahari Farimani
Author-X-Name-First: Nasser Motahari
Author-X-Name-Last: Farimani
Author-Name: Behrouz Minaei-Bidgoli
Author-X-Name-First: Behrouz
Author-X-Name-Last: Minaei-Bidgoli
Title: Application of rule-based data mining in extracting the rules from the number of patients and climatic factors in instantaneous to long-term spectrum
Abstract:
Predicting the number of patients helps managers allocate resources in hospitals efficiently. In this research, the relationship between the number of patients and temperature, relative humidity, wind speed, air pressure, and air pollution was investigated over instantaneous, short-, medium-, and long-term indices. A genetic algorithm and an ID3 decision tree were used for feature selection, and classification based on a multidimensional association rule mining algorithm was applied for rule mining. The data were collected over 19 months from a pediatric hospital with nephrology, hematology, emergency, and PICU wards. The results show that, in the long-term index, all climatic factors are correlated with the number of patients in all wards. Several if-then rules were also obtained, describing the relationship between climatic factors in the four indices and the number of patients in each hospital ward. Based on these if-then rules, resource allocation in the hospital can be planned optimally.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 30-52
Issue: 1
Volume: 15
Year: 2023
Keywords: temperature; relative humidity; wind speed; air pressure; air pollution; patients; hospital; association rule mining; classification; genetic algorithm; ID3 decision tree.
File-URL: http://www.inderscience.com/link.php?id=129964
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:30-52

Template-Type: ReDIF-Article 1.0
Author-Name: Khaled Bedjou
Author-X-Name-First: Khaled
Author-X-Name-Last: Bedjou
Author-Name: Faical Azouaou
Author-X-Name-First: Faical
Author-X-Name-Last: Azouaou
Title: Detection of terrorism's apologies on Twitter using a new bi-lingual dataset
Abstract:
A large amount of terrorist apology content is shared on social media without being detected. The automatic and immediate detection of such content is therefore essential for people's safety. In this paper, we propose a language-independent process to detect and classify terrorism's apologies on Twitter into three classes (apology, no apology, and neutral). We tested the process on a bilingual (Arabic and English) dataset of 12,155 manually annotated tweets. We conducted two sets of experiments, one with imbalanced data and the other with oversampled data. We compared the classification performance of four machine learning algorithms (RF, DT, KNN, and NB) and five deep learning algorithms (GRU, SimpleRNN, LSTM, BiLSTM, and BERT). Our comparative study concluded that BERT achieves better classification performance than the other algorithms, with an accuracy of 0.84 for Arabic and 0.81 for English on imbalanced data, and 0.88 for Arabic and 0.91 for English on oversampled data.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 331-354
Issue: 4
Volume: 15
Year: 2023
Keywords: terrorism's apology; social network analysis; Twitter; NLP; sentiment analysis; machine learning; deep learning; transfer learning.
File-URL: http://www.inderscience.com/link.php?id=134581
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:331-354

Template-Type: ReDIF-Article 1.0
Author-Name: Alaa Khalaf Hamoud
Author-X-Name-First: Alaa Khalaf
Author-X-Name-Last: Hamoud
Author-Name: Ali Salah Alasady
Author-X-Name-First: Ali Salah
Author-X-Name-Last: Alasady
Author-Name: Wid Akeel Awadh
Author-X-Name-First: Wid Akeel
Author-X-Name-Last: Awadh
Author-Name: Jasim Mohammed Dahr
Author-X-Name-First: Jasim Mohammed
Author-X-Name-Last: Dahr
Author-Name: Mohammed B.M. Kamel
Author-X-Name-First: Mohammed B.M.
Author-X-Name-Last: Kamel
Author-Name: Aqeel Majeed Humadi
Author-X-Name-First: Aqeel Majeed
Author-X-Name-Last: Humadi
Author-Name: Ihab Ahmed Najm
Author-X-Name-First: Ihab Ahmed
Author-X-Name-Last: Najm
Title: A comparative study of supervised/unsupervised machine learning algorithms with feature selection approaches to predict student performance
Abstract:
Educational data mining (EDM) is one of the fastest-growing fields, and it aims to improve the performance of students, academic staff, and institutions as a whole. Implementing data mining algorithms usually requires a feature selection process to find the most correlated features and improve accuracy. In this paper, a comparative study examines the use of supervised and unsupervised algorithms in predicting students' performance. Students' grades are classified using various supervised and unsupervised algorithms, such as decision trees, clustering, and neural networks. These algorithms were examined on a questionnaire dataset before and after feature selection to measure the effect of feature selection on accuracy. The results showed that the random forest decision tree outperformed the other supervised and unsupervised algorithms. The results also showed that removing the less correlated attributes improved the performance of most algorithms.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 393-409
Issue: 4
Volume: 15
Year: 2023
Keywords: educational data mining; EDM; students' performance; supervised algorithms; unsupervised algorithms; feature selection.
File-URL: http://www.inderscience.com/link.php?id=134590
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:393-409

Template-Type: ReDIF-Article 1.0
Author-Name: Donald Douglas Atsa'am
Author-X-Name-First: Donald Douglas
Author-X-Name-Last: Atsa'am
Author-Name: Frank Adusei-Mensah
Author-X-Name-First: Frank
Author-X-Name-Last: Adusei-Mensah
Author-Name: Oluwafemi Samson Balogun
Author-X-Name-First: Oluwafemi Samson
Author-X-Name-Last: Balogun
Author-Name: Temidayo Oluwatosin Omotehinwa
Author-X-Name-First: Temidayo Oluwatosin
Author-X-Name-Last: Omotehinwa
Author-Name: Oluwaseun Alexander Dada
Author-X-Name-First: Oluwaseun Alexander
Author-X-Name-Last: Dada
Author-Name: Richard Osei Agjei
Author-X-Name-First: Richard Osei
Author-X-Name-Last: Agjei
Author-Name: Samuel Nii Odoi Devine
Author-X-Name-First: Samuel Nii Odoi
Author-X-Name-Last: Devine
Title: A novel taxonomy of natural disasters based on casualty and consequence using hierarchical clustering
Abstract:
Post-disaster management requires a proportional deployment of human and material resources. The amount of resources required to manage a disaster cannot be known without first evaluating the extent of casualty and consequence. This study proposes a taxonomy for classifying natural disasters based on casualty and consequence. Using secondary data on global disasters from 1900 to 2021, the hierarchical cluster analysis technique was deployed to form the taxonomy. The learning algorithm evaluated the similarities in the numbers of deaths and injuries, and in the cost of property damaged by disasters. Three clusters were extracted, which sub-grouped historical disasters based on similarities in casualty and consequence. Further, a taxonomy was established that defines the ranges of what constitutes low, average, and high deaths/injuries/damage. Classifying a future disaster with this taxonomy before deploying resources for rescue, resettlement, compensation, and other disaster management operations will guide efficient resource allocation on a case-by-case basis.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 313-330
Issue: 4
Volume: 15
Year: 2023
Keywords: disaster taxonomy; natural disasters; casualty and consequence; post-disaster management; hierarchical cluster analysis.
File-URL: http://www.inderscience.com/link.php?id=134591
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:313-330

Template-Type: ReDIF-Article 1.0
Author-Name: Mohammad Khanbabaei
Author-X-Name-First: Mohammad
Author-X-Name-Last: Khanbabaei
Author-Name: Pantea Parsi
Author-X-Name-First: Pantea
Author-X-Name-Last: Parsi
Author-Name: Najmeh Farhadi
Author-X-Name-First: Najmeh
Author-X-Name-Last: Farhadi
Title: Using data mining to integrate recency-frequency-monetary value analysis and credit scoring methods for bank customer behaviour analysis
Abstract:
Banks apply credit scoring to identify customers with low credit risk. Additionally, the recency-frequency-monetary value (RFM) analysis method is suitable for identifying valuable bank customers. Data mining techniques can be used to discover useful patterns hidden in customer data. However, previous research has applied data mining to credit scoring and RFM approaches separately. To evaluate customer behaviour, banks must employ credit scoring and RFM analysis simultaneously. This study proposes a framework that uses data mining techniques to integrate credit scoring and RFM methods in banking. In this framework, k-means performed better than a Kohonen network and DBSCAN in identifying and clustering valuable customers based on the RFM and credit scoring indices. Moreover, the C5 decision tree, BN, and SVM performed better in classifying valuable bank customers based on RFM and credit scoring indices, with accuracies of 94.10%, 92.71%, and 92.36%, respectively.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 369-392
Issue: 4
Volume: 15
Year: 2023
Keywords: data mining; RFM method; credit scoring; banking; marketing.
File-URL: http://www.inderscience.com/link.php?id=134598
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:369-392

Template-Type: ReDIF-Article 1.0
Author-Name: You-Xuan Lin
Author-X-Name-First: You-Xuan
Author-X-Name-Last: Lin
Title: Adaptable address parser with active learning
Abstract:
Address parsing, which decomposes address strings into semantically meaningful components, converts unstructured or semi-structured address data into structured data. The flexibility and variability of real-world address formats make parser development a non-trivial task. Even after much time and effort has been dedicated to obtaining a capable parser, updating or even re-training is required for out-of-domain data, incurring extra costs. To minimise the cost of model building and updating, this study experiments with active learning for model training and adaptation. Models composed of character-level embeddings and recurrent neural networks are trained to parse addresses in Taiwan. Results show that, with active learning, adding 420 instances to the training data is sufficient for a model to adapt to unfamiliar data while retaining its competence in the original domain. This suggests that active learning is helpful for model adaptation when data labelling is expensive and restricted.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 79-101
Issue: 1
Volume: 15
Year: 2023
Keywords: address parsing; record linkage; active learning; model adaptation; recurrent neural network; RNN; address in Taiwan.
File-URL: http://www.inderscience.com/link.php?id=129991
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:79-101

Template-Type: ReDIF-Article 1.0
Author-Name: Rajeev Kumar Gupta
Author-X-Name-First: Rajeev Kumar
Author-X-Name-Last: Gupta
Author-Name: Arti Jain
Author-X-Name-First: Arti
Author-X-Name-Last: Jain
Author-Name: Ruchika Kumar
Author-X-Name-First: Ruchika
Author-X-Name-Last: Kumar
Author-Name: R.K. Pateriya
Author-X-Name-First: R.K.
Author-X-Name-Last: Pateriya
Title: Capturing uncertainties through log analysis using DevOps
Abstract:
DevOps is an extension of agile processes, used mainly to improve coordination between development and operations teams. Continuous practices lie at the core of DevOps and ensure efficient pipelines and high-quality software delivery. Applied in a synchronised way, such practices can deliver high-performance, reliable final products that keep pace with business dynamics and clients' ever-changing needs. This work proposes a simplified solution, guidelines, and tool support for developing and maintaining the quality of the continuous practices used in DevOps projects. The proposed system automatically correlates various telemetry data, enriching log analysis and reducing manual effort. It performs in-depth log analysis and provides quality assessments and feedback to developers, which in turn supports deeper problem diagnosis from the telemetry data. An empirical study is also carried out to clarify the integrated pipeline architecture and to examine how automating continuous monitoring accelerates and extends the system's feedback loop.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 53-78
Issue: 1
Volume: 15
Year: 2023
Keywords: agile; DevOps; log analysis; telemetry data; software development life cycle; SDLC.
File-URL: http://www.inderscience.com/link.php?id=129995
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:53-78

Template-Type: ReDIF-Article 1.0
Author-Name: Muhammad Irfan Yousuf
Author-X-Name-First: Muhammad Irfan
Author-X-Name-Last: Yousuf
Author-Name: Raheel Anwar
Author-X-Name-First: Raheel
Author-X-Name-Last: Anwar
Title: Weighted edge sampling for static graphs
Abstract:
Graph sampling provides an efficient and inexpensive solution for analysing large graphs. The purpose of sampling a graph is to extract a small, representative subgraph from a large graph so that the sample can stand in for the original graph in study and analysis. In this paper, we propose a new sampling method called weighted edge sampling. Initially, all edges are given equal weight. During the sampling process, an edge is sampled with probability proportional to its weight. When an edge is sampled, the weights of its neighbouring edges are increased, which raises their probability of being sampled. Our method extracts the neighbourhood of a sampled edge more efficiently than previous approaches. We evaluate the efficacy of our sampling approach empirically on several real-world datasets and find that our method produces better samples than previous approaches. Our samples estimate the degree and path length of the original graphs better, but are less effective at estimating the clustering coefficient of a graph.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 355-368
Issue: 4
Volume: 15
Year: 2023
Keywords: graph sampling; edge sampling; edge weight; graph induction.
File-URL: http://www.inderscience.com/link.php?id=134612
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:355-368