Template-Type: ReDIF-Article 1.0 Author-Name: Abasat Mirzaei Author-X-Name-First: Abasat Author-X-Name-Last: Mirzaei Author-Name: Fatemeh Hoseini Author-X-Name-First: Fatemeh Author-X-Name-Last: Hoseini Author-Name: Mehrshad Lalinia Author-X-Name-First: Mehrshad Author-X-Name-Last: Lalinia Title: An optimisation approach for determining the efficiency of vital medical devices in intensive care units with COVID-19 patients using Apriori algorithm Abstract: Improving the process of strategic management in hospital preparation, the equipping of intensive care units (ICUs) and the availability of medical devices plays an important role in understanding consumer behaviour and needs. This cross-sectional study was performed in the ICU of Farhikhtegan Hospital, Tehran, Iran for a period of six months. During these months, ten medical devices were used 5,497 times. These devices are: ventilator, oxygen cylinder, infusion pump, electrocardiography machine, vital signs monitor, oxygen flowmeter, wavy mattress, ultrasound sonography machine, ultrasound echocardiography machine, and dialysis machine. The Apriori algorithm showed that four devices (ventilator, oxygen cylinder, vital signs monitoring device and oxygen flowmeter) are the most used by patients. These devices are positively correlated with each other, with confidence over 80% and support of 73%. To validate the results, we applied the equivalence class clustering and bottom-up lattice traversal (ECLAT) algorithm to our dataset. Journal: Int. J. of Data Mining, Modelling and Management Pages: 154-168 Issue: 2 Volume: 15 Year: 2023 Keywords: medical equipment; COVID-19; hospital; Apriori algorithm; technology management; healthcare equipment; medical devices; data mining; medical data; association rule; ECLAT algorithm. File-URL: http://www.inderscience.com/link.php?id=131377 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:154-168 Template-Type: ReDIF-Article 1.0 Author-Name: Moustafa Sadek Kahil Author-X-Name-First: Moustafa Sadek Author-X-Name-Last: Kahil Author-Name: Abdelkrim Bouramoul Author-X-Name-First: Abdelkrim Author-X-Name-Last: Bouramoul Author-Name: Makhlouf Derdour Author-X-Name-First: Makhlouf Author-X-Name-Last: Derdour Title: Big data visual exploration as a recommendation problem Abstract: Big data visual exploration can be considered a recommendation problem. The two are close essentially in their purpose: selecting, from a huge amount of data, the items that are most valuable according to specific criteria, and eventually presenting them to users. On the other hand, recommendation problems have recently been solved mostly using neural networks (NNs). The present paper proposes three alternative solutions to improve big data visual exploration based on recommendation using matrix factorisation (MF), namely conventional, alternating least squares (ALS)-based and NN-based methods. This involves generating the implicit data used to build recommendations, and providing the most valuable data patterns according to the user profiles. The first two solutions are developed using Apache Spark, while the third was developed using TensorFlow2. A comparison based on the results is carried out to identify the most efficient one. The results show their applicability and effectiveness. Journal: Int. J. of Data Mining, Modelling and Management Pages: 133-153 Issue: 2 Volume: 15 Year: 2023 Keywords: big data visualisation; recommendation systems; collaborative filtering; content-based filtering; matrix factorisation; alternating least square; machine learning; neural networks. File-URL: http://www.inderscience.com/link.php?id=131378 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:133-153 Template-Type: ReDIF-Article 1.0 Author-Name: Isha Gupta Author-X-Name-First: Isha Author-X-Name-Last: Gupta Author-Name: Indranath Chatterjee Author-X-Name-First: Indranath Author-X-Name-Last: Chatterjee Author-Name: Neha Gupta Author-X-Name-First: Neha Author-X-Name-Last: Gupta Title: Identification of relevant features influencing movie reviews using sentiment analysis Abstract: Sentiment analysis is a systematic text mining research that examines individuals' behaviour, approach, and viewpoint. This paper analyses viewers' sentiments towards the movies released during the pandemic. This study applies sentiment analysis techniques to movie reviews accessed in real-time from the internet movie database (IMDb). The paper's main objective is to identify the potential words that contribute to the biases of the reviews and influence overall viewers. The proposed methodology employs sentiment analysis of the overall reviews based on the valence aware dictionary for sentiment reasoning, followed by application to various movie genres. Finally, we have applied Pearson's correlation analysis to find the association between the words among the genres. The paper also calculates the sentiment scores of reviews using different sentiment analysis models. Our results showed that at least 17% of features are common across genres. They reveal sets of the most distinct influential words, which may be vital for understanding the nature of the language used for a particular kind of movie. Journal: Int. J. of Data Mining, Modelling and Management Pages: 169-183 Issue: 2 Volume: 15 Year: 2023 Keywords: sentiment analysis; feature selection; sentiment scores; internet movie database; IMDb reviews; adjectives and adverbs features. File-URL: http://www.inderscience.com/link.php?id=131395 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:169-183 Template-Type: ReDIF-Article 1.0 Author-Name: Ayşe Şenyürek Author-X-Name-First: Ayşe Author-X-Name-Last: Şenyürek Author-Name: Selçuk Alp Author-X-Name-First: Selçuk Author-X-Name-Last: Alp Title: Churn prediction in telecommunication sector with machine learning methods Abstract: The aim of this study is to construct a model that predicts which subscribers are likely to cancel their subscriptions in the telecommunication sector. In this context, the study covers data selection, preliminary data preparation, the application of machine learning methods, and performance criteria and measurement processes. Potential churn subscribers were estimated using logistic regression, artificial neural network, random forest and boosting methods. When the results of the study are examined, it is seen that the boosting method gives more accurate and successful results than the other methods. The most important factors causing customer churn were the period remaining until the end of the contract, tenure, the operator preferred by close relatives and the quality of the network. Journal: Int. J. of Data Mining, Modelling and Management Pages: 184-202 Issue: 2 Volume: 15 Year: 2023 Keywords: churn analysis; telecommunication; customer relation management; CRM; machine learning. File-URL: http://www.inderscience.com/link.php?id=131396 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:184-202 Template-Type: ReDIF-Article 1.0 Author-Name: Makhlouf Ledmi Author-X-Name-First: Makhlouf Author-X-Name-Last: Ledmi Author-Name: Mohammed El Habib Souidi Author-X-Name-First: Mohammed El Habib Author-X-Name-Last: Souidi Author-Name: Michael Hahsler Author-X-Name-First: Michael Author-X-Name-Last: Hahsler Author-Name: Abdeldjalil Ledmi Author-X-Name-First: Abdeldjalil Author-X-Name-Last: Ledmi Author-Name: Chafia Kara-Mohamed Author-X-Name-First: Chafia Author-X-Name-Last: Kara-Mohamed Title: Mining association rules for classification using frequent generator itemsets in arules package Abstract: Mining frequent itemsets is an attractive research activity in data mining whose main aim is to provide useful relationships among data. Consequently, several open-source development platforms are continuously developed to facilitate the users' exploitation of new data mining tasks. Among these platforms, the R language is one of the most popular tools. In this paper, we propose an extension of the arules package by adding the option of mining frequent generator itemsets. We discuss in detail how generators can be used for a classification task through an application example related to COVID-19. Journal: Int. J. of Data Mining, Modelling and Management Pages: 203-221 Issue: 2 Volume: 15 Year: 2023 Keywords: frequent generator itemsets; FGIs; classification; association rules; data mining; R language. File-URL: http://www.inderscience.com/link.php?id=131399 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:2:p:203-221 Template-Type: ReDIF-Article 1.0 Author-Name: Amina Madani Author-X-Name-First: Amina Author-X-Name-Last: Madani Author-Name: Fatima Boumahdi Author-X-Name-First: Fatima Author-X-Name-Last: Boumahdi Author-Name: Anfel Boukenaoui Author-X-Name-First: Anfel Author-X-Name-Last: Boukenaoui Author-Name: Mohamed Chaouki Kritli Author-X-Name-First: Mohamed Chaouki Author-X-Name-Last: Kritli Author-Name: Asma Ghribi Author-X-Name-First: Asma Author-X-Name-Last: Ghribi Author-Name: Fatma Limani Author-X-Name-First: Fatma Author-X-Name-Last: Limani Author-Name: Hamza Hentabli Author-X-Name-First: Hamza Author-X-Name-Last: Hentabli Title: An ABC approach for depression signs on social networks posts Abstract: Mental health is considered one of the most prominent plagues of today's world. In this paper, we aim to address one of mental health's biggest issues, which is depression. Using the potential of social media platforms, our ABC approach is based on a combination of different deep learning models, namely autoencoder, BiLSTM and CNN. We test our approach and discuss our experiments on three datasets of Reddit posts provided by the 2019, 2020 and 2021 Conference and Labs of the Evaluation Forum (CLEF). Journal: Int. J. of Data Mining, Modelling and Management Pages: 275-296 Issue: 3 Volume: 15 Year: 2023 Keywords: depression signs; social networks; deep learning; convolutional neural network; CNN; BiLSTM; autoencoder. File-URL: http://www.inderscience.com/link.php?id=132972 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:275-296 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammed El Amine Laghzaoui Author-X-Name-First: Mohammed El Amine Author-X-Name-Last: Laghzaoui Author-Name: Yahia Lebbah Author-X-Name-First: Yahia Author-X-Name-Last: Lebbah Title: A constraint programming approach for quantitative frequent pattern mining Abstract: Itemset mining is the first pattern mining problem studied in the literature. Most itemset mining studies have considered only Boolean datasets, where each transaction either contains an item or does not. In practical applications, items appear in transactions with quantities. In this paper, we propose an extension of the current efficient constraint programming approach for itemset mining that takes quantitative items into account, in order to find patterns with their quantities directly on the original quantitative dataset. The contribution is twofold. Firstly, we facilitate the modelling task of mining problems through a new constraint. Secondly, we propose a new filtering algorithm to handle the frequency and closeness constraints. Experiments performed on standard benchmark datasets with numerous mining constraints show that our approach finds more informative quantitative patterns, with better running times than quantitative approaches based on classical Boolean patterns. Journal: Int. J. of Data Mining, Modelling and Management Pages: 297-311 Issue: 3 Volume: 15 Year: 2023 Keywords: itemset mining; quantitative database; closed itemset mining; constraint programming. File-URL: http://www.inderscience.com/link.php?id=132973 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:297-311 Template-Type: ReDIF-Article 1.0 Author-Name: Nouha Arfaoui Author-X-Name-First: Nouha Author-X-Name-Last: Arfaoui Title: A new process for healthcare big data warehouse integration Abstract: The healthcare domain generates a huge amount of data from different and heterogeneous clinical data sources, using different devices, to ensure good management of hospital performance. Because of the quantity and complex structure of the data, we use a big healthcare data warehouse, first for storage and later for decision making. To achieve our goal, we propose a new process that deals with this type of data. It starts by unifying the different data, then extracts them, loads them into the big healthcare data warehouse and finally makes the necessary transformations. For the first step, an ontology is used. It is the best solution to the problem of data source heterogeneity. We also use Hadoop and its ecosystem, including Hive, MapReduce and HDFS, to accelerate processing through parallelism, exploiting the performance of ELT to ensure 'schema-on-read', where the data is stored before the transformation tasks are performed. Journal: Int. J. of Data Mining, Modelling and Management Pages: 240-254 Issue: 3 Volume: 15 Year: 2023 Keywords: big healthcare data warehouse; BHDW; Hive; Hadoop; MapReduce; ontology; big data; ELT; ETL. File-URL: http://www.inderscience.com/link.php?id=132974 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:240-254 Template-Type: ReDIF-Article 1.0 Author-Name: Wallace Anacleto Pinheiro Author-X-Name-First: Wallace Anacleto Author-X-Name-Last: Pinheiro Author-Name: Ana Bárbara Sapienza Pinheiro Author-X-Name-First: Ana Bárbara Sapienza Author-X-Name-Last: Pinheiro Title: Hierarchical++: improving the hierarchical clustering algorithm Abstract: Hierarchical grouping is a widely used grouping strategy. 
However, this technique often yields poorer results than other approaches, such as K-means clustering. In addition, many algorithms try to correct hierarchical failures by refactoring intermediate cluster combination actions, which may worsen performance. In this work, we propose a new set of procedures that alter the hierarchical technique to improve its results. The idea is to do it right the first time, avoiding refactoring previous steps. These modifications involve the concept of golden boxes, based on initial points named seeds, which indicate groups that must remain disconnected. To assess our strategy, we compare the results of several approaches: traditional hierarchical clustering (single-link, complete-link, average, weighted, centroid, and median), K-means, K-means++, and the proposed method, named Hierarchical++. An experimental evaluation indicates that our proposal far surpasses the compared strategies. Journal: Int. J. of Data Mining, Modelling and Management Pages: 223-239 Issue: 3 Volume: 15 Year: 2023 Keywords: clustering; grouping; similarity; golden boxes; complex distributions; dendrograms; hierarchical; K-means; seed; centroid. File-URL: http://www.inderscience.com/link.php?id=132975 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:223-239 Template-Type: ReDIF-Article 1.0 Author-Name: Behnam Khamoushpour Author-X-Name-First: Behnam Author-X-Name-Last: Khamoushpour Author-Name: Abbas Sheikh Aboumasoudi Author-X-Name-First: Abbas Sheikh Author-X-Name-Last: Aboumasoudi Author-Name: Arash Shahin Author-X-Name-First: Arash Author-X-Name-Last: Shahin Author-Name: Shakiba Khademolqorani Author-X-Name-First: Shakiba Author-X-Name-Last: Khademolqorani Title: Designing a model for selecting, ranking and optimising service quality indicators using meta-heuristic algorithms Abstract: The purpose of this study is to select and rank the indicators affecting service quality and minimise the service quality gap. In this regard, two well-known meta-heuristic algorithms, the genetic algorithm and particle swarm optimisation, are used in combination with the support vector machine, namely GA-SVM and PSO-SVM. Also, two macro quality indicators, including five performance indicators and five service quality gap indicators from the SERVQUAL model, are considered. The GA-SVM algorithm was used to select the effective indicators in service quality and PSO-SVM was implemented to rank these indicators. The efficiency and accuracy of the presented approach were confirmed through implementation in a manufacturing company. According to the obtained data, the two performance indicators of the final time of service level and the level of response do not play an important role in measuring and improving the quality of services provided in the company. Journal: Int. J. of Data Mining, Modelling and Management Pages: 255-274 Issue: 3 Volume: 15 Year: 2023 Keywords: service quality; information technology service management; ITSM; genetic algorithm; particle swarm optimisation; PSO; support vector machine; SVM; optimisation. 
File-URL: http://www.inderscience.com/link.php?id=132981 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:3:p:255-274 Template-Type: ReDIF-Article 1.0 Author-Name: Neha Gupta Author-X-Name-First: Neha Author-X-Name-Last: Gupta Title: Optimising data quality of a data warehouse using data purgation process Abstract: The rapid growth of data collection and storage services has impacted the quality of the data. The data purgation process helps in maintaining and improving data quality when the data is subject to the extract, transform and load (ETL) methodology. Metadata may contain unnecessary information, which can be defined as dummy values, cryptic values or missing values. The present work improves the EM algorithm with the dot product to handle cryptic data; the DBSCAN method with the Gower metric is implemented to deal with dummy values; Ward's algorithm with the Minkowski distance is applied to improve the results for contradicting data; and the K-means algorithm with the Euclidean distance metric is applied to handle missing values in a dataset. These distance metrics have improved the data quality and also helped in providing consistent data to be loaded into a data warehouse. The proposed algorithms have helped in maintaining the accuracy, integrity, consistency and non-redundancy of data in a timely manner. Journal: Int. J. of Data Mining, Modelling and Management Pages: 102-131 Issue: 1 Volume: 15 Year: 2023 Keywords: data warehouse; DW; data quality; DQ; extract; transform and load; ETL; data purgation; DP. File-URL: http://www.inderscience.com/link.php?id=129961 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:102-131 Template-Type: ReDIF-Article 1.0 Author-Name: Alfredo Cuzzocrea Author-X-Name-First: Alfredo Author-X-Name-Last: Cuzzocrea Author-Name: Fabio Martinelli Author-X-Name-First: Fabio Author-X-Name-Last: Martinelli Author-Name: Francesco Mercaldo Author-X-Name-First: Francesco Author-X-Name-Last: Mercaldo Title: A deep-learning approach to game bot identification via behavioural features analysis in complex massively-cooperative environments Abstract: In the so-called massively multiplayer online role-playing games (MMORPGs), malicious players have the possibility of obtaining some kind of gains from competitions, via easy victories achieved thanks to the introduction of game bots in the games. In order to maintain fairness among players, it is important to detect the presence of game bots during video games so that they can be expelled from the games. This paper describes an approach to distinguish human players from game bots based on behavioural analysis. This is implemented via supervised machine learning (ML) and deep learning (DL) algorithms. In order to detect game bots, the considered algorithms are first trained with labelled features and then used to classify previously unseen features. In this paper, the performance of our game bot detection approach is evaluated experimentally. The dataset we use for training and classification is extracted from logs generated during online video game matches of a real-life MMORPG. Journal: Int. J. of Data Mining, Modelling and Management Pages: 1-29 Issue: 1 Volume: 15 Year: 2023 Keywords: game bot detection; complex massively-cooperative environments; machine learning; deep learning; massively multiplayer online role-playing games; MMORPGs. File-URL: http://www.inderscience.com/link.php?id=129963 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:1-29 Template-Type: ReDIF-Article 1.0 Author-Name: Sima Hadadian Author-X-Name-First: Sima Author-X-Name-Last: Hadadian Author-Name: Zahra Naji-Azimi Author-X-Name-First: Zahra Author-X-Name-Last: Naji-Azimi Author-Name: Nasser Motahari Farimani Author-X-Name-First: Nasser Motahari Author-X-Name-Last: Farimani Author-Name: Behrouz Minaei-Bidgoli Author-X-Name-First: Behrouz Author-X-Name-Last: Minaei-Bidgoli Title: Application of rule-based data mining in extracting the rules from the number of patients and climatic factors in instantaneous to long-term spectrum Abstract: Predicting the number of patients helps managers to allocate resources in hospitals efficiently. In this research, the relationship between the number of patients and temperature, relative humidity, wind speed, air pressure, and air pollution in instantaneous, short-, medium- and long-term indices was investigated. A genetic algorithm and the ID3 decision tree have been used for feature selection, and classification based on a multidimensional association rule mining algorithm has been applied for rule mining. The data have been collected for 19 months from a pediatric hospital whose wards are nephrology, hematology, emergency, and PICU. The results show that in the long-term index, all climatic factors are correlated with the number of patients in all wards. Also, several if-then rules have been obtained, indicating the relationship between climatic factors in the four indices and the number of patients in each hospital ward. According to these if-then rules, optimal planning can be done for resource allocation in the hospital. Journal: Int. J. of Data Mining, Modelling and Management Pages: 30-52 Issue: 1 Volume: 15 Year: 2023 Keywords: temperature; relative humidity; wind speed; air pressure; air pollution; patients; hospital; association rule mining; classification; genetic algorithm; ID3 decision tree. 
File-URL: http://www.inderscience.com/link.php?id=129964 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:30-52 Template-Type: ReDIF-Article 1.0 Author-Name: Khaled Bedjou Author-X-Name-First: Khaled Author-X-Name-Last: Bedjou Author-Name: Faical Azouaou Author-X-Name-First: Faical Author-X-Name-Last: Azouaou Title: Detection of terrorism's apologies on Twitter using a new bi-lingual dataset Abstract: A lot of terrorist apology content is being shared on social media without being detected. Therefore, the automatic and immediate detection of these contents is essential for people's safety. In this paper, we propose a language independent process to detect and classify terrorism's apologies on Twitter into three classes (apology, no apology, and neutral). We tested the process on a bi-lingual (Arabic and English) dataset of 12,155 manually annotated tweets. We conducted two sets of experiments, one with imbalanced data and the other with oversampled data. We compared the classification performances of four machine learning algorithms (RF, DT, KNN, and NB) and five deep learning algorithms (GRU, SimpleRNN, LSTM, BiLSTM, and BERT). Our comparative study concluded that BERT achieves better classification performance than the others do, with an accuracy of 0.84 for Arabic and 0.81 for English on imbalanced data, and 0.88 for Arabic and 0.91 for English on oversampled data. Journal: Int. J. of Data Mining, Modelling and Management Pages: 331-354 Issue: 4 Volume: 15 Year: 2023 Keywords: terrorism's apology; social network analysis; Twitter; NLP; sentiment analysis; machine learning; deep learning; transfer learning. File-URL: http://www.inderscience.com/link.php?id=134581 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:331-354 Template-Type: ReDIF-Article 1.0 Author-Name: Alaa Khalaf Hamoud Author-X-Name-First: Alaa Khalaf Author-X-Name-Last: Hamoud Author-Name: Ali Salah Alasady Author-X-Name-First: Ali Salah Author-X-Name-Last: Alasady Author-Name: Wid Akeel Awadh Author-X-Name-First: Wid Akeel Author-X-Name-Last: Awadh Author-Name: Jasim Mohammed Dahr Author-X-Name-First: Jasim Mohammed Author-X-Name-Last: Dahr Author-Name: Mohammed B.M. Kamel Author-X-Name-First: Mohammed B.M. Author-X-Name-Last: Kamel Author-Name: Aqeel Majeed Humadi Author-X-Name-First: Aqeel Majeed Author-X-Name-Last: Humadi Author-Name: Ihab Ahmed Najm Author-X-Name-First: Ihab Ahmed Author-X-Name-Last: Najm Title: A comparative study of supervised/unsupervised machine learning algorithms with feature selection approaches to predict student performance Abstract: The field of educational data mining (EDM) is one of the fastest-growing fields that aims to improve the performance of students, academic staff, and overall institutional performance. Implementing data mining algorithms almost always requires a feature selection process to find the most correlated features and improve accuracy. In this paper, a comparative study is performed to study the implementation of supervised/unsupervised algorithms in predicting students' performance. Students' grades are classified using different types of supervised and unsupervised algorithms such as decision trees, clustering, and neural networks. These algorithms were examined over the questionnaire dataset before/after feature selection to measure the effect of feature selection on the accuracy of the results. The results showed that the random forest decision tree outperformed other supervised/unsupervised algorithms. The results also showed that the performance of most of the algorithms is enhanced after removing the less correlated attributes from the dataset. Journal: Int. J. 
of Data Mining, Modelling and Management Pages: 393-409 Issue: 4 Volume: 15 Year: 2023 Keywords: educational data mining; EDM; students' performance; supervised algorithms; unsupervised algorithms; feature selection. File-URL: http://www.inderscience.com/link.php?id=134590 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:393-409 Template-Type: ReDIF-Article 1.0 Author-Name: Donald Douglas Atsa'am Author-X-Name-First: Donald Douglas Author-X-Name-Last: Atsa'am Author-Name: Frank Adusei-Mensah Author-X-Name-First: Frank Author-X-Name-Last: Adusei-Mensah Author-Name: Oluwafemi Samson Balogun Author-X-Name-First: Oluwafemi Samson Author-X-Name-Last: Balogun Author-Name: Temidayo Oluwatosin Omotehinwa Author-X-Name-First: Temidayo Oluwatosin Author-X-Name-Last: Omotehinwa Author-Name: Oluwaseun Alexander Dada Author-X-Name-First: Oluwaseun Alexander Author-X-Name-Last: Dada Author-Name: Richard Osei Agjei Author-X-Name-First: Richard Osei Author-X-Name-Last: Agjei Author-Name: Samuel Nii Odoi Devine Author-X-Name-First: Samuel Nii Odoi Author-X-Name-Last: Devine Title: A novel taxonomy of natural disasters based on casualty and consequence using hierarchical clustering Abstract: Post-disaster management requires a proportional deployment of human and material resources. The number of resources required to manage a disaster cannot be known without first evaluating the extent of casualty and consequence. This study proposed a taxonomy for classifying natural disasters based on casualty and consequence. Using secondary data on global disasters from 1900 to 2021, the hierarchical cluster analysis technique was deployed for taxonomy formation. The learning algorithm evaluated the similarities in numbers of deaths, injuries, and the cost of damaged property caused by disasters. 
Three clusters were extracted which sub-grouped historical disasters based on similarities in casualty and consequence. Further, a taxonomy that defines the ranges of what constitute low, average, and high deaths/injuries/damage was established. Classifying a future disaster with this taxonomy prior to the deployment of resources for rescue, resettlement, compensation, and other disaster management operations will guide efficient resource allocation on a case-by-case basis. Journal: Int. J. of Data Mining, Modelling and Management Pages: 313-330 Issue: 4 Volume: 15 Year: 2023 Keywords: disaster taxonomy; natural disasters; casualty and consequence; post-disaster management; hierarchical cluster analysis. File-URL: http://www.inderscience.com/link.php?id=134591 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:313-330 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammad Khanbabaei Author-X-Name-First: Mohammad Author-X-Name-Last: Khanbabaei Author-Name: Pantea Parsi Author-X-Name-First: Pantea Author-X-Name-Last: Parsi Author-Name: Najmeh Farhadi Author-X-Name-First: Najmeh Author-X-Name-Last: Farhadi Title: Using data mining to integrate recency-frequency-monetary value analysis and credit scoring methods for bank customer behaviour analysis Abstract: Banks apply credit scoring to identify customers with low credit risk. Additionally, recency-frequency-monetary value (RFM) analysis method is suitable for identifying valuable bank customers. Data mining techniques can be used to discover useful patterns hidden in customer data. However, in previous research, data mining has been used separately in both credit scoring and RFM approaches. To evaluate customer behaviour, banks must employ credit scoring and RFM analysis method, simultaneously. This study proposes a framework for using data mining techniques to integrate credit scoring and RFM methods in the field of banking. 
In this framework, k-means performed better than the Kohonen network and DBSCAN in identifying and clustering valuable customers based on the RFM and credit scoring indices. Moreover, the C5 decision tree, BN, and SVM, with 94.10%, 92.71%, and 92.36% accuracy respectively, performed better in classifying valuable bank customers based on RFM and credit scoring indices. Journal: Int. J. of Data Mining, Modelling and Management Pages: 369-392 Issue: 4 Volume: 15 Year: 2023 Keywords: data mining; RFM method; credit scoring; banking; marketing. File-URL: http://www.inderscience.com/link.php?id=134598 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:369-392 Template-Type: ReDIF-Article 1.0 Author-Name: You-Xuan Lin Author-X-Name-First: You-Xuan Author-X-Name-Last: Lin Title: Adaptable address parser with active learning Abstract: Address parsing, which decomposes address strings into semantically meaningful components, is a means of converting unstructured or semi-structured address data into structured data. Flexibility and variability in real-world address formats make parser development a non-trivial task. Even after all the time and effort dedicated to obtaining a capable parser, updating or even re-training is required for out-of-domain data and extra costs will be incurred. To minimise the cost of model building and updating, this study experiments with active learning for model training and adaptation. Models composed of character-level embedding and recurrent neural networks are trained to parse addresses in Taiwan. Results show that, with active learning, adding 420 instances to the training data is sufficient for a model to adapt itself to unfamiliar data while its competence in the original domain is retained. This suggests that active learning is helpful for model adaptation when data labelling is expensive and restricted. Journal: Int. J. 
of Data Mining, Modelling and Management Pages: 79-101 Issue: 1 Volume: 15 Year: 2023 Keywords: address parsing; record linkage; active learning; model adaptation; recurrent neural network; RNN; address in Taiwan. File-URL: http://www.inderscience.com/link.php?id=129991 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:79-101 Template-Type: ReDIF-Article 1.0 Author-Name: Rajeev Kumar Gupta Author-X-Name-First: Rajeev Kumar Author-X-Name-Last: Gupta Author-Name: Arti Jain Author-X-Name-First: Arti Author-X-Name-Last: Jain Author-Name: Ruchika Kumar Author-X-Name-First: Ruchika Author-X-Name-Last: Kumar Author-Name: R.K. Pateriya Author-X-Name-First: R.K. Author-X-Name-Last: Pateriya Title: Capturing uncertainties through log analysis using DevOps Abstract: DevOps is an advancement of agile processes which is mainly used to improve the coordination between development and operation teams. Continuous practices lie at the core of DevOps and ensure efficient pipelines and high-quality delivery of software. Using such practices in step with business dynamics and the ever-changing needs of clients can yield high-performance and reliable final products. This research work is an attempt to propose a simplified solution, guidelines and tool support for developing and maintaining the quality of continuous practices that are used in a DevOps project. The system automates the correlation among various telemetry data to enrich log analysis and reduce manual effort. The proposed system performs in-depth analysis of logs and promotes quality assessments and feedback to developers, which in turn helps in deeper problem diagnosis of the telemetry data. 
In this work, an empirical study is carried out to gain conceptual clarity on integrated pipeline architecture and to address how automation in continuous monitoring accelerates and extends the feedback loop in the system. Journal: Int. J. of Data Mining, Modelling and Management Pages: 53-78 Issue: 1 Volume: 15 Year: 2023 Keywords: agile; DevOps; log analysis; telemetry data; software development life cycle; SDLC. File-URL: http://www.inderscience.com/link.php?id=129995 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:1:p:53-78 Template-Type: ReDIF-Article 1.0 Author-Name: Muhammad Irfan Yousuf Author-X-Name-First: Muhammad Irfan Author-X-Name-Last: Yousuf Author-Name: Raheel Anwar Author-X-Name-First: Raheel Author-X-Name-Last: Anwar Title: Weighted edge sampling for static graphs Abstract: Graph sampling provides an efficient yet inexpensive solution for analysing large graphs. The purpose of sampling a graph is to extract a small representative subgraph from a big graph so that the sample can be used in place of the big graph for studying and analysing it. In this paper, we propose a new sampling method called weighted edge sampling. In this method, we give equal weight to all the edges in the beginning. During the sampling process, we sample an edge with probability proportional to its weight. When an edge is sampled, we increase the weight of its neighbouring edges, which increases their probability of being sampled. Our method extracts the neighbourhood of a sampled edge more efficiently than previous approaches. We evaluate the efficacy of our sampling approach empirically using several real-world datasets. We find that our method produces better samples than the previous approaches. Our results show that our samples better estimate the degree and path length of the original graphs, whereas they are less efficient in estimating the clustering coefficient of a graph. 
Journal: Int. J. of Data Mining, Modelling and Management Pages: 355-368 Issue: 4 Volume: 15 Year: 2023 Keywords: graph sampling; edge sampling; edge weight; graph induction. File-URL: http://www.inderscience.com/link.php?id=134612 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:15:y:2023:i:4:p:355-368