Template-Type: ReDIF-Article 1.0 Author-Name: Aziz Ouaarab Author-X-Name-First: Aziz Author-X-Name-Last: Ouaarab Title: Discrete cuckoo search for 0-1 knapsack problem Abstract: This paper presents a resolution of a space management optimisation problem, the 0-1 knapsack problem (KP), by the discrete cuckoo search algorithm (DCS). The proposed approach includes an adaptation process of three main components: the objective function, the solution representation, and the step move operator. A simplified conception of these three components is designed without introducing an additional technique, especially in the search process for the optimal solution. Three sets of benchmark instances have been taken from the literature to test the performance of DCS. Experimental results prove that DCS is effective in solving different types of 0-1 KP instances. Comparisons with other state-of-the-art algorithms show that DCS is a competitive approach that outperforms most of them. Journal: Int. J. of Data Mining, Modelling and Management Pages: 374-396 Issue: 4 Volume: 16 Year: 2024 Keywords: 0-1 knapsack problem; discrete cuckoo search; DCS; combinatorial optimisation; Lévy flights; approximate algorithm. File-URL: http://www.inderscience.com/link.php?id=142593 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:374-396 Template-Type: ReDIF-Article 1.0 Author-Name: Arpita Nath Boruah Author-X-Name-First: Arpita Nath Author-X-Name-Last: Boruah Author-Name: Mrinal Goswami Author-X-Name-First: Mrinal Author-X-Name-Last: Goswami Title: Early stage analysis of breast cancer using intelligent system Abstract: Breast cancer (BC) poses a considerable global health concern, making it a significant issue for women's well-being worldwide. It is crucial to develop a system that can proactively identify the critical risk factors associated with BC.
The present study introduces an intelligent system for BC by analysing risk factors (IS-BC-analysing-RF), which utilises decision tree rules to accurately identify the primary risk factors underlying BC. The rules are processed based on the proposed score function to retain the most relevant ones. Finally, using the sequential search approach, the critical risk factors are identified along with their respective ranges. Based on the simulation results using the University of California, Irvine (UCI) repository BC dataset, the findings indicate that the proposed IS-BC-analysing-RF system is highly significant and has the potential to effectively mitigate the risk of BC by targeting and managing one or two crucial risk factors. Journal: Int. J. of Data Mining, Modelling and Management Pages: 443-454 Issue: 4 Volume: 16 Year: 2024 Keywords: decision system; breast cancer; decision tree; machine learning; risk factor. File-URL: http://www.inderscience.com/link.php?id=142594 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:443-454 Template-Type: ReDIF-Article 1.0 Author-Name: Prachee Dewangan Author-X-Name-First: Prachee Author-X-Name-Last: Dewangan Author-Name: Debabala Swain Author-X-Name-First: Debabala Author-X-Name-Last: Swain Author-Name: Monalisa Swain Author-X-Name-First: Monalisa Author-X-Name-Last: Swain Title: A novel LWT-based robust watermark strategy for colour images Abstract: With the progress of information technology, digital data theft and duplication have become very easy. Image watermarking in cryptography is a major domain that provides manifold security features like confidentiality, authenticity, integrity, etc. This research introduces a robust watermarking scheme for colour images. The proposed technique segments the colour image into three layers: red, green, and blue.
The lifting wavelet transform (LWT) and differential histogram shifting are used to embed text watermark information into the R layer. The performance of the proposed technique was assessed using the SIPI image dataset. Test outputs show that the proposed scheme maintains the balance between imperceptibility and robustness. The scheme also offers better resistance against various attacks such as noise, filtering effects, and image compression. Besides, the text watermark can be successfully extracted under different types of tampering, such as content removal and content addition attacks. Journal: Int. J. of Data Mining, Modelling and Management Pages: 359-373 Issue: 4 Volume: 16 Year: 2024 Keywords: robust watermarking; geometric attack; fragile attack; dual watermark; lifting wavelet transform. File-URL: http://www.inderscience.com/link.php?id=142595 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:359-373 Template-Type: ReDIF-Article 1.0 Author-Name: Amna Amin Sethi Author-X-Name-First: Amna Amin Author-X-Name-Last: Sethi Author-Name: Saad Khan Author-X-Name-First: Saad Author-X-Name-Last: Khan Author-Name: Fatima Hashmi Author-X-Name-First: Fatima Author-X-Name-Last: Hashmi Author-Name: Saim Ali Akber Author-X-Name-First: Saim Ali Author-X-Name-Last: Akber Title: Detecting driver mutations in colorectal cancer through big data analysis Abstract: Colorectal cancer (CRC) is a complex disease posing a significant challenge to global health, with profound impacts on morbidity and mortality. There is a need to identify genetic biomarkers for early diagnosis of the disease. In this study, a comprehensive analysis of CRC genomes was conducted to identify consistent mutations in both coding and non-coding regions, highlighting their pivotal role in CRC pathogenesis. The results of this study revealed consistent mutations in coding regions that validated known CRC driver genes.
The consistent non-coding mutations were also identified within transcription factor binding sites (TFBS) in CRC cell lines. The statistical significance of these mutations suggests their potential impact on gene regulation, leading to the development and progression of CRC. They might act as potential biomarkers for early diagnosis of the disease. To conclude, the findings of this study might provide novel therapeutic targets and diagnostic markers for personalised medicine. Journal: Int. J. of Data Mining, Modelling and Management Pages: 420-442 Issue: 4 Volume: 16 Year: 2024 Keywords: colorectal cancer; CRC; driver mutations; driver genes; biomarkers; transcription factor binding sites; TFBS. File-URL: http://www.inderscience.com/link.php?id=142596 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:420-442 Template-Type: ReDIF-Article 1.0 Author-Name: Puja Kaura Author-X-Name-First: Puja Author-X-Name-Last: Kaura Author-Name: Ajay Kumar Author-X-Name-First: Ajay Author-X-Name-Last: Kumar Title: Mapping the trajectory of sustainable finance research: an analysis using bibliometric approach Abstract: The research highlights the significance of sustainable finance within the financial, economic, and entrepreneurial domains to address the mounting apprehensions about social and environmental issues. The objective is to incorporate environmental, social, and governance considerations into financial judgments, thereby fostering accountability for ecological and societal impacts. This study conducts a bibliometric analysis of academic literature on sustainable finance from 2004 to 2023. The Scopus database was used for the analysis, which was conducted with the BiblioShiny application and RStudio.
This review offers a comprehensive examination of the progression of sustainable finance research within a contextual framework through a performance analysis of 475 English-language documents based on sources, keywords, countries, and authors. It examines publication trends, leading articles, authors, journals, and countries. The study also identifies the key themes and topics constituting this field's fundamental knowledge framework, recommends future research directions, and reveals a concentration of research on sustainable finance in developed countries rather than developing and underdeveloped countries. Journal: Int. J. of Data Mining, Modelling and Management Pages: 397-419 Issue: 4 Volume: 16 Year: 2024 Keywords: sustainable finance; climate finance; climate change; green finance; bibliometric analysis. File-URL: http://www.inderscience.com/link.php?id=142607 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:397-419 Template-Type: ReDIF-Article 1.0 Author-Name: Souad Moufok Author-X-Name-First: Souad Author-X-Name-Last: Moufok Author-Name: Anas Mouattah Author-X-Name-First: Anas Author-X-Name-Last: Mouattah Author-Name: Khalid Hachemi Author-X-Name-First: Khalid Author-X-Name-Last: Hachemi Title: K-means and DBSCAN for look-alike sound-alike medicines issue Abstract: The goal of this study is to analyse the application of data mining techniques in clustering drug names based on their spelling similarity in order to reduce the occurrence of dispensing errors caused by look-alike sound-alike medicine confusion, one of the most common causes of such errors. Two unsupervised data mining methods, k-means and DBSCAN, were used in conjunction with two similarity measures, BiSim and Levenshtein.
The results of the study showed that the approach is effective in identifying potentially confusable medicines, with BiSim-based k-means clustering being favoured with a silhouette score of 0.5. Journal: Int. J. of Data Mining, Modelling and Management Pages: 49-65 Issue: 1 Volume: 16 Year: 2024 Keywords: look-alike sound-alike; LASA; data mining; medication errors; dispensing errors; k-means; DBSCAN. File-URL: http://www.inderscience.com/link.php?id=136215 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:49-65 Template-Type: ReDIF-Article 1.0 Author-Name: Mathe John Kenny Kumar Author-X-Name-First: Mathe John Kenny Author-X-Name-Last: Kumar Author-Name: Dipti Rana Author-X-Name-First: Dipti Author-X-Name-Last: Rana Title: HARUIM: high average recent utility itemset mining Abstract: High utility itemset mining (HUIM) discovers itemsets that are profitable in nature. Previously, the recency of an itemset was determined by adding the recency of each transaction of an itemset. A major disadvantage of this method is that a few very recent transactions of an itemset can cause the whole itemset to be considered recent. To overcome this limitation, we present a novel measure called <i>average recency</i> to mine recent and high utility itemsets. Average recency upper-bound (arub) and estimated recency co-occurrence structure (ERCS) are proposed to prune unpromising itemsets. A variation of list structure known as the average recent utility list (ARUL) has been created to hold data regarding the utility and recency of itemsets. Through a series of comprehensive experiments carried out on both real and synthetic datasets, it has been demonstrated that the proposed system surpasses the baseline algorithm in runtime, memory utilisation, and candidate generation. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 66-100 Issue: 1 Volume: 16 Year: 2024 Keywords: data mining; high utility itemset mining; HUIM; recency; average recency; list structure; pattern mining; EUCS; knowledge engineering; candidate generation. File-URL: http://www.inderscience.com/link.php?id=136217 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:66-100 Template-Type: ReDIF-Article 1.0 Author-Name: V. Sitharamulu Author-X-Name-First: V. Author-X-Name-Last: Sitharamulu Author-Name: K. Rajendra Prasad Author-X-Name-First: K. Rajendra Author-X-Name-Last: Prasad Author-Name: K. Sudheer Reddy Author-X-Name-First: K. Sudheer Author-X-Name-Last: Reddy Author-Name: A.V. Krishna Prasad Author-X-Name-First: A.V. Krishna Author-X-Name-Last: Prasad Author-Name: M. Venkat Dass Author-X-Name-First: M. Venkat Author-X-Name-Last: Dass Title: Hybrid classifier model for big data by leveraging map reduce framework Abstract: Big data technology is popular and desirable among many users for handling, analysing, and storing large data. However, clustering large data has become more complex due to its size. In recent years, several techniques have been presented to retrieve information from big data. The proposed hybrid classifier model, CSDHAP, is a hybridised form of the sunflower optimisation (SFO) and deer hunting optimisation (DHO) algorithms with an adaptive pollination rate, using the MapReduce framework. CSDHAP is a data classification technique performed using classifiers. The results of the presented approach are evaluated against the extant approaches using various metrics, namely F1-score, specificity, NPV, accuracy, FNR, FDR, sensitivity, precision, FPR, and MCC. It is pertinent to mention that the proposed model is better than any of the traditional models.
The proposed HC+CSDHAP model attained a better precision value than other traditional models such as RNN, SVM, CNN, Bi-LSTM, NB, LSTM, and DBN. Journal: Int. J. of Data Mining, Modelling and Management Pages: 23-48 Issue: 1 Volume: 16 Year: 2024 Keywords: big data classification; MapReduce framework; long short-term memory; LSTM; deep belief network; DBN; optimisation. File-URL: http://www.inderscience.com/link.php?id=136219 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:23-48 Template-Type: ReDIF-Article 1.0 Author-Name: Ivona Lipovac Author-X-Name-First: Ivona Author-X-Name-Last: Lipovac Author-Name: Marina Bagić Babac Author-X-Name-First: Marina Bagić Author-X-Name-Last: Babac Title: Developing a data pipeline solution for big data processing Abstract: This paper presents a comprehensive exploration of the concept of big data and its management while highlighting the challenges that arise in the process. The study showcases the development of a data pipeline, designed to facilitate big data collection, integration, and analysis while addressing state-of-the-art challenges, methods, tools, and technologies. Emphasis is placed on pipeline flexibility, with a view towards enabling ease of implementation of architecture changes, seamless integration of new sources, and straightforward implementation of additional transformations in existing pipelines as needed. The pipeline architecture is discussed in detail, with a focus on its design principles, components, and implementation details, as well as the mechanisms used to ensure its reliability, scalability, and performance. Results from a range of experiments demonstrate the pipeline's effectiveness in addressing the challenges of big data management and analysis, as well as its robustness and versatility in accommodating diverse data sources and processing requirements.
This study provides insights into the critical role of data pipelines in enabling effective big data management and showcases the importance of flexibility in pipeline design to ensure adaptability to evolving data processing needs. Journal: Int. J. of Data Mining, Modelling and Management Pages: 1-22 Issue: 1 Volume: 16 Year: 2024 Keywords: big data; data pipeline; data processing; data analysis; cloud computing. File-URL: http://www.inderscience.com/link.php?id=136221 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:1-22 Template-Type: ReDIF-Article 1.0 Author-Name: Syed Azeem Inam Author-X-Name-First: Syed Azeem Author-X-Name-Last: Inam Author-Name: Daniyal Iqbal Author-X-Name-First: Daniyal Author-X-Name-Last: Iqbal Author-Name: Hassan Hashim Author-X-Name-First: Hassan Author-X-Name-Last: Hashim Author-Name: Mansoor Ahmed Khuhro Author-X-Name-First: Mansoor Ahmed Author-X-Name-Last: Khuhro Title: An empirical approach towards detection of tuberculosis using deep convolutional neural network Abstract: Tuberculosis remains among the top diseases causing death all over the globe, and its timely detection is a major concern for medical practitioners, especially after the emergence of the SARS-CoV-2 pandemic. Even with the recent advances in the methods for medical image classification, it is still challenging to diagnose tuberculosis without considering the associated historical and biological factors. Unsupervised learning has contributed greatly to the development of techniques for image classification, and the present study has utilised a deep convolutional neural network for detecting tuberculosis. It proposes a network comprising 54 layers with 59 connections.
After computations, our proposed deep convolutional neural network attained accuracies of 99.79%, 99.46%, and 99.5% for the healthy, sick, and tuberculosis (TB) classes, respectively, on a public dataset, achieving higher accuracy compared to other pre-trained network models. Journal: Int. J. of Data Mining, Modelling and Management Pages: 101-112 Issue: 1 Volume: 16 Year: 2024 Keywords: tuberculosis; image classification; deep convolutional neural network; DCNN; accuracy; F1 score. File-URL: http://www.inderscience.com/link.php?id=136232 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:101-112 Template-Type: ReDIF-Article 1.0 Author-Name: Ali Vasfi Ağlarcı Author-X-Name-First: Ali Vasfi Author-X-Name-Last: Ağlarcı Author-Name: Cengiz Bal Author-X-Name-First: Cengiz Author-X-Name-Last: Bal Title: Effect of various factors on classification performance of ordinal logistic regression Abstract: The classification problem concerns determining to which of a set of categories a new observation belongs, using known features. Examples include categorising e-mails as necessary or unnecessary, or diagnosing a disease using a patient's various values (such as gender, blood pressure, or the presence of various symptoms). Various methods are used in classification processes. In this study, the classification performance of ordinal logistic regression, which is a statistical method, was investigated. It has been revealed how the classification success of the method changes when the data set properties change. For this, a simulation study was carried out by deriving data sets with different properties with the help of the R program. As a result of the simulation study, it was observed that the correlation structure in the data set, the sample size, and the number and distribution of the response variable categories affected the classification performance of the method.
Suggestions have been made to improve the classification performance of the ordinal logistic regression method. Journal: Int. J. of Data Mining, Modelling and Management Pages: 196-208 Issue: 2 Volume: 16 Year: 2024 Keywords: statistical learning; classification; ordinal data; simulation. File-URL: http://www.inderscience.com/link.php?id=138813 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:196-208 Template-Type: ReDIF-Article 1.0 Author-Name: P.V.N. Rajeswari Author-X-Name-First: P.V.N. Author-X-Name-Last: Rajeswari Author-Name: M. Shashi Author-X-Name-First: M. Author-X-Name-Last: Shashi Title: Intrusion detection system using statistical query tree with hierarchical clustering approach Abstract: The internet has become a major part of everyone's life. When no proper protection is provided, intruders misuse the access provided by the internet, leading to an increased risk of sensitive data leakage. To achieve a trade-off between scalability and precision, this research introduces a novel two-stage screening framework for intrusion detection systems (IDS) to identify the attacks and their types. The first stage aims to identify suspicious internet protocol (IP) addresses based on the abrupt deviation from the normal activity pattern. The second screening stage aims to analyse the packets received from suspicious IP addresses by applying a recently developed single-phase statistical hierarchical clustering (SHiC) algorithm designed for clustering and outlier detection. The data packets are classified as outliers based on their higher statistical distance to the existing components or clusters identified. The complete IDS framework is developed and applied to two benchmark datasets and compared with the results produced by several outlier detection algorithms. The proposed framework is found to be consistently more accurate in detecting attacks. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 176-195 Issue: 2 Volume: 16 Year: 2024 Keywords: statistical query tree; intrusion detection system; IDS; outlier; statistical hierarchical clustering; SHiC; cyber-attack; CICIDS-2017. File-URL: http://www.inderscience.com/link.php?id=138822 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:176-195 Template-Type: ReDIF-Article 1.0 Author-Name: Hamed Khosravi Author-X-Name-First: Hamed Author-X-Name-Last: Khosravi Author-Name: Mohammad Reza Shafie Author-X-Name-First: Mohammad Reza Author-X-Name-Last: Shafie Author-Name: Morteza Hajiabadi Author-X-Name-First: Morteza Author-X-Name-Last: Hajiabadi Author-Name: Ahmed Shoyeb Raihan Author-X-Name-First: Ahmed Shoyeb Author-X-Name-Last: Raihan Author-Name: Imtiaz Ahmed Author-X-Name-First: Imtiaz Author-X-Name-Last: Ahmed Title: Chatbots and ChatGPT: a bibliometric analysis and systematic review of publications in Web of Science and Scopus databases Abstract: This paper presents a bibliometric analysis of the scientific literature related to chatbots, focusing specifically on ChatGPT. Chatbots have gained increasing attention recently, with annual growth rates of 19.16% and 27.19% on the Web of Science (WoS) and Scopus, respectively. The research consists of two study phases: 1) an analysis of chatbot literature; 2) a comprehensive review of scientific documents on ChatGPT. In the first phase, a bibliometric analysis is conducted on all the published literature from both Scopus (5,839) and WoS (2,531) databases covering the period from 1998 to 2023. Subsequently, a bibliometric analysis has been carried out on ChatGPT publications, and 45 published studies have been analysed thoroughly based on their methods, novelty, and conclusions.
Overall, the study aims to provide guidelines for researchers to conduct their research more effectively in the field of chatbots and specifically highlight significant areas for future investigation into ChatGPT. Journal: Int. J. of Data Mining, Modelling and Management Pages: 113-147 Issue: 2 Volume: 16 Year: 2024 Keywords: chatbot; ChatGPT; bibliometrics; artificial intelligence; natural language processing; NLP; generative artificial intelligence. File-URL: http://www.inderscience.com/link.php?id=138824 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:113-147 Template-Type: ReDIF-Article 1.0 Author-Name: Thouraya Sakouhi Author-X-Name-First: Thouraya Author-X-Name-Last: Sakouhi Author-Name: Jalel Akaichi Author-X-Name-First: Jalel Author-X-Name-Last: Akaichi Title: Clustering-based multidimensional sequential pattern mining of semantic trajectories Abstract: Knowledge discovery from mobility data is about identifying behaviours from trajectories. In fact, mining masses of trajectories is required to gain an overview of this data and, notably, to investigate the relationship between different entities' movements. Most state-of-the-art work on this issue operates on raw trajectories. Nevertheless, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose to apply sequential pattern mining based on a pre-processing step of clustering to alleviate the former's temporal complexity. Mining considers the spatial and temporal dimensions at different levels of granularity, thus providing richer and more insightful patterns about human behaviour. We evaluate our work on tourists' semantic trajectories in Kyoto. Results showed the effectiveness and efficiency of our model compared to state-of-the-art work. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 148-175 Issue: 2 Volume: 16 Year: 2024 Keywords: mobility data; trajectories; semantic modelling; sequential pattern mining; clustering; mobility pattern. File-URL: http://www.inderscience.com/link.php?id=138825 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:148-175 Template-Type: ReDIF-Article 1.0 Author-Name: Kamel Abdellaoui Author-X-Name-First: Kamel Author-X-Name-Last: Abdellaoui Author-Name: Mohamed Ali Hadj Taieb Author-X-Name-First: Mohamed Ali Hadj Author-X-Name-Last: Taieb Author-Name: Rafik Mahjoubi Author-X-Name-First: Rafik Author-X-Name-Last: Mahjoubi Author-Name: Mohamed Ben Aouicha Author-X-Name-First: Mohamed Ben Author-X-Name-Last: Aouicha Title: Data-driven journey: a data management paradigm-centric review and data mesh capabilities Abstract: Becoming data driven is one of the top strategic objectives of data-rich organisations. Africa must join the wave to capture and unlock the highest value from data. Therefore, this survey analyses the drivers, challenges, and evolution of existing data management paradigms, including the data warehouse, data lake, and data lakehouse. It reveals the limitations of monolithic approaches to addressing data at scale and how they led to a paradigm shift toward a more distributed and decentralised data mesh. The paper discusses data mesh capabilities to address the challenges of data availability and accessibility at scale in Africa to enable leapfrog development in its journey to being data driven. Journal: Int. J. of Data Mining, Modelling and Management Pages: 209-243 Issue: 2 Volume: 16 Year: 2024 Keywords: data-driven; data management paradigms; data mesh; analytics; developing countries. File-URL: http://www.inderscience.com/link.php?id=138865 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:209-243 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammed Alweshah Author-X-Name-First: Mohammed Author-X-Name-Last: Alweshah Author-Name: Ghadeer Ahmad Alhebaishan Author-X-Name-First: Ghadeer Ahmad Author-X-Name-Last: Alhebaishan Author-Name: Sofian Kassaymeh Author-X-Name-First: Sofian Author-X-Name-Last: Kassaymeh Author-Name: Saleh Alkhalaileh Author-X-Name-First: Saleh Author-X-Name-Last: Alkhalaileh Author-Name: Mohammed Ababneh Author-X-Name-First: Mohammed Author-X-Name-Last: Ababneh Title: Improving intrusion detection in the IoT with African vultures optimisation algorithm-based feature selection Abstract: The security of the system may be jeopardised by unsecured data transmitted through IoT devices, and ensuring the reliability of data is critical to maintaining the integrity of information over the internet. To enhance the intrusion detection rate, several investigations have been conducted to develop methodologies capable of identifying the minimum required secure features. One such method is the use of the feature selection (FS) procedure with metaheuristic algorithms. In this study, the African vultures optimisation algorithm (AVO) was used in two wrapper FS approaches to select the most secure features in IoT. The first approach used AVO, while the second employed OBL-AVO, a hybrid model combining AVO with opposition-based learning (OBL) to enhance exploration. Based on the outcomes, it was found that OBL-AVO is superior to AVO in enhancing FS. Furthermore, the proposed methods were evaluated and compared to four recent approaches. Journal: Int. J. of Data Mining, Modelling and Management Pages: 293-325 Issue: 3 Volume: 16 Year: 2024 Keywords: intrusion detection; internet of things; IoT; feature selection; hybrid metaheuristics; African vultures optimisation algorithm; AVO; opposition-based learning; OBL.
File-URL: http://www.inderscience.com/link.php?id=140529 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:293-325 Template-Type: ReDIF-Article 1.0 Author-Name: Moumita Ghosh Author-X-Name-First: Moumita Author-X-Name-Last: Ghosh Author-Name: Sourav Mondal Author-X-Name-First: Sourav Author-X-Name-Last: Mondal Author-Name: Harshita Moondra Author-X-Name-First: Harshita Author-X-Name-Last: Moondra Author-Name: Dina Tri Utari Author-X-Name-First: Dina Tri Author-X-Name-Last: Utari Author-Name: Anirban Roy Author-X-Name-First: Anirban Author-X-Name-Last: Roy Author-Name: Kartick Chandra Mondal Author-X-Name-First: Kartick Chandra Author-X-Name-Last: Mondal Title: An irregular CLA-based novel frequent pattern mining approach Abstract: Frequent itemset mining has received a lot of attention in the field of data mining. Its main objective is to find groups of items that consistently appear together in datasets. While frequent itemset mining is useful, the algorithms for mining frequent itemsets have quite high resource requirements. In order to optimise the time and memory needs, a few improvements have been made in recent years. This study proposes CellFPM, a straightforward yet effective cellular learning automata-based method for finding frequent itemset occurrences. It works efficiently with large datasets. The efficiency of the proposed approach in time and memory requirements has been evaluated using benchmark datasets explicitly designed for performance measurement. The varying size and density of the test datasets have confirmed the scalability of the suggested method. The findings show that CellFPM consistently surpasses the leading algorithms in terms of runtime and memory usage, particularly the latter. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 268-292 Issue: 3 Volume: 16 Year: 2024 Keywords: cellular learning automata; CLA; frequent itemsets; data mining; knowledge discovery. File-URL: http://www.inderscience.com/link.php?id=140536 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:268-292 Template-Type: ReDIF-Article 1.0 Author-Name: Shengjuan Zhao Author-X-Name-First: Shengjuan Author-X-Name-Last: Zhao Author-Name: Gyoogun Lim Author-X-Name-First: Gyoogun Author-X-Name-Last: Lim Title: A comparative analysis of user attitudes towards ICO and IEO in blockchain projects: insights from social media big data Abstract: This study conducts a comparative analysis of two popular crowdfunding methods in the blockchain market, the initial coin offering (ICO) and the initial exchange offering (IEO) models. Using project names as keywords, we collected and analysed big data, applying techniques such as TF-IDF, LDA, social network analysis, and sentiment analysis. Our findings show that the attitude of target groups towards ICO and IEO projects is not significantly different, although IEO targets exhibit more interest in entertainment-related topics. Social network analysis reveals that the ICO target group is more sensitive to popular elements, such as pop singers, while the IEO target group is more interested in soccer competitions. Both target groups show a strong interest in the US election. Our study suggests that IEO, as an upgraded financing model of ICO, does not enjoy high levels of trust from the market crowd. By identifying the preferences of the target groups for both models through multiple analyses, we recommend that these preferences be taken into consideration to improve the efficiency of targeted marketing. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 245-267 Issue: 3 Volume: 16 Year: 2024 Keywords: blockchain; big data; token issuance; initial coin offering; ICO; initial exchange offering; IEO. File-URL: http://www.inderscience.com/link.php?id=140539 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:245-267 Template-Type: ReDIF-Article 1.0 Author-Name: B. Sivaiah Author-X-Name-First: B. Author-X-Name-Last: Sivaiah Author-Name: R. Rajeswara Rao Author-X-Name-First: R. Rajeswara Author-X-Name-Last: Rao Title: A node sets based fast and scalable frequent itemset algorithm for mining big data using map reduce paradigm Abstract: Big data is rapidly growing, making traditional tools inefficient for handling large amounts of data. Existing algorithms for frequent itemset mining struggle with scalability due to limitations in parallel processing power. In this paper, we propose a fast and scalable frequent itemset mining (FSFIM) algorithm to generate frequent itemsets from huge data. Preorder coding (POC) trees and Nodeset data structures save half the memory of node-lists and N-lists. FSFIM uses Cloudera's CDH MapReduce framework. With a maximum speedup of 1.85 when minimum support is set to 1, the experimental results reveal that FSFIM outperforms state-of-the-art methods such as HBPFP, Mlib PFP, and Big FIM. FSFIM is thus faster and more scalable for mining frequent itemsets from big data. Journal: Int. J. of Data Mining, Modelling and Management Pages: 326-343 Issue: 3 Volume: 16 Year: 2024 Keywords: big data; frequent itemset mining; FIM; MapReduce paradigm; fast and scalable frequent itemset mining; FSFIM. File-URL: http://www.inderscience.com/link.php?id=140540 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:326-343 Template-Type: ReDIF-Article 1.0 Author-Name: Halima Drissi Touzani Author-X-Name-First: Halima Drissi Author-X-Name-Last: Touzani Author-Name: Sanaa Faquir Author-X-Name-First: Sanaa Author-X-Name-Last: Faquir Author-Name: Ali Yahyaouy Author-X-Name-First: Ali Author-X-Name-Last: Yahyaouy Title: Data mining techniques along with fuzzy logic control to find solutions to road traffic accidents: case study in Morocco Abstract: Collecting data on road accidents is important. However, it is equally important to analyse and process this data to prevent future accidents. Data analysis can provide valuable insights and help identify patterns, contributing to the development of effective strategies and interventions to improve road safety. Over the years, many research efforts have tackled causes related to traffic accidents, trying to identify risk factors. Various statistics indicate that most accidents are due to human error. In Morocco, many studies have been applied to car systems to make them automatic or semi-automatic and avoid serious injuries due to poor driving practices. This paper presents data mining techniques applied to real traffic accident data using statistical analysis, the K-means clustering algorithm and fuzzy logic. The data represents accidents that happened in Morocco during 2014. Results revealed important features that caused previous accidents, which were used to implement a fuzzy logic-based algorithm to train a semi-autonomous car to make the right decisions whenever needed and thus prevent accidents from happening. Journal: Int. J. of Data Mining, Modelling and Management Pages: 344-357 Issue: 3 Volume: 16 Year: 2024 Keywords: data analysis; data mining techniques; road traffic accidents; semi-autonomous cars; fuzzy logic control; decision algorithm; statistical methods; Morocco. 
File-URL: http://www.inderscience.com/link.php?id=140542 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:344-357