Template-Type: ReDIF-Article 1.0 Author-Name: Aziz Ouaarab Author-X-Name-First: Aziz Author-X-Name-Last: Ouaarab Title: Discrete cuckoo search for 0-1 knapsack problem Abstract: This paper presents a resolution of a space management optimisation problem, the 0-1 knapsack problem (KP), by the discrete cuckoo search algorithm (DCS). The proposed approach includes an adaptation process of three main components: the objective function, the solution representation, and the step move operator. A simplified conception of these three components is designed without introducing an additional technique, especially in the search process for the optimal solution. Three sets of benchmark instances have been taken from the literature to test the performance of DCS. Experimental results prove that DCS is effective in solving different types of 0-1 KP instances. Comparisons with other state-of-the-art algorithms show that DCS is a competitive approach that outperforms most of them. Journal: Int. J. of Data Mining, Modelling and Management Pages: 374-396 Issue: 4 Volume: 16 Year: 2024 Keywords: 0-1 knapsack problem; discrete cuckoo search; DCS; combinatorial optimisation; Lévy flights; approximate algorithm. File-URL: http://www.inderscience.com/link.php?id=142593 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:374-396 Template-Type: ReDIF-Article 1.0 Author-Name: Arpita Nath Boruah Author-X-Name-First: Arpita Nath Author-X-Name-Last: Boruah Author-Name: Mrinal Goswami Author-X-Name-First: Mrinal Author-X-Name-Last: Goswami Title: Early stage analysis of breast cancer using intelligent system Abstract: Breast cancer (BC) poses a considerable global health concern, making it a significant issue for women's well-being worldwide. It is crucial to develop a system that can proactively identify the critical risk factors associated with BC.
The present study introduces an intelligent system for BC by analysing risk factors (IS-BC-analysing-RF), which utilises decision tree rules to accurately identify the primary risk factors underlying BC. The rules are processed based on the proposed score function to retain the most relevant ones. Finally, using the sequential search approach, the critical risk factors are identified along with their respective ranges. Based on the simulation results using the University of California, Irvine (UCI) repository BC dataset, the findings indicate that the proposed IS-BC-analysing-RF system is highly significant and has the potential to effectively mitigate the risk of BC by targeting and managing one or two crucial risk factors. Journal: Int. J. of Data Mining, Modelling and Management Pages: 443-454 Issue: 4 Volume: 16 Year: 2024 Keywords: decision system; breast cancer; decision tree; machine learning; risk factor. File-URL: http://www.inderscience.com/link.php?id=142594 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:443-454 Template-Type: ReDIF-Article 1.0 Author-Name: Prachee Dewangan Author-X-Name-First: Prachee Author-X-Name-Last: Dewangan Author-Name: Debabala Swain Author-X-Name-First: Debabala Author-X-Name-Last: Swain Author-Name: Monalisa Swain Author-X-Name-First: Monalisa Author-X-Name-Last: Swain Title: A novel LWT-based robust watermark strategy for colour images Abstract: With the progress of information technology, digital data theft and duplication have become very easy. Image watermarking in cryptography is a major domain that provides manifold security features like confidentiality, authenticity, integrity, etc. This research introduces a robust watermarking scheme for colour images. The proposed technique segments the colour image into three layers: red, green, and blue.
The lifting wavelet transform (LWT) and differential histogram shifting are used to embed text watermark information into the R layer. The performance of the proposed technique was assessed using the SIPI image dataset. Test outputs show that the proposed scheme maintains the balance between imperceptibility and robustness. The scheme also offers better resistance against various attacks such as noise, filtering effects, and image compression. Besides, the text watermark can be successfully extracted under different types of tampering, such as content removal and content addition attacks. Journal: Int. J. of Data Mining, Modelling and Management Pages: 359-373 Issue: 4 Volume: 16 Year: 2024 Keywords: robust watermarking; geometric attack; fragile attack; dual watermark; lifting wavelet transform. File-URL: http://www.inderscience.com/link.php?id=142595 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:359-373 Template-Type: ReDIF-Article 1.0 Author-Name: Amna Amin Sethi Author-X-Name-First: Amna Amin Author-X-Name-Last: Sethi Author-Name: Saad Khan Author-X-Name-First: Saad Author-X-Name-Last: Khan Author-Name: Fatima Hashmi Author-X-Name-First: Fatima Author-X-Name-Last: Hashmi Author-Name: Saim Ali Akber Author-X-Name-First: Saim Ali Author-X-Name-Last: Akber Title: Detecting driver mutations in colorectal cancer through big data analysis Abstract: Colorectal cancer (CRC) is a complex disease posing a significant challenge to global health, with profound impacts on morbidity and mortality. There is a need to identify genetic biomarkers for early diagnosis of the disease. In this study, a comprehensive analysis of CRC genomes was conducted to identify consistent mutations in both coding and non-coding regions, highlighting their pivotal role in CRC pathogenesis. The results of this study revealed consistent mutations in coding regions that validated known CRC driver genes.
The consistent non-coding mutations were also identified within transcription factor binding sites (TFBS) in CRC cell lines. The statistical significance of these mutations suggests their potential impact on gene regulation, leading to the development and progression of CRC. They might act as potential biomarkers for early diagnosis of the disease. To conclude, the findings of this study might provide novel therapeutic targets and diagnostic markers for personalised medicine. Journal: Int. J. of Data Mining, Modelling and Management Pages: 420-442 Issue: 4 Volume: 16 Year: 2024 Keywords: colorectal cancer; CRC; driver mutations; driver genes; biomarkers; transcription factor binding sites; TFBS. File-URL: http://www.inderscience.com/link.php?id=142596 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:420-442 Template-Type: ReDIF-Article 1.0 Author-Name: Puja Kaura Author-X-Name-First: Puja Author-X-Name-Last: Kaura Author-Name: Ajay Kumar Author-X-Name-First: Ajay Author-X-Name-Last: Kumar Title: Mapping the trajectory of sustainable finance research: an analysis using bibliometric approach Abstract: The research highlights the significance of sustainable finance within the financial, economic, and entrepreneurial domains to address the mounting apprehensions about social and environmental issues. The objective is to incorporate environmental, social, and governance considerations into financial judgments, thereby fostering accountability for ecological and societal impacts. This study conducts a bibliometric analysis of academic literature on sustainable finance from 2004 to 2023. The Scopus database was used for the analysis, which was conducted with the BiblioShiny application and RStudio.
This review offers a comprehensive examination of the progression of sustainable finance research within a contextual framework through a performance analysis of 475 English-language documents based on sources, keywords, countries, and authors. It examines publication trends, leading articles, authors, journals, and countries. The study also identifies the key themes and topics constituting this field's fundamental knowledge framework, recommends future research directions, and reveals a concentration of research on sustainable finance in developed countries rather than developing and underdeveloped countries. Journal: Int. J. of Data Mining, Modelling and Management Pages: 397-419 Issue: 4 Volume: 16 Year: 2024 Keywords: sustainable finance; climate finance; climate change; green finance; bibliometric analysis. File-URL: http://www.inderscience.com/link.php?id=142607 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:4:p:397-419 Template-Type: ReDIF-Article 1.0 Author-Name: Souad Moufok Author-X-Name-First: Souad Author-X-Name-Last: Moufok Author-Name: Anas Mouattah Author-X-Name-First: Anas Author-X-Name-Last: Mouattah Author-Name: Khalid Hachemi Author-X-Name-First: Khalid Author-X-Name-Last: Hachemi Title: K-means and DBSCAN for look-alike sound-alike medicines issue Abstract: The goal of this study is to analyse the application of data mining techniques in clustering drug names based on their spelling similarity in order to reduce the occurrence of dispensing errors caused by look-alike sound-alike medicine confusion, one of the most common causes of such errors. Two unsupervised data mining methods, k-means and DBSCAN, were used in conjunction with two similarity measures, BiSim and Levenshtein.
The results of the study showed that the approach is effective in identifying potentially confusable medicines, with BiSim-based k-means clustering being favoured with a silhouette score of 0.5. Journal: Int. J. of Data Mining, Modelling and Management Pages: 49-65 Issue: 1 Volume: 16 Year: 2024 Keywords: look-alike sound-alike; LASA; data mining; medication errors; dispensing errors; k-means; DBSCAN. File-URL: http://www.inderscience.com/link.php?id=136215 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:49-65 Template-Type: ReDIF-Article 1.0 Author-Name: Mathe John Kenny Kumar Author-X-Name-First: Mathe John Kenny Author-X-Name-Last: Kumar Author-Name: Dipti Rana Author-X-Name-First: Dipti Author-X-Name-Last: Rana Title: HARUIM: high average recent utility itemset mining Abstract: High utility itemset mining (HUIM) discovers itemsets that are profitable in nature. Previously, the recency of an itemset was determined by adding the recency of each transaction of an itemset. A major disadvantage of this method is that a few very recent transactions of an itemset can cause the whole itemset to be considered recent. To overcome this limitation, we present a novel measure called <i>average recency</i> to mine recent and high utility itemsets. Average recency upper-bound (arub) and estimated recency co-occurrence structure (ERCS) are proposed to prune unpromising itemsets. A variation of list structure known as the average recent utility list (ARUL) has been created to hold data regarding the utility and recency of itemsets. Through a series of comprehensive experiments carried out on both real and synthetic datasets, it has been demonstrated that the proposed system surpasses the baseline algorithm in runtime, memory utilisation, and candidate generation. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 66-100 Issue: 1 Volume: 16 Year: 2024 Keywords: data mining; high utility itemset mining; HUIM; recency; average recency; list structure; pattern mining; EUCS; knowledge engineering; candidate generation. File-URL: http://www.inderscience.com/link.php?id=136217 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:66-100 Template-Type: ReDIF-Article 1.0 Author-Name: V. Sitharamulu Author-X-Name-First: V. Author-X-Name-Last: Sitharamulu Author-Name: K. Rajendra Prasad Author-X-Name-First: K. Rajendra Author-X-Name-Last: Prasad Author-Name: K. Sudheer Reddy Author-X-Name-First: K. Sudheer Author-X-Name-Last: Reddy Author-Name: A.V. Krishna Prasad Author-X-Name-First: A.V. Krishna Author-X-Name-Last: Prasad Author-Name: M. Venkat Dass Author-X-Name-First: M. Venkat Author-X-Name-Last: Dass Title: Hybrid classifier model for big data by leveraging map reduce framework Abstract: Big data technology is popular and desirable among many users for handling, analysing, and storing large data. However, clustering large data has become more complex due to its size. In recent years, several techniques have been presented to retrieve information from big data. The proposed hybrid classifier model, CSDHAP, is a hybridised form of the sunflower optimisation (SFO) and deer hunting optimisation (DHO) algorithms with an adaptive pollination rate, using the MapReduce framework. CSDHAP is a data classification technique performed using classifiers. The results of the presented approach are evaluated against the extant approaches using various metrics, namely F1-score, specificity, NPV, accuracy, FNR, FDR, sensitivity, precision, FPR, and MCC. It is pertinent to mention that the proposed model is better than any of the traditional models.
The proposed HC+CSDHAP model attained a better precision value than other traditional models such as RNN, SVM, CNN, Bi-LSTM, NB, LSTM, and DBN. Journal: Int. J. of Data Mining, Modelling and Management Pages: 23-48 Issue: 1 Volume: 16 Year: 2024 Keywords: big data classification; MapReduce framework; long short-term memory; LSTM; deep belief network; DBN; optimisation. File-URL: http://www.inderscience.com/link.php?id=136219 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:23-48 Template-Type: ReDIF-Article 1.0 Author-Name: Ivona Lipovac Author-X-Name-First: Ivona Author-X-Name-Last: Lipovac Author-Name: Marina Bagić Babac Author-X-Name-First: Marina Bagić Author-X-Name-Last: Babac Title: Developing a data pipeline solution for big data processing Abstract: This paper presents a comprehensive exploration of the concept of big data and its management while highlighting the challenges that arise in the process. The study showcases the development of a data pipeline, designed to facilitate big data collection, integration, and analysis while addressing state-of-the-art challenges, methods, tools, and technologies. Emphasis is placed on pipeline flexibility, with a view towards enabling ease of implementation of architecture changes, seamless integration of new sources, and straightforward implementation of additional transformations in existing pipelines as needed. The pipeline architecture is discussed in detail, with a focus on its design principles, components, and implementation details, as well as the mechanisms used to ensure its reliability, scalability, and performance. Results from a range of experiments demonstrate the pipeline's effectiveness in addressing the challenges of big data management and analysis, as well as its robustness and versatility in accommodating diverse data sources and processing requirements.
This study provides insights into the critical role of data pipelines in enabling effective big data management and showcases the importance of flexibility in pipeline design to ensure adaptability to evolving data processing needs. Journal: Int. J. of Data Mining, Modelling and Management Pages: 1-22 Issue: 1 Volume: 16 Year: 2024 Keywords: big data; data pipeline; data processing; data analysis; cloud computing. File-URL: http://www.inderscience.com/link.php?id=136221 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:1-22 Template-Type: ReDIF-Article 1.0 Author-Name: Syed Azeem Inam Author-X-Name-First: Syed Azeem Author-X-Name-Last: Inam Author-Name: Daniyal Iqbal Author-X-Name-First: Daniyal Author-X-Name-Last: Iqbal Author-Name: Hassan Hashim Author-X-Name-First: Hassan Author-X-Name-Last: Hashim Author-Name: Mansoor Ahmed Khuhro Author-X-Name-First: Mansoor Ahmed Author-X-Name-Last: Khuhro Title: An empirical approach towards detection of tuberculosis using deep convolutional neural network Abstract: Tuberculosis remains among the top diseases causing death all over the globe, and its timely detection is a major concern for medical practitioners, especially after the emergence of the SARS-CoV-2 pandemic. Even with the recent advances in the methods for medical image classification, it is still challenging to diagnose tuberculosis without considering the associated historical and biological factors. Unsupervised learning has contributed greatly to the development of techniques for image classification, and the present study has utilised a deep convolutional neural network for detecting tuberculosis. It proposes a network comprising 54 layers with 59 connections.
After computations, our proposed deep convolutional neural network attained accuracies of 99.79%, 99.46%, and 99.5% for the healthy, sick, and tuberculosis (TB) classes, respectively, on a public dataset, achieving higher accuracy compared to other pre-trained network models. Journal: Int. J. of Data Mining, Modelling and Management Pages: 101-112 Issue: 1 Volume: 16 Year: 2024 Keywords: tuberculosis; image classification; deep convolutional neural network; DCNN; accuracy; F1 score. File-URL: http://www.inderscience.com/link.php?id=136232 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:1:p:101-112 Template-Type: ReDIF-Article 1.0 Author-Name: Ali Vasfi Ağlarcı Author-X-Name-First: Ali Vasfi Author-X-Name-Last: Ağlarcı Author-Name: Cengiz Bal Author-X-Name-First: Cengiz Author-X-Name-Last: Bal Title: Effect of various factors on classification performance of ordinal logistic regression Abstract: The classification problem concerns determining to which of a set of categories a new observation belongs, using known features. Examples include categorising e-mails as necessary or unnecessary, or diagnosing a disease using a patient's various values (such as gender, blood pressure, or the presence of various symptoms). Various methods are used in classification processes. In this study, the classification performance of ordinal logistic regression, which is a statistical method, was investigated. It has been revealed how the classification success of the method changes when the data set properties change. For this, a simulation study was carried out by deriving data sets with different properties with the help of the R program. As a result of the simulation study, it was observed that the correlation structure in the data set, the sample size, and the number and distribution of the response variable categories affected the classification performance of the method.
Suggestions have been made to improve the classification performance of the ordinal logistic regression method. Journal: Int. J. of Data Mining, Modelling and Management Pages: 196-208 Issue: 2 Volume: 16 Year: 2024 Keywords: statistical learning; classification; ordinal data; simulation. File-URL: http://www.inderscience.com/link.php?id=138813 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:196-208 Template-Type: ReDIF-Article 1.0 Author-Name: P.V.N. Rajeswari Author-X-Name-First: P.V.N. Author-X-Name-Last: Rajeswari Author-Name: M. Shashi Author-X-Name-First: M. Author-X-Name-Last: Shashi Title: Intrusion detection system using statistical query tree with hierarchical clustering approach Abstract: The internet has become a major part of everyone's life. When no proper protection is provided, intruders misuse the access provided by the internet, leading to an increased risk of sensitive data leakage. To achieve a trade-off between scalability and precision, this research introduces a novel two-stage screening framework for intrusion detection systems (IDS) to identify the attacks and their types. The first stage aims to identify suspicious internet protocol (IP) addresses based on the abrupt deviation from the normal activity pattern. The second screening stage aims to analyse the packets received from suspicious IP addresses by applying a recently developed single-phase statistical hierarchical clustering (SHiC) algorithm designed for clustering and outlier detection. The data packets are classified as outliers based on their higher statistical distance to the existing components or clusters identified. The complete IDS framework is developed and applied to two benchmark datasets and compared with the results produced by several outlier detection algorithms. The proposed framework is found to be consistently more accurate in detecting attacks. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 176-195 Issue: 2 Volume: 16 Year: 2024 Keywords: statistical query tree; intrusion detection system; IDS; outlier; statistical hierarchical clustering; SHiC; cyber-attack; CICIDS-2017. File-URL: http://www.inderscience.com/link.php?id=138822 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:176-195 Template-Type: ReDIF-Article 1.0 Author-Name: Hamed Khosravi Author-X-Name-First: Hamed Author-X-Name-Last: Khosravi Author-Name: Mohammad Reza Shafie Author-X-Name-First: Mohammad Reza Author-X-Name-Last: Shafie Author-Name: Morteza Hajiabadi Author-X-Name-First: Morteza Author-X-Name-Last: Hajiabadi Author-Name: Ahmed Shoyeb Raihan Author-X-Name-First: Ahmed Shoyeb Author-X-Name-Last: Raihan Author-Name: Imtiaz Ahmed Author-X-Name-First: Imtiaz Author-X-Name-Last: Ahmed Title: Chatbots and ChatGPT: a bibliometric analysis and systematic review of publications in Web of Science and Scopus databases Abstract: This paper presents a bibliometric analysis of the scientific literature related to chatbots, focusing specifically on ChatGPT. Chatbots have gained increasing attention recently, with annual growth rates of 19.16% and 27.19% on the Web of Science (WoS) and Scopus, respectively. The research consists of two study phases: 1) an analysis of chatbot literature; 2) a comprehensive review of scientific documents on ChatGPT. In the first phase, a bibliometric analysis is conducted on all the published literature from both Scopus (5,839) and WoS (2,531) databases covering the period from 1998 to 2023. Subsequently, a bibliometric analysis has been carried out on ChatGPT publications, and 45 published studies have been analysed thoroughly based on their methods, novelty, and conclusions.
Overall, the study aims to provide guidelines for researchers to conduct their research more effectively in the field of chatbots and specifically highlight significant areas for future investigation into ChatGPT. Journal: Int. J. of Data Mining, Modelling and Management Pages: 113-147 Issue: 2 Volume: 16 Year: 2024 Keywords: chatbot; ChatGPT; bibliometrics; artificial intelligence; natural language processing; NLP; generative artificial intelligence. File-URL: http://www.inderscience.com/link.php?id=138824 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:113-147 Template-Type: ReDIF-Article 1.0 Author-Name: Thouraya Sakouhi Author-X-Name-First: Thouraya Author-X-Name-Last: Sakouhi Author-Name: Jalel Akaichi Author-X-Name-First: Jalel Author-X-Name-Last: Akaichi Title: Clustering-based multidimensional sequential pattern mining of semantic trajectories Abstract: Knowledge discovery from mobility data is about identifying behaviours from trajectories. In fact, mining masses of trajectories is required to gain an overview of this data and, notably, to investigate the relationship between different entities' movements. Most state-of-the-art work on this issue operates on raw trajectories. Nevertheless, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose to apply sequential pattern mining based on a pre-processing step of clustering to alleviate the former's temporal complexity. Mining considers the spatial and temporal dimensions at different levels of granularity, thus providing richer and more insightful patterns about human behaviour. We evaluate our work on tourists' semantic trajectories in Kyoto. Results showed the effectiveness and efficiency of our model compared to state-of-the-art work. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 148-175 Issue: 2 Volume: 16 Year: 2024 Keywords: mobility data; trajectories; semantic modelling; sequential pattern mining; clustering; mobility pattern. File-URL: http://www.inderscience.com/link.php?id=138825 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:148-175 Template-Type: ReDIF-Article 1.0 Author-Name: Kamel Abdellaoui Author-X-Name-First: Kamel Author-X-Name-Last: Abdellaoui Author-Name: Mohamed Ali Hadj Taieb Author-X-Name-First: Mohamed Ali Hadj Author-X-Name-Last: Taieb Author-Name: Rafik Mahjoubi Author-X-Name-First: Rafik Author-X-Name-Last: Mahjoubi Author-Name: Mohamed Ben Aouicha Author-X-Name-First: Mohamed Ben Author-X-Name-Last: Aouicha Title: Data-driven journey: a data management paradigm-centric review and data mesh capabilities Abstract: Becoming data driven is one of the top strategic objectives of data-rich organisations. Africa must join the wave to capture and unlock the highest value from data. Therefore, this survey analyses the drivers, challenges, and evolution of existing data management paradigms, including the data warehouse, data lake, and data lakehouse. It reveals the limitations of monolithic approaches to addressing data at scale and how they led to a paradigm shift toward a more distributed and decentralised data mesh. The paper discusses data mesh capabilities to address the challenges of data availability and accessibility at scale in Africa to enable leapfrog development in its journey to being data driven. Journal: Int. J. of Data Mining, Modelling and Management Pages: 209-243 Issue: 2 Volume: 16 Year: 2024 Keywords: data-driven; data management paradigms; data mesh; analytics; developing countries. File-URL: http://www.inderscience.com/link.php?id=138865 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:2:p:209-243 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammed Alweshah Author-X-Name-First: Mohammed Author-X-Name-Last: Alweshah Author-Name: Ghadeer Ahmad Alhebaishan Author-X-Name-First: Ghadeer Ahmad Author-X-Name-Last: Alhebaishan Author-Name: Sofian Kassaymeh Author-X-Name-First: Sofian Author-X-Name-Last: Kassaymeh Author-Name: Saleh Alkhalaileh Author-X-Name-First: Saleh Author-X-Name-Last: Alkhalaileh Author-Name: Mohammed Ababneh Author-X-Name-First: Mohammed Author-X-Name-Last: Ababneh Title: Improving intrusion detection in the IoT with African vultures optimisation algorithm-based feature selection Abstract: The security of the system may be jeopardised by unsecured data transmitted through IoT devices, and ensuring the reliability of data is critical to maintaining the integrity of information over the internet. To enhance the intrusion detection rate, several investigations have been conducted to develop methodologies capable of identifying the minimum required secure features. One such method is the use of the feature selection (FS) procedure with metaheuristic algorithms. In this study, the African vultures optimisation algorithm (AVO) was used in two wrapper FS approaches to select the most secure features in IoT. The first approach used AVO, while the second employed OBL-AVO, a hybrid model combining AVO with opposition-based learning (OBL) to enhance exploration. Based on the outcomes, it was found that OBL-AVO is superior to AVO in enhancing FS. Furthermore, the proposed methods were evaluated and compared to four recent approaches. Journal: Int. J. of Data Mining, Modelling and Management Pages: 293-325 Issue: 3 Volume: 16 Year: 2024 Keywords: intrusion detection; internet of things; IoT; feature selection; hybrid metaheuristics; African vultures optimisation algorithm; AVO; opposition-based learning; OBL.
File-URL: http://www.inderscience.com/link.php?id=140529 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:293-325 Template-Type: ReDIF-Article 1.0 Author-Name: Moumita Ghosh Author-X-Name-First: Moumita Author-X-Name-Last: Ghosh Author-Name: Sourav Mondal Author-X-Name-First: Sourav Author-X-Name-Last: Mondal Author-Name: Harshita Moondra Author-X-Name-First: Harshita Author-X-Name-Last: Moondra Author-Name: Dina Tri Utari Author-X-Name-First: Dina Tri Author-X-Name-Last: Utari Author-Name: Anirban Roy Author-X-Name-First: Anirban Author-X-Name-Last: Roy Author-Name: Kartick Chandra Mondal Author-X-Name-First: Kartick Chandra Author-X-Name-Last: Mondal Title: An irregular CLA-based novel frequent pattern mining approach Abstract: Frequent itemset mining has received a lot of attention in the field of data mining. Its main objective is to find groups of items that consistently appear together in datasets. While frequent itemset mining is useful, the algorithms for mining frequent itemsets have quite high resource requirements. In order to optimise the time and memory needs, a few improvements have been made in recent years. This study proposes CellFPM, a straightforward yet effective cellular learning automata-based method for finding frequent itemset occurrences. It works efficiently with large datasets. The efficiency of the proposed approach in time and memory requirements has been evaluated using benchmark datasets explicitly designed for performance measurement. The varying size and density of the test datasets have confirmed the scalability of the suggested method. The findings show that CellFPM consistently surpasses the leading algorithms in terms of runtime and memory usage, particularly the latter. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 268-292 Issue: 3 Volume: 16 Year: 2024 Keywords: cellular learning automata; CLA; frequent itemsets; data mining; knowledge discovery. File-URL: http://www.inderscience.com/link.php?id=140536 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:268-292 Template-Type: ReDIF-Article 1.0 Author-Name: Shengjuan Zhao Author-X-Name-First: Shengjuan Author-X-Name-Last: Zhao Author-Name: Gyoogun Lim Author-X-Name-First: Gyoogun Author-X-Name-Last: Lim Title: A comparative analysis of user attitudes towards ICO and IEO in blockchain projects: insights from social media big data Abstract: This study conducts a comparative analysis of two popular crowdfunding methods in the blockchain market, the initial coin offering (ICO) and the initial exchange offering (IEO) models. Using project names as keywords, we collected and analysed big data, applying techniques such as TF-IDF, LDA, social network analysis, and sentiment analysis. Our findings show that the attitude of target groups towards ICO and IEO projects is not significantly different, although IEO targets exhibit more interest in entertainment-related topics. Social network analysis reveals that the ICO target group is more sensitive to popular elements, such as pop singers, while the IEO target group is more interested in soccer competitions. Both target groups show a strong interest in the US election. Our study suggests that IEO, as an upgraded financing model of ICO, does not enjoy high levels of trust from the market crowd. By identifying the preferences of the target groups for both models through multiple analyses, we recommend that these preferences be taken into consideration to improve the efficiency of targeted marketing. Journal: Int. J.
of Data Mining, Modelling and Management Pages: 245-267 Issue: 3 Volume: 16 Year: 2024 Keywords: blockchain; big data; token issuance; initial coin offering; ICO; initial exchange offering; IEO. File-URL: http://www.inderscience.com/link.php?id=140539 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:245-267 Template-Type: ReDIF-Article 1.0 Author-Name: B. Sivaiah Author-X-Name-First: B. Author-X-Name-Last: Sivaiah Author-Name: R. Rajeswara Rao Author-X-Name-First: R. Rajeswara Author-X-Name-Last: Rao Title: A node sets based fast and scalable frequent itemset algorithm for mining big data using map reduce paradigm Abstract: Big data is rapidly growing, making traditional tools inefficient for handling large amounts of data. Existing algorithms for frequent itemset mining struggle with scalability due to limitations in parallel processing power. In this paper, we propose a fast and scalable frequent itemset mining (FSFIM) algorithm to generate frequent itemsets from huge data. Preorder coding (POC) trees and Nodeset data structures save half the memory of node-lists and N-lists. FSFIM uses Cloudera's CDH MapReduce framework. With a maximum speedup of 1.85 when minimum support is set to 1, the experimental results reveal that FSFIM outperforms state-of-the-art methods such as HBPFP, Mlib PFP, and Big FIM. FSFIM is thus faster and more scalable for mining frequent itemsets from big data. Journal: Int. J. of Data Mining, Modelling and Management Pages: 326-343 Issue: 3 Volume: 16 Year: 2024 Keywords: big data; frequent itemset mining; FIM; MapReduce paradigm; fast and scalable frequent itemset mining; FSFIM. File-URL: http://www.inderscience.com/link.php?id=140540 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:326-343 Template-Type: ReDIF-Article 1.0 Author-Name: Halima Drissi Touzani Author-X-Name-First: Halima Drissi Author-X-Name-Last: Touzani Author-Name: Sanaa Faquir Author-X-Name-First: Sanaa Author-X-Name-Last: Faquir Author-Name: Ali Yahyaouy Author-X-Name-First: Ali Author-X-Name-Last: Yahyaouy Title: Data mining techniques along with fuzzy logic control to find solutions to road traffic accidents: case study in Morocco Abstract: Collecting data on road accidents is important. However, it is equally important to analyse and process this data to prevent future accidents. Data analysis can provide valuable insights and help identify patterns, contributing to the development of effective strategies and interventions to improve road safety. Over the years, many research efforts have tackled causes related to traffic accidents, trying to identify risk factors. Various statistics indicate that most accidents are due to human error. In Morocco, many studies have been applied to car systems to make them automatic or semi-automatic and avoid serious injuries due to poor driving practices. This paper presents data mining techniques applied to real traffic accident data using statistical analysis, the K-means clustering algorithm and fuzzy logic. The data represents accidents that happened in Morocco during 2014. Results revealed important features that caused previous accidents, which were used to implement a fuzzy logic-based algorithm to train a semi-autonomous car to make the right decisions whenever needed and thus prevent accidents from happening. Journal: Int. J. of Data Mining, Modelling and Management Pages: 344-357 Issue: 3 Volume: 16 Year: 2024 Keywords: data analysis; data mining techniques; road traffic accidents; semi-autonomous cars; fuzzy logic control; decision algorithm; statistical methods; Morocco. 
File-URL: http://www.inderscience.com/link.php?id=140542 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:16:y:2024:i:3:p:344-357