Template-Type: ReDIF-Article 1.0
Author-Name: Elsayed A.H. Elamir
Author-X-Name-First: Elsayed A.H.
Author-X-Name-Last: Elamir
Title: Data analytics for gross domestic product using random forest and extreme gradient boosting approaches: an empirical study
Abstract:
This study aims to use the random forest and extreme gradient boosting approaches to forecast and analyse gross domestic product per capita, using country-level data from World Bank development indicators over the period 2010 to 2017. Comprehensive comparisons are carried out using the years before 2017 as training data and 2017 as testing data. The root mean square error and the coefficient of determination are used to judge among the different models. The random forest and extreme gradient boosting achieve accuracies of 97.8% and 98.1%, respectively, as measured by the coefficient of determination. The results suggest that investment in education, labour, health, and industry, as well as reductions in inflation, interest, and unemployment, is necessary to enhance gross domestic product per capita. Motivating results are given by a two-way interaction measure that is useful in explaining co-dependencies in the model behaviour. The strongest interactions are trade-technology and technology-education, followed by consumption-health.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 269-286
Issue: 3
Volume: 14
Year: 2022
Keywords: bagging; boosting; business analytics; forecast; gross domestic product; GDP; machine learning.
File-URL: http://www.inderscience.com/link.php?id=125258
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:3:p:269-286
Template-Type: ReDIF-Article 1.0
Author-Name: Fedor Krasnov
Author-X-Name-First: Fedor
Author-X-Name-Last: Krasnov
Author-Name: Mikhail Shvartsman
Author-X-Name-First: Mikhail
Author-X-Name-Last: Shvartsman
Author-Name: Alexander Dimentov
Author-X-Name-First: Alexander
Author-X-Name-Last: Dimentov
Title: Comparing text corpora via topic modelling
Abstract:
A method is developed for conducting comparative analysis of the content of full-text patent collections. Named T4C, the approach is based on topic modelling and machine learning and extends comparative text mining. The idea of T4C was inspired by the possibility of extracting precise topics from a joint collection of texts and then analysing the parts of the collection by topic. Different aspects of the meta information of the full-text patent collection are considered. The ownership of a patent in a particular country can be identified with an accuracy of 97.5% by using supervised machine learning. By studying how patents vary with time, those belonging to a specific period can be identified with an accuracy of 85% for a given country. Also developed is a visual representation of the thematic correlation between groups of patents. In terms of the text composition of patent descriptions, Chinese patents differ fundamentally from US patents. The T4C method is valid for structured medium-sized collections of texts in English. The experimental results are used to manage the patenting process at GazpromNeft STC.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 203-216
Issue: 3
Volume: 14
Year: 2022
Keywords: topic modelling; text classification; ARTM; additive regularisation of topic models; PLSA; random forest; comparing text collections.
File-URL: http://www.inderscience.com/link.php?id=125259
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:3:p:203-216
Template-Type: ReDIF-Article 1.0
Author-Name: Sadaf Kabir
Author-X-Name-First: Sadaf
Author-X-Name-Last: Kabir
Author-Name: Leily Farrokhvar
Author-X-Name-First: Leily
Author-X-Name-Last: Farrokhvar
Title: Nonlinear gradient-based feature selection for precise prediction of diseases
Abstract:
Developing accurate predictive models can profoundly help healthcare providers improve the quality of their services. However, medical data often contain several variables, and not all the data equally contribute towards the prediction. The existence of irrelevant and redundant features in a dataset can unnecessarily increase computational cost and complexity while deteriorating the performance of the predictive model. In this study, we employ the gradient-based prediction attribution as a general tool to identify important features in differentiable predictive models, such as neural networks (NN) and linear regression. Built upon this approach, we analyse single-stage and multi-stage scenarios for feature selection using ten medical datasets. Through extensive experiments, we demonstrate that the combination of the gradient-based approach with NN provides a powerful nonlinear technique to identify important features contributing to the prediction. In particular, nonlinear gradient-based feature selection achieves competitive results or significant improvements over previously reported results on all datasets.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 248-268
Issue: 3
Volume: 14
Year: 2022
Keywords: machine learning; feature selection; neural networks; logistic regression; disease prediction models; healthcare data.
File-URL: http://www.inderscience.com/link.php?id=125260
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:3:p:248-268
Template-Type: ReDIF-Article 1.0
Author-Name: Roberto Bertolini
Author-X-Name-First: Roberto
Author-X-Name-Last: Bertolini
Author-Name: Stephen J. Finch
Author-X-Name-First: Stephen J.
Author-X-Name-Last: Finch
Title: Synergistic effects between data corpora properties and machine learning performance in data pipelines
Abstract:
To analyse data, a computationally feasible pipeline must be developed for data modelling. Corpora properties affect performance variability of machine learning (ML) techniques in pipelines; however, this has not been thoroughly investigated using simulation methodologies. A Monte Carlo study is used to compare differences in the area under the curve (AUC) metric for large-n-small-p corpora examining: 1) the choice of ML algorithm; 2) size of the training database; 3) measurement error; 4) class imbalance magnitude; 5) missing data pattern. Our simulations are consistent with established results under which these algorithms and corpora properties perform best, while providing insights into their synergistic effects. Measurement error negatively impacted pipeline performance across all corpora factors and ML algorithms. A larger training corpus ameliorated the decrease in predictive efficacy resulting from measurement error, class imbalance magnitudes, and missing data patterns. We discuss the implications of these findings for designing pipelines to enhance prediction performance.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 217-233
Issue: 3
Volume: 14
Year: 2022
Keywords: data pipeline; interaction/synergistic effects; Monte Carlo simulation; machine learning; binary classification; area under the curve; AUC.
File-URL: http://www.inderscience.com/link.php?id=125261
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:3:p:217-233
Template-Type: ReDIF-Article 1.0
Author-Name: Paúl Cumba-Armijos
Author-X-Name-First: Paúl
Author-X-Name-Last: Cumba-Armijos
Author-Name: Diego Riofrío-Luzcando
Author-X-Name-First: Diego
Author-X-Name-Last: Riofrío-Luzcando
Author-Name: Verónica Rodríguez-Arboleda
Author-X-Name-First: Verónica
Author-X-Name-Last: Rodríguez-Arboleda
Author-Name: Joe Carrión-Jumbo
Author-X-Name-First: Joe
Author-X-Name-Last: Carrión-Jumbo
Title: Detecting cyberbullying in Spanish texts through deep learning techniques
Abstract:
Recently collected data suggest that it is possible to automatically detect events that may negatively affect the most vulnerable parts of our society by monitoring communication technologies such as social networks or messaging applications. This research consolidates and prepares a corpus of Spanish bullying expressions taken from Twitter in order to use them as input to train a convolutional neural network through deep learning techniques. As a result of this training, a predictive model was created, which can identify Spanish cyberbullying expressions such as insults, racism, homophobic attacks, and so on.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 234-247
Issue: 3
Volume: 14
Year: 2022
Keywords: cyberbullying; deep learning; convolutional neuronal network; Spanish; social networks.
File-URL: http://www.inderscience.com/link.php?id=125265
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:3:p:234-247
Template-Type: ReDIF-Article 1.0
Author-Name: Essam Alhroob
Author-X-Name-First: Essam
Author-X-Name-Last: Alhroob
Author-Name: Mohammed Falah Mohammed
Author-X-Name-First: Mohammed Falah
Author-X-Name-Last: Mohammed
Author-Name: Fadhl Hujainah
Author-X-Name-First: Fadhl
Author-X-Name-Last: Hujainah
Author-Name: Osama Nayel Al Sayaydeh
Author-X-Name-First: Osama Nayel Al
Author-X-Name-Last: Sayaydeh
Author-Name: Ngahzaifa Ab Ghani
Author-X-Name-First: Ngahzaifa Ab
Author-X-Name-Last: Ghani
Title: Investigation of contraction process issue in fuzzy min-max models
Abstract:
The fuzzy min-max (FMM) network is one of the most powerful neural networks. It combines a neural network and fuzzy sets into a unified framework to address pattern classification problems. The FMM consists of three main learning processes, namely, hyperbox contraction, hyperbox expansion and hyperbox overlap tests. Despite its various learning processes, the contraction process is considered one of the major challenges in the FMM that affects the classification process. Thus, this study aims to investigate the FMM contraction process precisely to highlight the consequences of its use during the learning process. Such an investigation can assist practitioners and researchers in obtaining a better understanding of the consequences of using the contraction process on the network performance. Findings of this study indicate that the contraction process used in FMM can affect network performance in terms of misclassification and an inability to handle the membership ambiguity of the overlapping regions.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 1-14
Issue: 1
Volume: 14
Year: 2022
Keywords: pattern classification; fuzzy min-max; FMM models; contraction process.
File-URL: http://www.inderscience.com/link.php?id=122034
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:1:p:1-14
Template-Type: ReDIF-Article 1.0
Author-Name: Taiseer Abdalla Elfadil Eisa
Author-X-Name-First: Taiseer Abdalla Elfadil
Author-X-Name-Last: Eisa
Title: Plagiarism detection of figure images in scientific publications
Abstract:
Plagiarism is stealing others' work by using their words directly or indirectly without a credit citation. Copying others' ideas is another type of plagiarism that may occur in many areas, but the most serious one is academic plagiarism. Therefore, technical solutions are urgently required for automatic detection of idea plagiarism. Detection of figure plagiarism is a particularly challenging field of research, because not only the text but also graphic features need to be analysed. This paper investigates the issues of idea and figure plagiarism and proposes a detection method which copes with both text and structure change. The procedure depends on finding similar semantic meanings between figures by applying image processing and semantic mapping techniques. The figures were compared using the representation of shape features based on detailed comparisons between the components of figures. This is an improvement over existing methods, which only compare the numbers and types of shapes inside figures.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 15-29
Issue: 1
Volume: 14
Year: 2022
Keywords: plagiarism detection; figure plagiarism detection; idea plagiarism detection; academic plagiarism; image processing; semantic mapping techniques; content-based algorithms.
File-URL: http://www.inderscience.com/link.php?id=122036
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:1:p:15-29
Template-Type: ReDIF-Article 1.0
Author-Name: Nuhu Yusuf
Author-X-Name-First: Nuhu
Author-X-Name-Last: Yusuf
Author-Name: Mohd Amin Mohd Yunus
Author-X-Name-First: Mohd Amin Mohd
Author-X-Name-Last: Yunus
Author-Name: Norfaradilla Wahid
Author-X-Name-First: Norfaradilla
Author-X-Name-Last: Wahid
Author-Name: Aida Mustapha
Author-X-Name-First: Aida
Author-X-Name-Last: Mustapha
Author-Name: Nazri Mohd Nawi
Author-X-Name-First: Nazri Mohd
Author-X-Name-Last: Nawi
Author-Name: Noor Azah Samsudin
Author-X-Name-First: Noor Azah
Author-X-Name-Last: Samsudin
Title: Arabic text semantic-based query expansion
Abstract:
Query expansion is used in many search applications for retrieving relevant documents. Although retrieving the relevant documents is important for search users, the complexity of Arabic morphology remains a challenge. As such, many irrelevant documents are still retrieved in the ranked results. To address this challenge, this paper proposes a new searching method for Arabic text semantic-based query expansion. The proposed method combines Arabic word synonyms and ontology to expand the query with additional terms. Specifically, the proposed method combines lexical words within the ranking algorithm and then uses ontology links to expand the query. The performance of Arabic text semantic-based query expansion was evaluated in terms of average precision, mean average precision and mean reciprocal rank. Experiments on Quran datasets show that the proposed method using the Arabic text semantic-based query expansion approach outperforms the previous methods using another dataset, called the Tafsir dataset. The proposed method achieved 15.44% mean average precision.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 30-40
Issue: 1
Volume: 14
Year: 2022
Keywords: Arabic text; semantic search; query expansion; lexical words; ontology; ranking algorithms.
File-URL: http://www.inderscience.com/link.php?id=122037
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:1:p:30-40
Template-Type: ReDIF-Article 1.0
Author-Name: Talal Almutiri
Author-X-Name-First: Talal
Author-X-Name-Last: Almutiri
Author-Name: Faisal Saeed
Author-X-Name-First: Faisal
Author-X-Name-Last: Saeed
Title: A hybrid feature selection method combining Gini index and support vector machine with recursive feature elimination for gene expression classification
Abstract:
Microarray datasets suffer from the curse of dimensionality because of a large number of genes and a low number of samples; this high dimensionality leads to computational cost and complexity. Feature selection (FS) is the process of choosing informative genes that could help in improving the effectiveness of classification. In this study, a hybrid feature selection method is proposed, which combines the Gini index and support vector machine with recursive feature elimination (GI-SVM-RFE), calculates a weight for each gene and recursively selects only ten genes to be the informative genes. To measure the impact of the proposed method, the experiments include four scenarios: baseline without feature selection, GI feature selection, SVM-RFE feature selection, and combining GI with SVM-RFE. In this paper, 11 microarray datasets were used. The proposed method showed an improvement in terms of classification accuracy when compared with previous studies.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 41-62
Issue: 1
Volume: 14
Year: 2022
Keywords: classification; feature selection; gene expression; Gini index; microarray; recursive feature elimination.
File-URL: http://www.inderscience.com/link.php?id=122038
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:1:p:41-62
Template-Type: ReDIF-Article 1.0
Author-Name: Hema Dubey
Author-X-Name-First: Hema
Author-X-Name-Last: Dubey
Author-Name: Nilay Khare
Author-X-Name-First: Nilay
Author-X-Name-Last: Khare
Title: Fast parallel computation of PageRank scores with improved convergence time
Abstract:
PageRank is a prominent link-based approach used by many search engines to rank their search results. The PageRank algorithm is based on performing iterations for calculating the PageRank of web pages until the convergence point is met. The computational cost of this algorithm is very high for very large web graphs. To overcome this drawback, in this paper we propose a fast parallel computation of PageRank which uses a standard deviation technique to normalise the PageRank score of each web page. The proposed work is tested on standard datasets taken from the Stanford large network dataset collection, on a machine with a multicore architecture using the CUDA programming paradigm. We observed from the experiments that the proposed fast parallel PageRank algorithm needs fewer iterations to converge than the existing parallel PageRank method. We also determined that there is a speed-up of about 2 to 10 for nine different standard datasets for the proposed algorithm over the existing algorithm.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 63-88
Issue: 1
Volume: 14
Year: 2022
Keywords: PageRank; normalisation; standard deviation; parallel computation; graphics processing unit; GPU; compute unified device architecture; CUDA.
File-URL: http://www.inderscience.com/link.php?id=122039
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:1:p:63-88
Template-Type: ReDIF-Article 1.0
Author-Name: Wael K. Hanna
Author-X-Name-First: Wael K.
Author-X-Name-Last: Hanna
Author-Name: Rasha Elstohy
Author-X-Name-First: Rasha
Author-X-Name-Last: Elstohy
Author-Name: Nouran M. Radwan
Author-X-Name-First: Nouran M.
Author-X-Name-Last: Radwan
Title: Prediction of air pollution and analysis of its effects on the pollution dispersion of PM10 in Egypt using machine learning algorithms
Abstract:
Air pollution is considered one of the serious threats in Egypt. According to a study in the journal Environmental Science & Technology Letters, air pollution is one of the main factors responsible for shortening Egyptians' lives by 1.85 years. The main cause of air pollution in Egypt is PM10, which comes from industrial processes. PM10 concentrations exceed daily average concentrations during 98% of the measurement period. In this paper, we apply machine learning classification algorithms to build the most accurate model for air pollution prediction and for analysing its effects on pollution dispersion of PM10. The proposed classification model begins with air quality data collection and pre-processing, followed by a classification process to discover the main relevant features for prediction. Experimental results show a good performance of the proposed air quality model. Random forest and naïve Bayes algorithms achieved an accuracy of almost 82%, while JRip and the fuzzy classifier achieved lower accuracies of 65% and 76%, respectively.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 358-371
Issue: 4
Volume: 14
Year: 2022
Keywords: air pollution; PM10; classification model; machine learning algorithms; Egypt.
File-URL: http://www.inderscience.com/link.php?id=126662
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:4:p:358-371
Template-Type: ReDIF-Article 1.0
Author-Name: Ikram Nekkache
Author-X-Name-First: Ikram
Author-X-Name-Last: Nekkache
Author-Name: Said Jabbour
Author-X-Name-First: Said
Author-X-Name-Last: Jabbour
Author-Name: Nadjet Kamel
Author-X-Name-First: Nadjet
Author-X-Name-Last: Kamel
Author-Name: Lakhdar Sais
Author-X-Name-First: Lakhdar
Author-X-Name-Last: Sais
Title: Detecting and exploiting symmetries in sequential pattern mining
Abstract:
In this paper, we introduce a new framework for discovering and using symmetries in sequential pattern mining tasks. Symmetries are permutations between items that leave the sequential database invariant. Symmetries present several potential benefits. They can be seen as a new kind of structural pattern expressing regularities and similarities between items. As symmetries induce a partition of the sequential patterns into equivalence classes, exploiting them would improve the pattern enumeration process while reducing the size of the output. To this end, we first address the problem of symmetry discovery from a database of sequences. Then, we show how Apriori-like algorithms can be enhanced by dynamic integration of the detected symmetries. We also provide a second symmetry breaking approach that eliminates symmetries in a pre-processing step by reformulating the sequential database of transactions. Our experiments clearly show that several sequential pattern mining datasets contain such symmetry-based regularities. We also experimentally demonstrate that using such symmetries results in a significant reduction of the search space on some datasets.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 309-334
Issue: 4
Volume: 14
Year: 2022
Keywords: data mining; sequential pattern mining; symmetries.
File-URL: http://www.inderscience.com/link.php?id=126663
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:4:p:309-334
Template-Type: ReDIF-Article 1.0
Author-Name: Mehmet Özçalıcı
Author-X-Name-First: Mehmet
Author-X-Name-Last: Özçalıcı
Author-Name: Ayşe Tuğba Dosdoğru
Author-X-Name-First: Ayşe Tuğba
Author-X-Name-Last: Dosdoğru
Author-Name: Aslı Boru İpek
Author-X-Name-First: Aslı Boru
Author-X-Name-Last: İpek
Author-Name: Mustafa Göçken
Author-X-Name-First: Mustafa
Author-X-Name-Last: Göçken
Title: Comparison of harmony search derivatives for artificial neural network parameter optimisation: stock price forecasting
Abstract:
This study aims to forecast, as accurately as possible, the next day's stock price using harmony search (HS) and its variants [improved harmony search (IHS), global-best harmony search (GHS), self-adaptive harmony search (SAHS), and intelligent tuned harmony search (ITHS)] together with an artificial neural network (ANN). The advantage of the proposed models is that the useful information in the original stock data is found by input variable selection and, simultaneously, the most suitable number of hidden neurons in the hidden layer is discovered to mitigate the overfitting/underfitting problem in ANN. The results show that forecasts made by HS-ANN, IHS-ANN, GHS-ANN, SAHS-ANN, and ITHS-ANN tend to achieve hit rates above 89%, which is considerably better than previously proposed forecasting models in the literature. Hence, ANN models provide more valuable forecasting results for investors to hedge against potential risk in stock markets.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 335-357
Issue: 4
Volume: 14
Year: 2022
Keywords: stock price forecasting; artificial neural network; harmony search and its variants.
File-URL: http://www.inderscience.com/link.php?id=126664
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:4:p:335-357
Template-Type: ReDIF-Article 1.0
Author-Name: Yuehua Duan
Author-X-Name-First: Yuehua
Author-X-Name-Last: Duan
Author-Name: Zbigniew W. Ras
Author-X-Name-First: Zbigniew W.
Author-X-Name-Last: Ras
Title: Recommendation system for improving churn rate based on action rules and sentiment mining
Abstract:
It is well recognised that customers are one of the most valuable assets to a company. Therefore, it is of significant value for companies to reduce customer outflow. In this paper, we focus on identifying the customers with a high chance of attrition and providing valid and trustworthy recommendations to improve the customer churn rate. To this end, we designed and implemented a recommender system that can provide actionable recommendations to improve the customer churn rate. We used both transaction and survey data from the heavy equipment repair and service sector from 2011 to 2017. This data was collected by a consulting company based in Charlotte, North Carolina. In the survey data, customers give their thoughts, feelings, expectations and complaints in free-form text. We applied aspect-based sentiment analysis to the review text data to gain insight into customers' attitudes toward the service. Action rule mining and a meta-action triggering mechanism are used to recognise actionable strategies that help reduce customer churn.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 287-308
Issue: 4
Volume: 14
Year: 2022
Keywords: action rule mining; meta-actions; aspect-based sentiment analysis; recommender system; reduct.
File-URL: http://www.inderscience.com/link.php?id=126665
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:4:p:287-308
Template-Type: ReDIF-Article 1.0
Author-Name: Khaled Benali
Author-X-Name-First: Khaled
Author-X-Name-Last: Benali
Title: Ontology and web usage mining for website maintenance
Abstract:
Web mining and the semantic web are closely linked: on the one hand, web-mining techniques help in the construction of the semantic web; on the other hand, the semantic web helps extract new knowledge. This article presents an approach that uses ontology and web usage mining techniques for website maintenance. This work can help novice researchers get started, drawing on patterns extracted from the site logs by an algorithm for maintaining the website.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 372-400
Issue: 4
Volume: 14
Year: 2022
Keywords: apriori; knowledge; log file; semantic web; ontology; web usage mining; WUM; website maintenance.
File-URL: http://www.inderscience.com/link.php?id=126666
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:4:p:372-400
Template-Type: ReDIF-Article 1.0
Author-Name: Amal Bouraoui
Author-X-Name-First: Amal
Author-X-Name-Last: Bouraoui
Author-Name: Salma Jamoussi
Author-X-Name-First: Salma
Author-X-Name-Last: Jamoussi
Author-Name: Abdelmajid Ben Hamadou
Author-X-Name-First: Abdelmajid Ben
Author-X-Name-Last: Hamadou
Title: A comprehensive review of deep learning for natural language processing
Abstract:
Deep learning has attracted considerable attention across many natural language processing (NLP) domains. Deep learning models aim to learn embeddings of data with multiple levels of abstraction through multiple layers for either labelled structured input data or unlabelled unstructured input data. Currently, two research trends have emerged in building higher level embeddings. On the one hand, a strong trend in deep learning leads towards increasingly powerful and complex models. On the other hand, multi-purpose sentence representation based on simple sums or averages of word vectors was recently shown to be effective. Furthermore, improving the performance of deep learning methods with attention mechanisms has become a research hotspot in the last four years. In this paper, we seek to provide a comprehensive review of recent studies in building neural network (NN) embeddings that have been applied to NLP tasks. We provide a walk-through of deep learning evolution and a description of a variety of its architectures. We present and compare the performance of several deep learning models on standard datasets for different NLP tasks. We also present some deep learning challenges for natural language processing.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 149-182
Issue: 2
Volume: 14
Year: 2022
Keywords: deep learning; word embedding; sentence embedding; attention mechanism; compositional models; convolutional neural networks; CNNs; recurrent/recursive NNs; multi-purpose sentence embedding; natural language processing; NLP.
File-URL: http://www.inderscience.com/link.php?id=123356
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:2:p:149-182
Template-Type: ReDIF-Article 1.0
Author-Name: Kei Nakagawa
Author-X-Name-First: Kei
Author-X-Name-Last: Nakagawa
Author-Name: Kenichi Yoshida
Author-X-Name-First: Kenichi
Author-X-Name-Last: Yoshida
Title: Time-series gradient boosting tree for stock price prediction
Abstract:
We propose a time-series gradient boosting tree for a dataset with time-series and cross-sectional attributes. Our time-series gradient boosting tree has weak learners with time-series and cross-sectional attributes in their internal nodes, and splits examples based on dissimilarity between a pair of time-series or impurity between cross-sectional attributes. Dissimilarity between a pair of time-series is defined by the dynamic time warping method. In other words, the decision tree is constructed based on whether the shape of a time-series is similar to its past shape. We conducted an empirical analysis using major world indices and confirmed that our time-series gradient boosting tree is superior to prior research methods in terms of both profitability and accuracy.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 110-125
Issue: 2
Volume: 14
Year: 2022
Keywords: dynamic time warping method; time-series decision tree; time-series gradient boosting tree; stock price prediction.
File-URL: http://www.inderscience.com/link.php?id=123357
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:2:p:110-125
Template-Type: ReDIF-Article 1.0
Author-Name: Kenan Mengüç
Author-X-Name-First: Kenan
Author-X-Name-Last: Mengüç
Author-Name: Tarık Küçükdeniz
Author-X-Name-First: Tarık
Author-X-Name-Last: Küçükdeniz
Title: Suggestion and solution of a mathematical model for determining effective routes in football
Abstract:
As obtaining data gets easier and cheaper with the help of technological achievements, data-based analytics and management have become an essential part of planning and decision making to achieve success in the sports industry. The study finds offensive routes for a team game using high-security data produced with technology. An analysis of a sports team's match was performed using seasonal data. A mathematical model has been developed for this analysis, and the effectiveness of the routes the model offers is discussed. This article aims to find a safe, efficient route for moving the football on the field. In addition, the study also offers an experimental proposal for this purpose.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 126-148
Issue: 2
Volume: 14
Year: 2022
Keywords: match strategy; tactics; optimisation; transshipment problem.
File-URL: http://www.inderscience.com/link.php?id=123358
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:2:p:126-148
Template-Type: ReDIF-Article 1.0
Author-Name: Germán Lescano
Author-X-Name-First: Germán
Author-X-Name-Last: Lescano
Author-Name: Rosanna Costaguta
Author-X-Name-First: Rosanna
Author-X-Name-Last: Costaguta
Author-Name: Analía Amandi
Author-X-Name-First: Analía
Author-X-Name-Last: Amandi
Title: Emotions recognition in synchronic textual CSCL situations
Abstract:
Computer-supported collaborative learning (CSCL) is a useful practice for teaching learners to work in groups and to acquire collaborative skills. Evaluating the collaborative process can be burdensome for teachers because it involves analysing many interactions. One issue to consider is socio-affective interactions, due to their influence on the learning process. In this work, we propose an approach to recognise affective states of Spanish-speaking students in synchronic textual CSCL situations. Through experimentation, we analyse emotions manifested by university computer science students when they worked in groups in these situations, and we evaluate the proposed approach using available tools and libraries for sentiment analysis. The results obtained are promising. Providing CSCL environments with a tool to recognise socio-affective interactions can help teachers evaluate this dimension of the collaborative process.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 183-202
Issue: 2
Volume: 14
Year: 2022
Keywords: computer-supported collaborative learning; CSCL; socio-affective interactions; affective computing.
File-URL: http://www.inderscience.com/link.php?id=123359
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:2:p:183-202
Template-Type: ReDIF-Article 1.0
Author-Name: Akash Gupta
Author-X-Name-First: Akash
Author-X-Name-Last: Gupta
Author-Name: Amir Gharehgozli
Author-X-Name-First: Amir
Author-X-Name-Last: Gharehgozli
Title: Developing a machine learning framework to determine the spread of COVID-19 in the USA using meteorological, social, and demographic factors
Abstract:
Coronavirus disease of 2019 (COVID-19) became a pandemic in a matter of a few months after the outbreak in December 2019 in Wuhan, China. We study the impact of weather factors including temperature and pollution on the spread of COVID-19. We also include social and demographic variables such as per capita gross domestic product (GDP) and population density. Adapting theory from the field of epidemiology, we develop a framework to build analytical models to predict the spread of COVID-19. In the proposed framework, we employ machine learning methods including linear regression, linear kernel support vector machine (SVM), radial kernel SVM, polynomial kernel SVM, and decision tree. Given the nonlinear nature of the problem, the radial kernel SVM performs the best and explains 95% more variation than the existing methods. In line with the literature, our study indicates that population density is the critical factor in determining the spread. The univariate analysis shows that higher temperature, air pollution, and population density can increase the spread. On the other hand, a higher per capita GDP can decrease the spread.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 89-109
Issue: 2
Volume: 14
Year: 2022
Keywords: COVID-19; disease spread; social and demographic factors; machine learning; epidemiology; predictive modelling.
File-URL: http://www.inderscience.com/link.php?id=123360
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:14:y:2022:i:2:p:89-109