Template-Type: ReDIF-Article 1.0 Author-Name: Sachin Deshmukh Author-X-Name-First: Sachin Author-X-Name-Last: Deshmukh Author-Name: Seema Sant Author-X-Name-First: Seema Author-X-Name-Last: Sant Author-Name: Neerja Kashive Author-X-Name-First: Neerja Author-X-Name-Last: Kashive Title: Modelling attrition to know why your employees leave or stay Abstract: Today's environmental factors influence every aspect of business, be it marketing, finance, operations or human resources. Talent shortage has become a global issue for organisations. One of the major challenges faced by any organisation is the increase in the level of employee attrition. The current study builds a predictive model using logistic regression to understand the specific factors that lead to attrition. This paper also compares the factors responsible for attrition in two time periods, the first from 1996 to 2008 (Holtom's model) and the second from 2009 to 2016, to find whether any changes have taken place in employees' expectations, which, if not fulfilled, may lead to attrition. An analysis of an IT organisation's data reveals that the factors responsible for attrition in the second period have changed compared to the first period. Journal: Int. J. of Data Mining, Modelling and Management Pages: 231-253 Issue: 3 Volume: 13 Year: 2021 Keywords: attrition; predictive model; logistic regression. File-URL: http://www.inderscience.com/link.php?id=118018 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:231-253 Template-Type: ReDIF-Article 1.0 Author-Name: Chia-Hao Chiu Author-X-Name-First: Chia-Hao Author-X-Name-Last: Chiu Author-Name: Yun-Cheng Tsai Author-X-Name-First: Yun-Cheng Author-X-Name-Last: Tsai Author-Name: Ho-Lin Chen Author-X-Name-First: Ho-Lin Author-X-Name-Last: Chen Title: Long text to image converter for financial reports Abstract: In this study, we propose a novel article analysis method. This method converts the article classification problem into an image classification problem by projecting texts into images and then applying CNN models for classification. We call the method the long text to image converter (LTIC). The features are extracted automatically from the generated images, hence there is no need for an explicit step of embedding the words or characters into numeric vector representations, which saves preprocessing and experimentation time. This study uses the financial domain as an example. In a company's financial report, there is a chapter that describes the company's financial trends. The content has many financial terms used to infer the company's current and future financial position. The LTIC achieved an excellent confusion matrix and test data accuracy, with results indicating an 80% accuracy rate. The proposed LTIC produced excellent results during practical application, achieving excellent performance in classifying the corporate financial reports under review. The return on simulated investment is 46%. In addition to tangible returns, the LTIC method reduces the time required for article analysis and can provide article classification references in a short period to facilitate the decisions of the researcher. Journal: Int. J. of Data Mining, Modelling and Management Pages: 211-230 Issue: 3 Volume: 13 Year: 2021 Keywords: article analysis; convolutional neural network; CNN; financial analysis; long text to image converter; LTIC. 
File-URL: http://www.inderscience.com/link.php?id=118019 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:211-230 Template-Type: ReDIF-Article 1.0 Author-Name: Maira Alejandra Pulgarín Rodríguez Author-X-Name-First: Maira Alejandra Pulgarín Author-X-Name-Last: Rodríguez Author-Name: Bárbara Maricely Fierro Chong Author-X-Name-First: Bárbara Maricely Fierro Author-X-Name-Last: Chong Author-Name: Erica María Ossa Taborda Author-X-Name-First: Erica María Ossa Author-X-Name-Last: Taborda Title: E-learning process through text mining for academic literacy Abstract: The aim of this paper is to present the results of research carried out in a virtual faculty of education at a private university in Colombia. It consists of the characterisation of students' reading and writing comprehension abilities for academic literacy. The study verifies the effectiveness of an e-learning platform implemented for all the programs in the faculty. The university established a methodological procedure for text mining in order to identify specific keywords in different text typologies for specialised areas. The platform allows professors and students to develop expertise in their disciplines, using text mining as an interdisciplinary strategy to build knowledge and improve quality in their professional context. Journal: Int. J. of Data Mining, Modelling and Management Pages: 283-298 Issue: 3 Volume: 13 Year: 2021 Keywords: text mining; terminological work; cognitive processes; e-learning; academic literacy; reading comprehension; academic writing. File-URL: http://www.inderscience.com/link.php?id=118020 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:283-298 Template-Type: ReDIF-Article 1.0 Author-Name: Muning Chang Author-X-Name-First: Muning Author-X-Name-Last: Chang Title: Association rules in mobile game operation Abstract: Mobile games now play a significant role in the gaming industry as the internet continues to develop. Due to the economic and cultural value of mobile games, it is very important for gaming companies to maintain and further improve product quality to remain competitive in the industry. The operation team plays a key role in maintaining product profitability after a game is released. This paper analyses the gaming data collected during operation and proposes operation strategies accordingly. A correlation coefficient algorithm suitable for time sequences is proposed, in which association is defined by the similarity between data. The level of association between two time sequences is reflected in the probability of the occurrence of such association. Based on this discovery, we analyse a popular mobile game in depth to explore the correlation between the number of users online, the number of new players, and the retention rate. The study found that there are two fatigue periods, at approximately days 30 and 120, when there is a high likelihood of user loss, which is important to consider in strategic planning for game operation. Journal: Int. J. of Data Mining, Modelling and Management Pages: 254-267 Issue: 3 Volume: 13 Year: 2021 Keywords: mobile games; association rules; sequence correlation; operation optimisation. File-URL: http://www.inderscience.com/link.php?id=118023 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:254-267 Template-Type: ReDIF-Article 1.0 Author-Name: Paravee Maneejuk Author-X-Name-First: Paravee Author-X-Name-Last: Maneejuk Author-Name: Chalerm Jaitang Author-X-Name-First: Chalerm Author-X-Name-Last: Jaitang Author-Name: Woraphon Yamaka Author-X-Name-First: Woraphon Author-X-Name-Last: Yamaka Title: A multivariate copula-based SUR probit model: application to insolvency probability of enterprises Abstract: The purpose of this study is to introduce a more flexible joint distribution for a probit model with more than two equations, the so-called SUR probit model. The main idea of the suggested method is to use a multivariate copula to link the errors of the equations in the SUR probit model. We conduct a simulation study to assess the performance of the model and then apply it to a real economic problem, namely the insolvency probability of small and medium enterprises in Thailand. This study considers three economic sectors and speculates on the dependencies among them. The copula-based SUR probit model shows better performance in both the simulation and application studies. In addition, it is found to be suitable for explaining the causal effect of companies' financial statements on their insolvency probability, and challenging results for Thai enterprises are brought out. Journal: Int. J. of Data Mining, Modelling and Management Pages: 268-282 Issue: 3 Volume: 13 Year: 2021 Keywords: multivariate copula; multivariate probit model; small and medium enterprises; financial statements; insolvency probability. File-URL: http://www.inderscience.com/link.php?id=118025 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:268-282 Template-Type: ReDIF-Article 1.0 Author-Name: Woraphon Yamaka Author-X-Name-First: Woraphon Author-X-Name-Last: Yamaka Author-Name: Pichayakone Rakpho Author-X-Name-First: Pichayakone Author-X-Name-Last: Rakpho Author-Name: Paravee Maneejuk Author-X-Name-First: Paravee Author-X-Name-Last: Maneejuk Title: Hedging agriculture commodities futures with histogram data: a Markov switching volatility and correlation model Abstract: In this study, a bivariate flexible Markov switching dynamic copula GARCH model is developed for histogram-valued data to calculate the optimal portfolio weight and optimal hedge ratio. This model extends the Markov switching dynamic copula GARCH model by allowing all estimated parameters to be regime dependent. The histogram data is constructed from five-minute wheat spot and futures returns. We compare our proposed model with other bivariate GARCH models through AIC, BIC, and hedge effectiveness. The empirical results show that our model is slightly better than the conventional methods in terms of the lowest AIC and BIC, and the highest hedge effectiveness. This indicates that our proposed model is quite effective in reducing risks in portfolio returns. Journal: Int. J. of Data Mining, Modelling and Management Pages: 299-315 Issue: 3 Volume: 13 Year: 2021 Keywords: hedging strategy; Markov switching; time-varying dependence; histogram data; wheat. File-URL: http://www.inderscience.com/link.php?id=118026 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:3:p:299-315 Template-Type: ReDIF-Article 1.0 Author-Name: Lamiche Chaabane Author-X-Name-First: Lamiche Author-X-Name-Last: Chaabane Title: An enhanced cooperative method to solve multiple-sequence alignment problem Abstract: In this research study, we propose a novel cooperative approach called dynamic simulated particle swarm optimisation (DSPSO), based on metaheuristics and the pairwise dynamic programming (DP) procedure, to find an approximate solution to the multiple-sequence alignment (MSA) problem. The developed approach applies the particle swarm optimisation (PSO) algorithm to explore the search space globally and the simulated annealing (SA) technique to improve the quality of the population leader in order to overcome the local optimum problem. After that, the dynamic programming technique is integrated as an improvement mechanism to raise the quality of the worst solution and to increase the convergence speed of the proposed approach. Simulation results on BAliBASE benchmarks have shown the ability of the proposed method to produce good-quality alignments compared to those given by other existing methods in the literature. Journal: Int. J. of Data Mining, Modelling and Management Pages: 1-16 Issue: 1/2 Volume: 13 Year: 2021 Keywords: cooperative approach; multiple-sequence alignment; MSA; DSPSO; particle swarm optimisation; PSO; SA; DP; BAliBASE benchmarks. File-URL: http://www.inderscience.com/link.php?id=112907 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:1-16 Template-Type: ReDIF-Article 1.0 Author-Name: Ismaïl Biskri Author-X-Name-First: Ismaïl Author-X-Name-Last: Biskri Author-Name: Mohamed Hassani Author-X-Name-First: Mohamed Author-X-Name-Last: Hassani Title: A formal theoretical framework for a flexible classification process Abstract: The classification process is a complex technique that connects language, text, information and knowledge theories with computational formalisation, statistical and symbolic approaches, standard and non-standard logics, etc. This process should always be under the control of the user, according to their subjectivity, their knowledge and the purpose of their analysis. It therefore becomes important to create platforms that support the design of classification tools, their management, and their adaptation to new needs and experiments. In recent years, several platforms for mining data, including textual data, in which classification is the main functionality have emerged. However, they lack flexibility and formal foundations. We propose in this paper a formal model with strong logical foundations based on applicative type systems. Journal: Int. J. of Data Mining, Modelling and Management Pages: 17-36 Issue: 1/2 Volume: 13 Year: 2021 Keywords: classification; flexibility; applicative systems; operators/operands; combinatory logics; inferential calculus; compositionality; processing chains; modules; discovery process; collaborative intelligent science. File-URL: http://www.inderscience.com/link.php?id=112908 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:17-36 Template-Type: ReDIF-Article 1.0 Author-Name: Abdelkrime Aries Author-X-Name-First: Abdelkrime Author-X-Name-Last: Aries Author-Name: Djamel Eddine Zegour Author-X-Name-First: Djamel Eddine Author-X-Name-Last: Zegour Author-Name: Walid Khaled Hidouci Author-X-Name-First: Walid Khaled Author-X-Name-Last: Hidouci Title: Graph-based cumulative score using statistical features for multilingual automatic text summarisation Abstract: Multilingual summarisation has received more attention in recent years. Many approaches can be used to achieve this, among them statistical and graph-based approaches. Our idea is to combine these two approaches into a new extractive text summarisation method. Surface statistical features are used to calculate a primary score for each sentence. The graph is used to select candidate sentences and to calculate a final score for each sentence based on its primary score and those of its neighbours in the graph. We propose four variants for calculating the cumulative score of a sentence. Also, since the order of sentences is an important aspect of summary readability, we propose further algorithms to generate the summary based not only on final scores but also on sentence connections in the graph. The method is tested using the MultiLing'15 workshop's MSS corpus and the ROUGE metric. It is evaluated against some well-known methods and gives promising results. Journal: Int. J. of Data Mining, Modelling and Management Pages: 37-64 Issue: 1/2 Volume: 13 Year: 2021 Keywords: automatic text summarisation; ATS; graph-based summarisation; statistical features; multilingual summarisation; extractive summarisation. File-URL: http://www.inderscience.com/link.php?id=112909 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:37-64 Template-Type: ReDIF-Article 1.0 Author-Name: Tayeb Kenaza Author-X-Name-First: Tayeb Author-X-Name-Last: Kenaza Title: An ontology-based modelling and reasoning for alerts correlation Abstract: SIEM is a modern and powerful security tool thanks to the several functions it provides to take advantage of collected data, such as normalisation and aggregation. The most important function is event correlation, through which security operators can get a quick and precise picture of threats and attacks in real time. The quality of that picture depends on the efficiency of the adopted reasoning approach to put together pieces of information provided by several analysers. In this paper, we propose a semantic approach based on description logics (DLs), a powerful tool for knowledge representation and reasoning. Indeed, an ontology provides a comprehensive environment to represent information for intrusion detection and allows information to be easily maintained or extended. We implemented a rule-based engine for alert correlation based on the proposed ontology, and two attack scenarios are carried out to show the usefulness of our approach. Journal: Int. J. of Data Mining, Modelling and Management Pages: 65-80 Issue: 1/2 Volume: 13 Year: 2021 Keywords: information security; intrusion detection; security information and event management system; SIEM; alert correlation; rules-based reasoning; ontology; ontology web language; OWL; Semantic Web Rule Language; SWRL. File-URL: http://www.inderscience.com/link.php?id=112913 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:65-80 Template-Type: ReDIF-Article 1.0 Author-Name: Meriem Bahi Author-X-Name-First: Meriem Author-X-Name-Last: Bahi Author-Name: Mohamed Batouche Author-X-Name-First: Mohamed Author-X-Name-Last: Batouche Title: Convolutional neural network with stacked autoencoders for predicting drug-target interaction and binding affinity Abstract: The prediction of novel drug-target interactions (DTIs) is critically important for drug repositioning, as it can lead researchers to find new indications for existing drugs and reduce the cost and time of the de novo drug development process. To explore new ways for this innovation, we propose two novel methods, named SCA-DTIs and SCA-DTA, to predict drug-target interactions and drug-target binding affinities (DTAs), respectively, based on a convolutional neural network (CNN) with stacked autoencoders (SAE). Initialising a CNN's weights with filters of trained stacked autoencoders yields superior performance. Moreover, to boost the performance of DTI prediction, we propose a new method called RNDTIs to generate reliable negative samples. Tests on different benchmark datasets show that the proposed method can achieve an excellent prediction performance with an accuracy of more than 99%. These results demonstrate the potential of the proposed models for DTI and DTA prediction, thereby improving the drug repurposing process. Journal: Int. J. of Data Mining, Modelling and Management Pages: 81-113 Issue: 1/2 Volume: 13 Year: 2021 Keywords: stacked autoencoders; SAE; convolutional neural network; CNN; semi-supervised learning; deep learning; drug repositioning; drug-target interaction; DTI; binding affinity. File-URL: http://www.inderscience.com/link.php?id=112914 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:81-113 Template-Type: ReDIF-Article 1.0 Author-Name: Mostefa Zafer Author-X-Name-First: Mostefa Author-X-Name-Last: Zafer Author-Name: Mustapha Reda Senouci Author-X-Name-First: Mustapha Reda Author-X-Name-Last: Senouci Author-Name: Mohamed Aissani Author-X-Name-First: Mohamed Author-X-Name-Last: Aissani Title: Efficient deployment approach of wireless sensor networks on 3D terrains Abstract: Ensuring the coverage of a region of interest (RoI) when deploying a wireless sensor network (WSN) is an objective that depends on several factors, such as the detection capability of the sensor nodes used and the topography of the RoI. To address the topography challenges, in this paper, we propose a new WSN deployment approach based on the idea of partitioning the RoI into sub-regions with relatively simple topography, then allocating to each constructed sub-region the necessary number of sensor nodes and finding their appropriate positions to maximise coverage quality. The performance evaluation of this approach, coupled with three different deployment methods, namely the deployment method based on simulated annealing (DMSA), the greedy deployment method (GDM), and the random deployment method (RDM), has revealed its relevance, since it helped to significantly improve the coverage quality of the RoI. Journal: Int. J. of Data Mining, Modelling and Management Pages: 114-136 Issue: 1/2 Volume: 13 Year: 2021 Keywords: wireless sensor networks; WSNs; 3D terrains; deployment; coverage. File-URL: http://www.inderscience.com/link.php?id=112915 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:114-136 Template-Type: ReDIF-Article 1.0 Author-Name: Lamia Berkani Author-X-Name-First: Lamia Author-X-Name-Last: Berkani Title: Recommendation of items using a social-based collaborative filtering approach and classification techniques Abstract: With the large amount of data generated every day in social networks, the use of classification techniques becomes a necessity. Clustering-based approaches reduce the search space by clustering similar users or items together. We focus in this paper on personalised item recommendation in a social context. Our approach combines, in different ways, the social filtering algorithm and the traditional user-based collaborative filtering algorithm. The social information is formalised by social-behaviour metrics such as the friendship, commitment and trust degrees of users. Moreover, two classification techniques are used: an unsupervised technique applied initially to all users and a supervised technique applied to newly added users. Finally, the proposed approach has been evaluated using different existing datasets. The obtained results show the contribution of integrating social information into collaborative filtering and the added value of using classification techniques in the different algorithms in terms of recommendation accuracy. Journal: Int. J. of Data Mining, Modelling and Management Pages: 137-159 Issue: 1/2 Volume: 13 Year: 2021 Keywords: item recommendation; collaborative filtering; social filtering; supervised classification; unsupervised classification. File-URL: http://www.inderscience.com/link.php?id=112919 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:137-159 Template-Type: ReDIF-Article 1.0 Author-Name: Karima Sid Author-X-Name-First: Karima Author-X-Name-Last: Sid Author-Name: Mohamed Batouche Author-X-Name-First: Mohamed Author-X-Name-Last: Batouche Title: Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening Abstract: Virtual screening is one of the most common computer-aided drug design techniques; it applies computational tools and methods to large libraries of molecules to identify drug candidates. Ensemble learning is a recent paradigm introduced to improve machine learning results in terms of predictive performance and robustness. It has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning to huge molecular libraries is computationally expensive, so the distribution and parallelisation of the task, using sophisticated frameworks such as Apache Spark, have become a significant step. In this paper, we propose a new approach, HEnsL_DLBVS, for heterogeneous ensemble learning distributed on Spark to improve large-scale LBVS results. To handle the problem of imbalanced big training datasets, we propose a novel hybrid technique. We generate new training datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach, with satisfactory accuracy and superiority over homogeneous models. Journal: Int. J. of Data Mining, Modelling and Management Pages: 160-191 Issue: 1/2 Volume: 13 Year: 2021 Keywords: virtual screening; big data; computer-aided drug design; CADD; Apache Spark; machine learning; drug discovery; ensemble learning; imbalanced datasets; Spark MLlib; ligand-based virtual screening; LBVS. File-URL: http://www.inderscience.com/link.php?id=112920 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:160-191 Template-Type: ReDIF-Article 1.0 Author-Name: Noussaiba Benadjimi Author-X-Name-First: Noussaiba Author-X-Name-Last: Benadjimi Author-Name: Khaled-Walid Hidouci Author-X-Name-First: Khaled-Walid Author-X-Name-Last: Hidouci Title: Hash-processing of universal quantification-like queries dealing with requirements and prohibitions Abstract: This paper focuses on flexible universal quantification-like queries handling positive and negative preferences (requirements or prohibitions) simultaneously. We emphasise the performance improvement of the considered operator by proposing new variants of the classical hash-division algorithm. The issue of ranking answers is also dealt with. We target in-memory database systems (also called main-memory database systems) with a very large volume of data; in these systems, all the data is primarily stored in the RAM of a computer. We have introduced a parallel implementation of the operator that takes into account the data skew issue. Our empirical analysis of both the sequential and parallel versions shows the relevance of our approach. The experiments demonstrate that the new processing of the mixed operator in a main-memory database achieves better performance than the conventional approaches, and becomes faster through parallelism. Journal: Int. J. of Data Mining, Modelling and Management Pages: 192-210 Issue: 1/2 Volume: 13 Year: 2021 Keywords: universal quantification queries; relational division; relational anti-division; main-memory databases; flexible division; hash-division. File-URL: http://www.inderscience.com/link.php?id=112921 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:1/2:p:192-210 Template-Type: ReDIF-Article 1.0 Author-Name: Seyyed Mohammad Mirtaghian Rudsari Author-X-Name-First: Seyyed Mohammad Mirtaghian Author-X-Name-Last: Rudsari Author-Name: Naji Gharibi Author-X-Name-First: Naji Author-X-Name-Last: Gharibi Title: Application of structural equation modelling in Iranian tourism researches: challenges and guidelines Abstract: The main purpose of this study is to identify and analyse the challenges in using structural equation modelling (SEM) in tourism research in Iran. The paper examines how Iranian scholars have used the technique, drawing on a sample of 172 papers published in the top five tourism journals published in Farsi (i.e., Persian). The results indicate that there is often a lack of discussion of sample size, normality of distribution, effect analysis, and the role of coefficients of determination; in addition, selective and arbitrary reporting of fit indices is not uncommon. The paper also emphasises the role of theory in constructing such models. Journal: Int. J. of Data Mining, Modelling and Management Pages: 364-387 Issue: 4 Volume: 13 Year: 2021 Keywords: structural equation modelling; SEM; covariance-based SEM; partial least squares; challenges and misuse; Iranian tourism research. File-URL: http://www.inderscience.com/link.php?id=119627 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:4:p:364-387 Template-Type: ReDIF-Article 1.0 Author-Name: Konstantin Savenkov Author-X-Name-First: Konstantin Author-X-Name-Last: Savenkov Author-Name: Vladimir Gorbachenko Author-X-Name-First: Vladimir Author-X-Name-Last: Gorbachenko Author-Name: Anatoly Solomakha Author-X-Name-First: Anatoly Author-X-Name-Last: Solomakha Title: New perspectives on deep neural networks in decision support in surgery Abstract: The paper considers the development of a neural network system for predicting complications after acute appendicitis operations. A neural network of deep architecture has been developed. As the training set, a dataset developed by the authors from real clinical data was used. A feature selection method based on the interquartile range of the F1-score is proposed to select significant features. For preliminary processing of the training data, an overcomplete autoencoder is used. The overcomplete autoencoder converts the selected features into a higher-dimensional space, which, according to Cover's theorem, facilitates separating the features corresponding to a complication from those that do not. To overcome overfitting of the network, the dropout method was used. The neural network is implemented using the Keras and TensorFlow libraries. The trained neural network showed high diagnostic metrics on the test dataset. Journal: Int. J. of Data Mining, Modelling and Management Pages: 317-336 Issue: 4 Volume: 13 Year: 2021 Keywords: neural networks; features selection; learning neural networks; overfitting; overcomplete autoencoder; medical diagnostics. File-URL: http://www.inderscience.com/link.php?id=119628 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:4:p:317-336 Template-Type: ReDIF-Article 1.0 Author-Name: Satish M. Srinivasan Author-X-Name-First: Satish M. 
Author-X-Name-Last: Srinivasan Author-Name: Ruchika Chari Author-X-Name-First: Ruchika Author-X-Name-Last: Chari Author-Name: Abhishek Tripathi Author-X-Name-First: Abhishek Author-X-Name-Last: Tripathi Title: Modelling and visualising emotions in Twitter feeds Abstract: Predictive analytics on Twitter feeds is becoming a popular field for research. A tweet holds a wealth of information on how individuals express and communicate their feelings and emotions within their social network. Large-scale mining of tweets helps capture not only an individual's emotions but also those of a larger group. In this study, an emotion-based classification scheme is proposed. By training the naïve Bayes multinomial and random forest classifiers on different training datasets, emotion classification was performed on a test dataset containing tweets related to the 2016 US presidential election. By classifying the tweets in the test dataset into one of the four basic emotion types (anger, happiness, sadness and surprise) and determining people's sentiments, we have tried to portray the flux in the emotional landscape of the people towards the presidential candidates in the 2016 US election. Journal: Int. J. of Data Mining, Modelling and Management Pages: 337-350 Issue: 4 Volume: 13 Year: 2021 Keywords: emotion classification; Twitter data analysis; US presidential election; supervised classifier; random forest; naïve Bayes multinomial; NBM. File-URL: http://www.inderscience.com/link.php?id=119629 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:4:p:337-350 Template-Type: ReDIF-Article 1.0 Author-Name: Po-Jen Chuang Author-X-Name-First: Po-Jen Author-X-Name-Last: Chuang Author-Name: Yun-Sheng Tu Author-X-Name-First: Yun-Sheng Author-X-Name-Last: Tu Title: Pursuing efficient data stream mining by removing long patterns from summaries Abstract: Frequent pattern mining is a useful data mining technique. It can help extract frequently used patterns from massive internet data streams for significant applications and analyses. To improve mining accuracy and reduce the needed processing time, this paper proposes a new approach that removes less-used long patterns from the pattern summary to preserve space for more frequently used short patterns, in order to enhance the performance of existing frequent pattern mining algorithms. Extensive simulation runs are carried out to evaluate the performance of the proposed approach. The results show that our approach can strengthen mining performance by effectively bringing down the required run time and substantially increasing mining accuracy. Journal: Int. J. of Data Mining, Modelling and Management Pages: 388-409 Issue: 4 Volume: 13 Year: 2021 Keywords: data streams; frequent pattern mining; pattern summary; length skip; performance evaluation. File-URL: http://www.inderscience.com/link.php?id=119630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:4:p:388-409 Template-Type: ReDIF-Article 1.0 Author-Name: Nourelhouda Yahi Author-X-Name-First: Nourelhouda Author-X-Name-Last: Yahi Author-Name: Hacene Belhadef Author-X-Name-First: Hacene Author-X-Name-Last: Belhadef Author-Name: Mathieu Roche Author-X-Name-First: Mathieu Author-X-Name-Last: Roche Title: Investigating the impact of preprocessing on document embedding: an empirical comparison Abstract: Digital representation of text documents is a crucial task in machine learning and natural language processing (NLP). It aims to transform unstructured text documents into mathematically-computable elements. In recent years, several methods have been proposed and implemented to encode text documents into fixed-length feature vectors. This operation is known as document embedding and has become an interesting and open area of research. Paragraph vector (Doc2vec) is one of the most widely used document embedding methods and has gained a good reputation thanks to its good results. To overcome its limits, Doc2vec was extended by the document through corruption (Doc2vecC) technique. To gain a deeper view of these two methods, this work presents a study of the impact of morphosyntactic text preprocessing on both document embedding methods. We have done this analysis by applying the most-used text preprocessing techniques, such as cleaning, stemming and lemmatisation, and their different combinations. The experimental analysis on the Microsoft Research Paraphrase dataset (MSRP) reveals that the preprocessing techniques serve to improve classifier accuracy, and that the stemming method outperforms the other techniques. Journal: Int. J. of Data Mining, Modelling and Management Pages: 351-363 Issue: 4 Volume: 13 Year: 2021 Keywords: natural language preprocessing; document embedding; paragraph vector; document through corruption; text preprocessing; semantic similarity. 
File-URL: http://www.inderscience.com/link.php?id=119631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:13:y:2021:i:4:p:351-363