Template-Type: ReDIF-Article 1.0
Author-Name: Renata Dantas
Author-X-Name-First: Renata
Author-X-Name-Last: Dantas
Author-Name: Jamilson Dantas
Author-X-Name-First: Jamilson
Author-X-Name-Last: Dantas
Author-Name: Gabriel Alves
Author-X-Name-First: Gabriel
Author-X-Name-Last: Alves
Author-Name: Paulo Maciel
Author-X-Name-First: Paulo
Author-X-Name-Last: Maciel
Title: Analysis of a performability model for the BRT system
Abstract:
Large cities have increasing mobility problems due to the large number of vehicles on the streets, which results in traffic jams and an eventual waste of time and resources. An alternative to improve traffic is to prioritise the public transportation system. Several metropolises around the world are adopting bus rapid transit (BRT) systems, since they present compelling results from a cost-benefit perspective. Evaluating metrics such as performance, reliability, and performability aids in the planning, monitoring, and optimising of BRT systems. This paper presents hierarchical models, using CTMC modelling techniques, to assess metrics such as performance and performability. The results show that these models point to the peak intervals during which a vehicle is more likely to arrive at the destination in a shorter time, in addition to showing the probability of the vehicle being affected by a failure at each interval. It was also possible to establish a basis for replicating the model in different scenarios to enable new comparative studies.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 64-86
Issue: 1
Volume: 11
Year: 2019
Keywords: bus rapid transit; BRT; CTMC; performability analysis.
File-URL: http://www.inderscience.com/link.php?id=96530
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:64-86
Template-Type: ReDIF-Article 1.0
Author-Name: Aylin Caliskan
Author-X-Name-First: Aylin
Author-X-Name-Last: Caliskan
Author-Name: Burcu Karaöz
Author-X-Name-First: Burcu
Author-X-Name-Last: Karaöz
Title: Can market indicators forecast the port throughput?
Abstract:
The main aim of this study is to forecast the likelihood of port throughput increasing or decreasing from month to month, with determined market indicators as input variables. An additional aim is to determine whether artificial neural network (ANN) and support vector machine (SVM) algorithms are capable of accurately predicting the movement of port throughput. To this end, Turkish ports were chosen as the research environment. The monthly average exchange rates of the US dollar, euro, and gold (against the Turkish lira), and crude oil prices were used as market indicators in the prediction models. The experimental results reveal that the model with specific market indicators successfully forecasts the direction of movement of port throughput, with an accuracy rate of 90.9% for ANN and 84.6% for SVM. The model developed in this research may help managers develop short-term logistics plans in operational processes and may help researchers adapt the model to other research areas.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 45-63
Issue: 1
Volume: 11
Year: 2019
Keywords: port throughput; predicting; forecasting in shipping; artificial neural network; ANN; support vector machine; SVM.
File-URL: http://www.inderscience.com/link.php?id=96532
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:45-63
Template-Type: ReDIF-Article 1.0
Author-Name: Las Johansen Balios Caluza
Author-X-Name-First: Las Johansen Balios
Author-X-Name-Last: Caluza
Title: Deciphering published articles on cyberterrorism: a latent Dirichlet allocation algorithm application
Abstract:
Cyberterrorism is an emerging and fatal problem causing disturbance in cyberspace. To unravel the underlying issues of cyberterrorism, it is imperative to look into the available documents found in NATO's repository. Articles were extracted using a web-mining technique, and topic modelling was performed using NLP. Moreover, this study employed the <i>latent Dirichlet allocation algorithm</i>, an unsupervised machine learning technique, to generate latent themes from the text corpus. Five underlying themes were identified from the results. Finally, the analysis revealed a profound understanding of cyberterrorism as a pragmatic menace of cyberspace, spread worldwide through black propaganda, recruitment, computer and network hacking, economic sabotage and other activities. As a result, countries around the world, including NATO and its allies, have continuously improved their capabilities against cyberterrorism.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 87-101
Issue: 1
Volume: 11
Year: 2019
Keywords: topic modelling; latent Dirichlet allocation; LDA; cyberterrorism; unsupervised machine learning; natural language processing; NLP; sequential exploratory design; Gibbs sampling; cyberspace; web mining.
File-URL: http://www.inderscience.com/link.php?id=96539
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:87-101
Template-Type: ReDIF-Article 1.0
Author-Name: Hima Suresh
Author-X-Name-First: Hima
Author-X-Name-Last: Suresh
Author-Name: Gladston Raj. S
Author-X-Name-First: Gladston Raj.
Author-X-Name-Last: S
Title: An innovative and efficient method for Twitter sentiment analysis
Abstract:
Research in sentiment analysis is one of the most accomplished fields in the data mining area. Specifically, sentiment analysis centres on analysing attitudes and opinions relating to a particular topic of interest using machine learning approaches, lexicon-based approaches or hybrid approaches. Users aim to develop an automated system that can identify and classify sentiments in the related text. An efficient approach to predicting sentiments would allow us to extract opinions from web content and to predict online public choices, which could prove valuable for analysing changes in the sentiment of Twitter users. This paper presents a proposed model to analyse brand impact using real data gathered from the microblog Twitter over a period of 14 months, and also reviews the existing methods and approaches in sentiment analysis. Twitter-based information gathering techniques enable collecting direct responses from the target audience and provide valuable insight into public sentiment when predicting opinions of a particular product. The experimental results show that the proposed method for Twitter sentiment analysis performs best, with an accuracy of 86.8%.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 1-18
Issue: 1
Volume: 11
Year: 2019
Keywords: sentiment analysis; machine learning approach; lexicon-based approach; supervised learning.
File-URL: http://www.inderscience.com/link.php?id=96543
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:1-18
Template-Type: ReDIF-Article 1.0
Author-Name: Hamed Sabahno
Author-X-Name-First: Hamed
Author-X-Name-Last: Sabahno
Author-Name: Seyed Meysam Mousavi
Author-X-Name-First: Seyed Meysam
Author-X-Name-Last: Mousavi
Author-Name: Amirhossein Amiri
Author-X-Name-First: Amirhossein
Author-X-Name-Last: Amiri
Title: A new development of an adaptive X − R control chart under a fuzzy environment
Abstract:
Adaptive control charts have been shown to perform better than classical control charts, owing to the adaptability of some or all of their parameters to previous process information. Fuzzy classical control charts have been considered by many researchers over the last two decades; however, fuzzy adaptive control charts have not been investigated. In this paper, we introduce a new adaptive <i><span style="text-decoration: overline">X</span></i> − <i>R</i> fuzzy control chart that allows all of the chart's parameters to adapt based on the process state in the previous sample. The warning limits are also redefined in the fuzzy environment. We utilise the fuzzy mode defuzzification technique to design the decision procedure of the proposed fuzzy adaptive control chart. Finally, an illustrative example is used to present the application of the proposed control chart.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 19-44
Issue: 1
Volume: 11
Year: 2019
Keywords: X − R control charts; adaptive control charts; fuzzy uncertainty; trapezoidal fuzzy numbers; TrFNs.
File-URL: http://www.inderscience.com/link.php?id=96547
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:19-44
Template-Type: ReDIF-Article 1.0
Author-Name: Razieh Davashi
Author-X-Name-First: Razieh
Author-X-Name-Last: Davashi
Author-Name: Mohammad-Hossein Nadimi-Shahraki
Author-X-Name-First: Mohammad-Hossein
Author-X-Name-Last: Nadimi-Shahraki
Title: EFP-tree: an efficient FP-tree for incremental mining of frequent patterns
Abstract:
Frequent pattern mining from dynamic databases with many incremental updates is a significant research issue in data mining. After incremental updates, the validity of the frequent patterns changes. A simple way to handle this is to rerun the mining algorithms from scratch, which is very costly. To solve this problem, researchers have introduced the incremental mining approach. In this article, an efficient FP-tree named EFP-tree is proposed for incremental mining of frequent patterns. For the original database, it is constructed like an FP-tree by using an auxiliary list, without any reconstruction. For incremental updates, the EFP-tree is reconstructed only once, thereby reducing the number of tree reconstructions, the number of reconstructed branches, and the search space. The experimental results show that using the EFP-tree reduces reconstructed branches and runtime in both static and incremental mining and enhances scalability compared to the well-known tree structures CanTree, CP-tree, SPO-tree and GM-tree on both dense and sparse datasets.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 144-166
Issue: 2
Volume: 11
Year: 2019
Keywords: data mining; dynamic databases; frequent pattern; incremental mining; FP-tree.
File-URL: http://www.inderscience.com/link.php?id=98958
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:144-166
Template-Type: ReDIF-Article 1.0
Author-Name: T. Subetha
Author-X-Name-First: T.
Author-X-Name-Last: Subetha
Author-Name: S. Chitrakala
Author-X-Name-First: S.
Author-X-Name-Last: Chitrakala
Title: Human activity recognition based on interaction modelling
Abstract:
Human activity recognition aims at automatically recognising and interpreting the activities of humans from videos. Among these activities, identifying the interactions between humans within minimal computation time and with a reduced misclassification rate is a cumbersome task. Hence, an interaction-based human activity recognition system is proposed in this paper that utilises silhouette features to identify and classify interactions between humans. The main issues that affect the performance of activity recognition are sudden illumination changes, detection of static humans, data discrimination, data variance, crowding, and computational complexity. To address these issues, three new algorithms named weight-based updating Gaussian mixture model (wu-GMM), spatial dissemination-based contour silhouettes (SDCS), and weighted constrained dynamic time warping (WCDTW) are proposed. Experiments conducted on a gaming dataset and the Kinect interaction dataset show that the proposed system recognises interactions with a reduced misclassification rate and minimal processing time compared to existing systems.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 167-188
Issue: 2
Volume: 11
Year: 2019
Keywords: human activity recognition; Gaussian mixture model; contour silhouettes; weight-based updating Gaussian mixture model; spatial dissemination-based contour silhouettes; weighted constrained dynamic time warping; dynamic time warping; stochastic neighbour embedding; t-stochastic neighbour embedding; reduced variance-t stochastic neighbour embedding.
File-URL: http://www.inderscience.com/link.php?id=98967
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:167-188
Template-Type: ReDIF-Article 1.0
Author-Name: Mohammad Daoud
Author-X-Name-First: Mohammad
Author-X-Name-Last: Daoud
Title: Using implicitly and explicitly rated online customer reviews to build opinionated Arabic lexicons
Abstract:
Creating an opinionated lexicon is an important step towards a reliable social media analysis system. In this article we propose an approach and describe an experiment to build an Arabic polarised lexical database by analysing implicitly and explicitly rated online customer reviews. These reviews are written in modern standard Arabic and the Palestinian/Jordanian dialect. Therefore, the produced lexicon contains casual slang and dialectal entries used by the online community, which is useful for sentiment analysis of informal social media micro-blogs. We extracted 28,000 entries by processing 15,100 reviews and by expanding the initial lexicon through Google Translate. We calculated an implicit rating for every review, derived from its text, to address the problem of ambiguous opinions in certain online posts, where the text of the review does not match the given rating (the explicit rating). Each entry was given a polarity tag and a confidence score. High confidence scores increased the precision of the polarisation process, while explicit ratings increased the coverage and confidence of polarity.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 189-203
Issue: 2
Volume: 11
Year: 2019
Keywords: polarised lexicon; social media analysis; opinion mining; term extraction.
File-URL: http://www.inderscience.com/link.php?id=98968
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:189-203
Template-Type: ReDIF-Article 1.0
Author-Name: Eftychios Protopapadakis
Author-X-Name-First: Eftychios
Author-X-Name-Last: Protopapadakis
Author-Name: Dimitrios Niklis
Author-X-Name-First: Dimitrios
Author-X-Name-Last: Niklis
Author-Name: Michalis Doumpos
Author-X-Name-First: Michalis
Author-X-Name-Last: Doumpos
Author-Name: Anastasios Doulamis
Author-X-Name-First: Anastasios
Author-X-Name-Last: Doulamis
Author-Name: Constantin Zopounidis
Author-X-Name-First: Constantin
Author-X-Name-Last: Zopounidis
Title: Sample selection algorithms for credit risk modelling through data mining techniques
Abstract:
Credit risk assessment is a very challenging and important problem in the domain of financial risk management. The development of reliable credit rating/scoring models is of paramount importance in this area. There are different algorithms and approaches for constructing such models to classify credit applicants (firms or individuals) into risk classes. Reliable sample selection is crucial for this task. The aim of this paper is to examine the effectiveness of sample selection schemes in combination with different classifiers for constructing reliable default prediction models. We consider different algorithms to select representative cases and handle class imbalances. Empirical results are reported for a dataset of Greek companies from the commercial sector.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 103-128
Issue: 2
Volume: 11
Year: 2019
Keywords: credit risk modelling; data mining; sampling; classification.
File-URL: http://www.inderscience.com/link.php?id=98969
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:103-128
Template-Type: ReDIF-Article 1.0
Author-Name: Carlos Roberto Silveira Junior
Author-X-Name-First: Carlos Roberto Silveira
Author-X-Name-Last: Junior
Author-Name: Marilde Terezinha Prado Santos
Author-X-Name-First: Marilde Terezinha Prado
Author-X-Name-Last: Santos
Author-Name: Marcela Xavier Ribeiro
Author-X-Name-First: Marcela Xavier
Author-X-Name-Last: Ribeiro
Title: A flexible architecture for the pre-processing of solar satellite image time series data - the SETL architecture
Abstract:
Satellite image time series (SITS) are a challenging domain for knowledge discovery in databases due to their characteristics: each image has several sunspots, and each sunspot is associated with sensor data composed of the radiation level and the sunspot classifications. Each image also carries time parameters and sunspot coordinates, i.e., spatiotemporal data. Several challenges of the SITS domain are faced during the extract, transform, and load (ETL) process. In this paper, we propose an architecture called SITS extract, transform, and load (SETL) that extracts the visual characteristics of each sunspot and associates them with the sunspot's sensor data, considering the spatiotemporal relations. SETL brings flexibility and extensibility to working with challenging domains such as SITS because it integrates textual, visual and spatiotemporal characteristics at the sunspot-record level. Furthermore, we obtained acceptable performance results according to a domain expert and increased the possibility of using different data mining algorithms compared to the state of the art.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 129-143
Issue: 2
Volume: 11
Year: 2019
Keywords: satellite image time series; SITS; spatiotemporal ETL process; solar STIS process.
File-URL: http://www.inderscience.com/link.php?id=98970
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:129-143
Template-Type: ReDIF-Article 1.0
Author-Name: Bartosz Zieliński
Author-X-Name-First: Bartosz
Author-X-Name-Last: Zieliński
Author-Name: Paweł Maślanka
Author-X-Name-First: Paweł
Author-X-Name-Last: Maślanka
Author-Name: Ścibor Sobieski
Author-X-Name-First: Ścibor
Author-X-Name-Last: Sobieski
Title: Allegories for database modelling
Abstract:
Allegories abstract and generalise (in the categorical framework) the algebra of binary relations. Arrows in an allegory enjoy many of the properties and much of the structure available for plain binary relations. At the same time, allegories are sufficiently general to allow the description, within the same uniform framework, of lattice-valued (e.g., fuzzy) relations and some more general structures. The paper presents a conceptual data modelling formalism which uses the language of allegories. We provide examples demonstrating the expressiveness of this formalism. While most of the examples are meant to be interpreted in the allegory of sets and binary relations, we also show the usefulness of other allegories, such as the allegory of sets and lattice-valued relations, with which one can model replicated data or data stored in a valid-time temporal database.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 209-234
Issue: 3
Volume: 11
Year: 2019
Keywords: categories; allegories; data modelling; conceptual modelling; fuzzy databases; relational model; relations; relation algebra; relational products; locales.
File-URL: http://www.inderscience.com/link.php?id=100384
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:209-234
Template-Type: ReDIF-Article 1.0
Author-Name: Prudhvi Janga
Author-X-Name-First: Prudhvi
Author-X-Name-Last: Janga
Author-Name: Karen C. Davis
Author-X-Name-First: Karen C.
Author-X-Name-Last: Davis
Title: A grammar-based approach for XML schema extraction and heterogeneous document integration
Abstract:
The availability of vast amounts of heterogeneous XML web data motivates finding efficient methods to search, integrate, query, and present this data. The structure of XML documents is useful for achieving these tasks; however, not every XML document on the web includes a schema. We discuss challenges and solutions in the area of generation and integration of XML schemas. We propose and implement a framework for efficient schema extraction and integration from heterogeneous XML document collections collected from the web. Our approach introduces the schema extended context-free grammar (SECFG) to model XML schemas, including detection of attributes, data types, and element occurrences. Unlike other implementations, our approach supports the generation of XML schemas in any XML schema language, e.g., DTD or XSD. We compare our approach with other proposed approaches and conclude that we offer the same or better functionality more efficiently and with greater flexibility. The approach we propose is flexible enough to facilitate integration of and translation to tabular (relational) data.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 235-258
Issue: 3
Volume: 11
Year: 2019
Keywords: XML schema; schema integration; schema extraction; schema discovery.
File-URL: http://www.inderscience.com/link.php?id=100385
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:235-258
Template-Type: ReDIF-Article 1.0
Author-Name: Kobamelo Moremedi
Author-X-Name-First: Kobamelo
Author-X-Name-Last: Moremedi
Author-Name: John Andrew Van Der Poll
Author-X-Name-First: John Andrew Van Der
Author-X-Name-Last: Poll
Title: Towards a comparative evaluation of text-based specification formalisms and diagrammatic notations
Abstract:
Specification plays a pivotal role in software engineering in facilitating the development of highly dependable software. Various techniques for specification work have been developed to provide precise and unambiguous specifications. Z is a formal specification language based on a strongly-typed fragment of Zermelo-Fraenkel set theory and first-order logic, providing for provably correct specifications. While diagrammatic specification languages may lack precision, their visual characteristics may make them an attractive option for advocates of semi-formal specification techniques. In this research, we investigate the extent to which diagrammatic notations may capture the essence of, e.g., a Z specification. Several diagrammatic notations are considered and combined for this purpose. A case study is employed towards the end to evaluate the utility of the diagrammatic notation developed in this article. Comparisons of the merits of a diagrammatic notation are presented to further determine its feasibility.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 259-283
Issue: 3
Volume: 11
Year: 2019
Keywords: case study; diagrammatic notation; formal specification; Euler diagrams; set theory; spider diagrams; Venn diagrams; Peirce diagrams; Z.
File-URL: http://www.inderscience.com/link.php?id=100386
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:259-283
Template-Type: ReDIF-Article 1.0
Author-Name: Alfredo Cuzzocrea
Author-X-Name-First: Alfredo
Author-X-Name-Last: Cuzzocrea
Author-Name: Giorgio Mario Grasso
Author-X-Name-First: Giorgio Mario
Author-X-Name-Last: Grasso
Author-Name: Massimiliano Nolich
Author-X-Name-First: Massimiliano
Author-X-Name-Last: Nolich
Title: Effective and efficient distributed management of big clinical data: a framework
Abstract:
Managing big data in distributed environments is a critical research challenge that has drawn attention from the community. In this context, there are several issues to be faced, including: 1) dealing with massive and heterogeneous data; 2) inconsistency problems; 3) query optimisation bottlenecks, and so forth. Clinical data represent a vibrant case of big data, due to both the practical and methodological challenges such data expose. Following these considerations, in this paper we present an architecture for the storage, exchange and use of health data for administrative and epidemiological purposes, which focuses on the patient, who can make safe and easy use of their data for therapeutic and research purposes. The proposed architecture would bring benefits both to patients, giving them the desired centrality in the care process, and to the health administration, which could exploit the same infrastructure to better address health policies.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 284-313
Issue: 3
Volume: 11
Year: 2019
Keywords: big data; healthcare management; distributed big data management.
File-URL: http://www.inderscience.com/link.php?id=100387
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:284-313
Template-Type: ReDIF-Article 1.0
Author-Name: Nilay Khare
Author-X-Name-First: Nilay
Author-X-Name-Last: Khare
Author-Name: Hema Dubey
Author-X-Name-First: Hema
Author-X-Name-Last: Dubey
Title: Fast parallel PageRank technique for detecting spam web pages
Abstract:
Brin and Page proposed PageRank in 1998, and it remains a prevailing link analysis technique used by web search engines to rank search results. Computing PageRank values efficiently and quickly for very large web graphs is an essential concern for search engines today. Identifying and dealing with spam web pages is another important concern in web search. In this research article, an efficient and fast parallel PageRank algorithm is proposed which harnesses the power of graphics processing units (GPUs). In the proposed algorithm, the PageRank scores are distributed non-uniformly among the web pages, so it is also capable of coping with spam web pages. The experiments are performed on standard datasets available in the Stanford large network dataset collection. The proposed parallel PageRank algorithm achieves a speedup of about 1.1 to 1.7 over the existing parallel PageRank algorithm.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 350-365
Issue: 4
Volume: 11
Year: 2019
Keywords: graphics processing unit; GPU; compute unified device architecture; CUDA; parallel PageRank technique; spam web pages.
File-URL: http://www.inderscience.com/link.php?id=102720
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:350-365
Template-Type: ReDIF-Article 1.0
Author-Name: Alireza Hekmatinia
Author-X-Name-First: Alireza
Author-X-Name-Last: Hekmatinia
Author-Name: Ali Mohammadi Shanghooshabad
Author-X-Name-First: Ali Mohammadi
Author-X-Name-Last: Shanghooshabad
Author-Name: Mohammad Mahdi Motevali
Author-X-Name-First: Mohammad Mahdi
Author-X-Name-Last: Motevali
Author-Name: Mehrdad Almasi
Author-X-Name-First: Mehrdad
Author-X-Name-Last: Almasi
Title: Tuning parameters via a new rapid, accurate and parameter-less method using meta-learning
Abstract:
Dealing with a large parameter space in data mining tasks is extremely time-consuming, and the tuning method itself needs to be tuned, since tuning methods have at least one parameter of their own. Here, a new rapid and parameter-less method is presented to tune algorithms on diverse datasets, achieving high-quality results in a short time. The method uses prior knowledge, in the form of meta-features, to guess a point closer to the optimal point in the parameter space of the target algorithm (here, the support vector machine algorithm is used). To prepare this prior knowledge, 282 meta-features are introduced, and a genetic algorithm is applied to determine the best meta-features for the target algorithm. The best meta-features are then used to tune the target algorithm on unseen datasets. The results show that, in less than 0.19 minutes on average, the method obtains approximately the same classification rates as other methods, while the consumed time declines dramatically.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 366-390
Issue: 4
Volume: 11
Year: 2019
Keywords: parameter tuning; meta learning; meta feature; SVM tuning; genetic algorithm.
File-URL: http://www.inderscience.com/link.php?id=102727
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:366-390
Template-Type: ReDIF-Article 1.0
Author-Name: Monalisha Ghosh
Author-X-Name-First: Monalisha
Author-X-Name-Last: Ghosh
Author-Name: Goutam Sanyal
Author-X-Name-First: Goutam
Author-X-Name-Last: Sanyal
Title: Analysing sentiments based on multi feature combination with supervised learning
Abstract:
Research on sentiment analysis is growing to a great extent and attracting wide attention from academia and industry alike. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper exhibits the efficacy of the proposed combined feature selection technique on machine learning classification algorithms over their individual usefulness. Initially, we transform the review datasets into a feature vector of unigram features along with bi-tagged features based on POS patterns. Next, information gain (IG), Chi-squared (χ<SUP align="right"><SMALL>2</SMALL></SUP>) and minimum redundancy maximum relevancy (mRMR) feature selection methods are applied to obtain an optimal feature subset for further functionality. These features are then given as input to multiple machine learning classifiers, namely, support vector machine (SVM), multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB) and logistic regression (LR), on multi-domain product review datasets. The performance of the algorithms is measured by evaluation methods such as precision, recall, and F-measure. Experimental results show that the mRMR feature selection method with SVM achieved a better accuracy of 91.39%, which is encouraging and comparable to related research.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 391-416
Issue: 4
Volume: 11
Year: 2019
Keywords: sentiment analysis; opinion mining; text classification; feature selection method; machine learning algorithms; optimal feature vector.
File-URL: http://www.inderscience.com/link.php?id=102728
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:391-416
Template-Type: ReDIF-Article 1.0
Author-Name: Francesco Cauteruccio
Author-X-Name-First: Francesco
Author-X-Name-Last: Cauteruccio
Author-Name: Paolo Lo Giudice
Author-X-Name-First: Paolo Lo
Author-X-Name-Last: Giudice
Author-Name: Giorgio Terracina
Author-X-Name-First: Giorgio
Author-X-Name-Last: Terracina
Author-Name: Domenico Ursino
Author-X-Name-First: Domenico
Author-X-Name-Last: Ursino
Author-Name: Nadia Mammone
Author-X-Name-First: Nadia
Author-X-Name-Last: Mammone
Author-Name: Francesco Carlo Morabito
Author-X-Name-First: Francesco Carlo
Author-X-Name-Last: Morabito
Title: A new network-based approach to investigating neurological disorders
Abstract:
In this paper, we present a new network-based approach to help experts investigate neurological disorders in which the connections among brain areas play a key role. Our approach receives the EEG of a patient and associates a network with it, whose nodes represent electrodes and whose edges denote the disconnection degree of the corresponding brain areas, measured by means of a new string-based metric. It then performs suitable projections on this network, depending on the neurological disorder to investigate. After this, it computes the values of a new coefficient, called the connection coefficient, on them. These values can be employed to help neurologists in their analyses. We show how our approach can be employed for three different disorders, namely Creutzfeldt-Jakob disease, childhood absence epilepsy and Alzheimer's disease.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 315-349
Issue: 4
Volume: 11
Year: 2019
Keywords: network analysis; connection coefficient; consensus multi-parameterised edit distance; cMPED; electroencephalogram; neurological disorders.
File-URL: http://www.inderscience.com/link.php?id=102730
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:315-349