Template-Type: ReDIF-Article 1.0 Author-Name: Renata Dantas Author-X-Name-First: Renata Author-X-Name-Last: Dantas Author-Name: Jamilson Dantas Author-X-Name-First: Jamilson Author-X-Name-Last: Dantas Author-Name: Gabriel Alves Author-X-Name-First: Gabriel Author-X-Name-Last: Alves Author-Name: Paulo Maciel Author-X-Name-First: Paulo Author-X-Name-Last: Maciel Title: Analysis of a performability model for the BRT system Abstract: Large cities have increasing mobility problems due to the large number of vehicles on the streets, which results in traffic jams and the eventual waste of time and resources. An alternative to improve traffic is to prioritise the public transportation system. Several metropolises around the world are adopting bus rapid transit (BRT) systems since they present compelling results from a cost-benefit perspective. Evaluating metrics such as performance, reliability, and performability aids in the planning, monitoring, and optimisation of BRT systems. This paper presents hierarchical models, using CTMC modelling techniques, to assess metrics such as performance and performability. The results show that these models identified the peak intervals during which a vehicle is more likely to arrive at its destination in a shorter time, in addition to showing the probability of the vehicle being affected by a failure in each interval. It was also possible to establish a basis for replicating the model in different scenarios to enable new comparative studies. Journal: Int. J. of Data Mining, Modelling and Management Pages: 64-86 Issue: 1 Volume: 11 Year: 2019 Keywords: bus rapid transit; BRT; CTMC; performability analysis. File-URL: http://www.inderscience.com/link.php?id=96530 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:64-86 Template-Type: ReDIF-Article 1.0 Author-Name: Aylin Caliskan Author-X-Name-First: Aylin Author-X-Name-Last: Caliskan Author-Name: Burcu Karaöz Author-X-Name-First: Burcu Author-X-Name-Last: Karaöz Title: Can market indicators forecast the port throughput? Abstract: The main aim of this study is to forecast the likelihood of increasing or decreasing port throughput from month to month, using selected market indicators as input variables. A further aim is to determine whether artificial neural network (ANN) and support vector machine (SVM) algorithms are capable of accurately predicting the movement of port throughput. To this end, Turkish ports were chosen as the research environment. The monthly average exchange rates of the US dollar, euro, and gold (against the Turkish lira), together with crude oil prices, were used as market indicators in the prediction models. The experimental results reveal that the model with the selected market indicators successfully forecasts the direction of movement in port throughput, with an accuracy rate of 90.9% for ANN and 84.6% for SVM. The model developed in this research may help managers develop short-term logistics plans for operational processes and may help researchers adapt the model to other research areas. Journal: Int. J. of Data Mining, Modelling and Management Pages: 45-63 Issue: 1 Volume: 11 Year: 2019 Keywords: port throughput; predicting; forecasting in shipping; artificial neural network; ANN; support vector machine; SVM. File-URL: http://www.inderscience.com/link.php?id=96532 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:45-63 Template-Type: ReDIF-Article 1.0 Author-Name: Las Johansen Balios Caluza Author-X-Name-First: Las Johansen Balios Author-X-Name-Last: Caluza Title: Deciphering published articles on cyberterrorism: a latent Dirichlet allocation algorithm application Abstract: Cyberterrorism is an emerging and serious problem causing disturbance in cyberspace. To unravel the underlying issues surrounding cyberterrorism, it is imperative to examine the documents available in NATO's repository. Articles were extracted using a web-mining technique, and topic modelling was performed using NLP. Moreover, this study employed the <i>latent Dirichlet allocation algorithm</i>, an unsupervised machine learning method, to generate latent themes from the text corpus. Five underlying themes were identified from the results. Finally, the analysis revealed a profound understanding of cyberterrorism as a pragmatic menace of cyberspace, manifested through the worldwide spread of black propaganda, recruitment, computer and network hacking, economic sabotage, and other activities. As a result, countries around the world, including NATO and its allies, have continuously improved their capabilities against cyberterrorism. Journal: Int. J. of Data Mining, Modelling and Management Pages: 87-101 Issue: 1 Volume: 11 Year: 2019 Keywords: topic modelling; latent Dirichlet allocation; LDA; cyberterrorism; unsupervised machine learning; natural language processing; NLP; sequential exploratory design; Gibbs sampling; cyberspace; web mining. File-URL: http://www.inderscience.com/link.php?id=96539 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:87-101 Template-Type: ReDIF-Article 1.0 Author-Name: Hima Suresh Author-X-Name-First: Hima Author-X-Name-Last: Suresh Author-Name: Gladston Raj. S Author-X-Name-First: Gladston Raj. 
Author-X-Name-Last: S Title: An innovative and efficient method for Twitter sentiment analysis Abstract: Sentiment analysis is one of the most accomplished fields in the data mining area. Specifically, sentiment analysis centres on analysing attitudes and opinions relating to a particular topic of interest using machine learning approaches, lexicon-based approaches or hybrid approaches. The aim is to develop an automated system that can identify and classify sentiments in the related text. An efficient approach for predicting sentiments would allow us to extract opinions from web content and to predict online public choices, which could prove valuable for tracking changes in the sentiment of Twitter users. This paper presents a proposed model to analyse brand impact using real data gathered from the microblog Twitter over a period of 14 months, and also reviews the existing methods and approaches in sentiment analysis. Twitter-based information gathering techniques enable collecting direct responses from the target audience and provide valuable insight into public sentiment when predicting opinion about a particular product. The experimental results show that the proposed method for Twitter sentiment analysis performs best, achieving an accuracy of 86.8%. Journal: Int. J. of Data Mining, Modelling and Management Pages: 1-18 Issue: 1 Volume: 11 Year: 2019 Keywords: sentiment analysis; machine learning approach; lexicon-based approach; supervised learning. File-URL: http://www.inderscience.com/link.php?id=96543 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:1-18 Template-Type: ReDIF-Article 1.0 Author-Name: Hamed Sabahno Author-X-Name-First: Hamed Author-X-Name-Last: Sabahno Author-Name: Seyed Meysam Mousavi Author-X-Name-First: Seyed Meysam Author-X-Name-Last: Mousavi Author-Name: Amirhossein Amiri Author-X-Name-First: Amirhossein Author-X-Name-Last: Amiri Title: A new development of an adaptive X &minus; R control chart under a fuzzy environment Abstract: Adaptive control charts have been shown to perform better than classical control charts because some or all of their parameters adapt to previous process information. Fuzzy classical control charts have been considered by many researchers over the last two decades; however, fuzzy adaptive control charts have not been investigated. In this paper, we introduce a new adaptive <i><span style="text-decoration: overline">X</span></i> &minus; <i>R</i> fuzzy control chart that allows all of the chart's parameters to adapt based on the process state at the previous sample. The warning limits are also redefined for fuzzy environments. We utilise the fuzzy mode defuzzification technique to design the decision procedure of the proposed fuzzy adaptive control chart. Finally, an illustrative example presents the application of the proposed control chart. Journal: Int. J. of Data Mining, Modelling and Management Pages: 19-44 Issue: 1 Volume: 11 Year: 2019 Keywords: XR control charts; adaptive control charts; fuzzy uncertainty; trapezoidal fuzzy numbers; TrFNs. File-URL: http://www.inderscience.com/link.php?id=96547 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:1:p:19-44 Template-Type: ReDIF-Article 1.0 Author-Name: Razieh Davashi Author-X-Name-First: Razieh Author-X-Name-Last: Davashi Author-Name: Mohammad-Hossein Nadimi-Shahraki Author-X-Name-First: Mohammad-Hossein Author-X-Name-Last: Nadimi-Shahraki Title: EFP-tree: an efficient FP-tree for incremental mining of frequent patterns Abstract: Frequent pattern mining from dynamic databases with many incremental updates is a significant research issue in data mining. After incremental updates, the validity of the frequent patterns changes. A simple way to handle this is to rerun mining algorithms from scratch, which is very costly. To solve this problem, researchers have introduced the incremental mining approach. In this article, an efficient FP-tree named EFP-tree is proposed for incremental mining of frequent patterns. For the original database, it is constructed like an FP-tree by using an auxiliary list, without any reconstruction. For incremental updates, EFP-tree is reconstructed only once, thereby reducing the number of tree reconstructions, the number of reconstructed branches, and the search space. The experimental results show that using EFP-tree can reduce reconstructed branches and runtime in both static and incremental mining, and enhances scalability compared to the well-known tree structures CanTree, CP-tree, SPO-tree and GM-tree on both dense and sparse datasets. Journal: Int. J. of Data Mining, Modelling and Management Pages: 144-166 Issue: 2 Volume: 11 Year: 2019 Keywords: data mining; dynamic databases; frequent pattern; incremental mining; FP-tree. File-URL: http://www.inderscience.com/link.php?id=98958 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:144-166 Template-Type: ReDIF-Article 1.0 Author-Name: T. Subetha Author-X-Name-First: T. Author-X-Name-Last: Subetha Author-Name: S. Chitrakala Author-X-Name-First: S. 
Author-X-Name-Last: Chitrakala Title: Human activity recognition based on interaction modelling Abstract: Human activity recognition aims at recognising and interpreting the activities of humans automatically from videos. Among these activities, identifying the interactions between humans with minimal computation time and a reduced misclassification rate is a cumbersome task. Hence, an interaction-based human activity recognition system is proposed in this paper that utilises silhouette features to identify and classify interactions between humans. The main issues that affect the performance of activity recognition are sudden illumination changes, detection of static humans, data discrimination, data variance, crowding, and computational complexity. To address these issues, three new algorithms named weight-based updating Gaussian mixture model (wu-GMM), spatial dissemination-based contour silhouettes (SDCS), and weighted constrained dynamic time warping (WCDTW) are proposed. Experiments conducted on a gaming dataset and the Kinect interaction dataset show that the proposed system recognises interactions with a reduced misclassification rate and minimal processing time compared to existing systems. Journal: Int. J. of Data Mining, Modelling and Management Pages: 167-188 Issue: 2 Volume: 11 Year: 2019 Keywords: human activity recognition; Gaussian mixture model; contour silhouettes; weight-based updating Gaussian mixture model; spatial dissemination-based contour silhouettes; weighted constrained dynamic time warping; dynamic time warping; stochastic neighbour embedding; t-stochastic neighbour embedding; reduced variance-t stochastic neighbour embedding. File-URL: http://www.inderscience.com/link.php?id=98967 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:167-188 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammad Daoud Author-X-Name-First: Mohammad Author-X-Name-Last: Daoud Title: Using implicitly and explicitly rated online customer reviews to build opinionated Arabic lexicons Abstract: Creating an opinionated lexicon is an important step towards a reliable social media analysis system. In this article, we propose an approach and describe an experiment to build an Arabic polarised lexical database by analysing online implicitly and explicitly rated customer reviews. These reviews are written in modern standard Arabic and the Palestinian/Jordanian dialect. Therefore, the produced lexicon contains casual slang and dialectal entries used by the online community, which is useful for sentiment analysis of informal social media micro-blogs. We extracted 28,000 entries by processing 15,100 reviews and by expanding the initial lexicon through Google Translate. We calculated an implicit rating for every review, derived from its text, to address the problem of ambiguous opinions in certain online posts, where the text of the review does not match the given rating (the explicit rating). Each entry was given a polarity tag and a confidence score. High confidence scores increased the precision of the polarisation process, and explicit ratings increased the coverage and confidence of polarity. Journal: Int. J. of Data Mining, Modelling and Management Pages: 189-203 Issue: 2 Volume: 11 Year: 2019 Keywords: polarised lexicon; social media analysis; opinion mining; term extraction. File-URL: http://www.inderscience.com/link.php?id=98968 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:189-203 Template-Type: ReDIF-Article 1.0 Author-Name: Eftychios Protopapadakis Author-X-Name-First: Eftychios Author-X-Name-Last: Protopapadakis Author-Name: Dimitrios Niklis Author-X-Name-First: Dimitrios Author-X-Name-Last: Niklis Author-Name: Michalis Doumpos Author-X-Name-First: Michalis Author-X-Name-Last: Doumpos Author-Name: Anastasios Doulamis Author-X-Name-First: Anastasios Author-X-Name-Last: Doulamis Author-Name: Constantin Zopounidis Author-X-Name-First: Constantin Author-X-Name-Last: Zopounidis Title: Sample selection algorithms for credit risk modelling through data mining techniques Abstract: Credit risk assessment is a very challenging and important problem in the domain of financial risk management. The development of reliable credit rating/scoring models is of paramount importance in this area. There are different algorithms and approaches for constructing such models to classify credit applicants (firms or individuals) into risk classes. Reliable sample selection is crucial for this task. The aim of this paper is to examine the effectiveness of sample selection schemes in combination with different classifiers for constructing reliable default prediction models. We consider different algorithms to select representative cases and handle class imbalances. Empirical results are reported for a dataset of Greek companies from the commercial sector. Journal: Int. J. of Data Mining, Modelling and Management Pages: 103-128 Issue: 2 Volume: 11 Year: 2019 Keywords: credit risk modelling; data mining; sampling; classification. File-URL: http://www.inderscience.com/link.php?id=98969 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:103-128 Template-Type: ReDIF-Article 1.0 Author-Name: Carlos Roberto Silveira Junior Author-X-Name-First: Carlos Roberto Silveira Author-X-Name-Last: Junior Author-Name: Marilde Terezinha Prado Santos Author-X-Name-First: Marilde Terezinha Prado Author-X-Name-Last: Santos Author-Name: Marcela Xavier Ribeiro Author-X-Name-First: Marcela Xavier Author-X-Name-Last: Ribeiro Title: A flexible architecture for the pre-processing of solar satellite image time series data - the SETL architecture Abstract: Satellite image time series (SITS) is a challenging domain for knowledge discovery in databases due to its characteristics: each image has several sunspots, and each sunspot is associated with sensor data composed of the radiation level and the sunspot classifications. Each image also has time parameters and sunspot coordinates, i.e., spatiotemporal data. Several challenges of the SITS domain are faced during the extract, transform, and load (ETL) process. In this paper, we propose an architecture called SITS extract, transform, and load (SETL) that extracts the visual characteristics of each sunspot and associates them with the sunspot's sensor data, considering the spatiotemporal relations. SETL brings flexibility and extensibility to working with challenging domains such as SITS because it integrates textual, visual and spatiotemporal characteristics at the sunspot-record level. Furthermore, we obtained acceptable performance results according to a domain expert and increased the possibility of using different data mining algorithms compared to the state of the art. Journal: Int. J. of Data Mining, Modelling and Management Pages: 129-143 Issue: 2 Volume: 11 Year: 2019 Keywords: satellite image time series; SITS; spatiotemporal ETL process; solar SITS process. File-URL: http://www.inderscience.com/link.php?id=98970 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:2:p:129-143 Template-Type: ReDIF-Article 1.0 Author-Name: Bartosz Zieliński Author-X-Name-First: Bartosz Author-X-Name-Last: Zieliński Author-Name: Paweł Maślanka Author-X-Name-First: Paweł Author-X-Name-Last: Maślanka Author-Name: Ścibor Sobieski Author-X-Name-First: Ścibor Author-X-Name-Last: Sobieski Title: Allegories for database modelling Abstract: Allegories abstract and generalise (in the categorical framework) the algebra of binary relations. Arrows in an allegory enjoy many of the properties and much of the structure available for plain binary relations. At the same time, allegories are sufficiently general to allow the description, within the same uniform framework, of lattice-valued (e.g., fuzzy) relations and some more general structures. The paper presents a conceptual data modelling formalism that uses the language of allegories. We provide examples demonstrating the expressiveness of this formalism. While most of the examples are meant to be interpreted in the allegory of sets and binary relations, we also show the usefulness of other allegories, such as the allegory of sets and lattice-valued relations, with which one can model replicated data or data stored in a valid-time temporal database. Journal: Int. J. of Data Mining, Modelling and Management Pages: 209-234 Issue: 3 Volume: 11 Year: 2019 Keywords: categories; allegories; data modelling; conceptual modelling; fuzzy databases; relational model; relations; relation algebra; relational products; locales. File-URL: http://www.inderscience.com/link.php?id=100384 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:209-234 Template-Type: ReDIF-Article 1.0 Author-Name: Prudhvi Janga Author-X-Name-First: Prudhvi Author-X-Name-Last: Janga Author-Name: Karen C. Davis Author-X-Name-First: Karen C. 
Author-X-Name-Last: Davis Title: A grammar-based approach for XML schema extraction and heterogeneous document integration Abstract: The availability of vast amounts of heterogeneous XML web data motivates finding efficient methods to search, integrate, query, and present this data. The structure of XML documents is useful for achieving these tasks; however, not every XML document on the web includes a schema. We discuss challenges and solutions in the area of generation and integration of XML schemas. We propose and implement a framework for efficient schema extraction and integration from heterogeneous XML document collections collected from the web. Our approach introduces the schema extended context-free grammar (SECFG) to model XML schemas, including detection of attributes, data types, and element occurrences. Unlike other implementations, our approach supports the generation of XML schemas in any XML schema language, e.g., DTD or XSD. We compare our approach with other proposed approaches and conclude that we offer the same or better functionality more efficiently and with greater flexibility. The approach we propose is flexible enough to facilitate integration of and translation to tabular (relational) data. Journal: Int. J. of Data Mining, Modelling and Management Pages: 235-258 Issue: 3 Volume: 11 Year: 2019 Keywords: XML schema; schema integration; schema extraction; schema discovery. File-URL: http://www.inderscience.com/link.php?id=100385 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:235-258 Template-Type: ReDIF-Article 1.0 Author-Name: Kobamelo Moremedi Author-X-Name-First: Kobamelo Author-X-Name-Last: Moremedi Author-Name: John Andrew Van Der Poll Author-X-Name-First: John Andrew Van Der Author-X-Name-Last: Poll Title: Towards a comparative evaluation of text-based specification formalisms and diagrammatic notations Abstract: Specification plays a pivotal role in software engineering in facilitating the development of highly dependable software. Various techniques for specification work have been developed to provide precise and unambiguous specifications. Z is a formal specification language based on a strongly-typed fragment of Zermelo-Fraenkel set theory and first-order logic, providing for provably correct specifications. While diagrammatic specification languages may lack precision, they may, owing to their visual characteristics, be an attractive option for advocates of semi-formal specification techniques. In this research, we investigate the extent to which diagrammatic notations may capture the essence of, e.g., a Z specification. Several diagrammatic notations are considered and combined for this purpose. A case study is employed to evaluate the utility of the diagrammatic notation developed in this article. Comparisons of the merits of diagrammatic notations are presented to further determine their feasibility. Journal: Int. J. of Data Mining, Modelling and Management Pages: 259-283 Issue: 3 Volume: 11 Year: 2019 Keywords: case study; diagrammatic notation; formal specification; Euler diagrams; set theory; spider diagrams; Venn diagrams; Peirce diagrams; Z. File-URL: http://www.inderscience.com/link.php?id=100386 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:259-283 Template-Type: ReDIF-Article 1.0 Author-Name: Alfredo Cuzzocrea Author-X-Name-First: Alfredo Author-X-Name-Last: Cuzzocrea Author-Name: Giorgio Mario Grasso Author-X-Name-First: Giorgio Mario Author-X-Name-Last: Grasso Author-Name: Massimiliano Nolich Author-X-Name-First: Massimiliano Author-X-Name-Last: Nolich Title: Effective and efficient distributed management of big clinical data: a framework Abstract: Managing big data in distributed environments is a critical research challenge that has drawn attention from the community. In this context, there are several issues to be faced, including: 1) dealing with massive and heterogeneous data; 2) inconsistency problems; 3) query optimisation bottlenecks; and so forth. Clinical data represent a prominent case of big data, due to both the practical and the methodological challenges posed by such data. Following these considerations, in this paper we present an architecture for the storage, exchange and use of health data for administrative and epidemiological purposes, which focuses on the patient, who can make use of their data for therapeutic and research purposes in a safe and easy way. The proposed architecture would bring benefits both to patients, giving them the desired centrality in the care process, and to health administrations, which could exploit the same infrastructure to better address health policies. Journal: Int. J. of Data Mining, Modelling and Management Pages: 284-313 Issue: 3 Volume: 11 Year: 2019 Keywords: big data; healthcare management; distributed big data management. File-URL: http://www.inderscience.com/link.php?id=100387 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:3:p:284-313 Template-Type: ReDIF-Article 1.0 Author-Name: Nilay Khare Author-X-Name-First: Nilay Author-X-Name-Last: Khare Author-Name: Hema Dubey Author-X-Name-First: Hema Author-X-Name-Last: Dubey Title: Fast parallel PageRank technique for detecting spam web pages Abstract: Brin and Page proposed PageRank in 1998, and it remains a prevailing link analysis technique used by web search engines to rank their search results. Computing PageRank values efficiently and quickly for very large web graphs is truly an essential concern for search engines today. Identifying spam web pages and dealing with them is yet another important concern in web search. In this research article, an efficient and faster parallel PageRank algorithm is proposed, which harnesses the power of graphics processing units (GPUs). In the proposed algorithm, the PageRank scores are non-uniformly distributed among the web pages, so it is also capable of coping with spam web pages. The experiments are performed on standard datasets available in the Stanford large network dataset collection. The proposed parallel PageRank algorithm achieves a speed-up of about 1.1 to 1.7 over the existing parallel PageRank algorithm. Journal: Int. J. of Data Mining, Modelling and Management Pages: 350-365 Issue: 4 Volume: 11 Year: 2019 Keywords: graphics processing unit; GPU; compute unified device architecture; CUDA; parallel PageRank technique; spam web pages. File-URL: http://www.inderscience.com/link.php?id=102720 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:350-365 Template-Type: ReDIF-Article 1.0 Author-Name: Alireza Hekmatinia Author-X-Name-First: Alireza Author-X-Name-Last: Hekmatinia Author-Name: Ali Mohammadi Shanghooshabad Author-X-Name-First: Ali Mohammadi Author-X-Name-Last: Shanghooshabad Author-Name: Mohammad Mahdi Motevali Author-X-Name-First: Mohammad Mahdi Author-X-Name-Last: Motevali Author-Name: Mehrdad Almasi Author-X-Name-First: Mehrdad Author-X-Name-Last: Almasi Title: Tuning parameters via a new rapid, accurate and parameter-less method using meta-learning Abstract: Dealing with a large parameter space in data mining tasks is extremely time-consuming, and the tuning method itself needs to be tuned, since such methods have at least one parameter of their own. Here, a new rapid and parameter-less method is presented to tune algorithms on diverse datasets to achieve high-quality results in a short time. The method uses prior knowledge, in the form of meta-features, to guess a point closer to the optimum in the parameter space of target algorithms (here, the support vector machine algorithm is used). To prepare this prior knowledge, 282 meta-features are introduced, and a genetic algorithm is then applied to determine the best meta-features for the target algorithm. The best meta-features are then used to tune the target algorithm on unseen datasets. The results show that, in less than 0.19 minutes on average, the method obtains approximately the same classification rates as other methods, while the time consumed declines dramatically. Journal: Int. J. of Data Mining, Modelling and Management Pages: 366-390 Issue: 4 Volume: 11 Year: 2019 Keywords: parameter tuning; meta learning; meta feature; SVM tuning; genetic algorithm. File-URL: http://www.inderscience.com/link.php?id=102727 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. 
Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:366-390 Template-Type: ReDIF-Article 1.0 Author-Name: Monalisha Ghosh Author-X-Name-First: Monalisha Author-X-Name-Last: Ghosh Author-Name: Goutam Sanyal Author-X-Name-First: Goutam Author-X-Name-Last: Sanyal Title: Analysing sentiments based on multi feature combination with supervised learning Abstract: Research on sentiment analysis is growing to a great extent and attracting wide attention from academia and industry alike. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper exhibits the efficacy of the proposed combined feature selection technique on machine learning classification algorithms over their individual usefulness. Initially, we transform the review datasets into a feature vector of unigram features along with bi-tagged features based on POS patterns. Next, information gain (IG), Chi-squared (χ<SUP align="right"><SMALL>2</SMALL></SUP>) and minimum redundancy maximum relevancy (mRMR) feature selection methods are applied to obtain an optimal feature subset for further processing. These features are then given as input to multiple machine learning classifiers, namely, support vector machine (SVM), multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB) and logistic regression (LR), on multi-domain product review datasets. The performance of the algorithms is measured by evaluation methods such as precision, recall, and F-measure. Experimental results show that the feature selection method mRMR with SVM achieved a better accuracy of 91.39%, which is encouraging and comparable to the related research. Journal: Int. J. of Data Mining, Modelling and Management Pages: 391-416 Issue: 4 Volume: 11 Year: 2019 Keywords: sentiment analysis; opinion mining; text classification; feature selection method; machine learning algorithms; optimal feature vector. 
File-URL: http://www.inderscience.com/link.php?id=102728 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:391-416 Template-Type: ReDIF-Article 1.0 Author-Name: Francesco Cauteruccio Author-X-Name-First: Francesco Author-X-Name-Last: Cauteruccio Author-Name: Paolo Lo Giudice Author-X-Name-First: Paolo Lo Author-X-Name-Last: Giudice Author-Name: Giorgio Terracina Author-X-Name-First: Giorgio Author-X-Name-Last: Terracina Author-Name: Domenico Ursino Author-X-Name-First: Domenico Author-X-Name-Last: Ursino Author-Name: Nadia Mammone Author-X-Name-First: Nadia Author-X-Name-Last: Mammone Author-Name: Francesco Carlo Morabito Author-X-Name-First: Francesco Carlo Author-X-Name-Last: Morabito Title: A new network-based approach to investigating neurological disorders Abstract: In this paper, we present a new network-based approach to help experts investigate neurological disorders in which the connections among brain areas play a key role. Our approach receives the EEG of a patient and associates a network with it, whose nodes represent electrodes and whose edges denote the disconnection degree of the corresponding brain areas, measured by means of a new string-based metric. It then performs suitable projections on this network, depending on the neurological disorder under investigation, and computes the values of a new coefficient, called the connection coefficient, on them. These values can help neurologists in their analyses. We show how our approach can be employed for three different disorders, namely Creutzfeldt-Jakob disease, childhood absence epilepsy and Alzheimer's disease. Journal: Int. J. of Data Mining, Modelling and Management Pages: 315-349 Issue: 4 Volume: 11 Year: 2019 Keywords: network analysis; connection coefficient; consensus multi-parameterised edit distance; cMPED; electroencephalogram; neurological disorders. 
File-URL: http://www.inderscience.com/link.php?id=102730 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:ijdmmm:v:11:y:2019:i:4:p:315-349