Template-Type: ReDIF-Article 1.0 Author-Name: Amirhossein Amiri Author-X-Name-First: Amirhossein Author-X-Name-Last: Amiri Author-Name: Mohammad Reza Maleki Author-X-Name-First: Mohammad Reza Author-X-Name-Last: Maleki Author-Name: Fatemeh Sogandi Author-X-Name-First: Fatemeh Author-X-Name-Last: Sogandi Title: Estimating the time of a step change in the multivariate-attribute process mean using ANN and MLE Abstract: In this paper, we consider correlated multivariate-attribute quality characteristics and provide two methods to estimate the time of change in the parameters of the process mean: a modular method based on an artificial neural network (ANN) and a maximum likelihood estimation (MLE) method. We evaluate the performance of the estimators in terms of standard change point estimation criteria and compare them through simulation studies. The results show that the proposed ANN-based model outperforms the MLE approach under most step shifts in the mean vector of the multivariate-attribute process. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 81-98 Issue: 1 Volume: 10 Year: 2018 Keywords: artificial neural network; ANN; step-change point estimation; multivariate-attribute quality characteristics; maximum likelihood estimation; MLE. File-URL: http://www.inderscience.com/link.php?id=90630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:81-98 Template-Type: ReDIF-Article 1.0 Author-Name: Dhruv Kumar Author-X-Name-First: Dhruv Author-X-Name-Last: Kumar Author-Name: Poonam Goyal Author-X-Name-First: Poonam Author-X-Name-Last: Goyal Author-Name: Navneet Goyal Author-X-Name-First: Navneet Author-X-Name-Last: Goyal Title: An efficient method for batch updates in OPTICS cluster ordering Abstract: DBSCAN is one of the most popular density-based clustering algorithms, but it requires re-clustering the entire dataset whenever the input parameters are changed.
OPTICS overcomes this limitation. In this paper, we propose a batch-wise incremental OPTICS algorithm that performs efficient insertion and deletion of a batch of points in a hierarchical cluster ordering, the output of OPTICS. Only a couple of algorithms on incremental versions of OPTICS are available in the literature. This can be attributed to the sequential access patterns of OPTICS. The existing incremental algorithms address the problem of incrementally updating the hierarchical cluster ordering for point-wise insertion/deletion, but are suitable only for infrequent updates. The proposed incremental OPTICS algorithm performs batch-wise insertions/deletions and is suitable for frequent updates. It produces exactly the same hierarchical cluster ordering as classical OPTICS. Real datasets have been used for experimental evaluation of the proposed algorithm, and the results show a remarkable performance improvement over the classical and other existing incremental OPTICS algorithms. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 57-80 Issue: 1 Volume: 10 Year: 2018 Keywords: OPTICS; incremental clustering; batch updates; density-based clustering. File-URL: http://www.inderscience.com/link.php?id=90631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:57-80 Template-Type: ReDIF-Article 1.0 Author-Name: Alagirisamy Kamatchi Subbiah Sukumaran Author-X-Name-First: Alagirisamy Kamatchi Subbiah Author-X-Name-Last: Sukumaran Title: The consumer choice between the private doctors and the healthcare clinics Abstract: There are not many studies on healthcare from the point of view of healthcare consumers. The study did not reveal a substantial difference between the two healthcare service providers in the eyes of the consumers on the basis of their preferences for the healthcare attributes included in the study.
Higher-income groups of consumers did not attach importance to the cost of healthcare or to the time spent by the doctors with them. Similarly, the eldest consumers were not much worried about the cost of healthcare. The youngest consumers preferred a 'convenient location'. Middle-aged consumers resemble neither the youngest nor the eldest in their preference for 'convenient location', 'friendly staff' and 'quick appointment'. The study concludes that consumers cannot be segmented into 'doctor consumers' and 'clinic consumers', but they can be segmented on the basis of the demographic characteristics of income and age. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 21-37 Issue: 1 Volume: 10 Year: 2018 Keywords: healthcare; private doctor; healthcare clinic; consumer choice; MANOVA; multiple discriminant analysis; neural networks. File-URL: http://www.inderscience.com/link.php?id=90632 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:21-37 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammad Mehdi Parhizgar Author-X-Name-First: Mohammad Mehdi Author-X-Name-Last: Parhizgar Author-Name: Elham Keshavarz Author-X-Name-First: Elham Author-X-Name-Last: Keshavarz Title: Identifying and prioritisation entrepreneurial behaviour factors using fuzzy AHP approach Abstract: The purpose of this research is to understand how organisations have sustained their growth by applying entrepreneurial behaviour factors. Given the thematic nature of the research model and expert opinion in the oil industry, companies in the oil industry were examined as the statistical population of the current study: 1) an oil pipeline and telecommunication company; 2) an oil products distribution company; 3) the National Gas Company in Semnan and Khorasan provinces.
Thirty experts who were interested in improving the discussion participated in the study. The main tools used for gathering the data in this study were company records and a questionnaire. In this study, the sub-criteria of structural, underlying and behaviour factors were ranked against criteria related to different levels of entrepreneurial behaviour in the oil industry using the fuzzy analytic hierarchy process (FAHP). The results obtained from the fuzzy AHP method indicate that structural factors are more important than underlying and behaviour factors. Within the structural factors, it is concluded that entrepreneurial organisation structure is more important than the other factors. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 38-56 Issue: 1 Volume: 10 Year: 2018 Keywords: entrepreneur; entrepreneurship benefits; entrepreneurial behaviour; fuzzy analytic hierarchy process; FAHP. File-URL: http://www.inderscience.com/link.php?id=90633 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:38-56 Template-Type: ReDIF-Article 1.0 Author-Name: Yu Sang Chang Author-X-Name-First: Yu Sang Author-X-Name-Last: Chang Author-Name: Jinsoo Lee Author-X-Name-First: Jinsoo Author-X-Name-Last: Lee Author-Name: Hyuk Ju Kwon Author-X-Name-First: Hyuk Ju Author-X-Name-Last: Kwon Title: When will the 2015 millennium development goal of infant mortality rate be finally realised? - Projections for 21 OECD countries through 2050 Abstract: According to the United Nations Children's Fund (UNICEF), the number of global infant deaths for those under the age of one year fell from 8.4 million in 1990 to 5.4 million in 2010. However, the declining trend of the infant mortality rate varies significantly from country to country based on the vastly different environmental elements each faces.
This paper attempts to predict the future infant mortality rates of 21 OECD countries through 2015 and 2050 using an experience curve model, and compares the results to two other well-known projections, by the United Nations Population Division and the US Census Bureau, in the context of the millennium development goal targets. The results from all three projections indicate that only one or two countries will meet the two-thirds reduction target of the 2015 millennium development goal. By 2050, four to 18 countries will still not be able to meet the target. Therefore, each country may need to undertake a comprehensive review of its policies and programmes of infant mortality control to generate alternative plans for major improvement. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 1-20 Issue: 1 Volume: 10 Year: 2018 Keywords: child health; health policy; infant mortality rate; IMR; experience curve model; millennium development goals. File-URL: http://www.inderscience.com/link.php?id=90634 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:1-20 Template-Type: ReDIF-Article 1.0 Author-Name: Andri Mirzal Author-X-Name-First: Andri Author-X-Name-Last: Mirzal Title: Clustering and latent semantic indexing aspects of the non-negative matrix factorisation Abstract: This paper provides theoretical support for the clustering aspect of non-negative matrix factorisation (NMF). By utilising the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Different from previous approaches - which either ignore the non-negativity constraints or assume absolute orthonormality of the coefficient matrix in order to derive the equivalence - our approach takes the non-negativity constraints into account and makes no assumption about the orthonormality of the coefficient matrix.
Thus, not only is the stationary point used in deriving the equivalence guaranteed to lie in NMF's feasible region, but the result is also more realistic, since NMF does not produce an orthonormal matrix. Furthermore, because the clustering capability of a matrix decomposition technique may imply a latent semantic indexing (LSI) aspect, we also study the LSI aspect of NMF. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 153-181 Issue: 2 Volume: 10 Year: 2018 Keywords: bound-constrained optimisation; clustering method; non-negative matrix factorisation; NMF; Karush-Kuhn-Tucker conditions; latent semantic indexing; LSI; singular value decomposition; SVD. File-URL: http://www.inderscience.com/link.php?id=92443 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:153-181 Template-Type: ReDIF-Article 1.0 Author-Name: Shilpa Jain Author-X-Name-First: Shilpa Author-X-Name-Last: Jain Author-Name: Dinesh C.S. Bisht Author-X-Name-First: Dinesh C.S. Author-X-Name-Last: Bisht Author-Name: Prakash Chandra Mathpal Author-X-Name-First: Prakash Chandra Author-X-Name-Last: Mathpal Title: Particle swarm optimised fuzzy method for prediction of water table elevation fluctuation Abstract: Particle swarm optimisation (PSO) is a powerful population-based evolutionary computation technique inspired by simulation of the social behaviour of bird flocking and fish schooling. PSO has been applied successfully to a wide range of applications such as scheduling, artificial neural network (ANN) training, control strategy determination and ingredient mix optimisation. Fuzzy logic can easily cope with vagueness and uncertainty in time series data. In our earlier work, it was applied to the prediction of water table elevation, and the results were quite promising. However, optimising the length of the fuzzy intervals remained a major constraint for researchers.
In this research paper, the optimal length of the fuzzy intervals in the universe of discourse is selected using particle swarm optimisation. The results obtained after applying this combined approach to the prediction of water table elevation are better than those of the previous method. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 99-110 Issue: 2 Volume: 10 Year: 2018 Keywords: fuzzy logic; particle swarm optimisation; PSO; mean square error; water table; forecasting. File-URL: http://www.inderscience.com/link.php?id=92444 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:99-110 Template-Type: ReDIF-Article 1.0 Author-Name: Malaya Dutta Borah Author-X-Name-First: Malaya Dutta Author-X-Name-Last: Borah Author-Name: Rajni Jindal Author-X-Name-First: Rajni Author-X-Name-Last: Jindal Title: An approach for high utility pattern mining Abstract: Mining high utility patterns has become prominent as it provides the semantic significance (utility/weighted patterns) associated with items in a transaction. Data analysis and respective strategies for mining high utility patterns are important in real-world scenarios. Recent research has focused on high utility pattern mining using tree-based data structures, which suffer from greater computation time since they generate multiple tree branches. To cope with these problems, this work proposes a novel binary tree-based data structure with average maximum utility (AvgMU) and a mining algorithm to mine high utility patterns from incremental data, which reduces tree constructions and computation time. The proposed algorithms are implemented using synthetic and real datasets and compared with state-of-the-art tree-based algorithms. Experimental results show that the proposed work performs better in terms of running time, scalability and memory consumption than the other algorithms compared in this research work. Journal: Int. J.
of Data Analysis Techniques and Strategies Pages: 124-152 Issue: 2 Volume: 10 Year: 2018 Keywords: high utility pattern mining; frequent pattern mining; tree-based data structure; incremental mining; data analysis; average maximum utility; AvgMU. File-URL: http://www.inderscience.com/link.php?id=92445 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:124-152 Template-Type: ReDIF-Article 1.0 Author-Name: Manasi Vinayak Harshe Author-X-Name-First: Manasi Vinayak Author-X-Name-Last: Harshe Author-Name: Rajesh H. Kulkarni Author-X-Name-First: Rajesh H. Author-X-Name-Last: Kulkarni Title: Outlier detection using weighted holoentropy with hyperbolic tangent function Abstract: Numerous research works have been carried out in the literature to detect outliers, a.k.a. anomalies. Outlier detection is considered a pre-processing step for locating those objects in a dataset that do not conform to well-defined notions of expected behaviour. In the proposed method, a logistic sigmoid function related to the hyperbolic tangent is used as the weightage function for finding the outlier data points. It distributes the outlier data points effectively compared with the reverse sigmoid function. The method is implemented in four phases. In the first phase, the data are read and dynamic entropy is calculated. In the second phase, probability and dynamic entropy computations using the logistic sigmoid function related to the hyperbolic tangent are performed. In the third phase, the dynamic entropies are sorted and the top N points are selected as outlier data points; finally, the accuracy for correct outliers is computed for the proposed method. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 182-203 Issue: 2 Volume: 10 Year: 2018 Keywords: outlier; holoentropy; weightage function.
File-URL: http://www.inderscience.com/link.php?id=92446 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:182-203 Template-Type: ReDIF-Article 1.0 Author-Name: Narges Sadat Bathaeian Author-X-Name-First: Narges Sadat Author-X-Name-Last: Bathaeian Title: Using imputation algorithms when missing values appear in the test data in contrast with the training data Abstract: Real datasets suffer from the problem of missing data. Imputation is a common solution to this problem. Most research works apply imputation algorithms to the training data. Therefore, the output variable of the samples might influence the imputation model. This paper aims to compare different imputation algorithms when they are applied to test data and training data. In this research, first, the relations between the output variable and different imputation algorithms are investigated. Then six different types of imputation algorithms are applied to both training data and test data. The chosen datasets are publicly available and cover both classification and regression tasks; missing values are injected into them artificially. The results showed that the performance of all algorithms is reduced when the output variable is eliminated. In particular, the decline of the algorithm that uses k nearest neighbours for imputation on the classification datasets is not negligible. Nevertheless, algorithms based on random forests decline less and show better results than the other five types of algorithms. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 111-123 Issue: 2 Volume: 10 Year: 2018 Keywords: missing values; imputation algorithms; regression; kNN; MICE; random forest; tree; EM. File-URL: http://www.inderscience.com/link.php?id=92447 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:111-123 Template-Type: ReDIF-Article 1.0 Author-Name: Abolfazl Kazemi Author-X-Name-First: Abolfazl Author-X-Name-Last: Kazemi Author-Name: Ghazaleh Khodabandehlouie Author-X-Name-First: Ghazaleh Author-X-Name-Last: Khodabandehlouie Title: A new initialisation method for k-means algorithm in the clustering problem: data analysis Abstract: Clustering is one of the most important tasks in exploratory data analysis. One of the simplest and most widely used clustering algorithms is k-means, which was proposed in 1955. The k-means algorithm is conceptually simple and easy to implement, as evidenced by hundreds of publications over the last 50 years that extend k-means in various ways. Unfortunately, because of its nature, the algorithm is very sensitive to the initial placement of the cluster centres. To address this problem, many initialisation methods (IMs) have been proposed. In this paper, we first provide a historical overview of these methods. Then we present two new non-random initialisation methods for the k-means algorithm. We analyse the experimental results using real datasets, and the performance of the IMs is evaluated by the TOPSIS multi-criteria decision-making method. Finally, we show that not only do well-known k-means IMs often have poor performance, but there are in fact strong alternative approaches. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 291-304 Issue: 3 Volume: 10 Year: 2018 Keywords: clustering; K-means algorithm; cluster centre initialisation; sum of squared error criterion; data analysis. File-URL: http://www.inderscience.com/link.php?id=94127 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:291-304 Template-Type: ReDIF-Article 1.0 Author-Name: Anusuya Kirubakaran Author-X-Name-First: Anusuya Author-X-Name-Last: Kirubakaran Author-Name: M. Aramudhan Author-X-Name-First: M.
Author-X-Name-Last: Aramudhan Title: A watchdog approach - name-matching algorithm for big data risk intelligence Abstract: Even though the modern world is ruled by data and preventive measures are in place to keep data quality high, risk intelligence teams are challenged by the risk analysis task of record linkage on heterogeneous data from multiple data sources, owing to the high ratio of non-standard and poor-quality data present in big data systems caused by the variety of data formats across regions, data platforms, data storage systems, data migration, etc. With these record linkages in mind, in this paper we address the complications of the name matching process irrespective of spelling, structural and phonetic variations. Name matching succeeds when the algorithm is capable of handling names with discrepancies due to naming conventions, cross-language translation, operating system transformation, data migration, batch feeds, typos and other external factors. In this paper, we discuss the varieties of name representation in data sources, along with methods to parse names and find the maximum probability of a name match, comparable to watchdog security, with high accuracy and a reduced false negative rate. The proposed methods can be applied to the financial sector's risk intelligence analyses such as know your customer (KYC), anti-money laundering (AML), customer due diligence (CDD), anti-terrorism, watchlist screening and fraud detection. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 273-290 Issue: 3 Volume: 10 Year: 2018 Keywords: hybrid name matching; string similarity measure; data matching; risk intelligence. File-URL: http://www.inderscience.com/link.php?id=94128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:273-290 Template-Type: ReDIF-Article 1.0 Author-Name: Hongying Sun Author-X-Name-First: Hongying Author-X-Name-Last: Sun Author-Name: Yu Tian Author-X-Name-First: Yu Author-X-Name-Last: Tian Title: Using improved genetic algorithm under uncertain circumstance of site selection of O2O customer returns Abstract: Online-to-offline (O2O) e-commerce supports online purchase and offline servicing. In recent years, with the growth of online shopping in China, O2O has become a popular new mode of e-commerce. Buying online and returning offline is becoming a dominant shopping mode. Customer returns should be collected and treated in a cost-efficient manner. To this end, this paper proposes an integer programming model to minimise construction costs coupled with operating charges by optimising the sites of reverse logistics for customer returns. To lower storage costs, physical stores and their geographical sites should be far away from residential areas. In addition, this paper designs an improved genetic algorithm with two-stage heredity for solving the model under random circumstances, in which a multilayer reverse logistics network is built for recycling customer returns. Both simulation and numerical examples prove the effectiveness and feasibility of this improved genetic algorithm. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 241-256 Issue: 3 Volume: 10 Year: 2018 Keywords: reverse logistics; site selection; improved genetic algorithm; O2O e-commerce. File-URL: http://www.inderscience.com/link.php?id=94129 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:241-256 Template-Type: ReDIF-Article 1.0 Author-Name: J.V.N. Lakshmi Author-X-Name-First: J.V.N.
Author-X-Name-Last: Lakshmi Title: Data analysis on big data: improving the map and shuffle phases in Hadoop Map Reduce Abstract: Data management has become a challenging issue for network-centric applications which need to process large datasets, and systems require advanced tools to analyse these datasets. As efficient parallel computing programming models, Map Reduce and Hadoop are used for large-scale data analysis. However, Map Reduce still suffers from performance problems; its shuffle phase can be implemented as an individual shuffle service component with an efficient I/O policy. The map phase requires an improvement in its performance, as this phase's output acts as input to the next phase and its result determines overall efficiency, so the map phase needs intermediate checkpoints which regularly monitor all the splits generated by intermediate phases; otherwise it acts as a barrier to effective resource utilisation. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitors the map phase by skew handling and increases resource utilisation in a cluster. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 305-316 Issue: 3 Volume: 10 Year: 2018 Keywords: Map Reduce; Hadoop; shuffle; big data; data analytics; Hadoop distributed file system; HDFS; rack awareness; stragglers; light weight processing; OLAP; OLTP. File-URL: http://www.inderscience.com/link.php?id=94130 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:305-316 Template-Type: ReDIF-Article 1.0 Author-Name: Naveen Dahiya Author-X-Name-First: Naveen Author-X-Name-Last: Dahiya Author-Name: Vishal Bhatnagar Author-X-Name-First: Vishal Author-X-Name-Last: Bhatnagar Author-Name: Manjeet Singh Author-X-Name-First: Manjeet Author-X-Name-Last: Singh Title: A fuzzy-based automatic prediction system for quality evaluation of conceptual data warehouse models Abstract: In this paper, we present an automatic system based on fuzzy logic to predict the understanding time of conceptual data warehouse models. The system takes as input the values of quality metrics for a model and gives the understanding time as output. The metrics used for quality evaluation were proposed and validated by Manuel Serrano. The results of the automatic system are compared with the results of actual data collection performed manually. The predicted results are highly significant and demonstrate the validity and efficiency of the designed automatic system. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 317-333 Issue: 3 Volume: 10 Year: 2018 Keywords: fuzzy logic; quality metrics; data warehouse; understanding time; conceptual models. File-URL: http://www.inderscience.com/link.php?id=94131 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:317-333 Template-Type: ReDIF-Article 1.0 Author-Name: C. Seelammal Author-X-Name-First: C. Author-X-Name-Last: Seelammal Author-Name: K. Vimala Devi Author-X-Name-First: K. Vimala Author-X-Name-Last: Devi Title: Multi-criteria decision support for feature selection in network anomaly detection system Abstract: The growth of computer networks from LAN to the cloud, virtualisation and mobility keeps the intrusion detection system (IDS) a critical component of network security infrastructure.
The tremendous growth and usage of the internet raise concerns about how to protect and communicate digital information in a safe manner. The market for next-generation security solutions is rapidly evolving and constantly changing to accommodate today's threats. Many intrusion detection techniques, methods and algorithms have been implemented to detect these novel attacks, but no clear feature set or uncertainty bounds have been established as a baseline for dynamic environments. The main objective of this paper is to determine and provide the best feature selection for next-generation dynamic environments using multi-criteria decision making and decision tree learning, with emphasis on optimisation (contingency of weight allocation) and handling large datasets. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 334-350 Issue: 3 Volume: 10 Year: 2018 Keywords: intrusion detection; multi-criteria; classification; anomaly; data mining; feature selection; machine learning. File-URL: http://www.inderscience.com/link.php?id=94132 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:334-350 Template-Type: ReDIF-Article 1.0 Author-Name: K.G. Srinivasa Author-X-Name-First: K.G. Author-X-Name-Last: Srinivasa Author-Name: R. Sharath Author-X-Name-First: R. Author-X-Name-Last: Sharath Author-Name: S. Krishna Chaitanya Author-X-Name-First: S. Krishna Author-X-Name-Last: Chaitanya Author-Name: K.N. Nirupam Author-X-Name-First: K.N. Author-X-Name-Last: Nirupam Author-Name: B.J. Sowmya Author-X-Name-First: B.J. Author-X-Name-Last: Sowmya Title: Data analytics on census data to predict the income and economic hierarchy Abstract: The US Census Bureau conducts the American Community Survey, generating a massive dataset with millions of data points.
The rich dataset contains detailed information on approximately 3.5 million households, covering who they are and how they live, including ancestry, education, work, transportation, internet use and residency. This enormous body of data encourages the need to know more about the population and to derive insights from it. The ever-present demand for exposing subtleties in economic issues motivates drawing meaningful conclusions in the income domain. Hence the focus is on bringing out unique insights into the financial status of the people living in the country. The conclusions delineated might aid in making wiser decisions regarding the economic growth of the country. Using relevant attributes, demographic graphs are plotted to support the conclusions drawn. Classification into various economic classes is also performed using well-known classifiers. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 223-240 Issue: 3 Volume: 10 Year: 2018 Keywords: demographic graphs; Benford's law; income; K-means clustering; Naive Bayes classifier. File-URL: http://www.inderscience.com/link.php?id=94133 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:223-240 Template-Type: ReDIF-Article 1.0 Author-Name: Brojo Kishore Mishra Author-X-Name-First: Brojo Kishore Author-X-Name-Last: Mishra Author-Name: Abhaya Kumar Sahoo Author-X-Name-First: Abhaya Kumar Author-X-Name-Last: Sahoo Author-Name: Chittaranjan Pradhan Author-X-Name-First: Chittaranjan Author-X-Name-Last: Pradhan Title: GPU based reduce approach for computing faculty performance evaluation process using classification technique in opinion mining Abstract: In today's competitive market, the education system plays a major role in creating better students. To create better students, the main focus is on the quality of teaching.
That quality can be achieved through better coordination between faculty and students. To obtain better quality of teaching, faculty performance should be measured by feedback analysis. The performance of faculty should be evaluated so that educational quality can be enhanced. Here we use opinion mining, through which a large amount of data is available in the form of reviews, opinions, feedback, remarks, observations, comments, explanations and clarifications. We collected feedback about faculty from students through a feedback form. To measure the performance of faculty, we use a classification technique based on opinion mining. We also run this technique on graphics processing unit (GPU) architecture using the compute unified device architecture using C (CUDA-C) programming model as well as the map reduce programming model to evaluate the performance of a faculty member. We then compare the GPU-based reduce approach and the map reduce approach to obtain faster results. This paper uses GPU architecture for CUDA-C programming and the Hadoop framework tool for map reduce programming for faster computation of the faculty performance evaluation. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 208-222 Issue: 3 Volume: 10 Year: 2018 Keywords: classification; compute unified device architecture using C; CUDA-C; education system; feedback; graphics processing unit; GPU; Hadoop; map reduce; opinion mining. File-URL: http://www.inderscience.com/link.php?id=94134 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:208-222 Template-Type: ReDIF-Article 1.0 Author-Name: Paweł Bartoszczuk Author-X-Name-First: Paweł Author-X-Name-Last: Bartoszczuk Title: The risk in eco-innovation introduction at the enterprises Abstract: The goal of this paper is to present the risk of eco-innovation implementation at enterprises.
Eco-innovation is a relatively recent term and can be a means of solving the environmental problems that emerge as consequences of economic growth. As with any innovation, eco-innovations come in several types and can therefore result in a new or significantly improved product (good or service), process, or new marketing or organisational method. Eco-innovation should be seen as an integral part of innovation efforts across all sectors of the economy. European countries face many barriers to the implementation of eco-innovation, mainly associated with high investment risk and limited interest. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 257-272 Issue: 3 Volume: 10 Year: 2018 Keywords: eco-innovation; environment; ecological risk; economy; enterprise. File-URL: http://www.inderscience.com/link.php?id=94135 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:257-272 Template-Type: ReDIF-Article 1.0 Author-Name: Panagiotis Tsarouhas Author-X-Name-First: Panagiotis Author-X-Name-Last: Tsarouhas Title: Reliability, availability and maintainability - RAM analysis of cake production lines: a case study Abstract: In this study, a reliability, availability and maintainability (RAM) analysis was conducted for a cake production line by applying statistical techniques to failure data. Data were collected from the line and analysed over a period of seventeen months. The RAM analysis of the failure data was carried out to provide an estimate of current operations management and to improve line efficiency.
It was found that: (a) the availability of the cake production line was 95.44%, dropping to 93.15% because equipment failures cause additional production gaps in the line; (b) the two machines with the most frequent failures and lowest availability are the forming/dosing machine and the wrapping machine; (c) the worst maintainability occurs at the cooling tower and the oven; (d) the best-fitting distributions for the failure data of the cake production line, together with their parameters, were identified. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 381-405 Issue: 4 Volume: 10 Year: 2018 Keywords: cake production line; reliability; maintainability; performance evaluation; quality; field failure; repair data. File-URL: http://www.inderscience.com/link.php?id=95214 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:381-405 Template-Type: ReDIF-Article 1.0 Author-Name: Bayu Adhi Tama Author-X-Name-First: Bayu Adhi Author-X-Name-Last: Tama Author-Name: Kyung-Hyune Rhee Author-X-Name-First: Kyung-Hyune Author-X-Name-Last: Rhee Title: A comparative study of classifier ensembles for detecting inactive learner in university Abstract: Predicting undesirable learner behaviour is an important issue in distance learning systems as well as in conventional universities. This paper benchmarks ensembles of weak classifiers (decision tree, random forest, logistic regression, and CART) against single-classifier models to predict inactive students. Two real-world datasets were obtained from a distance learning system and a computer science college in Indonesia. To evaluate the performance of the classifier ensembles, several performance metrics were used, such as average accuracy, precision, recall, fall-out, F1, and area under the ROC curve (AUC).
Our experiments reveal that classifier ensembles outperform single classifiers on all evaluation metrics. This study contributes to the literature a comparative study of ensemble learners within educational data mining. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 351-368 Issue: 4 Volume: 10 Year: 2018 Keywords: classifier ensemble; educational data mining; EDM; distance learning; benchmark. File-URL: http://www.inderscience.com/link.php?id=95215 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:351-368 Template-Type: ReDIF-Article 1.0 Author-Name: S. Lovelyn Rose Author-X-Name-First: S. Lovelyn Author-X-Name-Last: Rose Author-Name: R. Venkatesan Author-X-Name-First: R. Author-X-Name-Last: Venkatesan Author-Name: Girish Pasupathy Author-X-Name-First: Girish Author-X-Name-Last: Pasupathy Author-Name: P. Swaradh Author-X-Name-First: P. Author-X-Name-Last: Swaradh Title: A lexicon-based term weighting scheme for emotion identification of tweets Abstract: Detecting emotions in tweets is a huge challenge due to the 140-character limit and the extensive use of Twitter language with evolving terms and slang. This paper uses various preprocessing techniques, forms a feature vector using lexicons, and uses machine learning to classify tweets into Paul Ekman's basic emotions, namely happiness, sadness, anger, fear, disgust and surprise. Preprocessing is done using the dictionaries available for emoticons, interjections and slang, and by handling punctuation marks and hashtags. The feature vector is created by combining words from the NRC Emotion Lexicon, WordNet-Affect and an online thesaurus. Features are assigned weights based on the presence of punctuation and negations, and the tweets are classified using naive Bayes, SVM and random forests.
The use of lexicon features and a novel weighting scheme produced a considerable gain in accuracy, with random forest achieving a maximum accuracy of almost 73%. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 369-380 Issue: 4 Volume: 10 Year: 2018 Keywords: emotion classification; Twitter; preprocessing; feature selection; dictionaries; lexicons; term weighting; random forest; SVM; naive Bayes. File-URL: http://www.inderscience.com/link.php?id=95216 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:369-380 Template-Type: ReDIF-Article 1.0 Author-Name: Sergei P. Sidorov Author-X-Name-First: Sergei P. Author-X-Name-Last: Sidorov Author-Name: Alexey R. Faizliev Author-X-Name-First: Alexey R. Author-X-Name-Last: Faizliev Author-Name: Vladimir A. Balash Author-X-Name-First: Vladimir A. Author-X-Name-Last: Balash Title: A long memory property of economic and financial news flows Abstract: One of the tools for examining processes and time series with self-similarity is the long-range correlation exponent (the Hurst exponent). Many methods for estimating the long-range correlation exponent from experimental time series have been developed in recent years. In this paper we estimate the Hurst exponent of news analytics time series using different methods. We employ the most commonly used estimation methods: fluctuation analysis, detrended fluctuation analysis and detrending moving average analysis. In line with some previous studies, the empirical results show the presence of long-range correlations in the time series of news intensity data. In particular, the paper shows that the long-range dependence behaviour of the news intensity time series in the recent period from 1 January 2015 to 22 September 2015 did not change in comparison with the period from 1 September 2010 to 29 October 2010.
Moreover, changing the news analytics provider and considering more recent data did not significantly affect the estimates of the Hurst exponent. The results show that self-similarity is a stable characteristic of the flow of news information serving the financial industry and stock markets. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 406-420 Issue: 4 Volume: 10 Year: 2018 Keywords: long-range correlation; detrended fluctuation analysis; time series; auto-correlation. File-URL: http://www.inderscience.com/link.php?id=95218 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:406-420