Template-Type: ReDIF-Article 1.0 Author-Name: Amirhossein Amiri Author-X-Name-First: Amirhossein Author-X-Name-Last: Amiri Author-Name: Mohammad Reza Maleki Author-X-Name-First: Mohammad Reza Author-X-Name-Last: Maleki Author-Name: Fatemeh Sogandi Author-X-Name-First: Fatemeh Author-X-Name-Last: Sogandi Title: Estimating the time of a step change in the multivariate-attribute process mean using ANN and MLE Abstract: In this paper, we consider correlated multivariate-attribute quality characteristics and provide two methods to estimate the time of change in the parameters of the process mean: a modular method based on an artificial neural network (ANN) and a maximum likelihood estimation (MLE) method. We evaluate the performance of the estimators in terms of standard change point estimation criteria and compare them through simulation studies. The results show that the proposed ANN-based model outperforms the MLE approach under most step shifts in the mean vector of the multivariate-attribute process. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 81-98 Issue: 1 Volume: 10 Year: 2018 Keywords: artificial neural network; ANN; step-change point estimation; multivariate-attribute quality characteristics; maximum likelihood estimation; MLE. File-URL: http://www.inderscience.com/link.php?id=90630 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:81-98 Template-Type: ReDIF-Article 1.0 Author-Name: Dhruv Kumar Author-X-Name-First: Dhruv Author-X-Name-Last: Kumar Author-Name: Poonam Goyal Author-X-Name-First: Poonam Author-X-Name-Last: Goyal Author-Name: Navneet Goyal Author-X-Name-First: Navneet Author-X-Name-Last: Goyal Title: An efficient method for batch updates in OPTICS cluster ordering Abstract: DBSCAN is one of the most popular density-based clustering algorithms, but it requires re-clustering the entire dataset whenever the input parameters are changed.
OPTICS overcomes this limitation. In this paper, we propose a batch-wise incremental OPTICS algorithm that performs efficient insertion and deletion of a batch of points in a hierarchical cluster ordering, the output of OPTICS. Only a couple of algorithms on incremental versions of OPTICS are available in the literature. This can be attributed to the sequential access patterns of OPTICS. The existing incremental algorithms address the problem of incrementally updating the hierarchical cluster ordering for point-wise insertion/deletion, but are suitable only for infrequent updates. The proposed incremental OPTICS algorithm performs batch-wise insertions/deletions and is suitable for frequent updates. It produces exactly the same hierarchical cluster ordering as classical OPTICS. Real datasets have been used for experimental evaluation of the proposed algorithm, and the results show a remarkable performance improvement over the classical and other existing incremental OPTICS algorithms. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 57-80 Issue: 1 Volume: 10 Year: 2018 Keywords: OPTICS; incremental clustering; batch updates; density-based clustering. File-URL: http://www.inderscience.com/link.php?id=90631 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:57-80 Template-Type: ReDIF-Article 1.0 Author-Name: Alagirisamy Kamatchi Subbiah Sukumaran Author-X-Name-First: Alagirisamy Kamatchi Subbiah Author-X-Name-Last: Sukumaran Title: The consumer choice between the private doctors and the healthcare clinics Abstract: There are not many studies on healthcare from the point of view of healthcare consumers. The study did not reveal a substantial difference between the two healthcare service providers in the eyes of the consumers on the basis of their preferences for the healthcare attributes included in the study.
Higher-income groups of consumers did not attach importance to the cost of healthcare or to the time spent by the doctors with them. Similarly, the eldest consumers were not much worried about the cost of healthcare. The youngest consumers preferred a 'convenient location'. Middle-aged consumers resemble neither the youngest nor the eldest in their preference for 'convenient location', 'friendly staff' and 'quick appointment'. The study concludes that consumers cannot be segmented into 'doctor consumers' and 'clinic consumers', but they can be segmented on the basis of the demographic characteristics of income and age. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 21-37 Issue: 1 Volume: 10 Year: 2018 Keywords: healthcare; private doctor; healthcare clinic; consumer choice; MANOVA; multiple discriminant analysis; neural networks. File-URL: http://www.inderscience.com/link.php?id=90632 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:21-37 Template-Type: ReDIF-Article 1.0 Author-Name: Mohammad Mehdi Parhizgar Author-X-Name-First: Mohammad Mehdi Author-X-Name-Last: Parhizgar Author-Name: Elham Keshavarz Author-X-Name-First: Elham Author-X-Name-Last: Keshavarz Title: Identifying and prioritisation entrepreneurial behaviour factors using fuzzy AHP approach Abstract: The purpose of this research is to understand how organisations have sustained their growth by applying entrepreneurial behaviour factors. Given the thematic nature of the research model and expert opinion in the oil industry, companies in the oil industry were examined as the statistical population of the current study: 1) an oil pipeline and telecommunication company; 2) an oil products distribution company; 3) the National Gas Company in Semnan and Khorasan provinces.
Thirty experts who were interested in improving the discussion participated in the study. The main tools used for gathering the data in this study were company records and a questionnaire. In this study, the sub-criteria of structural, underlying and behaviour factors were ranked against criteria related to different levels of entrepreneurial behaviour in the oil industry using the fuzzy analytic hierarchy process (FAHP). The results obtained from the fuzzy AHP method indicate that structural factors are more important than underlying and behaviour factors. Within the structural factors, it is concluded that entrepreneurial organisation structure is more important than the other factors. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 38-56 Issue: 1 Volume: 10 Year: 2018 Keywords: entrepreneur; entrepreneurship benefits; entrepreneurial behaviour; fuzzy analytic hierarchy process; FAHP. File-URL: http://www.inderscience.com/link.php?id=90633 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:38-56 Template-Type: ReDIF-Article 1.0 Author-Name: Yu Sang Chang Author-X-Name-First: Yu Sang Author-X-Name-Last: Chang Author-Name: Jinsoo Lee Author-X-Name-First: Jinsoo Author-X-Name-Last: Lee Author-Name: Hyuk Ju Kwon Author-X-Name-First: Hyuk Ju Author-X-Name-Last: Kwon Title: When will the 2015 millennium development goal of infant mortality rate be finally realised? - Projections for 21 OECD countries through 2050 Abstract: According to the United Nations Children's Fund (UNICEF), the number of global infant deaths for those under the age of one year fell from 8.4 million in 1990 to 5.4 million in 2010. However, the declining trend of the infant mortality rate varies significantly from country to country based on the vastly different environmental elements each faces.
This paper attempts to predict the future infant mortality rates of 21 OECD countries through 2015 and 2050 using an experience curve model, and compares the results to two other well-known projections, by the United Nations Population Division and the US Census Bureau, in the context of the millennium development goal targets. The results from all three projections indicate that only one or two countries will meet the two-thirds reduction target of the 2015 millennium development goal. By 2050, four to 18 countries will still not be able to meet the target. Therefore, each country may need to undertake a comprehensive review of its policies and programmes of infant mortality control to generate alternative plans for major improvement. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 1-20 Issue: 1 Volume: 10 Year: 2018 Keywords: child health; health policy; infant mortality rate; IMR; experience curve model; millennium development goals. File-URL: http://www.inderscience.com/link.php?id=90634 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:1:p:1-20 Template-Type: ReDIF-Article 1.0 Author-Name: Andri Mirzal Author-X-Name-First: Andri Author-X-Name-Last: Mirzal Title: Clustering and latent semantic indexing aspects of the non-negative matrix factorisation Abstract: This paper provides theoretical support for the clustering aspect of non-negative matrix factorisation (NMF). By utilising the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Different from previous approaches - which either ignore the non-negativity constraints or assume absolute orthonormality of the coefficient matrix in order to derive the equivalence - our approach takes the non-negativity constraints into account and makes no assumption about the orthonormality of the coefficient matrix.
Thus, not only is the stationary point used in deriving the equivalence guaranteed to lie in NMF's feasible region, but the result is also more realistic, since NMF does not produce an orthonormal matrix. Furthermore, because the clustering capability of a matrix decomposition technique may imply a latent semantic indexing (LSI) aspect, we also study the LSI aspect of NMF. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 153-181 Issue: 2 Volume: 10 Year: 2018 Keywords: bound-constrained optimisation; clustering method; non-negative matrix factorisation; NMF; Karush-Kuhn-Tucker conditions; latent semantic indexing; LSI; singular value decomposition; SVD. File-URL: http://www.inderscience.com/link.php?id=92443 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:153-181 Template-Type: ReDIF-Article 1.0 Author-Name: Shilpa Jain Author-X-Name-First: Shilpa Author-X-Name-Last: Jain Author-Name: Dinesh C.S. Bisht Author-X-Name-First: Dinesh C.S. Author-X-Name-Last: Bisht Author-Name: Prakash Chandra Mathpal Author-X-Name-First: Prakash Chandra Author-X-Name-Last: Mathpal Title: Particle swarm optimised fuzzy method for prediction of water table elevation fluctuation Abstract: Particle swarm optimisation (PSO) is a powerful population-based evolutionary computation technique inspired by simulation of the social behaviour of bird flocking and fish schooling. PSO has been applied successfully to a wide range of applications such as scheduling, artificial neural network (ANN) training, control strategy determination and ingredient mix optimisation. Fuzzy logic can easily cope with vagueness and uncertainty in time series data. In our earlier work, it was applied to the prediction of water table elevation, and the results were quite promising. However, optimising the length of the fuzzy intervals remained a major constraint for researchers.
In this research paper, the optimal length of the fuzzy intervals in the universe of discourse is selected using particle swarm optimisation. The results obtained after applying this combined approach to the prediction of water table elevation are better than those of the previous method. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 99-110 Issue: 2 Volume: 10 Year: 2018 Keywords: fuzzy logic; particle swarm optimisation; PSO; mean square error; water table; forecasting. File-URL: http://www.inderscience.com/link.php?id=92444 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:99-110 Template-Type: ReDIF-Article 1.0 Author-Name: Malaya Dutta Borah Author-X-Name-First: Malaya Dutta Author-X-Name-Last: Borah Author-Name: Rajni Jindal Author-X-Name-First: Rajni Author-X-Name-Last: Jindal Title: An approach for high utility pattern mining Abstract: Mining high utility patterns has become prominent as it provides the semantic significance (utility/weighted patterns) associated with items in a transaction. Data analysis and respective strategies for mining high utility patterns are important in real-world scenarios. Recent research has focused on high utility pattern mining using tree-based data structures, which suffer from greater computation time since they generate multiple tree branches. To cope with these problems, this work proposes a novel binary tree-based data structure with average maximum utility (AvgMU) and a mining algorithm to mine high utility patterns from incremental data, which reduces tree constructions and computation time. The proposed algorithms are implemented using synthetic and real datasets and compared with state-of-the-art tree-based algorithms. Experimental results show that the proposed work performs better in terms of running time, scalability and memory consumption than the other algorithms compared in this research work. Journal: Int. J.
of Data Analysis Techniques and Strategies Pages: 124-152 Issue: 2 Volume: 10 Year: 2018 Keywords: high utility pattern mining; frequent pattern mining; tree-based data structure; incremental mining; data analysis; average maximum utility; AvgMU. File-URL: http://www.inderscience.com/link.php?id=92445 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:124-152 Template-Type: ReDIF-Article 1.0 Author-Name: Manasi Vinayak Harshe Author-X-Name-First: Manasi Vinayak Author-X-Name-Last: Harshe Author-Name: Rajesh H. Kulkarni Author-X-Name-First: Rajesh H. Author-X-Name-Last: Kulkarni Title: Outlier detection using weighted holoentropy with hyperbolic tangent function Abstract: Numerous research works have been carried out in the literature to detect outliers, a.k.a. anomalies. Outlier detection is considered a pre-processing step for locating those objects in a dataset that do not conform to well-defined notions of expected behaviour. In the proposed method, a logistic sigmoid function related to the hyperbolic tangent is used as the weightage function for finding the outlier data points. It distributes the outlier data points effectively compared with the reverse sigmoid function. The method is implemented in four phases. In the first phase, the data are read and dynamic entropy is calculated. In the second phase, probability and dynamic entropy computations using the logistic sigmoid function related to the hyperbolic tangent are performed. In the third phase, the dynamic entropies are sorted and the top N points are selected as outlier data points; finally, the accuracy for correct outliers is computed for the proposed method. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 182-203 Issue: 2 Volume: 10 Year: 2018 Keywords: outlier; holoentropy; weightage function.
File-URL: http://www.inderscience.com/link.php?id=92446 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:182-203 Template-Type: ReDIF-Article 1.0 Author-Name: Narges Sadat Bathaeian Author-X-Name-First: Narges Sadat Author-X-Name-Last: Bathaeian Title: Using imputation algorithms when missing values appear in the test data in contrast with the training data Abstract: Real datasets suffer from the problem of missing data. Imputation is a common solution to this problem. Most research works apply imputation algorithms to the training data. Therefore, the output variable of the samples might influence the imputation model. This paper aims to compare different imputation algorithms when they are applied to test data and training data. In this research, first, the relations between the output variable and different imputation algorithms are investigated. Then six different types of imputation algorithms are applied to both training data and test data. The chosen datasets are publicly available and cover both classification and regression tasks; missing values are injected into them artificially. The results showed that the performance of all algorithms is reduced when the output variable is eliminated. In particular, the decline of the algorithm that uses k nearest neighbours for imputation on the classification datasets is not negligible. Nevertheless, algorithms based on random forests decline less and show better results than the other five types of algorithms. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 111-123 Issue: 2 Volume: 10 Year: 2018 Keywords: missing values; imputation algorithms; regression; kNN; MICE; random forest; tree; EM. File-URL: http://www.inderscience.com/link.php?id=92447 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:2:p:111-123 Template-Type: ReDIF-Article 1.0 Author-Name: Abolfazl Kazemi Author-X-Name-First: Abolfazl Author-X-Name-Last: Kazemi Author-Name: Ghazaleh Khodabandehlouie Author-X-Name-First: Ghazaleh Author-X-Name-Last: Khodabandehlouie Title: A new initialisation method for k-means algorithm in the clustering problem: data analysis Abstract: Clustering is one of the most important tasks in exploratory data analysis. One of the simplest and most widely used clustering algorithms is k-means, which was proposed in 1955. The k-means algorithm is conceptually simple and easy to implement, as evidenced by hundreds of publications over the last 50 years that extend k-means in various ways. Unfortunately, because of its nature, the algorithm is very sensitive to the initial placement of the cluster centres. To address this problem, many initialisation methods (IMs) have been proposed. In this paper, we first provide a historical overview of these methods. Then we present two new non-random initialisation methods for the k-means algorithm. We analyse the experimental results using real datasets, and the performance of the IMs is evaluated by the TOPSIS multi-criteria decision-making method. Finally, we show that not only do well-known k-means IMs often have poor performance, but there are in fact strong alternative approaches. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 291-304 Issue: 3 Volume: 10 Year: 2018 Keywords: clustering; K-means algorithm; cluster centre initialisation; sum of squared error criterion; data analysis. File-URL: http://www.inderscience.com/link.php?id=94127 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:291-304 Template-Type: ReDIF-Article 1.0 Author-Name: Anusuya Kirubakaran Author-X-Name-First: Anusuya Author-X-Name-Last: Kirubakaran Author-Name: M. Aramudhan Author-X-Name-First: M.
Author-X-Name-Last: Aramudhan Title: A watchdog approach - name-matching algorithm for big data risk intelligence Abstract: Even though the modern world is ruled by data and preventive measures are in place to keep data quality high, risk intelligence teams are challenged by the risk analysis task of record linkage on heterogeneous data from multiple data sources, owing to the high ratio of non-standard and poor-quality data present in big data systems caused by the variety of data formats across regions, data platforms, data storage systems, data migration, etc. With these record linkages in mind, in this paper we address the complications of the name matching process irrespective of spelling, structural and phonetic variations. Name matching succeeds when the algorithm is capable of handling names with discrepancies due to naming conventions, cross-language translation, operating system transformation, data migration, batch feeds, typos and other external factors. In this paper, we discuss the varieties of name representation in data sources, along with methods to parse names and find the maximum probability of a name match, comparable to watchdog security, with high accuracy and a reduced false negative rate. The proposed methods can be applied to the financial sector's risk intelligence analyses such as know your customer (KYC), anti-money laundering (AML), customer due diligence (CDD), anti-terrorism, watchlist screening and fraud detection. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 273-290 Issue: 3 Volume: 10 Year: 2018 Keywords: hybrid name matching; string similarity measure; data matching; risk intelligence. File-URL: http://www.inderscience.com/link.php?id=94128 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:273-290 Template-Type: ReDIF-Article 1.0 Author-Name: Hongying Sun Author-X-Name-First: Hongying Author-X-Name-Last: Sun Author-Name: Yu Tian Author-X-Name-First: Yu Author-X-Name-Last: Tian Title: Using improved genetic algorithm under uncertain circumstance of site selection of O2O customer returns Abstract: Online-to-offline (O2O) e-commerce supports online purchase and offline servicing. In recent years, with the growth of online shopping in China, O2O has become a popular new mode of e-commerce. Buying online and returning offline is becoming a dominant shopping mode. Customer returns should be collected and treated in a cost-efficient manner. To this end, this paper proposes an integer programming model to minimise construction costs coupled with operating charges by optimising the sites of reverse logistics for customer returns. To lower storage costs, physical stores and their geographical sites should be far away from residential areas. In addition, this paper designs an improved genetic algorithm with two-stage heredity for solving the model under random circumstances, in which a multilayer reverse logistics network is built for recycling customer returns. Both simulation and numerical examples prove the effectiveness and feasibility of this improved genetic algorithm. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 241-256 Issue: 3 Volume: 10 Year: 2018 Keywords: reverse logistics; site selection; improved genetic algorithm; O2O e-commerce. File-URL: http://www.inderscience.com/link.php?id=94129 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:241-256 Template-Type: ReDIF-Article 1.0 Author-Name: J.V.N. Lakshmi Author-X-Name-First: J.V.N.
Author-X-Name-Last: Lakshmi Title: Data analysis on big data: improving the map and shuffle phases in Hadoop Map Reduce Abstract: Data management has become a challenging issue for network-centric applications which need to process large datasets, and systems require advanced tools to analyse these datasets. As efficient parallel computing programming models, Map Reduce and Hadoop are used for large-scale data analysis. However, Map Reduce still suffers from performance problems; its shuffle phase can be implemented as an individual shuffle service component with an efficient I/O policy. The map phase requires an improvement in its performance, as this phase's output acts as input to the next phase and its result determines overall efficiency, so the map phase needs intermediate checkpoints which regularly monitor all the splits generated by intermediate phases; otherwise it acts as a barrier to effective resource utilisation. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitors the map phase by skew handling and increases resource utilisation in a cluster. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 305-316 Issue: 3 Volume: 10 Year: 2018 Keywords: Map Reduce; Hadoop; shuffle; big data; data analytics; Hadoop distributed file system; HDFS; rack awareness; stragglers; light weight processing; OLAP; OLTP. File-URL: http://www.inderscience.com/link.php?id=94130 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:305-316 Template-Type: ReDIF-Article 1.0 Author-Name: Naveen Dahiya Author-X-Name-First: Naveen Author-X-Name-Last: Dahiya Author-Name: Vishal Bhatnagar Author-X-Name-First: Vishal Author-X-Name-Last: Bhatnagar Author-Name: Manjeet Singh Author-X-Name-First: Manjeet Author-X-Name-Last: Singh Title: A fuzzy-based automatic prediction system for quality evaluation of conceptual data warehouse models Abstract: In this paper, we present an automatic system based on fuzzy logic to predict the understanding time of conceptual data warehouse models. The system takes as input the values of quality metrics for a model and gives the understanding time as output. The metrics used for quality evaluation were proposed and validated by Manuel Serrano. The results of the automatic system are compared with the results of actual data collection performed manually. The predicted results are highly significant and demonstrate the validity and efficiency of the designed automatic system. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 317-333 Issue: 3 Volume: 10 Year: 2018 Keywords: fuzzy logic; quality metrics; data warehouse; understanding time; conceptual models. File-URL: http://www.inderscience.com/link.php?id=94131 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:317-333 Template-Type: ReDIF-Article 1.0 Author-Name: C. Seelammal Author-X-Name-First: C. Author-X-Name-Last: Seelammal Author-Name: K. Vimala Devi Author-X-Name-First: K. Vimala Author-X-Name-Last: Devi Title: Multi-criteria decision support for feature selection in network anomaly detection system Abstract: The growth of computer networks from LAN to the cloud, virtualisation and mobility keeps the intrusion detection system (IDS) a critical component of network security infrastructure.
The tremendous growth and usage of the internet raise concerns about how to protect and communicate digital information in a safe manner. The market for next-generation security solutions is rapidly evolving and constantly changing to accommodate today's threats. Many intrusion detection techniques, methods and algorithms have been implemented to detect these novel attacks, but no clear feature set or uncertainty bounds have been established as a baseline for dynamic environments. The main objective of this paper is to determine and provide the best feature selection for next-generation dynamic environments using multi-criteria decision making and decision tree learning, with emphasis on optimisation (contingency of weight allocation) and handling large datasets. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 334-350 Issue: 3 Volume: 10 Year: 2018 Keywords: intrusion detection; multi-criteria; classification; anomaly; data mining; feature selection; machine learning. File-URL: http://www.inderscience.com/link.php?id=94132 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:334-350 Template-Type: ReDIF-Article 1.0 Author-Name: K.G. Srinivasa Author-X-Name-First: K.G. Author-X-Name-Last: Srinivasa Author-Name: R. Sharath Author-X-Name-First: R. Author-X-Name-Last: Sharath Author-Name: S. Krishna Chaitanya Author-X-Name-First: S. Krishna Author-X-Name-Last: Chaitanya Author-Name: K.N. Nirupam Author-X-Name-First: K.N. Author-X-Name-Last: Nirupam Author-Name: B.J. Sowmya Author-X-Name-First: B.J. Author-X-Name-Last: Sowmya Title: Data analytics on census data to predict the income and economic hierarchy Abstract: The US Census Bureau conducts the American Community Survey, generating a massive dataset with millions of data points.
The rich dataset contains detailed information on approximately 3.5 million households, covering who they are and how they live, including ancestry, education, work, transportation, internet use and residency. This enormous body of data encourages the need to know more about the population and to derive insights from it. The ever-present demand for exposing subtleties in economic issues motivates drawing meaningful conclusions in the income domain. Hence the focus is on bringing out unique insights into the financial status of the people living in the country. The conclusions delineated might aid in making wiser decisions regarding the economic growth of the country. Using relevant attributes, demographic graphs are plotted to support the conclusions drawn. Classification into various economic classes is also performed using well-known classifiers. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 223-240 Issue: 3 Volume: 10 Year: 2018 Keywords: demographic graphs; Benford's law; income; K-means clustering; Naive Bayes classifier. File-URL: http://www.inderscience.com/link.php?id=94133 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:223-240 Template-Type: ReDIF-Article 1.0 Author-Name: Brojo Kishore Mishra Author-X-Name-First: Brojo Kishore Author-X-Name-Last: Mishra Author-Name: Abhaya Kumar Sahoo Author-X-Name-First: Abhaya Kumar Author-X-Name-Last: Sahoo Author-Name: Chittaranjan Pradhan Author-X-Name-First: Chittaranjan Author-X-Name-Last: Pradhan Title: GPU based reduce approach for computing faculty performance evaluation process using classification technique in opinion mining Abstract: In today's competitive market, the education system plays a major role in creating better students. To create better students, the main focus is on the quality of teaching.
That quality can be achieved through better coordination between faculty and students. To obtain better quality of teaching, faculty performance should be measured by feedback analysis. The performance of faculty should be evaluated so that educational quality can be enhanced. Here we use opinion mining, through which a large amount of data is available in the form of reviews, opinions, feedback, remarks, observations, comments, explanations and clarifications. We collected feedback about faculty from students through a feedback form. To measure the performance of faculty, we use a classification technique based on opinion mining. We also run this technique on graphics processing unit (GPU) architecture using the compute unified device architecture using C (CUDA-C) programming model as well as the map reduce programming model to evaluate the performance of a faculty member. We then compare the GPU-based reduce approach and the map reduce approach to obtain faster results. This paper uses GPU architecture for CUDA-C programming and the Hadoop framework tool for map reduce programming for faster computation of the faculty performance evaluation. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 208-222 Issue: 3 Volume: 10 Year: 2018 Keywords: classification; compute unified device architecture using C; CUDA-C; education system; feedback; graphics processing unit; GPU; Hadoop; map reduce; opinion mining. File-URL: http://www.inderscience.com/link.php?id=94134 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:208-222 Template-Type: ReDIF-Article 1.0 Author-Name: Paweł Bartoszczuk Author-X-Name-First: Paweł Author-X-Name-Last: Bartoszczuk Title: The risk in eco-innovation introduction at the enterprises Abstract: The goal of this paper is to present the risk of eco-innovation implementation at enterprises.
Eco-innovation is a relatively recent term and can be a means of solving the environmental problems that emerge as consequences of economic growth. As with any innovation, eco-innovations come in several types and can therefore result in a new or significantly improved product (good or service), process, or new marketing or organisational method. Eco-innovation should be seen as an integral part of innovation efforts across all sectors of the economy. European countries face many barriers to the implementation of eco-innovation, mainly associated with high investment risk and limited interest. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 257-272 Issue: 3 Volume: 10 Year: 2018 Keywords: eco-innovation; environment; ecological risk; economy; enterprise. File-URL: http://www.inderscience.com/link.php?id=94135 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:3:p:257-272 Template-Type: ReDIF-Article 1.0 Author-Name: Panagiotis Tsarouhas Author-X-Name-First: Panagiotis Author-X-Name-Last: Tsarouhas Title: Reliability, availability and maintainability - RAM analysis of cake production lines: a case study Abstract: In this study, a reliability, availability and maintainability (RAM) analysis was conducted for a cake production line by applying statistical techniques to failure data. Data were collected from the line and analysed over a period of seventeen months. The RAM analysis of the failure data was carried out to provide an estimate of current operations management and to improve line efficiency.
It was found that: (a) the availability of the cake production line was 95.44%, dropping to 93.15% because equipment failures cause additional production gaps in the line; (b) the two machines with the most frequent failures and lowest availability are the forming/dosing machine and the wrapping machine; (c) the worst maintainability occurs at the cooling tower and the oven; (d) the best-fitting distributions for the failure data of the cake production line, together with their parameters, were identified. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 381-405 Issue: 4 Volume: 10 Year: 2018 Keywords: cake production line; reliability; maintainability; performance evaluation; quality; field failure; repair data. File-URL: http://www.inderscience.com/link.php?id=95214 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:381-405 Template-Type: ReDIF-Article 1.0 Author-Name: Bayu Adhi Tama Author-X-Name-First: Bayu Adhi Author-X-Name-Last: Tama Author-Name: Kyung-Hyune Rhee Author-X-Name-First: Kyung-Hyune Author-X-Name-Last: Rhee Title: A comparative study of classifier ensembles for detecting inactive learner in university Abstract: Predicting undesirable learner behaviour is an important issue in distance learning systems as well as in conventional universities. This paper benchmarks ensembles of weak classifiers (decision tree, random forest, logistic regression, and CART) against single-classifier models to predict inactive students. Two real-world datasets were obtained from a distance learning system and a computer science college in Indonesia. To evaluate the performance of the classifier ensembles, several performance metrics were used, such as average accuracy, precision, recall, fall-out, F1, and area under the ROC curve (AUC).
Our experiments reveal that classifier ensembles outperform single classifiers on all evaluation metrics. This study contributes to the literature a comparative study of ensemble learners within educational data mining. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 351-368 Issue: 4 Volume: 10 Year: 2018 Keywords: classifier ensemble; educational data mining; EDM; distance learning; benchmark. File-URL: http://www.inderscience.com/link.php?id=95215 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:351-368 Template-Type: ReDIF-Article 1.0 Author-Name: S. Lovelyn Rose Author-X-Name-First: S. Lovelyn Author-X-Name-Last: Rose Author-Name: R. Venkatesan Author-X-Name-First: R. Author-X-Name-Last: Venkatesan Author-Name: Girish Pasupathy Author-X-Name-First: Girish Author-X-Name-Last: Pasupathy Author-Name: P. Swaradh Author-X-Name-First: P. Author-X-Name-Last: Swaradh Title: A lexicon-based term weighting scheme for emotion identification of tweets Abstract: Detecting emotions in tweets is a huge challenge due to the 140-character limit and the extensive use of Twitter language with evolving terms and slang. This paper uses various preprocessing techniques, forms a feature vector using lexicons, and uses machine learning to classify tweets into Paul Ekman's basic emotions, namely happiness, sadness, anger, fear, disgust and surprise. Preprocessing is done using the dictionaries available for emoticons, interjections and slang, and by handling punctuation marks and hashtags. The feature vector is created by combining words from the NRC Emotion Lexicon, WordNet-Affect and an online thesaurus. Features are assigned weights based on the presence of punctuation and negations, and the tweets are classified using naive Bayes, SVM and random forests.
The use of lexicon features and a novel weighting scheme produced a considerable gain in accuracy, with random forest achieving a maximum accuracy of almost 73%. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 369-380 Issue: 4 Volume: 10 Year: 2018 Keywords: emotion classification; Twitter; preprocessing; feature selection; dictionaries; lexicons; term weighting; random forest; SVM; naive Bayes. File-URL: http://www.inderscience.com/link.php?id=95216 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:369-380 Template-Type: ReDIF-Article 1.0 Author-Name: Sergei P. Sidorov Author-X-Name-First: Sergei P. Author-X-Name-Last: Sidorov Author-Name: Alexey R. Faizliev Author-X-Name-First: Alexey R. Author-X-Name-Last: Faizliev Author-Name: Vladimir A. Balash Author-X-Name-First: Vladimir A. Author-X-Name-Last: Balash Title: A long memory property of economic and financial news flows Abstract: One of the tools for examining processes and time series with self-similarity is the long-range correlation exponent (the Hurst exponent). Many methods for estimating the long-range correlation exponent from experimental time series have been developed in recent years. In this paper we estimate the Hurst exponent of news analytics time series using different methods. We employ the most commonly used estimation methods: fluctuation analysis, detrended fluctuation analysis and detrending moving average analysis. In line with some previous studies, the empirical results show the presence of long-range correlations in the time series of news intensity data. In particular, the paper shows that the long-range dependence behaviour of the news intensity time series in the recent period from 1 January 2015 to 22 September 2015 did not change in comparison with the period from 1 September 2010 to 29 October 2010.
Moreover, changing the news analytics provider and considering more recent data did not significantly affect the estimates of the Hurst exponent. The results show that self-similarity is a stable characteristic of the flow of news information serving the financial industry and stock markets. Journal: Int. J. of Data Analysis Techniques and Strategies Pages: 406-420 Issue: 4 Volume: 10 Year: 2018 Keywords: long-range correlation; detrended fluctuation analysis; time series; auto-correlation. File-URL: http://www.inderscience.com/link.php?id=95218 File-Format: text/html File-Restriction: Access to full text is restricted to subscribers. Handle: RePEc:ids:injdan:v:10:y:2018:i:4:p:406-420