Template-Type: ReDIF-Article 1.0
Author-Name: Simone Fiori
Author-X-Name-First: Simone
Author-X-Name-Last: Fiori
Title: A comprehensive comparison of algorithms for the statistical modelling of non-monotone relationships via isotonic regression of transformed data
Abstract:
The paper treats the problem of nonlinear, non-monotonic regression of bivariate datasets by means of a statistical regression method known from the literature. In particular, the present paper introduces two new regression methods and illustrates the results of a comprehensive comparison of the performances of the best two previous methods, the two new methods introduced here and as many as ten standard regression methods known from the specialised literature. The comparison is performed over nine different datasets, ranging from electrocardiogram data to text analysis data, by means of four figures of merit that include regression precision as well as runtime.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 29-57
Issue: 1
Volume: 11
Year: 2019
Keywords: non-monotone nonlinear data-fitting; data transformation; isotonic regression; statistical regression.
File-URL: http://www.inderscience.com/link.php?id=96617
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:29-57

Template-Type: ReDIF-Article 1.0
Author-Name: Ned Kock
Author-X-Name-First: Ned
Author-X-Name-Last: Kock
Title: Factor-based structural equation modelling: going beyond PLS and composites
Abstract:
Partial least squares (PLS) methods offer many advantages for path modelling, such as fast convergence to solutions and relaxed requirements in terms of sample size and multivariate normality. However, they do not deal with factors, but with composites. As a result, they typically underestimate path coefficients and overestimate loadings. Given these, it is difficult to fully justify their use for confirmatory factor analyses or factor-based structural equation modelling (SEM). We addressed this problem through the development of a new method that generates estimates of the true composites and factors, potentially placing researchers in a position where they can obtain consistent estimates of a wide range of model parameters in SEM analyses. A Monte Carlo experiment suggests that this new method represents a solid step in the direction of achieving this ambitious goal.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 1-28
Issue: 1
Volume: 11
Year: 2019
Keywords: partial least squares; PLS; structural equation modelling; measurement error; path bias; variation sharing; Monte Carlo simulation.
File-URL: http://www.inderscience.com/link.php?id=96618
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:1-28

Template-Type: ReDIF-Article 1.0
Author-Name: Rita Yi Man Li
Author-X-Name-First: Rita Yi Man
Author-X-Name-Last: Li
Author-Name: Edward Chi Ho Tang
Author-X-Name-First: Edward Chi Ho
Author-X-Name-Last: Tang
Author-Name: Tat Ho Leung
Author-X-Name-First: Tat Ho
Author-X-Name-Last: Leung
Title: Democracy and economic growth
Abstract:
Many nations consider democracy to be an important social value. Nevertheless, does it mean that countries with more democracy are often wealthier? What are the relationships between economic growth and democracy? This research includes 167 countries to study the issue. We employ the data of the democracy index, corruption perception index, inflation, population, number of internet users, balance of trade, foreign direct investment, etc. We have also included sub-indices such as the electoral process and pluralism, functioning of government, political participation, culture, and civil liberties. An innovative part of the paper is how the corruption perception index has been included in our analysis. Besides, principal component analysis is applied to study the relationship between democracy and economic growth. We conclude that it takes democracy a very long time to affect the macro-economy. The fast pace of change in democracy even harms the macro-economy. If the economy reaches a well-developed stage, the economy will gradually transform into a democratic city automatically in the absence of any external pressure.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 58-80
Issue: 1
Volume: 11
Year: 2019
Keywords: democracy; economic growth; corruption perception index; liberalisation.
File-URL: http://www.inderscience.com/link.php?id=96622
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:58-80

Template-Type: ReDIF-Article 1.0
Author-Name: T. Sheik Yousuf
Author-X-Name-First: T. Sheik
Author-X-Name-Last: Yousuf
Author-Name: M. Indra Devi
Author-X-Name-First: M. Indra
Author-X-Name-Last: Devi
Title: A novel single scan distributed pattern mining algorithm for frequent pattern identification
Abstract:
In data mining, the extraction of frequent patterns from large databases is still a challenging and difficult task due to drawbacks such as high response time and communication cost. To alleviate such issues, a new algorithm, namely the single scan distributed pattern mining algorithm (SSDPMA), is proposed in this paper for frequent pattern mining. The frequent patterns are extracted in a single scan of the database. Then, it is split into multiple files, which will be shared with multiple virtual machines (VMs) to store and compute the weight for the distinct records. Then, the support, confidence and threshold values are estimated. If the limit is greater than the given data, the frequent data are mined by using the proposed SSDPMA algorithm. The experimental results evaluate the performance of the proposed system in terms of response time, message size, execution time, run time and memory usage.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 81-100
Issue: 1
Volume: 11
Year: 2019
Keywords: data mining; frequent pattern mining; single scan distributed pattern mining algorithm; SSDPMA; virtual machine; VM; file split algorithm; item sets; infrequent items; connect 4 dataset.
File-URL: http://www.inderscience.com/link.php?id=96623
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:81-100

Template-Type: ReDIF-Article 1.0
Author-Name: Tawfik Thelaidjia
Author-X-Name-First: Tawfik
Author-X-Name-Last: Thelaidjia
Author-Name: Abdelkrim Moussaoui
Author-X-Name-First: Abdelkrim
Author-X-Name-Last: Moussaoui
Author-Name: Salah Chenikher
Author-X-Name-First: Salah
Author-X-Name-Last: Chenikher
Title: An effective feature selection method based on maximum class separability for fault diagnosis of ball bearing
Abstract:
The paper deals with the development of a novel feature selection approach for bearing fault diagnosis to overcome drawbacks of the distance evaluation technique (DET), one of the well-established feature selection approaches. Its drawbacks are the sensitivity of its effectiveness to noise and the selection of salient features regardless of the classification system. To overcome these shortcomings, an optimal discrete wavelet transform (DWT) is first used to decompose the bearing vibration signal at different decomposition depths to enhance the signal-to-noise ratio. Then, a combination of DET with the binary particle swarm optimisation (BPSO) algorithm, with a criterion based on scatter matrices employed as an objective function, is suggested to improve the classification performance and to reduce the computational time. Finally, a support vector machine is utilised to automate the identification of different bearing conditions. The obtained results demonstrate the effectiveness of the suggested method.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 115-132
Issue: 2
Volume: 11
Year: 2019
Keywords: ball bearing; binary particle swarm optimisation; BPSO; discrete wavelet transform; DWT; data analysis; distance evaluation technique; DET; fault diagnosis; feature selection; scatter matrices.
File-URL: http://www.inderscience.com/link.php?id=98817
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:115-132

Template-Type: ReDIF-Article 1.0
Author-Name: Dharmendra Singh Rajput
Author-X-Name-First: Dharmendra Singh
Author-X-Name-Last: Rajput
Title: Review on recent developments in frequent itemset based document clustering, its research trends and applications
Abstract:
Document data is growing at an exponential rate. It is heterogeneous, dynamic and highly unstructured in nature. These characteristics of document data pose new challenges and opportunities for the development of various models and approaches for document clustering. Different methods have been adopted for the development of these models, but these techniques have their advantages and disadvantages. The primary focus of the study is the analysis of existing methods and approaches for document clustering based on frequent itemsets. Subsequently, this research direction facilitates the exploration of the emerging trends for each extension with applications. In this paper, more than 90 recent (published after 1990) research papers are summarised that are published in various reputed journals like IEEE Transaction, ScienceDirect, Springer-link, ACM and a few fundamental authoritative articles.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 176-195
Issue: 2
Volume: 11
Year: 2019
Keywords: document clustering; association rule mining; unstructured data; uncertain data.
File-URL: http://www.inderscience.com/link.php?id=98818
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:176-195

Template-Type: ReDIF-Article 1.0
Author-Name: Marziye Mirzadeh Tahroodi
Author-X-Name-First: Marziye Mirzadeh
Author-X-Name-Last: Tahroodi
Author-Name: Ali Payan
Author-X-Name-First: Ali
Author-X-Name-Last: Payan
Title: A method to rank the efficient units based on cross efficiency matrix without involving the zero weights
Abstract:
One of the basic objections to the previous models of cross efficiency (CE) is the possibility for the weights to equal zero. This case takes place for the inputs and the outputs in the efficient responses in CE models. Therefore, the input and the output weights which equal zero do not play a role in computing the score of the CE. In this paper, to overcome this problem, an idea to prevent the optimal weights from equalling zero in the CE method is offered. This new method can be expanded to all CE models. Based on the offered method, a zero-one mixed linear programming problem is proposed to obtain a set of non-zero weights among the optimal solutions of the preliminary CE model. Subsequently, the zero-one mixed linear programming problem is changed into an equivalent linear program. Then, according to a consistent CE matrix, the efficient units are ranked. In order to explain the model and indicate its advantage, an example is given.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 101-114
Issue: 2
Volume: 11
Year: 2019
Keywords: ranking; cross efficiency; CE; zero weights; preference matrix; fuzzy preference relation; zero-one mixed linear programming problem.
File-URL: http://www.inderscience.com/link.php?id=98819
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:101-114

Template-Type: ReDIF-Article 1.0
Author-Name: Fahmi Bargui
Author-X-Name-First: Fahmi
Author-X-Name-Last: Bargui
Author-Name: Hanêne Ben-Abdallah
Author-X-Name-First: Hanêne
Author-X-Name-Last: Ben-Abdallah
Author-Name: Jamel Feki
Author-X-Name-First: Jamel
Author-X-Name-Last: Feki
Title: Enhancing the involvement of decision makers in data mart design
Abstract:
The design phase of a data warehousing project remains difficult for both decision makers and requirements analysts. In this paper, we tackle this difficulty through two contributions. First, we propose a natural language based and goal-oriented template for requirements specification that includes all concepts of the decision-making process. The use of familiar concepts and natural language makes our template more accessible and helps decision makers in validating the specified requirements, which avoids producing a data mart that does not meet their needs. Secondly, we propose a decision-making ontology that provides for a systematic decomposition of decision-making goals, which allows new requirements to emerge. This automatic requirements elicitation helps analysts to overcome their lack of domain knowledge, which avoids producing erroneous requirements.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 148-175
Issue: 2
Volume: 11
Year: 2019
Keywords: decision support system; data warehouse; data mart; requirements engineering; multidimensional modelling; goal-oriented requirements engineering; automatic reasoning; ontology.
File-URL: http://www.inderscience.com/link.php?id=98820
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:148-175

Template-Type: ReDIF-Article 1.0
Author-Name: Naoual El Aboudi
Author-X-Name-First: Naoual El
Author-X-Name-Last: Aboudi
Author-Name: Laila Benhlima
Author-X-Name-First: Laila
Author-X-Name-Last: Benhlima
Title: A new feature subset selection model based on migrating birds optimisation
Abstract:
Feature selection represents a fundamental preprocessing phase in machine learning as well as data mining applications. It reduces the dimensionality of feature space by dismissing irrelevant and redundant features, which leads to better classification accuracy and less computational cost. This paper presents a new wrapper feature subset selection model based on a recently designed optimisation technique called migrating birds optimisation (MBO). The initialisation issue regarding MBO is explored to study its implications on the model behaviour by experimenting with different initialisation strategies. A neighbourhood based on information gain was designed to improve the search effectiveness. The performance of the proposed model, named MBO-FS, is compared with some state-of-the-art methods regarding the task of feature selection on 11 UCI datasets. Simulation results show that the MBO-FS method achieves promising classification accuracy using a smaller feature set.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 133-147
Issue: 2
Volume: 11
Year: 2019
Keywords: feature selection; migrating birds optimisation; MBO; classification.
File-URL: http://www.inderscience.com/link.php?id=98821
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:133-147

Template-Type: ReDIF-Article 1.0
Author-Name: Asmaa Benghabrit
Author-X-Name-First: Asmaa
Author-X-Name-Last: Benghabrit
Author-Name: Brahim Ouhbi
Author-X-Name-First: Brahim
Author-X-Name-Last: Ouhbi
Author-Name: Bouchra Frikh
Author-X-Name-First: Bouchra
Author-X-Name-Last: Frikh
Author-Name: El Moukhtar Zemmouri
Author-X-Name-First: El Moukhtar
Author-X-Name-Last: Zemmouri
Author-Name: Hicham Behja
Author-X-Name-First: Hicham
Author-X-Name-Last: Behja
Title: Feature selection methods for document clustering: a comparative study and a hybrid solution
Abstract:
The proliferation of the web makes the exploration and use of the huge amount of available unstructured text documents challenging, which drives the need for document clustering. Hence, improving the performance of this mechanism by using feature selection seems worth investigating. Therefore, this paper proposes an efficient way to benefit from feature selection for document clustering. We first present a review and comparative studies of feature selection methods in order to extract efficient ones. Then we propose sequential and hybrid combination modes of statistical and semantic techniques in order to benefit from the crucial information that each of them provides for document clustering. Extensive experiments demonstrate the benefit of the proposed combination approaches. The performance of document clustering is highest when the measures based on the Chi-square statistic and the mutual information are linearly combined. Doing so avoids the unwanted correlation that the sequential approach creates between the two treatments.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 246-272
Issue: 3
Volume: 11
Year: 2019
Keywords: document clustering; feature selection; statistical and semantic data analysis; chi-square statistic; mutual information; k-means algorithm; comparative study; hybrid solution.
File-URL: http://www.inderscience.com/link.php?id=101154
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:246-272

Template-Type: ReDIF-Article 1.0
Author-Name: Subramanian Kannimuthu
Author-X-Name-First: Subramanian
Author-X-Name-Last: Kannimuthu
Author-Name: Kandhasamy Premalatha
Author-X-Name-First: Kandhasamy
Author-X-Name-Last: Premalatha
Title: Stellar mass black hole optimisation for utility mining
Abstract:
Major challenges in mining high utility itemsets (HUIs) from transaction databases are the exponential search space and the database-dependent minimum utility threshold. The search space is very large because of the large number of distinct items and the size of the database. Data analysts need to specify appropriate minimum utility thresholds for their data mining tasks though they may have no knowledge pertaining to their databases. To overcome these problems, the Stellar mass black hole optimisation (SBO) method is proposed to mine Top-K HUIs from the transaction database without specifying a minimum utility threshold. To assess the performance of SBO, the experimental results are compared with GA.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 222-245
Issue: 3
Volume: 11
Year: 2019
Keywords: data mining; genetic algorithm; stellar mass black hole optimisation; SBO; high utility itemsets; utility mining.
File-URL: http://www.inderscience.com/link.php?id=101155
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:222-245

Template-Type: ReDIF-Article 1.0
Author-Name: R. Sivaraj
Author-X-Name-First: R.
Author-X-Name-Last: Sivaraj
Author-Name: R. DeviPriya
Author-X-Name-First: R.
Author-X-Name-Last: DeviPriya
Title: Memetic particle swarm optimisation for missing value imputation
Abstract:
Incomplete values in databases stand as a major concern for data analysts and many methods have been devised to handle them in different missing scenarios. Many researchers are increasingly using evolutionary algorithms for handling them. In this paper, a memetic algorithm based approach is proposed which integrates the principles of particle swarm optimisation (PSO) and simulated annealing (SA), a local search method. A novel initialisation strategy for PSO is also proposed in order to seed good particles into the population. Simulated annealing prevents PSO from premature convergence and helps it in reaching the global optimum. The PSO algorithm exhibits explorative behaviour and SA exhibits exploitative behaviour, which makes them the right combination for memetic algorithm implementation. The proposed algorithm is implemented in different datasets to estimate the missing values, and the imputation accuracy and the time taken for execution are found to be better than other standard methods.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 273-289
Issue: 3
Volume: 11
Year: 2019
Keywords: memetic algorithm; tournament selection; Bayesian probability; simulated annealing.
File-URL: http://www.inderscience.com/link.php?id=101156
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:273-289

Template-Type: ReDIF-Article 1.0
Author-Name: Safa Bettoumi
Author-X-Name-First: Safa
Author-X-Name-Last: Bettoumi
Author-Name: Chiraz Jlassi
Author-X-Name-First: Chiraz
Author-X-Name-Last: Jlassi
Author-Name: Naet Arous
Author-X-Name-First: Naet
Author-X-Name-Last: Arous
Title: A comparative study of unsupervised image clustering systems
Abstract:
The purpose of clustering algorithms is to make sense of and extract value from large sets of structured and unstructured data. Thus, clustering is present in all science areas that use automatic learning. Therefore, we present in this paper a comparative study and an evaluation of different clustering methods proposed in the literature such as prototype based clustering, fuzzy and probabilistic clustering, hierarchical clustering and density based clustering. We also present an analysis of advantages and disadvantages of these clustering methods based essentially on experimentation. Extensive experiments are conducted on three real-world high dimensional datasets to evaluate the potential and the effectiveness of seven well-known methods in terms of accuracy, purity and normalised mutual information.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 197-221
Issue: 3
Volume: 11
Year: 2019
Keywords: unsupervised clustering; density based clustering; partitioning clustering; fuzzy and probabilistic clustering; hierarchical clustering.
File-URL: http://www.inderscience.com/link.php?id=101157
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:197-221

Template-Type: ReDIF-Article 1.0
Author-Name: Subhas A. Meti
Author-X-Name-First: Subhas A.
Author-X-Name-Last: Meti
Author-Name: V.G. Sangam
Author-X-Name-First: V.G.
Author-X-Name-Last: Sangam
Title: Enhanced auto associative neural network using feed forward neural network: an approach to improve performance of fault detection and analysis
Abstract:
Biosensors have played a significant role in many of present day's applications ranging from military applications to healthcare sectors. However, their practicality and robustness in real-time scenarios are still a matter of concern. The primary issues are prediction of sensor data, noise estimation, channel estimation and, most importantly, fault detection and analysis. In this paper an enhancement is applied to the auto associative neural network (AANN) by considering the cascade feed forward propagation. The residual noise is also computed along with fault detection and analysis of the sensor data. Experimental results show a significant reduction in the MSE as compared to conventional AANN. The regression based correlation coefficient has improved in the proposed method as compared to conventional AANN.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 291-309
Issue: 4
Volume: 11
Year: 2019
Keywords: WBAN; fault detection and analysis; feed forward neural network; enhanced AANN; residual noise.
File-URL: http://www.inderscience.com/link.php?id=103754
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:291-309

Template-Type: ReDIF-Article 1.0
Author-Name: P.M. Arunkumar
Author-X-Name-First: P.M.
Author-X-Name-Last: Arunkumar
Author-Name: S. Chandramathi
Author-X-Name-First: S.
Author-X-Name-Last: Chandramathi
Author-Name: S. Kannimuthu
Author-X-Name-First: S.
Author-X-Name-Last: Kannimuthu
Title: Sentiment analysis-based framework for assessing internet telemedicine videos
Abstract:
Telemedicine services through internet and mobile devices need effective medical video delivery systems. This work describes a novel framework to study the assessment of internet-based telemedicine videos using sentiment analysis. The dataset comprises more than 1,000 text comments of medical experts collected from various medical animation videos in the YouTube repository. The proposed framework deploys machine learning classifiers such as Bayes net, KNN, C4.5 decision tree, support vector machine (SVM) and SVM with particle swarm optimisation (SVM-PSO) to infer opinion mining outputs. The results show that the SVM-PSO classifier performs better in assessing the reviews of medical video content, with more than 80% accuracy. The precision and recall values obtained using the SVM-PSO algorithm are 87.8% and 85.57% respectively, which underlines its superiority over other classifiers. The concepts of sentiment analysis can be applied effectively to the web-based user comments of medical videos and the end results can be highly critical to enhance the reputation of telemedicine education across the globe.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 328-336
Issue: 4
Volume: 11
Year: 2019
Keywords: machine learning; telemedicine; medical videos; sentiment analysis; data analysis.
File-URL: http://www.inderscience.com/link.php?id=103755
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:328-336

Template-Type: ReDIF-Article 1.0
Author-Name: Richa Sharma
Author-X-Name-First: Richa
Author-X-Name-Last: Sharma
Author-Name: Shailendra Narayan Singh
Author-X-Name-First: Shailendra Narayan
Author-X-Name-Last: Singh
Author-Name: Sujata Khatri
Author-X-Name-First: Sujata
Author-X-Name-Last: Khatri
Title: Data mining classification techniques - comparison for better accuracy in prediction of cardiovascular disease
Abstract:
Cardiovascular disease is a broad term which includes stroke or any disorder in the cardiovascular system that has the heart at its centre. This disease is a critical cause of mortality every year across the globe. Data mining utilises a variety of techniques and algorithms that could help to draw some interesting conclusions about cardiovascular disease. Data mining in healthcare can assist in predicting disease. This study aims to gain knowledge from a heart disease dataset and analyse several data mining classification techniques seeking improved accuracy and a lesser error rate in the results. The data set for the experiment is chosen from the UCI machine learning repository database. The dataset is analysed using two different data mining tools, i.e., WEKA and Tanagra. The analysis was done using the 10 fold cross validation technique. The results show that the Naive Bayes algorithm and the C-PLS algorithm outperform others with an accuracy of 83.71% and 84.44% respectively.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 356-373
Issue: 4
Volume: 11
Year: 2019
Keywords: data mining; classification techniques; machine learning tools; cardiovascular disease; KNN; Naïve Bayes; C-PLS; decision tree.
File-URL: http://www.inderscience.com/link.php?id=103756
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:356-373

Template-Type: ReDIF-Article 1.0
Author-Name: Hanen Bouali
Author-X-Name-First: Hanen
Author-X-Name-Last: Bouali
Author-Name: Jalel Akaichi
Author-X-Name-First: Jalel
Author-X-Name-Last: Akaichi
Author-Name: Ala Gaaloul
Author-X-Name-First: Ala
Author-X-Name-Last: Gaaloul
Title: Real-time data warehouse loading methodology and architecture: a healthcare use case
Abstract:
In the healthcare context, existing systems suffer from a lack of support for heterogeneity and dynamism. Moreover, streaming data produced by sensors has brought another dimension to data mining research. This is due to the fact that, in data streams, only a time window is available. Contrary to traditional data sources, data streams present new characteristics such as being continuous, high-volume, open-ended and subject to concept drift. To analyse event streams, a data warehouse seems to be the answer to this problem. However, a classical data warehouse does not incorporate the specificity of event streams that are spatial, temporal, semantic and real-time. For these reasons, we focus here on presenting the conceptual modelling, the architecture and the loading methodology of the real-time data warehouse by defining a new dimensionality and stereotype for the classical data warehouse. To demonstrate the efficiency of our real-time data warehouse, we adapt the model to a medical unit pregnancy care case study, which shows promising results.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 310-327
Issue: 4
Volume: 11
Year: 2019
Keywords: data warehouse; data analysis; real-time; healthcare.
File-URL: http://www.inderscience.com/link.php?id=103757
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:310-327

Template-Type: ReDIF-Article 1.0
Author-Name: Poornima Mehta
Author-X-Name-First: Poornima
Author-X-Name-Last: Mehta
Author-Name: Satish Chandra
Author-X-Name-First: Satish
Author-X-Name-Last: Chandra
Title: Enhancement of SentiWordNet using contextual valence shifters
Abstract:
Sentence structure has a considerable impact on the sentiment polarity of a sentence. In the presence of contextual valence shifters like conjunctions, conditionals and intensifiers, some parts of the sentence are more relevant to determine the sentence polarity. In this work we have used valence shifters in sentences to enhance the sentiment lexicon SentiWordNet in a given document set. They have also been used to improve the sentiment analysis at document level. In the recent past, micro blogging services like Twitter have become an important data source for sentiment analysis. Tweets, being restricted to 140 characters, contain slang, are grammatically incorrect, and have spelling mistakes and informal expressions. The method is aimed at noisy and unstructured data like tweets on which computationally intensive tools like dependency parsers are not very successful. Our proposed system works better on both noisy (Stanford and airlines datasets of Twitter) and structured (movie review) datasets.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 337-355
Issue: 4
Volume: 11
Year: 2019
Keywords: sentiment analysis; SentiWordNet; contextual valence shifters; micro-blogs; discourse; Twitter; Lexicon enhancement; SentiWordNet enhancement; sentence level polarity.
File-URL: http://www.inderscience.com/link.php?id=103758
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:337-355