Template-Type: ReDIF-Article 1.0
Author-Name: Simone Fiori
Author-X-Name-First: Simone
Author-X-Name-Last: Fiori
Title: A comprehensive comparison of algorithms for the statistical modelling of non-monotone relationships via isotonic regression of transformed data
Abstract: The paper treats the problem of nonlinear, non-monotonic regression of bivariate datasets by means of a statistical regression method known from the literature. In particular, the present paper introduces two new regression methods and illustrates the results of a comprehensive comparison of the performances of the best two previous methods, the two new methods introduced here and as much as ten standard regression methods known from the specialised literature. The comparison is performed over nine different datasets, ranging from electrocardiogram data to text analysis data, by means of four figures of merit, that include regression precision as well as runtime.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 29-57
Issue: 1
Volume: 11
Year: 2019
Keywords: non-monotone nonlinear data-fitting; data transformation; isotonic regression; statistical regression.
File-URL: http://www.inderscience.com/link.php?id=96617
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:29-57

Template-Type: ReDIF-Article 1.0
Author-Name: Ned Kock
Author-X-Name-First: Ned
Author-X-Name-Last: Kock
Title: Factor-based structural equation modelling: going beyond PLS and composites
Abstract: Partial least squares (PLS) methods offer many advantages for path modelling, such as fast convergence to solutions and relaxed requirements in terms of sample size and multivariate normality. However, they do not deal with factors, but with composites. As a result, they typically underestimate path coefficients and overestimate loadings.
Given these, it is difficult to fully justify their use for confirmatory factor analyses or factor-based structural equation modelling (SEM). We addressed this problem through the development of a new method that generates estimates of the true composites and factors, potentially placing researchers in a position where they can obtain consistent estimates of a wide range of model parameters in SEM analyses. A Monte Carlo experiment suggests that this new method represents a solid step in the direction of achieving this ambitious goal.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 1-28
Issue: 1
Volume: 11
Year: 2019
Keywords: partial least squares; PLS; structural equation modelling; measurement error; path bias; variation sharing; Monte Carlo simulation.
File-URL: http://www.inderscience.com/link.php?id=96618
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:1-28

Template-Type: ReDIF-Article 1.0
Author-Name: Rita Yi Man Li
Author-X-Name-First: Rita Yi Man
Author-X-Name-Last: Li
Author-Name: Edward Chi Ho Tang
Author-X-Name-First: Edward Chi Ho
Author-X-Name-Last: Tang
Author-Name: Tat Ho Leung
Author-X-Name-First: Tat Ho
Author-X-Name-Last: Leung
Title: Democracy and economic growth
Abstract: Many nations consider democracy to be an important social value. Nevertheless, does it mean that countries with more democracy are often wealthier? What are the relationships between economic growth and democracy? This research includes 167 countries to study the issue. We employ the data of the democracy index, corruption perception index, inflation, population, number of internet users, balance of trade, foreign direct investment, etc. We have also included sub-indices such as the electoral process and pluralism, functioning of government, political participation, culture, and civil liberties.
An innovative part of the paper is how the corruption perception index has been included in our analysis. Besides, principal component analysis is applied to study the relationship between democracy and economic growth. We conclude that it takes democracy a very long time to affect the macro-economy. The fast pace of change in democracy even harms the macro-economy. If the economy reaches a well-developed stage, the economy will gradually transform into a democratic city automatically in the absence of any external pressure.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 58-80
Issue: 1
Volume: 11
Year: 2019
Keywords: democracy; economic growth; corruption perception index; liberalisation.
File-URL: http://www.inderscience.com/link.php?id=96622
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:58-80

Template-Type: ReDIF-Article 1.0
Author-Name: T. Sheik Yousuf
Author-X-Name-First: T. Sheik
Author-X-Name-Last: Yousuf
Author-Name: M. Indra Devi
Author-X-Name-First: M. Indra
Author-X-Name-Last: Devi
Title: A novel single scan distributed pattern mining algorithm for frequent pattern identification
Abstract: In data mining, the extraction of frequent patterns from large databases is still a challenging and difficult task due to the various drawbacks such as, high response time, communication cost to alleviates such issues, a new algorithm namely single scan distributed pattern mining algorithm (SSDPMA) is proposed in this paper for frequent mining. The frequent patterns are extracted in a single scan of the database. Then, it is split into multiple files, which will be shared to multiple virtual machines (VMs) to store and compute the weight for the distinct records. Then, the support, confidence and threshold values are estimated. If the limit is greater than the given data, the frequent data are mined by using the proposed SSDPMA algorithm.
The experimental results evaluate the performance of the proposed system in terms of response time, message size, execution time, run time and memory usage.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 81-100
Issue: 1
Volume: 11
Year: 2019
Keywords: data mining; frequent pattern mining; single scan distributed pattern mining algorithm; SSDPMA; virtual machine; VM; file split algorithm; item sets; infrequent items; connect 4 dataset.
File-URL: http://www.inderscience.com/link.php?id=96623
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:1:p:81-100

Template-Type: ReDIF-Article 1.0
Author-Name: Tawfik Thelaidjia
Author-X-Name-First: Tawfik
Author-X-Name-Last: Thelaidjia
Author-Name: Abdelkrim Moussaoui
Author-X-Name-First: Abdelkrim
Author-X-Name-Last: Moussaoui
Author-Name: Salah Chenikher
Author-X-Name-First: Salah
Author-X-Name-Last: Chenikher
Title: An effective feature selection method based on maximum class separability for fault diagnosis of ball bearing
Abstract: The paper deals with the development of a novel feature selection approach for bearing fault diagnosis to overcome drawbacks of the distance evaluation technique (DET); one of the well-established feature selection approaches. Its drawbacks are the influence of its effectiveness by the noise and the selection of salient features regardless of the classification system. To overcome these shortcomings, an optimal discrete wavelet transform (DWT) is firstly used to decompose the bearing vibration signal at different decomposition depths to enhance the signal to noise ratio. Then, a combination of DET with binary particle swarm optimisation (BPSO) algorithm and a criterion based on scatter matrices employed as an objective function are suggested to improve the classification performances and to reduce the computational time.
Finally, support vector machine is utilised to automate the identification of different bearing conditions. From the obtained results, the effectiveness of the suggested method is proven.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 115-132
Issue: 2
Volume: 11
Year: 2019
Keywords: ball bearing; binary particle swarm optimisation; BPSO; discrete wavelet transform; DWT; data analysis; distance evaluation technique; DET; fault diagnosis; feature selection; scatter matrices.
File-URL: http://www.inderscience.com/link.php?id=98817
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:115-132

Template-Type: ReDIF-Article 1.0
Author-Name: Dharmendra Singh Rajput
Author-X-Name-First: Dharmendra Singh
Author-X-Name-Last: Rajput
Title: Review on recent developments in frequent itemset based document clustering, its research trends and applications
Abstract: The document data is growing at an exponential rate. It is heterogeneous, dynamic and highly unstructured in nature. These characteristics of document data pose new challenges and opportunities for the development of various models and approaches for documents clustering. Different methods adopted for the development of these models. But these techniques have their advantages and disadvantages. The primary focus of the study is to the analysis of existing methods and approaches for document clustering based on frequent itemsets. Subsequently, this research direction facilitates the exploration of the emerging trends for each extension with applications. In this paper, more than 90 recent (published after 1990) research papers are summarised that are published in various reputed journals like IEEE Transaction, ScienceDirect, Springer-link, ACM and few fundamental authoritative articles.
Journal: Int. J.
of Data Analysis Techniques and Strategies
Pages: 176-195
Issue: 2
Volume: 11
Year: 2019
Keywords: document clustering; association rule mining; unstructured data; uncertain data.
File-URL: http://www.inderscience.com/link.php?id=98818
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:176-195

Template-Type: ReDIF-Article 1.0
Author-Name: Marziye Mirzadeh Tahroodi
Author-X-Name-First: Marziye Mirzadeh
Author-X-Name-Last: Tahroodi
Author-Name: Ali Payan
Author-X-Name-First: Ali
Author-X-Name-Last: Payan
Title: A method to rank the efficient units based on cross efficiency matrix without involving the zero weights
Abstract: One of the basic objections of the previous models of cross efficiency (CE) is the possibility for the weights to equal zero. This case takes place for the inputs and the outputs in the efficient responses in CE models. Therefore, the input and the output weights which equal zero do not play a role in computing the score of the CE. In this paper, to overcome this problem, an idea to prevent the optimal weights to equal zero in the CE method is offered. This new method can be expanded to all CE models. Based on the offered method, a zero-one mixed linear programming problem is proposed to obtain a set of non-zero weights among the optimal solutions of the preliminary CE model. Following, the zero-one mixed linear programming problem is changed into an equivalent linear program. Then, according to a consistent CE matrix the efficient units are ranked. In order to explain the model and indicate its advantage, an example is given.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 101-114
Issue: 2
Volume: 11
Year: 2019
Keywords: ranking; cross efficiency; CE; zero weights; preference matrix; fuzzy preference relation; zero-one mixed linear programming problem.
File-URL: http://www.inderscience.com/link.php?id=98819
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:101-114

Template-Type: ReDIF-Article 1.0
Author-Name: Fahmi Bargui
Author-X-Name-First: Fahmi
Author-X-Name-Last: Bargui
Author-Name: Hanêne Ben-Abdallah
Author-X-Name-First: Hanêne
Author-X-Name-Last: Ben-Abdallah
Author-Name: Jamel Feki
Author-X-Name-First: Jamel
Author-X-Name-Last: Feki
Title: Enhancing the involvement of decision makers in data mart design
Abstract: The design phase of a data warehousing project remains difficult for both decision makers and requirements analysts. In this paper, we tackle this difficulty through two contributions. First, we propose a natural language based and goal-oriented template for requirements specification that includes all concepts of the decision-making process. The use of familiar concepts and natural language makes our template more accessible and helps decision makers in validating the specified requirements, which avoids producing data mart that does not meet their needs. Secondly, we propose a decision-making ontology that provides for a systematic decomposition of decision-making goals, which allows new requirements to emerge. This automatic requirements elicitation helps analysts to overcome their lack of domain knowledge, which avoids producing erroneous requirements.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 148-175
Issue: 2
Volume: 11
Year: 2019
Keywords: decision support system; data warehouse; data mart; requirements engineering; multidimensional modelling; goal-oriented requirements engineering; automatic reasoning; ontology.
File-URL: http://www.inderscience.com/link.php?id=98820
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:148-175

Template-Type: ReDIF-Article 1.0
Author-Name: Naoual El Aboudi
Author-X-Name-First: Naoual El
Author-X-Name-Last: Aboudi
Author-Name: Laila Benhlima
Author-X-Name-First: Laila
Author-X-Name-Last: Benhlima
Title: A new feature subset selection model based on migrating birds optimisation
Abstract: Feature selection represents a fundamental preprocessing phase in machine learning as well as data mining applications. It reduces the dimensionality of feature space by dismissing irrelevant and redundant features, which leads to better classification accuracy and less computational cost. This paper presents a new wrapper feature subset selection model based on a recently designed optimisation technique called migrating birds optimisation (MBO). Initialisation issue regarding MBO is explored to study its implications on the model behaviour by experimenting different initialisation strategies. A neighbourhood based on information gain was designed to improve the search effectiveness. The performance of the proposed model named MBO-FS is compared with some state-of-the-art methods regarding the task of feature selection on 11 UCI datasets. Simulation results show that MBO-FS method achieves promising classification accuracy using a smaller feature set.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 133-147
Issue: 2
Volume: 11
Year: 2019
Keywords: feature selection; migrating birds optimisation; MBO; classification.
File-URL: http://www.inderscience.com/link.php?id=98821
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:2:p:133-147

Template-Type: ReDIF-Article 1.0
Author-Name: Asmaa Benghabrit
Author-X-Name-First: Asmaa
Author-X-Name-Last: Benghabrit
Author-Name: Brahim Ouhbi
Author-X-Name-First: Brahim
Author-X-Name-Last: Ouhbi
Author-Name: Bouchra Frikh
Author-X-Name-First: Bouchra
Author-X-Name-Last: Frikh
Author-Name: El Moukhtar Zemmouri
Author-X-Name-First: El Moukhtar
Author-X-Name-Last: Zemmouri
Author-Name: Hicham Behja
Author-X-Name-First: Hicham
Author-X-Name-Last: Behja
Title: Feature selection methods for document clustering: a comparative study and a hybrid solution
Abstract: The web proliferation makes the exploration and the use of the huge amount of available unstructured text documents challenged, which drives the need of document clustering. Hence, improving the performances of this mechanism by using feature selection seems worth investigation. Therefore, this paper proposes an efficient way to highly benefit from feature selection for document clustering. We first present a review and comparative studies of feature selection methods in order to extract efficient ones. Then we propose a sequential and hybrid combination modes of statistical and semantic techniques in order to benefit from crucial information that each of them provides for document clustering. Extensive experiments prove the benefit of the proposed combination approaches. The performance of document clustering is highest when the measures based on Chi-square statistic and the mutual information are linearly combined. Doing so, it avoids the unwanted correlation that the sequential approach creates between the two treatments.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 246-272
Issue: 3
Volume: 11
Year: 2019
Keywords: document clustering; feature selection; statistical and semantic data analysis; chi-square statistic; mutual information; k-means algorithm; comparative study; hybrid solution.
File-URL: http://www.inderscience.com/link.php?id=101154
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:246-272

Template-Type: ReDIF-Article 1.0
Author-Name: Subramanian Kannimuthu
Author-X-Name-First: Subramanian
Author-X-Name-Last: Kannimuthu
Author-Name: Kandhasamy Premalatha
Author-X-Name-First: Kandhasamy
Author-X-Name-Last: Premalatha
Title: Stellar mass black hole optimisation for utility mining
Abstract: Major challenges in mining high utility itemsets from the transaction databases requires exponential search space and database-dependent minimum utility threshold. The search space is very large because of the large number of distinct items and size of the database. Data analysts need to specifying appropriate minimum utility thresholds for their data mining tasks though they may have no knowledge pertaining to their databases. To get rid of these problems, Stellar mass black hole optimisation (SBO) method is proposed to mine Top-K HUIs from the transaction database without specifying minimum utility threshold. To know the performance of SBO, the experiment results are compared with GA.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 222-245
Issue: 3
Volume: 11
Year: 2019
Keywords: data mining; genetic algorithm; stellar mass black hole optimisation; SBO; high utility itemsets; utility mining.
File-URL: http://www.inderscience.com/link.php?id=101155
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:222-245

Template-Type: ReDIF-Article 1.0
Author-Name: R. Sivaraj
Author-X-Name-First: R.
Author-X-Name-Last: Sivaraj
Author-Name: R. DeviPriya
Author-X-Name-First: R.
Author-X-Name-Last: DeviPriya
Title: Memetic particle swarm optimisation for missing value imputation
Abstract: Incomplete values in databases stand as a major concern for data analysts and many methods have been devised to handle them in different missing scenarios. Many researchers are increasingly using evolutionary algorithms for handling them. In this paper, a memetic algorithm based approach is proposed which integrates the principles of particle swarm optimisation and simulated annealing, a local search method. A novel initialisation strategy for PSO is also proposed in order to seed good particles into the population. Simulated annealing prevents PSO from premature convergence and helps it in reaching global optimum. PSO algorithm exhibits explorative behaviour and SA exhibits exploitative behaviour and serves as the right combination for memetic algorithm implementation. The proposed algorithm is implemented in different datasets to estimate the missing values and the imputation accuracy and the time taken for execution is found to be better than other standard methods.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 273-289
Issue: 3
Volume: 11
Year: 2019
Keywords: memetic algorithm; tournament selection; Bayesian probability; simulated annealing.
File-URL: http://www.inderscience.com/link.php?id=101156
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:273-289

Template-Type: ReDIF-Article 1.0
Author-Name: Safa Bettoumi
Author-X-Name-First: Safa
Author-X-Name-Last: Bettoumi
Author-Name: Chiraz Jlassi
Author-X-Name-First: Chiraz
Author-X-Name-Last: Jlassi
Author-Name: Naet Arous
Author-X-Name-First: Naet
Author-X-Name-Last: Arous
Title: A comparative study of unsupervised image clustering systems
Abstract: The purpose of clustering algorithms is to give sense and extract value from large sets of structured and unstructured data.
Thus, clustering is present in all science areas that use automatic learning. Therefore, we present in this paper a comparative study and an evaluation of different clustering methods proposed in the literature such as prototype based clustering, fuzzy and probabilistic clustering, hierarchical clustering and density based clustering. We present also an analysis of advantages and disadvantages of these clustering methods based essentially on experimentation. Extensive experiments are conducted on three real-world high dimensional datasets to evaluate the potential and the effectiveness of seven well-known methods in terms of accuracy, purity and normalised mutual information.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 197-221
Issue: 3
Volume: 11
Year: 2019
Keywords: unsupervised clustering; density based clustering; partitioning clustering; fuzzy and probabilistic clustering; hierarchical clustering.
File-URL: http://www.inderscience.com/link.php?id=101157
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:3:p:197-221

Template-Type: ReDIF-Article 1.0
Author-Name: Subhas A. Meti
Author-X-Name-First: Subhas A.
Author-X-Name-Last: Meti
Author-Name: V.G. Sangam
Author-X-Name-First: V.G.
Author-X-Name-Last: Sangam
Title: Enhanced auto associative neural network using feed forward neural network: an approach to improve performance of fault detection and analysis
Abstract: Biosensors have played a significant role in many of present day's applications ranging from military applications to healthcare sectors. However, its practicality and robustness in its usage in real time scenario is still a matter of concern. Primarily issues such as prediction of sensor data, noise estimation and channel estimation and most importantly in fault detection and analysis.
In this paper an enhancement is applied to the auto associative neural network (AANN) by considering the cascade feed forward propagation. The residual noise is also computed along with fault detection and analysis of the sensor data. An experimental result shows a significant reduction in the MSE as compared to conventional AANN. The regression based correlation coefficient has improved in the proposed method as compared to conventional AANN.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 291-309
Issue: 4
Volume: 11
Year: 2019
Keywords: WBAN; fault detection and analysis; feed forward neural network; enhanced AANN; residual noise.
File-URL: http://www.inderscience.com/link.php?id=103754
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:291-309

Template-Type: ReDIF-Article 1.0
Author-Name: P.M. Arunkumar
Author-X-Name-First: P.M.
Author-X-Name-Last: Arunkumar
Author-Name: S. Chandramathi
Author-X-Name-First: S.
Author-X-Name-Last: Chandramathi
Author-Name: S. Kannimuthu
Author-X-Name-First: S.
Author-X-Name-Last: Kannimuthu
Title: Sentiment analysis-based framework for assessing internet telemedicine videos
Abstract: Telemedicine services through internet and mobile devices need effective medical video delivery systems. This work describes a novel framework to study the assessment of internet-based telemedicine videos using sentiment analysis. The dataset comprises more than 1,000 text comments of medical experts collected from various medical animation videos of Youtube repository. The proposed framework deploys machine learning classifiers such as Bayes net, KNN, C 4.5 decision tree, support vector machine (SVM) and SVM with particle swarm optimisation (SVM-PSO) to infer opinion mining outputs. The results portray that SVM-PSO classifier performs better in assessing the reviews of medical video content with more than 80% accuracy.
The model's inference of precision and recall values using SVM-PSO algorithm shows 87.8% and 85.57% respectively and henceforth underlines its superiority over other classifiers. The concepts of sentiment analysis can be applied effectively to the web-based user comments of medical videos and the end results can be highly critical to enhance the reputation of telemedicine education across the globe.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 328-336
Issue: 4
Volume: 11
Year: 2019
Keywords: machine learning; telemedicine; medical videos; sentiment analysis; data analysis.
File-URL: http://www.inderscience.com/link.php?id=103755
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:328-336

Template-Type: ReDIF-Article 1.0
Author-Name: Richa Sharma
Author-X-Name-First: Richa
Author-X-Name-Last: Sharma
Author-Name: Shailendra Narayan Singh
Author-X-Name-First: Shailendra Narayan
Author-X-Name-Last: Singh
Author-Name: Sujata Khatri
Author-X-Name-First: Sujata
Author-X-Name-Last: Khatri
Title: Data mining classification techniques - comparison for better accuracy in prediction of cardiovascular disease
Abstract: Cardiovascular disease is a broad term which includes stroke or any disorder in the cardiovascular system that has the heart at its centre. This disease is a critical cause of mortality every year across the globe. Data mining utilises a variety of techniques and algorithms that could help to draw some interesting conclusions about cardiovascular disease. Data mining in healthcare can assist in predicting disease. This study aims to gain knowledge from a heart disease dataset and analyse several data mining classification techniques seeking improved accuracy and a lesser error rate in the results. The data set for the experiment is chosen from the UCI machine learning repository database.
The dataset is analysed using two different data mining tools, i.e., WEKA and Tanagra. The analysis was done using the 10 fold cross validation technique. The results show that the Naive Bayes algorithm and the C-PLS algorithm outperform others with an accuracy of 83.71% and 84.44% respectively.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 356-373
Issue: 4
Volume: 11
Year: 2019
Keywords: data mining; classification techniques; machine learning tools; cardiovascular disease; KNN; Naïve Bayes; C-PLS; decision tree.
File-URL: http://www.inderscience.com/link.php?id=103756
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:356-373

Template-Type: ReDIF-Article 1.0
Author-Name: Hanen Bouali
Author-X-Name-First: Hanen
Author-X-Name-Last: Bouali
Author-Name: Jalel Akaichi
Author-X-Name-First: Jalel
Author-X-Name-Last: Akaichi
Author-Name: Ala Gaaloul
Author-X-Name-First: Ala
Author-X-Name-Last: Gaaloul
Title: Real-time data warehouse loading methodology and architecture: a healthcare use case
Abstract: In the healthcare context, existing systems suffer from the lack of supporting heterogeneity and dynamism. Consequently, resulting from sensors, streaming data brought another dimension to data mining research. This is due to the fact that, in data streams, only a time window is available. Contrary to the traditional data sources, data streams present new characteristics as being continuous, high-volume, open-ended and concept drift. To analyse event streams, data warehouse seems to be the answer to this problematic. However, classical data warehouse does not incorporate the specificity of event streams that are spatial, temporal, semantic and real-time.
For these reasons, we focus inhere on presenting the conceptual modelling, the architecture and loading methodology of the real-time data warehouse by defining a new dimensionality and stereotype for classical data warehouse. To prove the efficiency of our real-time data warehouse, we adapt the model to a medical unit pregnancy care case study which show promising results.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 310-327
Issue: 4
Volume: 11
Year: 2019
Keywords: data warehouse; data analysis; real-time; healthcare.
File-URL: http://www.inderscience.com/link.php?id=103757
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:310-327

Template-Type: ReDIF-Article 1.0
Author-Name: Poornima Mehta
Author-X-Name-First: Poornima
Author-X-Name-Last: Mehta
Author-Name: Satish Chandra
Author-X-Name-First: Satish
Author-X-Name-Last: Chandra
Title: Enhancement of SentiWordNet using contextual valence shifters
Abstract: Sentence structure has a considerable impact on the sentiment polarity of a sentence. In the presence of contextual valence shifters like conjunctions, conditionals and intensifiers some parts of the sentence are more relevant to determine the sentence polarity. In this work we have used valence shifters in sentences to enhance the sentiment lexicon SentiWordNet in a given document set. They have also been used to improve the sentiment analysis at document level. In the near past, micro blogging services like Twitter have become an important data source for sentiment analysis. Tweets, being restricted to 140 characters have slangs, are grammatically incorrect, have spelling mistakes and have informal expressions. The method is aimed at noisy and unstructured data like tweets on which computationally intensive tools like dependency parsers are not very successful.
Our proposed system works better on both noisy (Stanford and airlines datasets of Twitter) and structured (movie review) datasets.
Journal: Int. J. of Data Analysis Techniques and Strategies
Pages: 337-355
Issue: 4
Volume: 11
Year: 2019
Keywords: sentiment analysis; SentiWordNet; contextual valence shifters; micro-blogs; discourse; Twitter; Lexicon enhancement; SentiWordNet enhancement; sentence level polarity.
File-URL: http://www.inderscience.com/link.php?id=103758
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:injdan:v:11:y:2019:i:4:p:337-355