Template-Type: ReDIF-Article 1.0
Author-Name: Agrima Srivastava
Author-X-Name-First: Agrima
Author-X-Name-Last: Srivastava
Author-Name: G. Geethakumari
Author-X-Name-First: G.
Author-X-Name-Last: Geethakumari
Title: Privacy preserving solution to prevent classification inference attacks in online social networks
Abstract:
To improve their business solutions, data holders often release social network data and its structure to third parties. This data undergoes node and attribute anonymisation before its release. This, however, does not protect users from inference attacks, which an untrusted third party or an adversary could carry out by analysing the structure of the graph. It is therefore necessary to anonymise not only the nodes and their attributes but also the edge sets in the released social network graph. Anonymising involves perturbing the actual data, which results in utility loss. Utility and privacy are inversely related, so ensuring both is a challenging task. In this work we have proposed, implemented and verified an efficient utility-based privacy-preserving solution to prevent third-party inference attacks on an online social network graph.
Journal: Int. J. of Data Science
Pages: 31-44
Issue: 1
Volume: 4
Year: 2019
Keywords: privacy; online social networks; privacy preserving data publishing; utility; network classification.
File-URL: http://www.inderscience.com/link.php?id=98357
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:1:p:31-44

Template-Type: ReDIF-Article 1.0
Author-Name: Hasanthi A. Pathberiya
Author-X-Name-First: Hasanthi A.
Author-X-Name-Last: Pathberiya
Author-Name: Chandima D. Tilakaratne
Author-X-Name-First: Chandima D.
Author-X-Name-Last: Tilakaratne
Author-Name: Liwan L. Hansen
Author-X-Name-First: Liwan L.
Author-X-Name-Last: Hansen
Title: An improved algorithm to handle noise objects in the process of clustering
Abstract:
Cluster analysis is considered an approach to unsupervised learning. It aims to recognise hidden grouping structure in a set of objects using a predefined set of rules. Objects with unusual characteristics add noise to the data space. As a result, complexities and misinterpretations in clustering structures arise. This study proposes a novel iterative approach to eradicate the effect of noise objects in the process of deriving clusters of data. The performance of the proposed approach is tested on partitioning, hierarchical and neural network based clustering algorithms using both simulated and standard datasets supplemented with noise. An improvement in the quality of the clustering structure resulting from the proposed approach is observed, compared to that of conventional clustering algorithms.
Journal: Int. J. of Data Science
Pages: 1-17
Issue: 1
Volume: 4
Year: 2019
Keywords: clustering algorithms; handling noise data; mining methods and algorithms; k-means; Ward's method; self organising map.
File-URL: http://www.inderscience.com/link.php?id=98358
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:1:p:1-17

Template-Type: ReDIF-Article 1.0
Author-Name: Afaf G. Bin Saadon
Author-X-Name-First: Afaf G. Bin
Author-X-Name-Last: Saadon
Author-Name: Hoda M.O. Mokhtar
Author-X-Name-First: Hoda M.O.
Author-X-Name-Last: Mokhtar
Title: Survey on iterative and incremental approaches in distributed computing environment
Abstract:
Iterative computation has become increasingly needed for a large and important class of applications such as machine learning and data mining. These iterative applications typically apply computations over large-scale datasets. It is therefore desirable to develop efficient distributed frameworks to process data iteratively. On the other hand, data keeps growing over time as new entries are added and existing entries are deleted or modified. This incremental nature of data makes the previously computed results of iterative applications stale and inaccurate over time. It is hence necessary to periodically refresh the computation so that the new changes can be quickly reflected in the computed results. This paper presents the existing distributed systems that support iterative and incremental computations on large-scale datasets. It describes the main optimisations and features of these systems and identifies their limitations.
Journal: Int. J. of Data Science
Pages: 18-30
Issue: 1
Volume: 4
Year: 2019
Keywords: big data; distributed systems; iterative computation; incremental processing.
File-URL: http://www.inderscience.com/link.php?id=98359
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:1:p:18-30

Template-Type: ReDIF-Article 1.0
Author-Name: Ibrahim Gomaa
Author-X-Name-First: Ibrahim
Author-X-Name-Last: Gomaa
Author-Name: Hoda M.O. Mokhtar
Author-X-Name-First: Hoda M.O.
Author-X-Name-Last: Mokhtar
Title: Continuous skyline queries in distributed environment
Abstract:
With the expanding number of communications from different mobile applications that acquire location information, the demand for continuous skyline queries has increased. In addition, the extremely fast increase in data volume and the mobile applications that deal with such volumes of data (such as check-in recommendation, information services and road network applications) have both driven the need to adapt new processing environments to deal with huge amounts of data. In this paper, we present a number of efficient algorithms for processing continuous skyline queries on large datasets using the MapReduce framework. The main idea of our proposed algorithms is to compute the skyline query only once at the starting position, then update the result as the query point moves rather than computing the skyline from scratch every time. In addition, experiments are conducted which demonstrate the accuracy, performance and efficiency of the proposed algorithms.
Journal: Int. J. of Data Science
Pages: 45-62
Issue: 1
Volume: 4
Year: 2019
Keywords: continuous query processing; moving object; parallel computation; skyline queries; big data management.
File-URL: http://www.inderscience.com/link.php?id=98360
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:1:p:45-62

Template-Type: ReDIF-Article 1.0
Author-Name: Sanjay Chakraborty
Author-X-Name-First: Sanjay
Author-X-Name-Last: Chakraborty
Author-Name: Subham Raj
Author-X-Name-First: Subham
Author-X-Name-Last: Raj
Author-Name: Shreya Garg
Author-X-Name-First: Shreya
Author-X-Name-Last: Garg
Title: Selection of 'K' in K-means clustering using GA and VMA
Abstract:
The K-means algorithm is one of the most widely used partitional clustering algorithms. In spite of several advances in the K-means clustering algorithm, it suffers from drawbacks such as the choice of initial cluster centres and getting stuck in local optima. Poor initial guesses of the cluster centres lead to bad clustering results, and this is one of the major drawbacks of the K-means algorithm. In this paper, a new strategy is proposed in which we blend the K-means algorithm with a genetic algorithm (GA) and a volume metric algorithm (VMA) to predict the best initial cluster centres, which K-means alone does not do. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm on several well-known datasets from the UCI machine learning repository.
Journal: Int. J. of Data Science
Pages: 63-81
Issue: 1
Volume: 4
Year: 2019
Keywords: clustering; initial cluster centres; K-means; GA; VMA; volume metric algorithm.
File-URL: http://www.inderscience.com/link.php?id=98361
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:1:p:63-81

Template-Type: ReDIF-Article 1.0
Author-Name: Eliona Gkika
Author-X-Name-First: Eliona
Author-X-Name-Last: Gkika
Author-Name: Dimosthenis Chochlakis
Author-X-Name-First: Dimosthenis
Author-X-Name-Last: Chochlakis
Author-Name: Yannis Tselentis
Author-X-Name-First: Yannis
Author-X-Name-Last: Tselentis
Author-Name: Constantin Zopounidis
Author-X-Name-First: Constantin
Author-X-Name-Last: Zopounidis
Author-Name: Vassilis S. Kouikoglou
Author-X-Name-First: Vassilis S.
Author-X-Name-Last: Kouikoglou
Author-Name: Kitsos Gkikas
Author-X-Name-First: Kitsos
Author-X-Name-Last: Gkikas
Author-Name: Anna Psaroulaki
Author-X-Name-First: Anna
Author-X-Name-Last: Psaroulaki
Title: A retrospective data analysis of Legionella pneumophila diagnostic procedures and their impact on patients' management: the experience of a rapid point-of-care test
Abstract:
We compare a conventional and a rapid point of care test (POCT) for the diagnosis of Legionella pneumophila, considering various performance criteria. We used data of patients with positive test for L. pneumophila (confirmed cases), registered by the microbiology laboratories of two hospitals in Crete, Greece. Hospital A adopts a conventional, indirect fluorescent-antibody technique and Hospital B uses a urinary antigen POCT. The mean laboratory turnaround time was 4.45 days for the conventional test and 0.11 days for POCT. A total of 24 laboratory positive cases (11 inpatients, 13 outpatients) were identified out of 905 samples taken from 751 people. The mean daily hospitalisation cost per inpatient was €79.86 for Hospital B and €127.45 for Hospital A; for the latter a much higher antibiotic treatment cost/patient was recorded. The analysis suggests that a rapid POCT for L. pneumophila could significantly decrease time to diagnosis, improve treatment and reduce hospitalisation charges.
Journal: Int. J. of Data Science
Pages: 101-116
Issue: 2
Volume: 4
Year: 2019
Keywords: Legionella pneumophila; point of care testing; turnaround time; length of stay; cost reduction.
File-URL: http://www.inderscience.com/link.php?id=100319
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:2:p:101-116

Template-Type: ReDIF-Article 1.0
Author-Name: Yeturu Jahnavi
Author-X-Name-First: Yeturu
Author-X-Name-Last: Jahnavi
Title: Analysis of weather data using various regression algorithms
Abstract:
Weather forecasting is a vital application in meteorology and has been one of the most challenging problems around the world. Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. This is carried out using several regression algorithms. This paper focuses on weather analysis using various regression algorithms in data mining. In this work, linear regression, classification and regression tree, multilayer perceptron neural network and support vector machine (SVM) are used. For weather analysis various primary atmospheric parameters such as average temperature, average pressure and relative humidity are considered. The performance is analysed using various evaluation measures. Evaluation criteria like root mean square error, mean absolute error, relative absolute error and root relative square error are used for measuring the performance of regression algorithms.
Journal: Int. J. of Data Science
Pages: 117-141
Issue: 2
Volume: 4
Year: 2019
Keywords: data mining; weather prediction; regression algorithms.
File-URL: http://www.inderscience.com/link.php?id=100321
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:2:p:117-141

Template-Type: ReDIF-Article 1.0
Author-Name: Kannan Balakrishnan
Author-X-Name-First: Kannan
Author-X-Name-Last: Balakrishnan
Author-Name: Divya Sindhu Lekha
Author-X-Name-First: Divya Sindhu
Author-X-Name-Last: Lekha
Author-Name: R. Sunil Kumar
Author-X-Name-First: R. Sunil
Author-X-Name-Last: Kumar
Title: Analysis of co-authorship network based on some betweenness centrality concepts
Abstract:
Reliant components of a network are the connector nodes which aid in establishing a strongly connected network. The betweenness centrality of a node captures its connecting capability well. We suggest some new betweenness centrality measures which could be useful in analysing the structural connectivity of a network. In this paper we study the behaviour of collaboration in a co-authorship network, namely the NetScience network, from the perspective of these measures. We analyse the network from a micro perspective, where we consider small groups of scientists doing research in a common subdiscipline. We show that each group is formed by the influence of only one or two highly collaborating authors. Another speculation was that even though these authors are highly influential in smaller groups, they do not make a notable contribution to the overall research of the main discipline.
Journal: Int. J. of Data Science
Pages: 162-179
Issue: 2
Volume: 4
Year: 2019
Keywords: complex networks; network centrality; graph theory; betweenness center; collaboration network; co-authorship network.
File-URL: http://www.inderscience.com/link.php?id=100322
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:2:p:162-179

Template-Type: ReDIF-Article 1.0
Author-Name: Anthony T. Odoemena
Author-X-Name-First: Anthony T.
Author-X-Name-Last: Odoemena
Title: An application of the logic of explanatory power in rough set analysis: implications for the classification of decision rules
Abstract:
This paper uses the logic of explanatory power to address the question of uncertain decision rule classification and interpretation in rough set data analysis. A set theoretic configuration of the measure of explanatory power is introduced. The usefulness of the measure is then examined in the context of two datasets - one related to car evaluation and the other related to the provision of extra educational supports. It is found that the explanatory power measure has some interesting properties that enhance the informativeness and interpretation of non-deterministic decision rules. The result of the numerical analysis shows that the explanatory power index is unique. The index can also facilitate the establishment of an objective threshold that determines whether the explanatory relevance of the premise in a given decision rule is positive, negative, or neutral.
Journal: Int. J. of Data Science
Pages: 85-100
Issue: 2
Volume: 4
Year: 2019
Keywords: rough sets; explanatory power; data analysis; decision rules.
File-URL: http://www.inderscience.com/link.php?id=100329
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:2:p:85-100

Template-Type: ReDIF-Article 1.0
Author-Name: Tiffany Maldonado
Author-X-Name-First: Tiffany
Author-X-Name-Last: Maldonado
Author-Name: Ray Qing Cao
Author-X-Name-First: Ray Qing
Author-X-Name-Last: Cao
Author-Name: Lila L. Carden
Author-X-Name-First: Lila L.
Author-X-Name-Last: Carden
Title: Sentiment analysis on organisational resilience
Abstract:
By applying sentiment analysis, we examine how firms can achieve organisational resilience by focusing on two different operational strategies in their responses to adverse events: anticipatory responses or reactionary responses. We examined 210 firms and found that firms that focus on an anticipatory strategy of investing in corporate social responsibility benefited from increased organisational resilience. We also found that firms that adopt a reactionary strategy of risk management practice in their daily operations benefited from increased organisational resilience. Furthermore, our study revealed that firms that focus on the economic and environmental aspects of corporate social responsibility and the risk assessment process benefited from higher levels of organisational resilience.
Journal: Int. J. of Data Science
Pages: 142-161
Issue: 2
Volume: 4
Year: 2019
Keywords: sentiment analysis; text mining; big data; data analytics; organisational resilience; corporate social responsibility.
File-URL: http://www.inderscience.com/link.php?id=100330
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:2:p:142-161

Template-Type: ReDIF-Article 1.0
Author-Name: Obinna Damain Adubisi
Author-X-Name-First: Obinna Damain
Author-X-Name-Last: Adubisi
Author-Name: Ikwuoche John David
Author-X-Name-First: Ikwuoche John
Author-X-Name-Last: David
Author-Name: Ogbaji Eka
Author-X-Name-First: Ogbaji
Author-X-Name-Last: Eka
Author-Name: Awa Erinma Uduma
Author-X-Name-First: Awa Erinma
Author-X-Name-Last: Uduma
Title: State space and Box-Jenkins approaches: a comparison of models prediction performance in finance
Abstract:
This paper describes a study that used data collected from the Central Bank statistical web database system in Nigeria to evaluate and compare the forecasting performance of the nonstationary linear state space model and Box-Jenkins (ARIMA) model at different historic time periods. The comparison uses data series on inflation rates (core and non-core) in Nigeria for a specified period. The performances were evaluated based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square percentage error (RMSPE). The one-year forecast evaluation results indicated that predictions from the nonstationary linear state space model outperformed the seasonal ARIMA model at different time periods. Furthermore, the proposed nonstationary linear state space model captured the dynamic structure of the inflationary series reasonably and requires no new cycle of identification and model estimation given the availability of new data.
Journal: Int. J. of Data Science
Pages: 181-195
Issue: 3
Volume: 4
Year: 2019
Keywords: ARIMA; autoregressive integrated moving average; filtering; inflation rate; smoothing; state space model.
File-URL: http://www.inderscience.com/link.php?id=102789
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:3:p:181-195

Template-Type: ReDIF-Article 1.0
Author-Name: Ritesh Srivastava
Author-X-Name-First: Ritesh
Author-X-Name-Last: Srivastava
Author-Name: Veena Mittal
Author-X-Name-First: Veena
Author-X-Name-Last: Mittal
Title: Most preferable combination of explicit drift detection approaches with different classifiers for mining concept drifting data streams
Abstract:
Sensors in real-world applications are major sources of big data streams with varying underlying data distributions. Continuously generated time-varying data streams are commonly referred to as concept drifting data streams. Many concept drifting data mining algorithms explicitly use drift detection algorithms to ensure that outdated concepts are forgotten and new concepts are learned when drifts occur. In concept drifting data streams, the accuracy of the learner depends on the accuracy of the drift detection algorithm and its promptness in detecting drifts. To maintain consistently high accuracy in the classification of concept drifting data streams, it is very important to understand the preferable combinations of drift detection algorithms with classification algorithms. In order to explore such preferable combinations, this work presents an empirical evaluation of some popular drift detection methods with some state-of-the-art classification algorithms on some standard real-world benchmark datasets.
Journal: Int. J. of Data Science
Pages: 196-214
Issue: 3
Volume: 4
Year: 2019
Keywords: concept drifts; online learning; data stream mining; big data; machine learning; classification; drift detection methods; incremental learning; ensemble.
File-URL: http://www.inderscience.com/link.php?id=102790
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:3:p:196-214

Template-Type: ReDIF-Article 1.0
Author-Name: Margaret F. Shipley
Author-X-Name-First: Margaret F.
Author-X-Name-Last: Shipley
Author-Name: Ray Q. Cao
Author-X-Name-First: Ray Q.
Author-X-Name-Last: Cao
Author-Name: G. Jonathan Davis
Author-X-Name-First: G. Jonathan
Author-X-Name-Last: Davis
Title: Addressing uncertainty in buyer-supplier interfaces by supply chain phase and decision-making level: a fuzzy goal-fitting approach
Abstract:
This exploratory study addresses uncertainty in supply chain management (SCM) interfaces required for effective buyer-supplier partnerships which may be more critical at different phases and organisational levels of decision making. The phases of plan, source, make and deliver, and the operational, tactical and strategic levels of decision making are considered. Fuzzy probabilities of degree of fit to goals set to statistical confidence intervals are applied to survey data from over 400 buyers comparing seven suppliers in the electronics industry. Results showed that the source, plan, and deliver phases at different levels of decision making are to varying degrees important in SCM interfaces. The make phase was less important overall for interfacing. A heuristic is presented for maximising supply chain performance gains with timely attention to partnership interfaces.
Journal: Int. J. of Data Science
Pages: 215-236
Issue: 3
Volume: 4
Year: 2019
Keywords: MCDM; multi-criteria decision making; data analytics; fuzzy sets; supply chain partnership; SCM; supply chain management; supplier selection; data science.
File-URL: http://www.inderscience.com/link.php?id=102791
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:3:p:215-236

Template-Type: ReDIF-Article 1.0
Author-Name: Maliha Momtaz
Author-X-Name-First: Maliha
Author-X-Name-Last: Momtaz
Author-Name: Abu Ahmed Ferdaus
Author-X-Name-First: Abu Ahmed
Author-X-Name-Last: Ferdaus
Author-Name: Chowdhury Farhan Ahmed
Author-X-Name-First: Chowdhury Farhan
Author-X-Name-Last: Ahmed
Author-Name: Mohammad Samiullah
Author-X-Name-First: Mohammad
Author-X-Name-Last: Samiullah
Title: Maximal and closed frequent itemsets mining from uncertain database and data stream
Abstract:
Frequent itemset (FI) mining from uncertain databases is a very popular research area nowadays. Many algorithms have been proposed to mine FIs from uncertain databases. But in the typical FI mining process, all the FIs have to be mined individually, which needs a huge amount of memory. Four trees are proposed in this paper: (i) the maximal frequent itemset from uncertain database (MFU) tree, which contains only the maximal frequent itemsets generated from an uncertain database, (ii) the closed frequent itemset from uncertain database (CFU) tree, which contains only the closed frequent itemsets generated from an uncertain database, (iii) the maximal frequent itemset from uncertain data stream (MFUS) tree, which contains the maximal frequent itemsets generated from an uncertain data stream and (iv) the closed frequent itemset from uncertain data stream (CFUS) tree, which contains the closed frequent itemsets generated from an uncertain data stream. Experimental results are also presented which show that maximal and closed frequent itemset mining requires less time and memory than typical frequent itemset mining.
Journal: Int. J. of Data Science
Pages: 237-259
Issue: 3
Volume: 4
Year: 2019
Keywords: FI; frequent itemset; uncertain database; FU; frequent itemset from uncertain database; MFI; maximal frequent itemset; CFI; closed frequent itemset; MFU; maximal frequent itemset from uncertain database; CFU; closed frequent itemset from uncertain database; FUS; frequent itemset from uncertain data stream; MFUS; maximal frequent itemset from uncertain data stream; CFUS; closed frequent itemset from uncertain data stream.
File-URL: http://www.inderscience.com/link.php?id=102792
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:3:p:237-259

Template-Type: ReDIF-Article 1.0
Author-Name: Sharifah Sakinah Syed Ahmad
Author-X-Name-First: Sharifah Sakinah Syed
Author-X-Name-Last: Ahmad
Author-Name: Anis Naseerah Binti Shaik Osman
Author-X-Name-First: Anis Naseerah Binti Shaik
Author-X-Name-Last: Osman
Author-Name: Halizah Basiron
Author-X-Name-First: Halizah
Author-X-Name-Last: Basiron
Title: The impact of social media on human interaction in an organisation based on real-time social media data
Abstract:
The growth of online social networks around the world has created a new place of interaction and communication among people. Individuals can share their knowledge, opinions, and experiences with one another through the features provided, which has an impact on people's behaviour in terms of interaction, communication and decision making. Twitter is one example of a social media provider that empowers users to send and read short messages called 'tweets'. By trying to connect to the outside world, users may disconnect from the people around them, and this will also affect human interaction in an organisation. This research offers possible insights for organisations to identify the pitfalls and opportunities that lie in the new era of digital human interaction. This research attempts to discuss these issues, drawing on the effect of social media interaction on physical and digital interactions, based on a data science approach.
Journal: Int. J. of Data Science
Pages: 260-271
Issue: 3
Volume: 4
Year: 2019
Keywords: social media; Twitter; data science; human interaction.
File-URL: http://www.inderscience.com/link.php?id=102793
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdsci:v:4:y:2019:i:3:p:260-271