International Journal of Knowledge Engineering and Soft Data Paradigms (6 papers in press)
Information Criterion-Based Nonhierarchical Clustering
by Isamu Nagai, Katsuyuki Takahashi, Hirokazu Yanagihara
Abstract: In the analysis of actual data, it is important to determine whether there are clusters in the data. This can be done using one of several methods of cluster analysis, which can be roughly divided into hierarchical and nonhierarchical clustering methods.
Nonhierarchical clustering can be applied to more types of data than hierarchical clustering can (see, e.g., Saito and Yadohisa, 2006), and hence, in this paper, we focus on nonhierarchical clustering. In nonhierarchical clustering, the results depend heavily on the number of clusters, and thus it is very important to select an appropriate number of clusters. Bozdogan (1986) and Manning, Raghavan, and Schütze (2009; Section 16.4.1) used formal information criteria, such as Akaike's information criterion (AIC), for selecting the number of clusters. In this paper, we verify through numerical examinations that such formal information criteria work poorly for selecting the number of clusters. Hence, we extend the formal AIC by adding a new penalty term, and search through numerical experiments for an additional penalty with acceptable selection performance.
Keywords: AIC; Cluster analysis; Information criterion; k-means procedure; Multivariate linear regression model; Nonhierarchical clustering.
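The formal AIC selection mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the k-means form of the criterion given in Manning, Raghavan, and Schütze (2009; Section 16.4.1), namely RSS_min(K) + 2MK with M the dimension of the data. The function names and the `extra_penalty` parameter are hypothetical: the abstract does not state the form of the authors' additional penalty term, so `extra_penalty` is only a placeholder and not their method.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans_rss(points, k, iters=100, seed=0):
    """Run Lloyd's k-means once and return the residual sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iters):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def select_k_by_aic(points, k_max, restarts=5, extra_penalty=0.0):
    """Return (best_k, scores) where scores[k] = RSS_min(k) + 2*M*k (+ placeholder)."""
    m = len(points[0])  # M: dimension of the data
    scores = {}
    for k in range(1, k_max + 1):
        # Approximate RSS_min(k) by the best of several random restarts.
        rss_min = min(kmeans_rss(points, k, seed=s) for s in range(restarts))
        # extra_penalty is a hypothetical per-cluster penalty, 0 by default.
        scores[k] = rss_min + 2 * m * k + extra_penalty * k
    return min(scores, key=scores.get), scores
```

Calling `select_k_by_aic(points, k_max)` on a list of equal-length tuples returns the selected number of clusters together with the score for each candidate. Because the criterion adds a fixed penalty to an unscaled RSS, the selected number depends on the scale of the data, which is one reason such a formal criterion can behave poorly in practice.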
A Crowdsourced System for User Studies in Information Extraction
by Zohreh Khojasteh Ghamari
Abstract: In this paper, we take from an Entity Linking (EL) system a set of tweets in which some subsequences of words are annotated with possible meanings/entities, and these entities are linked to several Wikipedia pages. We propose a model that uses crowdsourcing to disambiguate and decide on the correct Wikipedia page to link to a given word/spot. We discuss the importance of crowdsourcing, compare different crowdsourcing systems, and then introduce CrowdFlower and its features in particular. Finally, we analyze the output reports of CrowdFlower and present a novel approach to selecting reliable results. In summary, our observations show that reliable results have a confidence rate over 0.5.
Keywords: Crowdsourcing; Information Extraction; Data Mining.
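The reliability filter reported in the abstract above can be illustrated with a small sketch. It assumes that each judgment carries a contributor trust score and that the confidence of an answer is the trust-weighted share of contributors who chose it. The record layout, the function name, and the example page titles are hypothetical; only the 0.5 threshold comes from the abstract.

```python
from collections import defaultdict

def aggregate(judgments, threshold=0.5):
    """Aggregate crowd judgments per spot.

    judgments: iterable of (spot, candidate_page, trust) tuples (assumed layout).
    Returns {spot: (top_page, confidence, is_reliable)}.
    """
    total_trust = defaultdict(float)
    trust_by_page = defaultdict(lambda: defaultdict(float))
    for spot, page, trust in judgments:
        total_trust[spot] += trust
        trust_by_page[spot][page] += trust

    results = {}
    for spot, pages in trust_by_page.items():
        # Pick the page with the largest summed trust for this spot.
        page, weight = max(pages.items(), key=lambda kv: kv[1])
        # Confidence is the trust-weighted agreement on that page.
        confidence = weight / total_trust[spot] if total_trust[spot] > 0 else 0.0
        # Keep only answers whose confidence is strictly over the threshold.
        results[spot] = (page, confidence, confidence > threshold)
    return results

example = [
    ("Apple", "Apple_Inc.", 0.9),
    ("Apple", "Apple_Inc.", 0.8),
    ("Apple", "Apple", 0.3),
    ("Paris", "Paris", 0.5),
    ("Paris", "Paris_Hilton", 0.5),
]
linked = aggregate(example)
```

In this made-up example the "Apple" spot is kept, because most of the trust weight agrees on one page, while the evenly split "Paris" spot sits exactly at 0.5 and is not treated as reliable.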
Simulation and analysis of DoS attack in cloud environment
by Gayathri Rajakumaran, Neelanarayanan Venkataraman
Abstract: Cloud computing is a dynamic environment in terms of both topology and network technology. The resources in the cloud environment are scalable and are offered to customers on demand, which has led to a drastic increase in the number of customers. The key design features of the internet and the protocols used to access cloud services make it vulnerable to various security issues. Among the critical security issues, Distributed Denial of Service (DDoS) ranks first since it disrupts the availability of cloud services. This paper aims to provide a detection measure for DDoS attacks in the cloud computing environment using Simple Network Management Protocol (SNMP). DDoS attack analysis is carried out with the proposed parameters (network throughput, latency, and delay) specific to the cloud environment, and the results are plotted graphically. A DDoS prevention architecture is also proposed, to be implemented as future work.
Keywords: DDoS; SNMP; SYN-flood attack; Regular traffic analysis.
Exploring the Risk Factors of Top Five Malignancies in Bangladesh
by M. Rashedur Rahman
Abstract: The number of cancer patients in Bangladesh has increased greatly over the last few years. The National Institute of Cancer Research and Hospital (NICRH) confirmed more than 27,000 cancer patients from 2008 to 2010. However, research on cancer patients in the context of Bangladesh is limited. This research aims to explore the key factors related to cancer. The factors include demographic and occupational information as well as the smoking habits of patients. We have generated two Adaptive Neuro Fuzzy Inference System (ANFIS) models, one for female and one for male patients in Bangladesh, and report the relationships of different factors with the five top malignancies in the NICRH data set.
Keywords: Adaptive Neuro Fuzzy Inference System (ANFIS); Fuzzy Inference System (FIS); malignancies; ICD-O; cancer.
A Bayesian approach And Probabilistic Latent variable Clustering Based Web Services Selection
by Vaitheki Kanagaraj, Zayaraz Godandapani
Abstract: Web services are software systems that support interoperable machine-to-machine interaction over a network. The number of services is constantly increasing, and processing the large quantity of data on the web requires an improved service selection and discovery approach. Services need to be recommended on the basis of both functional and non-functional requirements. Extracting user keywords with a lexical analyzer may give better results than traditional keyword-based search. The use of a Bayesian model is efficient for handling the non-missing of services. Probabilistic Latent Variable Clustering (PLVC) enhanced with Bayesian classification may perform better in terms of precision, recall, and F-measure. The quality of the clusters may also be better in terms of purity and entropy for the proposed algorithm.
Keywords: Web service; K-Means; Bayesian Model; Probabilistic Latent Variable Clustering.
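The evaluation measures named in the abstract above (precision, recall, F-measure, purity, and entropy) can be computed as in the following sketch. It uses the standard textbook definitions, which are assumed here because the abstract does not spell them out; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def f_measure(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference class labels.

    Purity is the fraction of items that belong to the majority class of
    their cluster. Entropy is the size-weighted average, over clusters, of
    the base-2 entropy of the class distribution inside each cluster.
    """
    clusters = defaultdict(Counter)
    for cid, label in zip(cluster_ids, class_labels):
        clusters[cid][label] += 1
    n = len(cluster_ids)

    purity = sum(max(counts.values()) for counts in clusters.values()) / n

    entropy = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        h = -sum((v / size) * math.log2(v / size) for v in counts.values())
        entropy += (size / n) * h
    return purity, entropy
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes, which is the sense in which the abstract speaks of cluster quality.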
Hybrid Key Management Scheme for Heterogeneous Wireless Sensor Networks
by Sharmila Ramamurthy
Abstract: A Wireless Sensor Network (WSN) is a large-scale network with thousands of tiny sensor nodes deployed in the field, widely used in real-time applications including the Internet of Things (IoT), smart cards, smart grids, smartphones, and smart cities. The greatest issue in wireless sensor networks is secure key establishment and communication. Key management plays a vital role in providing efficient key establishment in resource-constrained devices. Existing key management techniques have many limitations, such as prior deployment knowledge, transmission range, insecure communication, and node capture by the adversary. In this paper, a hybrid key management scheme is proposed for heterogeneous WSNs. Initially, the keys are generated and pre-distributed to the sensor nodes using a hyper elliptic curve equation. After deployment, the nodes form clusters and establish a symmetric key using the Orthogonal Latin Square method, while the cluster heads and base station use a public key encryption method based on Hyper Elliptic Curve Cryptography (HECC). This symmetric key encryption enhances security with less communication and computational overhead between adjacent nodes in the cluster, improves connectivity and network resilience, and reduces the storage overhead. The simulation results show that the proposed scheme performs better in terms of robustness and connectivity, with lower computational overhead and a reduced key size.
Keywords: wireless sensor network; clustering; key management scheme; orthogonal Latin square; hyper elliptic curve cryptography.
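The combinatorial object behind the Orthogonal Latin Square method named in the abstract above can be sketched as follows. This is only an illustration of the standard construction of mutually orthogonal Latin squares of prime order n, where square a has entry (a*i + j) mod n. How the authors map such squares to symmetric keys is not described in the abstract and is not reproduced here; the function names are hypothetical.

```python
def latin_square(a, n):
    """Square with entry (a*i + j) mod n; a Latin square when n is prime and 1 <= a < n."""
    return [[(a * i + j) % n for j in range(n)] for i in range(n)]

def is_latin(square):
    """Check that every row and every column contains each symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

def are_orthogonal(sq1, sq2):
    """Two squares are orthogonal if superimposing them yields all n*n ordered pairs."""
    n = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

# For prime n = 5 the construction gives n - 1 = 4 mutually orthogonal squares.
n = 5
squares = [latin_square(a, n) for a in range(1, n)]
all_latin = all(is_latin(s) for s in squares)
all_orthogonal = all(
    are_orthogonal(squares[x], squares[y])
    for x in range(len(squares))
    for y in range(x + 1, len(squares))
)
```

The orthogonality property means that any pair of symbols from two different squares identifies exactly one cell, a property that makes such squares a natural building block for schemes in which nodes must derive shared values from small stored data.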