International Journal of Computational Systems Engineering (9 papers in press)
A Survey on Effects of Class Imbalance in Data Pre-processing Stage of Classification Problem
by Nitin Malave, Anant Nimkar
Abstract: Classifier learning on data sets with an imbalanced class distribution is a challenging task, and the imbalance hinders the performance of machine learning algorithms. Imbalance occurs when one class greatly outnumbers another, and such distributions in real-world applications have caught the attention of many researchers. This paper reviews state-of-the-art sampling and ensemble techniques for resolving class imbalance. Classification in the imbalanced domain can be a binary or a multi-class problem; this paper discusses the techniques applied to binary classification. Several further factors, such as the imbalance threshold and inter- or within-class imbalance, make class imbalance a more complex issue. The threshold expresses the significance of the imbalance between classes; a commonly used value is a 1:9 minority-to-majority ratio, although different data sets can have different ratios depending on their distribution. Techniques used throughout the literature to alleviate the class imbalance problem include data sampling, cost-sensitive methods, bagging and boosting, and the literature compares these approaches along with their advantages and disadvantages. The parameters used to evaluate model performance are also reviewed: accuracy is the most widely used evaluation metric in machine learning, but the reviewed work shows that metrics such as precision, recall and AUC-ROC provide sounder statistical measures for evaluating a model. The paper also gives research directions in the domain of class imbalance problems.
Keywords: Machine Learning; Class Imbalance; Rare Event Detection; Classification; Resampling Techniques.
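As a rough illustration of the data-level resampling techniques this survey covers (not any specific method from the paper), the sketch below balances a binary data set by randomly duplicating minority samples, starting from the 1:9 minority-to-majority ratio mentioned in the abstract:

```python
import random

def random_oversample(features, labels, minority=1, seed=0):
    """Duplicate minority-class samples until both classes have equal size.

    A toy illustration of random oversampling, one of the simplest
    data-level techniques for class imbalance.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    deficit = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    new_features = features + [features[i] for i in extra]
    new_labels = labels + [labels[i] for i in extra]
    return new_features, new_labels

# a 1:9 minority-to-majority data set, as in the threshold above
X = [[float(i)] for i in range(10)]
y = [1] + [0] * 9
Xb, yb = random_oversample(X, y)
print(sum(yb), len(yb))  # 9 18 -> classes are now balanced
```

More sophisticated variants surveyed in this area (e.g. SMOTE) synthesize new minority points instead of duplicating existing ones.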
From secured Legacy systems to interoperable services (The careful evolution of the French Tax Administration to provide new possibilities while ensuring the primary tax recovering objective)
by Christophe GAIE
Abstract: The purpose of this paper is twofold. First, the author describes the benefit of opening up Legacy systems in large organizations instead of rebuilding them from scratch. A review of similar approaches in the literature is provided, and a concrete method based on the combination of REST architecture and Legacy systems is proposed. Second, the author provides feedback on the different REST solutions available, to facilitate their use by information technology (IT) architects.
The paper also points out the importance not only of preserving the good functioning of the Legacy heritage but also of migrating applications progressively to modern languages. Admittedly, the existing code is robust, covers the whole business perimeter and is maintained by experts, whereas new technologies may suffer from a lack of stability and/or technical expertise within the organization. This advocates a progressive migration from Legacy to modern applications, especially in the specific context of essential public services.
The paper finally details a method to perform the migration efficiently by introducing data exchange between the Legacy and modern parts of the hybrid architecture during the migration. It also describes a method for selecting an API management solution suited to the particularities of the reader's organization.
Keywords: API management; decoupling; IT migration; webservices; Service-oriented architecture (SOA); REST architecture; Legacy modernization; large organizations.
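To illustrate the general idea of exposing a Legacy system through a REST facade (a generic sketch, not the paper's method; the endpoint path and `legacy_tax_lookup` function are hypothetical), a handler built on Python's standard library might look like this:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def legacy_tax_lookup(taxpayer_id):
    """Hypothetical stand-in for a call into the Legacy back end."""
    return {"taxpayer_id": taxpayer_id, "balance_due": 0.0}

class LegacyFacade(BaseHTTPRequestHandler):
    """Thin REST layer: decouples modern clients from the Legacy core."""

    def do_GET(self):
        # e.g. GET /taxpayers/42 -> JSON produced from the Legacy call
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "taxpayers":
            body = json.dumps(legacy_tax_lookup(parts[1])).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

Running `HTTPServer(("127.0.0.1", 8080), LegacyFacade).serve_forever()` would expose the Legacy call as `GET /taxpayers/<id>`; an API management layer can then sit in front of such endpoints.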
Protecting Child on the Internet using Deep Generative Adversarial Networks
by Sabira Ojagverdiyeva
Abstract: In this paper, a Generative Adversarial Network (GAN) is used to sanitize harmful information. The article proposes an approach consisting of two blocks that sanitize data in order to control children's access to malicious (harmful) information on the Internet. The first block is a generator containing an autoencoder deep neural network, and the second is a discriminator containing a logistic regression classifier. In the proposed approach, the autoencoder inside the generator adds noise to transform sensitive attributes, which are considered dangerous for children, into non-sensitive ones, and the logistic regression inside the discriminator classifies the transformed data. The purpose of the anonymizer (generator) is to minimize the recognition efficiency of the classifier by transforming malicious content into non-malicious content. To preserve the usefulness of the information during the transformation, the privacy and utility rates of the sanitized data are measured, and the expected risks and the optimal trade-off between these two parameters are obtained with a minimax algorithm. In experiments on synthetic data, the classification algorithm recognizes the sensitive class with low accuracy and the non-sensitive class with high accuracy.
Keywords: child protection; data sanitization; autoencoder; deep learning; Generative Adversarial Networks.
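The core adversarial idea can be shown on synthetic 1-D data: noise added by the generator should lower the discriminator's recognition accuracy. This is a heavily simplified sketch (a fixed threshold stands in for the paper's logistic regression, and the data and noise scales are invented), not the authors' implementation:

```python
import random

def sanitize(x, noise_scale, rng):
    """Generator stand-in: perturb the sensitive attribute with Gaussian noise."""
    return x + rng.gauss(0.0, noise_scale)

def discriminate(x):
    """Discriminator stand-in: a fixed threshold classifier."""
    return 1 if x > 0.5 else 0

rng = random.Random(0)
# synthetic 1-D data: sensitive samples (label 1) cluster near 1,
# non-sensitive samples (label 0) near 0
data = [(rng.gauss(1.0, 0.1), 1) for _ in range(500)] + \
       [(rng.gauss(0.0, 0.1), 0) for _ in range(500)]

accuracy = {}
for noise in (0.0, 1.0):
    hits = sum(discriminate(sanitize(x, noise, rng)) == y for x, y in data)
    accuracy[noise] = hits / len(data)
    # a real system balances this privacy gain against utility loss
    # (how far the attribute moved) via the minimax objective
    print(f"noise={noise}: discriminator accuracy {accuracy[noise]:.2f}")
```

With no noise the discriminator is near-perfect; with strong noise its accuracy drops, which is exactly the effect the generator is trained to produce.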
Question Answering System for Agriculture Domain using machine learning techniques: literature survey and challenges
by Prashant Niranjan, Vijay Rajpurohit
Abstract: Natural language processing (NLP) is a branch of artificial intelligence and computer science concerned with the interaction between computers and human languages; its role is to program computers to process and analyze large amounts of human language data. Question answering is an important task in every domain and helps satisfy people's information needs. Nowadays most people are literate and use mobile phones to receive up-to-date information as they require it. A question answering system (QAS) can provide succinct information for the questions asked by a user, returning answers based on rules stored in a database. This survey paper explains what a question answering system is and reviews previous related work with respect to the methods, technologies and approaches used. It identifies research gaps and future scope in the reviewed papers, which helps researchers choose a suitable solution to their problems. Comparative analyses are provided where available.
Keywords: Question-answering; QAS; Natural language processing; Answer Extraction; human-computer interaction; Artificial intelligence.
Some Investigations on cost study for Economic Order Quantity model (EOQ) by quantity declined under time-associated demand and non-steady holding cost
by R.P. Tripathi
Abstract: This investigation is likely to assist manufacturing executives and practitioners in assessing the impact of a variable holding-cost mechanism; in reality, holding cost is always in a fluctuating state. In this investigation, an effort is made to set up EOQ models for time-sensitive demand with varying holding costs. Two models are discussed: (i) time-linked demand with storage-time-linked holding cost, and (ii) time-linked demand with constant holding cost. The purchase cost is assumed constant and linked to the order quantity, and all-units quantity discounts are available. Mathematical analysis is carried out to validate the model proposed in this research, and an algorithm is also presented to determine the optimal procurement quantity that minimizes total cost. We show that the total cost is convex. Numerical examples and a sensitivity study are discussed to learn the effects of a variety of parameters; Mathematica 7.0 software is used to obtain the numerical results.
Keywords: Time-induced demand; variable holding cost; size discount; optimality; EOQ.
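As background for the models above, the classical constant-holding-cost EOQ (the baseline that model (ii) generalizes) admits a closed-form convex minimum. A minimal sketch with illustrative parameter values, not figures from the paper:

```python
import math

def total_cost(Q, demand, order_cost, hold_cost):
    """Annual total cost = ordering cost + holding cost (classical EOQ)."""
    return demand / Q * order_cost + hold_cost * Q / 2

def eoq(demand, order_cost, hold_cost):
    """Classical optimum Q* = sqrt(2DK/h), the minimum of the convex cost."""
    return math.sqrt(2 * demand * order_cost / hold_cost)

# illustrative values: D = 1200 units/year, K = 100 per order, h = 6 per unit/year
D, K, h = 1200.0, 100.0, 6.0
q_star = eoq(D, K, h)
print(q_star, total_cost(q_star, D, K, h))  # 200.0 1200.0
```

Convexity is easy to check numerically: the total cost at any order quantity away from `q_star` (say 150 or 250) exceeds the cost at `q_star`. The paper's time-linked demand and non-steady holding cost replace the constants `D` and `h` with functions of time, which is why it needs a dedicated algorithm rather than this closed form.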
Cost-effective modernization of COBOL legacy applications
by Christophe Gaie, Franck Barbier
Abstract: This paper deals with the problem of reducing technical debt and modernizing legacy applications, particularly the COBOL heritage on mainframe computers (a.k.a. mainframes). The proposed approach aims at offering an efficient solution through a strong link with a cost-effective transformation method.
Indeed, the literature shows both technical and scientific solutions whose economic facet is often totally ignored, letting us believe that redoing everything from scratch might be cheaper, although proof of this is urgently expected and remains an often missing link. It is therefore essential to remember that the cost of transformation is a critical concern, especially when the to-be-transformed applications rely on millions of lines of, very often, odd COBOL.
Keywords: legacy software; software heritage; cost-effective modernization; COBOL; reengineering; code migration.
RELIABILITY ANALYSIS OF (N) CLIENTS SYSTEM UNDER STAR TOPOLOGY AND COPULA LINGUISTIC APPROACH
by Vijayvir Singh, Monika Gahlot
Abstract: This paper deals with the study of a computer laboratory system with n clients under a star topology and a k-out-of-n: G scheme. The system has two servers, one operational and one redundant. The main server connects the clients, and failure of the main server is regarded as a partial failure. The system also has other necessary devices, namely a switch and a router, and failure of the switch or router brings the system into a completely failed state. A partial failure of the main server followed by failure of the redundant server, before the main server is repaired, also brings the system into a completely failed state. The rates of transition from one state to another are constant and assumed to follow an exponential time distribution, while repairs follow two types of distribution. The system is analyzed by employing the supplementary variable technique and the Laplace transform. The traditional measures of system performance in the reliability context are computed for different types of failure and repair, and some reliability parameters corresponding to assumed parameter values are illustrated as examples.
Keywords: Availability; MTSF; Switch failure; Server failure; Router failure; Gumbel- Hougaard family copula distribution.
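For intuition about the k-out-of-n: G scheme above, the textbook reliability of k-out-of-n identical, independent components with exponential lifetimes is a binomial sum (this ignores the servers, repairs and the Gumbel-Hougaard copula coupling treated in the paper; the numbers below are invented):

```python
from math import comb, exp

def k_out_of_n_reliability(k, n, lam, t):
    """P(at least k of n i.i.d. clients are still up at time t),
    each client failing exponentially with rate lam.
    Textbook formula, not the paper's copula-based model."""
    p = exp(-lam * t)  # survival probability of a single client
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. a 3-out-of-5: G client pool, failure rate 0.1 per hour, at t = 2 hours
print(round(k_out_of_n_reliability(3, 5, 0.1, 2.0), 4))  # 0.9555
```

The supplementary variable technique in the paper is needed precisely because repairs make the process non-Markovian, so no closed form this simple applies to the full system.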
Interval Valued Fuzzy Matrix Based Decision Making For Machine Learning Algorithms
by Priya Bhatnagar, Kriti Ohri, Deepak Sukheja
Abstract: Decision making is very important in machine learning and in imparting artificial intelligence to machines that work upon traditional logic theory. It is a process that helps a machine think like a human being, and helps a human being ease his or her difficult decision-making process. Real-world decision-making problems contain uncertainty in the data, which cannot be made as precise as we would like, and interval-valued fuzzy logic is seen to deal well with such imprecise data and give the best outcome.
This paper presents a multi-criteria decision-making approach using interval-valued fuzzy logic through a new operator and a new algorithm, based on various satisfaction parameters of a buyer who wishes to buy a certain item, providing a well-founded choice with a definite result. Finally, the proposed method is described with the help of a case study based on a significant survey.
Keywords: Interval valued fuzzy matrix; Decision making; Algebra.
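The paper's new operator is not reproducible from the abstract, but the standard machinery it builds on, max-min composition of interval-valued fuzzy matrices where every membership degree is an interval [lower, upper], can be sketched as follows (matrix values are illustrative):

```python
def iv_min(a, b):
    """Componentwise min of two interval-valued memberships (l, u)."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def iv_max(a, b):
    """Componentwise max of two interval-valued memberships (l, u)."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def compose(A, B):
    """Max-min composition of two interval-valued fuzzy matrices."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = (0.0, 0.0)
            for k in range(m):
                acc = iv_max(acc, iv_min(A[i][k], B[k][j]))
            C[i][j] = acc
    return C

# two 2x2 interval-valued fuzzy matrices (illustrative values)
A = [[(0.3, 0.5), (0.6, 0.8)],
     [(0.2, 0.4), (0.7, 0.9)]]
B = [[(0.5, 0.7), (0.1, 0.2)],
     [(0.4, 0.6), (0.8, 1.0)]]
print(compose(A, B)[0][0])  # (0.4, 0.6)
```

In a decision-making setting, `A` would typically encode a buyer's interval-valued satisfaction over criteria and `B` the criteria's relation to alternatives; the composed matrix then ranks the alternatives.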
Detecting Gurmukhi and Hindi text object present in images
by Rahul Malik
Abstract: This paper addresses the problem of detecting Gurmukhi and Hindi text objects present in images. Because there are many differences between the features of English and Indian-language scripts (for example, the headline drawn over a text string), existing algorithms cannot be applied directly. In this paper we therefore propose a new region-based bottom-up method to extract text from images, where we first identify elementary substructures using connected components and edges and then merge them successively into larger structures until all text regions are recognized. Text candidates are localized by extracting closed boundaries. Since each character contour has high contrast compared to its neighbors, all character pixels, and several non-character pixels that exhibit excessive neighborhood intensity difference, are localized as text within the edge image. For every localized unit we compute a set of geometric properties (based on height, width and area) and filter out non-text regions. Although this step removes a lot of noise, a second filtering pass is applied based on stroke width, since the stroke width of Hindi and Gurmukhi characters is approximately uniform. Candidates with similar properties are aggregated into chains, based on the observation that the characters of a text string are aligned along a particular direction with similar color, stroke width and size. For performance evaluation, we created our own dataset containing 60 images of varied size, resolution and type, and the performance is evaluated on this dataset, which contains caption, document and scene text images, using different performance metrics, viz. Precision, Recall and F-Score.
Keywords: Gurmukhi; Precision; Recall.
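The evaluation metrics named in the abstract are the standard detection measures; a minimal sketch with hypothetical counts (not results from the paper's 60-image dataset):

```python
def precision_recall_f(tp, fp, fn):
    """Standard detection metrics over true positives, false positives
    and false negatives (detected regions vs. ground-truth regions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# hypothetical counts: 45 correctly detected text regions out of 50
# detections, with 55 ground-truth regions in total
p, r, f = precision_recall_f(tp=45, fp=5, fn=10)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.82 0.86
```

Precision penalizes spurious detections, recall penalizes missed text regions, and the F-score is their harmonic mean, so a detector must do well on both to score highly.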