Title: Mining balanced API protocols

Authors: Deng Chen; Yanduo Zhang; Wei Wei; Rongcun Wang; Huabing Zhou; Xun Li; Binbin Qu

Addresses: Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China; Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China; Industrial Robot Engineering Center, Wuhan Institute of Technology, Wuhan, 430205, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China; Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China

Abstract: API protocols are useful in many areas of software engineering, such as software testing, program validation, and software documentation. Mining API protocols based on probabilistic models has proven to be an effective way to obtain protocols automatically. However, this approach yields unbalanced protocols: the probabilities in the mined probabilistic model are unexpectedly extreme, either very high or very low. In this paper, we discuss the unbalanced probability problem and propose to address it by preprocessing the method call sequences used for training. Our method first finds tandem arrays in method call sequences using a suffix tree. It then substitutes each tandem array with a tandem repeat. Because repeated method call subsequences are eliminated, more balanced API protocols can be obtained. To investigate the feasibility and effectiveness of our approach, we implemented it in our previous prototype tool, ISpecMiner, and used the tool to perform a comparison test on several real-world applications. Experimental results show that our approach obtains more balanced API protocols than existing approaches, which is essential for mining valid and precise API protocols.
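
The preprocessing idea in the abstract can be illustrated with a minimal Python sketch. It is not the ISpecMiner implementation and rests on several assumptions: it finds tandem arrays with a simple greedy scan rather than the suffix tree the authors use, it reads "tandem repeat" in the common string-algorithm sense of two adjacent copies (the hypothetical `keep` parameter makes this adjustable), and it estimates a plain first-order Markov model from transition counts. The function names and the example trace are invented for illustration.

```python
from collections import Counter, defaultdict


def collapse_tandem_arrays(calls, keep=2):
    """Shorten every run of >= 2 adjacent copies of a unit to at most `keep` copies.

    Assumption: a greedy left-to-right scan that tries the shortest unit first.
    The paper locates tandem arrays with a suffix tree; this brute-force scan
    only illustrates the effect of the substitution.
    """
    out = []
    i, n = 0, len(calls)
    while i < n:
        collapsed = False
        for period in range(1, (n - i) // 2 + 1):
            unit = calls[i:i + period]
            copies = 1
            # Count how many adjacent copies of `unit` start at position i.
            while calls[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= 2:
                out.extend(unit * min(keep, copies))
                i += copies * period
                collapsed = True
                break
        if not collapsed:
            out.append(calls[i])
            i += 1
    return out


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from call sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


# Hypothetical trace: one open, nine reads in a loop, one close.
trace = ["open"] + ["read"] * 9 + ["close"]

raw = transition_probabilities([trace])
balanced = transition_probabilities([collapse_tandem_arrays(trace)])
```

On this trace, the raw model gives `read -> read` a probability of 8/9 and `read -> close` only 1/9, simply because the loop ran many times. After the run of reads is shortened to two copies, both transitions get probability 0.5, while the self-loop on `read` is still present in the model.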

Keywords: mining API protocol; suffix tree; probability balance; method call sequence; Markov model; tandem array.

DOI: 10.1504/IJCSE.2018.091764

International Journal of Computational Science and Engineering, 2018 Vol.16 No.3, pp.289 - 302

Received: 16 Sep 2015
Accepted: 03 Mar 2016

Published online: 16 May 2018
