Article: An information network security policy learning algorithm based on Sarsa with optimistic initial values
Journal: International Journal of Computational Science and Engineering (IJCSE) 2019 Vol.19 No.2 pp.189 - 196

Title: An information network security policy learning algorithm based on Sarsa with optimistic initial values

Authors: Fang Wang; Renjun Feng; Haiyan Chen; Wen Wu; Fei Zhu

Addresses: State Grid Jiangsu Electric Power Limited Company, Suzhou Power Supply Branch, No. 555 Laodong Road, Suzhou Jiangsu, 215004, China; State Grid Jiangsu Electric Power Limited Company, Suzhou Power Supply Branch, No. 555 Laodong Road, Suzhou Jiangsu, 215004, China; State Grid Jiangsu Electric Power Limited Company, Suzhou Power Supply Branch, No. 555 Laodong Road, Suzhou Jiangsu, 215004, China; School of Computer Science and Technology, Soochow University, Suzhou Jiangsu, 215006, China; School of Computer Science and Technology, Soochow University, Suzhou Jiangsu, 215006, China

Abstract: With the widespread application of artificial intelligence and automation, a growing number of devices are monitored by computer systems. In many cases, multiple management and control information systems together form a comprehensive information system network. As such networks grow in scale and their topologies become more complex, fixed-mode network control policies, which were designed for small, simple networks and often cannot cope with dynamic environments, are no longer able to handle the security policy task. This paper therefore proposes an online network security policy learning algorithm based on Sarsa with optimistic initial values. The algorithm consists of two parts: a defence agent and an attacking agent. The defence agent learns and improves the system protection policy by fighting against simulated attacks from the attacking agent. It uses the Sarsa method, which exploits historical experience to improve the defence policy online. The use of optimistic initial values reduces the training time.
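The abstract describes the method only at a high level. As a rough illustration, the following is a minimal sketch of tabular Sarsa with optimistic initial values for a defence agent facing a simulated attacker, using the standard update Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)]. Everything concrete in it is a hypothetical assumption, not taken from the paper: the three-host network, the bitmask state, the restore/no-op actions, the randomly acting attacker (the paper's attacking agent may well be more sophisticated), the reward, and all hyperparameters. It is not the authors' implementation.

```python
import random

# HYPOTHETICAL toy network: 3 hosts, state = bitmask of compromised hosts.
# None of these names or numbers come from the paper; they only illustrate
# tabular Sarsa with optimistic initial values in a defender-vs-attacker loop.
N_HOSTS = 3
N_STATES = 1 << N_HOSTS          # 8 possible compromise patterns
NO_OP = N_HOSTS                  # actions 0..N_HOSTS-1 restore a host; N_HOSTS = do nothing
N_ACTIONS = N_HOSTS + 1
P_ATTACK = 0.5                   # assumed success probability of one attack attempt
ACTION_COST = 0.1                # assumed cost of a restore action


def attacker_move(state, rng):
    """Hypothetical attacking agent: targets a random host, succeeds w.p. P_ATTACK."""
    target = rng.randrange(N_HOSTS)
    if rng.random() < P_ATTACK:
        state |= 1 << target
    return state


def step(state, action, rng):
    """Defender acts first, then the attacker; reward penalises compromised hosts."""
    cost = 0.0
    if action != NO_OP:
        state &= ~(1 << action)  # restore (clean) the chosen host
        cost = ACTION_COST
    state = attacker_move(state, rng)
    reward = -float(bin(state).count("1")) - cost
    return state, reward


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action w.p. epsilon, else a greedy one (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train_defender(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.05,
                   q_init=1.0, seed=0):
    """On-policy Sarsa for the defence agent in a continuing (non-episodic) task.

    Every reward is <= 0, so q_init = 1.0 overestimates every action value.
    Untried actions therefore look best, which drives early exploration
    even though epsilon is small.
    """
    rng = random.Random(seed)
    q = [[q_init] * N_ACTIONS for _ in range(N_STATES)]
    s = 0                                    # start with no compromised hosts
    a = epsilon_greedy(q[s], epsilon, rng)
    for _ in range(steps):
        s_next, r = step(s, a, rng)
        a_next = epsilon_greedy(q[s_next], epsilon, rng)
        # Sarsa update: Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)]
        q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])
        s, a = s_next, a_next
    return q


def greedy_policy(q):
    """Greedy action per state (ties broken by lowest action index)."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q = train_defender()
    for state, action in enumerate(greedy_policy(q)):
        label = "no-op" if action == NO_OP else f"restore host {action}"
        print(f"compromised={state:03b} -> {label}")
```

Running the script prints the learned greedy action for each compromise pattern (bit i set means host i is compromised). The design choice worth noting is that optimism, rather than a large epsilon, does most of the exploring: each action is tried until its inflated estimate falls below that of the alternatives, after which the agent can act almost greedily.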

Keywords: information network; optimistic initial values; Sarsa; network defence; risk control.

DOI: 10.1504/IJCSE.2019.100239

International Journal of Computational Science and Engineering, 2019 Vol.19 No.2, pp.189 - 196

Received: 11 Apr 2018
Accepted: 01 Jul 2018
Published online: 20 Jun 2019
