Title: A Spark-based parallel genetic algorithm for Bayesian network structure learning

Authors: Naixin Wu

Addresses: Information Center, Wuxi Institute of Technology, Wuxi, 214121, Jiangsu, China

Abstract: Bayesian network structure learning (BNSL) algorithms based on genetic algorithms (GA) suffer from long search times and are prone to falling into local optima. When the sample data is large, a single-machine BNSL algorithm cannot obtain the BN structure within a limited time. To address this issue, this paper proposes a parallel GA-based BNSL algorithm built on the Spark framework (PGA-BN). The three main stages of PGA-BN, namely population initialisation, BIC score calculation, and the evolution operators, are each designed to run in parallel on each partition so that Spark can accelerate them. Experiments on two typical BN datasets with different sample sizes evaluate the parallel performance of PGA-BN. The results show that PGA-BN is significantly faster than its single-machine version while maintaining satisfactory accuracy.
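To make the BIC scoring stage concrete, the following is a minimal single-machine sketch of the standard decomposable BIC score for a discrete Bayesian network. It is not the paper's implementation and does not use Spark; the function names (`node_bic`, `network_bic`), the list-of-dicts data layout, and the toy dataset are assumptions made for illustration only. Because the score is a sum of independent per-node terms, each term can in principle be computed separately, which is the property a parallel scorer would rely on.

```python
import math
from collections import Counter


def node_bic(data, node, parents, cardinality):
    """BIC contribution of one node given its parent set (hypothetical helper).

    data: list of dicts mapping variable name -> discrete value
    cardinality: dict mapping variable name -> number of states
    """
    n = len(data)
    # Count each (parent configuration, node value) pair and each parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximised log-likelihood: sum of N_ijk * log(N_ijk / N_ij) over observed counts.
    log_lik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (states of node - 1) * number of parent configurations.
    q = math.prod(cardinality[p] for p in parents)
    free_params = (cardinality[node] - 1) * q
    return log_lik - 0.5 * math.log(n) * free_params


def network_bic(data, structure, cardinality):
    """Sum the per-node terms; structure maps each node to its list of parents."""
    return sum(node_bic(data, v, ps, cardinality) for v, ps in structure.items())


# Toy data (assumed): B always copies A, so the structure A -> B should score higher.
data = [{"A": 0, "B": 0}] * 50 + [{"A": 1, "B": 1}] * 50
cardinality = {"A": 2, "B": 2}
with_edge = {"A": [], "B": ["A"]}
without_edge = {"A": [], "B": []}
score_with = network_bic(data, with_edge, cardinality)
score_without = network_bic(data, without_edge, cardinality)
```

In a GA setting, a score of this kind would typically serve as the fitness of each candidate structure in the population, with higher values preferred.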

Keywords: Bayesian networks; structure learning; genetic algorithm; parallel; BIC score; learning accuracy.

DOI: 10.1504/IJCSM.2024.140876

International Journal of Computing Science and Mathematics, 2024 Vol.20 No.2, pp.109 - 117

Received: 06 Jun 2023
Accepted: 07 Sep 2023

Published online: 03 Sep 2024
