
Title: Considering continuous review policy in a two-echelon inventory system using a reinforcement learning approach

Authors: Adele Behzad; Mohammadali Pirayesh; Mohammad Ranjbar

Addresses: Industrial Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran (all authors)

Abstract: This research analyses a two-echelon inventory system comprising a central warehouse and several identical retailers. All facilities replenish under a continuous review policy. Demand at each retailer follows an independent Poisson process, and the lead times are stochastic, with no pre-defined probability distribution. The lead time for the warehouse, which is supplied by an external supplier, is assumed to be constant. Unfulfilled demand is lost at the retailers and backlogged at the warehouse. To optimise the ordering points and predetermined order sizes at all echelons, a reinforcement learning algorithm is developed. The algorithm's effectiveness is evaluated through simulation and comparison with solutions from the existing literature. The algorithm is also implemented with both ordering points and order sizes as decision variables, demonstrating the efficacy of Q-learning in this context.
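To make the setting concrete, the following is a minimal sketch of a continuous review policy with lost sales at a single retailer. It is an illustration under stated assumptions, not the authors' model or algorithm. The function name `simulate_rq_lost_sales`, its parameters, the starting stock level, and the lead-time sampler passed as `lead_time_fn` are all hypothetical. The warehouse echelon and the learning step are omitted.

```python
import heapq
import random


def simulate_rq_lost_sales(reorder_point, order_size, demand_rate,
                           lead_time_fn, horizon, seed=0):
    """Simulate one retailer under a continuous review (R, Q) policy.

    Assumptions (not taken from the article): unit-sized demands, a
    starting stock of R + Q, and lead times drawn by `lead_time_fn(rng)`.
    """
    rng = random.Random(seed)
    on_hand = reorder_point + order_size
    on_order = 0
    deliveries = []  # min-heap of pending delivery times
    t = 0.0
    served = lost = orders = 0

    while True:
        # Poisson demand process: exponential inter-arrival times.
        t += rng.expovariate(demand_rate)
        if t > horizon:
            break

        # Receive every order whose delivery time has passed.
        while deliveries and deliveries[0] <= t:
            heapq.heappop(deliveries)
            on_hand += order_size
            on_order -= order_size

        # Lost sales: a demand that finds no stock is not backlogged.
        if on_hand > 0:
            on_hand -= 1
            served += 1
        else:
            lost += 1

        # Continuous review: check the inventory position at each demand.
        if on_hand + on_order <= reorder_point:
            heapq.heappush(deliveries, t + lead_time_fn(rng))
            on_order += order_size
            orders += 1

    return {"served": served, "lost": lost,
            "orders": orders, "on_hand": on_hand}


if __name__ == "__main__":
    # Hypothetical parameters; the uniform lead time is only a placeholder.
    stats = simulate_rq_lost_sales(
        reorder_point=3, order_size=10, demand_rate=2.0,
        lead_time_fn=lambda rng: rng.uniform(0.5, 1.5), horizon=1000.0)
    print(stats)
```

A simulator of this kind could serve as the environment against which candidate ordering points and order sizes are scored, for example by trading off the lost-sales count against holding and ordering costs.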

Keywords: multi-echelon inventory system; continuous review; lost sales; reinforcement learning.

DOI: 10.1504/IJPM.2025.146719

International Journal of Procurement Management, 2025 Vol.23 No.3, pp.385 - 405

Received: 06 Nov 2023
Accepted: 19 Dec 2023
Published online: 16 Jun 2025


