Title: Stage based parallel programming model for high concurrency, stateful network services: internals and design principles

Authors: Yan Chen, Xinyuan Fan, Wenjun Yang, Kai Chen, Guozhi Xu

Addresses: Department of Electronics Engineering, Shanghai Jiaotong University, Room 1613, HaoRan High Tech Build, HuaShan Road No.1954 Shanghai 200030, China (same address listed for all five authors)

Abstract: Recent research has shown that the conventional threaded programming model performs poorly under high-concurrency workloads. Moreover, with emerging stateful network services, where the number of concurrent states can reach the thousands, the traditional 'thread per request' solution is no longer feasible. To meet this challenge, a new parallel programming model has been promoted: stage-based programming, in which the whole service logic is viewed as a set of stages, each driven by a limited number of threads and able to communicate with the others through message passing. In this paper, we present the two main streams of stage design, Thread-Over-Stages and Thread-Per-Stage. Because of the advantages of Thread-Per-Stage, we advocate this solution and identify three key design principles for delivering a high-performance stage-based design: (1) on a uniprocessor system, the number of stages should not be too large; for most fine-grained network services it should not exceed ten; (2) for a stage with blocking calls, a good estimate of the number of threads for that stage is given by (call arrival rate × blocking time); and (3) to deliver temporary messages quickly, it is much better to place that part of the service in a separate non-blocking stage. We implement a sample SIP proxy server to validate our arguments.
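The Thread-Per-Stage idea and the thread-count rule of thumb from the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Stage`, `estimate_threads` and `run_pipeline` are hypothetical, the blocking call is simulated with a sleep, and the rates are made-up illustration values. The estimate (arrival rate × blocking time) has the same form as Little's law, giving the average number of calls blocked at once.

```python
import math
import queue
import threading
import time


def estimate_threads(arrival_rate_per_s, blocking_time_s):
    """Rule of thumb from the abstract: threads ~ arrival rate x blocking time.

    The product is rounded before taking the ceiling so that floating-point
    noise (e.g. 2.0000000001) does not add a spurious extra thread.
    """
    return max(1, math.ceil(round(arrival_rate_per_s * blocking_time_s, 9)))


class Stage:
    """Hypothetical stage: an inbox queue drained by a fixed set of threads."""

    def __init__(self, name, handler, num_threads=1):
        self.name = name
        self.handler = handler
        self.inbox = queue.Queue()
        for _ in range(num_threads):
            threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        # Message passing is the only way stages talk to each other.
        self.inbox.put(msg)

    def _run(self):
        while True:
            msg = self.inbox.get()
            try:
                self.handler(msg)
            finally:
                self.inbox.task_done()

    def drain(self):
        # Block until every message sent so far has been handled.
        self.inbox.join()


def run_pipeline(messages, arrival_rate_per_s=200, blocking_time_s=0.01):
    """Two-stage pipeline: a non-blocking parse stage feeds a blocking stage."""
    results = []
    lock = threading.Lock()

    def lookup(msg):
        time.sleep(blocking_time_s)  # stands in for a blocking call
        with lock:
            results.append(msg.upper())

    lookup_stage = Stage(
        "lookup", lookup,
        num_threads=estimate_threads(arrival_rate_per_s, blocking_time_s),
    )
    # The parse stage never blocks, so a single thread is assumed to suffice.
    parse_stage = Stage("parse", lambda m: lookup_stage.send(m.strip()))

    for m in messages:
        parse_stage.send(m)
    parse_stage.drain()   # all messages forwarded to the lookup stage
    lookup_stage.drain()  # all blocking calls finished
    return sorted(results)
```

With the assumed figures of 200 calls per second and 10 ms of blocking per call, the estimate gives two threads for the blocking stage, while the non-blocking stage keeps a single thread so that short messages are not queued behind blocked work.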

Keywords: parallel programming; stage-based programming; stateful application; high performance computing; networking; threaded programming; thread-per-stage.

DOI: 10.1504/IJHPCN.2005.007865

International Journal of High Performance Computing and Networking, 2005 Vol.3 No.1, pp.33 - 44

Published online: 28 Sep 2005
