Title: DeepSim: cluster level behavioural simulation model for deep learning

Authors: Yuankun Shi; Kevin J. Long; Kaushik Balasubramanian; Zhaojuan Bian; Adam Procter; Ramesh Illikkal

Addresses: Software and Service Group, Intel Corp., Shanghai, China; Data Center Group, Intel Corp., Santa Clara, CA, USA; Data Center Group, Intel Corp., Santa Clara, CA, USA; Software and Service Group, Intel Corp., Shanghai, China; Data Center Group, Intel Corp., Santa Clara, CA, USA; Data Center Group, Intel Corp., Santa Clara, CA, USA

Abstract: We are witnessing an explosion of AI use cases driving the computer industry, and especially datacentre and server architectures. As Intel faces fierce competition in this emerging technology space, it is critical that architecture definitions and directions are driven by data from proper tools and methodologies, and that insights are drawn from end-to-end holistic analysis at the datacentre level. In this paper, we introduce DeepSim, a cluster-level behavioural simulation model for deep learning. DeepSim, which is based on the Intel CoFluent simulation framework, uses timed behavioural models to simulate the complex interworking between compute nodes, networking, and storage at the datacentre level, providing a realistic performance model of real-world image recognition applications based on the popular deep learning framework Caffe. The end-to-end simulation data from DeepSim provides insights that can be used for architecture analysis to guide future datacentre architecture directions. DeepSim enables scalable system design, deployment, and capacity planning through accurate performance insights. Results from preliminary scaling studies (e.g., node scaling and network scaling) and what-if analyses (e.g., Xeon with HBM and Xeon Phi with dual OPA) are presented in this paper. The simulation results correlate well with empirical measurements, achieving an accuracy of 95%.

Keywords: deep learning; datacentre; behavioural simulation; AlexNet; architecture analysis; performance analysis; server architecture.

DOI: 10.1504/IJBDI.2019.100892

International Journal of Big Data Intelligence, 2019 Vol.6 No.3/4, pp.224 - 233

Received: 08 Mar 2018
Accepted: 22 Jun 2018

Published online: 04 Jun 2019
