Authors: Yuankun Shi; Kevin J. Long; Kaushik Balasubramanian; Zhaojuan Bian; Adam Procter; Ramesh Illikkal
Addresses: Software and Service Group, Intel Corp., Shanghai, China; Data Center Group, Intel Corp., Santa Clara, CA, USA; Data Center Group, Intel Corp., Santa Clara, CA, USA; Software and Service Group, Intel Corp., Shanghai, China; Data Center Group, Intel Corp., Santa Clara, CA, USA; Data Center Group, Intel Corp., Santa Clara, CA, USA
Abstract: We are witnessing an explosion of AI use cases driving the computer industry, and especially datacentre and server architectures. As Intel faces fierce competition in this emerging technology space, it is critical that architecture definitions and directions are driven by data from proper tools and methodologies, and that insights are drawn from end-to-end holistic analysis at the datacentre level. In this paper, we introduce DeepSim, a cluster-level behavioural simulation model for deep learning. DeepSim, which is based on the Intel CoFluent simulation framework, uses timed behavioural models to simulate complex interworking between compute nodes, networking, and storage at the datacentre level, providing a realistic performance model of real-world image recognition applications based on the popular deep learning framework Caffe. The end-to-end simulation data from DeepSim provides insights that can be used for architecture analysis to drive future datacentre architecture directions. DeepSim enables scalable system design, deployment, and capacity planning through accurate performance insights. Results from preliminary scaling studies (e.g., node scaling and network scaling) and what-if analyses (e.g., Xeon with HBM and Xeon Phi with dual OPA) are presented in this paper. The simulation results correlate well with empirical measurements, achieving an accuracy of 95%.
Keywords: deep learning; datacentre; behavioural simulation; AlexNet; architecture analysis; performance analysis; server architecture.
International Journal of Big Data Intelligence, 2019 Vol.6 No.3/4, pp.224 - 233
Received: 08 Mar 2018
Accepted: 22 Jun 2018
Published online: 04 Jun 2019