Research Objective
To investigate and optimize the performance of GPU-based communication schemes in container-based cloud environments, so that end applications achieve near-native performance.
Research Findings
The C-GDR approach provides locality-aware, NUMA-aware, and communication-pattern-aware capabilities to enable intelligent and adaptive communication coordination for optimal performance on GPU-enabled clouds. Performance evaluations show that MVAPICH2 with C-GDR outperforms default MVAPICH2-GDR schemes by up to 66% on micro-benchmarks and up to 26% on HPC applications over a container-based environment.
Research Limitations
The study is limited to intra-node MPI communication between GPUs in containers; the performance of inter-node communication is not affected by the proposed design. Designing efficient GPU-based communication schemes for clouds also becomes significantly more complex in container environments.
1: Experimental Design and Method Selection:
The study involves evaluating the performance of different GPU-based communication schemes on both native and container-based environments. The C-GDR approach is proposed to optimize communication performance by dynamically selecting the best communication paths based on process locality, GPU residency, NUMA architecture, and communication pattern.
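The selection step described above can be pictured as a small decision function. The following is only a minimal sketch under stated assumptions, not the C-GDR implementation: the path labels, the `Endpoint` fields, the ordering of the checks, and the 16 KiB threshold are all hypothetical, and message size stands in here for the broader notion of communication pattern.

```python
from dataclasses import dataclass
from enum import Enum


class CommPath(Enum):
    """Hypothetical path labels; these are not identifiers from MVAPICH2."""
    NETWORK = "network"            # peer is on another physical node
    HOST_STAGED = "host_staged"    # stage through host shared memory
    CUDA_IPC = "cuda_ipc"          # direct device-to-device copy
    GDR_LOOPBACK = "gdr_loopback"  # GPUDirect RDMA through the local HCA


@dataclass(frozen=True)
class Endpoint:
    host: str        # physical node that runs the container
    container: str   # container identifier
    numa_node: int   # socket the process and its GPU are attached to
    on_gpu: bool     # True if the message buffer lives in GPU memory


def select_path(src, dst, msg_bytes, small_msg_limit=16 * 1024):
    """Pick a path from locality, GPU residency, NUMA placement, and size.

    The rules and the threshold are illustrative assumptions only.
    """
    # Locality: containers on the same host count as co-resident,
    # even though each container sees its own hostname.
    if src.host != dst.host:
        return CommPath.NETWORK
    # GPU residency: if either buffer is in host memory, stage via the host.
    if not (src.on_gpu and dst.on_gpu):
        return CommPath.HOST_STAGED
    # NUMA: assume direct GPU paths are only attractive within one socket.
    if src.numa_node != dst.numa_node:
        return CommPath.HOST_STAGED
    # Pattern proxy: small messages favor a low-latency path,
    # large messages favor a high-bandwidth one.
    if msg_bytes <= small_msg_limit:
        return CommPath.GDR_LOOPBACK
    return CommPath.CUDA_IPC
```

For example, two GPU-resident endpoints in different containers on the same host and socket would get `CommPath.CUDA_IPC` for a 1 MiB message under these assumed rules, while the same pair on different hosts would fall back to `CommPath.NETWORK`.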
2: Sample Selection and Data Sources:
The experiments are conducted on a testbed of eight physical nodes, each equipped with dual-socket Intel Xeon processors, an NVIDIA K80 GPU, and a Mellanox ConnectX-4 EDR HCA. Docker containers are used to simulate the cloud environment.
3: List of Experimental Equipment and Materials:
The equipment includes Intel Xeon E5-2680 processors, NVIDIA K80 GPUs, Mellanox ConnectX-4 EDR HCAs, and Docker containers.
4: Experimental Procedures and Operational Workflow:
The performance of point-to-point and collective MPI operations is evaluated using OSU Micro-Benchmarks and several HPC applications. The C-GDR approach is integrated into the MVAPICH2 library for performance comparison.
5: Data Analysis Methods:
The performance data is analyzed to compare the latency and bandwidth of different communication schemes under various message sizes and communication patterns.
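One common way to turn such latency comparisons into an "up to N%" figure is to compute the relative improvement at each message size and report the largest. The sketch below assumes that convention; the summary does not state how the reported percentages were derived, and the function names are hypothetical.

```python
def improvement_percent(baseline, optimized):
    """Relative latency reduction of `optimized` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline - optimized) / baseline * 100.0


def peak_improvement(baseline_by_size, optimized_by_size):
    """Return (message_size, percent) for the largest latency improvement.

    Both arguments map message size in bytes to latency; only sizes
    present in both mappings are compared.
    """
    common = sorted(set(baseline_by_size) & set(optimized_by_size))
    if not common:
        raise ValueError("no message sizes in common")
    gains = {
        size: improvement_percent(baseline_by_size[size], optimized_by_size[size])
        for size in common
    }
    best = max(gains, key=gains.get)
    return best, gains[best]
```

This applies to latency, where lower is better. For bandwidth, where higher is better, the numerator would be reversed to `optimized - baseline`.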