System Software and Operation
Compute and Memory Resource Profiling and Optimization for Deep Neural Networks in a Warehouse-Scale Computer
With the continuous development of big data and intelligent technology, artificial intelligence applications, especially those built on deep learning, have gradually become mainstream workloads on cloud platforms. However, as the scale of input data and the number of network layers increase, deep learning models grow more complex, and the computing resources required for model training and inference grow with them. Massive compute and memory requirements force data centers to scale up their hardware to maintain quick response times. This growth brings not only a rapid increase in purchase costs but also substantial energy consumption and operating costs. Many enterprises have integrated large numbers of heterogeneous acceleration units in the cloud, including FPGAs, GPUs, and TPUs, to improve the utilization of hardware resources and reduce the energy consumption of the data center.
Training complex neural network models raises many problems. First, before training starts, the developer does not know how much accelerator memory the model requires, so training often fails because accelerator memory is insufficient. Existing studies have shown that 8.8% of job failures are caused by insufficient memory. A failed training run hurts development efficiency and wastes compute and memory resources. If the memory requirement could be roughly estimated before training, such failures could be alleviated. Some researchers have analyzed the memory usage of programs written in languages such as C, C++, and Java. However, deep neural networks have their own particularities: (1) Deep learning frameworks adopt a hybrid programming paradigm that separates the Python front end from the back end, hiding part of the internal execution information and making it difficult to track accelerator memory usage accurately. (2) To accelerate computation, frameworks rely on low-level libraries such as cuDNN, which makes it difficult to analyze how much accelerator memory is occupied. (3) At run time, many hidden factors come into play, such as the memory allocation strategy and the framework's own internal memory use. As a result, previous work cannot be directly applied to analyzing the real-time memory usage of a model during training.
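As a rough illustration of what a pre-training memory estimate might look like, the sketch below sums the bytes needed for weights, gradients, optimizer state, and stored activations of a stack of fully connected layers. It is a minimal sketch under stated assumptions, not a method proposed here: the `DenseLayer` class, the `estimate_training_bytes` function, and the default of two optimizer states per parameter (as an Adam-style optimizer would keep) are all hypothetical. Because it ignores the hidden factors listed above (allocator behavior, library workspaces, framework-internal buffers), the result should be read as a lower bound only.

```python
from dataclasses import dataclass

# Bytes per tensor element for two common training data types.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


@dataclass
class DenseLayer:
    """Hypothetical description of a fully connected layer with a bias."""
    in_features: int
    out_features: int

    def param_count(self) -> int:
        # Weight matrix plus bias vector.
        return self.in_features * self.out_features + self.out_features

    def activation_count(self, batch_size: int) -> int:
        # One output vector per sample, kept for the backward pass.
        return batch_size * self.out_features


def estimate_training_bytes(layers, batch_size, dtype="float32",
                            optimizer_states_per_param=2):
    """Return a lower-bound breakdown of training memory in bytes.

    Assumption: allocator overhead, library workspaces, and framework
    internals are NOT modeled, so real usage will be higher.
    """
    elem = BYTES_PER_ELEMENT[dtype]
    params = sum(layer.param_count() for layer in layers)
    activations = sum(layer.activation_count(batch_size) for layer in layers)

    breakdown = {
        "weights": params * elem,
        "gradients": params * elem,
        "optimizer_state": params * elem * optimizer_states_per_param,
        "activations": activations * elem,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    model = [DenseLayer(784, 256), DenseLayer(256, 10)]
    for name, size in estimate_training_bytes(model, batch_size=32).items():
        print(f"{name:>16}: {size / 1024:.1f} KiB")
```

Comparing such an estimate against the accelerator's capacity before launching a job is one way to catch obvious out-of-memory cases early; the gap between the estimate and measured usage is where the framework-level factors described above would show up.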
- Build a set of methodologies to estimate running time, memory occupancy and energy consumption for deep neural networks in the Cloud.
- A thorough characterization of how different hardware resource configurations and deep learning frameworks affect the performance of various deep neural networks in the Cloud.
- A scalable profiling and optimization system for large-scale deep neural networks in the Cloud.
- Various compute and memory performance optimizations for deep neural networks in a warehouse-scale computer.
Related Research Topics
- An analytical performance model for deep neural networks
- Cross-stack/full-stack profiling and analysis of machine learning models on GPUs
- Layer-wise performance bottleneck analysis of deep neural networks
- Performance evaluation of CUDA libraries on NVIDIA GPUs
- Performance evaluation of TensorFlow on NVIDIA GPUs
- Optimize PyTorch for deep neural networks on GPUs
- Optimize TensorFlow for deep neural networks on GPUs
- A compiler for deep neural networks targeting heterogeneous platforms
- Lightweight distributed tracing for deep learning workloads
- Compute and memory reuse optimization for deep learning workloads
- Leverage model sparsity, compression and pruning for optimizing memory efficiency
Suggested Collaboration Method
AIR (Alibaba Innovative Research), one-year collaboration project.