Alibaba Innovative Research (AIR)
Optimizing Extended GPU Memory System for Heterogeneous Computing

Research Themes

Research on Frontier Technologies in Data Center and Server

Background

The era of AI computing poses significant challenges to traditional GPU memory subsystems. While AI model sizes grow by more than 10x per year, GPU memory capacity and bandwidth improve at a far slower pace, which has become a serious obstacle to the evolution of AI models. In addition, other emerging applications (e.g., databases, high-performance computing) also demand high-throughput memory access and large memory footprints. As a result, a large-capacity, high-bandwidth GPU memory system is required to handle the large-scale data in modern GPU applications.

Unfortunately, the well-known "memory wall" problem is difficult to tackle with current GPU designs. First, scaling up memory capacity and bandwidth by attaching more HBM stacks is unsustainable due to limited die and package area. Second, scaling out memory capacity and bandwidth by employing more GPUs wastes much of their computing power, since the extra GPUs are added only for their memory rather than their compute.

The goal of this study is to build a more capable GPU memory system using novel technologies such as high-speed off-chip interconnects, emerging non-volatile memory, and 3D-stacked memory.

Target

  • A large-capacity, low-cost, configurable, and flexible memory-extension solution for GPUs
  • A novel system architecture spanning GPU, CPU, and memory that delivers high performance and low TCO for modern GPU applications
  • Algorithm/software/hardware co-optimization for killer applications such as recommendation systems and graph neural networks (see the sketch after this list)
  • Support for inference and training of large-scale foundation models
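To make the co-optimization target concrete, below is a minimal sketch of why memory extension helps a recommendation workload: a large embedding table is kept in pinned host memory acting as extended GPU memory, and a kernel gathers only the rows referenced by the current batch over the interconnect. The table size, batch size, and kernel name are illustrative assumptions, not part of the research program.

```cuda
// Sketch: embedding table in pinned, GPU-mapped host ("extended") memory.
// Only the rows touched by the batch cross PCIe/NVLink; the table itself
// can far exceed device HBM capacity. Sizes below are hypothetical.
#include <cuda_runtime.h>
#include <cstdio>

#define EMB_ROWS (1 << 24)   // hypothetical: 16M embedding rows
#define EMB_DIM  64          // hypothetical: 64 floats per row
#define BATCH    4096        // hypothetical: lookups per batch

__global__ void gather_rows(const float *table, const int *ids,
                            float *out, int dim) {
    int i = blockIdx.x;                              // one block per lookup
    for (int d = threadIdx.x; d < dim; d += blockDim.x)
        out[i * dim + d] = table[(size_t)ids[i] * dim + d];  // remote read
}

int main() {
    float *h_table, *d_table_view, *d_out;
    int *d_ids;

    // Pinned host allocation mapped into the GPU address space.
    cudaHostAlloc(&h_table, (size_t)EMB_ROWS * EMB_DIM * sizeof(float),
                  cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_table_view, h_table, 0);

    cudaMalloc(&d_ids, BATCH * sizeof(int));
    cudaMalloc(&d_out, (size_t)BATCH * EMB_DIM * sizeof(float));
    cudaMemset(d_ids, 0, BATCH * sizeof(int));  // placeholder ids (all row 0);
                                                // a real run would fill h_table
                                                // and d_ids with model data

    gather_rows<<<BATCH, 128>>>(d_table_view, d_ids, d_out, EMB_DIM);
    cudaDeviceSynchronize();
    printf("gather done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_out); cudaFree(d_ids); cudaFreeHost(h_table);
    return 0;
}
```

The design point the sketch illustrates is that recommendation lookups are sparse: only a tiny, batch-dependent slice of a multi-gigabyte table is needed per step, so extended memory with a fast interconnect can substitute for scarce HBM.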

Related Research Topics

  • Application study on GPUs with memory extension
  • System architecture design for memory pooling
  • Cache-coherent protocols for memory sharing
  • Memory management and virtualization with unified memory space
  • CUDA extensions and programming-model support for GPU memory extension/pooling (a unified-memory sketch follows below)
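As a baseline for the unified-memory-space and programming-model topics above, the following minimal sketch uses today's CUDA unified memory: a managed buffer may exceed device memory, and placement hints stand in for the richer control a memory-extension or pooling programming model could expose. The buffer size and kernel are assumptions for illustration only.

```cuda
// Sketch: CUDA unified memory as a starting point for an extended/pooled
// GPU memory space. Hints keep the buffer host-resident while letting the
// GPU access it; prefetch stages the working set into device memory.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, size_t n, float k) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

int main() {
    int dev = 0;
    cudaSetDevice(dev);

    size_t n = (size_t)256 << 20;          // 256M floats = 1 GiB (illustrative)
    float *buf;
    cudaMallocManaged(&buf, n * sizeof(float));

    // Keep the "extended" copy resident on the host, but let the GPU map and
    // access it directly instead of faulting page by page.
    cudaMemAdvise(buf, n * sizeof(float), cudaMemAdviseSetPreferredLocation,
                  cudaCpuDeviceId);
    cudaMemAdvise(buf, n * sizeof(float), cudaMemAdviseSetAccessedBy, dev);

    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;   // first touch on the CPU

    // Optionally stage the working set into device memory before the kernel.
    cudaMemPrefetchAsync(buf, n * sizeof(float), dev, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(buf, n, 2.0f);
    cudaDeviceSynchronize();
    printf("buf[0] = %f (%s)\n", buf[0], cudaGetErrorString(cudaGetLastError()));

    cudaFree(buf);
    return 0;
}
```

The gap this baseline exposes is exactly what the topics target: page-granular migration and coarse hints, rather than application-aware placement across a pooled, cache-coherent memory hierarchy.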
