System and Hardware for Hyperscale Graph Neural Network Training


Chip Technology




Graph neural networks (GNNs) are an emerging type of neural network with the potential to transform important applications such as recommendation, search, and risk control. As a result, GNNs have become a very important workload in many enterprises.

The rise of GNNs calls for rethinking the design of computing resources that were optimized for general DNNs (e.g., CNNs, Transformers). The unique characteristics of GNNs raise many new challenges for hardware platforms, system design, and algorithm design. In a general DNN workload, GB-level data is streamed to a GPU or customized hardware for heavy NN computation across hundreds of layers. In a GNN workload, by contrast, tens of TB of data are sampled and randomly accessed, then fed into a GPU or other hardware for very shallow (e.g., 2-layer) NN computation. The focus of the compute pipeline therefore shifts from the NN computation to the graph data sampling and aggregation.
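The sample-then-aggregate pattern described above can be illustrated with a minimal sketch. It assumes a toy in-memory adjacency list, uniform neighbor sampling, and mean aggregation with no learned weights; the function names (`sample_neighbors`, `mean_aggregate`, `two_hop_embedding`) are hypothetical and chosen only for illustration, not taken from any particular GNN framework.

```python
import random


def sample_neighbors(adj, node, fanout, rng):
    """Uniformly sample up to `fanout` neighbors of `node`.

    Each call is a random access into the graph structure, which is the
    step that dominates when the graph is far larger than device memory.
    """
    neighbors = adj.get(node, [])
    if len(neighbors) <= fanout:
        return list(neighbors)
    return rng.sample(neighbors, fanout)


def mean_aggregate(vectors, nodes):
    """Element-wise mean of the vectors stored for `nodes`."""
    dim = len(next(iter(vectors.values())))
    if not nodes:
        return [0.0] * dim
    return [sum(vectors[n][d] for n in nodes) / len(nodes) for d in range(dim)]


def two_hop_embedding(adj, features, seed, fanout, rng):
    """Two-layer sample-and-aggregate for one seed node (illustrative only)."""
    hop1 = sample_neighbors(adj, seed, fanout, rng)
    # Layer 1: the seed and each sampled neighbor average their own
    # features with the features of their sampled neighbors.
    hidden = {}
    for n in [seed] + hop1:
        hop2 = sample_neighbors(adj, n, fanout, rng)
        hidden[n] = mean_aggregate(features, [n] + hop2)
    # Layer 2: the seed averages the layer-1 states of itself and hop 1.
    return mean_aggregate(hidden, [seed] + hop1)


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0], 2: [0]}
    features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
    print(two_hop_embedding(adj, features, seed=0, fanout=2, rng=random.Random(0)))
```

Even in this toy form, the arithmetic per node is trivial while the number of graph lookups grows with the fanout at each hop, which is the imbalance the paragraph above points to.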

Two important aspects must be taken into consideration when (re)designing the system, hardware, compiler, or algorithm for GNNs. The first is scalability. Industrial graph data is too large to fit in any single node (let alone the device memory of a GPU or customized hardware), so hyperscale, scale-out solutions are in high demand. The second is flexibility and programmability. Because GNNs are actively evolving, any customized system or hardware needs to support a large variety of operators and all their possible combinations.
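To make the scalability point concrete, the sketch below splits a graph across several workers and counts the edges that cross partition boundaries, since each such edge implies remote access during sampling. It assumes integer node IDs and a naive modulo-based assignment; `hash_partition` and `count_cut_edges` are hypothetical helper names, and a real scale-out system would likely use a more careful partitioner.

```python
def hash_partition(num_nodes, num_parts):
    """Assign node v to partition v % num_parts (a naive, assumed scheme)."""
    return {v: v % num_parts for v in range(num_nodes)}


def count_cut_edges(edges, part):
    """Count edges whose endpoints live on different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    part = hash_partition(num_nodes=4, num_parts=2)
    print(part)
    print(count_cut_edges(edges, part))
```

The cut-edge count is a rough proxy for cross-node traffic: a partitioning that lowers it keeps more of the sampling local to each worker.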


The goal of this research is to improve the performance of computing resources for GNN training workloads, from the perspectives of scalability and programmability. The scope includes, but is not limited to, hardware architecture, system-level solutions, framework and compiler optimization, and algorithm/hardware co-design.

Related Research Topics (including but not limited to)

All topics below are dedicated to “scalable and programmable GNN training”

  • Customized hardware architecture
  • Near-memory processing architecture
  • Rethinking the GPGPU architecture
  • Compiler support for GNN customized hardware
  • Distributed heterogeneous system optimization
  • Top-of-rack solution and scheduling optimization with graph partition
  • Graph partition and scheduling optimization on full-batch training
  • Framework and compiler (TVM) optimization
  • Hardware/system-friendly GNN algorithms
  • Sparsity in GNN algorithms


Suggested Collaboration Method

AIR (Alibaba Innovative Research), one-year collaboration project. 

