Memory-Centric and Heterogeneous Computing Storage Server Architecture in the Big Data Era

Theme

Storage Technology

Topic

Memory-Centric and Heterogeneous Computing Storage Server Architecture in the Big Data Era

Background

In the era of big data, and in the new age of data ushered in by AI, cloud, and IoT, computer systems face unprecedented data volume, variety, and velocity. Yet most storage I/O is still too slow and too costly, especially for workloads such as online payment and e-commerce during Alibaba's Singles' Day event. Network infrastructure upgrades such as 100G/200G RDMA over Ethernet and NVMe-oF are becoming mainstream and address part of the problem. Recently, we have also witnessed great progress in emerging byte-addressable non-volatile memory technologies such as 3D XPoint, which have the potential to achieve near-memory performance. Heterogeneous computing denotes a scenario in which different computing platforms are exploited for distributed storage, a shift driven by the end of Moore's Law. As a result, the computing nodes in a storage server have different execution models, ranging from the traditional x86 architecture to GPUs, FPGAs, and other processor types such as ARM or more specialized processors such as TPUs.

All of the new technologies mentioned above will be combined in storage server design to address the problems facing distributed storage, listed below:

  • High-performance I/O path
  • SLA problem
  • QoS problem
  • Cost efficiency
  • Fault-tolerant design

Target

  • A new architecture for our current distributed storage system that is balanced among storage, network, and computing, with further performance improvement and cost efficiency
  • A fully scalable testing and validation infrastructure for testing the reliability and performance of the newly architected distributed storage system
  • A thorough understanding of the potential impact of any failures on the design and architecture of the new storage system

Related Research Topics

  • Software-hardware co-designed storage engine for distributed storage that efficiently utilizes new storage media (OC-SSD), SmartNICs, and heterogeneous computing
  • SmartNIC offloading, including storage service offloading and high-performance networking protocols, in a large-scale distributed storage system
  • Heterogeneous computing in storage servers utilizing FPGA/ARM/GPU along with the x86 platform
  • Increasing the availability and reliability of a large-scale distributed system by providing white-box storage media such as OC-SSD
  • Multi-level cache design using different media types such as SMR (HDD), SSD (TLC), SSD (QLC), and Optane (AEP)

 

Suggested Collaboration Method

AIR (Alibaba Innovative Research), one-year collaboration project. 

 
