Framework and Algorithm Optimization for GPU Database System

Background

As data volumes grow, analytical databases face serious challenges in both storage and computation. To minimize storage cost and maximize I/O utilization, materialized data should be compressed. However, compression and decompression consume a large number of CPU cycles, which can hurt system performance. Some database operations, such as join and aggregation, are also compute-intensive, and they are used frequently in modern analytical databases to mine value from PB-scale data. Because CPU performance gains are slowing with the end of Moore's Law, traditional CPU-based analytical databases cannot keep up with the demand of analyzing fast-growing data. We therefore turn to emerging architectures to address these challenges.

As a representative of emerging architectures, GPUs are equipped with abundant hardware resources and can deliver much higher performance than CPUs on data-intensive applications. Meanwhile, the computing capability of GPUs is still growing rapidly. GPUs have been widely adopted in both academic research and industry applications, and considerable effort has gone into applying them to the challenges of analytical databases. Researchers have shown that GPUs can deliver 10 times better performance than CPUs for some database operations, such as hash join and aggregation. Commercial GPU databases, such as SQream, MapD, and BlazingDB, have been available for over 5 years. However, we observe that many challenges remain.

We plan to revisit the framework and algorithm design of existing analytical databases and redesign the system to better exploit the capabilities of GPUs. Specifically, we plan to redesign the system with respect to the storage subsystem, GPU resource management, algorithm optimization, and a GPU-conscious plan optimizer.

 

Target

  • To design novel storage formats, smart index formats, smart metadata formats, and GPU-based compression/decompression algorithms.
  • To efficiently manage host/device memory, GPU computing resource scheduling, and multi-GPU scheduling.
  • To design efficient parallel GPU algorithms for database operations, such as hash join, aggregation, and multi-column sorting.
  • To adapt the existing plan optimizer to GPU-based databases and take GPU features into consideration at planning time.
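
To make the hash join target above concrete, here is a minimal sequential sketch in Python of the classic build/probe structure. It is not a GPU implementation and not taken from any of the systems named in this document. The function name `hash_join` and the row layout (lists of dicts) are assumptions chosen for illustration only.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows using a build phase and a probe phase.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    # Build phase: hash the (usually smaller) relation on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: each probe row is looked up independently of the others.
    # That independence is what a parallel implementation could exploit,
    # for example by assigning probe rows to separate threads.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            joined.append({**match, **row})
    return joined
```

In this sketch the build phase is the part that needs coordination (many rows may land in the same bucket), while the probe phase is read-only on the table. A parallel design would have to choose a concurrent hash table layout for the build phase, which this example does not attempt to show.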

Related Research Topics

  • https://www.omnisci.com/
  • https://www.kinetica.com/
  • https://sqream.com/
  • Red Fox: An execution environment for relational query processing on GPUs
  • Acceleration and execution of relational queries using general purpose graphics processing unit (GPGPU)
  • A framework for cost-based optimization of hybrid CPU/GPU query plans in database systems
  • Database compression on graphics processors
  • GPU-based Minwise Hashing
  • Why it is time for a HyPE: A hybrid query processing engine for efficient GPU coprocessing in DBMS
