Optimized Resource Scheduling Algorithm in Cloud Computing System

Research Themes

Low Carbon & Efficient Serverless Infrastructure

Background

MaxCompute is a data processing platform for large-scale data warehousing. It is widely used and trusted by Alibaba and other customers, and supports large-scale data analysis in e-commerce, security, manufacturing, and logistics. MaxCompute currently runs on more than 100,000 servers in data centers around the world and processes more than one million jobs each day. Its stored data reached exabyte (EB) scale long ago, making it one of the leading products in the global market.

 

High investment cost combined with low resource utilization has long been a major concern for cloud computing platforms like MaxCompute. Improving resource efficiency and reducing cost while guaranteeing the QoS (Quality of Service) of workloads is a major challenge for cloud providers. The main challenges for resource scheduling include (1) the differing patterns and characteristics of workloads; (2) the complex periodicity and dynamics of each workload; (3) resource contention among workloads co-located on the same hardware; (4) strict QoS requirements from customers; and (5) uncertainty and heterogeneity in resource supply.

 

At Alibaba, resource management and job scheduling systems such as Fuxi can implement flexible and precise resource scheduling strategies. At present, most scheduling strategies are predefined. We would like to explore new strategies based on optimization and machine learning algorithms, with the goal of finding globally optimal solutions that adapt to the dynamics and diversity of workloads.

Target

  • An optimized scheduling algorithm that makes efficient use of resources while meeting the Quality of Service (QoS) requirements of cloud workloads.
  • A precise prediction and classification algorithm for workloads, based on their historical patterns.
  • A real-time, autonomic analysis framework that adapts online to unexpected changes in resource demand.
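
As a rough illustration of how the first two targets could fit together, the sketch below forecasts each workload's demand from its history and then places workloads on servers while reserving headroom as a QoS safety margin. It is a minimal sketch under stated assumptions, not the Fuxi or MaxCompute implementation: the seasonal-naive forecast, the first-fit-decreasing heuristic, the 20% headroom, and all names (Server, seasonal_naive_forecast, place_jobs) are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    capacity: float                 # CPU cores on this server (illustrative unit)
    used: float = 0.0               # cores already reserved
    jobs: list = field(default_factory=list)


def seasonal_naive_forecast(history, period):
    """Predict the next value as the value observed one period ago."""
    return history[-period]


def place_jobs(demands, servers, headroom=0.2):
    """First-fit-decreasing placement with a QoS safety margin.

    demands:  dict mapping job name -> predicted core demand
    headroom: fraction of each server kept free (assumed QoS buffer)
    Returns the names of jobs that could not be placed.
    """
    unplaced = []
    # Place the largest predicted demands first.
    for name, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
        for server in servers:
            if server.used + demand <= server.capacity * (1 - headroom):
                server.used += demand
                server.jobs.append(name)
                break
        else:
            unplaced.append(name)
    return unplaced


# Hypothetical usage data with a period of 3 time steps.
history = {
    "etl": [4, 6, 8, 4, 6, 8],
    "report": [2, 2, 3, 2, 2, 3],
    "ml": [5, 5, 5, 5, 5, 5],
}
demands = {name: seasonal_naive_forecast(h, period=3) for name, h in history.items()}
servers = [Server(capacity=10), Server(capacity=10)]
unplaced = place_jobs(demands, servers)
```

A production scheduler would replace both pieces: the forecast with a learned time-series model, and the greedy heuristic with an optimization formulation that accounts for contention among co-located workloads and for heterogeneous resources.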

Related Research Topics

  • Prediction models for time series.
  • Cost-minimizing dynamic resource allocation algorithms for cloud computing systems.
  • Quality assessment and QoS-aware autonomic resource management in cloud computing.
  • Dynamic and adaptive resource management algorithms.
  • Dynamic scalability and elastic resource provisioning for cloud systems.
  • Procurement for infrastructure-as-a-service (IaaS) clouds.
