Research on heterogeneous task scheduling and deep model acceleration in software-hardware integrated systems


Machine Learning (algorithm)




The software-hardware integrated solution combines the end-side solution with Alibaba Cloud's existing cloud-edge integrated solution to form a multi-layered architecture in which cloud, edge, and end collaborate. This architecture is particularly suitable for the complex, hierarchical business in City Brain solutions. Each level of business is mapped to the cloud side, the edge side, or the end side, whichever suits it best, to form a unified solution that reduces operating costs, reduces energy consumption, and improves user experience. We hope to schedule tasks across the integrated system with multi-objective optimization methods, accelerate inference through model compression algorithms to reduce hardware costs, and reduce algorithm adaptation costs through online learning algorithms.

The integrated system usually consists of heterogeneous devices that differ in computing power and energy consumption. The network connections between the devices also differ in bandwidth and delay. A complex computing task requires multiple devices at different levels to cooperate, so the problem we hope to solve is how to dispatch a task across multi-level devices. To select the optimal scheduling strategy, we can define measurements along several dimensions, such as computing power, delay, bandwidth, and energy consumption. In practice, however, these dimensions usually have to be optimized at the same time. For this reason, we need to study task scheduling algorithms for the software-hardware integrated system based on multi-objective optimization.
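As a rough illustration of what multi-objective scheduling can look like, the sketch below filters candidate placements down to the Pareto front and then picks one by a weighted sum. Everything in it is an assumption made for illustration: the `Placement` type, the three "lower is better" metrics, the weights, and the numbers are hypothetical, and weighted-sum scalarization is only one of several possible selection rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One hypothetical way to run a task; every metric is 'lower is better'."""
    device: str
    latency_ms: float
    energy_j: float
    bandwidth_mbps: float


def objectives(p):
    return (p.latency_ms, p.energy_j, p.bandwidth_mbps)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    pairs = list(zip(objectives(a), objectives(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(candidates):
    """Keep only the placements that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick(candidates, weights):
    """Choose from the Pareto front by a weighted sum of max-normalized objectives."""
    front = pareto_front(candidates)
    # Normalize each objective by its largest value on the front (guard against 0).
    maxima = [max(col) or 1.0 for col in zip(*(objectives(p) for p in front))]

    def cost(p):
        return sum(w * v / m for w, v, m in zip(weights, objectives(p), maxima))

    return min(front, key=cost)


# Made-up numbers, for illustration only.
candidates = [
    Placement("cloud", latency_ms=120.0, energy_j=2.0, bandwidth_mbps=8.0),
    Placement("edge", latency_ms=40.0, energy_j=5.0, bandwidth_mbps=2.0),
    Placement("end", latency_ms=25.0, energy_j=9.0, bandwidth_mbps=0.0),
    Placement("slow_edge", latency_ms=60.0, energy_j=6.0, bandwidth_mbps=2.5),
]
best = pick(candidates, weights=(0.6, 0.3, 0.1))
print([p.device for p in pareto_front(candidates)], best.device)
```

Here `slow_edge` is dropped because `edge` is at least as good on every metric. The weights then encode how the remaining trade-offs are valued; changing them moves the choice along the front.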

Deep learning models require a lot of computing power and memory. In a software-hardware integrated system, large-scale deep models are hard to use for real-time inference, for running on the device side, and for running with limited computing resources. Model compression is an important method for improving inference efficiency and reducing runtime memory usage. How to efficiently generate models with smaller scale, higher memory utilization, lower energy consumption, faster inference speed, and minimal loss of inference accuracy through quantization, pruning, distillation, and neural architecture search is an important topic in the integrated solution.
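To make one of these techniques concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, written in plain Python. It is an illustrative assumption rather than the method the project prescribes: the function names are hypothetical, and a real deployment would normally rely on a framework's quantization tooling and calibrate on data.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding bounds the per-weight reconstruction error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, at the price of a rounding error of at most half the scale per weight. That error is the accuracy cost the compression has to keep small.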

The deep learning models deployed on the end side generally need to be trained with local samples. As products are rolled out more broadly, adaptation costs will increase dramatically. How to reduce the cost of local model adaptation through online self-learning is an important topic in the software-hardware integrated solution. We hope to reduce the labor cost of localized deployment through online adaptive sample collection and model training algorithms.
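One simple way to picture online adaptive sample collection is to keep only the inputs the deployed model is unsure about, so that later training focuses on them. The sketch below is a hypothetical illustration under that assumption: the class name, the confidence band, and the buffer size are invented for the example and are not taken from the project description.

```python
from collections import deque


class UncertainSampleCollector:
    """Buffer the samples whose prediction confidence falls in an uncertain band."""

    def __init__(self, low=0.3, high=0.7, capacity=1000):
        self.low = low
        self.high = high
        # A bounded deque drops the oldest sample once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def observe(self, sample_id, confidence):
        """Store the sample if the model is unsure about it; report whether it was kept."""
        if self.low <= confidence <= self.high:
            self.buffer.append(sample_id)
            return True
        return False

    def drain(self):
        """Hand the collected samples to a training step and empty the buffer."""
        batch = list(self.buffer)
        self.buffer.clear()
        return batch


collector = UncertainSampleCollector(capacity=2)
kept = [collector.observe(s, c) for s, c in [("a", 0.95), ("b", 0.5), ("c", 0.4), ("d", 0.6)]]
batch = collector.drain()
print(kept, batch)
```

The bounded buffer keeps memory use fixed on a constrained device. How the drained batch is labeled and used for retraining is left open here.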

With the development of edge chips, model inference at the edge has become possible. More and more application scenarios require inference at the edge to meet their requirements for inference speed and network bandwidth. Nevertheless, inference at the edge still requires a balance between speed and accuracy.

Many works apply an existing model to the edge for online inference. One approach uses model compression and model quantization: existing large networks such as ResNet-50 and VGG are compressed and quantized so that they run faster on edge devices. The other is to directly design a model with a smaller amount of computation, such as MobileNet v1 to v4, ShuffleNet v1 to v2, and MnasNet.

However, both of these methods reduce the accuracy of the model. In fact, by making full use of time-domain information and cloud resources, we expect that higher accuracy can be obtained on devices with limited computing power. For example, we can use low-frequency detection in the cloud. The edge sends images to the cloud at a low frequency, the cloud uses a high-precision model to compute large-size features, and the end side fuses its own features with the cloud's large-size features. This exploits both the cloud's high precision, which cannot be delivered in real time, and the end side's speed, which comes with limited computing power. Through feature fusion, a model with both high inference speed and high accuracy is obtained.
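The low-frequency cloud detection idea can be sketched as follows. This is a simplified illustration under several assumptions that the text does not specify: features are plain lists of equal length (any projection between feature sizes is omitted), the cloud answers synchronously, and the cached cloud features are blended in with a weight that decays exponentially as they grow stale. The names `fuse`, `CloudAssistedExtractor`, `cloud_period`, and `half_life` are hypothetical.

```python
def fuse(edge_feat, cloud_feat, age, half_life=5.0):
    """Blend edge and cached cloud features; the cloud weight halves every half_life frames."""
    w = 0.5 ** (age / half_life)
    return [(1.0 - w) * e + w * c for e, c in zip(edge_feat, cloud_feat)]


class CloudAssistedExtractor:
    """Run the edge model on every frame and refresh cloud features at a low frequency."""

    def __init__(self, edge_model, cloud_model, cloud_period=10, half_life=5.0):
        self.edge_model = edge_model
        self.cloud_model = cloud_model
        self.cloud_period = cloud_period
        self.half_life = half_life
        self.cached = None      # latest cloud features, if any
        self.cached_at = None   # frame index at which they were computed

    def step(self, frame_idx, frame):
        edge_feat = self.edge_model(frame)
        # Only every cloud_period-th frame is sent to the cloud (assumed synchronous here).
        if frame_idx % self.cloud_period == 0:
            self.cached = self.cloud_model(frame)
            self.cached_at = frame_idx
        if self.cached is None:
            return edge_feat
        age = frame_idx - self.cached_at
        return fuse(edge_feat, self.cached, age, self.half_life)


# Stand-in models: a fast edge extractor and a slower, more precise cloud extractor.
extractor = CloudAssistedExtractor(
    edge_model=lambda frame: [0.0, 0.0],
    cloud_model=lambda frame: [1.0, 1.0],
    cloud_period=10,
    half_life=5.0,
)
outputs = {i: extractor.step(i, frame=None) for i in (0, 5, 10)}
print(outputs)
```

Right after a cloud refresh the fused result follows the cloud features, and it drifts back toward the edge features until the next refresh. A learned fusion layer could replace the fixed decay; the decay is used here only to keep the sketch short.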

In short, we hope to realize a task scheduling algorithm for the cloud-edge-end integrated system; accelerate inference through model compression algorithms to reduce hardware cost; reduce the adaptation cost of algorithms through online learning; and improve algorithm performance through cloud-edge feature fusion.


  • A multi-objective optimized task scheduling algorithm for the cloud-edge-end integrated system
  • A method for accelerating deep neural network inference
  • A method of online learning to reduce adaptation cost
  • A method of cloud-edge feature fusion to improve algorithm performance

Related Research Topics

  • Neural network architecture search
  • Online learning for object detection and tracking


Suggested Collaboration Method

AIR (Alibaba Innovative Research), one-year collaboration project. 

