Research Focus

Service-oriented OS

Cloud-native technologies are gaining traction and are poised to become the foundation of next-generation cloud computing. This departure from traditional technologies poses great challenges to the services and original design of operating systems. OS services and interfaces need to be redesigned around cloud-native serverless computing and modern application architectures. Our research on a service-oriented OS focuses on the software isolation mechanisms of the OS and on the security and verification mechanisms of the managed runtime and bytecode. The lab aspires to improve the OS architecture and services, cater to the requirements of cloud-edge-end applications, and build a new OS ecosystem.

Hardware/Software Co-design System

Research in this area focuses on improving the compute capabilities of an OS running on heterogeneous chips and on developing performance analysis capabilities for the OS. The research centers on improving the performance of hardware and software algorithms in the compute, storage, and network fields. Feasibility studies are also performed on heterogeneous processor programming frameworks and toolchain solutions. We aim to design the next-generation cloud-native serverless execution unit through hardware/software co-design, building an ecosystem in which heterogeneous hardware is better coordinated with software and which meets the requirements of cloud-edge-end services.


Research Team
Jiangwei Jiang, Head of DAMO OS Lab

He is a distinguished engineer at Alibaba Group and is also in charge of the Basic Products Business Unit of the Alibaba Cloud Intelligence Business Group. Jiangwei joined Taobao in 2008 and built a highly available architecture for the e-commerce platform. Starting in 2012, he led the high-availability team and the middleware product line. In December 2017, Jiangwei started to lead the Basic Products team of Alibaba Cloud with a focus on the R&D of Apsara OS.

Tao Ma, DAMO OS Lab Researcher

He leads the Operating System team of the Systems Software department and is a co-founder of the OS Kernel team in Alibaba Group. He has worked as an OS R&D engineer at Oracle and Alibaba for sixteen years and has accumulated rich experience in filesystems, memory management, and the generic block layer. He is a distinguished Linux kernel engineer in China and has been invited as a guest speaker to multiple well-known academic conferences on Linux and kernels.

Zheng Liu, OS Expert

He is a member of CCF-TCSS. Since joining Alibaba in 2011, he has been responsible for product design and R&D in the OS, cloud computing infrastructure, and cloud-native infrastructure systems. His research areas include the next-generation cloud-edge synergy OS and MLSys.

Zhuo Song, OS Kernel Computing Researcher

He is a member of CCF-TCSS. He conducted research in kernel networking after joining Alibaba in 2014. His research interests now include the OS kernel, heterogeneous computing, hardware/software co-design, and high-performance networking. Zhuo has published multiple papers at international systems conferences such as ATC, SC, and ASPLOS.


Academic Achievements
Papers
  • Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo. Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction. SC 2021.
  • Teng Ma, Kang Chen, Shaonan Ma, Zhuo Song, Yongwei Wu. Thinking More about RDMA Memory Semantics. CLUSTER 2021.
  • Shuai Xue, Shang Zhao, Quan Chen, Zhuo Song, Tao Ma, Shanpei Chen, Yong Yang, Wenli Zheng, Minyi Guo. Kronos: Towards Bus Contention Aware Job Scheduling in Public Clouds. FCS 2021.
  • Shuai Xue, Shang Zhao, Quan Chen, Gang Deng, Zheng Liu, Jie Zhang, Zhuo Song, Tao Ma, Yong Yang, Zhou Yanbo, Niu Keqiang, Sun Sijie, Minyi Guo. Spool: Reliable Virtualized NVMe Storage Pool in Public Cloud Infrastructure. ATC 2020.
  • Mingyu Wu, Ziming Zhao, Yanfei Yang, Haoyu Li, Haibo Chen, Binyu Zang, Haibing Guan, Sanhong Li, Chuansheng Lu, Tongbao Zhang. Platinum: A CPU-Efficient Concurrent Garbage Collector for Tail-Reduction of Interactive Services. ATC 2020.
  • Quan Chen, Shuai Xue, Shang Zhao, Shanpei Chen, Zhuo Song, Huan Ding, Yu Xu, Tao Ma, Yong Yang, Minyi Guo. Alita: Comprehensive Performance Isolation through Bias Resource Management for Public Clouds. SC 2020.
  • Zijun Li, Quan Chen, Shuai Xue, Tao Ma, Yong Yang, Zhuo Song, Minyi Guo. Amoeba: QoS-Awareness and Reduced Resource Usage of Microservices with Serverless Computing. IPDPS 2020.
  • Teng Ma, Mingxing Zhang, Kang Chen, Zhuo Song, Yongwei Wu, Xuehai Qian. AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture. ASPLOS 2020.
  • Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, Jingwen Leng, Minyi Guo. URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds. ICPP 2020.
  • Sa Wang, Yanhai Zhu, Shanpei Chen, Tian-Ze Wu, Wenjie Li, Xusheng Zhan, Haiyang Ding, Weisong Shi, Yungang Bao. A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks. JCST 2020.
  • Teng Ma, Tao Ma, Zhuo Song, Jingxuan Li, Huaixin Chang, Kang Chen, Hai Jiang, Yongwei Wu. X-RDMA: Effective RDMA Middleware in Large-scale Production Environments. CLUSTER 2019.
  • Heyuan Shi, Runzhe Wang, Ying Fu, Mingzhe Wang, Xiaohai Shi, Xun Jiao, Houbing Song, Yu Jiang, Jiaguang Sun. Industry Practice of Coverage-Guided Enterprise Linux Kernel Fuzzing. ESEC/FSE 2019.
  • Shiyou Huang, Jianmei Guo, Sanhong Li, Xiang Li, Yumin Qi, Kingsum Chow, Jeff Huang. SafeCheck: Safety Enhancement of Java Unsafe API. ICSE 2019.
  • Fangxi Yin, Denghui Dong, Chuansheng Lu, Tongbao Zhang, Sanhong Li, Jianmei Guo, Kingsum Chow. Cloud-Scale Java Profiling at Alibaba. ICPE 2018.
  • Fangxi Yin, Denghui Dong, Sanhong Li, Jianmei Guo, Kingsum Chow. Java Performance Troubleshooting and Optimization at Alibaba. ICSE 2018 SEIP.