Research Focus
  • Online Transaction Processing (OLTP) and Hybrid Transactional/Analytical Processing (HTAP) Engines

In single-node architectures, the lab uses storage and state sharing technologies to store data across multiple nodes, allowing single-node systems to scale up for transactional processing. In multi-node cluster architectures, the lab allows users to create distributed databases by means of sharding, enabling a cluster to scale out for transactional processing. The HTAP engine uses a global transaction manager (GTM) to adjust transaction concurrency and keep data reads and writes consistent, so that transactional and analytical processing can run on the same data.
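
As a rough illustration of the sharding idea mentioned above, the following is a minimal sketch of hash-based sharding, not the lab's implementation. The names `shard_for` and `ShardedStore` are hypothetical, and in-memory dictionaries stand in for database nodes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store in which each dict stands in for one database node."""

    def __init__(self, num_shards: int) -> None:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Route the write to the shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Route the read to the same shard; return None if the key is absent.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put("order:1001", {"amount": 25})
    print(store.get("order:1001"))
```

Because every key maps to exactly one shard, adding shards spreads the write load across more nodes. A production system would also need resharding and cross-shard transactions, which this sketch leaves out.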

  • Multi-modal Online Analytical Processing (OLAP), NoSQL, and NewSQL Database Systems

In scenarios that involve complex and content-rich multi-modal data, database systems must support query fusion, integration, and cleansing across structured, semi-structured, and unstructured data in order to extract and process structured features. This requires the lab to continuously improve the applicability, performance, and efficiency of NoSQL, NewSQL, and OLAP systems.

  • Data Security and Database System Security

A challenge for traditional security protection measures, such as access control and SQL injection prevention, is improving system security and data protection without compromising the performance of database systems. To improve both the security and efficiency of database systems, the lab continuously improves key database technologies, such as encrypted data query and update (based on homomorphic encryption), oblivious random access, and differential privacy. The rapid development of security hardware has brought new opportunities to improve the security of database systems. For example, users can use security hardware such as Intel SGX to create a new encrypted database system.
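
To make the differential privacy item above more concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is an illustrative assumption, not the lab's system: the function names `laplace_noise` and `private_count` and the sample rows are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of the rows matching predicate (hypothetical helper).

    A counting query has sensitivity 1 (adding or removing one row changes the
    result by at most 1), so Laplace noise with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [{"age": a} for a in (23, 35, 41, 58, 62)]
    print(private_count(rows, lambda r: r["age"] >= 40, epsilon=0.5))
```

A smaller `epsilon` adds more noise and gives stronger privacy, while a larger `epsilon` gives a more accurate answer. This is one face of the security versus efficiency trade-off described above.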

  • Autonomous and Intelligent Databases

The lab analyzes system operating status and log data to prepare for system modeling based on machine learning technologies. These technologies can dynamically tune system parameters and optimize systems to reduce the O&M costs of system DBAs. The use of these technologies on key database system modules, such as the query optimizer, makes it possible to evolve from rule-based optimization to cost-based optimization, and then to machine learning-based optimization. Machine learning technologies can also help implement more accurate and efficient online warning and real-time monitoring mechanisms to intelligently manage O&M tasks performed by DBAs and allocate resources. In addition, analytical modeling of large amounts of structured, semi-structured, and unstructured data calls for research on implementing intelligent database systems for deep data analysis.

  • New Hardware Acceleration and Data Storage

To maximize the performance of database systems, the lab focuses on the R&D of a heterogeneous computing architecture that combines benefits of CPUs, GPUs, and FPGAs. When optimizing multi-core and high-concurrency data query and analysis tasks, developers must take note of the system hardware architecture (such as the NUMA architecture), reduce data transfer, and implement a paradigm shift from computing-centric to data-centric in storage systems. New hardware applications such as NVM and RDMA have also spawned new data storage and management technologies that require system designers to consider the separation of storage and computing.

  • Fundamental Database Algorithms and Structures

Database system designs at various levels face challenges in fundamental algorithms and data structures, such as concurrency control, data processing, system scheduling, approximate query processing (AQP), unstructured data analysis, and feature extraction. Addressing these challenges requires considering algorithm design ideas together with the operating status and features of database systems. This gives rise to new challenges and requirements in the construction of fundamental algorithms and data structures.


Products and Applications
  • Application in areas such as postal service and real estate

    The lab has helped Vanke and China Post substantially improve the overall storage and computing capacity of their databases by using core capabilities such as the scalability of distributed databases. The lab has also supported upgrades of the two companies' core business systems by deploying the horizontal and vertical fragmentation provided by the distributed transaction processing engine. As a result, their database operation and maintenance costs have been significantly reduced.

  • Technical Support for Major National Projects

    The Database and Storage Lab supports major national projects in the public and private cloud domains, such as Shanghai City Brain and National Tax projects.


Research Team
Feifei Li, Head of Database and Storage Lab

Feifei Li is a tenured professor of Computer Science at the University of Utah. He is a recipient of numerous awards and honors from ACM, IEEE, Visa, Google, HP, and Huawei, which include the IEEE ICDE 2014 10-Year Most Influential Paper Award, ACM SIGMOD 2016 Best Paper Award, ACM SIGMOD 2015 Best System Presentation Award, IEEE ICDE 2004 Best Paper Award, US NSF Career Award, NSFC Oversea Collaboration Grant, and ACM Distinguished Member in 2018. He has served as a member of the editorial board and the chairman of many leading international academic journals and conferences.

Jason Wu, Senior Principal Engineer of Database and Storage Lab

Jason Wu holds a PhD in Computer Science from Ohio State University. He joined Alibaba Seattle in 2014, and currently leads the development of storage infrastructure and cloud storage services. Before joining Alibaba, he was a principal manager in the Microsoft Azure storage team from 2008 to 2014, a senior manager at Ask.com from 2004 to 2008, and a research engineer at the Institute of Computing Technology of the Chinese Academy of Sciences (National Research Center for Intelligent Computing Systems) from 1996 to 1999. His research fields include large-scale distributed systems and big data processing and analysis systems.

Wei Cao, Principal Engineer of Database and Storage Lab

Wei Cao leads the Alibaba Cloud database team and is a member of the CCF Technical Committee on Databases. He has published several papers in international academic conferences and journals such as SIGMOD, VLDB, and TSC. His research fields include distributed database and storage systems and large-scale real-time computing.

Windsor Hsu, Principal Engineer of Database and Storage Lab

Windsor Hsu holds a PhD in Computer Science from the University of California, Berkeley. He previously served as the Chief Technology Officer of EMC Data Domain. He established the research group for the Backup Recovery Systems Division of EMC and was an EMC Distinguished Engineer. Before joining Data Domain, he was an IBM Master Inventor at IBM Almaden Research Center, where he led and managed research on database and storage technologies. He has published over 30 research papers in storage-related fields and has been awarded more than 100 patents.

Chaoqun Zhan, Principal Engineer of Database and Storage Lab

Chaoqun Zhan serves as the general manager of the OLAP product line at Database Products Business Unit, Alibaba Cloud Intelligence. He developed the large-scale online data analysis products AnalyticDB and Data Lake Analytics from scratch. With many years of experience in R&D of platforms for analysis of large amounts of data, he has served as a chief architect for several commercial big data projects of Alibaba Cloud and Apsara Stack.

Jian Tan, Senior Staff Engineer of Damo Institute of Alibaba

Jian Tan was a research scientist at the IBM T.J. Watson Research Center and then a tenure-track assistant professor at the Electrical and Computer Engineering Department of The Ohio State University. His work was supported by NSF and won four best paper awards. He received M.S. and Ph.D. degrees from Columbia University with the Eliahu Jury Award for the best PhD thesis, and a B.S. degree from University of Science and Technology of China, all in Electrical Engineering. His research interests focus on stochastic modeling and statistical algorithms for distributed systems.

Jiong Xie, Director of the Aerospace Database Team

Jiong Xie leads the aerospace database team and is a member of the CCF Technical Committee on Databases. He conducted postdoctoral research at Zhejiang University, and later worked as an associate researcher at the State Key Laboratory of Resources and Environmental Information System of the Chinese Academy of Sciences. His research interests focus on the R&D of spatial databases, remote sensing image databases, moving object databases, and NoSQL spatial distributed systems.

Sheng Wang, Research Scientist of Database and Storage Lab

Sheng Wang holds a PhD in Computer Science from the National University of Singapore. Before joining Alibaba, he was a research fellow at Database System Lab, National University of Singapore. He has published papers in top academic conferences and journals in database-related fields, such as VLDB. His research focuses on the design and optimization of large-scale data management systems, which include distributed database systems, data analysis platforms, and blockchain systems.


Academic Achievements
Papers
  • AnalyticDB-V: A Hybrid Analytical Engine towards Query Fusion for Structured and Unstructured Data, by C. Wei, B. Wu, S. Wang, R. Lou, C. Zhan, F. Li, Y. Cai. VLDB 2020
  • LedgerDB: A Centralized Ledger Database for Universal Audit and Verification, by X. Yang, Y. Zhang, S. Wang, B. Yu, F. Li, Y. Li, W. Yan. VLDB 2020
  • Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases, by M. Ma, Z. Yin, S. Zhang, S. Wang, C. Zheng, X. Jiang, H. Hu, C. Luo, Y. Li, N. Qiu, F. Li, C. Chen, D. Pei. VLDB 2020
  • Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics, by W. Cao, Y. Gao, F. Li, S. Wang, B. Lin, K. Xu, X. Feng, Y. Wang, Z. Liu, G. Zhang. SIGMOD 2020
  • Two-Level Data Compression using Machine Learning in Time Series Database, by X. Yu, Y. Peng, F. Li, S. Wang, X. Shen, H. Mai, Y. Xie. ICDE 2020
  • FPGA-Accelerated Compactions for LSM-based Key-Value Store, by T. Zhang, J. Wang, X. Cheng, H. Xu, N. Yu, G. Huang, T. Zhang, D. He, F. Li, W. Cao, Z. Huang, J. Sun. FAST 2020
  • HotRing: A Hotspot-Aware In-Memory Key-Value Store, by J. Chen, L. Chen, S. Wang, G. Zhu, Y. Sun, H. Liu, F. Li. FAST 2020
  • POLARDB Meets Computational Storage: Efficiently Support Analytical Workloads in Cloud-Native Relational Database, by W. Cao, Y. Liu, Z. Cheng, N. Zheng, W. Li, W. Wu, L. Ouyang, P. Wang, Y. Wang, R. Kuan, Z. Liu, F. Zhu, T. Zhang. FAST 2020
  • Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines, by L. Yang, H. Wu, T. Zhang, X. Cheng, F. Li, L. Zou, Y. Wang, R. Chen, J. Wang, G. Huang. VLDB 2020
  • Realization of the Low Cost and High Performance MySQL Cloud Database, by W. Cao, F. Yu, J. Xie. VLDB 2014
  • TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time, by W. Cao, Y. Gao, B. Lin, X. Feng, Y. Xie, X. Lou, P. Wang. SIGMOD 2018
  • PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database, by W. Cao, Z. Liu, P. Wang, S. Chen, C. Zhu, S. Zheng, Y. Wang, G. Ma. VLDB 2018
  • X-Engine: An Optimized Storage Engine for Large-Scale E-Commerce Transaction Processing, by G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, Q. Li. SIGMOD 2019
  • iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases, by J. Tan, T. Zhang, F. Li, J. Chen, Q. Zheng, P. Zhang, H. Qiao, Y. Shi, W. Cao, R. Zhang. VLDB 2019
  • AnalyticDB: Real-time OLAP Database System at Alibaba Cloud, by C. Zhan, M. Su, C. Wei, X. Peng, L. Lin, S. Wang, Z. Chen, F. Li, Y. Pan, F. Zheng, C. Chai. VLDB 2019
  • Cloud Native Database Systems: Challenges and Opportunities, by F. Li. VLDB 2019