Research Focus
  • OLTP (online transaction processing) and HTAP (hybrid transactional and analytical processing) Engines

Inspired by the single-node architecture, research has focused on storing data across multiple nodes through shared storage to achieve scale-up expansion of transaction processing capabilities. For the multi-node cluster architecture, research focuses on using sharding to distribute the database and achieve scale-out transaction processing capabilities. Under this architecture, transaction scheduling and concurrency control are the key challenges in achieving read/write consistency of data using a global transaction manager (GTM). The HTAP engine supports both transaction processing and analytical processing on a single copy of the data.
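
As a rough illustration of the scale-out idea described above, the following sketch routes keys to shards by hash and uses a single timestamp source as a toy stand-in for a GTM. All class and function names here (GlobalTimestampManager, ShardedStore, shard_for) are hypothetical and chosen for illustration only; this is not the lab's implementation, and a real system would also need distributed commit, locking or validation, and fault tolerance.

```python
import hashlib
from itertools import count


class GlobalTimestampManager:
    """Toy stand-in for a GTM: hands out strictly increasing timestamps."""

    def __init__(self):
        self._counter = count(1)

    def next_timestamp(self):
        return next(self._counter)


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (assumed routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each shard keeps a list of (timestamp, value) versions per key."""

    def __init__(self, num_shards, gtm):
        self.shards = [dict() for _ in range(num_shards)]
        self.gtm = gtm

    def put(self, key, value):
        # Every write gets a global timestamp, so versions are totally ordered
        # across shards.
        ts = self.gtm.next_timestamp()
        shard = self.shards[shard_for(key, len(self.shards))]
        shard.setdefault(key, []).append((ts, value))
        return ts

    def get(self, key, snapshot_ts):
        # A reader sees the newest version written at or before its snapshot.
        shard = self.shards[shard_for(key, len(self.shards))]
        visible = [v for ts, v in shard.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


gtm = GlobalTimestampManager()
store = ShardedStore(num_shards=4, gtm=gtm)
t1 = store.put("order:1", "created")
t2 = store.put("order:1", "paid")
print(store.get("order:1", t1), store.get("order:1", t2))
```

Reading at an older snapshot timestamp returns the older version, which is the basic property a global timestamp source provides for consistent reads across shards.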

  • Multi-model, OLAP (Online Analytical Processing), and NoSQL/NewSQL Database Systems

With complex and rich multi-model data, database systems must perform fusion analysis, integration, and cleansing of structured, semi-structured, and unstructured data in order to extract and process structured features. This requires continuous improvement of the applicability, performance, and efficiency of NoSQL/NewSQL and OLAP systems.

  • Data Security and Database System Security

Building on traditional security measures such as access control and SQL injection prevention, a core challenge is how to improve system security and data protection without sacrificing database system performance. Database systems require continuous improvements in key technologies such as encrypted data query and update (using homomorphic encryption and related technologies), oblivious random access, and differential privacy to reduce the trade-offs between system security and efficiency. The rapid development of security hardware has also brought new opportunities for secure database systems, for example, using security hardware such as Intel SGX to build a new encrypted database system.
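
To make the security/efficiency trade-off of differential privacy concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. The function names and the sample data are hypothetical; the sketch assumes a query sensitivity of 1 (adding or removing one row changes the count by at most 1) and is not taken from any system described here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of rows matching predicate.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    Smaller epsilon means stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of users in a table.
ages = [23, 35, 41, 29, 52, 37, 44]
noisy = private_count(ages, lambda age: age >= 35, epsilon=0.5)
print(round(noisy, 2))
```

The answer is randomized on every call, so the printed value varies; averaged over many calls it centers on the true count.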

  • Autonomous and Intelligent Databases

By analyzing the status of the system operating environment and log data, the lab aims to use machine learning techniques and models to achieve dynamic system parameter tuning and system optimization, and to reduce operation and maintenance costs for DBAs. Applying these technologies to key modules of the database system, such as the query and analysis optimizer, enables query optimizers to evolve on the basis of more advanced machine learning models. In addition, these technologies can help the database system establish a more accurate and efficient online early warning and real-time monitoring system to achieve smart DBA operation and maintenance control and resource allocation. Meanwhile, analytical modeling of massive structured, semi-structured, and unstructured data leads to research topics on how to implement intelligent database systems for deep data analysis.

  • New Hardware Acceleration and Data Storage

The database system requires research and development of CPU/GPU/FPGA heterogeneous computing systems. When optimizing multi-core and high-concurrency data queries and analysis, developers must be aware of the system hardware architecture (such as NUMA architecture) to reduce data movement and achieve a data-centered query and analysis engine. New hardware (such as NVM and RDMA) has also spawned new data storage and management technologies that require system designers to consider the separation of storage and computation.

  • Database Core Algorithms

Database system design involves core algorithmic challenges at many levels and in many directions, such as concurrency control, data processing, system scheduling, approximate computing, unstructured data analysis, and feature extraction. Solving these problems effectively requires combining algorithm design with the operating status and characteristics of the database system, which poses new challenges and requirements for the construction of core algorithms.

Products and Applications
  • Meteorological Big Data

    The China Meteorological Administration’s Meteorological Big Data Analysis Platform relies on the OLAP engine's support for concurrent complex queries, high-throughput real-time writes, and highly concurrent reads and writes. The platform delivers millisecond-level query analysis of a single weather station's historical data based on aggregated columns, and it has continuously stored minute-level data from 60,000 weather stations nationwide since 1957. As a result, data appears in applications within minutes of being written to the database.

  • Postal and Real Estate Applications

    Real estate developer Vanke and China Post have linearly scaled the overall storage and computing capacity of their databases by using core technologies including the horizontal scaling of distributed databases. Through the database splitting technology provided by the distributed transaction processing engine, the lab supports iterations of the two companies' core service systems, greatly reducing their database operation and maintenance costs.

  • National Projects

    The lab supports major Chinese national projects in the public and private cloud domains, such as Shanghai City Brain.

Research Team
Feifei Li, Head of Database and Storage Lab

He is a tenured Professor of Computer Science at the University of Utah and one of the world's top scientists in the field of databases. For his academic and scientific achievements he has won the ACM SIGMOD 2016 Best Paper Award, the ACM SIGMOD 2015 Best System Presentation Award, the IEEE ICDE 2014 10-Year Most Influential Paper Award, Hewlett-Packard's 2011 and 2012 Global R&D Awards, a Google Faculty Award in 2015, a Visa Faculty Award in 2017, ACM Distinguished Member status in 2018, and the US NSF CAREER Award. He has led and participated in many important research projects, has served on the editorial boards and as chair of many leading international academic journals and conferences, and is a reviewer and panelist for many major projects.

Wei Cao, Principal Engineer of Database and Storage Lab

Head of the Alibaba Cloud database team. For the past 7 years, he has focused on the independent research and development of Alibaba Cloud RDS and NoSQL products, as well as the POLARDB and HybridDB relational databases. His main research areas include distributed databases and storage systems, large-scale real-time computing, global disaster recovery and multi-active deployment of data, AI-based anomaly diagnosis, and AI-based operation and maintenance. He is a member of the China Computer Federation (CCF) Database Committee and has published numerous articles in international academic conferences and journals in the fields of databases and systems, such as SIGMOD, VLDB, and TSC.

Jason Wu, Senior Principal Engineer of Database and Storage Lab

He received his Ph.D. in Computer Science from The Ohio State University in 2004. He joined Alibaba Seattle in 2014 and is currently a senior director at Alibaba Cloud, where he leads the development of storage infrastructure and cloud storage services. Before joining Alibaba, Jason was a principal manager on the Microsoft Azure storage team from 2008 to 2014, a senior manager from 2004 to 2008, and a research engineer at the Institute of Computing Technology (ICT) from 1997 to 1999. His areas of interest include large-scale distributed systems and big-data processing systems.

Chaoqun Zhan, Principal Engineer of Database and Storage Lab

Head of Alibaba Database Division's OLAP Platform. He built the large-scale online cloud analysis products AnalyticDB and Data Lake Analytics from scratch. These products are widely deployed in Alibaba Group and on Alibaba Cloud (both public and dedicated cloud). With many years of experience in the research and development of massive-scale data analysis platforms, he has served as chief architect for several large-scale commercial projects within Alibaba Group and on Alibaba Cloud.

Sheng Wang, Research Scientist of Database and Storage Lab

He holds a Ph.D. in Computer Science from the National University of Singapore. Before joining Alibaba, he was a research fellow at the Database System Lab, National University of Singapore. He has published papers in top conferences and journals in database-related areas, such as VLDB, KDD, and TKDE. His research interests are mainly in large-scale data management, including distributed databases, data processing engines, machine learning platforms, and blockchain technologies.

Tieying Zhang, Research Scientist of Database and Storage Lab

Tieying Zhang is a staff engineer and a member of the CCF Technical Committee on Databases and the CCF Technical Committee on Big Data. He was previously an assistant professor at the Chinese Academy of Sciences and a postdoc at Carnegie Mellon University. His research interests focus on intelligent databases and distributed systems. He has published over 30 high-quality papers in venues including SIGMOD, VLDB, ICDE, and TPDS.

Academic Achievements
  • AnalyticDB-V: A Hybrid Analytical Engine towards Query Fusion for Structured and Unstructured Data, by C. Wei, B. Wu, S. Wang, R. Lou, C. Zhan, F. Li, Y. Cai. VLDB 2020
  • LedgerDB: A Centralized Ledger Database for Universal Audit and Verification, by X. Yang, Y. Zhang, S. Wang, B. Yu, F. Li, Y. Li, W. Yan. VLDB 2020
  • Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases, by M. Ma, Z. Yin, S. Zhang, S. Wang, C. Zheng, X. Jiang, H. Hu, C. Luo, Y. Li, N. Qiu, F. Li, C. Chen, D. Pei. VLDB 2020
  • Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics, by W. Cao, Y. Gao, F. Li, S. Wang, B. Lin, K. Xu, X. Feng, Y. Wang, Z. Liu, G. Zhang. SIGMOD 2020
  • Two-Level Data Compression using Machine Learning in Time Series Database, by X. Yu, Y. Peng, F. Li, S. Wang, X. Shen, H. Mai, Y. Xie. ICDE 2020
  • FPGA-Accelerated Compactions for LSM-based Key-Value Store, by T. Zhang, J. Wang, X. Cheng, H. Xu, N. Yu, G. Huang, T. Zhang, D. He, F. Li, W. Cao, Z. Huang, J. Sun. FAST 2020
  • HotRing: A Hotspot-Aware In-Memory Key-Value Store, by J. Chen, L. Chen, S. Wang, G. Zhu, Y. Sun, H. Liu, F. Li. FAST 2020
  • POLARDB Meets Computational Storage: Efficiently Support Analytical Workloads in Cloud-Native Relational Database, by W. Cao, Y. Liu, Z. Cheng, N. Zheng, W. Li, W. Wu, L. Ouyang, P. Wang, Y. Wang, R. Kuan, Z. Liu, F. Zhu, T. Zhang. FAST 2020
  • Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines, by L. Yang, H. Wu, T. Zhang, X. Cheng, F. Li, L. Zou, Y. Wang, R. Chen, J. Wang, G. Huang. VLDB 2020
  • Realization of the Low Cost and High Performance MySQL Cloud Database, by W. Cao, F. Yu, J. Xie. VLDB 2014
  • TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time, by W. Cao, Y. Gao, B. Lin, X. Feng, Y. Xie, X. Lou, P. Wang. SIGMOD 2018
  • PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database, by W. Cao, Z. Liu, P. Wang, S. Chen, C. Zhu, S. Zheng, Y. Wang, G. Ma. VLDB 2018
  • X-Engine: An Optimized Storage Engine for Large-Scale E-Commerce Transaction Processing, by G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, Q. Li. SIGMOD 2019
  • iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases, by J. Tan, T. Zhang, F. Li, J. Chen, Q. Zheng, P. Zhang, H. Qiao, Y. Shi, W. Cao, R. Zhang. VLDB 2019
  • AnalyticDB: Real-time OLAP Database System at Alibaba Cloud, by C. Zhan, M. Su, C. Wei, X. Peng, L. Lin, S. Wang, Z. Chen, F. Li, Y. Pan, F. Zheng, C. Chai. VLDB 2019
  • Cloud Native Database Systems: Challenges and Opportunities, by F. Li. VLDB 2019
