- OLTP (Online Transaction Processing) and HTAP (Hybrid Transactional/Analytical Processing) Engines
Extending the single-node architecture, research has focused on storing data across multiple nodes through shared storage to achieve scale-up expansion and stronger transaction processing capabilities. For multi-node cluster architectures, research focuses on using sharding to distribute the database and achieve scale-out transaction processing. Under this architecture, transaction scheduling and concurrency control are the key challenges in guaranteeing read/write consistency of data through a GTM (global transaction manager). The HTAP engine makes it possible to perform both transaction processing and analytical processing on a single copy of the data.
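To make the scale-out idea more concrete, the following is a minimal sketch, assuming hash-based sharding and a GTM that hands out monotonically increasing timestamps used both as snapshot points and as commit timestamps. The class names (`GlobalTransactionManager`, `ShardedStore`, `Transaction`) are hypothetical and do not describe the lab's engines. The sketch only shows how global timestamps give a reader a consistent snapshot across shards; it omits write-write conflict detection, locking, and distributed commit.

```python
import threading
import zlib


class GlobalTransactionManager:
    """Hypothetical GTM: issues globally ordered timestamps."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ts = 0

    def next_timestamp(self) -> int:
        with self._lock:
            self._ts += 1
            return self._ts


class ShardedStore:
    """Multi-versioned key/value data, hash-partitioned across shards."""

    def __init__(self, num_shards: int, gtm: GlobalTransactionManager) -> None:
        self.gtm = gtm
        # Each shard maps key -> list of (commit_ts, value) versions.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted hash().
        return zlib.crc32(key.encode()) % len(self.shards)

    def begin(self) -> "Transaction":
        return Transaction(self, self.gtm.next_timestamp())


class Transaction:
    def __init__(self, store: ShardedStore, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts  # snapshot point obtained from the GTM
        self.writes = {}          # buffered locally until commit

    def write(self, key: str, value) -> None:
        self.writes[key] = value

    def read(self, key: str):
        if key in self.writes:
            return self.writes[key]
        versions = self.store.shards[self.store.shard_for(key)].get(key, [])
        # Newest version committed at or before the snapshot. This assumes
        # versions were appended in commit order (true in this
        # single-threaded sketch).
        for commit_ts, value in reversed(versions):
            if commit_ts <= self.start_ts:
                return value
        return None

    def commit(self) -> int:
        commit_ts = self.store.gtm.next_timestamp()
        for key, value in self.writes.items():
            shard = self.store.shards[self.store.shard_for(key)]
            shard.setdefault(key, []).append((commit_ts, value))
        return commit_ts


store = ShardedStore(num_shards=4, gtm=GlobalTransactionManager())
t1 = store.begin()
t1.write("acct:alice", 100)
t1.write("acct:bob", 50)
t1.commit()

reader = store.begin()  # snapshot taken after t1 committed
t2 = store.begin()
t2.write("acct:alice", 70)
t2.write("acct:bob", 80)
t2.commit()

print(reader.read("acct:alice"), reader.read("acct:bob"))  # 100 50
print(store.begin().read("acct:alice"))                   # 70
```

Even if the two accounts land on different shards, `reader` sees both balances as of the same global timestamp, which is the read consistency a GTM is meant to provide.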
- Multi-model, OLAP (Online Analytical Processing), and NoSQL/NewSQL Database Systems
With complex and rich multi-model data, database systems must perform fusion analysis, integration, and cleansing of structured, semi-structured, and unstructured data in order to extract and process structured features. This requires continuous improvement in the applicability, performance, and efficiency of NoSQL/NewSQL and OLAP systems.
- Data Security and Database System Security
Beyond traditional security measures such as access control and SQL injection prevention, a core challenge is how to improve system security and data protection without sacrificing database performance. Database systems require continuous improvement in key technologies such as queries and updates over encrypted data (using homomorphic encryption and related techniques), oblivious RAM (ORAM), and differential privacy, to reduce the trade-off between security and efficiency. The rapid development of secure hardware has also brought new opportunities: for example, trusted hardware such as Intel SGX can be used to build a new kind of encrypted database system.
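Of the techniques listed above, differential privacy is the simplest to illustrate. Below is a minimal sketch of the standard Laplace mechanism applied to a COUNT query, assuming a sensitivity of 1 (adding or removing one row changes the count by at most 1). The function names are hypothetical, and Python's `random` module is not a cryptographically secure noise source, so this is for illustration only.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows: Iterable[dict], predicate: Callable[[dict], bool],
                  epsilon: float, rng: Optional[random.Random] = None) -> float:
    """epsilon-differentially-private COUNT(*) WHERE predicate.

    COUNT has sensitivity 1, so the Laplace noise scale is 1 / epsilon:
    a smaller epsilon means stronger privacy and noisier answers.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


patients = [{"age": a} for a in (23, 35, 47, 52, 61, 68, 74)]
noisy = private_count(patients, lambda r: r["age"] >= 60, epsilon=0.5,
                      rng=random.Random(7))
# The true answer is 3; the released value is 3 plus Laplace(0, 2) noise.
print(round(noisy, 2))
```

The epsilon parameter makes the security/efficiency trade-off mentioned above explicit: each answer is cheap to compute, but accuracy degrades as the privacy guarantee tightens.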
- Autonomous and Intelligent Databases
By analyzing the system's operating environment, status, and log data, the lab aims to use machine learning techniques and models to achieve dynamic parameter tuning and system optimization, reducing DBAs' operation and maintenance costs. Applying these technologies to key modules of the database system, such as the query optimizer, makes it possible to evolve query optimizers built on more advanced machine learning models. In addition, these technologies can help the database system establish a more accurate and efficient online early-warning and real-time monitoring system, enabling intelligent DBA operation, maintenance, and resource allocation. Meanwhile, analytical modeling of massive volumes of structured, semi-structured, and unstructured data raises research questions on how to build intelligent database systems for deep data analysis.
- New Hardware Acceleration and Data Storage
Database systems require research and development of CPU/GPU/FPGA heterogeneous computing. When optimizing multi-core, high-concurrency queries and analysis, developers must be aware of the hardware architecture (such as NUMA) to reduce data movement and build a data-centric query and analysis engine. New hardware (such as NVM and RDMA) has also spawned new data storage and management technologies, which require system designers to consider the separation of storage and computation.
- Database Core Algorithms
Every direction and layer of database system design involves core algorithmic challenges such as concurrency control, data processing, system scheduling, approximate computing, unstructured data analysis, and feature extraction. Solving these problems effectively requires combining algorithm design with the operating status and characteristics of the database system, which places new demands on the construction of core algorithms.
- Meteorological Big Data
The China Meteorological Administration's Meteorological Big Data Analysis Platform uses the OLAP engine's concurrent complex query capabilities, backed by high-throughput real-time writes and highly concurrent reads and writes. These capabilities enable millisecond-level queries over the historical data of a single weather station based on aggregated columns, and the platform continuously stores minute-level data from 60,000 weather stations nationwide, covering records since 1957. As a result, data becomes visible in applications within minutes of being written to the database.
- Postal and Real Estate Applications
Real estate developer Vanke and China Post have scaled the overall storage and computing capacity of their databases linearly by using core technologies including the horizontal scaling of distributed databases. Through the database splitting technology provided by the distributed transaction processing engine, the lab supports the iteration of both companies' core service systems, greatly reducing their database operation and maintenance costs.
- National Projects
The lab supports major Chinese national projects in the public and private cloud domains, such as Shanghai City Brain.
He is a Tenured Professor of Computer Science at the University of Utah and one of the world's top scientists in the field of databases. For his academic and scientific achievements he has won the ACM SIGMOD 2016 Best Paper Award, the ACM SIGMOD 2015 Best System Presentation Award, the IEEE ICDE 2014 10-Year Most Influential Paper Award, Hewlett-Packard's 2011 and 2012 Global R&D Awards, the Google Faculty Award (2015), the Visa Faculty Award (2017), ACM Distinguished Member (2018), and the US NSF CAREER Award. He has led and participated in many important research projects, has served on the editorial boards of, and as chair for, many leading international academic journals and conferences, and is a reviewer and panelist for many major projects.
Head of the Alibaba Cloud database team. For the past 7 years, he has focused on the independent research and development of Alibaba Cloud RDS and NoSQL products, as well as the POLARDB and HybridDB relational databases. His main research areas include distributed databases and storage systems, large-scale real-time computing, global data disaster recovery and multi-active deployment, AI-based anomaly diagnosis, and AI-based operation and maintenance. He is a member of the China Computer Federation (CCF) Database Committee and has published numerous articles at international academic conferences and in journals in the fields of databases and systems, such as SIGMOD, VLDB, and TSC.
Received his Ph.D. in Computer Science from The Ohio State University in 2004. He joined Alibaba Seattle in 2014 and is currently a senior director in the Alibaba Cloud group, where he leads the development of storage infrastructure and cloud storage services. Before joining Alibaba, Jason was a principal manager on the Microsoft Azure storage team from 2008 to 2014, a senior manager at Ask.com from 2004 to 2008, and a research engineer at the Institute of Computing Technology (ICT) from 1997 to 1999. His areas of interest include large-scale distributed systems and big data processing systems.
Head of Alibaba Database Division's OLAP Platform. He built the large-scale online cloud analysis products AnalyticDB and Data Lake Analytics from scratch. These products are widely deployed in Alibaba Group and on Alibaba Cloud (both public and dedicated clouds). With many years of experience in the research and development of massive-scale data analysis platforms, he has served as chief architect for several large internal Alibaba Group projects and commercial Alibaba Cloud projects.
He served as a server architect at several Silicon Valley companies, responsible for large-scale server architecture design. He joined Facebook in 2007, where he focused on software performance and architecture analysis. He created the HipHop project to rewrite and reimplement the PHP language, increasing its speed by 5 to 6 times and saving the company billions of dollars. This earned him 16th place on Fast Company's list of the Most Creative People of 2010. He then turned to optimizing distributed systems through asynchronous processing and carried out research on optimizing distributed databases to that end.
He led the team that completed the transition from commercial databases to open-source databases, known as De-IOE (moving away from IBM servers, Oracle databases, and EMC storage). He also built the multi-home datacenter architecture for Alibaba. To meet the high performance requirements of Alibaba's Double 11 shopping festival, he led the team in extensively optimizing the open-source database, resulting in AliSQL, which has been open-sourced to the community. Since 2017, he has been committed to building Alibaba's new-generation database technology system.
He holds a Ph.D. in Computer Science from National University of Singapore. Before joining Alibaba, he was a research fellow at Database System Lab, National University of Singapore. He has published papers in top conferences and journals in database related areas, such as VLDB, KDD and TKDE. His research interests are mainly in large-scale data management, including distributed databases, data processing engines, machine learning platforms and blockchain technologies.
- X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing