Intelligent Database Plan Management and Tuning
It is common for extremely complex queries to coexist with highly concurrent simple queries within a single online interactive and decision-making data analytics system, i.e., an OLAP (Online Analytical Processing) system. The execution plan of each such query affects the efficiency and stability of the business application that issues it. However, it remains challenging for DBAs to quickly determine whether a query's execution plan is optimal, and to apply appropriate tuning approaches that improve query and system performance while keeping the system stable.
Query optimization is a typical NP-hard problem, in which conventional database optimization techniques, e.g., dynamic programming and genetic algorithms, are applied to select the best access plan for a given query within bounded time. Cost estimation is usually the foundation for this decision. However, cost estimation depends on cardinality estimation, which relies heavily on the available statistics. In modern database systems, statistics are collected and used in a limited way, and data skew and correlation are not handled well. As a result, the selectivity estimates of predicates, single or compound, are not accurate enough, which causes cost estimation errors and in turn leads to sub-optimal query plans and query performance problems.
Furthermore, queries are becoming more and more complex as database systems evolve to incorporate new hardware, new data sources, and new computation models. Conventional query optimization approaches are less capable of handling such complex scenarios.
Besides query optimization, system performance also relies on many other subsystems, e.g., workload management, resource management, and physical database design. Tuning a database system usually requires extensive expert experience across many of these subsystems. However, manual tuning by such experts is less and less feasible when database services on the cloud provide thousands of database instances, which raises huge challenges for cloud database service providers.
Meanwhile, artificial intelligence, especially machine learning, has in recent years shown promise in solving traditionally challenging problems in a large number of domains. Therefore, this collaborative research project will explore the opportunities and appropriate approaches for solving challenging query and system optimization problems by exploiting evolving machine learning technologies.
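To make the effect of column correlation on selectivity estimation concrete, the following is a minimal sketch in Python. It assumes a simple optimizer that multiplies per-column selectivities for a compound predicate (the attribute independence assumption); the table, the column names `city` and `country`, and the function `estimate_vs_actual` are hypothetical and chosen only for illustration.

```python
def estimate_vs_actual(rows, pred_a, pred_b):
    """Compare an independence-based selectivity estimate with the true value.

    Returns (estimated, actual) selectivity of the conjunction pred_a AND pred_b.
    """
    n = len(rows)
    sel_a = sum(1 for r in rows if pred_a(r)) / n
    sel_b = sum(1 for r in rows if pred_b(r)) / n
    # Independence assumption: sel(A AND B) is approximated by sel(A) * sel(B).
    estimated = sel_a * sel_b
    # Ground truth: evaluate the compound predicate on every row.
    actual = sum(1 for r in rows if pred_a(r) and pred_b(r)) / n
    return estimated, actual


# Hypothetical table: 10 cities spread evenly over 1000 rows; each city
# belongs to exactly one of 5 countries, so the two columns are correlated.
rows = [{"city": i % 10, "country": (i % 10) // 2} for i in range(1000)]

estimated, actual = estimate_vs_actual(
    rows,
    lambda r: r["city"] == 0,
    lambda r: r["country"] == 0,
)

print(f"estimated rows: {estimated * len(rows):.0f}")
print(f"actual rows:    {actual * len(rows):.0f}")
print(f"underestimation factor: {actual / estimated:.1f}x")
```

In this toy table every row with `city == 0` also has `country == 0`, so the independence-based estimate is too low by a factor of five. Errors of this kind compound across joins, which is one way a sub-optimal plan can end up being selected.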
With this collaborative research project, we aim to solve the following problems:
- A modeling framework to “intelligently learn” from historical and execution data.
- An intelligent database tuning framework to recognize and correct query plan and system issues.
- An evaluation system to quantify TCO (Total Cost of Ownership) reduction given system and DBA tuning feedback.
- A complete survey of the competitive landscape among database vendors, mainly regarding intelligent system and query optimization and tuning capabilities.
- Machine learning based incremental calibration of the cost model that generates better cost and resource estimates at the operator as well as the workload level, with given benchmarks and specific scenarios.
- Machine learning based resource and performance prediction with higher prediction accuracy, with given benchmarks and specific scenarios.
- Better query plan selection with given benchmarks and specific scenarios.
- Automatic resource optimization at the workload and system levels, with improved system resource utilization and reduced cost of system computation and I/O (local as well as network).
- An autonomous system, including automatic system and query tuning, automatic system resource management, automatic system model calibration, automatic physical design modification, etc.
- Architecture design to integrate machine learning into an MPP DB system.
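As an illustration of what incremental calibration of a cost model could look like, the following Python sketch treats operator cost as a weighted sum of per-operator work counters and adjusts the weights after each observed execution. This is only a sketch under stated assumptions, not the method of this project: the linear cost form, the normalized least-mean-squares update, the class `IncrementalCostModel`, and the two features (rows processed and pages read, in arbitrary scaled units) are all hypothetical.

```python
class IncrementalCostModel:
    """Linear cost model whose unit costs are calibrated one observation at a time."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features  # unit cost per feature, learned online
        self.learning_rate = learning_rate  # should lie in (0, 2) for this update rule

    def predict(self, features):
        """Estimated cost: weighted sum of the operator's work counters."""
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, observed_cost):
        """Normalized LMS step toward the observed cost; returns the prior error."""
        error = observed_cost - self.predict(features)
        norm = sum(x * x for x in features) or 1.0  # guard against all-zero features
        step = self.learning_rate * error / norm
        self.weights = [w + step * x for w, x in zip(self.weights, features)]
        return error


# Hypothetical "true" unit costs that the model has to discover:
# 2.0 per unit of rows processed, 0.5 per unit of pages read.
true_weights = [2.0, 0.5]
observations = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

model = IncrementalCostModel(n_features=2)
for _ in range(50):  # replay the same operators as if they kept executing
    for features in observations:
        observed = sum(w * x for w, x in zip(true_weights, features))
        model.update(features, observed)

print("calibrated unit costs:", [round(w, 4) for w in model.weights])
```

The update needs only the most recent observation, so it can run alongside query execution without storing history. A real calibration would also have to cope with noisy measurements, non-linear operators, and workload-level effects.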
One or more patents and/or paper publications are expected from solving each of the above problems. Detailed implementation is subject to mutual agreement between the research institute and Alibaba Group.
Related Research Topics
- Historical and execution data learning based auto tuning
- System evaluation of query plan and TCO
- Cardinality estimation by utilizing machine learning technologies
- Machine learning based incremental calibration of cost model
- Intelligent query plan selection
- Learning based system performance and resource prediction
- Automatic resource optimization of workload and DB system
- System and query performance auto tuning
- Autonomous DB system
- Unstructured data computation and optimization
- A complete survey on intelligent database tuning
Suggested Collaboration Method
AIR (Alibaba Innovative Research), one-year collaboration project.