Date: 2025-05-15
Time: 10:30–11:20
Room: Breakout Room 3
Level: Intermediate
The PostgreSQL optimizer works well and generates near-optimal plans for most queries. For some specific queries, however, when the optimizer underestimates cardinality, it can produce a plan so suboptimal that execution time grows by orders of magnitude. In extreme cases the query runs for so long that it becomes practically impossible to wait for it to complete and use EXPLAIN ANALYZE to figure out what went wrong. To make matters worse, the PostgreSQL optimizer does not remember its cardinality estimation errors, so it will keep generating the same query plan over and over again until something changes: statistics, optimizer settings, or the internal state of the database.

The PostgreSQL community has answered with extended statistics and a number of additional optimizations, such as incremental sort and materialization of intermediate query results. There are also extensions that help the optimizer cope with this problem, such as AQO [0] and replan [1]. The first remembers real cardinalities and shares them with the planner the next time the query is planned. The second can completely replan a query: it interrupts execution of the current suboptimal plan, collects the real cardinalities from the partially executed plan tree, and on the next planning attempt uses this knowledge to build a better plan based on the actual data.

In this talk I would like to explain a mechanism for switching between query plans. As with Adaptive Query Processing in Microsoft SQL Server [2], Switch Join switches from a Nested Loop node to a backup node with a more efficient algorithm, Hash Join or Merge Join, when an underestimation error occurs. We execute the outer (left) subnode of the Nested Loop and, once it has been scanned and the underestimation error can be measured precisely, switch to an equivalent Hash Join or Merge Join node, preserving the rows already fetched through materialization. This can speed up query execution, because we switch to an algorithm better suited to large row counts before most of the work has been done.

We will discuss the design principles of this technology in the PostgreSQL optimizer, its settings, and how it can be applied in practice, with examples of its use.
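To make the starting point concrete, the kind of underestimation described above is easy to reproduce on stock PostgreSQL with two correlated columns (table and column names here are illustrative, not taken from the talk):

    -- Two perfectly correlated columns: region = channel for every row.
    CREATE TABLE orders AS
        SELECT i % 100 AS region, i % 100 AS channel
        FROM generate_series(1, 100000) AS i;
    ANALYZE orders;

    -- The planner assumes the columns are independent and multiplies
    -- their selectivities: it estimates ~10 rows where 1000 qualify.
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE region = 1 AND channel = 1;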
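Extended statistics, one of the community answers mentioned above, fix this particular case by teaching the planner about the dependency between the columns:

    -- Functional-dependency statistics on the correlated pair.
    CREATE STATISTICS orders_dep (dependencies) ON region, channel FROM orders;
    ANALYZE orders;

    -- The same query is now estimated much closer to the actual 1000 rows.
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE region = 1 AND channel = 1;

Static statistics, however, cannot cover every correlation in advance, which is why run-time mechanisms such as Switch Join remain interesting.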
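Switch Join itself is not part of stock PostgreSQL, so its interface and settings are left for the talk; the decision it automates can be approximated today by comparing join strategies by hand with the standard enable_nestloop setting (again, all names and sizes are illustrative):

    -- Remove the extended statistics to restore the underestimation.
    DROP STATISTICS orders_dep;

    CREATE TABLE payments AS
        SELECT i % 100 AS region, i AS amount
        FROM generate_series(1, 1000000) AS i;
    CREATE INDEX ON payments (region);
    ANALYZE payments;

    -- With ~10 outer rows expected, the planner is likely to probe
    -- payments through the index in a Nested Loop; with the 1000 actual
    -- outer rows, a single Hash Join pass over payments is likely cheaper.
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders o
    JOIN payments p ON p.region = o.region
    WHERE o.region = 1 AND o.channel = 1;

    -- Manually force the backup strategy Switch Join would switch to:
    SET enable_nestloop = off;   -- Hash Join and Merge Join stay available
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders o
    JOIN payments p ON p.region = o.region
    WHERE o.region = 1 AND o.channel = 1;
    RESET enable_nestloop;

Comparing the two EXPLAIN ANALYZE outputs shows what a timely switch would buy for the real row counts; Switch Join makes this decision inside the executor, after the outer side has actually been scanned, instead of requiring a manual re-run.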
[0] https://www.pgevents.ca/events/pgconfdev2024/schedule/session/147-adaptive-query-optimization-in-pos...
[1] https://habr.com/ru/companies/postgrespro/articles/819911/
[2] https://www.sqlservercentral.com/blogs/adaptive-query-processing-and-automatic-tuning-adaptive-joins-2