OB优化器的三步走

转自：OB技术深潜

问题描述：

业务反馈在某套ob 3.2.3.3集群抓到某个多表关联的语句执行很慢，查看执行计划发现是驱动表选择错误，使用leading指定合适的表驱动顺序后SQL执行很快，随后对驱动表选择错误问题进行分析。

原因分析：

1、查看原执行计划，可以发现驱动表R表过滤后的预估行数是1，所以优化器认为该表作为驱动表执行后的cost成本最低

Query Plan: =================================================================================|ID|OPERATOR                        |NAME                        |EST. ROWS|COST|---------------------------------------------------------------------------------|0 |SCALAR GROUP BY                 |                            |1        |174 ||1 | PX COORDINATOR                 |                            |1        |174 ||2 |  EXCHANGE OUT DISTR            |:EX10001                    |1        |173 ||3 |   MERGE GROUP BY               |                            |1        |173 ||4 |    NESTED-LOOP JOIN            |                            |1        |173 ||5 |     NESTED-LOOP JOIN           |                            |1        |130 ||6 |      EXCHANGE IN DISTR         |                            |1        |100 ||7 |       EXCHANGE OUT DISTR (PKEY)|:EX10000                    |1        |100 ||8 |        PX PARTITION ITERATOR   |                            |1        |100 ||9 |         TABLE SCAN             |R(IDX_STUDENTER_SCHDATE)    |1        |100 ||10|      PX PARTITION ITERATOR     |                            |1        |29  ||11|       TABLE SCAN               |S(IDX_TEACHERCER_TEAHNUMBER)|1        |29  ||12|      DISTRIBUTED TABLE SCAN    |C(PK_CM_CU_SCHOOLER)        |2        |5   |=================================================================================

2、单独对R表进行分析，谓词过滤后的返回行数有14万行，拿14万行作为嵌套循环的驱动表必定是灾难性的根本跑不出来

select count(*) from tbcs.studenter rwhere r.schdate >= sysdate - NUMTODSINTERVAL('90','DAY')and r.schoolid in ('OpenStudeXS')\G+----------+| COUNT(*) |+----------+|   145532 |+----------+1 row in set (0.099 sec)
*************************** 1. row ***************************Query Plan: ======================================================================|ID|OPERATOR                 |NAME                    |EST. ROWS|COST|----------------------------------------------------------------------|0 |SCALAR GROUP BY          |                        |1        |801 ||1 | PX COORDINATOR          |                        |1        |801 ||2 |  EXCHANGE OUT DISTR     |:EX10000                |1        |801 ||3 |   MERGE GROUP BY        |                        |1        |801 ||4 |    PX PARTITION ITERATOR|                        |1        |801 ||5 |     TABLE SCAN          |R(IDX_studenter_schdate)|1        |801 |======================================================================
Outputs & filters: -------------------------------------  0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*)(0x7e5548c1d6f0))(0x7e5548cb1ed0)]), filter(nil),       group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*)(0x7e5548c1d6f0))(0x7e5548cb1ed0)])  1 - output([T_FUN_COUNT(*)(0x7e5548c1d6f0)]), filter(nil)  2 - output([T_FUN_COUNT(*)(0x7e5548c1d6f0)]), filter(nil), dop=1  3 - output([T_FUN_COUNT(*)(0x7e5548c1d6f0)]), filter(nil),       group(nil), agg_func([T_FUN_COUNT(*)(0x7e5548c1d6f0)])  4 - output([remove_const(1)(0x7e5548cbfe40)]), filter(nil),       force partition granule.  5 - output([remove_const(1)(0x7e5548cc0690)]), filter(nil),       access([R.schoolid(0x7e5548cbf4b0)]), partitions(p40, p81, p122, p163, p204, p245, p286, p327, p368, p409, p450, p491, p532, p573, p614, p655, p696, p737, p[786-803]),       is_index_back=false,       range_key([R.schoolid(0x7e5548c1d270)], [R.schdate(0x7e5548c16f10)], [R.STATUS(0x7e5548c92080)], [R.REGION(0x7e5548c16c20)], [R.__pk_increment(0x7e5548c823b0)]),       range(OpenStudeXS,2025-08-10 06:23:34,MIN,MIN,MIN ; OpenStudeXS,NULL,MIN,MIN,MIN),       range_cond([R.schdate(0x7e5548c16f10) >= ?(0x7e5548c193e0)], [R.schoolid(0x7e5548c1d270) IN (?, ?)(0x7e5548c1c590)(0x7e5548c1bba0)])

小结：

查看执行计划详细信息partitions部分扫描的分区，结合下面查询表分区统计信息的结果，可以发现该sql扫描了表上的所有分区，优化器会在扫描的所有分区中指定行数最多的分区进行估算。

OB优化器估行规则：

优化器估行时依赖的统计信息可以从以下3个地方获取，其优先级为:存储层估行> 直方图统计信息> 基础统计信息(合并时搜集)

est_method：存储层采样方式。有 local_storage、remote_storage、basic_stat 三种。目前仅支持对本地存储进行采样，对于数据分区在远程节点上的情况，只能使用默认收集的统计信息进行代价估计，可能会引入代价偏差。

physical_range_rows：存储层对表估行估出的物理行数，优化器算 cost 参考的行数，当多分区表全表扫描时，优化器会根据数据量最大的分区的行数*分区数(扫描的分区数)来估算。

--表的分区统计信息（预估行数最多的分区是 P_451_202507）select a.svr_ip, b.part_id, b.part_name, a.row_count  from __all_virtual_partition_table a        left join       __all_virtual_partition_item b        on a.partition_id = b.partition_id and a.table_Id = b.table_id where a.table_id = 1100611139649270  and a.role=1   order by part_id;+---------------+---------+--------------+-----------+| svr_ip        | part_id | part_name    | row_count |+---------------+---------+--------------+-----------+| 134.85.20.134 |      40 | P_430_MAX    |   2654052 || 192.92.80.110 |      81 | P_431_MAX    |   4626111 || 134.85.20.134 |     122 | P_432_MAX    |   1243404 || 134.85.20.134 |     163 | P_433_MAX    |   3479456 || 192.92.80.110 |     204 | P_434_MAX    |   2575294 || 192.92.80.110 |     245 | P_435_MAX    |    648928 || 192.92.80.110 |     286 | P_436_MAX    |   1025867 || 134.85.20.134 |     327 | P_451_MAX    |         0 || 134.85.20.134 |     368 | P_438_MAX    |   1249465 || 134.85.20.134 |     409 | P_439_MAX    |    250290 || 134.85.20.134 |     450 | P_443_MAX    |    479238 || 134.85.20.134 |     491 | P_446_MAX    |    780327 || 134.85.20.134 |     532 | P_631_MAX    |   2255353 || 192.92.80.110 |     573 | P_632_MAX    |    315325 || 192.92.80.110 |     614 | P_633_MAX    |    314753 || 134.85.20.134 |     655 | P_634_MAX    |         0 || 134.85.20.134 |     696 | P_635_MAX    |   2425383 || 134.85.20.134 |     737 | P_888_MAX    |         0 || 192.92.80.110 |     786 | P_451_202507 |  10622506 |--统计信息行数最多的分区| 192.92.80.110 |     787 | P_888_202507 |    668884 || 192.92.80.110 |     788 | P_451_202508 |   9014439 || 192.92.80.110 |     789 | P_888_202508 |   1357195 || 192.92.80.110 |     790 | P_451_202509 |   9307493 || 192.92.80.110 |     791 | P_888_202509 |   3723321 || 192.92.80.110 |     792 | P_451_202510 |   9014622 || 192.92.80.110 |     793 | P_888_202510 |   3415043 || 192.92.80.110 |     794 | P_451_202511 |   2425119 || 192.92.80.110 |     795 | P_888_202511 |    128298 || 192.92.80.110 |     796 | P_451_202512 |         0 || 192.92.80.110 |     797 | P_888_202512 |         0 || 192.92.80.110 |     798 | P_451_202601 |         0 || 192.92.80.110 |     799 | P_888_202601 |         0 || 192.92.80.110 |     800 | P_451_202602 |         0 || 192.92.80.110 |     801 | P_888_202602 |         0 || 192.92.80.110 |     802 | P_451_202603 |         0 || 192.92.80.110 |     803 | P_888_202603 |         0 |+---------------+---------+--------------+-----------+

3、代入过滤查询此分区没有数据返回，优化器就会估出1行

select count(*) from tbcs.studenter partition(P_451_202507) rwhere r.schdate >= sysdate - NUMTODSINTERVAL('90','DAY')and r.schoolid in ('OpenStudeXS');+----------+| COUNT(*) |+----------+|        0 |+----------+

此时就已明了优化器为何出错，由于该笔是一个分区表，我们可以加上一个分区键从而减少分区扫描范围，确定一个有数据的分区做正常估行验证

explain extended select count(*) from tbcs.studenter rwhere r.schdate >= sysdate - NUMTODSINTERVAL('90','DAY') and region=451and r.schoolid in ('OpenStudeXS')\G
*************************** 1. row ***************************Query Plan: ========================================================================|ID|OPERATOR                 |NAME                    |EST. ROWS|COST  |------------------------------------------------------------------------|0 |SCALAR GROUP BY          |                        |1        |211261||1 | PX COORDINATOR          |                        |1        |202141||2 |  EXCHANGE OUT DISTR     |:EX10000                |1        |202141||3 |   MERGE GROUP BY        |                        |1        |202141||4 |    PX PARTITION ITERATOR|                        |238940   |193022||5 |     TABLE SCAN          |R(IDX_studenter_schdate)|238940   |193022|========================================================================
Outputs & filters: -------------------------------------  0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*)(0x7e6ea34e9bf0))(0x7e6ea3571600)]), filter(nil),       group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*)(0x7e6ea34e9bf0))(0x7e6ea3571600)])  1 - output([T_FUN_COUNT(*)(0x7e6ea34e9bf0)]), filter(nil)  2 - output([T_FUN_COUNT(*)(0x7e6ea34e9bf0)]), filter(nil), dop=1  3 - output([T_FUN_COUNT(*)(0x7e6ea34e9bf0)]), filter(nil),       group(nil), agg_func([T_FUN_COUNT(*)(0x7e6ea34e9bf0)])  4 - output([remove_const(1)(0x7e6ea3581290)]), filter(nil),       force partition granule, asc.  5 - output([remove_const(1)(0x7e6ea3581ae0)]), filter([R.REGION(0x7e6ea34e2d90) = 451(0x7e6ea34e82c0)]),       access([R.REGION(0x7e6ea34e2d90)]), partitions(p327, p368, p788, p790, p792, p794, p796, p798, p800, p802),       is_index_back=false, filter_before_indexback[false],       range_key([R.schoolid(0x7e6ea34e97f0)], [R.schdate(0x7e6ea34e3080)], [R.STATUS(0x7e6ea3557740)], [R.REGION(0x7e6ea34e2d90)], [R.__pk_increment(0x7e6ea3543060)]),       range(OpenStudeXS,2025-08-10 06:24:57,MIN,451,MIN ; OpenStudeXS,NULL,MIN,MIN,MIN),       range_cond([R.schdate(0x7e6ea34e3080) >= ?(0x7e6ea34e5550)], [R.schoolid(0x7e6ea34e97f0) = ?(0x7e6ea34e8b00)])
Optimization Info:-------------------------------------R:table_rows:31972723, physical_range_rows:477880, logical_range_rows:477880, index_back_rows:0, output_rows:238940, est_method:remote_storage, optimization_method=cost_based, avaiable_index_name[IDX_studenter_schdate,studenter], pruned_index_name[IDX_studenter_RELATERECOID,IDX_studenter_SERVNUM,IDX_studenter_CUSTID,IDX_studenter_ORDERID,PK_CS_REC_studenter,IDX_studenter_ENTITYID,IDX_studenter_FORMNUM,IDX_studenter_GROUPID,IDX_studenter_RECOPID,IDX_studenter_RECORGID], estimation info[table_id:1100611139649489, (table_type:8, version:0-1762542036995292-1762542036995292, logical_rc:47788, physical_rc:47788), (table_type:5, version:1762542000927080-1762542000927080-1762542036995292, logical_rc:0, physical_rc:0), (table_type:0, version:1762542036995292-1762542036995292-9223372036854775807, logical_rc:0, physical_rc:0)]

查看执行计划详细信息partitions部分扫描的分区：partitions(p327, p368, p788, p790, p792, p794, p796, p798, p800, p802)

+---------------+---------+--------------+-----------+| svr_ip        | part_id | part_name    | row_count |+---------------+---------+--------------+-----------+| 192.92.80.135 |     788 | P_451_202508 |   9014439 || 192.92.80.110 |     790 | P_451_202509 |   9307493 |--统计信息行数最多的分区| 192.92.80.110 |     792 | P_451_202510 |   9014772 || 192.92.80.110 |     794 | P_451_202511 |   3386516 || 192.92.80.110 |     796 | P_451_202512 |         0 || 192.92.80.110 |     798 | P_451_202601 |         0 || 192.92.80.110 |     800 | P_451_202602 |         0 || 192.92.80.110 |     802 | P_451_202603 |         0 || 134.85.20.134 |     327 | P_451_MAX    |         0 || 134.85.20.134 |     368 | P_452_MAX    |   1249503 |+---------------+---------+--------------+-----------+

代入行数最多的分区P_451_202509做正常估行验证

select count(*) from tbcs.studenter partition(P_451_202509) rwhere r.schdate >= sysdate - NUMTODSINTERVAL('90','DAY') and region=451and r.schoolid in ('OpenStudeXS');+----------+| COUNT(*) |+----------+|    48813 |+----------+