When Statistics Help with Filters

Here’s a fun little experiment (I’m using Oracle 19.0 for this).

First, let’s start off by creating a couple of tables.

-- The table definition below creates something that looks
-- like this:  
--   (null), 'X'
--        1, 'X'
--   (null), 'X'
--        2, 'X'
--   (null), 'X'
--        3, 'X'
--   ...etc
create table table_a
as
select /*+ no_gather_optimizer_statistics */ 
       case when mod(rn * 10, 10) = 5 
            then null 
            else rn end id
    , 'X' x
from (select rownum / 2 rn
      from dual
      connect by level <= 200);


-- The table definition below creates something that looks
-- like this:
-- 1, 'Y'
-- 1, 'Y'
-- 2, 'Y'
-- 2, 'Y'
-- 3, 'Y'
-- 3, 'Y'
-- ...etc
create table table_b
as
select  /*+ no_gather_optimizer_statistics */ y.rn id
     , 'Y' y
from (select 'x' 
      from dual
      connect by level <= 2) x
   , (select rownum rn 
      from dual
      connect by level <= 100) y
order by y.rn;

Cool! Now that we got our tables, let’s join them in a query and look a the plan.

explain plan for
select *
from table_a a
   , table_b b
where a.id = b.id;

select *
from table(dbms_xplan.display());

Here’s the plan:

Plan hash value: 2348051686
 
------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |   200 |  6400 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN         |         |   200 |  6400 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TABLE_A |   200 |  3200 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TABLE_B |   200 |  3200 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."ID"="B"."ID")
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

The main thing is that step 2 in our plan expects 200 rows, and has no filters. If it had a filter, it would have an asterisk in front of the number 2, the same way that step 1 does.

So now that we’ve created our tables, let’s gather stats!

begin
  dbms_stats.gather_table_stats(
     ownname => user
    , tabname => 'TABLE_A'
    , cascade => true
  );
  
  dbms_stats.gather_table_stats(
     ownname => user
    , tabname => 'TABLE_B'
    , cascade => true
  );
end;
/

So when we look at the same query now…

explain plan for
select *
from table_a a
   , table_b b
where a.id = b.id;

select *
from table(dbms_xplan.display());

Notice that there’s a difference:

Plan hash value: 2348051686
 
------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |   200 |  1800 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN         |         |   200 |  1800 |     4   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| TABLE_A |   100 |   400 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TABLE_B |   200 |  1000 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("A"."ID"="B"."ID")
   2 - filter("A"."ID" IS NOT NULL)

Suddenly, Step 2 of our plan has a filter (as denoted by the asterisk in front). And, that step is only expecting 100 rows, instead of 200. Oracle has added an invisible WHERE A.ID IS NOT NULL predicate “under the hood,” which causes our hash join to process fewer records.

So as a result of getting proper stats on our table, it actually made our table scan/hash join more efficient. Statistics told Oracle to filter out all records with null IDs, before passing them up the execution plan.

Stats not only help the optimizer get accurate statistics for an execution plan; they can also tell the optimizer where to add filters in a query. This can help to make the query more efficient. Specifically, in our case, it was able to filter rows out before adding them to a hash join.

Free Oracle SQL Tuning Guide

Check out my FREE guide! 7 SQL Tuning Secrets You Can Use Immediately, Even If You’ve Never Tuned a Query In Your Life!

Get it here: tuningsql.com/secrets

Free Oracle SQL Tuning Guide

Leave a Reply Cancel reply