Tips for DBMS_STATS (Part 2/2)

The second part of this post includes taking statistics on partitioned objects and how aggregated statistics work.

If you want more detail on DBMS_STATS go to the first part of this post, where you will find info on stale statistics and how to take them in parallel, since parallel statistcs works great on partitioned tables it is highly advisable to read it.

The statistics on partitioned objects are a little different and a little bit more difficult to maintain when they grow, they not only take longer to be analyzed, but because these objects are commonly the largest ones in the database the process needs more temporary space.

On partitioned tables there are two statistics: GLOBAL and LOCAL, if when we take the statistics we provide one partition (using the PARTNAME parameter) the global statistics will be also collected.

The GRANULARITY parameter will allow you to decide the level of stats to be taken, GLOBAL as a whole object, PARTITION or SUBPARTITION.

Options for the GRANULARITY parameter:

  • 'ALL' – gathers all (subpartition, partition, and global) statistics
  • 'AUTO'– determines the granularity based on the partitioning type. This is the default value.
  • 'DEFAULT' – gathers global and partition-level statistics. This option is obsolete, and while currently supported, it is included in the documentation for legacy reasons only. You should use the ‘GLOBAL AND PARTITION‘ for this functionality. Note that the default value is now ‘AUTO‘.
  • 'GLOBAL' – gathers global statistics
  • GLOBAL AND PARTITION‘ – gathers the global and partition level statistics. No subpartition level statistics are gathered even if it is a composite partitioned object.
  • 'PARTITION ‘- gathers partition-level statistics
  • 'SUBPARTITION' – gathers subpartition-level statistics.

NOTE: Global statistics will not collect individual statistics for partitions and subpartitions.

If we choose PARTITION or SUBPARTITION the stats will be taken in a fraction of the time, but there is a price for this speed, the GLOBAL stats for the whole table as an object will be missing, therefore only the queries that are using the partition key will have the stats.

Luckily, Oracle can calculate the GLOBAL stats by aggregation based on the partitions, aggregation works this way:

1st.- There should not be global stats taken ever, you will have to delete the global stats using DELETE_TABLE_STATS procedure and when you retake them at the lower levels the aggregated stats will be calculated.

2nd .- The aggregations will be taken from the lowest level (partitions and subpartitions), it is important that ALL partitions or subpartitions have to be analyzed, then the upper levels will be calculated by aggregations.

NOTE: When statistics are calculated by aggregation the quality of them is not the same, you will be missing histograms and the number of distinct values on a column can be inaccurate.

How do I know if the stats are being takes or are derived by aggregation?

When we query the view USER|ALL|DBA_TAB_STATISTICS, if the column GLOBAL_STATS is NO then the values are derived by aggregation.

SELECT table_name, partition_name P_NAME, subpartition_name SP_NAME,num_rows,global_stats
FROM user_tab_statistics
ORDER BY 1,2,3;

Example scenarios:

A.- Range partitioned table with 3 partitions: P200901, P200902, P200903 (the same example applies to HASH partition tables)

exec dbms_stats.delete_table_stats('cacosta','PART_DATE',cascade_indexes=>true, cascade_parts=>true);

1.-

Take GLOBAL stats and only one partition

Time: 21.8seconds

exec dbms_stats.gather_table_stats (‘cacosta’,’part_date’,partname=>’P200901′)

2nd.-

No GLOBAL stats and only one partition

Time: 4.2seconds

exec dbms_stats.gather_table_stats
(‘cacosta’,’part_date’,partname=>’P200901′,granularity=>’PARTITION’)

There won’t be GLOBAL stats calculated until all the lower levels are analyzed, once all the lower level stats are taken we will see the stats and the column GLOBAL_STATS on the DBA_TAB_STATISTICS view will say NO.

TABLE_NAME         PNAME         NUM_ROWS       GLOB_STATS

PART_DATE            P200901                    2667602                    YES

PART_DATE            P200902                    101                               YES

PART_DATE            P200903                    150                               YES

PART_DATE                                                2667853                        NO

B.- Subpartitioned table RANGE-LIST with 2 partitions: P200906, P200907

We can take the stats on two levels, partition or subpartition, this will depend mostly on the queries being performed, if the queries specify the partitions keys for both partition and sub partition we can take stats on the subpartitions, but both the PARTITIONS and GLOBAL stats will be aggregated with the loss of quality consequences.

Subpartitions stats are being used by the RDBMS after 10.2.0.4 only, if your subpartitions are very different in size and value distribution it is better to take the stats individually for each subpartitions.

Let’s see an example:

exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_range_list’,cascade_indexes=>true,cascade_parts=>true)

Taking statistics on PARTITION LEVEL, we will miss the subpartition stats:

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906′,granularity=>’partition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907′,granularity=>’partition’)

T_NAME                    P_NAME    SP_NAME      ROWS   GLOB_STATS

SUBPART_RANGE_LIST  P200906        P200906_SP1                                  NO

SUBPART_RANGE_LIST  P200906        P200906_SP2                                  NO

SUBPART_RANGE_LIST  P200906        P200906_SP3                                  NO

SUBPART_RANGE_LIST  P200906                                             16795             YES

SUBPART_RANGE_LIST  P200907        P200907_SP1                                  NO

SUBPART_RANGE_LIST  P200907        P200907_SP2                                  NO

SUBPART_RANGE_LIST  P200907        P200907_SP3                                  NO

SUBPART_RANGE_LIST  P200907                                             19073             YES

SUBPART_RANGE_LIST                                                                   35868             NO

Now let’s take stats on SUBPARTITION LEVEL:

exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_range_list’,cascade_indexes=>true,cascade_parts=>true)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP1′,granularity=>’subpartition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP2′,granularity=>’subpartition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP3′,granularity=>’subpartition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP1′,granularity=>’subpartition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP2′,granularity=>’subpartition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP3′,granularity=>‘subpartition’)


T_NAME              P_NAME       SP_NAME          ROWS    GLOB_STATS

SUBPART_RANGE_LIST  P200906        P200906_SP1                      9123   YES

SUBPART_RANGE_LIST  P200906        P200906_SP2                      7658   YES

SUBPART_RANGE_LIST  P200906        P200906_SP3                      14        YES

SUBPART_RANGE_LIST  P200906                                                           16795 NO

SUBPART_RANGE_LIST  P200907        P200907_SP1                      9933   YES

SUBPART_RANGE_LIST  P200907        P200907_SP2                      9124   YES

SUBPART_RANGE_LIST  P200907        P200907_SP3                      16        YES

SUBPART_RANGE_LIST  P200907                                                           19073 NO

SUBPART_RANGE_LIST                                                                                 35868 NO

Here we can clearly see the size differences for each subpartitions, this info was missing when taking stats on the partition level and can affect the decisions taken by the Cost Based Optimizer (CBO).

C.- Subpartitioned table RANGE-HASH with 3 partitions: P200901, P200902, P200903

When we have subpartitioned tables we can have the GLOBAL stats aggregated by taking stats on all subpartitions or in all partitions. Since they are HASH subpartitions it does not make much sense to take them individually, so we take partitions stats:

exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_date’,cascade_indexes=>true, cascade_parts=>true)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200901′,granularity=>’partition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200902′,granularity=>’partition’)

exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200903′,granularity=>’partition’)

T_NAME   P_NAME     SP_NAME         ROWS    GLOB_STATS

SUBPART_DATE    P200901        P200901_S1                                               NO

SUBPART_DATE    P200901        P200901_S2                                               NO

SUBPART_DATE    P200901        P200901_S3                                               NO

SUBPART_DATE    P200901        P200901_S4                                               NO

SUBPART_DATE    P200901                                          2548357                    YES

SUBPART_DATE    P200902        P200902_S1                                              NO

SUBPART_DATE    P200902        P200902_S2                                             NO

SUBPART_DATE    P200902        P200902_S3                                             NO

SUBPART_DATE    P200902        P200902_S4                                             NO

SUBPART_DATE    P200902                                                     15992             YES

SUBPART_DATE    P200903        P200903_S1                                            NO

SUBPART_DATE    P200903        P200903_S2                                           NO

SUBPART_DATE    P200903        P200903_S3                                           NO

SUBPART_DATE    P200903        P200903_S4                                           NO

SUBPART_DATE    P200903                                                   15992             YES

SUBPART_DATE                                               2580341                  NO

Conclusions: If you can afford to take GLOBAL stats then do it, you will have complete statistics for all your database, just set the GRANULARITY parameter to AUTO, this will also avoid stats problems when queries do not specify the partition key.

Aggregated stats have worked out very well for me, but that might not be the case on all databases.

Here is the official documentation for oracle10g and oracle11g

Advertisements
This entry was posted in Oracle Server, Performance & Tuning and tagged , , , , , , , . Bookmark the permalink.

15 Responses to Tips for DBMS_STATS (Part 2/2)

  1. Tnelson says:

    I usually don’t post on Blogs but ya forced me to, great info.. excellent! … I’ll add a backlink and bookmark your site.

  2. RobD says:

    Hey, I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say GREAT blog!…..I”ll be checking in on a regularly now….Keep up the good work! 🙂 🙂

  3. Nick says:

    Hi, Thanks for the blog entry.

    I have a quick question re: the gathering of subpartition stats. Do you know if this works on 10.2.03?

    I am trying this, and I get a ORA-20001 invalid identifier error.

    I have a vague memory of reading somewhere that this does not work before 10.2.04, but of course, can’t find that again.

    Cheers.

    • carlosacostaalamo says:

      Hi, Are you referring to a particular section in the post? If I am not wrong this post should run OK in any 10.2.X.X version, please publish here the command that is giving you the error to try it myself.

  4. Nick says:

    Sorry about the delay replying, I got tied up with other things.

    Anyway, I seem to have found my problem . The first SQL that did not work was as follows:


    begin
    dbms_stats.gather_table_stats(‘REPDB’,’TRAN_GPT_DELTA’,CASCADE=>TRUE ,granularity=>’SUBPARTITION’ , partname=>’2009_04_30_5372′,
    method_opt=>’FOR ALL COLUMNS SIZE 1′ ,estimate_percent=>1,degree=>4);
    end;

    The problem seems to be the name I supply for the partition, this needs to be in double quotes for some reason. As far as I can tell, I have not created these partition names with ” ” around them. In fact, looking in the user_subpart_col_statistics table, there are no “”‘s around the subpartition names.

    The SQL that works is below:

    begin
    dbms_stats.gather_table_stats(‘REPDB’,’TRAN_GPT_DELTA’,CASCADE=>TRUE ,granularity=>’SUBPARTITION’ , partname=>'”2009_04_30_5372″‘,
    method_opt=>’FOR ALL COLUMNS SIZE 1’ ,estimate_percent=>1,degree=>4);
    end;

  5. Pingback: Tips for DBMS_STATS (part 1/2) « Oracle Examples

  6. ramesh says:

    You have provide a valuable information. Thank you. I have a question in regards to gathering stats on one partition with global stats. So, I ran the following but the dba_tab_partitions show GLOBAL =YES for this partition, I am confused. Please let me know your thoughts.

    execute DBMS_STATS.GATHER_TABLE_STATS(‘PROD1,’T_TRANSACTION’,granularity =>’PARTITION’,partname => ‘P_20101004’,estimate_percent => 35 ,degree => 4, CASCADE => TRUE);

    • Hi,
      It’s OK to see GLOBAL_STATS=YES when you take the stats on a particular partition or sub partition, now, when there are stats at the table level or partition level (with sub partitions) and GLOBAL_STATS=NO it means that the stats have been calculated by aggregation.
      Hope this helps to clarify your confusion
      Carlos

  7. Julien M says:

    Thank you for your great explanation. it is clear now

  8. We’re a gaggle of volunteers and opening a brand new scheme in our community. Your website provided us with valuable information to work on. You’ve done a formidable task and our entire neighborhood will
    be thankful to you.

  9. Khattak says:

    Thank you for such a nice post.
    Due to huge size of tables (in TB), I am taking status on granularity =>’PARTITION’.
    My question is “Should I take global_stats for table as well”, after completing on partition level?

    • Hi, If your queries specify the partition key then they will have the stats available as partition stats are being taken, other queries NOT USING the partition key will lack the global stats and performance can be impacted.
      You can:
      Take global stats all the time, requires a lot of time and temp space but stats will be more accurate, or
      Delete the global stats and retake all the stats for all the partitions (granularity=>’PARTITION’) and the GLOBAL will be aggregated, then keep taking only PARTITION stats.
      in 11g you have a new granularity option: ‘APPROX_GLOBAL AND PARTITION’, this one will take the global stats by using aggregation.
      You can query the view USER|ALL|DBA_TAB_STATISTICS, if the column GLOBAL_STATS is NO then the values are derived by aggregation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s