The second part of this post includes taking statistics on partitioned objects and how aggregated statistics work.
If you want more detail on DBMS_STATS go to the first part of this post, where you will find info on stale statistics and how to take them in parallel, since parallel statistcs works great on partitioned tables it is highly advisable to read it.
The statistics on partitioned objects are a little different and a little bit more difficult to maintain when they grow, they not only take longer to be analyzed, but because these objects are commonly the largest ones in the database the process needs more temporary space.
On partitioned tables there are two statistics: GLOBAL and LOCAL, if when we take the statistics we provide one partition (using the PARTNAME parameter) the global statistics will be also collected.
The GRANULARITY parameter will allow you to decide the level of stats to be taken, GLOBAL as a whole object, PARTITION or SUBPARTITION.
Options for the GRANULARITY parameter:
'ALL'
– gathers all (subpartition, partition, and global) statistics'AUTO'
– determines the granularity based on the partitioning type. This is the default value.'DEFAULT'
– gathers global and partition-level statistics. This option is obsolete, and while currently supported, it is included in the documentation for legacy reasons only. You should use the ‘GLOBAL AND PARTITION
‘ for this functionality. Note that the default value is now ‘AUTO
‘.'GLOBAL'
– gathers global statistics- ‘
GLOBAL AND PARTITION
‘ – gathers the global and partition level statistics. No subpartition level statistics are gathered even if it is a composite partitioned object. 'PARTITION
‘- gathers partition-level statistics'SUBPARTITION'
– gathers subpartition-level statistics.
NOTE: Global statistics will not collect individual statistics for partitions and subpartitions.
If we choose PARTITION or SUBPARTITION the stats will be taken in a fraction of the time, but there is a price for this speed, the GLOBAL stats for the whole table as an object will be missing, therefore only the queries that are using the partition key will have the stats.
Luckily, Oracle can calculate the GLOBAL stats by aggregation based on the partitions, aggregation works this way:
1st.- There should not be global stats taken ever, you will have to delete the global stats using DELETE_TABLE_STATS procedure and when you retake them at the lower levels the aggregated stats will be calculated.
2nd .- The aggregations will be taken from the lowest level (partitions and subpartitions), it is important that ALL partitions or subpartitions have to be analyzed, then the upper levels will be calculated by aggregations.
NOTE: When statistics are calculated by aggregation the quality of them is not the same, you will be missing histograms and the number of distinct values on a column can be inaccurate.
How do I know if the stats are being takes or are derived by aggregation?
When we query the view USER|ALL|DBA_TAB_STATISTICS, if the column GLOBAL_STATS is NO then the values are derived by aggregation.
SELECT table_name, partition_name P_NAME, subpartition_name SP_NAME,num_rows,global_stats
FROM user_tab_statistics
ORDER BY 1,2,3;
Example scenarios:
A.- Range partitioned table with 3 partitions: P200901, P200902, P200903 (the same example applies to HASH partition tables)
exec dbms_stats.delete_table_stats('cacosta','PART_DATE',cascade_indexes=>true, cascade_parts=>true);
1.-
Take GLOBAL stats and only one partition
Time: 21.8seconds
exec dbms_stats.gather_table_stats
(‘cacosta’,’part_date’,partname=>’P200901′)
2nd.-
No GLOBAL stats and only one partition
Time: 4.2seconds
exec dbms_stats.gather_table_stats
(‘cacosta’,’part_date’,partname=>’P200901′,granularity=>’PARTITION’)
There won’t be GLOBAL stats calculated until all the lower levels are analyzed, once all the lower level stats are taken we will see the stats and the column GLOBAL_STATS on the DBA_TAB_STATISTICS view will say NO.
TABLE_NAME PNAME NUM_ROWS GLOB_STATS
PART_DATE P200901 2667602 YES
PART_DATE P200902 101 YES
PART_DATE P200903 150 YES
PART_DATE 2667853 NO
B.- Subpartitioned table RANGE-LIST with 2 partitions: P200906, P200907
We can take the stats on two levels, partition or subpartition, this will depend mostly on the queries being performed, if the queries specify the partitions keys for both partition and sub partition we can take stats on the subpartitions, but both the PARTITIONS and GLOBAL stats will be aggregated with the loss of quality consequences.
Subpartitions stats are being used by the RDBMS after 10.2.0.4 only, if your subpartitions are very different in size and value distribution it is better to take the stats individually for each subpartitions.
Let’s see an example:
exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_range_list’,cascade_indexes=>true,cascade_parts=>true)
Taking statistics on PARTITION LEVEL, we will miss the subpartition stats:
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906′,granularity=>’partition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907′,granularity=>’partition’)
T_NAME P_NAME SP_NAME ROWS GLOB_STATS
SUBPART_RANGE_LIST P200906 P200906_SP1 NO
SUBPART_RANGE_LIST P200906 P200906_SP2 NO
SUBPART_RANGE_LIST P200906 P200906_SP3 NO
SUBPART_RANGE_LIST P200906 16795 YES
SUBPART_RANGE_LIST P200907 P200907_SP1 NO
SUBPART_RANGE_LIST P200907 P200907_SP2 NO
SUBPART_RANGE_LIST P200907 P200907_SP3 NO
SUBPART_RANGE_LIST P200907 19073 YES
SUBPART_RANGE_LIST 35868 NO
Now let’s take stats on SUBPARTITION LEVEL:
exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_range_list’,cascade_indexes=>true,cascade_parts=>true)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP1′,granularity=>’subpartition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP2′,granularity=>’subpartition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200906_SP3′,granularity=>’subpartition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP1′,granularity=>’subpartition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP2′,granularity=>’subpartition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_RANGE_LIST’,partname=>’P200907_SP3′,granularity=>‘subpartition’)
T_NAME P_NAME SP_NAME ROWS GLOB_STATS
SUBPART_RANGE_LIST P200906 P200906_SP1 9123 YES
SUBPART_RANGE_LIST P200906 P200906_SP2 7658 YES
SUBPART_RANGE_LIST P200906 P200906_SP3 14 YES
SUBPART_RANGE_LIST P200906 16795 NO
SUBPART_RANGE_LIST P200907 P200907_SP1 9933 YES
SUBPART_RANGE_LIST P200907 P200907_SP2 9124 YES
SUBPART_RANGE_LIST P200907 P200907_SP3 16 YES
SUBPART_RANGE_LIST P200907 19073 NO
SUBPART_RANGE_LIST 35868 NO
Here we can clearly see the size differences for each subpartitions, this info was missing when taking stats on the partition level and can affect the decisions taken by the Cost Based Optimizer (CBO).
C.- Subpartitioned table RANGE-HASH with 3 partitions: P200901, P200902, P200903
When we have subpartitioned tables we can have the GLOBAL stats aggregated by taking stats on all subpartitions or in all partitions. Since they are HASH subpartitions it does not make much sense to take them individually, so we take partitions stats:
exec dbms_stats.delete_table_stats
(‘cacosta’,’subpart_date’,cascade_indexes=>true, cascade_parts=>true)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200901′,granularity=>’partition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200902′,granularity=>’partition’)
exec dbms_stats.gather_table_stats
(‘cacosta’,’SUBPART_DATE’,partname=>’P200903′,granularity=>’partition’)
T_NAME P_NAME SP_NAME ROWS GLOB_STATS
SUBPART_DATE P200901 P200901_S1 NO
SUBPART_DATE P200901 P200901_S2 NO
SUBPART_DATE P200901 P200901_S3 NO
SUBPART_DATE P200901 P200901_S4 NO
SUBPART_DATE P200901 2548357 YES
SUBPART_DATE P200902 P200902_S1 NO
SUBPART_DATE P200902 P200902_S2 NO
SUBPART_DATE P200902 P200902_S3 NO
SUBPART_DATE P200902 P200902_S4 NO
SUBPART_DATE P200902 15992 YES
SUBPART_DATE P200903 P200903_S1 NO
SUBPART_DATE P200903 P200903_S2 NO
SUBPART_DATE P200903 P200903_S3 NO
SUBPART_DATE P200903 P200903_S4 NO
SUBPART_DATE P200903 15992 YES
SUBPART_DATE 2580341 NO
Conclusions: If you can afford to take GLOBAL stats then do it, you will have complete statistics for all your database, just set the GRANULARITY parameter to AUTO, this will also avoid stats problems when queries do not specify the partition key.
Aggregated stats have worked out very well for me, but that might not be the case on all databases.
Here is the official documentation for oracle10g and oracle11g
I usually don’t post on Blogs but ya forced me to, great info.. excellent! … I’ll add a backlink and bookmark your site.
Thanks, I appreciate your comment. Carlos
Hey, I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say GREAT blog!…..I”ll be checking in on a regularly now….Keep up the good work! 🙂 🙂
Thanks for the comment, it is encouraging!
Hi, Thanks for the blog entry.
I have a quick question re: the gathering of subpartition stats. Do you know if this works on 10.2.03?
I am trying this, and I get a ORA-20001 invalid identifier error.
I have a vague memory of reading somewhere that this does not work before 10.2.04, but of course, can’t find that again.
Cheers.
Hi, Are you referring to a particular section in the post? If I am not wrong this post should run OK in any 10.2.X.X version, please publish here the command that is giving you the error to try it myself.
Sorry about the delay replying, I got tied up with other things.
Anyway, I seem to have found my problem . The first SQL that did not work was as follows:
begin
dbms_stats.gather_table_stats(‘REPDB’,’TRAN_GPT_DELTA’,CASCADE=>TRUE ,granularity=>’SUBPARTITION’ , partname=>’2009_04_30_5372′,
method_opt=>’FOR ALL COLUMNS SIZE 1′ ,estimate_percent=>1,degree=>4);
end;
The problem seems to be the name I supply for the partition, this needs to be in double quotes for some reason. As far as I can tell, I have not created these partition names with ” ” around them. In fact, looking in the user_subpart_col_statistics table, there are no “”‘s around the subpartition names.
The SQL that works is below:
begin
dbms_stats.gather_table_stats(‘REPDB’,’TRAN_GPT_DELTA’,CASCADE=>TRUE ,granularity=>’SUBPARTITION’ , partname=>'”2009_04_30_5372″‘,
method_opt=>’FOR ALL COLUMNS SIZE 1’ ,estimate_percent=>1,degree=>4);
end;
Pingback: Tips for DBMS_STATS (part 1/2) « Oracle Examples
You have provide a valuable information. Thank you. I have a question in regards to gathering stats on one partition with global stats. So, I ran the following but the dba_tab_partitions show GLOBAL =YES for this partition, I am confused. Please let me know your thoughts.
execute DBMS_STATS.GATHER_TABLE_STATS(‘PROD1,’T_TRANSACTION’,granularity =>’PARTITION’,partname => ‘P_20101004’,estimate_percent => 35 ,degree => 4, CASCADE => TRUE);
Hi,
It’s OK to see GLOBAL_STATS=YES when you take the stats on a particular partition or sub partition, now, when there are stats at the table level or partition level (with sub partitions) and GLOBAL_STATS=NO it means that the stats have been calculated by aggregation.
Hope this helps to clarify your confusion
Carlos
Thank you for your great explanation. it is clear now
thanks
We’re a gaggle of volunteers and opening a brand new scheme in our community. Your website provided us with valuable information to work on. You’ve done a formidable task and our entire neighborhood will
be thankful to you.
Thank you for such a nice post.
Due to huge size of tables (in TB), I am taking status on granularity =>’PARTITION’.
My question is “Should I take global_stats for table as well”, after completing on partition level?
Hi, If your queries specify the partition key then they will have the stats available as partition stats are being taken, other queries NOT USING the partition key will lack the global stats and performance can be impacted.
You can:
Take global stats all the time, requires a lot of time and temp space but stats will be more accurate, or
Delete the global stats and retake all the stats for all the partitions (granularity=>’PARTITION’) and the GLOBAL will be aggregated, then keep taking only PARTITION stats.
in 11g you have a new granularity option: ‘APPROX_GLOBAL AND PARTITION’, this one will take the global stats by using aggregation.
You can query the view USER|ALL|DBA_TAB_STATISTICS, if the column GLOBAL_STATS is NO then the values are derived by aggregation.