
Density nonsense

Nicolas Dumeige
Esteemed Contributor

Density nonsense

Hello,

On our DSS, a large table: 840,000,000 records.

I know that there are 2 distinct values for the field BUSINESS_UNIT, split on a 93% / 7% basis.

Histograms have been calculated with
execute dbms_stats.gather_table_stats(
  ownname          => user,
  tabname          => 'T1',
  granularity      => 'GLOBAL',
  estimate_percent => 0.05,
  block_sample     => TRUE,
  method_opt       => 'FOR COLUMNS BUSINESS_UNIT SIZE 2',
  degree           => 6);

select TABLE_NAME,COLUMN_NAME,DENSITY
from USER_TAB_COL_STATISTICS
where table_name='T1'
and column_name='BUSINESS_UNIT';

TA COLUMN_NAME DENSITY
-- ------------------------------ ------------------------------
T1 BUSINESS_UNIT .000000000616598840794179

>>>>>>>>>>>>> density * num_rows
SQL> select 6.1660E-10 * 842276500 from dual;

6.1660E-10*842276500
-------------------------
.5193476899
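That ~0.52 figure is actually self-consistent once a histogram exists: the stored density is no longer 1/NDV, and (on common readings of the CBO arithmetic) it is used for values *not* captured by the histogram. A minimal sketch using the numbers from this thread; the interpretation of density here is my assumption, not something the output above proves:

```python
# Figures quoted earlier in the thread.
density = 6.1660e-10
num_rows = 842_276_500

# Without a histogram, density would simply be 1/NDV = 0.5,
# and density * num_rows the estimate for any one of the 2 values.
density_no_histogram = 1 / 2

# With a histogram in place, density * num_rows is (roughly) the
# optimizer's row estimate for a value NOT captured by the histogram --
# hence the seemingly absurd "0.52 rows" on a 2-value column.
est_rows_unseen_value = density * num_rows

print(f"{est_rows_unseen_value:.4f}")  # ~0.5193
```

In other words, the tiny density is not describing the two popular values at all; those are meant to come from the histogram buckets.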

Metalink:
The density is expressed as a decimal number between 0 and 1.
Values close to 1 indicate that this column is unselective.
Values close to 0 indicate that this column is highly selective.
The more selective a column, the fewer rows are likely to be returned by a query referencing this column in its predicate list.

I don't get it ?!?
According to this statistic the field looks highly selective, but in fact it is not: for one particular value, users get 97% of all records!

SQL> select *
from user_tab_histograms
where table_name='T1'
and column_name='BUSINESS_UNIT';

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
------------------------------ --------------- --------------- -------------- --------------------
T1 BUSINESS_UNIT 7269 3.6963E+35
TMEURST BUSINESS_UNIT 8109 3.8002E+35
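Two things in that output are decodable. ENDPOINT_VALUE in the 1E+35 range is normal for a character column: Oracle stores a numeric encoding of the leading bytes of the string there, so the magnitude is expected, not garbage. And ENDPOINT_NUMBER is cumulative, so the per-value sample counts are the differences between consecutive endpoints. A sketch, assuming this is a frequency histogram (SIZE 2 on a 2-value column):

```python
# ENDPOINT_NUMBER values from the USER_TAB_HISTOGRAMS output above.
endpoints = [7269, 8109]

# ENDPOINT_NUMBER is a running (cumulative) count of sampled rows,
# so each value's count is the difference from the previous endpoint.
counts = [endpoints[0]] + [b - a for a, b in zip(endpoints, endpoints[1:])]

# Share of the sample per value.
shares = [c / endpoints[-1] for c in counts]

print(counts)                         # [7269, 840]
print([round(s, 3) for s in shares])  # [0.896, 0.104]
```

Interestingly, the sample suggests roughly 90/10 rather than the 93/7 quoted at the top of the thread, which may simply be skew from the small block sample.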


Any insight, guess, idea welcomed.

Cheers

Nicolas
All different, all Unix
12 REPLIES
Rodney Hills
Honored Contributor

Re: Density nonsense

And this is related to HPUX forum how?

Rod Hills
There be dragons...
Jean-Luc Oudart
Honored Contributor

Re: Density nonsense

Hi Nicolas

Welcome back !

As you only have 2 values, why not use a bitmap index?

Regards
Jean-Luc
fiat lux
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

Hello Jean-Luc,

The table is compressed. To get the best compression we have to sort the staging table before doing the direct-path insert.
Index maintenance is a cost I'm not willing to pay if it can be avoided (UNUSABLE + DBMS_PCLXUTIL is not an option).

1) There are 4 other indexes on the table.
2) One specific BUSINESS_UNIT represents 97% of the data ... and 97% of the queries ;) So the index wouldn't be of much help here.

--> How do you interpret the statistics? That was my point.

All different, all Unix
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

Rodney,

This is a database question in the "Databases" category ... so what's with you???
All different, all Unix
Eric Antunes
Honored Contributor

Re: Density nonsense

Hi Nicolas,

"I know that there are 2 distinct values for the field BUSINESS_UNIT, split on a 93% / 7% basis."

So you have 840,000,000 * 0.93 = 781,200,000 rows with one value and 840,000,000 * 0.07 = 58,800,000 rows with another value?

Maybe you should consider creating another table (MT1?) with only those 2 rows, which would be referenced by a T1 foreign key:

create table MT1 (
MT1_ID NUMBER NOT NULL,
BUSINESS_UNIT ...)

You'll need to recreate table T1 (saving all rows to a T1_TEMP temporary table, re-creating T1, and re-loading the records from T1_TEMP with MT1_ID but without BUSINESS_UNIT)...

That's just an idea...

Best Regards,

Eric Antunes
Each and every day is a good day to learn.
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

Eric,

Thanks for your proposition, but I fail to see the point of rebuilding the table ... besides losing a Saturday ;)

Again, my question is about the interpretation of the Oracle stats.

All different, all Unix
Eric Antunes
Honored Contributor

Re: Density nonsense

Nicolas,

My point is the same as Oracle's when they created those 2 tables (for example) for Inventory:

MTL_MATERIAL_TRANSACTIONS
MTL_SECONDARY_INVENTORIES

When I want to know all the transactions of one particular inventory, it is much easier and more responsive than if I didn't have the MTL_SECONDARY_INVENTORIES table and its indexes...

Best Regards,

Eric Antunes
Each and every day is a good day to learn.
Jean-Luc Oudart
Honored Contributor

Re: Density nonsense

Nicolas,

Regarding the sample (0.05%), would you really expect to get relevant information back from Oracle?

Regards
Jean-Luc
fiat lux
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

Jean-Luc,

SQL> select blocks * 0.05 from tabs where table_name='T1';

BLOCKS*0.05
-----------
2657667

That's a 20 GB sample, not so bad ...
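A possible reconciliation of the two figures in this exchange: ESTIMATE_PERCENT in DBMS_STATS is a percentage, so 0.05 means 0.05%, while blocks * 0.05 corresponds to a 5% sample. A hedged sketch of both readings (the block count is implied by the query output above, not stated directly):

```python
# estimate_percent in DBMS_STATS is expressed as a percentage,
# so estimate_percent => 0.05 samples 0.05% of the data, not 5%.
num_rows = 842_276_500
blocks = 53_153_340           # implied by blocks * 0.05 = 2,657,667 above

rows_at_0_05_percent = num_rows * 0.05 / 100   # ~421,138 rows actually sampled
blocks_at_5_percent = int(blocks * 0.05)       # 2,657,667 -- the "20 GB" reading

print(round(rows_at_0_05_percent), blocks_at_5_percent)
```

The ~421,000 rows matches the 420,000 figure Jean-Luc quotes later in the thread; at 0.05% the real sample is closer to 200 MB than 20 GB.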



All different, all Unix
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

Eric,

During batch loads, the FK would be most painful, and again, what could the gain possibly be?

A referential constraint serves no purpose here.
Neither does an index on a 2-value table.

Thanks for your help anyway.
All different, all Unix
Jean-Luc Oudart
Honored Contributor

Re: Density nonsense

In absolute terms, not so bad.
But compared to the overall size of your table? You wonder about the density "nonsense"; I would think the sample size speaks for itself.

The overall distribution is 93% / 7% over 840M records. But over a sample of 420,000, would you expect this distribution to hold?

From the overview on histograms :
When to Not Use Histograms
--------------------------

Also, be aware that histograms, as well as all optimizer statistics, are
static. If the data distribution of a column changes frequently, it is
necessary to recompute the histogram for a given column. Histograms are not
useful for columns with the following characteristics:

o all predicates on the column use bind variables
o the column data is uniformly distributed
o the column is not used in WHERE clauses of queries
o the column is unique and is used only with equality predicates


Regards
Jean-Luc
fiat lux
Nicolas Dumeige
Esteemed Contributor

Re: Density nonsense

FYI: all conditions are met.

From what I read in the statistics, in a 20 GB data sample Oracle has found 0.51 records for a specific value, and that is nonsense.
And the histogram is just as stupid, with values of such magnitude (1E+35)!

Furthermore, I have 4 other columns with the same kind of crap ...

Thanks all for your posts.
All different, all Unix