0% found this document useful (0 votes)
294 views52 pages

Oracle Cost-Based Optimizer Basics

The document provides an overview of key concepts for understanding how the Oracle optimizer works, including: - The optimizer's cost estimate is based on cardinality estimates for single tables, joins, and subqueries which can be incorrect if it does not account for data characteristics like clustering. - Clustering of data, whether within or across tables, has a large impact on performance by affecting the number of blocks accessed. - Caching is not considered by the optimizer but can significantly improve performance for hot data that typically resides in the buffer cache.

Uploaded by

Sam Cool
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
294 views52 pages

Oracle Cost-Based Optimizer Basics

The document provides an overview of key concepts for understanding how the Oracle optimizer works, including: - The optimizer's cost estimate is based on cardinality estimates for single tables, joins, and subqueries which can be incorrect if it does not account for data characteristics like clustering. - Clustering of data, whether within or across tables, has a large impact on performance by affecting the number of blocks accessed. - Caching is not considered by the optimizer but can significantly improve performance for hot data that typically resides in the buffer cache.

Uploaded by

Sam Cool
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 52

 Independent consultant

 Available for consulting


 In-house workshops
 Cost-Based Optimizer
 Performance By Design
 Performance Troubleshooting
 Oracle ACE Director

 Member of OakTable Network


 Optimizer Basics – Key Concepts

 Proactive: Performance by design

 Reactive: Troubleshooting
 Three main questions you should ask when
looking for an efficient execution plan:

 How much data? How many rows / volume?

 How scattered / clustered is the data?

 Caching?

=> Know your data!


 Why are these questions so important?

 Two main strategies:

 One “Big Job”


=> How much data, volume?

 Few/many “Small Jobs”


=> How many times / rows?
=> Effort per iteration? Clustering / Caching
 Optimizer’s cost estimate is based on:

 How much data? How many rows / volume?

 (partially)

 (Caching?) Not at all


 Single table cardinality

 Join cardinality

 Filter subquery / Aggregation cardinality


 Selectivity of predicates applying to a single
table
 Selectivity of predicates applying to a single
table
 Selectivity of predicates applying to a single
table
 Selectivity of predicates applying to a single
table
Filtered Cardinality / Filter Ratio

Base Cardinality
 Optimizer challenges
 Skewed column value distribution

 Gaps / clustered values

 Correlated column values

 Complex predicates and expressions

 Bind variables
Demo!

optimizer_basics_single_table_cardinality_testcase.sql
 Impact limited to a “single table”

 Influences the favored


(Full Table Scan, Index Access etc.)

 Influences the and


(NESTED LOOP, HASH, MERGE)

=> An incorrect single table cardinality


potentially screws up whole !
 Oracle joins exactly row sources at a time

 If more than two row sources need to be joined,


join operations are required

 Many different possible (factorial!)


 Tree shape of execution plan
 Challenges

 Getting the right!

 A join can mean anything between and a


product
 Getting the right

1,000,000 rows
T1, T2
0 rows

1,000 rows T1 T2 1,000 rows


 Getting the right

Join cardinality =
Cardinality T1 *
T1, T2 Cardinality T2 *
Join selectivity

T1 T2
 Challenges

 Semi Joins (EXISTS (), = ANY())

 Anti Joins (NOT EXISTS (), <> ALL())

 Non-Equi Joins (Range, Unequal etc.)


 Even for the most common form of a join
- the –
there are several challenges
 Non-uniform join column value distribution

 Partially overlapping join columns

 Correlated column values

 Expressions

 Complex join expressions (multiple AND, OR)


Demo!

optimizer_basics_join_cardinality_testcase.sql
 Influences the and
(NESTED LOOP, HASH, MERGE)

=> An incorrect join cardinality/selectivity


potentially screws up whole !
 Data is organized in blocks

 Many rows can fit into a single block

 According to a specific data can


be either across many different blocks
or in the same or few blocks

 Does make a tremendous difference in terms of


efficiency of a “Small Job”
1,000 rows => visit 1,000 table blocks: 1,000 * 5ms = 5 s
1,000 rows => visit 10 table blocks: 10 * 5ms = 50 ms
 Scattered data means potentially many more
blocks to compete for the Buffer Cache for the
same number of rows
 => Caching!

 Scattered data can result in increased


 physical

 write (Log Writer, DB Writer)


 Most OLTP data has a clustering

 Data arriving is usually


clustered together in a heap organized table

 Depends on the organization

for example can influence this


clustering even for heap organized tables
 Clustering of data can be influenced by
implementation
 Physical design matters
 Segment space management (MSSM / ASSM)
 Partitioning
 Index/Hash Cluster
 Index Organized Tables (IOT)
 Index design / multi-column composite indexes

 There is a reason why the Oracle internal data


dictionary uses all over the place
No table access => only index blocks are visited!
 There is only a single measure of clustering in
Oracle:
The

 The index clustering factor is represented by a


value

 The logic measuring the clustering factor by


default does cater for data clustered across
blocks (ASSM!)
 Challenges

 Getting the right

 There are various reasons why the index clustering


factor measured by Oracle might not be

 Multiple freelists / freelist groups (MSSM)


 ASSM
 Partitioning
 SHRINK SPACE effects
Re-visiting the same recent table blocks
 Challenges

 There is no clustering measurement

 The optimizer therefore doesn’t really have a clue


about the

 You may need to influence the optimizer’s decisions


if you know about this clustering
Demo!

optimizer_basics_inter_table_clustering_testcase.sql
 The optimizer’s model by default doesn’t
consider caching of data

 Every I/O is assumed to be

 But there is a huge difference between


(measured in microseconds) and
(measured in milliseconds)
 You might have knowledge of particular
application data that is and usually stays
in the Buffer Cache

 Therefore certain queries against this “hot”


data can be based on that

 The optimizer doesn’t know about this. You


may need to the optimizer’s decisions
 Oracle obviously played with the idea of
introducing an caching component
into the cost calculation in 9i and 10g

 You can see this from the undocumented


parameters and
as well as the columns
and
in the data
dictionary
 It is important to point out that even
is not “free”

 So even by putting all objects entirely in the


Buffer Cache execution plans may
still lead to poor performance

logical I/O, in particular on “hot


blocks”, can lead to and
and determine whether
the or strategy should be
preferred

 If the optimizer gets these estimates right, the


resulting will be within
the of the given access paths
 How to apply these concepts, where to go from
here?
 Read Jonathan Lewis’ article
“Designing Efficient SQL” at Red
Gate’s “Simple Talk”

Probably the best coverage of the


concepts outlined here including
clustering and caching

http://www.simple-talk.com/sql/performance/designing-efficient-sql-a-visual-approach/
 How to apply these concepts, where to go from
here?
 Read one of Tom Kyte’s books to
learn more about the pro’s and con’s
of clusters and index organized
tables
 How to apply these concepts, where to go from
here?
 Learn how to read, interpret and
understand Oracle execution plans
=> Chapter 6 of “Troubleshooting
Oracle Performance” by Christian
Antognini

 This knowledge is required in order


to compare your understanding of
the query to the optimizer’s
understanding
 How to apply these concepts, where to go from
here?
 Be aware of Query Transformations:
The optimizer might rewrite your
query to something that is
semantically equivalent but
potentially more efficient

 This might take you by surprise


when trying to understand the
execution plan favored by the
optimizer
Query transformation examples by courtesy of Joze Senegacnik (OOW 2010)
 If you want a more formal approach

 Read “SQL Tuning” by Dan Tow


 Teaches a formal approach how to
design and visualize an execution plan
 Focuses on “robust” execution plans
in an OLTP environment
 The formal approach doesn’t take into
account clustering and caching,
however it is mentioned in the book at
some places
 If you want a more formal approach

 Read “Relational Database Index


Design and the Optimizers” by
Tapio Lahdenmäki and Michael
Leach
 Focuses on index design
 Provides simple and more advanced
formulas allowing to predict the
efficiency of queries and indexes
 Covers clustering and caching
 For application developers

 Read “Use the Index, Luke” by


Markus Winand
 Focuses on index design
 Provides a lot of examples how to
http://use-the-index-luke.com/ design efficient database access using
different front-end languages (Java,
Perl, PHP, etc.)
 Also available as free eBook
 Cross database (Oracle DB2,
MySQL…)
 If you want dive into the details of the Cost-
Based Optimizer
 Read “Cost-Based Oracle:
Fundamentals” by Jonathan Lewis
 Almost six years old
 Still the best book about the Oracle
optimizer
 Covers the key concepts mentioned
here in great detail
Q&A

You might also like