0% found this document useful (0 votes)
11 views77 pages

Query-Processing

The document covers query processing and optimizations in database management systems, focusing on the implementation of relational operators and the importance of execution plans. It discusses concepts such as indexing, external merge sort, relational algebra, and file operations, emphasizing the cost of various operations and the impact of query optimization techniques. The course is instructed by Dr. Shirshendu Das at IIT Hyderabad, with recommended readings from the textbook 'Database Management System' by Raghu Ramakrishnan.

Uploaded by

hood.robin.black
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views77 pages

Query-Processing

The document covers query processing and optimizations in database management systems, focusing on the implementation of relational operators and the importance of execution plans. It discusses concepts such as indexing, external merge sort, relational algebra, and file operations, emphasizing the cost of various operations and the impact of query optimization techniques. The course is instructed by Dr. Shirshendu Das at IIT Hyderabad, with recommended readings from the textbook 'Database Management System' by Raghu Ramakrishnan.

Uploaded by

hood.robin.black
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 77

CS3563: Introduction to DBMS II

Query Processing and Optimizations

Course Instructor
Dr. Shirshendu Das
IIT Hyderabad
[email protected]

Most of the content of these slides are based on the textbook:


“Database Management System”, by Raghu Ramakrishnan.

Read Chapter 12 and 13 from this textbook.


R
M
 The relational operators serve as the building blocks for query evaluation.
 Queries, written in a language such as SQL, are presented to a query optimizer,
which uses information about how the data is stored to produce an efficient
execution plan for evaluating the query.

 Finding a good execution plan for a query consists of more than just choosing an
implementation for each of the relational operators that appear in the query.
 For example, the order in which operators are applied can influence the cost.

 This chapter considers the implementation of individual relational operators.


 introduction to query processing
 highlighting some common themes that recur throughout this chapter
 how tuples are retrieved from relations while evaluating various relational
operators.
 implementation alternatives for the various operators.
2
R
M
Concepts you have to know to understand this chapter:

 Indexing:
 Hash based indexing
 B+ Tree based indexing
 Remember both hashing can be either clustered or unclustered.

 External Merge Sort:


 You have to know the concept of runs, page availability in main memory.

 Relational algebra:
• Select, Project, Join
• Selection Condition

 File operations:
• Scan, Equality search, Range search etc.

3
R
M

An index matches a selection condition if the index can be used


to retrieve just the tuples that satisfy the condition.

4
R
M

5
R
M

6
R
M

7
R
M

Cost:

8
R
M

Cost:

9
R
M

10
R
M

Cost:

11
R
M

12
R
M

Cost:

13
R
M

Cost:

14
R
M

15
R
M

16
R
M

17
R
M

18
R
M

19
R
M

We can now retrieve the necessary pages of Reserves to retrieve


20
tuples, and check bid=5.
R
M
 Let us consider the case that one of the conjuncts in the selection condition is a
disjunction of terms.
 If even one of these terms requires a file scan because suitable indexes or sort orders are
unavailable, testing this conjunct by itself requires a file scan.

 For example suppose that the only available indexes are:


a hash index on rname and a hash index on sid

Now if the selection condition contains just the conjunct


(day < 8/9/94 ∨ rname=‘Joe’)

We can retrieve tuples satisfying the condition rname=‘Joe’ by using the index on rname.

However, day < 8/9/94 requires a file scan.

If the selection condition is (day < 8/9/94 ∨ rname=‘Joe’) ∧ sid=3, the index on sid
matches the conjunct sid=3.
21
We can use this index to find qualifying tuples and apply day < 8/9/94 ∨ rname=‘Joe’ to
R
M

22
R
M

If we use temporary relations at each step the first step costs:


1. M I/Os to scan R, where M is the number of pages of R, a
2. T I/Os to write the temporary relation T is the number of pages of the
temporary.

The first and third steps are straightforward and relatively inexpensive.

23
R
M

1. We can scan Reserves at a cost of 1,000 I/Os.


2. If we assume that each tuple in the temporary relation
created in the first step is 10 bytes long, the cost of
writing this temporary relation is 250 I/Os.

3. Suppose that we have 20 buffer pages.


4. We can sort the temporary relation in two passes at a cost
of 2 ∗ 2 ∗ 250 = 1, 000 I/Os.

5. The scan required in the third step costs an additional 250


I/Os.
6. The total cost is 2,500 I/Os.
24
R
M
This approach can be improved on by modifying the sorting algorithm to do
projection with duplicate elimination.

Two important modifications to the sorting algorithm adapt it for projection:


1. We can project out unwanted attributes during the first pass (Pass 0) of sorting.
2. We can eliminate duplicates during the merging passes.

In fact, this modification will reduce the cost of the merging passes since fewer tuples are written out in
each pass.

Let us consider our example again:


1. In the first pass we scan Reserves, at a cost of 1,000 I/Os and write out 250 pages.
2. With 20 buffer pages, the 250 pages are written out as seven internally sorted runs.

3. In the second pass we read the runs, at a cost of 250 I/Os, and merge them.
4. The total cost is 1,500 I/Os.

5. Which is much lower than the cost of the first approach used to implement projection.
25
R
M

Partitioning:

26
R
M
Duplicate Elimination:

27
R
M

28
R
M

29
R
M

30
R
M

31
R
M

Main Memory

R S

32
R
M

Main Memory

R S

33
R
M

If each I/O costs about 10ms on current hardware,


then this join will take about 140 hours! 34
R
M

35
R
M

36
R
M

Main Memory

R S

37
R
M

Main Memory

R S

38
R
M

39
R
Main Memory M

R S

40
R
M

41
R
M

Main Memory

R S

42
R
M

43
R
M

Main Memory

R S

44
R
M

Main Memory

R S

45
Block Nested Loop Join With In-Memory Hashing:
R
M

Main Memory

R S

46
R
M

47
R
M

48
R
M

R S

49
R
M

50
R
M

51
R
M

Cost of sorting Reserve


Cost of sorting Sailor

52
R
M

53
 Hash Join R
M

Main Memory
Relation R

h = Bucket 0 if most significant digit of ID is even. Otherwise Bucket 1

54
*****This is an example of how partitioning happens in hash based join. The relation (table) considered is not from book (RM).
 Hash Join R
Partitions of both R and S using the hash function h. M
Partitions of R
Partitions of S

Note that the hash function must have


to distribute the tuples uniformly
throughout the partitions.

In these example the hash function is


not doing so. I have taken this hash
function just to explain the procedure.

Actually the hash function is not correct


as it is not distributing the tuples
uniformly.

55
R
 Hash Join M

Main Memory

Once a match found between r and s where


I consider 5 pages of Main r is from R and s is from S. Write <r, s> into
Memory here but in the this buffer.
previous step (partitioning) I
considered only 3. Actually both
must be same.

If 5 pages are available then


each relation can be divided
into 4 partitions (see prev.
slide).

I used only two partition to


R S
make the example simple.

If r is a tuple from R and s is a tuple from S then <r,s> is the tuple in the natural join of R and S provided the condition
56 of
natural join is satisfied by these two tuples.
R
M

57
R
M

58
R
M

59
R
M

60
R
M

61
R
M

62
R
M

63
R
M

64
R
M

Another Example of Pipelining

65
66

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
67

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
68

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
69

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
70

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
71

This slide is based on our text book “Database System Concepts”. But you no need
to read book for this. Understanding the contents in slide is enough.
Assume:
number of pages in T1 as 10
number of pages in T2 as 250
number of buffer pages is 5.

Cost:
1. Applying sort merge join for T1 and T2:

Cost:
2. Applying block nested join for T1 and T2:

72
Assume:
number of pages in T1 as 10
number of pages in T2 as 250
number of buffer pages is 5.

Cost:
1. Applying sort merge join for T1 and T2: 4060 I/O pages
2. Applying block nested join for T1 and T2: 2770 I/O pages

73
Suppose that we have the following two index available:
1. a clustered static hash index on the bid field of
Reserves
2. a hash index on the sid field of Sailors.

74
Suppose that we have the following two index available:
1. a clustered static hash index on the bid field of
Reserves
2. a hash index on the sid field of Sailors.

Sort (on sid) and materialize the result of bid=100 (T1)

75
We have the following two index available:
1. a clustered static hash index on the bid field of
Reserves
2. a hash index on the sid field of Sailors.

76
End of Query Processing and
Optimisation

77

You might also like