Comp115 Class13 ExSol External Sorting

Uploaded by

josh1717books


External Sorting

Exercise 13.1 Answer the following questions for each of these scenarios, assuming that our
most general external sorting algorithm is used:

(a) A file with 10,000 pages and three available buffer pages.

(b) A file with 20,000 pages and five available buffer pages.

(c) A file with 2,000,000 pages and 17 available buffer pages.

1. How many runs will you produce in the first pass?

2. How many passes will it take to sort the file completely?

3. What is the total I/O cost of sorting the file?

4. How many buffer pages do you need to sort the file completely in just two passes?

Exercise 13.3 Suppose that you just finished inserting several records into a heap file and
now want to sort those records. Assume that the DBMS uses external sort and makes
efficient use of the available buffer space when it sorts a file. Here is some potentially useful
information about the newly loaded file and the DBMS software available to operate on it:

The number of records in the file is 4500. The sort key for the file is 4 bytes long. You can
assume that rids are 8 bytes long and page ids are 4 bytes long. Each record is a total of 48
bytes long. The page size is 512 bytes. Each page has 12 bytes of control information on it.
Four buffer pages are available.

1. How many sorted subfiles will there be after the initial pass of the sort, and how
long will each subfile be?

2. How many passes (including the initial pass just considered) are required to sort this
file?

3. What is the total I/O cost for sorting this file?

4. What is the largest file, in terms of the number of records, you can sort with just
four buffer pages in two passes? How would your answer change if you had 257
buffer pages?

5. Suppose that you have a B+ tree index with the search key being the same as the
desired sort key. Find the cost of using the index to retrieve the records in sorted
order for each of the following cases:

▪ The index uses Alternative (1) for data entries.

▪ The index uses Alternative (2) and is unclustered. (You can compute the
worst-case cost in this case.)

▪ How would the costs of using the index change if the file is the largest that
you can sort in two passes of external sort with 257 buffer pages? Give your
answer for both clustered and unclustered indexes.
Solutions
Answer 13.1 The answer to each question is given below.
1. In the first pass (Pass 0), N/B runs of B pages each are produced, where N is the number
of file pages and B is the number of available buffer pages:
(a) 10000/3 = 3334 sorted runs.
(b) 20000/5 = 4000 sorted runs.
(c) 2000000/17 = 117648 sorted runs.

2. The number of passes required to sort the file completely, including the initial sorting pass, is $\lceil \log_{B-1} N_1 \rceil + 1$, where $N_1 = \lceil N/B \rceil$ is the number of runs produced by Pass 0:

(a) $\lceil \log_2 3334 \rceil + 1 = 12 + 1 = 13$ passes.
(b) $\lceil \log_4 4000 \rceil + 1 = 6 + 1 = 7$ passes.
(c) $\lceil \log_{16} 117648 \rceil + 1 = 5 + 1 = 6$ passes.

3. Since each page is read and written once per pass, the total number of page I/Os for
sorting the file is 2 ∗ N ∗ (#passes):

(a) 2*10000*13 = 260000.

(b) 2*20000*7 = 280000.
(c) 2*2000000*6 = 24000000.

4. In Pass 0, N/B runs are produced. In Pass 1, we must be able to merge this many runs;
i.e., B − 1 ≥ N/B. This implies that B must at least be large enough to satisfy B ∗ (B − 1) ≥ N;
this can be used to guess at B, and the guess must be validated by checking the first
inequality. Thus:

(a) With 10000 pages in the file, B = 101 satisfies both inequalities, B = 100 does not, so we
need 101 buffer pages.
(b) With 20000 pages in the file, B = 142 satisfies both inequalities, B = 141 does not, so we
need 142 buffer pages.
(c) With 2000000 pages in the file, B = 1415 satisfies both inequalities, B = 1414 does not, so
we need 1415 buffer pages.
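The formulas in parts 1 to 3 can be checked with a short script. The sketch below is a minimal illustration, assuming the general external merge sort described above (Pass 0 writes runs of B pages; every later pass merges B − 1 runs, with one buffer reserved for output); the function names `pass0_runs`, `num_passes`, and `sort_io_cost` are hypothetical. It counts merge passes with integer ceilings rather than evaluating a logarithm, which gives the same value as $\lceil \log_{B-1} N_1 \rceil + 1$ without floating-point rounding.

```python
def pass0_runs(n_pages: int, n_buffers: int) -> int:
    """Pass 0 sorts B pages at a time in memory, so it writes ceil(N/B) runs."""
    return (n_pages + n_buffers - 1) // n_buffers


def num_passes(n_pages: int, n_buffers: int) -> int:
    """Pass 0 plus the (B-1)-way merge passes needed to reach a single run."""
    if n_buffers < 3:
        raise ValueError("general external merge sort needs at least 3 buffer pages")
    fan_in = n_buffers - 1  # one buffer page is reserved for the output run
    runs = pass0_runs(n_pages, n_buffers)
    passes = 1
    while runs > 1:
        # Each merge pass divides the run count by B-1, rounding up.
        runs = (runs + fan_in - 1) // fan_in
        passes += 1
    return passes


def sort_io_cost(n_pages: int, n_buffers: int) -> int:
    """Every pass reads and writes each page once: 2 * N * (#passes)."""
    return 2 * n_pages * num_passes(n_pages, n_buffers)


for label, n, b in [("a", 10_000, 3), ("b", 20_000, 5), ("c", 2_000_000, 17)]:
    print(f"({label}) runs={pass0_runs(n, b)} passes={num_passes(n, b)} cost={sort_io_cost(n, b)}")
# (a) runs=3334 passes=13 cost=260000
# (b) runs=4000 passes=7 cost=280000
# (c) runs=117648 passes=6 cost=24000000
```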

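For part 4, the smallest B can also be found by testing the condition $B - 1 \ge \lceil N/B \rceil$ directly instead of guessing from $B(B-1) \ge N$. This is a minimal sketch; `min_buffers_for_two_passes` is a hypothetical name, and it assumes at least 3 buffer pages, the minimum the general algorithm uses.

```python
def min_buffers_for_two_passes(n_pages: int) -> int:
    """Smallest B for which Pass 0 plus one (B-1)-way merge pass sorts n_pages."""
    b = 3  # assumption: the general external merge sort needs at least 3 buffers
    # Two passes suffice when the ceil(N/B) runs from Pass 0 fit in one merge:
    # B - 1 >= ceil(N/B).
    while (n_pages + b - 1) // b > b - 1:
        b += 1
    return b


for n in (10_000, 20_000, 2_000_000):
    print(n, min_buffers_for_two_passes(n))
# 10000 101
# 20000 142
# 2000000 1415
```

A linear scan is fast enough at these sizes (about 1400 iterations for case (c)); for much larger files the scan could start near $\sqrt{N}$, since $B(B-1) \ge N$ forces $B > \sqrt{N}$.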
Answer 13.3 The answer to each question is given below.

1. Assuming that the general external merge-sort algorithm is used, the space available for records on each page is 512 − 12 = 500 bytes, so each page can store $\lfloor 500/48 \rfloor = 10$ records of 48 bytes each. Hence 4500/10 = 450 pages are needed to store all 4500 records, assuming that a record may not span more than one page.
Given that 4 buffer pages are available, Pass 0 produces $\lceil 450/4 \rceil = 113$ sorted runs (subfiles): 112 runs of 4 pages each and a last run that is only 2 pages long.

2. With B − 1 = 3 runs merged per pass, the total number of passes is $\lceil \log_3 113 \rceil + 1 = 5 + 1 = 6$ passes.

3. The total I/O cost for sorting this file is 2 ∗ 450 ∗ 6 = 5400 I/Os.
4. As in Exercise 13.1, Pass 0 produces $\lceil N/B \rceil$ runs, and Pass 1 must be able to merge all of them, i.e., $B - 1 \ge \lceil N/B \rceil$, which holds exactly when $N \le B(B - 1)$. With B = 4, the largest file is N = 4 ∗ 3 = 12 pages, which hold at most 12 ∗ 10 = 120 records. With B = 257, N = 257 ∗ 256 = 65792 pages, which hold 65792 ∗ 10 = 657920 records.

5. The costs of retrieving the records in sorted order through the B+ tree index are as follows.

a. If the index uses Alternative (1) for data entries (and is therefore clustered), the cost equals the cost of traversing the tree from the root to the leftmost leaf plus the cost of retrieving the pages in the sequence set. Assuming 67% occupancy, the number of leaf pages in the tree (the sequence set) is $\lceil 450/0.67 \rceil = 672$. The cost of traversing the tree is given by its height, $\lceil \log_F(\mathit{LeafPages}) \rceil$. Each internal index node holds F page pointers (page ids) and F − 1 key values, 4 bytes each, so
$4F + 4(F - 1) = 512 - 12 \implies 8F = 500 + 4 \implies F = 504/8 = 63$.
So the total cost in I/Os is
$\lceil \log_F(\mathit{LeafPages}) \rceil + \mathit{LeafPages} = \lceil \log_{63} 672 \rceil + 672 = 2 + 672 = 674$.
Clearly, traversing the tree is a small cost compared with scanning the leaves.

b. If the index uses Alternative (2) and is unclustered, then in the worst case we scan the B+ tree's leaf pages and, for each data entry, fetch a separate data page. The number of data entries equals the number of data records, which is 4500. Each data entry is a <key, rid> pair of 4 + 8 = 12 bytes and each page holds 512 − 12 = 500 bytes, so, assuming 67% occupancy, the number of B+ tree leaf pages is $\lceil (4500 \cdot 12)/(500 \cdot 0.67) \rceil = \lceil 161.2 \rceil = 162$. Thus about 4500 (one disk access per record) + 162 (the tree leaves) = 4662 I/Os are required in the worst case. As in (a), descending from the root to the leftmost leaf adds $\lceil \log_{63} 162 \rceil = 2$ I/Os, for a total of about 4664 I/Os.

c. The B+ tree in this case has $\lceil 65792/0.67 \rceil = 98198$ leaf pages if Alternative (1) is used, assuming 67% occupancy. This is the number of I/Os required, plus the relatively minor cost of going from the root to the leftmost leaf ($\lceil \log_{63} 98198 \rceil = 3$ I/Os). If Alternative (2) is used and the index is unclustered, the number of I/Os in the worst case is approximately the number of data entries, 657920, plus the number of B+ tree leaf pages, which with 12-byte data entries and 67% occupancy as in (b) is $\lceil (657920 \cdot 12)/(500 \cdot 0.67) \rceil = 23568$. Thus the number of I/Os is about 657920 + 23568 = 681488, plus $\lceil \log_{63} 23568 \rceil = 3$ I/Os for the root-to-leaf traversal.
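Answers 1 to 4 of Exercise 13.3 follow from the page layout and the pass formula. The sketch below is a minimal illustration under the exercise's assumptions (500 usable bytes per page, records that may not span pages, B − 1 runs merged per pass); `sort_plan` and `max_records_two_passes` are hypothetical helper names.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def sort_plan(n_records, record_bytes, page_bytes, control_bytes, n_buffers):
    usable = page_bytes - control_bytes           # 512 - 12 = 500 bytes per page
    per_page = usable // record_bytes             # records may not span pages
    n_pages = ceil_div(n_records, per_page)
    runs = ceil_div(n_pages, n_buffers)           # sorted subfiles after Pass 0
    last_run = n_pages - (runs - 1) * n_buffers   # the final run may be shorter
    passes, remaining = 1, runs
    while remaining > 1:                          # (B-1)-way merge passes
        remaining = ceil_div(remaining, n_buffers - 1)
        passes += 1
    return {"records_per_page": per_page, "pages": n_pages, "runs": runs,
            "last_run_pages": last_run, "passes": passes,
            "io_cost": 2 * n_pages * passes}


def max_records_two_passes(n_buffers: int, records_per_page: int) -> int:
    # Two passes work exactly when N <= B * (B - 1) pages.
    return n_buffers * (n_buffers - 1) * records_per_page


print(sort_plan(4500, 48, 512, 12, 4))
print(max_records_two_passes(4, 10), max_records_two_passes(257, 10))
# {'records_per_page': 10, 'pages': 450, 'runs': 113, 'last_run_pages': 2, 'passes': 6, 'io_cost': 5400}
# 120 657920
```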

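The B+ tree costs in answer 5 can be reproduced with a small calculator. This is a hedged sketch under the assumptions stated in the answers: 67% leaf occupancy, 4-byte keys and page ids, 12-byte <key, rid> data entries, and, for the unclustered case, the worst case of one data-page I/O per record. Occupancy is kept as an integer percentage so that the ceilings are exact; all function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def internal_fanout(usable_bytes: int, key_bytes: int, pid_bytes: int) -> int:
    # F page ids plus (F - 1) keys must fit: F*pid + (F-1)*key <= usable.
    return (usable_bytes + key_bytes) // (key_bytes + pid_bytes)


def tree_height(leaf_pages: int, fanout: int) -> int:
    # ceil(log_F(leaf_pages)), computed with integers.
    height, reach = 0, 1
    while reach < leaf_pages:
        reach *= fanout
        height += 1
    return height


def alt1_cost(data_pages: int, fanout: int, occupancy_pct: int = 67) -> int:
    # Assumption: leaves hold the records themselves at 67% occupancy.
    leaves = ceil_div(data_pages * 100, occupancy_pct)
    return tree_height(leaves, fanout) + leaves


def alt2_unclustered_worst_cost(n_records, entry_bytes, usable_bytes, fanout, occupancy_pct=67):
    # Assumption: worst case, every data entry points to a different data page.
    leaves = ceil_div(n_records * entry_bytes * 100, usable_bytes * occupancy_pct)
    return tree_height(leaves, fanout) + leaves + n_records


F = internal_fanout(500, 4, 4)
print(F, alt1_cost(450, F), alt2_unclustered_worst_cost(4500, 12, 500, F))
print(alt1_cost(65792, F), alt2_unclustered_worst_cost(657920, 12, 500, F))
# 63 674 4664
# 98201 681491
```

Each printed total includes the root-to-leaf traversal, which is why the figures are 2 or 3 I/Os above the leaf-plus-record counts.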