01 Dbms Storage
01 Dbms Storage
Systems@TUDa http://tuda.systems/
ANNOUNCEMENTS
Warm-up lab: will start this Friday.
• Goal: see a sample lab and learn Rust
• Important: Warm-up lab will give bonus points if you solve it
• Gitlab accounts have been created and you should have received an email
• If you already had an account in our Gitlab (from a previous semester/course), your
account will be re-used
• In any case, try to log in and make sure that you are part of the group
"sdms_ws2024_students”
Systems@TUDa http://tuda.systems/ | 2
COURSE SCHEDULE
Week 1: Introduction
Week 2: DBMS Storage - Single Node
Week 3: DBMS Storage - Distributed
Week 4: DBMS Query Processing - Single Node Single-Node +
Week 5: DBMS Query Processing - Distributed Query Distributed DBMS
Week 6: DBMS Query Optimization - Single Node & Distributed
Week 7: DBMS Transaction Processing - Single Node
Week 8: DBMS Transaction Processing - Distributed
Week 9: Cloud DBMS - Data Centers & DBMS Architectures
Week 10: Cloud DBMS - Scalable Query Processing
Cloud DBMSs
Week 11: Cloud DBMS - Scalable Transaction Processing
Week 12: Cloud DBMS - Secure DBMSs
Week 13: Other Workloads – MapReduce / Streaming
Week 14: Other Workloads – Distributed AI Other Workloads
Systems@TUDa http://tuda.systems/ | 3
A SINGLE-NODE DBMS ARCHITECTURE
SQL Result
Query Optimization
Query Execution
Access Methods
Buffer Pool
Storage Manager
Table Data
Systems@TUDa http://tuda.systems/ | 4
TODAY: STORAGE LAYERS OF A DBMS
SQL Result
Query Optimization
Buffer Pool
Storage Manager
Table Data
Systems@TUDa http://tuda.systems/ | 5
DATABASES NEED TO STORE TABLE DATA
Systems@TUDa http://tuda.systems/ | 6
THE STORAGE HIERARCHY (SIMPLIFIED)
core
registers
L1-I L1-D
L2
Systems@TUDa http://tuda.systems/ | 8
STORAGE HIERARCHY – CAPACITY & COST
3MiB each
96MiB
Increase Cost $ / GB
384MiB
265-4096GiB
1-335TiB
Systems@TUDa http://tuda.systems/ | 9
MOVEMENT OF DATA IN STORAGE HIERARCHY
Systems@TUDa http://tuda.systems/ | 10
LET US DESIGN A DATABASE STORAGE MANAGER
Systems@TUDa http://tuda.systems/ | 11
Interactive Session
(not in slides)
A SINGLE-NODE DBMS Later in the
semester
DBMS
Page Page Buffer Pool Storage
1 2
(in Memory)
Focus today on
single-node storage.
Next lecture(s):
Distributed storage
Systems@TUDa http://tuda.systems/ | 13
TODAY’S AGENDA
Storage Manager
Buffer Pool
Systems@TUDa http://tuda.systems/ | 14
TODAY’S AGENDA
Storage Manager
Buffer Pool
Systems@TUDa http://tuda.systems/ | 15
HOW TO MAP TABLE TO A HDD?
cust-key last-name … age
1 Müller 47
Table 2 Lo 53 Rows (=Tuples)
… … in Table
99 Schmitt 22
A table is mapped to a
database file which consist
of multiple disk blocks
Systems@TUDa http://tuda.systems/ | 16
DATABASE FILES – TWO MAIN TYPES
Unsorted (Heap File) – New blocks are appended if table grows
cid ... cid ... cid ... cid ...
4234 ... 2234 ... 1211 ... 8004 ... How to
1345 ... 1745 ... ... 7235 ... 0325 ... organize data
How to organize
3367 ... 6367 ... 6337 ... 1357 ...
within a block?
1689 ... 8689 ... 7639 ... 2899 ...
Systems@TUDa http://tuda.systems/ | 17
MAPPING TABLES TO BLOCKS: BASIC IDEA
For now: Block = Page
Page (e.g., 4KB)
Strawman idea for how to organize a page Tuple 1 (36B)
• Pages provide fixed-sized “slots” for tuples Tuple 2 (36B)
• Why fixed-sized slots? How can a DBMS know
Tuple 3 (36B)
the size of a row?
Free Space
Systems@TUDa http://tuda.systems/ | 18
EXAMPLE: DETERMINE SLOT SIZE
Systems@TUDa http://tuda.systems/ | 19
RECAP: DATA TYPES IN SQL
Strings (fixed and variable-length):
▪ CHAR(20) -- fixed length
How to support
▪ VARCHAR(40) -- variable length
variable-length
data?
Numbers (fixed-length):
▪ BIGINT (8 Byte), INT (4 Byte), SMALLINT (2), TINYINT (1)
▪ REAL (8.Byte), FLOAT (4 Byte)
Systems@TUDa http://tuda.systems/ | 20
SLOTTED PAGES: DETAILS
Slots
Slotted Page: One slot per tuple (Offset+size of tuples)
• Each slot: stores offset + size for tuple data
• Tuple data: stored from back to front
Header o o o
Systems@TUDa http://tuda.systems/ | 21
EXAMPLE: SLOTTED PAGES
Slots
(Offset + Size)
Header o o o
53 Lo 2 47 Müller 1
Tuple data
Systems@TUDa http://tuda.systems/ | 22
ANOTHER VARIANT OF A SLOTTED PAGES
Per slot: Fixed-
Fixed-length data of tuples (e.g., INT) length data &
Offset/size
is stored directly in slots
Tuple Tuple
Header 1 2
Only variable-length data (e.g., Tuple
3
VARCHAR) is stored at the end of page
Page
Free Space
Advantage: No indirection and faster
access to fixed-length data
Tuple Tuple Tuple
3 2 1
Systems@TUDa http://tuda.systems/ | 23
HOW TO FIND DATA IN TABLES?
The DBMS needs a mechanism to identify individual
tuples in tables (e.g., for DBMS indexes, see below)
Customer Record Id
CID = 5? (Page=1,
Slot = 5)
Systems@TUDa http://tuda.systems/ | 24
DETAILS ABOUT RECORD IDs
Record IDs are “pointers” to a tuple Record Ids in DBMSs
Systems@TUDa http://tuda.systems/ | 25
SLOTTED PAGES: ROW VS COLUMN STORE
Row-Store: All attributes of a tuple in a table are stored in one page
Table Pages
Table Pages
Systems@TUDa http://tuda.systems/ | 26
Row vs. Column-store: Scan Perf.
Systems@TUDa http://tuda.systems/ | 27
TODAY’S AGENDA
Storage Manager
Buffer Pool
Systems@TUDa http://tuda.systems/ | 28
HARD DISK DRIVES / HDDs ARE SLOW
HDDs are (still) the dominant storage device for classical DBMSs due to their low
$/GB price and high capacities
RAM
Systems@TUDa http://tuda.systems/ | 30
STORAGE HIERARCHY: ACCESS TIMES
Systems@TUDa http://tuda.systems/ | 31
Example: Data Access via RIDs
Customer
CID = 5?
Systems@TUDa http://tuda.systems/ | 32
BUFFER: THE COMPLETE PICTURE
Customer
CID = 5?
(1, 5)
Systems@TUDa http://tuda.systems/ | 33
BUFFER: THE COMPLETE PICTURE
(1, 5)
Page Table
tells us where
to find page?
Systems@TUDa http://tuda.systems/ | 34 Buffer or Disk
BUFFER: THE COMPLETE PICTURE
(1, 5)
Systems@TUDa http://tuda.systems/ | 35
BUFFER MANAGER / BUFFER POOL
3
Buffer pool provides fixed number of
…
“frames” to cache pages
Page Table Buffer Pool
Systems@TUDa http://tuda.systems/ | 36
BUFFER MANAGER: INTERFACE?
Buffer pools provide two main operations: PIN and UNPIN
PIN: DBMS (i.e., worker thread which executes a query) requests a page using
Page ID
• Buffer pool checks if page is already in memory
• If not, it loads page and returns pointer to DBMS
UNPIN: worker thread releases page to indicate that it does not use it anymore
Buffer pool contains additional info for caching decisions: Dirty flag /
Reference counter (how many workers access page)
Systems@TUDa http://tuda.systems/ | 37
BUFFER MANAGER: PIN OPERATION
function PIN(pageno) // request page from buffer pool
1. if buffer caches page with pageno then
2. pinCount(pageno) = pinCount(pageno) + 1; Page already
3. return memory address of frame holding page; in buffer?
1. if buffer is full
2. select a victim-frame v using replacement policy; Page needs
3. if dirty(v) then write v to disk; to be evicted?
4. end
Systems@TUDa http://tuda.systems/ | 39
BUFFER MANAGER: EVICTION STRATEGIES
If buffer is full when a new page is requested via a PIN operation, then a
victim frame needs to be selected (called page eviction)
Systems@TUDa http://tuda.systems/ | 40
Interactive Session
(not in slides)
EVICTION STRATEGIES?
Systems@TUDa http://tuda.systems/ | 42
BUFFER MANAGER: EVICTION STRATEGIES
Typical eviction strategies
• FIFO: first-in-first-out (using a simple queue)
• LRU: evict least-recently-used page by an access time per page
(list of pages sorted by access time)
Goal is to evict pages that are not likely to be used in near future
Systems@TUDa http://tuda.systems/ | 43
LEAST-RECENTLY-USED (LRU)
Main idea: keep most recently Implementation:
used pages in buffer Keep ordered list of page numbers
Page access order
Typical implementation: Keep a
sorted list of page IDs ordered by
access time
Systems@TUDa http://tuda.systems/ | 44
CLOCK ALGORITHM TO APPROXIMATE LRU
Clock algorithm is a „good enough“ approximate version of LRU with
lower overhead
Goal: Find an old page, but not necessarily the very oldest
Systems@TUDa http://tuda.systems/ | 45
CLOCK: EVICTION ALGORITHM
A “clock hand” keeps track of which frame needs to be checked
next
Hand
Ref=1 Ref=1
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 47
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 1
Ref=1 Ref=1
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 48
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 1
Ref=1 Ref=0
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 49
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 1
Ref=1 Ref=0
Page 4 Page 2
Page 3
Ref=0
Systems@TUDa http://tuda.systems/ | 50
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5)
X
Page 1
Ref=0 Ref=0
Page 4 Page 2
Page 3
Ref=0
Systems@TUDa http://tuda.systems/ | 51
CLOCK: EXAMPLE (4 FRAMES)
Ref=1
Page requests:
Page 5 is loaded into buffer ->
1. PIN(pageID=5) Page 5 Reference-bit is reset to 1
Ref=0 Ref=0
Page 4 Page 2
Page 3
Ref=0
Systems@TUDa http://tuda.systems/ | 52
CLOCK: EXAMPLE (4 FRAMES)
Ref=1
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
Ref=0 Ref=1
Page 4 Page 2
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 54
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
4. PIN(pageID=1)
Ref=0 Ref=1
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 55
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
4. PIN(pageID=1)
Ref=0 Ref=0
Page 4 Page 2
Page 3
Ref=1
Systems@TUDa http://tuda.systems/ | 56
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
4. PIN(pageID=1)
Ref=0 Ref=0
Page 4 Page 2
Page 3
Ref=0
Systems@TUDa http://tuda.systems/ | 57
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
4. PIN(pageID=1)
Ref=0 Ref=0
X
Page 4 Page 2
Page 3
Ref=0
Systems@TUDa http://tuda.systems/ | 58
CLOCK: EXAMPLE (4 FRAMES)
Ref=0
Page requests:
1. PIN(pageID=5) Page 5
2. PIN(pageID=2)
3. PIN(pageID=3)
4. PIN(pageID=1)
Ref=1 Ref=0
Page 1 Page 2
Systems@TUDa http://tuda.systems/ | 59
SUMMARY
Storage Manager
Buffer Pool
Systems@TUDa http://tuda.systems/ | 60
QUESTIONS
Systems@TUDa http://tuda.systems/ | 61