Distributed Database Systems
Vera Goebel
Department of Informatics
University of Oslo
2011
Contents
• Review: Layered DBMS Architecture
• Distributed DBMS Architectures
– DDBMS Taxonomy
• Client/Server Models
• Key Problems of Distributed DBMS
– Distributed data modeling
– Distributed query processing & optimization
– Distributed transaction management
• Concurrency Control
• Recovery
Functional Layers of a DBMS
[Figure: layered architecture from the Applications at the top, through the Interface (data model / view management), Control (semantic integrity control / authorization), and Transaction Management layers, down to the Database.]
Dependencies among DBMS Components
[Figure: dependency graph of the central DBMS components (Transaction Mgmt, Access Path Mgmt, Sorting Component, System Buffer Mgmt); arrows indicate dependencies.]
Centralized DBS
• logically integrated
• physically centralized
[Figure: terminals T1-T5 access a single DBS over a network.]
Distributed DBS
• Data logically integrated (i.e., access based on one schema)
• Data physically distributed among multiple database nodes
• Processing is distributed among multiple database nodes
[Figure: dimensions of the DDBMS taxonomy, e.g. client/server distribution and heterogeneity.]
Common DBMS Architectural Configurations
[Figure: three configurations: No Autonomy (fully cooperating nodes N1-N4), Federated (autonomous DBMSs D1-D4), and Multi (a global component, GTM & SM, on top of DBMSs D1-D4).]
Parallel Database System Platforms
(Non-autonomous Distributed DBMS)
[Figure: Shared Everything: processors Proc1...ProcN and memories Mem1...MemM reach all disks Disk 1...Disk p over a fast interconnection network. Shared Nothing: each processor has its own memory (Mem1...MemN) and its own disk (Disk 1...Disk N), and nodes communicate only over the network.]
Distributed DBMS
• Advantages:
– Improved performance
– Efficiency
– Extensibility (addition of new nodes)
– Transparency of distribution
• Storage of data
• Query execution
– Autonomy of individual nodes
• Problems:
– Complexity of design and implementation
– Data consistency
– Safety
– Failure recovery
Client/Server Database Systems
Client/Server Environments
data (object) server + n smart clients (workstations)
Clients with Centralized Server Architecture
[Figure: workstations on a local communication network access one server; the server exposes the data interface and the database functions over disks Disk 1...Disk n holding the database.]
Clients with Distributed Server Architecture
[Figure: workstations on a local communication network access several servers, each managing its own disks and its own part of the database.]
Clients with Data Server Approach
“3-tier client/server”
[Figure: workstations run the user interface; an application server performs query parsing and talks through the data server interface over a communication channel to the data server, which manages disks Disk 1...Disk n holding the database.]
In the Client/Server DBMS Architecture,
how are the DB services organized?
Client/Server Architectures
[Figure: in the relational case, the client process runs the application and cursor management and sends SQL to the server process; in the object-oriented case, the client process runs the application and object/page cache management, and the server process (DBMS) returns objects, pages, or files.]
Object Server Architecture
[Figure: the client process contains the application, an Object Manager, and an Object Cache; the server process contains a Log/Lock Manager, an Object Manager, a File/Index Manager, a Page Cache Manager, Storage Allocation and I/O, and Object and Page Caches over the database. Client and server exchange objects, object references, queries, method calls, locks, and log records.]
Object Server Architecture - summary
• Unit of transfer: object(s)
• Server understands the object concept and can execute methods
• Most DBMS functionality is replicated on client(s) and server(s)
• Advantages:
- server and client can run methods, workload can be balanced
- simplifies concurrency control design (centralized in server)
- implementation of object-level locking
- low cost for enforcing “constraints” on objects
• Disadvantages/problems:
- remote procedure calls (RPC) for object references
- complex server design
- client cache consistency problems
- page-level locking, objects get copied multiple times, large objects
Page Server Architecture
[Figure: the client process contains the application, an Object Manager with Object Cache, a File/Index Manager, and a Page Cache Manager with Page Cache; the server process contains a Log/Lock Manager, a Page Cache Manager with Page Cache, and Storage Allocation and I/O over the database. Client and server exchange pages, page references, locks, and log records.]
Page Server Architecture - summary
• unit of transfer: page(s)
• server deals only with pages (understands no object semantics)
• server functionality: storage/retrieval of page(s), concurrency
control, recovery
• advantages:
- most DBMS functionality at client
- page transfer -> server overhead minimized
- more clients can be supported
- object clustering improves performance considerably
• disadvantages/problems:
- method execution only on clients
- object-level locking
- object clustering
- large client buffer pool required
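To make the object-server vs. page-server contrast concrete, here is a minimal Python sketch; the ObjectServer, PageServer, and Account classes are illustrative stand-ins for a real DBMS API. With an object server the method call runs at the server, while with a page server the client fetches the page and runs the method locally.

# Sketch contrasting object-server and page-server styles of method execution.
# ObjectServer/PageServer/Account are illustrative classes, not a real DBMS API.

class ObjectServer:
    def __init__(self, objects): self.objects = objects
    def invoke(self, oid, method):              # object server: the method runs at the server
        return getattr(self.objects[oid], method)()

class PageServer:
    def __init__(self, pages): self.pages = pages
    def fetch_page(self, page_id):              # page server: ships whole pages, no object semantics
        return self.pages[page_id]

class Account:
    def __init__(self, balance): self.balance = balance
    def audit(self): return self.balance >= 0

# Object-server style: the client sends a method call, the server executes it.
oserver = ObjectServer({"acc1": Account(100)})
print(oserver.invoke("acc1", "audit"))          # True, computed at the server

# Page-server style: the client fetches the page and runs the method locally.
pserver = PageServer({"p1": {"acc1": Account(100)}})
page = pserver.fetch_page("p1")
print(page["acc1"].audit())                     # True, computed at the client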
File Server Architecture
[Figure: like the page server, but the client process reads and writes database pages directly through a remote file system; see the summary below.]
File Server Architecture - summary
• unit of transfer: page(s)
• simplification of the page server; clients use a remote file system (e.g., NFS) to read and write DB page(s) directly
• server functionality: I/O handling, concurrency control, recovery
• advantages:
- same as for the page server
- NFS: user-level context switches can be avoided
- NFS is widely used -> stable SW that will keep being improved
• disadvantages/problems:
- same as for the page server
- NFS writes are slow
- read operations bypass the server -> no request combination
- object clustering
- coordination of disk space allocation
Cache Consistency in Client/Server Architectures

Avoidance-Based Algorithms:
– Synchronous: Client sends 1 msg per lock to the server; client waits; server replies with ACK or NACK.
– Asynchronous: Client sends 1 msg per lock to the server; client continues; server invalidates cached copies at other clients.
– Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies when all cached copies are freed.

Detection-Based Algorithms:
– Synchronous: Client sends an object status query to the server for each access; client waits; server replies.
– Asynchronous: Client sends 1 msg per lock to the server; client continues; after commit, the server sends updates to all cached copies.
– Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies based on W-W conflicts only.

The chosen cache consistency scheme interacts with concurrency control (locking), transaction management, and deadlock management.
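As an illustration of the avoidance-based synchronous scheme, the following Python sketch shows a client that sends one message per lock and waits for the server's ACK or NACK before touching its cached copy. LockServer and Client are made-up names; a real system would use RPC and also handle read locks, invalidation, and deadlocks.

# Minimal sketch of avoidance-based, synchronous cache consistency.
# All names are illustrative; a real client/server DBMS exchanges messages, not direct calls.

class LockServer:
    def __init__(self):
        self.locks = {}          # object id -> client currently holding the write lock

    def request_lock(self, client_id, obj_id):
        """One message per lock; reply ACK if free (or already ours), else NACK."""
        holder = self.locks.get(obj_id)
        if holder is None or holder == client_id:
            self.locks[obj_id] = client_id
            return "ACK"
        return "NACK"

    def release_locks(self, client_id):
        self.locks = {o: c for o, c in self.locks.items() if c != client_id}

class Client:
    def __init__(self, client_id, server):
        self.client_id, self.server, self.cache = client_id, server, {}

    def write(self, obj_id, value):
        # Synchronous: the client blocks here until the server has replied.
        if self.server.request_lock(self.client_id, obj_id) == "NACK":
            raise RuntimeError("lock denied; abort or retry the transaction")
        self.cache[obj_id] = value   # now it is safe to update the cached copy

server = LockServer()
c1 = Client("c1", server)
c1.write("obj42", "new value")
server.release_locks("c1")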
Distributed Storage in Relational DBMSs
• horizontal fragmentation: distribution of “rows”, selection
• vertical fragmentation: distribution of “columns”, projection
• hybrid fragmentation: “projected columns” from “selected rows”
• allocation: which fragment is assigned to which node?
• replication: multiple copies at different nodes, how many copies?
• Design factors:
– Most frequent query access patterns
– Available distributed query processing algorithms
• Evaluation Criteria
– Cost metrics for: network traffic, query processing, transaction mgmt
– A system-wide goal: Maximize throughput or minimize latency
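The three fragmentation styles above can be illustrated with a small Python sketch; the employees relation, the predicates, and the node assignments are invented for illustration only.

# Minimal sketch of horizontal, vertical, and hybrid fragmentation of one relation.

employees = [
    {"eid": 1, "name": "Ada", "dept": "Oslo",   "salary": 700},
    {"eid": 2, "name": "Bo",  "dept": "Bergen", "salary": 650},
    {"eid": 3, "name": "Cai", "dept": "Oslo",   "salary": 800},
]

def horizontal(rows, predicate):
    """Selection: each fragment holds a subset of the rows."""
    return [r for r in rows if predicate(r)]

def vertical(rows, attrs, key="eid"):
    """Projection: each fragment holds a subset of the columns plus the key."""
    return [{k: r[k] for k in [key] + attrs} for r in rows]

def hybrid(rows, predicate, attrs, key="eid"):
    """Projected columns from selected rows."""
    return vertical(horizontal(rows, predicate), attrs, key)

frag_oslo   = horizontal(employees, lambda r: r["dept"] == "Oslo")      # allocate to node 1
frag_pay    = vertical(employees, ["salary"])                            # allocate to node 2
frag_hybrid = hybrid(employees, lambda r: r["salary"] > 700, ["name"])   # replicate if useful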
Distributed Storage in OODBMSs
• Must fragment, allocate, and replicate object data among nodes
• Complicating Factors:
– Encapsulation hides object structure
– Object methods (bound to class not instance)
– Users dynamically create new classes
– Complex objects
– Effects on garbage collection algorithm and performance
• Approaches:
– Store objects in relations and use RDBMS strategies to distribute data
• All objects are stored in binary relations [OID, attr-value]
• One relation per class attribute
• Use nested relations to store complex (multi-class) objects
– Use object semantics
• Evaluation Criteria:
– Affinity metric (maximize)
– Cost metric (minimize)
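As a sketch of the first approach above (storing objects in binary relations [OID, attr-value], one relation per class attribute), here is a hypothetical Python illustration; the Employee class and its attributes are made up.

# Sketch: mapping objects to binary relations [OID, attr-value], one relation per attribute.

class Employee:
    def __init__(self, oid, name, salary):
        self.oid, self.name, self.salary = oid, name, salary

def to_binary_relations(objects, attrs):
    """Return {attribute: [(OID, value), ...]}, i.e. one binary relation per class attribute."""
    return {a: [(o.oid, getattr(o, a)) for o in objects] for a in attrs}

emps = [Employee("e1", "Ada", 700), Employee("e2", "Bo", 650)]
relations = to_binary_relations(emps, ["name", "salary"])
# relations["name"]   == [("e1", "Ada"), ("e2", "Bo")]
# relations["salary"] == [("e1", 700), ("e2", 650)]
# Each binary relation can then be fragmented and allocated with the RDBMS strategies above.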
Horizontal Partitioning using Object Semantics
Vertical Partitioning using Object Semantics
[Figure: an object's attributes are split into Fragment #1, Fragment #2, and Fragment #3.]
Path Partitioning using Object Semantics
More Issues in OODBMS Fragmentation
[Figure: methods may access local object data or remote object data.]
• Replication Options
– Objects
– Classes (collections of objects)
– Methods
– Class/Type specifications
Distributed Directory and Catalogue Management
• Directory Information:
– Description and location of records/objects
• Size, special data properties (e.g., executable, DB type, user-defined type, etc.)
• Fragmentation scheme
– Definitions for views, integrity constraints
• Options for organizing the directory:
– Centralized; issues: bottleneck, unreliable
– Fully replicated; issues: consistency, storage overhead
Distributed Query Processing and Optimization
• Construction and execution of query plans, query optimization
• Goals: maximize parallelism (response time optimization);
  minimize network data transfer (throughput optimization)
• Basic approaches to distributed query processing:
– pipelining (functional decomposition)
[Figure: processing timelines.]
Creating the Distributed Query Processing Plan
• Factors to be considered:
- distribution of data
- communication costs
- lack of sufficient locally available information
• 4 processing steps (steps (2)-(4) are sketched below):
(1) query decomposition
(2) data localization
(3) global optimization
    -> performed at the control site (uses global information)
(4) local optimization
    -> performed at the local sites (uses local information)
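A minimal Python sketch of steps (2)-(4), assuming a toy query representation; the tuples, relation names, and site names below are invented, whereas a real planner works on relational algebra trees and a cost model.

# Sketch of data localization and optimization over horizontal fragments.

fragments = {"EMP": ["EMP1@site1", "EMP2@site2"]}   # horizontal fragments of relation EMP

def localize(query, fragments):
    """(2) Data localization: replace each global relation by the union of its fragments."""
    op, rel = query
    return (op, ("union", fragments.get(rel, [rel])))

def optimize_globally(localized_query):
    """(3) Global optimization: here, naively run each selection at the fragment's own site."""
    op, (_, frags) = localized_query
    return [(op, frag, frag.split("@")[1]) for frag in frags]

def optimize_locally(site_plan):
    """(4) Local optimization: each site would pick indexes/access paths (placeholder)."""
    return site_plan

plan = optimize_globally(localize(("select dept = 'Oslo'", "EMP"), fragments))
local_plans = [optimize_locally(p) for p in plan]
# plan == [("select dept = 'Oslo'", "EMP1@site1", "site1"),
#          ("select dept = 'Oslo'", "EMP2@site2", "site2")]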
Generic Layered Scheme for Planning a Distributed Query
[Figure: layered planning of a distributed query; a calculus query on distributed objects first goes through query decomposition against the global schema at the control site and, after the intermediate steps above, through local optimization against each local schema at the local sites.]
Distributed Transaction Management
• Transaction Management (TM) in a centralized DBS
– Achieves transaction ACID properties by using:
• concurrency control (CC)
• recovery (logging)
• TM in a DDBS
– Achieves transaction ACID properties by using:
[Figure: a Transaction Manager that combines Concurrency Control (Isolation) and a Recovery Protocol (Durability), built on top of a Log Manager and a Buffer Manager.]
Classification of Concurrency Control Approaches (Isolation)
[Figure: classification tree of CC approaches into Pessimistic and Optimistic families (leaves include Centralized and Basic).]
Two-Phase-Locking Approach (2PL) (Isolation)
[Figure: 2PL lock graphs plotting the number of locks held over time; locks are obtained during a growing phase and released during a shrinking phase.]

Communication Structure of Centralized 2PL (Isolation)
Message sequence:
1. Lock Request
2. Lock Granted
3. Operation
4. End of Operation
5. Release Locks
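The growing/shrinking discipline of 2PL can be sketched in a few lines of Python. This is illustrative only, assuming a single exclusive lock table, no waiting or deadlock handling, and release of all locks at once as in strict 2PL.

# Minimal sketch of two-phase locking inside one transaction.

class Transaction2PL:
    def __init__(self, lock_table):
        self.lock_table = lock_table     # shared dict: item -> owning transaction
        self.held = set()
        self.shrinking = False           # once True, no new locks may be acquired

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire locks in the shrinking phase")
        if self.lock_table.get(item) not in (None, self):
            raise RuntimeError("lock conflict: wait or abort")
        self.lock_table[item] = self
        self.held.add(item)              # growing phase

    def unlock_all(self):
        self.shrinking = True            # shrinking phase begins
        for item in self.held:
            del self.lock_table[item]
        self.held.clear()

locks = {}
t = Transaction2PL(locks)
t.lock("x"); t.lock("y")     # growing phase
# ... read/write x and y ...
t.unlock_all()               # shrinking phase (strict 2PL: release everything at commit)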
Communication Structure of Distributed 2PL (Isolation)
Message sequence:
1. Lock Request
2. Operation
3. End of Operation
4. Release Locks
Distributed Deadlock Management (Isolation): Example
[Figure: Site X runs T1x (holds lock Lc) and T2x (holds lock Lb); Site Y runs T1y (holds lock La) and T2y (holds lock Ld). T1x needs a and waits for T1y on site Y to complete; T1y waits for T2y to release Ld; T2y needs b and waits for T2x on site X to complete; T2x waits for T1x to release Lc. The waits form a cycle across the two sites.]
Distributed Waits-For Graph:
- requires many messages to update lock status
- combine the “wait-for” tracing message with lock status update messages
- still an expensive algorithm
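The global waits-for graph for this example, together with a simple cycle check over it, can be sketched in Python; in a real DDBMS the edges would be collected via the tracing and lock status messages mentioned above rather than written down directly.

# Sketch: build the global waits-for graph of the example and detect the deadlock.
# An edge T -> U means "T waits for U".

waits_for = {
    "T1x": ["T1y"],   # T1 on site X needs a, held as La by T1 on site Y
    "T1y": ["T2y"],   # waits for T2y to release Ld
    "T2y": ["T2x"],   # needs b, held as Lb by T2 on site X
    "T2x": ["T1x"],   # waits for T1x to release Lc
}

def has_cycle(graph):
    """Depth-first search for a cycle in the waits-for graph."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True                      # back edge -> deadlock
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph)

print(has_cycle(waits_for))   # True: T1x -> T1y -> T2y -> T2x -> T1x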
Communication Structure of Centralized 2-Phase Commit Protocol (Atomicity)
Coordinating TM and participating sites exchange four rounds of messages:
1. Prepare: the coordinating TM sends Prepare; each participating site makes its local commit/abort decision and writes a log entry.
2. Vote-Commit or Vote-Abort: the participants reply; the coordinator counts the votes; if any vote is missing or any Vote-Abort arrives, then Global-Abort, else Global-Commit.
3. Global-Abort or Global-Commit: the coordinator sends the decision; each participant writes a log entry.
4. ACK: the participants acknowledge the decision.
Who participates? Depends on the CC algorithm.
Other communication structures are possible:
– Linear
– Distributed
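A sketch of the coordinator side of this protocol follows. The Participant class is an in-memory stand-in: real participants are remote sites reached by messages, votes can go missing, and replies would time out instead of arriving instantly.

# Minimal sketch of the coordinator side of centralized 2PC.

class Participant:
    def __init__(self, will_commit=True):
        self.will_commit = will_commit
        self.decision = None
    def prepare(self):
        return "VOTE-COMMIT" if self.will_commit else "VOTE-ABORT"    # writes its log entry
    def decide(self, decision):
        self.decision = decision                                       # writes its log entry
        return "ACK"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]            # phase 1: Prepare, collect votes
    decision = ("GLOBAL-COMMIT"
                if all(v == "VOTE-COMMIT" for v in votes)   # a missing vote counts as abort
                else "GLOBAL-ABORT")
    acks = [p.decide(decision) for p in participants]       # phase 2: broadcast decision, collect ACKs
    return decision, acks

print(two_phase_commit([Participant(), Participant()]))                    # GLOBAL-COMMIT
print(two_phase_commit([Participant(), Participant(will_commit=False)]))   # GLOBAL-ABORT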
State Transitions for the 2-Phase Commit Protocol (Atomicity)
[Figure: the coordinating TM moves Initial -> Wait -> Commit or Abort; each participating site moves Initial -> Ready -> Commit or Abort; the transitions are driven by the messages 1. Prepare, 2. Vote-Commit or Vote-Abort, 3. Global-Abort or Global-Commit, 4. ACK.]
Advantages:
- Preserves atomicity
- All processes are “synchronous within one state transition”
Disadvantages:
- Many, many messages
- If a failure occurs, the 2PC protocol blocks!
Attempted solutions for the blocking problem:
1) Termination Protocol
2) 3-Phase Commit
3) Quorum 3P Commit
Failures in a Distributed System
Types of failure:
– Transaction failure
– Node failure
– Media failure
– Network failure
• partitions, each containing 1 or more sites
Issues to be addressed, and the protocols that address them:
– How to continue service: termination protocols
– How to maintain ACID properties while providing continued service: modified concurrency control & commit/abort protocols
– How to ensure ACID properties after recovery from the failure(s): recovery protocols, termination protocols, & replica control protocols
Termination Protocols (Atomicity, Consistency)
Use timeouts to detect potential failures that could block protocol progress.
[Figure: the 2PC state/message diagram annotated with the states in which timeouts are handled.]
Timeout states:
– Coordinator: wait, commit, abort
– Participant: initial, ready
Coordinator termination protocol:
– Wait: send Global-Abort
– Commit or Abort: BLOCKED!
Participant termination protocol (sketched below):
– Ready: query the coordinator;
  if timeout, then query the other participants;
  if Global-Abort or Global-Commit is learned, then proceed and terminate,
  else BLOCKED!
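The participant termination protocol for the Ready state can be sketched as follows; query_coordinator and query_other_participants are hypothetical callables that return the known global decision, or None to model a timeout.

# Sketch of the participant termination protocol in the READY state.

def terminate_from_ready(query_coordinator, query_other_participants):
    """Participant stuck in READY: return 'COMMIT', 'ABORT', or 'BLOCKED'."""
    decision = query_coordinator()                       # None models a timeout
    if decision is None:
        for query in query_other_participants:           # ask the other participants
            decision = query()
            if decision is not None:
                break
    if decision == "GLOBAL-COMMIT":
        return "COMMIT"
    if decision == "GLOBAL-ABORT":
        return "ABORT"
    return "BLOCKED"                                      # nobody knows the outcome: 2PC blocks

# Coordinator is down (None), but another participant already saw Global-Commit:
print(terminate_from_ready(lambda: None, [lambda: None, lambda: "GLOBAL-COMMIT"]))   # COMMIT
# Coordinator is down and nobody knows the decision:
print(terminate_from_ready(lambda: None, [lambda: None]))                            # BLOCKED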
Replica Control Protocols (Consistency)
[Figure: transaction timeline from BEGIN to END: obtain locks, period of data use, 2-phase commit (the COMMIT POINT), release locks. A STRICT UPDATE propagates changes to the replicas by the commit point; a LAZY UPDATE propagates them after it.]
Strict Replica Control Protocol (Consistency)
• Read-One-Write-All (ROWA)
• Part of the Concurrency Control Protocol and the 2-Phase Commit Protocol:
– CC locks all copies
– 2PC propagates the updated values with the 2PC messages (or an update propagation phase is inserted between the wait and commit states for those nodes holding an updateable value)
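A minimal in-memory sketch of ROWA follows; the Replica class is illustrative, and in a real system the lock and write calls would be messages integrated with 2PC rather than local method calls.

# Minimal sketch of Read-One-Write-All (ROWA).

class Replica:
    def __init__(self):
        self.data, self.locks = {}, set()
    def lock(self, item): self.locks.add(item)                # CC locks every copy
    def write(self, item, value): self.data[item] = value
    def read(self, item): return self.data.get(item)
    def unlock_all(self): self.locks.clear()

def rowa_read(replicas, item):
    return replicas[0].read(item)                             # reading one copy is enough

def rowa_write(replicas, item, value):
    for r in replicas: r.lock(item)                           # lock all copies
    for r in replicas: r.write(item, value)                   # propagate the update to all copies
    for r in replicas: r.unlock_all()                         # in practice released at 2PC commit

nodes = [Replica(), Replica(), Replica()]
rowa_write(nodes, "x", 42)
print(rowa_read(nodes, "x"))   # 42, regardless of which replica is read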
Lazy Replica Control Protocol (Consistency)
Recovery in Distributed Systems (Atomicity, Durability)
[Figure: a network failure splits nodes N4-N8 into partitions, e.g. Partition #1 and Partition #3.]
Issues:
– Termination of interrupted transactions
– Partition integration upon recovery from a network failure
– Data availability while the failure is ongoing
Data Availability in Partitioned Networks
Quorums (ACID)
Read-Quorums and Write-Quorums (Isolation)
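The read-quorum/write-quorum idea can be made concrete with the standard quorum-consensus conditions; this is a sketch of the usual textbook formulation, not necessarily the exact variant shown on the original slides. With V votes in total, a read quorum Vr and a write quorum Vw must satisfy Vr + Vw > V and 2*Vw > V, so every read overlaps some write and two writes can never proceed in parallel.

# Sketch of the standard quorum-consensus conditions (textbook formulation).

def valid_quorums(total_votes, read_quorum, write_quorum):
    """A quorum assignment is valid if reads overlap writes and writes overlap writes."""
    return (read_quorum + write_quorum > total_votes    # every read sees the latest write
            and 2 * write_quorum > total_votes)          # two writes always conflict

print(valid_quorums(5, 3, 3))   # True : e.g. 5 replicas, read any 3, write any 3
print(valid_quorums(5, 2, 3))   # False: a read of 2 copies could miss the newest write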
Conclusions
• Nearly all commercial relational database systems offer some form of distribution
– Client/server at a minimum
• Only a few commercial object-oriented database systems support distribution beyond N clients and 1 server