
Distributed Database Systems

Vera Goebel
Department of Informatics
University of Oslo

2011

1
Contents
• Review: Layered DBMS Architecture
• Distributed DBMS Architectures
– DDBMS Taxonomy
• Client/Server Models
• Key Problems of Distributed DBMS
– Distributed data modeling
– Distributed query processing & optimization
– Distributed transaction management
• Concurrency Control
• Recovery

2
Functional Layers of a DBMS

[Figure: layered DBMS architecture. Applications sit on top, the Database at the bottom, and Transaction Management spans the layers:]

Interface     – Data Model / View Management
Control       – Semantic Integrity Control / Authorization
Compilation   – Query Processing and Optimization
Execution     – Storage Structures
Data Access   – Buffer Management
Consistency   – Concurrency Control / Logging

3
Dependencies among DBMS components

[Figure: dependency graph among DBMS components: Transaction Mgmt, Access Path Mgmt, Sorting component, Log component (with savepoint mgmt), Lock component, and System Buffer Mgmt (grouped as the central components); arrows indicate the dependencies.]

4
Centralized DBS
• logically integrated
• physically centralized

[Figure: terminals T1–T5 connected via a network to one DBS.]

Traditionally: one large mainframe DBMS + n “dumb” terminals

5
Distributed DBS
• Data logically integrated (i.e., access based on one schema)
• Data physically distributed among multiple database nodes
• Processing is distributed among multiple database nodes

[Figure: terminals T1–T3 connected via a network to database nodes DBS1–DBS3.]

Why a Distributed DBS?
• Performance via parallel execution
  - more users
  - quick response
• More data volume
  - max disks on multiple nodes

Traditionally: m mainframes for the DBMSs + n terminals


6
DBMS Implementation Alternatives

[Figure: taxonomy cube with three axes: Distribution (centralized, client/server, distributed), Autonomy (DBMS, Federated DBMS, Multi-DBMS), and Heterogeneity (homogeneous, heterogeneous), yielding the following alternatives:]

• Distributed Homogeneous DBMS / Federated DBMS / Multi-DBMS
• Distributed Heterogeneous DBMS / Federated DBMS / Multi-DBMS
• Centralized Homogeneous DBMS / Federated DBMS / Multi-DBMS
• Centralized Heterogeneous DBMS / Federated DBMS / Multi-DBMS
7
Common DBMS Architectural Configurations

No Autonomy (fully integrated nodes N1–N4):
• Complete cooperation on data schema and transaction mgmt
• Each node is fully aware of all other nodes in the DDBS

Federated (independent DBMSs D1–D4):
• Implement some “cooperation functions”: transaction mgmt, schema mapping
• Each DBS is aware of the other DBSs in the federation

Multi (fully independent DBMSs D1–D4 under a GTM & SM):
• An independent global manager tries to coordinate the DBMSs using only existing DBMS services
• The DBMSs are unaware of the GTM and of the other DBSs in the MultiDBS

8
Parallel Database System Platforms
(Non-autonomous Distributed DBMS)

Shared Everything:
[Figure: processors Proc1 … ProcN and memories Mem1 … MemM connected by a fast interconnection network to disks Disk 1 … Disk p.]
Typically a special-purpose hardware configuration, but this can be emulated in software.

Shared Nothing:
[Figure: nodes Proc1 … ProcN, each with its own memory Mem1 … MemN and its own disk Disk 1 … Disk N, connected by a fast interconnection network.]
Can be a special-purpose hardware configuration, or a network of workstations.

9
Distributed DBMS
• Advantages:
– Improved performance
– Efficiency
– Extensibility (addition of new nodes)
– Transparency of distribution
• Storage of data
• Query execution
– Autonomy of individual nodes
• Problems:
– Complexity of design and implementation
– Data consistency
– Safety
– Failure recovery

10
Client/Server Database Systems

The “Simple” Case of Distributed Database Systems?

11
Client/Server Environments
data (object) server + n smart clients (workstations)

• objects stored and administered on server


• objects processed (accessed and modified) on
workstations [sometimes on the server too]
• CPU-time intensive applications on workstations
– GUIs, design tools
• Use client system local storage capabilities
• combination with distributed DBS services
=> distributed server architecture

12
Clients with Centralized Server Architecture

[Figure: workstations connected over a local communication network to a single Data Server, which provides the interface and the database functions and manages the database on Disk 1 … Disk n.]

13
Clients with Distributed Server Architecture

[Figure: workstations connected over a local communication network to Data Server 1 … Data Server m; each data server provides an interface, a distributed DBMS layer, and local data mgmt. functions, and manages its own database on its own disks.]

14
Clients with Data Server Approach
(“3-tier client/server”)

[Figure: clients run the user interface; the Application Server performs query parsing and provides the data server interface; a communication channel connects it to the Data Server, which provides the application server interface and the database functions and manages the database on Disk 1 … Disk n.]
15
In the Client/Server DBMS Architecture,
how are the DB services organized?

There are several architectural options!

16
Client/Server Architectures

Relational:
[Figure: the client process (Application plus cursor management) sends SQL to the server process (DBMS).]

Object-Oriented:
[Figure: the client process (Application plus object/page cache management) exchanges objects, pages, or files with the server process (DBMS).]

17
Object Server Architecture

[Figure: the client process holds the Application, an Object Manager, and an Object Cache; the server process holds a Log/Lock Manager, an Object Manager with its Object Cache, a File/Index Manager, a Page Cache Manager with its Page Cache, and Storage Allocation and I/O over the Database. Client and server exchange objects, object references, queries, method calls, locks, and log records.]

18
Object Server Architecture - summary
• Unit of transfer: object(s)
• Server understands the object concept and can execute methods
• most DBMS functionality is replicated on client(s) and server(s)
• Advantages:
- server and client can run methods, workload can be balanced
- simplifies concurrency control design (centralized in server)
- implementation of object-level locking
- low cost for enforcing “constraints” on objects
• Disadvantages/problems:
- remote procedure calls (RPC) for object references
- complex server design
- client cache consistency problems
- page-level locking, objects get copied multiple times, large
objects

19
Page Server Architecture

[Figure: the client process holds the Application, an Object Manager with its Object Cache, a File/Index Manager, and a Page Cache Manager with its Page Cache; the server process holds a Log/Lock Manager, a Page Cache Manager with its Page Cache, and Storage Allocation and I/O over the database. Client and server exchange pages, page references, locks, and log records.]

20
Page Server Architecture - summary
• unit of transfer: page(s)
• server deals only with pages (understands no object semantics)
• server functionality: storage/retrieval of page(s), concurrency
control, recovery
• advantages:
- most DBMS functionality at client
- page transfer -> server overhead minimized
- more clients can be supported
- object clustering improves performance considerably
• disadvantages/problems:
- method execution only on clients
- object-level locking is difficult
- object clustering is difficult
- large client buffer pool required
21
File Server Architecture

[Figure: the client process holds the Application, an Object Manager with its Object Cache, a File/Index Manager, and a Page Cache Manager with its Page Cache; the server process holds a Log/Lock Manager and a Space Allocation Manager. The client sends locks and log records to the server, but reads and writes pages and page references directly against the database via NFS.]

22
File Server Architecture - summary
• unit of transfer: page(s)
• simplification of page server, clients use remote file system (e.g.,
NFS) to read and write DB page(s) directly
• server functionality: I/O handling, concurrency control, recovery
• advantages:
- same as for page server
- NFS: user-level context switches can be avoided
- NFS widely-used -> stable SW, will be improved
• disadvantages/problems:
- same as for page server
- NFS writes are slow
- read operations bypass the server -> no request combination
- object clustering is difficult
- coordination of disk space allocation
23
Cache Consistency in Client/Server Architectures

Avoidance-Based Algorithms:
• Synchronous – Client sends 1 msg per lock to the server; client waits; server replies with ACK or NACK.
• Asynchronous – Client sends 1 msg per lock to the server; client continues; server invalidates cached copies at other clients.
• Deferred – Client sends all write lock requests to the server at commit time; client waits; server replies when all cached copies are freed.

Detection-Based Algorithms:
• Synchronous – Client sends an object status query to the server for each access; client waits; server replies.
• Asynchronous – Client sends 1 msg per lock to the server; client continues; after commit, the server sends updates to all cached copies.
• Deferred – Client sends all write lock requests to the server at commit time; client waits; server replies based on W-W conflicts only.

Best Performing Algorithms: (highlighted in the original figure)


24
Comparison of the 3 Client/Server Architectures

Page & File Server:
• Simple server design
• Complex client design
• Fine-grained concurrency control difficult
• Very sensitive to client buffer pool size and clustering

Object Server:
• Complex server design
• “Relatively” simple client design
• Fine-grained concurrency control
• Reduces data movement, relatively insensitive to clustering
• Sensitive to client buffer pool size

Conclusions:
• No clear winner
• Depends on object size and application’s object access pattern
• File server ruled out by poor NFS performance
25
Problems in Distributed DBMS Services
Distributed Database Design
Distributed Directory/Catalogue Mgmt
Distributed Query Processing and Optimization
Distributed Transaction Mgmt
– Distributed Concurrency Control
– Distributed Deadlock Mgmt
– Distributed Recovery Mgmt

[Figure: influence graph among these services: distributed DB design, directory management, query processing, transaction management, concurrency control (lock), deadlock management, and reliability (log); the arrows indicate which services influence which.]

26
Distributed Storage in Relational DBMSs
• horizontal fragmentation: distribution of “rows”, selection
• vertical fragmentation: distribution of “columns”, projection
• hybrid fragmentation: ”projected columns” from ”selected rows”
• allocation: which fragment is assigned to which node?
• replication: multiple copies at different nodes, how many copies?

• Design factors:
– Most frequent query access patterns
– Available distributed query processing algorithms
• Evaluation Criteria
– Cost metrics for: network traffic, query processing, transaction mgmt
– A system-wide goal: Maximize throughput or minimize latency
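The three fragmentation forms defined above can be illustrated with a minimal Python sketch; the table, its rows, and the split predicate are invented for illustration only:

```python
# Hypothetical relation represented as a list of dicts; all names are illustrative.
employees = [
    {"id": 1, "name": "Ada",  "dept": "Oslo",   "salary": 700},
    {"id": 2, "name": "Bob",  "dept": "Bergen", "salary": 550},
    {"id": 3, "name": "Cleo", "dept": "Oslo",   "salary": 620},
]

# Horizontal fragmentation: distribute "rows" by a selection predicate.
frag_oslo   = [r for r in employees if r["dept"] == "Oslo"]    # -> node 1
frag_bergen = [r for r in employees if r["dept"] != "Oslo"]    # -> node 2

# Vertical fragmentation: distribute "columns" by projection;
# the key (id) is kept in every fragment so rows can be re-joined.
frag_personal = [{"id": r["id"], "name": r["name"]}     for r in employees]
frag_payroll  = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Hybrid fragmentation: projected columns from selected rows.
frag_hybrid = [{"id": r["id"], "salary": r["salary"]}
               for r in employees if r["dept"] == "Oslo"]

# Allocation and replication would then map each fragment to one or more nodes, e.g.:
allocation = {"frag_oslo": ["node1"], "frag_bergen": ["node2"],
              "frag_payroll": ["node1", "node2"]}   # replicated copy
```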

27
Distributed Storage in OODBMSs
• Must fragment, allocate, and replicate object data among nodes
• Complicating Factors:
– Encapsulation hides object structure
– Object methods (bound to class not instance)
– Users dynamically create new classes
– Complex objects
– Effects on garbage collection algorithm and performance
• Approaches:
– Store objects in relations and use RDBMS strategies to distribute data
• All objects are stored in binary relations [OID, attr-value]
• One relation per class attribute
• Use nested relations to store complex (multi-class) objects
– Use object semantics
• Evaluation Criteria:
– Affinity metric (maximize)
– Cost metric (minimize)
28
Horizontal Partitioning using Object Semantics

• Divide instances of a class into multiple groups
  – Based on selected attribute values
    (e.g., Class C: Attr A < 100 -> Node1, Attr A >= 100 -> Node2)
  – Based on subclass designation
    (e.g., Class C: Subcl X -> Node1, Subcl Y -> Node2)
• Fragment an object’s complex attributes from its simple attributes
• Fragment an object’s attributes based on the typical method invocation sequence
  – Keep sequentially referenced attributes together

29
Vertical Partitioning using Object Semantics

• Fragment object attributes based on the class hierarchy

[Figure: instances of Class X, of Subclass subX, and of Subclass subsubX; the attributes introduced at each level of the hierarchy form Fragment #1, Fragment #2, and Fragment #3.]

Breaks object encapsulation?

30
Path Partitioning using Object Semantics

• Use the object composition graph for complex objects
• A path partition is the set of objects corresponding to instance variables in the subtree rooted at the composite object
  – Typically represented in a “structure index”:
    Composite Object OID -> {SubC#1-OID, SubC#2-OID, … SubC#n-OID}
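A minimal Python sketch of such a structure index, assuming an invented composition map and OID strings:

```python
# Object composition graph: OID -> OIDs of directly referenced sub-objects.
composition = {
    "car#1":    {"engine#7", "body#3"},
    "engine#7": {"piston#9"},
    "body#3":   set(),
    "piston#9": set(),
}

def path_partition(root_oid):
    """All OIDs in the composition subtree rooted at the composite object."""
    members, stack = set(), list(composition.get(root_oid, ()))
    while stack:
        oid = stack.pop()
        if oid not in members:
            members.add(oid)
            stack.extend(composition.get(oid, ()))
    return members

# Structure index entry for the composite object:
structure_index = {"car#1": path_partition("car#1")}
# -> {"car#1": {"engine#7", "body#3", "piston#9"}}
```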

31
More Issues in OODBMS Fragmentation

Local Methods, Local Object Data:
• Good performance
• Issue: Replicate methods so they are local to all instances?

Local Methods, Remote Object Data:
• Fetch the remote data and execute the methods
• Issue: Time/cost for data transfer?

Remote Methods, Local Object Data:
• Send data to the remote site, execute, and return the result; OR fetch the method and execute it locally
• Issues: Time/cost for two transfers? Ability to execute locally?

Remote Methods, Remote Object Data:
• Send additional input values via RPC, execute and return the result
• Issues: RPC time? Execution load on the remote node?

• Replication Options
  – Objects
  – Classes (collections of objects)
  – Methods
  – Class/Type specifications
32
Distributed Directory and Catalogue Management

• Directory Information:
  – Description and location of records/objects
    • Size, special data properties (e.g., executable, DB type, user-defined type, etc.)
    • Fragmentation scheme
  – Definitions for views, integrity constraints
• Options for organizing the directory:
  – Centralized          Issues: bottleneck, unreliable
  – Fully replicated     Issues: consistency, storage overhead
  – Partitioned          Issues: complicated access protocol, consistency
  – Combination of partitioned and replicated (e.g., zoned, replicated zoned)

33
Distributed Query Processing and Optimization
• Construction and execution of query plans, query optimization
• Goals: maximize parallelism (response time optimization);
  minimize network data transfer (throughput optimization)
• Basic approaches to distributed query processing (see the sketch below):
  – Pipelining – functional decomposition
    [Figure: Node 1 evaluates Select A.x > 100 over Rel A and streams its output to Node 2, which joins it with Rel B on y; the processing timelines of Node 1 and Node 2 overlap.]
  – Parallelism – data decomposition
    [Figure: Node 1 evaluates Select A.x > 100 over Frag A.1 while Node 2 evaluates the same selection over Frag A.2; Node 3 unions the partial results; the processing timelines of Node 1 and Node 2 overlap, followed by Node 3.]
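A minimal Python sketch of the parallelism (data decomposition) approach; the fragments and the predicate are invented, and threads stand in for nodes:

```python
from concurrent.futures import ThreadPoolExecutor

frag_a1 = [{"x": 50, "y": 1}, {"x": 150, "y": 2}]   # fragment A.1 at node 1
frag_a2 = [{"x": 120, "y": 3}, {"x": 90, "y": 4}]   # fragment A.2 at node 2

def select_x_gt_100(fragment):
    """Local operator executed at the node holding the fragment."""
    return [row for row in fragment if row["x"] > 100]

with ThreadPoolExecutor(max_workers=2) as pool:      # two "nodes" working in parallel
    partials = list(pool.map(select_x_gt_100, [frag_a1, frag_a2]))

result = [row for part in partials for row in part]  # the union step at node 3
print(result)   # [{'x': 150, 'y': 2}, {'x': 120, 'y': 3}]
```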

34
Creating the Distributed Query Processing Plan

• Factors to be considered:
- distribution of data
- communication costs
- lack of sufficient locally available information

• 4 processing steps:
  (1) query decomposition
  (2) data localization
  (3) global optimization
      steps (1)–(3) run at the control site (uses global information)
  (4) local optimization
      step (4) runs at the local sites (use local information)

35
Generic Layered Scheme for Planning a Dist. Query

At the control site:
  calculus query on distributed objects
    -> query decomposition (uses the Global schema)
  algebraic query on distributed objects
    -> data localization (uses the Fragment schema)
  fragment query
    -> global optimization (uses Statistics on fragments)
  optimized fragment query with communication operations

At the local sites:
    -> local optimization (uses the Local schema)
  optimized local queries

This approach was initially designed for distributed relational DBMSs. It also applies to distributed OODBMSs, Multi-DBMSs, and distributed Multi-DBMSs.


36
Distributed Query Processing Plans in OODBMSs
• Two forms of queries (both reduce to logical algebraic expressions):
  – Explicit queries written in OQL
  – Object navigation/traversal
• Planning and Optimizing
  – Additional rewriting rules for sets of objects
• Union, Intersection, and Selection
– Cost function
• Should consider object size, structure, location, indexes, etc.
• Breaks object encapsulation to obtain this info?
• Objects can ”reflect” their access cost estimate
– Volatile object databases can invalidate optimizations
• Dynamic Plan Selection (compile N plans; select 1 at runtime)
• Periodic replan during long-running queries
37
Distributed Query Optimization
• information needed for optimization (fragment statistics):
- size of objects, image sizes of attributes
- transfer costs
- workload among nodes
- physical data layout
- access path, indexes, clustering information
- properties of the result (objects) - formulas for estimating the
cardinalities of operation results
• execution cost is expressed as a weighted combination of I/O, CPU, and
  communication costs (communication is usually dominant):

  total-cost = C_CPU * #insts + C_I/O * #I/Os + C_MSG * #msgs + C_TR * #bytes

Optimization Goals: response time of single transaction or system throughput
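As a rough illustration of the weighted cost formula above, here is a minimal Python sketch; the coefficient values and the plan figures are invented placeholders:

```python
def total_cost(n_insts, n_ios, n_msgs, n_bytes,
               c_cpu=1e-6, c_io=1e-3, c_msg=5e-3, c_tr=1e-6):
    """total-cost = C_CPU*#insts + C_I/O*#I/Os + C_MSG*#msgs + C_TR*#bytes"""
    return c_cpu * n_insts + c_io * n_ios + c_msg * n_msgs + c_tr * n_bytes

# Compare two candidate strategies for the same fragment query:
ship_whole_fragment = total_cost(n_insts=10_000, n_ios=50, n_msgs=2, n_bytes=1_000_000)
reduce_then_ship    = total_cost(n_insts=30_000, n_ios=80, n_msgs=6, n_bytes=50_000)
best = min(ship_whole_fragment, reduce_then_ship)    # pick the cheaper plan
```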

38
Distributed Transaction Management
• Transaction Management (TM) in centralized DBS
  – Achieves transaction ACID properties by using:
    • concurrency control (CC)
    • recovery (logging)
• TM in DDBS
  – Achieves transaction ACID properties by using:

[Figure: the Transaction Manager builds on the Log Manager and the Buffer Manager and combines four protocols with strong algorithmic dependencies: Concurrency Control (Isolation), Recovery Protocol (Durability), Commit/Abort Protocol (Atomicity), and Replica Control Protocol (Mutual Consistency).]
39
Classification of Concurrency Control Approaches          Isolation

CC Approaches:
• Pessimistic
  – Locking: Centralized, Primary Copy, Distributed
  – Timestamp Ordering: Basic, Multi-Version, Conservative
  – Hybrid
• Optimistic
  – Locking
  – Timestamp Ordering

Phases of pessimistic transaction execution: validate, read, compute, write
Phases of optimistic transaction execution: read, compute, validate, write

40
Two-Phase-Locking Approach (2PL)                          Isolation

2PL Lock Graph:
[Figure: number of locks held over the transaction duration: locks are obtained (growing phase) from BEGIN up to the LOCK POINT, then released (shrinking phase) until END.]

Strict 2PL Lock Graph:
[Figure: number of locks held over the transaction duration: locks are obtained during the period of data item use and all locks are released only at END.]
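A minimal Python sketch of strict 2PL bookkeeping at a single lock manager; blocking, queueing, and deadlock handling are omitted, and all names are invented:

```python
class StrictTwoPhaseLocking:
    def __init__(self):
        self.lock_table = {}   # item -> (mode, set of holder transaction ids)

    def acquire(self, txn, item, mode):
        """mode is 'S' (shared) or 'X' (exclusive); returns False on a conflict
        (a real lock manager would block the requester instead)."""
        held = self.lock_table.get(item)
        if held is None:
            self.lock_table[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:                     # re-request or upgrade by the sole owner
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.lock_table[item] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":     # shared locks are compatible
            holders.add(txn)
            return True
        return False

    def release_all(self, txn):
        """Strictness: called exactly once, at commit or abort time."""
        for item in list(self.lock_table):
            _, holders = self.lock_table[item]
            holders.discard(txn)
            if not holders:
                del self.lock_table[item]

lm = StrictTwoPhaseLocking()
assert lm.acquire("T1", "a", "X")
assert not lm.acquire("T2", "a", "S")   # conflict: T2 would have to wait
lm.release_all("T1")                    # all of T1's locks released at commit
```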
41
Communication Structure of Centralized 2PL                Isolation

[Figure: message flow among the Coordinating TM, the Central Site LM, and the Data Processors at the participating sites:
  1. Lock Request       (coordinating TM -> central site LM)
  2. Lock Granted       (central site LM -> coordinating TM)
  3. Operation          (coordinating TM -> data processors)
  4. End of Operation   (data processors -> coordinating TM)
  5. Release Locks      (coordinating TM -> central site LM)]

42
Communication Structure of Distributed 2PL                Isolation

[Figure: message flow among the Coordinating TM, the Participating LMs, and the Data Processors at the participating sites:
  1. Lock Request       (coordinating TM -> participating LMs)
  2. Operation          (participating LMs -> data processors)
  3. End of Operation   (data processors -> coordinating TM)
  4. Release Locks      (coordinating TM -> participating LMs)]

43
Distributed Deadlock Management                           Isolation

Example:
[Figure: distributed waits-for graph.
  Site X: T1x holds lock Lc, T2x holds lock Lb; T2x waits for T1x to release Lc.
  Site Y: T1y holds lock La, T2y holds lock Ld; T1y waits for T2y to release Ld.
  Across sites: T1x needs a and waits for T1 on site Y to complete; T2y needs b and waits for T2 on site X to complete.
  Together these waits form a cycle, i.e., a distributed deadlock.]

Distributed Waits-For Graph:
- requires many messages to update lock status
- combine the “wait-for” tracing message with lock status update messages
- still an expensive algorithm

44
Communication Structure of Centralized 2P Commit Protocol          Atomicity

[Figure: message flow between the Coordinating TM and the Participating Sites:
  1. Prepare (coordinator -> participants)
     Each participant makes its local commit/abort decision and writes a log entry.
  2. Vote-Commit or Vote-Abort (participants -> coordinator)
     The coordinator counts the votes:
       If any vote is missing or any Vote-Abort was received, then Global-Abort;
       else Global-Commit.
  3. Global-Abort or Global-Commit (coordinator -> participants)
     Each participant writes a log entry.
  4. ACK (participants -> coordinator)]

Who participates? Depends on the CC alg.
Other communication structures are possible:
  – Linear
  – Distributed
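A minimal Python sketch of the coordinator's decision rule in step 2; messaging, logging, and timers are omitted, and the participant names are invented:

```python
def decide(votes):
    """votes maps each participant to 'commit', 'abort', or None (no reply).
    A missing vote or any Vote-Abort forces Global-Abort."""
    if any(v is None or v == "abort" for v in votes.values()):
        return "GLOBAL-ABORT"
    return "GLOBAL-COMMIT"

# Phase 1: the coordinator sent Prepare and collected these (invented) votes.
votes = {"site_A": "commit", "site_B": "commit", "site_C": None}   # site_C timed out
# Phase 2: write the decision to the log, then broadcast it to all participants.
print(decide(votes))   # GLOBAL-ABORT
```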
45
State Transitions for 2P Commit Protocol                  Atomicity

[Figure: state machines for the coordinating TM and the participating sites.
  Coordinator: Initial -> (1: send Prepare) -> Wait -> (2: receive Vote-Commit / Vote-Abort; 3: send Global-Commit / Global-Abort) -> Commit or Abort.
  Participant: Initial -> (2: send Vote-Commit / Vote-Abort) -> Ready -> (3: receive Global-Commit / Global-Abort; 4: send ACK) -> Commit or Abort.]

Advantages:
- Preserves atomicity
- All processes are “synchronous within one state transition”

Disadvantages:
- Many, many messages
- If a failure occurs, the 2PC protocol blocks!

Attempted solutions for the blocking problem:
1) Termination Protocol
2) 3-Phase Commit
3) Quorum 3P Commit

46
Failures in a Distributed System
Types of Failure:
– Transaction failure
– Node failure
– Media failure
– Network failure
  • Partitions, each containing 1 or more sites

Issues to be addressed (and the protocols that address them):
– How to continue service
  -> Termination Protocols
– How to maintain ACID properties while providing continued service
  -> Modified Concurrency Control & Commit/Abort Protocols
– How to ensure ACID properties after recovery from the failure(s)
  -> Recovery Protocols, Termination Protocols, & Replica Control Protocols

47
Termination Protocols                                     Atomicity, Consistency

Use timeouts to detect potential failures that could block protocol progress.

[Figure: the same coordinator/participant state machines as for the 2P commit protocol.]

Timeout states:
  Coordinator: wait, commit, abort
  Participant: initial, ready

Coordinator Termination Protocol:
  Wait – Send global-abort
  Commit or Abort – BLOCKED!

Participant Termination Protocol:
  Ready – Query the coordinator;
          If timeout, then query the other participants;
          If global-abort or global-commit is learned, then proceed and terminate,
          else BLOCKED!
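A minimal Python sketch of the participant termination rule above; the two query helpers are placeholders for the real messages to the coordinator and to the other participants:

```python
def participant_terminate(ask_coordinator, ask_other_participants):
    """Called by a participant in the READY state after a timeout."""
    decision = ask_coordinator()                  # may return None on timeout
    if decision is None:
        # Any other participant may already know the global outcome.
        for answer in ask_other_participants():
            if answer in ("global-commit", "global-abort"):
                decision = answer
                break
    return decision if decision else "BLOCKED"

# Example: the coordinator is unreachable, but one peer already saw global-abort.
result = participant_terminate(lambda: None,
                               lambda: ["unknown", "global-abort"])
print(result)   # global-abort
```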
48
Replica Control Protocols                                 Consistency

• Update propagation of committed write operations

[Figure: transaction timeline (number of locks held from BEGIN to END): obtain locks, period of data use, 2-phase commit, release locks. A STRICT UPDATE propagates the writes before the COMMIT POINT, as part of the 2-phase commit; a LAZY UPDATE propagates them after the commit point.]

49
Strict Replica Control Protocol Consistency

• Read-One-Write-All (ROWA)
• Part of the Concurrency Control Protocol
and the 2-Phase Commit Protocol
– CC locks all copies
– 2PC propagates the updated values with 2PC
messages (or an update propagation phase is
inserted between the wait and commit states
for those nodes holding an updateable value).

50
Lazy Replica Control Protocol Consistency

• Propagates updates from a primary node.


• Concurrency Control algorithm locks the
primary copy node (same node as the
primary lock node).
• To preserve single copy semantics, must ensure
that a transaction reads a current copy.
– Changes the CC algorithm for read-locks
– Adds an extra communication cost for reading data
• Extended transaction models may not require
single copy semantics.

51
Recovery in Distributed Systems Atomicity, Durability

Select COMMIT or ABORT (or blocked) for each interrupted subtransaction


Commit Approaches:
Redo – use the undo/redo log to perform all the write operations again
Retry – use the transaction log to redo the entire subtransaction (R + W)
Abort Approaches:
Undo – use the undo/redo log to backout all the writes that were actually
performed
Compensation – use the transaction log to select and execute ”reverse”
subtransactions that semantically undo the write operations.
Implementation requires knowledge of:
– Buffer manager algorithms for writing updated data
from volatile storage buffers to persistent storage
– Concurrency Control Algorithm
– Commit/Abort Protocols
– Replica Control Protocol
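A minimal Python sketch of the Redo and Undo commit/abort approaches against an undo/redo log; the log record layout and the values are invented:

```python
log = [   # undo/redo log records for subtransaction T1 at this node
    {"txn": "T1", "item": "x", "before": 10, "after": 42},
    {"txn": "T1", "item": "y", "before": 7,  "after": 9},
]
database = {"x": 10, "y": 9}   # partially written state found after the failure

def redo(txn):
    """COMMIT decision: perform all of the subtransaction's writes again."""
    for rec in log:
        if rec["txn"] == txn:
            database[rec["item"]] = rec["after"]

def undo(txn):
    """ABORT decision: back out the writes that were performed, in reverse order."""
    for rec in reversed(log):
        if rec["txn"] == txn:
            database[rec["item"]] = rec["before"]

redo("T1")    # database -> {'x': 42, 'y': 9}
```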
52
Network Partitions in Distributed Systems

[Figure: nodes N1–N8 connected by a network that has split into Partition #1, Partition #2, and Partition #3.]

Issues:
– Termination of interrupted transactions
– Partition integration upon recovery from a network failure
– Data availability while the failure is ongoing

53
Data Availability in Partitioned Networks

• The Concurrency Control model impacts data availability.
• ROWA – data replicated in multiple partitions is not available for reading or writing.
• Primary Copy Node CC – can execute transactions if the primary copy nodes for all of the read-set and all of the write-set are in the client’s partition.

Availability is still very limited . . . We need a new idea!

54
Quorums ACID

• Quorum – a special type of majority


• Use quorums in the Concurrency Control,
Commit/Abort, Termination, and Recovery Protocols
– CC uses read-quorum & write-quorum
– C/A, Term, & Recov use commit-quorum & abort-quorum
• Advantages:
– More transactions can be executed during site failure and
network failure (and still retain ACID properties)
• Disadvantages:
– Many messages are required to establish a quorum
– Necessity for a read-quorum slows down read operations
– Not quite sufficient (failures are not “clean”)

55
Read-Quorums and Write-Quorums                            Isolation

• The Concurrency Control Algorithm serializes valid transactions in a partition. It must obtain
  – A read-quorum for each read operation
  – A write-quorum for each write operation
• Let N = total number of nodes in the system
• Define the size of the read-quorum Nr and the size of the write-quorum Nw as follows:
  – Nr + Nw > N
  – Nw > N/2
  Simple example: N = 8, Nr = 4, Nw = 5
• When failures occur, it is possible to have a valid read quorum and no valid write quorum
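A minimal Python sketch of these quorum-size conditions, and of checking whether the nodes reachable in a partition form a read- or write-quorum:

```python
def valid_quorum_sizes(n, n_r, n_w):
    """Nr + Nw > N prevents concurrent reads and writes in disjoint partitions;
    Nw > N/2 prevents two disjoint write-quorums."""
    return (n_r + n_w > n) and (n_w > n / 2)

def has_quorum(reachable_nodes, quorum_size):
    return len(reachable_nodes) >= quorum_size

N, Nr, Nw = 8, 4, 5                      # the simple example above
assert valid_quorum_sizes(N, Nr, Nw)

partition = {"n1", "n2", "n3", "n4"}     # 4 of the 8 nodes reachable after a failure
print(has_quorum(partition, Nr))         # True:  a valid read-quorum exists
print(has_quorum(partition, Nw))         # False: no valid write-quorum
```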
56
Commit-Quorums and Abort-Quorums                          ACID

• The Commit/Abort Protocol requires votes from all participants to commit or abort a transaction.
  – Commit a transaction if the accessible nodes can form a commit-quorum
  – Abort a transaction if the accessible nodes can form an abort-quorum
  – Elect a new coordinator node (if necessary)
  – Try to form a commit-quorum before attempting to form an abort-quorum
• Let N = total number of nodes in the system
• Define the size of the commit-quorum Nc and the size of the abort-quorum Na as follows:
  – Na + Nc > N;  0 <= Na, Nc <= N
  Simple examples: N = 7, Nc = 4, Na = 4;  or N = 7, Nc = 5, Na = 3
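A minimal Python sketch of the quorum-based termination rule above (try to form a commit-quorum before an abort-quorum); the vote counts are invented:

```python
def terminate(commit_ready_nodes, abort_ready_nodes, n_c, n_a):
    """commit_ready_nodes / abort_ready_nodes: how many accessible nodes are
    prepared to commit or to abort; a node counts toward only one quorum."""
    if commit_ready_nodes >= n_c:     # commit-quorum reached
        return "GLOBAL-COMMIT"
    if abort_ready_nodes >= n_a:      # abort-quorum reached
        return "GLOBAL-ABORT"
    return "BLOCKED"                  # wait until more nodes become accessible

N, Nc, Na = 7, 4, 4                   # one of the simple examples above
print(terminate(commit_ready_nodes=3, abort_ready_nodes=4, n_c=Nc, n_a=Na))
# GLOBAL-ABORT
```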

57
Conclusions
• Nearly all commercial relational database systems offer some
  form of distribution
– Client/server at a minimum
• Only a few commercial object-oriented database systems
support distribution beyond N clients and 1 server

• Future research directions:


– Distributed data architecture and placement schemes (app-influenced)
– Distributed databases as part of Internet applications
– Continued service during disconnection or failure
• Mobile systems accessing databases
• Relaxed transaction models for semantically-correct continued operation

58
