Unit 3
Syllabus: SQL: Data definition in SQL, update statements and views in SQL: Data storage and definitions,
Data retrieval queries and update statements, Query Processing & Query Optimization: Overview,
measures of query cost, selection operation, sorting, join, evaluation of expressions, transformation of
relational expressions, estimating statistics of expression results, evaluation plans. Case Study of ORACLE
and DB2.
Course Objective: Students can use SQL operations to manipulate the database and learn how to design
and create a good database using functional dependencies and normalization.
Course Outcome: Analyze and transform an information model into a relational database schema, and
use DDL, DML and DCL utilities to implement the schema using a DBMS.
_______________________________________________________________________________________
SQL Command
DDL: Data Definition Language
All DDL commands are auto-committed, that is, they save all changes permanently in the database. (This holds for systems such as Oracle; some other systems allow DDL inside a transaction.)
Command     Description
Create      to create a new table or database
Alter       for alteration
Truncate    to delete data from a table
Drop        to drop a table
Rename      to rename a table
Table 3.1: DDL Commands
DML: Data Manipulation Language
DML commands are not auto-committed. This means the changes are not permanent in the database until
they are committed, and they can be rolled back.
Command     Description
Insert      to insert a new row
Update      to update an existing row
Delete      to delete a row
Merge       to merge two rows or two tables
Table 3.2: DML Commands
TCL: Transaction Control Language
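These commands control the changes made by other commands and their effect on the database. They
can undo changes made by other commands by rolling the database back to its original state, and they
can also make changes permanent.
Command     Description
Commit      to permanently save a transaction
Rollback    to restore the database to the last committed state
Savepoint   to temporarily save a transaction so that it can be rolled back to that point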
Creating a Table
The create command is also used to create a table. We can specify the names and datatypes of the
various columns along with it.
Syntax:
create table table-name
(
column-name1 datatype1,
column-name2 datatype2,
column-name3 datatype3,
column-name4 datatype4
);
create table command will tell the database system to create a new table with given table name and
column information.
Example: create table Student (id int, name varchar(60), age int);
alter command: Used for alteration of table structures. There are various uses of alter command, such as,
to add a column to the existing table
to rename any existing column
to change the datatype of any column or to modify its size
to drop a column
Using alter command we can add a column to an existing table.
Syntax: alter table table-name add (column-name datatype);
Example: alter table Student add (address char);
To Rename a column: Using the alter command, the user can rename an existing column.
Syntax: alter table table-name rename column old-column-name to new-column-name;
Example: alter table Student rename column address to Location;
truncate command
The truncate command removes all records from a table, but it does not destroy the table's
structure. When truncate is applied to a table, its auto-generated primary key counter (if any) is
re-initialized in systems such as MySQL.
Syntax: truncate table table-name;
Example: truncate table Student;
drop command
The drop command completely removes a table from the database. This command also destroys the table
structure.
Syntax: drop table table-name;
Example: drop table Student;
rename command
The rename command is used to rename a table.
Syntax: rename table old-table-name to new-table-name;
Example: rename table Student to Student_record;
DML Commands
1) INSERT command
Insert command is used to insert data into a table.
Syntax: INSERT into table-name values (data1, data2, ...);
Example:
Consider a table Student with following fields.
S_id S_Name age
2) UPDATE command
Update command is used to update a row of a table.
Syntax: UPDATE table-name set column-name = value where condition;
Example:
update Student set age=18 where s_id=102;
S_id    S_Name    age
101     Adam      15
102     Alex      18
103     chris     14
Example:
UPDATE Student set s_name='Abhi', age=17 where s_id=103;
The above command will update two columns of a record.
3) Delete command
Delete command is used to delete data from a table. Delete command can also be used with the condition
to delete a particular row.
Syntax: DELETE from table-name;
Example: DELETE from Student;
The above command will delete all the records from Student table.
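A DELETE with a WHERE condition removes only the matching rows. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's standard sqlite3 module (these notes target Oracle and DB2, so the engine is an assumption) and the sample Student rows used in the UPDATE example.

```python
import sqlite3

# In-memory SQLite database (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "chris", 14)],
)

# Delete one particular row instead of every row in the table.
conn.execute("DELETE FROM Student WHERE s_id = 103")

remaining = conn.execute(
    "SELECT s_id, s_name, age FROM Student ORDER BY s_id"
).fetchall()
print(remaining)  # [(101, 'Adam', 15), (102, 'Alex', 18)]
```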
TCL command
Transaction Control Language (TCL) commands are used to manage transactions in the database. These are
used to manage the changes made by DML statements. It also allows statements to be grouped together
into logical transactions.
Commit command
Commit command is used to permanently save any transaction into the database.
Syntax: commit;
Rollback command
This command restores the database to last committed state. It is also used with savepoint command to
jump to a save point in a transaction.
Syntax: rollback to savepoint-name;
Savepoint command
savepoint command is used to temporarily save a transaction so that you can rollback to that point
whenever necessary.
Syntax: savepoint savepoint-name;
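The interplay of commit, rollback and savepoint can be tried out with a short script. This is only a sketch, assuming SQLite through Python's sqlite3 module with isolation_level=None so that every transaction statement is issued explicitly; the savepoint name after_adam is hypothetical, and other systems may spell these statements slightly differently.

```python
import sqlite3

# isolation_level=None: Python opens no implicit transactions,
# so BEGIN / SAVEPOINT / ROLLBACK / COMMIT are issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO Student VALUES (101, 'Adam', 15)")
cur.execute("SAVEPOINT after_adam")          # hypothetical savepoint name
cur.execute("INSERT INTO Student VALUES (102, 'Alex', 18)")
cur.execute("ROLLBACK TO after_adam")        # undoes only the second insert
cur.execute("COMMIT")                        # makes the first insert permanent

cur.execute("BEGIN")
cur.execute("DELETE FROM Student")
cur.execute("ROLLBACK")                      # back to the last committed state

names = [row[0] for row in cur.execute("SELECT s_name FROM Student")]
print(names)  # ['Adam']
```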
DCL command
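Data Control Language (DCL) commands are used to control privileges in the database. Privileges are of
two types:
System: creating a session, creating a table, etc. are all types of system privilege.
Object: any command or query that works on tables comes under object privilege.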
DCL defines two commands,
Grant: Gives user access privileges to the database.
Revoke: Take back permissions from the user.
Example: grant create session to username;
SELECT Query
The Select query is used to retrieve data from a table. It is the most used SQL query. We can retrieve
complete tables, or part of a table by specifying conditions using the WHERE clause.
Syntax:
SELECT column-name1, column-name2, column-name3, column-nameN from table-name;
Example: SELECT s_id, s_name, age from Student;
To retrieve only the rows that satisfy a condition, a WHERE clause is added.
Syntax:
SELECT column-name1, column-name2, column-name3, column-nameN
from table-name WHERE [condition];
Example: Consider a table Student with the fields S_id, s_Name, Age and Address.
SELECT s_id, s_name, age, address from Student WHERE s_id=101;
Like clause
Like clause is used as a condition in SQL query. Like clause compares data with an expression using
wildcard operators. It is used to find similar data from the table.
Wildcard operators
There are two wildcard operators that are used in like clause.
Percent sign % represents zero, one or more than one character.
Underscore sign _ represents only one character.
Example: SELECT * from Student where s_name like 'A%';
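The two wildcards can be compared side by side. The sketch below assumes SQLite through Python's sqlite3 module and three sample Student rows; note that SQLite's LIKE is case-insensitive for ASCII letters by default, which may differ from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "chris", 14)],
)

# % matches any run of characters (including none) after the leading A.
starts_with_a = [r[0] for r in conn.execute(
    "SELECT s_name FROM Student WHERE s_name LIKE 'A%' ORDER BY s_id")]

# _ matches exactly one character, so only four-letter names ending in 'dam'.
one_char_then_dam = [r[0] for r in conn.execute(
    "SELECT s_name FROM Student WHERE s_name LIKE '_dam' ORDER BY s_id")]

print(starts_with_a)      # ['Adam', 'Alex']
print(one_char_then_dam)  # ['Adam']
```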
Order by Clause
The Order by clause is used with a Select statement to arrange the retrieved data in sorted order. By
default, the Order by clause sorts data in ascending order. To sort data in descending order, the DESC
keyword is used with the Order by clause.
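As an illustration of the default and descending orders, here is a small sketch, again assuming SQLite via Python's sqlite3 module and sample Student rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "chris", 14)],
)

# Ascending is the default; DESC reverses the order.
ascending = [r[0] for r in conn.execute(
    "SELECT age FROM Student ORDER BY age")]
descending = [r[0] for r in conn.execute(
    "SELECT age FROM Student ORDER BY age DESC")]

print(ascending)   # [14, 15, 18]
print(descending)  # [18, 15, 14]
```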
Group by Clause
Group by clause is used to group the results of a SELECT query based on one or more columns. It is also
used with SQL functions to group the result from one or more tables.
Syntax:
SELECT column_name, function (column_name)
FROM table_name
WHERE condition
GROUP BY column_name
Example: select salary, count(*)
from Emp
where age > 25
group by salary;
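Grouping is easiest to see with an aggregate function applied per group. The sketch below assumes SQLite via Python's sqlite3 module; the Emp rows are invented sample data. Rows are first filtered by WHERE, then grouped by salary, and COUNT(*) is computed once per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, salary INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [
        ("Anu", 9000, 22),    # filtered out by age > 25
        ("Shane", 8000, 29),
        ("Rohan", 8000, 34),
        ("Scott", 9000, 44),
        ("Tiger", 6000, 35),
    ],
)

# One output row per distinct salary among employees older than 25.
groups = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp "
    "WHERE age > 25 GROUP BY salary ORDER BY salary"
).fetchall()
print(groups)  # [(6000, 1), (8000, 2), (9000, 1)]
```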
HAVING Clause
The HAVING clause is used with SQL queries to give a more precise condition for a statement. It
specifies a condition on group-based (aggregate) SQL functions, just as the WHERE clause specifies a
condition on individual rows.
Syntax:
select column_name, function(column_name)
FROM table_name
WHERE column_name condition
GROUP BY column_name
HAVING function (column_name) condition
Example: SELECT customer, sum(previous_balance)
from sale group by customer
having sum(previous_balance) > 3000;
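The difference from WHERE is that HAVING filters whole groups after aggregation. A minimal sketch, assuming SQLite via Python's sqlite3 module and invented rows for the sale table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (customer TEXT, previous_balance INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("Alex", 2000), ("Alex", 2000), ("Adam", 1000),
     ("Adam", 1500), ("Abhi", 5000)],
)

# Totals per customer: Alex 4000, Adam 2500, Abhi 5000.
# HAVING keeps only the groups whose total exceeds 3000.
big_balances = conn.execute(
    "SELECT customer, SUM(previous_balance) FROM sale "
    "GROUP BY customer HAVING SUM(previous_balance) > 3000 "
    "ORDER BY customer"
).fetchall()
print(big_balances)  # [('Abhi', 5000), ('Alex', 4000)]
```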
Distinct keyword
The distinct keyword is used with a Select statement to retrieve unique values from the table. Distinct
removes all the duplicate records while retrieving from the database.
Syntax: SELECT distinct column-name from table-name;
Example: select distinct salary from Emp;
SQL Constraints
SQL constraints are rules used to limit the type of data that can go into a table, in order to maintain
the accuracy and integrity of the data inside the table.
Following are the most used constraints that can be applied to a table:
NOT NULL Constraint
NOT NULL constraint restricts a column from having a NULL value. Once NOT NULL constraint is applied to
a column, you cannot pass a null value to that column.
Example: CREATE table Student (s_id int NOT NULL, Name varchar (60), Age int);
UNIQUE Constraint
UNIQUE constraint ensures that a field or column will only have unique values. A UNIQUE constraint field
will not have duplicate data.
Example: CREATE table Student (s_id int NOT NULL UNIQUE, Name varchar (60), Age int);
CHECK Constraint
A CHECK constraint is used to restrict the values of a column to a given range. It checks the values
before storing them in the database, like a condition check before saving data into a column.
Example: create table Student (s_id int NOT NULL CHECK (s_id > 0), Name varchar (60) NOT NULL, Age int);
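To see the three constraints reject bad data, the following sketch combines them on one column. It assumes SQLite via Python's sqlite3 module, where each violation surfaces as sqlite3.IntegrityError; other systems report violations with their own error codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " s_id INTEGER NOT NULL UNIQUE CHECK (s_id > 0),"
    " name VARCHAR(60) NOT NULL,"
    " age INTEGER)"
)
conn.execute("INSERT INTO Student VALUES (101, 'Adam', 15)")  # valid row

# Each of these rows breaks exactly one constraint on s_id.
bad_rows = {
    "NOT NULL": (None, "Alex", 18),
    "UNIQUE": (101, "Alex", 18),
    "CHECK": (-5, "Alex", 18),
}

rejected = []
for constraint, row in bad_rows.items():
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(constraint)
        print(constraint, "violated:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(row_count)  # 1
```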
SQL Functions
SQL provides many built-in functions to perform operations on data. These functions are useful while
performing mathematical calculations, string concatenations, sub-strings etc. SQL functions are divided
into two categories:
Aggregate Functions: These functions return a single value calculated from a group of values.
Aggregate functions include MAX(), MIN(), SUM(), AVG() and COUNT().
Scalar Functions: Scalar functions return a single value from an input value.
UCASE (): Used to convert the value of string column to the uppercase character.
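The two categories can be contrasted on the same table: an aggregate function collapses all rows into one value, while a scalar function returns one value per row. This sketch assumes SQLite via Python's sqlite3 module; SQLite names the uppercase function UPPER() rather than UCASE(), so UPPER() is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "chris", 12)],
)

# Aggregate functions: one result row for the whole table.
stats = conn.execute(
    "SELECT MAX(age), MIN(age), SUM(age), AVG(age), COUNT(*) FROM Student"
).fetchone()

# Scalar function: one result per input row.
upper_names = [r[0] for r in conn.execute(
    "SELECT UPPER(s_name) FROM Student ORDER BY s_id")]

print(stats)        # (18, 12, 45, 15.0, 3)
print(upper_names)  # ['ADAM', 'ALEX', 'CHRIS']
```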
Join in SQL
SQL Join is used to fetch data from two or more tables, joined so that the result appears as a single
set of data. It combines columns from two or more tables by using values common to both tables.
The Join keyword is used in SQL queries for joining two or more tables. Joining n tables requires at
least (n-1) join conditions. A table can also be joined to itself, which is known as a Self-Join.
Now, let us join two tables, CUSTOMERS and ORDERS, in a SELECT statement as shown below
SQL> SELECT ID, NAME, AGE, AMOUNT FROM CUSTOMERS, ORDERS
     WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This query will produce below result
+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+
Here, it is noticeable that the join is performed in the WHERE clause. Several operators can be used to
join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most common
operator is the equal to symbol.
Below are the different types of Join available in SQL:
INNER JOIN returns rows when there is a match in both tables.
LEFT JOIN returns all rows from the left table, even if there are no matches in the right table.
RIGHT JOIN returns all rows from the right table, even if there are no matches in the left table.
FULL JOIN returns rows when there is a match in one of the tables.
SELF JOIN is used to join a table to itself as if the table were two tables, temporarily renaming at
least one table in the SQL statement.
CARTESIAN JOIN returns the Cartesian product of the sets of records from the two or more joined
tables.
Figure 3.1: Join Concept
Example:
Orders
OrderID CustomerID EmployeeID OrderDate ShipperID
Customers
CustomerID CustomerName ContactName Address City PostalCode Country
Table 3.7: Join Command Example
Inner Join:
1. SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
2. SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName
FROM ((Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID)
INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID);
Left Join:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
Right Join:
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
Self-Join:
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
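The difference between an inner join and a left join shows up as soon as one table has a row without a match. The sketch below assumes SQLite via Python's sqlite3 module; the reduced CUSTOMERS and ORDERS schemas, the order ids and the customer without orders are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE ORDERS (oid INTEGER, customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?)",
                 [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik")])
conn.executemany("INSERT INTO ORDERS VALUES (?, ?, ?)",
                 [(100, 3, 3000), (101, 3, 1500), (102, 2, 1560)])

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.amount FROM CUSTOMERS c "
    "INNER JOIN ORDERS o ON c.id = o.customer_id ORDER BY o.oid"
).fetchall()

# LEFT JOIN: every customer; missing order columns come back as NULL (None).
left_rows = conn.execute(
    "SELECT c.name, o.amount FROM CUSTOMERS c "
    "LEFT JOIN ORDERS o ON c.id = o.customer_id ORDER BY c.id, o.oid"
).fetchall()

print(inner_rows)  # [('kaushik', 3000), ('kaushik', 1500), ('Khilan', 1560)]
print(left_rows)   # [('Ramesh', None), ('Khilan', 1560), ('kaushik', 3000), ('kaushik', 1500)]
```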
Query Processing
Query processing is the procedure of transforming a high-level query (such as SQL) into a correct and
efficient execution plan expressed in a low-level language. The query processor selects the most
appropriate plan for responding to a database request. When a database system receives a query for
update or retrieval of information, it takes the query through a series of compilation steps that
produce an execution plan.
Query Analyzer
The syntax analyzer takes the query from the users, parses it into tokens and analyses the tokens
and their order to make sure they follow the rules of the language grammar.
If an error is found in the query submitted by the user, the query is rejected, and an error code
together with an explanation of why the query was rejected is returned to the user.
A simple form of the language grammar that could be used to implement an SQL statement is given below:
QUERY = SELECT + FROM + WHERE
SELECT = ‘SELECT’ + <COLUMN LIST>
FROM = ‘FROM’ + <TABLE LIST>
WHERE = ‘WHERE’ + VALUE1 OP VALUE2
VALUE1 = VALUE / COLUMN NAME
VALUE2 = VALUE / COLUMN NAME
OP = >, <, >=, <=, =, <>
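The grammar above can be turned into a tiny syntax check. The following is only a sketch of such a check, not how a real DBMS parser works: it encodes the grammar as one regular expression, and the names IDENT, VALUE, QUERY and is_valid are hypothetical. It also assumes that `*` is accepted as a column list and that values are identifiers, integers or single-quoted strings, and it does not treat keywords as reserved words.

```python
import re

# Assumed token shapes (not specified by the grammar in the notes).
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
VALUE = rf"(?:{IDENT}|\d+|'[^']*')"

# QUERY = SELECT + FROM + WHERE, with WHERE = VALUE1 OP VALUE2.
QUERY = re.compile(
    rf"^\s*SELECT\s+(?:\*|{IDENT}(?:\s*,\s*{IDENT})*)"   # SELECT <COLUMN LIST>
    rf"\s+FROM\s+{IDENT}(?:\s*,\s*{IDENT})*"             # FROM <TABLE LIST>
    rf"\s+WHERE\s+{VALUE}\s*(?:>=|<=|<>|>|<|=)\s*{VALUE}"  # WHERE VALUE1 OP VALUE2
    r"\s*;?\s*$",
    re.IGNORECASE,
)

def is_valid(query: str) -> bool:
    """Return True if the query follows the simplified grammar."""
    return QUERY.match(query) is not None

print(is_valid("SELECT s_id, s_name FROM Student WHERE s_id = 101"))  # True
print(is_valid("SELECT FROM Student WHERE s_id = 101"))               # False: no column list
print(is_valid("SELECT s_id FROM Student WHERE s_id ! 101"))          # False: unknown operator
```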
Query Decomposition
Query decomposition is the first phase of query processing. Its aims are to transform the high-level
query into a relational algebra query and to check whether that query is syntactically and semantically
correct.
Thus, query decomposition starts with a high-level query and transforms it into a query graph of
low-level operations that satisfy the query.
The SQL query is decomposed into query blocks (low-level operations), which form the basic units.
Hence nested queries within a query are identified as separate query blocks.
The query decomposer goes through five stages of processing for decomposition into low-level
operations and translation into algebraic expressions.
In a query tree, the three relations PROJECT, DEPARTMENT and EMPLOYEE are represented as leaf nodes P,
D and E, while the relational algebra operations are represented by internal tree nodes.
The same SQL query can have many different relational algebra expressions and hence many different
query trees.
The query parser typically generates a standard initial (canonical) query tree.
2) Query Normalization:
The primary goal of normalization is to avoid redundancy. The normalization phase converts the
query into a normalized form that can be more easily manipulated.
In the normalization phase, a set of equivalency rules is applied so that the projection and selection
operations included in the query are simplified to avoid redundancy.
The projection operation corresponds to the SELECT clause of the SQL query, and the selection operation
corresponds to the predicate found in the WHERE clause.
The equivalency transformation rules that are applied to an SQL query are shown in the following
table, in which UNARYOP means UNARY operation, BINOP means BINARY operation and REL1,
REL2, REL3 are the relations.
As an employee cannot be both ‘Programmer’ and ‘Analyst’ simultaneously, the above predicate on
the EMPLOYEE relation is contradictory.
4. Query Simplifier:
The objectives of this phase are to detect redundant qualifications, eliminate common sub-expressions
and transform sub-graphs into semantically equivalent forms that are easier and more efficient to compute.
Commonly, integrity constraints, view definitions and access restrictions are introduced into the graph
at this stage of analysis so that the query can be simplified as much as possible.
Integrity constraints define conditions that must hold for all states of the database, so any query that
contradicts an integrity constraint can never be satisfied and can be rejected without accessing the database.
The final form of simplification is obtained by applying the idempotency rules of Boolean algebra.
Description                                   Rule Format
1. PRED AND PRED = PRED                       P ∧ (P) = P
2. PRED AND TRUE = PRED                       P ∧ TRUE = P
3. PRED AND FALSE = FALSE                     P ∧ FALSE = FALSE
4. PRED AND NOT(PRED) = FALSE                 P ∧ (~P) = FALSE
5. PRED1 AND (PRED1 OR PRED2) = PRED1         P1 ∧ (P1 ∨ P2) = P1
6. PRED OR PRED = PRED                        P ∨ (P) = P
7. PRED OR TRUE = TRUE                        P ∨ TRUE = TRUE
8. PRED OR FALSE = PRED                       P ∨ FALSE = P
9. PRED OR NOT(PRED) = TRUE                   P ∨ (~P) = TRUE
10. PRED1 OR (PRED1 AND PRED2) = PRED1        P1 ∨ (P1 ∧ P2) = P1
Table 3.9: Boolean Algebra Rules
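Each rule in Table 3.9 can be confirmed by checking every combination of truth values. The sketch below does this by brute force in Python; the dictionary name rules and the variable names p and q (standing for PRED1 and PRED2) are illustrative choices.

```python
from itertools import product

# Each entry maps a rule to a function that is True when the rule holds
# for the given truth values p (PRED / PRED1) and q (PRED2).
rules = {
    "P AND P = P":             lambda p, q: (p and p) == p,
    "P AND TRUE = P":          lambda p, q: (p and True) == p,
    "P AND FALSE = FALSE":     lambda p, q: (p and False) == False,
    "P AND NOT P = FALSE":     lambda p, q: (p and not p) == False,
    "P1 AND (P1 OR P2) = P1":  lambda p, q: (p and (p or q)) == p,
    "P OR P = P":              lambda p, q: (p or p) == p,
    "P OR TRUE = TRUE":        lambda p, q: (p or True) == True,
    "P OR FALSE = P":          lambda p, q: (p or False) == p,
    "P OR NOT P = TRUE":       lambda p, q: (p or not p) == True,
    "P1 OR (P1 AND P2) = P1":  lambda p, q: (p or (p and q)) == p,
}

# Try all four combinations of truth values for every rule.
failures = [
    name
    for name, holds in rules.items()
    for p, q in product([False, True], repeat=2)
    if not holds(p, q)
]
print(failures)  # [] means every rule holds for every assignment
```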
5) Query Restructuring:
In the final stage of the query decomposition, the query can be restructured to give a more efficient
implementation.
Transformation rules are used to convert one relational algebra expression into an equivalent form that
is more efficient.
The query can now be regarded as a relational algebra program, consisting of a series of operations on
relations.
Query Optimization
The primary goal of query optimization is to choose an efficient execution strategy for processing a query.
The query optimizer attempts to minimize the use of certain resources (mainly the number of I/O
operations and CPU time) by selecting an efficient execution plan (access plan).
Query optimization starts during the validation phase, in which the system checks that the user has
appropriate privileges.
An action plan is then generated to perform the query.
The query optimizer takes the following inputs:
Relational algebra query tree generated by the query simplifier module of the query decomposer.
Estimation formulas used to determine the cardinality of the intermediate result tables.
A cost model.
Statistical data from the database catalogue.
The output of the query optimizer is the execution plan in the form of an optimized relational algebra query.
A query typically has many possible execution strategies, and the process of choosing a suitable one for
processing a query is known as Query Optimization.
The basic issues in Query Optimization are:
· How to use available indexes.
· How to use memory to accumulate information and perform intermediate steps such as sorting.
· How to determine the order in which joins should be performed.
The term query optimization does not mean that the execution plan is always an optimal (best)
strategy.
It is just a reasonably efficient strategy for execution of the query.
The decomposed query block of SQL is translated into an equivalent extended relational algebra
expression and then optimized.
There are two main techniques for implementing Query Optimization:
The first technique is based on Heuristic Rules for ordering the operations in a query execution strategy.
The second technique involves the systematic estimation of the cost of the different execution strategies
and choosing the execution plan with the lowest cost.
Semantic query optimization is used in combination with the heuristic query transformation
rules.
It uses constraints specified on the database schema such as unique attributes and other more complex
constraints, in order to modify one query into another query that is more efficient to execute.
1. Heuristic Rules:
The heuristic rules are used as an optimization technique to modify the internal representation of the
query. Usually, heuristic rules are applied to the query in the form of a query tree or query graph data
structure, to improve its performance.
One of the main heuristic rules is to apply SELECT operations before applying JOIN or other BINARY
operations.
This is because the size of the file resulting from a binary operation such as JOIN is usually a
multiplicative function of the sizes of the input files.
The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a JOIN
or other binary operation.
The heuristic query optimizer transforms the initial (canonical) query tree into a final query tree using
equivalence transformation rules. This final query tree is efficient to execute.
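The effect of applying SELECT before JOIN can be illustrated with two plans over small in-memory relations. This is a rough sketch in Python, not an optimizer: the relations, the helper functions join and select, and the row counts are invented for illustration. Both plans return the same rows, but the second one feeds far fewer rows into the join.

```python
# Hypothetical relations: 100 employees spread over 5 departments,
# every tenth employee is an Analyst.
employees = [
    {"emp_id": i, "dept_id": i % 5,
     "job": "Analyst" if i % 10 == 0 else "Programmer"}
    for i in range(100)
]
departments = [{"dept_id": d, "dname": f"D{d}"} for d in range(5)]

def join(left, right, key):
    """Nested-loop equi-join on a shared attribute name."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def is_analyst(row):
    return row["job"] == "Analyst"

# Plan A: JOIN first, then SELECT (intermediate result has 100 rows).
joined = join(employees, departments, "dept_id")
plan_a = select(joined, is_analyst)

# Plan B: SELECT first, then JOIN (only 10 rows enter the join).
analysts = select(employees, is_analyst)
plan_b = join(analysts, departments, "dept_id")

print(len(joined), len(analysts))  # 100 10
print(plan_a == plan_b)            # True
```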
4. Commuting σ with π:
If the selection condition c involves only the attributes A1,A2,...,An, then:
π A1,A2,...,An (σ c (R)) = σ c (π A1,A2,...,An (R))
5. Commutativity of ⋈ and x:
R ⋈c S = S ⋈c R
R x S = S x R
6. Commuting σ with ⋈ or x:
If all attributes in the selection condition c involve only attributes of one of the relation schemas (R), then:
σ c (R ⋈ S) = (σ c (R)) ⋈ S
Alternatively, if the selection condition c can be written as (c1 AND c2), where condition c1 involves only
attributes of R and condition c2 involves only attributes of S, then:
σ c (R ⋈ S) = (σ c1 (R)) ⋈ (σ c2 (S))
7. Commuting π with ⋈ or x:
The projection list is L = {A1,A2,...,An,B1,B2,...,Bm}, where A1...An are attributes of R and B1...Bm are
attributes of S. If the join condition c involves only attributes in L, then:
π L (R ⋈c S) = (π A1,...,An (R)) ⋈c (π B1,...,Bm (S))
8. Commutativity of SET operations:
R ⋃ S = S ⋃ R
R ⋂ S = S ⋂ R
Minus (R - S) is not commutative.
9. Associativity of ⋈, x, ⋂, and ⋃:
If θ stands for any one of these operations throughout the expression, then:
(R θ S) θ T = R θ (S θ T)
10. Commuting σ with SET operations:
If θ stands for any one of ⋃, ⋂, and -, then:
σ c (R θ S) = (σ c (R)) θ (σ c (S))
11. The π operation commutes with ⋃:
π L (R ⋃ S) = (π L (R)) ⋃ (π L (S))
12. Converting a (σ, x) sequence into ⋈:
σ c (R x S) = R ⋈c S
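Two of these equivalences, rule 12 (a selection over a Cartesian product equals a join) and rule 6 (a selection on attributes of R can be moved below the join), can be checked on small example relations. The sketch below represents relations as Python lists of tuples; the relations R and S and the helper names cross, select and theta_join are assumptions made for illustration.

```python
# R has attributes (id, x); S has attributes (id, y).
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, 10), (3, 30), (4, 40)]

def cross(r, s):
    """Cartesian product R x S: concatenate every pair of tuples."""
    return [a + b for a in r for b in s]

def select(rel, cond):
    """Selection: keep the tuples that satisfy cond."""
    return [t for t in rel if cond(t)]

def theta_join(r, s, cond):
    """Join: pair tuples and keep those satisfying cond."""
    return [a + b for a in r for b in s if cond(a + b)]

def join_cond(t):
    # Positions in the concatenated tuple: (R.id, R.x, S.id, S.y).
    return t[0] == t[2]          # R.id = S.id

def r_only_cond(t):
    return t[0] > 1              # R.id > 1, uses attributes of R only

# Rule 12: selection over a product equals the join.
rule12_lhs = select(cross(R, S), join_cond)
rule12_rhs = theta_join(R, S, join_cond)

# Rule 6: select after the join equals select on R before the join.
rule6_lhs = select(theta_join(R, S, join_cond), r_only_cond)
rule6_rhs = theta_join(select(R, r_only_cond), S, join_cond)

print(rule12_lhs == rule12_rhs)  # True
print(rule6_lhs == rule6_rhs)    # True
print(rule6_rhs)                 # [(3, 'c', 3, 30)]
```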