Introduction To Databases: Name: Akanksha Sharma
Introduction To Databases: Name: Akanksha Sharma
Databases
Name : Akanksha Sharma
Database Concepts
Data ?
Data refers to a collection of natural phenomena descriptors, including the
results of experience, observation or experiment, or a set of premises. This may
consist of numbers, words, or images, particularly as measurements or
observations of a set of variables.
Information ?
Information is a quality of a message from a sender to one or more receivers.
Information is always about something like size of a parameter, occurrence of
an event, etc. Information does not have to be accurate. It may be a truth or a
lie, or just the sound of a falling tree.
Knowledge?
Knowledge is used to mean the confident understanding of a subject with the
ability to use it for a specific purpose if appropriate .
DBMS
DBMS is a set of software programs that control the organization,
storage, management, and retrieval of data in a database.
DBMS includes:
A modeling language to define the schema of each database
hosted in the DBMS, according to the DBMS data model.
Data structures (fields, records, files and objects) optimized to deal
with very large amounts of data stored on a permanent data storage
device.
A database query language and report writer to allow users to
interactively interrogate the database, analyze its data and update it
according to the users privileges on data.
A transaction mechanism, that would guarantee the ACID
properties, in order to ensure data integrity, despite concurrent user
accesses (concurrency control), and faults (fault tolerance).
Typical Database Applications
Traditional (Employee, student, product database)
Online Shopping
Search Engines
Data Warehousing (OLAP)
Data Mining
Geographical Information Systems
Data-Level Models
Flat File Structure
A database with a single table is called a flat file structure. A flat-file structure is good only
for extremely simple databases and not practical for most business applications. Many
spreadsheets include some database features like sorting entries and counting or
summarizing entries that meet certain criteria.
Hierarchical Data Model
The hierarchical data model is set up like a "forest" or collection of tree structures. The
hierarchical data model is a special case of the network data model. This data model is
very efficient for certain kinds of applications where the data to be modeled is also like a
tree. The best-known hierarchical database management system is IBM's IMS.
Network Data Model
The network data model is similar to the entity-relationship model with all relationships
restricted to be binary, many-one relationships. This restriction allows a simple directed
graph model to be used. The network data model is fast, but it is difficult to conceptualize
complex data structures using this model. An example of a network database management
system is IDMS.
Relational Data Model
Relational model is based on predicate logic and set theory. You have sets of statements
of fact, and the underlying system can determine new sets of facts .The real power comes
from your complete control over determining new facts. All relationships between facts are
explicit in the database, and the command language can use and manipulate them. The
mathematics behind the model make this manipulation feasible.
RDBMS
Relation: Two dimensional table
The relation itself corresponds to our familiar notion of a table:
A relation is a collection of tuples, each of which contains
values for a fixed number of attributes. Relations are
sometimes referred to as flat files, because of their
resemblance to an unstructured sequence of records. Each
Vendor Global
tuple in a relation must be unique -- that is, there can be no Revenue
duplicates.
Oracle 7,312
Attribute: Table column
Other commonly used terms for attribute are 'property' and IBM 3,483
'field.' The set of permissible values for each attribute is called
the domain for that attribute. Microsoft 3,052
Tuple: Table row Sybase 524
A tuple is an instance of an entity or relationship or whatever
is represented by the relation.
NCR 457
Key: A single attribute or combination of attributes whose Teradata
values uniquely identify the tuples of the relation. That is,
each row has a different value for the key attribute(s). The
relational model requires that every relation have a key and
that for any tuple in the relation, the key fields have non-null
values -- no two tuples may have the same key value and
every tuple must have a value for the key attribute.
Case Study : Oracle
Case Study : Oracle
Oracle Database Fundamentals
Oracle stores each data item in its own field . In Oracle, the fields relating to a particular person, thing, or event are
bundled together to form a single, complete unit of data, called a record . Each record is made up of a number of
fields. No two fields in a record can have the same field name. Oracle stores records relating to each other in a
table. A table consists of a number of records . Each field occupies one column and each record occupies one row
. Different tables are created for the various groups of information. Every table in Oracle has a field or a
combination of fields that uniquely identifies each record in the table. When a field in one table matches the
primary key of another table, the field is referred to as a foreign key. When a foreign key exists in a table, the
foreign key's table is sometimes referred to as a lookup table .
Inserting Data
- insert into tablename (columnname, columnname, ...) values (somevalue, somevalue, ...);
Selecting Data
- select columnname, columnname... from tablename;
- select "First Name"||' '||'Last Name' 'Full Name' from employee where deptid=1 and salary>5000;
Updating Data
- update tablename set columnname=somevalue where conditions;
- delete from tablename where conditions;
- drop table tablename;
Case Study : Microsoft
RDBMS Concepts
Using Oracle PL/SQL
Basic Structure of PL/SQL
Variables and Types
Simple PL/SQL Programs
Control Flow in PL/SQL
Basic Structure
PL/SQL stands for Procedural Language/SQL. It extends SQL by adding constructs found
in procedural languages, resulting in a structural language that is more powerful than SQL.
The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which
can be nested within each other and each block performs a logical action in he program. A
block has the following structure:
DECLARE
/* Declarative section: variables, types, and local subprograms. */
BEGIN
/* Executable section: procedural and SQL statements go here. */
/* This is the only section of the block that is required. */
EXCEPTION
/* Exception handling section: error handling statements go here. */
END;
Only the executable section is required. The other sections are optional.
The only SQL statements allowed in a PL/SQL program are SELECT, INSERT, UPDATE,
DELETE and several other data manipulation statements plus some transaction control.
Variables
The DECLARE section defines and (optionally) initialises
variables. If not initialised specifically they default to NULL.
DECLARE
number1 NUMBER(2);
number2 NUMBER(2) := 17;
text VARCHAR2(12) := 'Hello world';
text DATE := SYSDATE; -- current date and time
BEGIN
SELECT street_number
INTO number1
FROM address
WHERE name = 'Smith';
END;
DECLARE
a NUMBER;
b NUMBER;
BEGIN
SELECT e,f INTO a,b FROM T1 WHERE e>1;
INSERT INTO T1 VALUES(b,a);
END;
.
run;
Control Flow in PL/SQL
An IF statement looks like:
LOOP
END LOOP;
DECLARE DECLARE
a NUMBER; i NUMBER := 1;
b NUMBER; BEGIN
BEGIN LOOP
SELECT e,f INTO a,b FROM T1 INSERT INTO T1 VALUES(i,i);
WHERE e>1; i := i+1;
IF b=1 THEN EXIT WHEN i>100;
INSERT INTO T1 VALUES(b,a); END LOOP;
ELSE END;
INSERT INTO T1 .
VALUES(b+10,a+10); run;
END IF;
END;
.
run;
Joins
CROSS JOIN (Cartesian product) is the simplest join;
INNER JOIN (sometimes called the "EQUI-JOIN")
where tables are combined based on a common
column;
OUTER JOIN which involves combining all rows of
one table with only matching rows from the other
table;
SELF JOIN which is a table joined to itself.
Cross Join
A cross join returns the cartesian product of the sets of records
from the two joined tables. If A and B are two sets, then cross
join = A × B.
Examples :
Explicit –
SELECT *
FROM employee CROSS JOIN department
Implicit –
SELECT *
FROM employee, department;
Inner Joins
An equi-join, also known as an equijoin, is a specific
type of comparator-based join, or theta join, that
uses only equality comparisons in the join-predicate.
Using other comparison operators (such as <)
disqualifies a join as an equi-join.
Example –
SELECT *
FROM employee
INNER JOIN department
ON employee.DepartmentID =
department.DepartmentID
Natural join
Outer Joins
Left outer join
The result of a left outer join for tables A and B always contains
all records of the "left" table (A), even if the join-condition does
not find any matching record in the "right" table (B). This means
that a left outer join returns all the values from the left table, plus
matched values from the right table (or NULL in case of no
matching join predicate).
Example –
SELECT *
FROM employee LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID
Outer Joins
Right outer join
Every record from the "right" table (B) will appear in the joined
table at least once. If no matching row from the "left" table (A)
exists, NULL will appear in columns from A for those records that
have no match in A. A right outer join returns all the values from
the right table and matched values from the left table (NULL in
case of no matching join predicate).
Example –
SELECT *
FROM employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID
Self Join
A self-join is simply a Employees
normal SQL join that joins
one table to itself. This is EmployeeID EmployeeName ManagerID
64 Claire Smith-Jones 63
65 Grover Rivers 63
Example –
SELECT E1.EmployeeName AS Employee,
E2.EmployeeName AS Manager
FROM Employees AS E1 INNER JOIN Employees AS E2
ON E1.ManagerID = E2.EmployeeID
Normalization
Introduction
Entity: The word ‘entity’ is the general name for the information that is to be stored within a
single table. Information about the entities is known as attributes.
Primary key: A primary key uniquely identifies a row of data found within a table. When
multiple attributes are used to derive a primary key, this key is known as a concatenated
primary key.
Relationship:
one-to-one (1:1) - A one-to-one relationship signifies that each instance of a given entity
relates to exactly one instance of another entity.
one-to-many (1:M) - A one-to-many relationship signifies that each instance of a given
entity relates to one or more instances of another entity.
many-to-many (M:N) - A many-to-many relationship signifies that many instances of a
given entity relate to many instances of another entity.
Foreign key: A foreign key forms the basis of a 1:M relationship between two tables. The
foreign key can be found within the M table, and maps to the primary key found in the 1
table.
The Three Normal Forms
First Normal Form
A table is in first normal form (1NF) if there are no repeating
groups.
How to Normalize ?
- Remove the repeating group of attributes to form a new entity
- Add to it the original key
The Three Normal Forms
Second Normal Form
A table is in Second Normal Form(2NF) if it is in 1NF and each
non-key field is functionally dependent on the entire primary key.
How to Normalize ?
- Examine tables with a composite key (a key made up of two
parts)
- For each non-key attribute, determine if its key is the first part,
or the second part, or if neither then the answer is both parts
- Remove the partial key and its dependents to form a new table
The Three Normal Forms
Third Normal Form
A table is in Third Normal Form(3NF) if it is in 2NF and there are
no transitive dependencies.
How to Normalize ?
- Identify any dependencies between non-key attributes within
each table
- Remove them to form a new table
- Promote one of the attributes to be the key of the new table
- This becomes the Foreign Key link in the original table (shown
with a *).
Example
Department: ( DepartmentName, SupervisorNumber ) – SupervisorNumber is a
foreign key
Supervisor: ( SupervisorNumber, SupervisorName )
EmployeeDepartment: ( DepartmentName, EmployeeNumber, StartDate )
Employee: ( EmployeeNumber, EmployeeName )
EmployeeProject: ( EmployeeNumber, ProjectNumber, StartDate )
Project: ( ProjectNumber, ProjectName )
To check whether these tables are in NF you must answer the following questions
1. Does the table contain any repeating groups?
If not, and the table has a primary key then it is First normal form (1NF)
2. Does the table contain any partial dependencies?
If not, and it is in 1NF then it is in 2NF
3. Does the table contain any transitive dependencies or derived attributes?
If not, and it is in 2NF then it is in 3NF
Indexing
Types of Single-level Ordered Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
Multilevel Indexes
Types of Single-Level Indexes
Primary Index
Defined on an ordered data file
The data file is ordered on a key field
Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor
A similar scheme can use the last record in a block.
A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and the
keys of its anchor record rather than for every search value.
Primary
index on
the
ordering
key field
Types of Single-Level Indexes
Clustering Index
Defined on an ordered data file
Includes one index entry for each distinct value of the field;
the index entry points to the first data block that contains
records with that field value.
Data
Operational
Warehouse
EIS /DSS
Metadata
Web Browsers
Operational
Systems/Data Middleware/API
Data
Preparation
Warehouse Architected
Admin Tool Datamarts
Warehouse Databases
Oracle Apps
Seibel
Teradata EDW
ODS
Intermediate
Excel Files Tables
Flat Files