Unit 2 DW&DM Notes Mr. Rohit Pratap Singh

Uploaded by

anuragsiddharth04

Unit 2

DATA WAREHOUSING SCHEMAS

A schema is a structure that describes how entities and attributes
are interconnected in a database. Like other databases, a data
warehouse also maintains a schema. An operational database typically
uses the relational model, whereas a data warehouse uses star,
snowflake, and fact constellation schemas.
Schemas in Data Warehousing:

a. Star
b. Snowflake
c. Fact Constellation

[Star and snowflake schemas were already covered in Chapter 1.]

FACT CONSTELLATIONS

1. It also helps in representing a multidimensional model.
2. It is a collection of multiple fact tables that share some
common dimension tables.
3. It can be viewed as a collection of several star schemas and
is hence also known as a galaxy schema.
4. It is more complex than the star and snowflake schemas.
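
The points above can be illustrated with a minimal sketch, assuming a toy schema that is not part of these notes: two fact tables (`fact_sales` and `fact_shipping`, both hypothetical names) share the same two dimension tables, which is what makes the design a fact constellation rather than a single star. It uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables share the same dimension tables.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER);
CREATE TABLE fact_shipping (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER);
INSERT INTO dim_time VALUES (1, 2023), (2, 2024);
INSERT INTO dim_product VALUES (10, 'Pen'), (20, 'Book');
INSERT INTO fact_sales VALUES (1, 10, 5), (2, 10, 7), (2, 20, 3);
INSERT INTO fact_shipping VALUES (1, 10, 4), (2, 10, 7), (2, 20, 2);
""")

def totals_by_product():
    """Combine measures from both fact tables through the shared product dimension."""
    query = """
        SELECT p.name,
               (SELECT SUM(units_sold) FROM fact_sales s
                 WHERE s.product_id = p.product_id),
               (SELECT SUM(units_shipped) FROM fact_shipping h
                 WHERE h.product_id = p.product_id)
        FROM dim_product p
        ORDER BY p.name
    """
    return conn.execute(query).fetchall()

print(totals_by_product())
```

Because both fact tables reference `dim_product`, one query can compare units sold with units shipped per product without duplicating the dimension data.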

Client/Server Computing Model and Data Warehousing

In client/server computing, the client requests a resource and the
server provides that resource. A server may serve multiple clients
at the same time, while a client typically communicates with one server
at a time. The client and server usually communicate over a computer
network, but sometimes they reside on the same system.

• Client/server computing works on a request and response
model. The client sends a request to the server, and the
server responds with the desired information.
• The client and server must follow a common
communication protocol so that they can interact with
each other. Such protocols (HTTP, for example) are defined at
the application layer.
• A server can accommodate only a limited number of client
requests at a time, so it uses a priority-based system to
respond to the requests.
• Denial of Service (DoS) attacks hinder a server's ability to respond
to authentic client requests by inundating it with false
requests.
• An example of a client/server computing system is a web
server. It returns web pages to the clients that requested
them.
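
The request and response cycle described above can be sketched as follows. This is a minimal illustration, not a real web server: it assumes a connected socket pair on one machine (`socket.socketpair`) instead of a network connection, and the `PAGES` dictionary and function names are hypothetical.

```python
import socket
import threading

# Hypothetical resources held by the server.
PAGES = {"/index.html": "<h1>Home</h1>"}

def serve_one_request(conn):
    """Server side: read one request, look up the resource, send the response."""
    with conn:
        path = conn.recv(1024).decode("utf-8").strip()
        body = PAGES.get(path, "404 Not Found")
        conn.sendall(body.encode("utf-8"))

def fetch(path):
    """Client side: send a request and wait for the server's response."""
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end,))
    server.start()
    with client_end:
        client_end.sendall(path.encode("utf-8"))
        reply = client_end.recv(1024).decode("utf-8")
    server.join()
    return reply

print(fetch("/index.html"))
print(fetch("/missing"))
```

Both sides agree on a tiny protocol here (the client sends a path, the server replies with the page text), which mirrors the requirement that client and server share a common communication protocol.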

Hardware and Operating Systems for Data Warehousing

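Data warehouses are normally very concerned with I/O performance, because analytical
queries scan large volumes of data. This is in contrast to OLTP systems, where the
bottleneck depends more on user workload and application access patterns.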

Hardware Considerations
1. Processing Power:
• Multi-core processors: Data warehouses benefit from parallel processing capabilities
offered by multi-core CPUs.
• High clock speeds: Faster processors can handle complex queries and data
transformations more efficiently.
2. Memory (RAM):
• Sufficient RAM allows for faster data access and query processing.
• In-memory processing: Consider systems with large amounts of RAM or in-memory
databases for improved performance.
3. Storage:
• High-performance storage: Use solid-state drives (SSDs) or high-speed storage arrays
for fast data access.
• Scalable storage: Ensure the storage system can scale with growing data volumes.
4. Network:
• High-speed network connections: Fast network infrastructure minimizes data transfer
latency between components of the data warehouse architecture.
• Redundancy: Implement redundant network connections to ensure high availability
and fault tolerance.
5. Scalability:
• Scalable architecture: Choose hardware that supports horizontal scaling to
accommodate growing data and user loads.
• Distributed processing: Consider distributed computing frameworks like Apache
Hadoop or Spark for scalable processing.
6. Data Redundancy and Fault Tolerance:
• RAID configurations: Use RAID (Redundant Array of Independent Disks) for data
redundancy and fault tolerance.
• Backup systems: Implement regular backups and disaster recovery solutions to protect
against data loss.
7. Hardware Acceleration:
• GPU acceleration: Graphics processing units (GPUs) can accelerate certain data
processing tasks, such as machine learning algorithms and complex analytics.
Operating System Considerations
1. Compatibility:
• Ensure compatibility with the chosen database management system (DBMS) and other
software components of the data warehouse stack.
2. Performance:
• Choose operating systems known for stability, performance, and reliability.
• Linux distributions like CentOS, Red Hat Enterprise Linux (RHEL), or Ubuntu Server are
popular choices for data warehousing due to their stability and performance.
3. Security:
• Select an operating system with robust security features and regular updates to protect
against vulnerabilities.
• Implement access controls, firewalls, and encryption to secure data and infrastructure.
4. Manageability:
• Choose an operating system with robust management tools and support for
automation.
• Consider systems with centralized management capabilities for easier administration of
multiple servers.
5. Compatibility with Tools and Software:
• Ensure compatibility with data warehousing software, ETL tools, monitoring tools, and
other components of the data warehouse ecosystem.
6. Scalability and Resource Management:
• Operating systems should support resource management features like process
scheduling, memory management, and disk I/O optimization to ensure efficient
resource utilization.
7. Virtualization and Containerization:
• Consider virtualization or containerization technologies like VMware, Docker, or
Kubernetes for flexible deployment and resource allocation.
Warehousing Strategy
WAREHOUSE MANAGEMENT AND SUPPORT PROCESS

Warehouse management involves planning, designing, developing,
implementing, and maintaining a data warehouse project in order to
manage the warehouse and its subsequent activities efficiently.
Warehouse operations involve six processes:

1. Receiving: Receiving is the transfer of goods into the
warehouse. The warehouse must be able to verify that it has
received the right product, in the right quantity, in the
right condition, and at the right time. This process must be
performed correctly to ensure the correctness of subsequent actions.

2. Put-away: Put-away is the second warehouse process and
is the movement of goods from the receiving area
to the most optimal warehouse storage location.
Failing to place goods in their ideal location can
impair the productivity of warehouse operations. When
goods are put away properly, there are several benefits:
cargo is stored faster and more efficiently, travel time is
minimised, the safety of goods is ensured, and warehouse
space utilisation is maximised.

3. Storage: Storage is the warehouse process in which goods
are placed into their most appropriate storage space. When
done properly, the storage process maximises the
available space in the warehouse and increases labour
efficiency.

4. Picking: Picking is the warehouse process that collects
products in a warehouse to fulfil customer orders.
This process must be carried out correctly to achieve high
accuracy, as errors have a direct impact on
customer satisfaction.

5. Packing: Packing is the warehouse process that
consolidates the picked items in a sales order and prepares them
for shipment to the customer. One of the primary tasks of
packing is to ensure that damage is minimised from the
time the items leave the warehouse.

6. Shipping: Shipping is the final warehouse process and
the start of the journey of goods from the warehouse to the
customer. Shipping is considered successful only if the
right order is sorted and loaded, is dispatched to the right
customer, travels through the right transit mode, and is
delivered safely and on time.
WAREHOUSE PLANNING AND IMPLEMENTATION

Planning a warehouse involves the following steps:

a. Define your inventory: Define the products, the types of
products, and the quantity of each product you will store in the warehouse.
b. Determine your storage needs: Identify the types of storage
required, such as racks and shelves.
c. Assess your space: Measure the physical space
available and allocate it to storage, receiving, and
shipping areas.
d. Evaluate your equipment needs: Identify the equipment required
to handle products more efficiently, such as conveyors.
e. Plan your layout: Plan a layout that maximises storage space.
f. Establish processes and procedures: Decide on the
processes and procedures for receiving, storing, and shipping
products, and train your staff on these procedures.

Implementing a data warehouse requires the following steps:

a. Requirement analysis and capacity planning: Defining
enterprise needs, defining architectures, carrying out capacity
planning, and selecting hardware and software tools.
b. Hardware integration: Once the hardware and software have
been selected, they need to be put together by integrating the servers,
the storage devices, and the user software tools.
c. Modelling: Modelling is a significant stage that involves
designing the warehouse schema and views.
d. Sources: The data for the data warehouse is likely to
come from several data sources, so all the sources are defined
at this stage.
e. ETL: The data from the source systems must go
through an ETL (extract, transform, load) phase. Designing and
implementing the ETL phase may include selecting a
suitable ETL tool vendor and purchasing and implementing
the tools.
f. Populating the data warehouse: Once the ETL tools have
been agreed upon, the tools need to be tested, perhaps
using a staging area. Once everything is working adequately,
the ETL tools can be used to populate the warehouse
(that is, to load data into it) according to the schema and view
definitions.

PARALLEL PROCESSORS AND CLUSTER SYSTEM

Parallel Processors:
a. Two or more processors work together to accomplish a single task.
b. One task is divided into multiple subtasks, and each subtask is handled by
a different processor. In this way, multiple processors work on different
parts of a single task to complete it.
c. Each processor operates normally and performs its operations in
parallel, as instructed.
d. At the end, the results from all the processors are combined to produce the
final result.
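
The divide, process, and combine pattern in points b to d can be sketched as below. This is an illustration under stated assumptions: the function name `parallel_sum` is hypothetical, and a thread pool is used only to keep the example self-contained. In CPython, threads do not speed up CPU-bound work; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface for true multi-processor execution.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(numbers, workers=4):
    """Split one task (summing a list) into chunks, sum each chunk in a
    separate worker, then combine the partial results."""
    # Divide: one chunk per worker (the last chunk may be shorter).
    size = max(1, (len(numbers) + workers - 1) // workers)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # Process: each worker handles a different part of the task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combine: merge the partial results into the final result.
    return sum(partial_sums)

print(parallel_sum(list(range(1, 101))))
```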

Cluster System:
a. Two or more computers work together to provide high speed. Each
computer is known as a node.
b. All the computers work together so that they appear to be a single entity.
The connected computers execute operations together,
thus creating the impression of a single system.
c. Each computer is connected to the other nodes over a LAN, and each performs
its own separate tasks.
d. With multiple nodes, all the tasks are executed at a fast pace.
DISTRIBUTED DATABASE SYSTEMS

a. A distributed database is essentially a database that is distributed
across numerous sites, i.e., on various computers or over a network
of computers, and is not restricted to a single system.
b. A distributed database system is spread across several locations with
no common physical components.
c. This can be necessary when different people from all over the world
need to access a certain database.
d. It is of two types:
i. Homogeneous: A homogeneous database stores data uniformly
across all locations. All sites use the same operating system,
database management system, and data structures. They are therefore
simple to handle.
ii. Heterogeneous: In a heterogeneous distributed database, different
locations may use different software and schemas, which may
cause issues with queries and transactions. Moreover, one site may
not even be aware of the existence of the other sites. Different
operating systems and database applications may be used by different
machines. Translations are therefore necessary for communication
across the various sites.

Warehousing Software

1. Odoo – The best retail inventory management software
2. NetSuite – The best warehouse management system for e-commerce
3. Infoplus – The best warehouse management software for small businesses
4. HighJump – The best warehouse management software with an Agile solution
5. Blue Link ERP – The best warehouse management software for medium-size
businesses
Problem Solving In AI
Problem solving in AI means reaching a goal by applying logical algorithms, using
mathematical tools such as polynomial and differential equations, and executing
them using modelling paradigms. Classic example problems include:

1. Chess
2. Tower of Hanoi Problem
3. N-Queen Problem
4. Travelling Salesman Problem
5. Water-Jug Problem

The process of solving a problem consists of five steps:

1. Defining the problem
2. Analyzing the problem
3. Identifying possible solutions
4. Choosing a solution
5. Implementation
Importance of Artificial Intelligence
1. Makes our lives easier
2. Speeds up tasks and work processes
3. Improves accuracy
4. Makes full use of data

Problem Solving In AI Examples

1. Tower of Hanoi Problem
The Tower of Hanoi is also called the problem of Benares Temple, the Tower of
Brahma, or Lucas' Tower.

The puzzle consists of three rods and a stack of disks of different sizes, which
start on one rod in decreasing size from bottom to top.
The objective of the puzzle is to move the entire stack to the last rod, obeying
the following rules:

1. Only one disk may be moved at a time.
2. Each move consists of taking the upper disk from one of the stacks
and placing it on top of another stack or on an empty rod.
3. No disk may be placed on top of a disk that is smaller than it.
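
The rules above lead to the standard recursive solution, sketched here under the assumption that the rods are labelled "A", "B", and "C" (labels not used in these notes): move the top n-1 disks to the spare rod, move the largest disk to the target, then move the n-1 disks on top of it.

```python
def hanoi(n, source="A", spare="B", target="C"):
    """Return the list of moves (from_rod, to_rod) that transfers n disks
    from source to target without ever placing a larger disk on a smaller one."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, target, spare)    # clear the n-1 smaller disks
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, source, target)  # put the smaller disks back on top
    )

moves = hanoi(3)
print(len(moves))  # 2**3 - 1 = 7 moves
print(moves)
```

For n disks this produces 2^n - 1 moves, which is the minimum possible.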
2. N-Queen Problem
The problem is posed on an n × n chessboard, for example a 4 × 4, 8 × 8, or
16 × 16 chessboard.

N-Queen is the problem of placing N chess queens on an N×N chessboard so
that no two queens attack each other:

1. No two queens share the same row.
2. No two queens share the same column.
3. No two queens share the same diagonal.

[Figure: 4-Queen problem on a 4 × 4 chessboard]

[Figure: 8 × 8 chessboard]
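
The three N-Queen constraints can be checked with a simple backtracking search. The sketch below is one common approach, not necessarily the one intended by these notes; the function name `solve_n_queens` is an assumption. It places one queen per row, so the row constraint holds by construction, and tests the column and diagonal constraints before each placement.

```python
def solve_n_queens(n):
    """Return all solutions; each solution is a tuple where index = row
    and value = column of the queen in that row."""
    solutions = []

    def place(row, cols):
        if row == n:
            solutions.append(tuple(cols))
            return
        for col in range(n):
            # Safe if no earlier queen shares the column or a diagonal.
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(cols)):
                place(row + 1, cols + [col])

    place(0, [])
    return solutions

print(solve_n_queens(4))       # the two 4-queen solutions
print(len(solve_n_queens(8)))  # number of 8-queen solutions
```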

3. Travelling Salesman Problem

The travelling salesman problem, also called the travelling salesperson problem
or TSP, asks the following question: "Given a list of cities and the distances
between each pair of cities, what is the shortest possible route that visits each
city exactly once and returns to the origin city?"
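
For a handful of cities the question can be answered by trying every possible route. The sketch below is a brute-force illustration only (its running time grows factorially with the number of cities); the distance matrix is made-up example data and `shortest_tour` is a hypothetical name.

```python
from itertools import permutations

def shortest_tour(dist):
    """Try every ordering of the cities, starting and ending at city 0,
    and return (length, tour) for the shortest round trip."""
    n = len(dist)
    best_length, best_tour = None, None
    for order in permutations(range(1, n)):
        tour = (0,) + order + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best_length is None or length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour

# Illustrative symmetric distances between 4 cities (assumed data).
distances = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(shortest_tour(distances))
```

For these distances the shortest round trip has length 80, for example via the tour 0, 1, 3, 2, 0.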
4. Water-Jug Problem

Condition of the water jug problem: the combined capacity of the two smaller
jugs must be greater than or equal to the capacity of the large jug.

Example 1
You have an 8-litre jug full of water and two smaller jugs, one that holds 5
litres and the other 3 litres. None of the jugs has markings on it, nor do you
have any additional measuring device.

Using the 3 jugs, divide the 8 litres into 2 equal parts, i.e. 4 litres in jug A and 4
litres in jug B. How?

Solution

The condition of the water jug problem is satisfied:

5 (litres) + 3 (litres) >= 8 (litres)

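Example 2

There are two jugs: one can hold 3 gallons of water and the other can
hold 4 gallons. There is no other measuring equipment
available, and the jugs do not have any markings on them.

How can you get exactly 2 gallons of water in the 4-gallon jug?

Solution
Each state is written as (water in the 4-gallon jug, water in the 3-gallon jug).

Step 1: (0,0) Both jugs are empty.
Step 2: (4,0) Fill the 4-gallon jug.
Step 3: (1,3) Pour from the 4-gallon jug into the 3-gallon jug until it is full.
Step 4: (1,0) Empty the 3-gallon jug.
Step 5: (0,1) Pour the remaining 1 gallon into the 3-gallon jug.
Step 6: (4,1) Fill the 4-gallon jug.
Step 7: (2,3) Pour from the 4-gallon jug into the 3-gallon jug until it is full,
leaving 2 gallons in the 4-gallon jug.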
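
A solution like the one in Example 2 can also be found mechanically with a state-space search. The sketch below uses breadth-first search, which is one standard way to do this and is an assumption here rather than the method used in these notes. States are written as (4-gallon jug, 3-gallon jug), and the allowed moves are fill, empty, and pour.

```python
from collections import deque

def solve_two_jugs(cap_a=4, cap_b=3, goal_a=2):
    """Breadth-first search from (0, 0) to any state with goal_a gallons
    in jug A. Returns the list of states on a shortest path, or None."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == goal_a:
            path, state = [], (a, b)
            while state is not None:  # walk back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)   # amount that fits when pouring A into B
        pour_ba = min(b, cap_a - a)   # amount that fits when pouring B into A
        for nxt in [(cap_a, b), (a, cap_b),            # fill A, fill B
                    (0, b), (a, 0),                    # empty A, empty B
                    (a - pour_ab, b + pour_ab),        # pour A into B
                    (a + pour_ba, b - pour_ba)]:       # pour B into A
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None

print(solve_two_jugs())
```

Because breadth-first search explores states in order of distance from the start, the path it returns uses the fewest possible moves (six for this puzzle).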
Example 3

Solution
The condition of the water jug problem is satisfied:

8 (litres) + 5 (litres) >= 12 (litres)

Solved in 7 moves (8 states, counting the starting state). Each state lists the
water in the 12-litre, 8-litre, and 5-litre jugs:

Step 1: 12,0,0
Step 2: 4,8,0
Step 3: 4,3,5
Step 4: 9,3,0
Step 5: 9,0,3
Step 6: 1,8,3
Step 7: 1,6,5
Step 8: 6,6,0


Decision Tree
A decision tree is a flowchart-like structure used to make decisions or predictions.
It consists of nodes representing decisions or tests on attributes, branches
representing the outcomes of these decisions, and leaf nodes representing final
outcomes or predictions.
Decision nodes are used to make a decision and have multiple branches,
whereas leaf nodes are the outputs of those decisions. A decision tree can handle
categorical data (YES/NO) as well as numeric data.
It is a graphical representation of all the possible solutions to a
problem or decision based on given conditions.
The decision tree is a supervised learning technique.
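
The structure described above (decision nodes, branches, and leaf nodes) can be sketched as a nested dictionary. The tree below is a hand-built illustration with assumed attributes ("outlook", "humidity", "windy"); it is not learned from data, and a real supervised learner would build such a tree from labelled examples.

```python
# Decision nodes are dicts with an attribute to test and one branch per outcome;
# leaf nodes are plain strings holding the final decision.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "NO", "normal": "YES"}},
        "overcast": "YES",
        "rainy": {"attribute": "windy",
                  "branches": {True: "NO", False: "YES"}},
    },
}

def classify(node, sample):
    """Follow the branch that matches the sample at each decision node
    until a leaf node is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))
print(classify(tree, {"outlook": "rainy", "windy": False}))
```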
Machine Learning
Machine learning (ML) is a subdomain of artificial intelligence (AI) that focuses on
developing systems that learn, or improve their performance, based on the data they ingest.

Machine learning is the field of study that gives computers the capability to
learn without being explicitly programmed.
Issues in Machine Learning

1. Process Complexity of Machine Learning

2. Monitoring and maintenance

3. Inadequate Training Data

4. Poor quality of data
5. Customer Segmentation
History of Machine Learning
Some 40 to 50 years ago, machine learning was science fiction, but today
it is a part of our daily life. Machine learning makes our day-to-day life easier,
from self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind
machine learning is old and has a long history.

Data Science Vs Machine Learning

Data Science | Machine Learning
Branch that deals with data. | Machines utilize data science techniques to learn about the data.
Involves many operations: data gathering, data cleaning, data manipulation, etc. | It is of three types: 1. Supervised learning, 2. Unsupervised learning, 3. Reinforcement learning.
Needs the entire analytics universe. | Combination of machine and data science.
It is a broad term for multiple disciplines. | It fits within data science.
Data scientists spend a lot of time handling the data, cleansing the data, and understanding its patterns. | ML engineers spend a lot of time managing the complexities that occur during the implementation of algorithms and the mathematical concepts behind them.
Example: Netflix uses data science technology. | Example: Facebook uses machine learning technology.
Framework for building ML Systems: KDD process model
KDD Process
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of
useful, previously unknown, and potentially valuable information from large datasets.
The focus is on the discovery of useful knowledge, rather than simply finding patterns in
data.
Steps
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge representation and visualization

Advantages of KDD
1. Improves decision-making.
2. Increased efficiency
3. Better customer service
4. Fraud detection

Disadvantages of KDD
1. Privacy concerns
2. Complexity
3. Data Quality
4. High cost.

Supervised learning

Supervised learning is a learning mechanism that infers the underlying relationship
between the observed data (also called input data) and a target variable.

Classification: A classification problem is when the output variable is a category, such
as "red" or "blue", or "disease" or "no disease".

Regression: A regression problem is when the output variable is a real value, such as
"dollars" or "weight".
Types:
• Regression
• Logistic Regression
• Classification
• Naive Bayes Classifiers
• K-NN (k nearest neighbors)
• Decision Trees
• Support Vector Machine
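
As an illustration of supervised classification, the sketch below implements K-NN from the list above in plain Python. The labelled training points are made-up example data and `knn_predict` is a hypothetical name; the idea is that a new point receives the majority label of its k closest labelled neighbours.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs. The query point gets the
    most common label among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Labelled (input, target) examples: this is what makes the task supervised.
training_data = [
    ((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
    ((6, 6), "blue"), ((7, 6), "blue"), ((6, 7), "blue"),
]
print(knn_predict(training_data, (2, 2)))
print(knn_predict(training_data, (6, 5)))
```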

Unsupervised Machine Learning

Unsupervised learning is a type of machine learning in which models are
trained using an unlabeled dataset and are allowed to act on that data without any
supervision.

We have the input data but no corresponding output data; the desired
output is unknown.

Working of Unsupervised Learning

[Figure: working of unsupervised learning]

Here, we take unlabeled input data, which means it is not
categorized and the corresponding outputs are not given. This unlabeled
input data is fed to the machine learning model in order to train it. First, the model
interprets the raw data to find hidden patterns in it, and then it
applies a suitable algorithm such as k-means clustering or hierarchical clustering.
The algorithm then divides the data objects
into groups according to the similarities and differences between the objects.
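
The grouping step described above can be sketched with k-means clustering. This is a simplified one-dimensional version under stated assumptions: the first k points serve as the initial centroids (real implementations usually choose them randomly), the number of iterations is fixed, and the data values are made up.

```python
def k_means(points, k, iterations=10):
    """Group unlabeled 1-D points into k clusters.
    Returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # no labels are provided
centroids, clusters = k_means(data, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge only from the similarity (here, the distance) between the values.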

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

1. K-means clustering
2. KNN (k-nearest neighbors), although it is normally used as a supervised method
3. Hierarchical clustering
4. Anomaly detection
5. Neural Networks
6. Principal Component Analysis (PCA)
7. Independent Component Analysis
8. Apriori algorithm
9. Singular value decomposition

Disadvantages of Unsupervised Learning

1. It is more difficult than supervised learning.
2. The results are less accurate.

Types of Unsupervised Learning Algorithm

Unsupervised learning algorithms can be further categorized into two
types of problems.
Clustering: Clustering is a method of grouping objects into clusters
such that the objects with the most similarities remain in one group and have few
or no similarities with the objects of another group.
Association: Association rule learning is an unsupervised learning method
used for finding relationships between variables in a large
database. It determines the sets of items that occur together in the dataset.
Association rules make marketing strategies more effective. For example, people
who buy item X (say, bread) also tend to purchase item Y
(butter or jam). A typical example of association rules is Market Basket
Analysis.
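
The bread and butter example can be made concrete with the two standard measures used in market basket analysis, support and confidence. The transactions below are made-up data and the function names are assumptions; the sketch only evaluates a single rule and is not a full rule-mining algorithm such as Apriori.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]
print(support(baskets, {"bread"}))                # 3 of 4 baskets
print(confidence(baskets, {"bread"}, {"butter"})) # 2 of the 3 bread baskets
```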

Differences between Classification and Clustering

Classification | Clustering
Classification is used for supervised learning. | Clustering is used for unsupervised learning.
Classification is more complex. | Less complex; only grouping is done.
Classifies input instances based on their corresponding class labels. | Groups the instances based on their similarity.
Two-step process (train + predict). | Single-step process.
Number of categories is known. | Number of groups is unknown.
Examples: 1. Logistic regression, 2. Naive Bayes classifier, 3. Support vector machines. | Examples: 1. k-means clustering, 2. Fuzzy c-means, 3. Gaussian (EM) clustering.

Support Vector Machines (SVM)
1. Support Vector Machine, or SVM, is one of the most popular supervised
learning algorithms.
2. It is used for classification as well as regression problems.
3. Primarily, it is used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that we can easily put a new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine.
The SVM algorithm can be used for face detection, image classification, and text
categorization.

Support Vectors – The data points that are closest to the hyperplane are called
support vectors. The separating line is defined with the help of these data points.
Hyperplane – It is a decision plane or space that divides a set of objects
having different classes.
Margin – It may be defined as the gap between the two lines through the closest data
points of different classes. It can be calculated as the perpendicular distance from
the line to the support vectors. A large margin is considered a good margin, and a
small margin is considered a bad margin.
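
The margin calculation can be sketched for a hyperplane written as w·x + b = 0, a notation assumed here and not used in these notes. The perpendicular distance from a point x to the hyperplane is |w·x + b| / ||w||, and the points with the smallest distance are the support vectors. The hyperplane and data points below are hand-picked for illustration; an actual SVM would learn w and b from training data.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def margin_and_support_vectors(w, b, points):
    """Return the smallest distance to the hyperplane and the points at
    that distance (the support vectors)."""
    distances = {p: distance_to_hyperplane(w, b, p) for p in points}
    smallest = min(distances.values())
    vectors = [p for p, d in distances.items() if math.isclose(d, smallest)]
    return smallest, vectors

# Assumed separating line x + y = 6, i.e. w = (1, 1) and b = -6.
w, b = (1, 1), -6
points = [(1, 2), (2, 2), (5, 4), (4, 4)]  # two classes, one on each side
print(margin_and_support_vectors(w, b, points))
```

Here the distance from the line to the nearest points is the square root of 2, and the support vectors are (2, 2) and (4, 4), one from each class.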

Example: SVM can be understood with the example used for the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs, and we want a model that can accurately identify whether it is a cat or a
dog. Such a model can be created using the SVM algorithm. We
first train the model with many images of cats and dogs so that it can learn
the different features of cats and dogs, and then we test it with this strange
creature. Because the SVM creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors), it looks at the
extreme cases of cat and dog. On the basis of the support vectors, it classifies
the creature as a cat.
Bayesian Network

1. It is used to solve problems that involve uncertainty.
2. It is also called a Bayes network, belief network, decision network,
or Bayesian model.
3. It represents conditional dependencies among variables using a directed acyclic graph (DAG).
4. It is used as a supervised learning algorithm.

ANN (Artificial Neural Network)

An artificial neural network is a computational network modelled on the
biological neural networks that make up the structure of the human brain.

It can be used as an unsupervised learning algorithm.

An artificial neural network primarily consists of three layers: the input layer,
the hidden layer, and the output layer.
Genetic Algorithm

1. Genetic algorithm is a general heuristic search method designed for finding the
optimal solution to a problem.
2. Supervised Learning algorithms
3. It uses operators such as selection, crossover, and mutation.

GA applications
Some examples of GA applications include:
1. Optimizing decision trees for better performance
2. Solving sudoku puzzles
3. Hyperparameter optimization
4. Causal inference

The algorithm starts with a set of trial structures, or parents, and uses their fitness to
create a new generation, or offspring.
GAs are well suited to problems with large search spaces or a noisy fitness
function. They can be competitive with other methods and can be implemented in
parallel.
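
The loop of parents, fitness, and offspring can be sketched on a toy problem. The example below is an assumption chosen for brevity (the "OneMax" problem, where fitness is the number of 1 bits in a bit string); the parameter values, tournament selection, and elitism are illustrative design choices, not taken from these notes.

```python
import random

def one_max_ga(length=20, pop_size=30, generations=40,
               mutation_rate=0.01, seed=0):
    """Evolve bit strings towards all ones.
    Returns (best individual, best fitness recorded per generation)."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit string = number of 1 bits
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []

    def select():
        # Selection: binary tournament, the fitter of two random parents wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        offspring = [best[:]]  # elitism: keep the best parent unchanged
        while len(offspring) < pop_size:
            parent1, parent2 = select(), select()
            cut = rng.randint(1, length - 1)        # crossover point
            child = parent1[:cut] + parent2[cut:]   # single-point crossover
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        population = offspring

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = one_max_ga()
print(sum(best), history[0], history[-1])
```

Because the best parent is always copied into the next generation, the best fitness never decreases from one generation to the next.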
Reinforcement Learning

1. Reinforcement learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions and
seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
2. It is used to solve complex problems that are hard to handle with supervised or
unsupervised learning.
3. There is no labeled data, so the agent is bound to learn from its experience only.
4. Typical applications include game playing and robotics.

MitraStar GPT-2541GNAC Users Manual
100% (1)
MitraStar GPT-2541GNAC Users Manual
226 pages
DataWarehouse Concept
100% (1)
DataWarehouse Concept
18 pages
Notes Unit 2 DW&DM 4th Year
No ratings yet
Notes Unit 2 DW&DM 4th Year
10 pages
Understanding SAP Versions: Optimizing Packaged Applications
No ratings yet
Understanding SAP Versions: Optimizing Packaged Applications
20 pages
CitizensCharter2020 Philhealth
No ratings yet
CitizensCharter2020 Philhealth
570 pages
Lecture 13 - 22 Software Testing - Black Box Testing
No ratings yet
Lecture 13 - 22 Software Testing - Black Box Testing
57 pages
Experiment No - 1 Implement DDL Commmand DDL:-: 1.create Command
No ratings yet
Experiment No - 1 Implement DDL Commmand DDL:-: 1.create Command
15 pages
Data Warehouse
No ratings yet
Data Warehouse
71 pages
RedHat Ansible Automation Platform 2.4 - Getting Started With Automation Platform Planning Guide
No ratings yet
RedHat Ansible Automation Platform 2.4 - Getting Started With Automation Platform Planning Guide
45 pages
JIRA Questions
50% (2)
JIRA Questions
5 pages
Cpp final OP final akash
No ratings yet
Cpp final OP final akash
26 pages
1.2 System Software
No ratings yet
1.2 System Software
32 pages
Hoffer Mdm12e PP Ch03
No ratings yet
Hoffer Mdm12e PP Ch03
34 pages
Veritas Netbackup™ Upgrade Guide: Release 8.0
No ratings yet
Veritas Netbackup™ Upgrade Guide: Release 8.0
131 pages
BC0058 SLM Unit 02
No ratings yet
BC0058 SLM Unit 02
13 pages
Block Chain
No ratings yet
Block Chain
14 pages
Data-ware-unit-2 (1)
No ratings yet
Data-ware-unit-2 (1)
23 pages
01-11 Sistem Informasi Pendataan Narapidana Pada Lapas Menggunakan Web
No ratings yet
01-11 Sistem Informasi Pendataan Narapidana Pada Lapas Menggunakan Web
11 pages
Unit2__1_DWstrategy_4.11.24
No ratings yet
Unit2__1_DWstrategy_4.11.24
64 pages
Unit2 1 DWstrategy
No ratings yet
Unit2 1 DWstrategy
65 pages
Chapter_4_Data Warehouse Indexes
No ratings yet
Chapter_4_Data Warehouse Indexes
11 pages
Physics Pta One Mark
No ratings yet
Physics Pta One Mark
24 pages
TN 1469 Creating an Industrial Graphic Directly From a Tag in 2023 R2 - InSource
No ratings yet
TN 1469 Creating an Industrial Graphic Directly From a Tag in 2023 R2 - InSource
6 pages
Network+ Guide To Networks, Fourth Edition
No ratings yet
Network+ Guide To Networks, Fourth Edition
34 pages
Unit2 - DWDM Notes
No ratings yet
Unit2 - DWDM Notes
63 pages
Customer Relationship Management: Concepts and Technologies
No ratings yet
Customer Relationship Management: Concepts and Technologies
53 pages
Data Warehousing, An Introduction
No ratings yet
Data Warehousing, An Introduction
12 pages
Data Warehouses: FPT University
No ratings yet
Data Warehouses: FPT University
39 pages
Supply Chain of Dell
No ratings yet
Supply Chain of Dell
16 pages
Unit II
No ratings yet
Unit II
92 pages
2 Data Warehousing (1) - 240611 - 232451
No ratings yet
2 Data Warehousing (1) - 240611 - 232451
18 pages
Data Warehousing Fundamentals Paulraj Ponniah
75% (4)
Data Warehousing Fundamentals Paulraj Ponniah
518 pages
04 Data Warehouse
No ratings yet
04 Data Warehouse
13 pages
2483381 - MB5M - Neither Quality Nor Blocked Stock Displayed _ SAP Knowledge Base Article
No ratings yet
2483381 - MB5M - Neither Quality Nor Blocked Stock Displayed _ SAP Knowledge Base Article
3 pages
Software Requir SRS
No ratings yet
Software Requir SRS
3 pages
Data Warehouse Unit-I
No ratings yet
Data Warehouse Unit-I
33 pages
DWDM - UNIT-2
No ratings yet
DWDM - UNIT-2
12 pages
CV Ionescu Robert Constantin 21 01 2021
No ratings yet
CV Ionescu Robert Constantin 21 01 2021
3 pages
UNIT 2
No ratings yet
UNIT 2
17 pages
CBEC4103 Data Warehousing
No ratings yet
CBEC4103 Data Warehousing
10 pages
Lesson 2 - Data Warehouse
No ratings yet
Lesson 2 - Data Warehouse
5 pages
DWDM - Unit 2
No ratings yet
DWDM - Unit 2
26 pages
W Bs Dictionary
No ratings yet
W Bs Dictionary
255 pages
Data Warehousing-Notes(Module -I & II) (1) (1)
No ratings yet
Data Warehousing-Notes(Module -I & II) (1) (1)
32 pages
Cloud Computing Security Breaches
No ratings yet
Cloud Computing Security Breaches
54 pages
Core 19 Warehousing and Inventory Management Sem VI
No ratings yet
Core 19 Warehousing and Inventory Management Sem VI
52 pages
Data Warehousing and Data Mining
No ratings yet
Data Warehousing and Data Mining
135 pages
electricity
No ratings yet
electricity
10 pages
Data Warehousing Notes
No ratings yet
Hardware and Operating Systems for Data Warehousing

Warehouse management involves planning, designing, developing,

1. Receiving: Receiving involves the process of transfer of the

2. Put Away: Put-away is the second warehouse process and

3. Storage: Storage is the warehouse process in which goods

6. Shipment: Shipping is the final warehouse process and

Planning a warehouse involves the following steps:

Implementing a warehouse requires the following steps:

b. Requirement analysis and capacity planning: Defining

PARALLEL PROCESSORS AND CLUSTER SYSTEM

a. Two or more computers work together to provide high speed. Each

a. A distributed database is essentially a database that is distributed

b. A distributed database system is spread across several locations with

The best retail inventory management software

The best warehouse management system for e-commerce

The best warehouse management software for small business

The best warehouse management software with Agile solution

5. Blue Link ERP –

The best warehouse management software for medium-size

The process of solving a problem consists of five steps

Problem Solving In AI Examples

1. Only one disk may be moved at a time.

1. No two queens share the same row.

2. No two queens share the same column.

3. No two queens share the same diagonal.
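
The three conditions above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function name `is_valid_placement` and the board representation (a list where index = row and value = column) are assumptions chosen for illustration. Storing one column per row means the row condition holds by construction, so only columns and diagonals are compared.

```python
def is_valid_placement(cols):
    """Return True if no two queens attack each other.

    cols[r] is the column of the queen in row r (hypothetical
    representation: exactly one queen per row, so condition 1
    is satisfied automatically).
    """
    n = len(cols)
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            # Condition 2: two queens in the same column.
            same_column = cols[r1] == cols[r2]
            # Condition 3: equal row and column distance means a shared diagonal.
            same_diagonal = abs(cols[r1] - cols[r2]) == r2 - r1
            if same_column or same_diagonal:
                return False
    return True
```

For example, on a 4x4 board the placement `[1, 3, 0, 2]` passes all three conditions, while `[0, 1, 2, 3]` fails because all queens lie on one diagonal.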

3. Travelling Salesman Problem

Travelling salesman problem also called the traveling salesperson problem

Condition of Water Jug Problem

Condition of Water Jug Problem

5(litres)+ 3(litres) >= 8(litres)

How can you get exactly 2 gallons of water in the 4-gallon

8(litres)+ 5(litres) >= 12(litres)

1. Process Complexity of Machine Learning

2. Monitoring and maintenance

3. Inadequate Training Data

Data Science Vs Machine Learning

Data Science Machine Learning

Many operations It is three types

Need the entire analytics universe. Combination of Machine and Data

It is a broad term for multiple It fits within data science.

Example: Netflix uses Data Science Example: Facebook uses Machine

Supervised learning is a learning mechanism that infers the underlying relationship

Classification: A classification problem is when the output variable is a category, such

Unsupervised Machine Learning

Working of Unsupervised Learning

Here, we have taken an unlabeled input data, which means it is not

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

2. KNN (k-nearest neighbors)

6. Principal Component Analysis (PCA)

7. Independent Component Analysis

9. Singular value decomposition

Disadvantages of Unsupervised Learning

Types of Unsupervised Learning Algorithm

Differences between Classification and Clustering

Classification is used for clustering is used for

Classification is more Less Complex only

Input instances based on Grouping the instances

Two Step Process Single Step Process

No of Categories Known No of Group Unknown

Examples are Examples are

1. Logistic regression 1. k-means clustering

1. Solve a problem which has uncertainty

2. It is also called a Bayes network, belief network, decision network,

3. Conditional dependencies using a directed acyclic graph.

4. Supervised learning algorithms

ANN (Artificial Neural Network)

Unsupervised learning algorithms

1. Reinforcement Learning is a feedback-based Machine learning technique in

2. Solve More Complex Problem of Supervised and Unsupervised Learning
