DSA2324 Lecture 01 Introduction To Data Science


DATA SCIENCE AND AUTOMATION
Master degree in MECHATRONICS AND SMART TECHNOLOGY ENGINEERING

Lecture 01: Introduction to data science

SPEAKER: Davide Previtali
PLACE: University of Bergamo
Who am I
Davide Previtali
• Currently: Fixed-term Assistant Professor (RTD-A) @ Control systems
and Automation Laboratory
• Studies: Ph.D. in Engineering and Applied Sciences @ University of
Bergamo. MSc in Computer Science Engineering @ University of
Bergamo
• Research topics: control systems, black-box and preference-based
optimization, machine learning
[email protected]

Control systems and Automation Laboratory (CAL)


• @ University of Bergamo
• Members: 6 professors, 6 Ph.D. students
• Research topics: control systems, advanced control, optimization,
fault diagnosis, system identification, machine learning
• http://cal.unibg.it/
• https://www.linkedin.com/school/cal-unibg/
2 /94
Outline
1. Course introduction

2. Data science and the data-driven company

3. Data and its types

4. What we are going to do with data (supervised and unsupervised learning)

5. Static and dynamical models in supervised learning

6. From business problems to data science tasks

7. The data mining life cycle (CRISP-DM)

3 /94
Outline: 1. Course introduction
4 /94
Course prerequisites
A good knowledge of the following topics is strongly recommended:
• Linear algebra
• Calculus 1 and Calculus 2
• Statistics
• Dynamical systems (minor)

Please fill out the following questionnaire to assess your knowledge of the prerequisites:

https://forms.gle/LpvgCqZNM1oEYuwV8

5 /94
Course prerequisites
How to «refresh» the prerequisites?
• Linear algebra
UniBg course; Gilbert Strang's MIT course on YouTube; addendum provided among the materials of this course

• Calculus 1 and Calculus 2


UniBg course

• Statistics
Brief review at the beginning of this course, UniBg course

• Dynamical systems
UniBg course

6 /94
Evaluation

• Written exam, 1 hour 30 minutes (up to 20 points)
  • True-false and multiple-choice questions
  • Open questions
+
• Data science project with discussion (up to 10 points)
  • You will receive more information on the project during the course

7 /94
Educational objectives
At the end of the course, you will be able to:

• Formulate a business problem as a data science problem

• Formulate and solve regression and classification problems

• Formulate and solve image analysis and object recognition problems

• Apply clustering and dimensionality reduction techniques

• Identify a dynamical system

• Evaluate the goodness of a model estimated from data

• Visualize and present the results of a data science project

8 /94
Teaching materials
Provided materials
• Lessons’ slides

• MATLAB code

Focus: learn the methods, not the programming language

All the course materials are available on the following Microsoft Teams team:

https://teams.microsoft.com/l/team/19%3ATl9qqZhFmx62dyo_Crb2ntZWlv9Uqdb-0oSPCIN4GBM1%40thread.tacv2/conversations?groupId=0c35adb4-0a2f-4c2f-aa25-1dd394f1f490&tenantId=

9 /94
Structure of the course

The course will be divided into theory and practice sessions

✓ Theory lessons will (mostly) last 2 hours, on Mondays

✓ Practice lessons will (mostly) last 3 hours, on Fridays

10 /94
Teaching materials
Suggested books
• Foster Provost, Tom Fawcett. Data Science for Business: What you need to know about data mining and data-analytic thinking, O'Reilly Media, Inc. (2013)
• T. Hastie, R. Tibshirani, J. Friedman. The elements of statistical learning: data mining, inference, and prediction, 2nd Edition, Springer (2009)
• G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning, 2nd Edition, Springer (2021)
• Andrew Gelman, Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press (2006)

11 /94
Teaching materials
Suggested books
• Cole Nussbaumer Knaflic. Storytelling with data: a data visualization guide for business professionals, Wiley (2015)
• I. Goodfellow, Y. Bengio, A. Courville. Deep Learning, The MIT Press (2016)
• Christopher Bishop. Pattern recognition and machine learning, Springer (2006)
• Michel Verhaegen, Vincent Verdult. Filtering and system identification: a least squares approach, Cambridge University Press (2007)

12 /94
Teaching materials
Interactions and feedback

• During the course, I will give you activities to carry out and tests to answer

• They are optional, but they help you assess your level of understanding before the exam

• In addition, they can earn you a bonus of at most +3 points on the final grade

We will use the Assignments feature of Microsoft Teams

13 /94
Syllabus
1. Introduction to data science
2. Exploratory data analysis
3. Recap of statistics
4. Maximum likelihood estimation
5. Linear regression
6. Logistic regression
7. Bias-variance trade-off
8. Overfitting and regularization
9. Validation and cross-validation
10. Decision trees
11. Neural networks
12. Convolutional neural networks
13. Clustering methods
14. Principal component analysis
15. Output-error method for system identification (extra)
14 /94
Outline: 2. Data science and the data-driven company
15 /94
«Data is the new oil»

[Figure: the oil analogy. Crude oil extraction fills barrels of oil, which a refinement process turns into fuels and oils (for automobiles, planes, generators, engines, ...), asphalt (for infrastructures, streets, ...) and plastic (for bottles, containers, films, ...). In the same way, data acquisition collects data (a single piece of data being a datum), which descriptive analytics and reporting, modeling and forecasting, and finally actions turn into value: product quality (machine parameters optimization), goods forecast (production and purchasing management) and process optimization (reduction of materials used).]

24 /94
Data is the new oil and data science is «sexy»
The data scientist role has been deemed the sexiest job of the 21st century [7]

• Virtually every aspect of business is now open to data collection (operations, manufacturing, supply-chain management, customer behaviour, marketing campaigns)

• Collected information needs to be analyzed properly in order to get actionable results

• Huge amounts of data require dedicated infrastructure to be handled

• Huge amounts of data require computational power to be analyzed

• We can let computers make decisions based on past data

• Specific job titles are emerging

25 /94
Job positions that involve data
Data analyst
• Data retrieval (database queries)
• Spot trends and patterns in the data
• Visualize the data and produce reports to present information to third parties
• …

Data scientist
• Use different machine learning techniques to derive insights from data to guide business decisions
• Make predictions on products, assets and consumer behavior based on past data
• …

Data engineer
• Design and maintain data management systems
• Data collection and management
• Make data accessible to the other members of the data science team
• …

Machine learning engineer
• Design and implementation of machine learning methods
• Extend existing machine learning frameworks and libraries
• …

And many more…

Often, career opportunities require a good mix of all the aforementioned skills

26 /94
What is data science?
Data science is a set of fundamental principles, processes and techniques that guide the
extraction of knowledge from data with the goal of improving decision-making

It is an interdisciplinary academic field that is based on:


• Mathematics
• Statistics
• Machine learning and artificial intelligence
• Specialized programming

Data mining is the extraction of knowledge from data, via technologies that incorporate
data science principles

27 /94
The data-driven company
Data-driven decision-making (DDD) refers to the
practice of basing decisions on the analysis of data, rather
than purely on intuition [1, 2]
• Some decisions can be made automatically (finance,
recommendations)

• Data engineering and processing support many data-oriented business tasks, but do not necessarily involve extracting knowledge or data-driven decision-making

• Data, and the capability to extract useful knowledge from data, should be regarded as a key strategic asset
✓ Be willing to invest in acquiring the right data (even at the cost of losing money)
✓ Understand data science even if you will not practice it yourself

Picture taken from [1]

28 /94
Anti-HiPPO culture (HiPPO: Highest Paid Person's Opinion)

29 /94
The road to becoming data-driven

1. Data Denial: data are not used and are viewed with distrust
2. Data Indifference: there is no interest in acquiring or using data
3. Data Aware: data are collected and used for monitoring, but no decisions are made based on them
4. Data Informed: data are mainly used by managers in decision-making
5. Data-Driven: data play a central role in the most disparate decisions that are made in the various business sectors

30 /94
Why become data-driven?

• Data-driven companies are 5% more productive [2]
• Each $1 invested in analytics pays back $13 [3]

31 /94
Why become data-driven?
Business value created by Artificial Intelligence by 2030 [4]: $13 trillion in total, including:
• Retail: $0.8T
• Travel: $480B
• Logistics: $475B
• Automotive & assembly: $405B
• Materials: $300B
• Advanced electronics & semiconductors: $291B
• Healthcare systems & services: $267B
• High tech: $267B
• Telecom: $174B
• Oil & gas: $173B
• Agriculture: $164B

It is difficult to find an industrial sector that will not benefit from artificial intelligence in
the near future

32 /94
Outline: 3. Data and its types
33 /94
What are data?

1 zettabyte = $10^{12}$ gigabytes

We refer to data as any piece of information that has been collected and stored in a computer

Examples:
• Sensor measurements
• Customer information
• Transaction history
• Social media posts
• …

[Figure: amount of data created, consumed and stored worldwide [8], data volume in zettabytes: 2, 5, 6.5, 9, 12.5, 15.5, 18, 26, 33, 41, 64.2, 79, 97, 120, 147, 181; the most recent values are forecasts]

34 /94
Types of data: structured vs unstructured
Structured data
Data that are organized following a predefined scheme and stored in tabular formats (Excel sheets, SQL databases…)

House area [feet²]   # bedrooms   Price [k$]
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮

Unstructured data
Data that can have an internal structure but do not follow a predefined data model or scheme
Examples: audio files, text files, video files, image files

35 /94
Types of data: quantitative vs qualitative
Runner name      Sex   Placement   Time [seconds]
Orlando Dillon   M     First       14.75
Izabella Kent    F     Second      15.01
Sophia Sanders   F     Third       15.33
⋮                ⋮     ⋮           ⋮

• Qualitative (or categorical) data assume non-numerical values, typically belonging to predefined categories
  • Nominal qualitative data (e.g., Sex) cannot be ordered
  • Ordinal qualitative data (e.g., Placement) can be ordered. Other examples: low/high income, age ranges…
• Quantitative (or continuous) data (e.g., Time) assume numerical values

36 /94
Data are dirty
Common data problems:
• Missing values
• Unlikely values (outliers)
• Inconsistent formats
• …

House area [feet²]   # bedrooms   Completion date   Price [k$]
523                  1            23/06/1998        115
645                  1            01/07/2000        0.001
708                  unknown      19/01/1980        210
1034                 3            31-Jan-2001       unknown
unknown              4            17/12/2005        355
2545                 unknown      14/02/1999        440
⋮                    ⋮            ⋮                 ⋮

Typically, data must be cleaned before use (data cleaning)

37 /94
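As a minimal illustration of data cleaning, the Python sketch below parses the table above, assuming that the string "unknown" is the missing-value marker and, hypothetically, that prices below 1 k$ are implausible and should be treated as missing. Dates written in either of the two formats found in the table are normalized to a single representation.

```python
from datetime import datetime

# Rows of the table above, kept as raw strings; "unknown" marks a missing entry.
RAW_ROWS = [
    ("523", "1", "23/06/1998", "115"),
    ("645", "1", "01/07/2000", "0.001"),
    ("708", "unknown", "19/01/1980", "210"),
    ("1034", "3", "31-Jan-2001", "unknown"),
    ("unknown", "4", "17/12/2005", "355"),
    ("2545", "unknown", "14/02/1999", "440"),
]

DATE_FORMATS = ("%d/%m/%Y", "%d-%b-%Y")  # the two date formats that appear in the table
MIN_PLAUSIBLE_PRICE = 1.0  # hypothetical threshold [k$]: lower prices are treated as unlikely


def parse_number(text):
    """Convert a cell to float, mapping the missing-value marker to None."""
    return None if text == "unknown" else float(text)


def parse_date(text):
    """Try each known format so that all dates end up in one representation."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    cleaned = []
    for area, bedrooms, date, price in rows:
        record = {
            "area": parse_number(area),
            "bedrooms": parse_number(bedrooms),
            "completion_date": parse_date(date),
            "price": parse_number(price),
        }
        # Treat unlikely values as missing rather than keeping them silently.
        if record["price"] is not None and record["price"] < MIN_PLAUSIBLE_PRICE:
            record["price"] = None
        cleaned.append(record)
    return cleaned


cleaned = clean(RAW_ROWS)
complete = [r for r in cleaned if None not in r.values()]
print(len(complete), "complete row(s) out of", len(cleaned))
```

Dropping incomplete rows, as done here, is only one option; imputing missing values is a common alternative, and the right choice depends on the analysis.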
Outline: 4. What we are going to do with data (supervised and unsupervised learning)
38 /94
What are we going to do with data?
In this course, we will use data for:

• Descriptive analysis and visualization (Lecture 02)

• Supervised learning, in particular regression and classification (Lectures 05-12, 15)

• Unsupervised learning, in particular clustering and dimensionality reduction (Lectures 13, 14)

39 /94
Supervised vs unsupervised learning
Many data science tasks can be tackled either by supervised or unsupervised learning methods

• Supervised learning: predict the values of one or more dependent variables (output(s))
based on the values of one or more independent variables (input(s))

Inputs $\boldsymbol{\varphi}$ (features) → Outputs $\boldsymbol{y}$ (targets)
Typically, we will focus on supervised learning problems with only one output

• Unsupervised learning: there are no outputs! The goal may be to discover groups of similar
entities within the data or to project the data from a high-dimensional space (#inputs > 3)
down to two or three dimensions for the purpose of visualization

40 /94
Data science tasks
• Regression*: predict the values assumed by the continuous output(s) from the input(s)
Example:
➢ Predict the prices of houses based on their area: $\varphi \in \mathbb{R}$ (house area), $y \in \mathbb{R}$ (price)
➢ Predict the prices of houses based on their area and number of bedrooms: $\boldsymbol{\varphi} \in \mathbb{R}^{2\times 1}$, $y \in \mathbb{R}$

House area [feet²]   # bedrooms   Price [k$]
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮

*: covered in this course
41 /94
Data science tasks
• Classification*: predict the values assumed by the categorical output(s) from the input(s)
Example: ➢ Develop an application that recognizes cats in images

[Figure: example images, labeled Cat, Not cat, Cat, Not cat]

Input: an image, $\boldsymbol{\varphi} \in \mathbb{N}^{W\times H\times D}$ (width × height × depth, i.e. color channels). Images are basically arrays of numbers that describe color intensity
Output: the class label, $y \in \{\text{Cat}, \text{Not cat}\}$ (single output)

*: covered in this course
42 /94


Data science tasks
• Classification*: predict the values assumed by the categorical output(s) from the input(s)
Example: ➢ Distinguish cats from dogs based on their height and weight

[Figure: scatter plot of cats and dogs; axes: Weight [kg] and Height [cm]]

Input: $\boldsymbol{\varphi} \in \mathbb{R}^{2\times 1}$ (height and weight of the animal)
Output: the class label, $y \in \{\text{cat}, \text{dog}\}$ (single output)

*: covered in this course
43 /94


Data science tasks
• Causal modeling: identify which inputs (causes) actually influence the outputs (effects) and, possibly, to what extent
Example: ➢ Did a particular marketing campaign influence the consumers to purchase our product?

Causal modeling typically involves substantial investments in data, such as randomized controlled experiments (A/B tests), and sophisticated methods for drawing causal conclusions from observational data (“counterfactual” analysis)

What would be the difference in sales if we used one advertisement instead of another?

Technical note: regression and classification are based on correlation, whereas causal modeling is based on causality

*: covered in this course
44 /94


Data science tasks
• Causal modeling: identify which inputs (causes) actually influence the outputs (effects)
and, possibly, to what extent

Correlation does not imply causation!


If we take a look at the data representing
monthly ice cream sales and monthly shark
attacks around the United States each year, we
can see that the two variables are highly
correlated
• Does this mean that consuming ice cream
causes shark attacks? No! The more likely
explanation is that more people consume ice
cream and get in the ocean when it’s warmer
outside, explaining the high correlation

Picture taken from [9]

*: covered in this course
45 /94
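The ice cream example can be reproduced with a small simulation. In the Python sketch below, with made-up coefficients, temperature (the hypothetical common cause) drives both ice cream sales and shark attacks, which never influence each other. The raw correlation is high, but it essentially vanishes once the linear effect of temperature is removed from both variables (a partial correlation).

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of the least-squares line y ≈ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


random.seed(0)
n = 1000
# Hypothetical data-generating process: temperature is the only common cause.
temperature = [random.uniform(10, 35) for _ in range(n)]
ice_cream_sales = [20 * t + random.gauss(0, 30) for t in temperature]
shark_attacks = [0.5 * t + random.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream_sales, shark_attacks)
partial_corr = pearson(residuals(ice_cream_sales, temperature),
                       residuals(shark_attacks, temperature))
print(f"correlation: {raw_corr:.2f}, after removing temperature: {partial_corr:.2f}")
```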


Data science tasks
• Clustering*: organize the data into different groups based on their similarity
Example: ➢ Understand which types of customers are similar to each other by grouping individuals according to several characteristics → personalized marketing campaigns

[Figure: customers plotted by Customer age [years] and Amount spent [$], forming three groups: young people with low budget, middle-aged people with high budget, older people with medium budget]

Input: $\boldsymbol{\varphi} \in \mathbb{R}^{2\times 1}$ (customer age and amount spent)
Output: none

*: covered in this course
46 /94
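K-means clustering (one of the methods mentioned later in this lecture) can be sketched in a few lines of Python. The customer data are hypothetical, the farthest-point initialization is one deterministic choice among several, and features are not rescaled, which works here only because the made-up groups are well separated in amount spent.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start: each new centroid is the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: every centroid moves to the mean of its cluster (unchanged if empty).
        new_centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids (hence assignments) no longer change
            break
        centroids = new_centroids
    return centroids, clusters


random.seed(1)
# Hypothetical customers (age [years], amount spent [$]) around three made-up group centers.
GROUP_CENTERS = [(25, 200), (45, 1500), (70, 700)]
customers = [
    (random.gauss(age, 4), random.gauss(spent, 50))
    for age, spent in GROUP_CENTERS
    for _ in range(30)
]

centroids, clusters = k_means(customers, k=3)
for age, spent in sorted(centroids):
    print(f"centroid: age {age:.0f} years, amount spent {spent:.0f} $")
```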


Data science tasks
• Co-occurrence grouping: find associations between different entities (characterized by a set of features) based on transactions involving them
Example: ➢ What items are commonly purchased together? (market basket analysis), e.g., mouse, mousepad and keyboard

Clustering looks at the similarity between entities based on their features, whereas co-occurrence grouping considers the similarity of entities based on their appearing together in transactions (e.g., “a keyboard is not similar to a mouse, although they are typically bought together”)

*: covered in this course
47 /94
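A minimal market basket sketch in Python: count how often each pair of items appears in the same transaction and compute its support (the fraction of transactions containing both items). The transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items bought together in one purchase.
transactions = [
    {"mouse", "mousepad", "keyboard"},
    {"mouse", "mousepad"},
    {"keyboard", "monitor"},
    {"mouse", "mousepad", "monitor"},
    {"keyboard", "mouse"},
]

pair_counts = Counter()
for basket in transactions:
    # sorted() gives each unordered pair a single canonical key
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: fraction of transactions that contain both items of the pair.
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}
best_pair, best_support = max(support.items(), key=lambda item: item[1])
print(best_pair, best_support)
```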


Data science tasks
• Profiling: find the typical behavior of an individual, group or population
Example: ➢ What is the typical credit card usage of a customer segment?
➢ Profile the typical wait time of customers who call into a call center

[Figure: histogram of Proportion of calls vs Wait time [min], with a skewed distribution]
Picture taken from [1]

We could profile the “normal” wait time of customers by reporting the median (not the average, since the distribution is skewed!) and the standard deviation

Input: $\varphi \in \mathbb{R}$ (wait time)
Output: none

*: covered in this course
48 /94
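The effect of skewness on the profile can be checked with a short Python sketch. The wait times are hypothetical, drawn from an exponential distribution (an assumption chosen because it is right-skewed); the long right tail pulls the mean above the median.

```python
import random
import statistics

random.seed(0)
# Hypothetical right-skewed wait times [min]: exponential with a made-up mean of 3 minutes.
wait_times = [random.expovariate(1 / 3) for _ in range(10_000)]

profile = {
    "mean": statistics.mean(wait_times),
    "median": statistics.median(wait_times),
    "stdev": statistics.stdev(wait_times),
}
print({name: round(value, 2) for name, value in profile.items()})
```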


Data science tasks
• Link prediction: predict connections between entities in a network, usually by suggesting that a link should exist, and possibly also estimating the strength of the link
Example: ➢ Friend recommendations in social networks

[Figure: a “People you may know” panel suggesting Tomasz Flynn and Erika Joseph, each with “Add to friends” and “Remove” buttons]

*: covered in this course
49 /94
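One simple way to score candidate links, used here only as an illustration, is to count the common neighbors of two unconnected nodes: people who share many friends are likely to know each other. The friendship graph below is hypothetical.

```python
from itertools import combinations

# Hypothetical undirected friendship graph stored as adjacency sets.
friends = {
    "Ana": {"Ben", "Carla", "Dan"},
    "Ben": {"Ana", "Carla"},
    "Carla": {"Ana", "Ben", "Eva"},
    "Dan": {"Ana", "Eva"},
    "Eva": {"Carla", "Dan"},
}


def common_neighbors_scores(graph):
    """Score every pair of people who are not yet friends by their number of shared friends."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:
            scores[(u, v)] = len(graph[u] & graph[v])
    return scores


scores = common_neighbors_scores(friends)
# Highest-scoring missing links are the first candidates for a "People you may know" list.
suggestions = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (u, v), score in suggestions:
    print(f"{u} - {v}: {score} common friend(s)")
```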


Data science tasks
• Dimensionality reduction*: take a large dataset (many inputs and, possibly, many outputs) and replace it with a smaller dataset, retaining as much information as possible
Example: ➢ Represent a collection of movies in a two-dimensional space (Netflix Prize)

Inputs:
• Movie title
• Year of release
• User id
• User rating
• Rating date

Output: none (in this example)

[Figure: movies placed in a two-dimensional space, axes Latent dimension 1 and Latent dimension 2]
Picture taken from [1]

*: covered in this course
50 /94
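PCA (covered later in the course) is one way to perform dimensionality reduction. The dependency-free Python sketch below computes the top two principal components by power iteration with deflation; in practice a linear algebra library would be used instead. The 3-D data are hypothetical and lie exactly on a plane, so two components retain essentially all the variance.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def covariance_matrix(data):
    """Sample covariance matrix, plus the mean-centered data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    return cov, centered


def top_components(cov, k, iterations=500):
    """Leading eigenvectors/eigenvalues of a symmetric matrix via power iteration with deflation."""
    d = len(cov)
    m = [row[:] for row in cov]
    components, eigenvalues = [], []
    for _ in range(k):
        v = [1.0] * d
        for _ in range(iterations):
            w = mat_vec(m, v)
            norm = dot(w, w) ** 0.5
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(m, v))  # Rayleigh quotient: eigenvalue estimate
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract the found component so the next pass converges to the following one.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components, eigenvalues


random.seed(0)
# Hypothetical 3-D observations lying on the plane x3 = x1 + x2.
data = []
for _ in range(200):
    x1, x2 = random.gauss(0, 3), random.gauss(0, 1)
    data.append([x1, x2, x1 + x2])

cov, centered = covariance_matrix(data)
components, eigenvalues = top_components(cov, k=2)
projected = [[dot(row, c) for c in components] for row in centered]  # 2-D coordinates
retained = sum(eigenvalues) / sum(cov[j][j] for j in range(len(cov)))
print(f"fraction of variance retained by 2 components: {retained:.4f}")
```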
Data science tasks
• Similarity matching: find similar entities based on data known about them
Example: ➢ Recommendation systems

Inputs:
• Song titles
• Song genres
• Audio signals
• ⋮
• User ratings
• ⋮

Output: none (in this example)

Clustering is used for exploratory data analysis (“can we partition the data into different groups of similar entities?”), whereas similarity matching has the specific goal of finding similar entities

*: covered in this course
51 /94
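Similarity matching can be sketched by representing each entity as a vector and returning the one with the highest cosine similarity. The rating vectors below are hypothetical, and treating unrated songs as 0 is a simplifying assumption.

```python
import math

# Hypothetical ratings (1-5) given by four users to the same four songs; 0 means "not rated".
ratings = {
    "user_a": [5, 4, 0, 1],
    "user_b": [4, 5, 1, 0],
    "user_c": [1, 0, 5, 4],
    "user_d": [0, 1, 4, 5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(target, table):
    """Entity whose vector is most similar (by cosine) to the target's vector."""
    candidates = [name for name in table if name != target]
    return max(candidates, key=lambda name: cosine_similarity(table[target], table[name]))


print(most_similar("user_a", ratings))  # the user whose taste is closest to user_a's
```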


Data science tasks vs algorithms

• Data science task (the problem that we are trying to solve, what we are trying to do): regression, classification, …
≠
• Algorithm, or method (how we solve it, a sequence of operations to follow): neural networks, 𝐾NN, 𝐾-means clustering, …

• Different data science tasks can be solved by the same algorithms


𝐾-means clustering can be used both for clustering and similarity matching

• Different algorithms can solve the same data science task


A regression problem can be solved by the linear regression method, neural networks and 𝐾NN

In this course, we will study methods for solving different data science tasks

52 /94
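To illustrate that different algorithms can solve the same task, the Python sketch below predicts the price of a 600 feet² house with two methods on the three rows of the house-price table, using house area as the only input: the least-squares line (linear regression) and KNN regression with k = 2 (an arbitrary choice).

```python
# The three rows of the house-price table, using area as the only input: (area [feet²], price [k$]).
houses = [(523, 115), (645, 150), (708, 210)]


def fit_line(data):
    """Linear regression with one input: least-squares intercept and slope."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return mean_y - slope * mean_x, slope


def knn_predict(data, x_new, k):
    """KNN regression: average output of the k observations with the closest input."""
    nearest = sorted(data, key=lambda row: abs(row[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


theta0, theta1 = fit_line(houses)
print("linear regression:", round(theta0 + theta1 * 600, 1))
print("KNN, k = 2:", knn_predict(houses, 600, k=2))
```

The two methods give different predictions (about 146 k$ versus 132.5 k$) from the same data, which is one reason why evaluating the goodness of a model matters.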
Syllabus
1. Introduction to data science
2. Exploratory data analysis
3. Recap of statistics
4. Maximum likelihood estimation
5. Linear regression (regression)
6. Logistic regression (classification)
7. Bias-variance trade-off
8. Overfitting and regularization
9. Validation and cross-validation
10. Decision trees (regression and classification)
11. Neural networks (regression, classification, dimensionality reduction…)
12. Convolutional neural networks (regression, classification, …)
13. Clustering methods (clustering)
14. Principal component analysis (dimensionality reduction)
15. Output-error method for system identification (regression)

53 /94


Outline: 5. Static and dynamical models in supervised learning
54 /94
Models in supervised learning
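Most supervised learning methods rely on mathematical models that describe the relationship between the inputs and the outputs

[Diagram: the data-generating system $\mathcal{S}$ maps the inputs $\boldsymbol{\varphi}$ to the outputs $\boldsymbol{y}$. Supervised learning methods estimate from data a model $\mathcal{M}$, i.e. a mathematical model that describes $\mathcal{S}$, which maps the inputs $\boldsymbol{\varphi}$ to the estimated outputs $\hat{\boldsymbol{y}}$. We want $\hat{\boldsymbol{y}} \approx \boldsymbol{y}$]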

55 /94
Models in supervised learning
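We view both $\mathcal{S}$ and $\mathcal{M}$ as mathematical functions that map inputs (features) to outputs (targets):
• $\mathcal{S}$: inputs $\boldsymbol{\varphi}$ → outputs $\boldsymbol{y}$, i.e. $\boldsymbol{y} = f(\boldsymbol{\varphi})$
• $\mathcal{M}$: inputs $\boldsymbol{\varphi}$ → estimated outputs $\hat{\boldsymbol{y}}$, i.e. $\hat{\boldsymbol{y}} = \hat{f}(\boldsymbol{\varphi})$

The goal of supervised learning methods is to learn a function $\hat{f}(\cdot)$ that approximates $f(\cdot)$ well on the whole domain of $\boldsymbol{\varphi}$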

56 /94
Dataset notation
Before moving on, we introduce the following notation, which we will use for any dataset

House area [feet²]   # bedrooms   Price [k$]
⋮                    ⋮            ⋮
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮

We refer to each row of the dataset as an observation. The $i$-th observation (in this case it represents a house but, in general, it can be any entity) is the pair $(\boldsymbol{\varphi}(i), y(i))$; e.g., for the row (523, 1, 115), $\boldsymbol{\varphi}(i) = [523 \;\; 1]^\top$ and $y(i) = 115$

We denote the dataset ($N$ observations in total) as
$$\mathcal{D} = \{(\boldsymbol{\varphi}(1), y(1)), \dots, (\boldsymbol{\varphi}(N), y(N))\} = \{(\boldsymbol{\varphi}(i), y(i))\}_{i=1}^{N}$$

58 /94
Static systems (and models)
A system whose outputs can be determined directly from the inputs is said to be a static (“memoryless”) system: inputs $\boldsymbol{\varphi}(i)$ → $f(\cdot)$ → outputs $\boldsymbol{y}(i)$

Example: Ohm’s law (a voltage $V(t)$ applied to a resistor $R$, with current $I(t)$)
$$I(t) = \frac{V(t)}{R}$$
The output $I(t)$ at time $t$ only depends on the input $V(t)$ at the same time instant: $y(i) = I(t)$, $\varphi(i) = V(t)$ and $f(\varphi(i)) = \varphi(i)/R$

We can view each voltage/current measurement by itself (i.e. as an observation $(\varphi(i), y(i))$ in its own right); we do not need to consider $V(t)$ and $I(t)$ as signals: “the time $t$ can be omitted”

59 /94
Static systems (and models)
Static systems need not describe only physical phenomena

House area [feet²]   # bedrooms   Price [k$]
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮
$f(\cdot)$: mapping from house area and # bedrooms to price

[Figure: images labeled Cat, Not cat, Cat, Not cat]
$f(\cdot)$: mapping from image to label

60 /94
Learning static systems
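In the regression setting, the simplest model that can be used to describe static systems (but also dynamical systems!) is the linear model. For the $i$-th observation:
$$y(i) = \theta_0 + \theta_1\varphi_1(i) + \dots + \theta_{d-1}\varphi_{d-1}(i) + \epsilon(i) = \sum_{j=0}^{d-1}\theta_j\varphi_j(i) + \epsilon(i) = \boldsymbol{\varphi}(i)^\top\boldsymbol{\theta} + \epsilon(i)$$
where $\boldsymbol{\varphi}(i)^\top$ is $1\times d$, $\boldsymbol{\theta}$ is $d\times 1$, and $y(i)$ and $\epsilon(i)$ are $1\times 1$, with
• $\varphi_0(i) = \varphi_0 = 1$ for every $i$
• $\boldsymbol{\varphi}(i) = [\varphi_0 \;\; \varphi_1(i) \;\; \cdots \;\; \varphi_{d-1}(i)]^\top \in \mathbb{R}^{d\times 1}$
• $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1 \;\; \cdots \;\; \theta_{d-1}]^\top \in \mathbb{R}^{d\times 1}$
• $y(i) \in \mathbb{R}$

• The vector $\boldsymbol{\theta}$ is called the parameter vector → to be found by minimizing a cost function
• The vector $\boldsymbol{\varphi}(i)$ is called the feature vector of the $i$-th observation → attributes of the entities
• The quantity $\epsilon(i)$ is the error due to the imperfect explanation of $y(i)$ by $\boldsymbol{\varphi}(i)$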

61 /94
Learning static systems
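To “learn” means to estimate the values of the parameters in $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1 \;\; \cdots \;\; \theta_{d-1}]^\top$

Key idea: find the values of $\boldsymbol{\theta}$ that minimize a “cost” (or “loss”), i.e. an “error” or “something bad” → it is good to minimize something bad
• This is achieved through optimization

A typical cost in the regression setting is the following:
$$J(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\left(y(i) - \boldsymbol{\varphi}(i)^\top\boldsymbol{\theta}\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\epsilon(i)^2$$

With this cost, we are minimizing the mean of the squared errors between the observed outputs (i.e. those reported in our dataset) and the outputs estimated by the linear model (equivalently, their sum, since the constant factor $1/N$ does not change the minimizer)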

62 /94
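The cost $J(\boldsymbol{\theta})$ can be evaluated directly. In this Python sketch, each feature vector is $[\varphi_0 \;\; \text{area}]$ with $\varphi_0 = 1$, taken from the three house-table rows; the two candidate parameter vectors are arbitrary, chosen only to show that a better fit gives a lower cost.

```python
# Observations (φ(i), y(i)) from the house table: φ(i) = [φ0, area] with φ0 = 1, y(i) = price [k$].
dataset = [([1.0, 523.0], 115.0), ([1.0, 645.0], 150.0), ([1.0, 708.0], 210.0)]


def model_output(phi, theta):
    """Linear model output φ(i)ᵀθ."""
    return sum(p * t for p, t in zip(phi, theta))


def cost(theta, data):
    """J(θ) = (1/N) Σ_i (y(i) - φ(i)ᵀθ)²."""
    return sum((y - model_output(phi, theta)) ** 2 for phi, y in data) / len(data)


# Two arbitrary candidate parameter vectors [θ0, θ1].
for theta in ([0.0, 0.2], [-140.0, 0.48]):
    print(theta, round(cost(theta, dataset), 2))
```

Here the second candidate yields a much lower cost (about 168 versus about 1743), so it explains the data better.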
Learning static systems
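[Figure: the cost function $J(\boldsymbol{\theta})$ and its minimizer, for a scalar (single) parameter $\theta$ and for multiple parameters $\boldsymbol{\theta}$]

Cost function:
$$J(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\epsilon(i)^2$$

Minimizer of the cost function:
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$$

This rationale is followed by the linear regression method:
$$\hat{y}(i) = \hat{f}(\boldsymbol{\varphi}(i)) = \boldsymbol{\varphi}(i)^\top\hat{\boldsymbol{\theta}}$$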

63 /94
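For the linear model, setting the gradient of $J(\boldsymbol{\theta})$ to zero gives the normal equations $\left(\sum_i \boldsymbol{\varphi}(i)\boldsymbol{\varphi}(i)^\top\right)\boldsymbol{\theta} = \sum_i \boldsymbol{\varphi}(i)\,y(i)$, whose solution is $\hat{\boldsymbol{\theta}}$ when the matrix on the left is invertible. The Python sketch below solves them with a small Gaussian elimination routine, using the three house-table rows with $\varphi_0 = 1$ and area as features; it is an illustration, not the course's reference implementation.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(data):
    """θ̂ = argmin J(θ), obtained from the normal equations (Σ φφᵀ) θ = Σ φ y."""
    d = len(data[0][0])
    lhs = [[sum(phi[j] * phi[l] for phi, _ in data) for l in range(d)] for j in range(d)]
    rhs = [sum(phi[j] * y for phi, y in data) for j in range(d)]
    return solve(lhs, rhs)


# Observations from the house table: φ(i) = [φ0, area] with φ0 = 1, y(i) = price [k$].
dataset = [([1.0, 523.0], 115.0), ([1.0, 645.0], 150.0), ([1.0, 708.0], 210.0)]
theta_hat = least_squares(dataset)
y_hat = [sum(p * t for p, t in zip(phi, theta_hat)) for phi, _ in dataset]  # ŷ(i) = φ(i)ᵀθ̂
print("θ̂ =", [round(t, 4) for t in theta_hat])
```

With an intercept in the model, the residuals $y(i) - \hat{y}(i)$ of the fitted model sum to zero, which is a quick sanity check.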
Dynamical systems (and models)
A system whose outputs (at a certain time instant) cannot be determined directly from the inputs (at the same time instant) is said to be a dynamical system: input signals $\boldsymbol{u}(t)$ → $\mathcal{S}$ → output signals $\boldsymbol{y}(t)$

Dynamical models are mathematical models that describe the future evolution of the variables involved as a function of their past trend

Dynamical systems usually involve time: the outputs $\boldsymbol{y}(t)$ at a certain time $t$ depend on the outputs (and, in general, the inputs) at previous times

This dependency on the past endows the model with a “memory” (i.e. the dynamics)

64 /94
Dynamical systems (and models)
Example: electric motor, with the voltage $V(t)$ as input and the angular velocity $\omega(t)$ as output

[Figure: input Voltage [V] and output Angular velocity [rpm] versus Time [s], over 0-20 s]

We are dealing with a dynamical system because, although the input is constant, the output keeps evolving

65 /94
Dynamical systems (and models)
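Dynamical systems can be defined in continuous time or in discrete time

Physical phenomena are inherently continuous-time
• In this case, the system is described by differential equations

Example: resistor-capacitor circuit (continuous-time), with input voltage $V(t)$, current $i(t)$, resistance $R$, capacitance $C$ and capacitor voltage $V_C(t)$
$$i(t) = C\,\dot{V}_C(t), \qquad \dot{V}_C(t) = \frac{dV_C(t)}{dt}$$
$$V(t) = R\,i(t) + V_C(t)$$
Substituting the first equation into the second and dividing by $RC$:
$$\dot{V}_C(t) + \frac{1}{RC}V_C(t) = \frac{1}{RC}V(t)$$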

66 /94
Dynamical systems (and models)
However, computers can only manage a finite amount of data. Thus, signals $s(t)$ should be sampled with a sampling time $T_s$, so that we store a finite amount of data corresponding to the time instants $kT_s$, $k = 0, 1, \dots, N$, i.e.
$$s(0),\ s(T_s),\ s(2T_s),\ s(3T_s),\ \dots$$

[Figure: a continuous-time signal $s(t)$, with $t$ the continuous time, and its samples $s_0, s_1, s_2, \dots, s_{10}$, with $k$ the discrete time]

In the following, for discrete-time systems, we will use the notation $s(k)$ with the meaning of $s(kT_s)$ (i.e. the measurement of $s(\cdot)$ at the time $kT_s$)

67 /94
Dynamical systems (and models)
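Example: resistor-capacitor circuit (continuous-time → discrete-time)
$$\dot{V}_C(t) + \frac{1}{RC}V_C(t) = \frac{1}{RC}V(t)$$
Numerical differentiation at $t = kT_s$:
$$\dot{V}_C(kT_s) \approx \frac{V_C((k+1)T_s) - V_C(kT_s)}{T_s}$$
With the notation $s(k) = s(kT_s)$, this gives
$$\frac{V_C(k+1) - V_C(k)}{T_s} + \frac{1}{RC}V_C(k) = \frac{1}{RC}V(k)$$
Shifting back by 1 step and re-organizing the equation:
$$V_C(k) = \left(1 - \frac{T_s}{RC}\right)V_C(k-1) + \frac{T_s}{RC}V(k-1)$$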

68 /94
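The discrete-time recursion can be simulated directly. The Python sketch below assumes hypothetical values $R = 1$ kΩ and $C = 1$ mF (so $RC = 1$ s), $T_s = 0.01$ s, an initially discharged capacitor and a constant 1 V input, and compares the result after 5 s with the exact continuous-time step response $V_C(t) = V(1 - e^{-t/RC})$.

```python
import math

# Hypothetical circuit values: R = 1 kΩ, C = 1 mF (so RC = 1 s), constant input V(k) = 1 V.
R, C = 1e3, 1e-3
TS = 0.01  # sampling time [s]
N = 500    # number of steps (5 s)
V_IN = 1.0

tau = R * C
v_c = [0.0]  # V_C(0) = 0: capacitor initially discharged
for k in range(1, N + 1):
    # V_C(k) = (1 - Ts/RC) V_C(k-1) + (Ts/RC) V(k-1)
    v_c.append((1 - TS / tau) * v_c[k - 1] + (TS / tau) * V_IN)

# Exact continuous-time step response at t = N * Ts, for comparison.
exact = V_IN * (1 - math.exp(-N * TS / tau))
print(f"discrete: {v_c[N]:.5f} V, continuous: {exact:.5f} V")
```

Since the forward difference is only an approximation, the two values differ slightly (here by less than 1 mV); the gap grows as $T_s$ becomes comparable to $RC$.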
From signals to feature vectors
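[Diagram: the input signals $\boldsymbol{u}(t)$ and output signals $\boldsymbol{y}(t)$ of $\mathcal{S}$ are turned into inputs (features) $\boldsymbol{\varphi}(k)$ and outputs $\boldsymbol{y}(k)$ of a function $f(\cdot)$]

For the resistor-capacitor circuit, the continuous-time model $\dot{V}_C(t) + \frac{1}{RC}V_C(t) = \frac{1}{RC}V(t)$ was discretized as
$$V_C(k) = \left(1 - \frac{T_s}{RC}\right)V_C(k-1) + \frac{T_s}{RC}V(k-1)$$
which can be written as
$$y(k) = f(\boldsymbol{\varphi}(k)) = \boldsymbol{\varphi}(k)^\top\boldsymbol{\theta}$$
• $\boldsymbol{\varphi}(k) = [V_C(k-1) \;\; V(k-1)]^\top$
• $\boldsymbol{\theta} = \left[1 - \frac{T_s}{RC} \;\; \frac{T_s}{RC}\right]^\top$
• $y(k) = V_C(k)$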
69 /94
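Once $\boldsymbol{\varphi}(k)$ and $y(k)$ are defined, identifying the circuit becomes a linear regression problem. In this Python sketch, data are simulated (noise-free) from the discretized model with hypothetical values $RC = 1$ s and $T_s = 0.01$ s and a square-wave input, and $\boldsymbol{\theta}$ is estimated by least squares; with noise-free data the estimate should match the true parameters up to rounding.

```python
# Hypothetical RC circuit with RC = 1 s, sampled with Ts = 0.01 s.
TS, TAU = 0.01, 1.0
TRUE_THETA = (1 - TS / TAU, TS / TAU)

# Simulate the discrete-time model with a square-wave input (switching every 100 steps).
N = 1000
v_in = [1.0 if (k // 100) % 2 == 0 else -1.0 for k in range(N)]
v_c = [0.0]
for k in range(1, N):
    v_c.append(TRUE_THETA[0] * v_c[k - 1] + TRUE_THETA[1] * v_in[k - 1])

# Dataset {(φ(k), y(k))} with φ(k) = [V_C(k-1), V(k-1)]ᵀ and y(k) = V_C(k).
phis = [(v_c[k - 1], v_in[k - 1]) for k in range(1, N)]
ys = [v_c[k] for k in range(1, N)]

# Least squares with two parameters: solve the 2x2 normal equations by Cramer's rule.
s11 = sum(p[0] * p[0] for p in phis)
s12 = sum(p[0] * p[1] for p in phis)
s22 = sum(p[1] * p[1] for p in phis)
b1 = sum(p[0] * y for p, y in zip(phis, ys))
b2 = sum(p[1] * y for p, y in zip(phis, ys))
det = s11 * s22 - s12 * s12
theta_hat = ((b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)
print("estimated θ:", theta_hat, "true θ:", TRUE_THETA)
```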
Static vs dynamical systems
• Static systems: inputs $\boldsymbol{\varphi}(i)$ → $f(\cdot)$ → outputs $y(i)$
• Dynamical systems: inputs $\boldsymbol{\varphi}(k)$ → $f(\cdot)$ → outputs $y(k)$

• For static systems, we will index the observations with the index 𝑖
• For dynamical systems, we will index the observations with the index 𝑘
𝑘 can be interpreted as the 𝑘-th sampling step

In either case, our aim will be to learn 𝑓 ⋅ from data


• In the static case, we talk about (model) “learning”
• In the dynamical case, we talk about (system) “identification”
Both are supervised learning tasks!

70 /94
Machine Learning (ML), Artificial Intelligence (AI),
Data Science and System Identification
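[Figure: diagram relating Artificial Intelligence (AI), Machine Learning (ML), deep learning, data science, and system identification and control. Labeled elements: planning, search, reasoning, experiment design, subspace methods, frequency domain methods, time series, visualization, other tools]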

71 /94
Why do we need models?
All in all, we need models to better understand the phenomena of interest. Models are useful for:

• Decision-making: suppose that we are testing a new vaccine. We have two groups of people. We give the vaccine to the first group (test group) and a placebo to the second one (control group). Then, we measure some variables from the patients. How can we determine whether the vaccine was effective or not?

• Communication: a model allows us to communicate the main insights and results of our analysis to third parties

72 /94
Why do we need models?

• Prediction: forecast the values that the output variables will assume for values of the input variables on which we have no data

House area [feet²]   # bedrooms   Price [k$]
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮

How much does a 600 feet² house with 2 bedrooms cost?

73 /94
Why do we need models?

• Inference: understand how changes in the inputs affect the outputs

House area [feet²]   # bedrooms   Price [k$]
523                  1            115
645                  1            150
708                  2            210
⋮                    ⋮            ⋮

• Does increasing the house area increase the house price (and by how much)?
• Is the # bedrooms actually associated with the price of a house?

Prediction vs inference: prediction is not necessarily concerned with the structure of the model $\hat{f}(\cdot)$ and its complexity ($\hat{f}(\cdot)$ can be seen as a black box), while inference uses the model to understand the relationship between each input and each output

74 /94
Why do we need models?

• Simulation: we can use a computer to simulate the response (outputs) of a model to given inputs. By looking at the model's response, we can get a better grasp of the modeled system
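As a sketch of simulation, assume a hypothetical first-order discrete-time model y(t+1) = a·y(t) + b·u(t) and compute its response to a unit step input.

```python
import numpy as np

def simulate_first_order(a, b, u, y0=0.0):
    """Simulate y(t+1) = a*y(t) + b*u(t) for an input sequence u."""
    y = np.empty(len(u) + 1)
    y[0] = y0
    for t, u_t in enumerate(u):
        y[t + 1] = a * y[t] + b * u_t
    return y

# Unit step input applied for 50 samples (hypothetical model parameters)
y = simulate_first_order(a=0.8, b=0.2, u=np.ones(50))
print(y[:5])
```

The simulated output rises towards the steady-state value b / (1 − a) = 1, which reveals the gain and the speed of the modeled system without running experiments on the real one.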
Why do we need models?

• Control: in control engineering, we often need a model of a system to design a controller that limits the deviation of the controlled variables 𝒚(𝑡) from the reference variables 𝒔(𝑡) (setpoints)

[Block diagram: 𝒔(𝑡) → (+) → Controller → 𝒖(𝑡) → 𝒮 → 𝒚(𝑡), with 𝒚(𝑡) fed back and compared with 𝒔(𝑡)]
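A minimal closed-loop sketch of the feedback scheme in the control example. It assumes a hypothetical first-order system y(t+1) = 0.8·y(t) + 0.2·u(t) and a proportional-integral (PI) controller with illustrative, untuned gains; the course does not prescribe this controller.

```python
a, b = 0.8, 0.2          # hypothetical first-order model of the system S
kp, ki = 1.0, 0.5        # PI gains (chosen for illustration, not tuned)
setpoint = 1.0           # reference s(t), kept constant

y = 0.0
integral = 0.0
outputs = []
for t in range(100):
    error = setpoint - y            # deviation of y(t) from s(t)
    integral += error               # accumulated error (integral action)
    u = kp * error + ki * integral  # controller output u(t)
    y = a * y + b * u               # system response y(t+1)
    outputs.append(y)

print(f"final output: {outputs[-1]:.4f}")
```

The model is what makes it possible to check, before touching the real plant, that these gains give a stable loop whose output settles on the setpoint.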
Why do we need models?

• Fault diagnosis: we can check the presence of faults by comparing signals that come from the real system with those simulated by the estimated model

[Block diagram: model-based fault diagnosis system. The input 𝒖(𝑡) drives both the real system 𝒮 (subject to faults), which produces 𝒚(𝑡), and the model ℳ, which produces 𝒚̂(𝑡). Residuals generation: 𝒓(𝑡) = 𝒚(𝑡) − 𝒚̂(𝑡). Residuals evaluation: residuals processing yields 𝒓̄(𝑡), from which a decision logic produces the diagnostic decision 𝜄(𝑡)]
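A sketch of residual-based fault detection under simplifying assumptions: the model ℳ matches the fault-free system 𝒮 exactly (a hypothetical first-order model), the fault is a sensor bias of 0.5 appearing at t = 60, residuals are processed with a causal moving average, and the decision logic is a fixed threshold. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.8, 0.2            # model M, assumed to match the fault-free system S
T, t_fault = 100, 60       # hypothetical fault: sensor bias from t = 60 onwards
u = np.sin(0.1 * np.arange(T))

y_true = 0.0
y_hat = 0.0
residuals = np.empty(T)
for t in range(T):
    fault = 0.5 if t >= t_fault else 0.0
    y_meas = y_true + fault + rng.normal(0.0, 0.01)   # measured output of S
    residuals[t] = y_meas - y_hat                     # residual r(t) = y(t) - y_hat(t)
    y_true = a * y_true + b * u[t]                    # real system update
    y_hat = a * y_hat + b * u[t]                      # model update

# Residuals processing: causal moving average over the last `window` samples
window = 5
r_bar = np.convolve(residuals, np.ones(window) / window)[:T]

# Decision logic: raise an alarm when the processed residual exceeds a threshold
threshold = 0.25
alarms = np.flatnonzero(np.abs(r_bar) > threshold)
first_alarm = int(alarms[0]) if alarms.size else None
print("first alarm at t =", first_alarm)
```

In practice the model only approximates 𝒮, so residuals are nonzero even without faults and the threshold has to trade off detection delay against false alarms.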
6. From business problems to data science tasks
Business problems as data science tasks
Each data-driven project is unique. First and foremost, decompose the business problem into data science (sub)tasks that can be solved by existing methods:
• Regression
• Classification
• Causal modeling
• Clustering
• Co-occurrence grouping
• Profiling
• Link prediction
• Dimensionality reduction
• Similarity matching

Business problem → decompose → data science (sub)task(s) → solve → algorithm(s)/method(s) → analyze the results (to derive insights and drive business-related decisions)

Machine learning engineers focus on solving the (sub)tasks with algorithms/methods; data scientists focus on decomposing the business problem and analyzing the results
Business problems as data science tasks
• Spam e-mail detection system → Classification
• Credit approval → Classification
• Fraud detection → Profiling
• Recognize objects in images → Classification
• Find the relationship between house prices and house sizes → Regression
• Predict the stock market → Regression
• A/B testing → Causal modeling
• Market segmentation → Clustering
• Market basket analysis → Co-occurrence grouping
• Language models (word2vec) → Similarity matching
• Social network analysis → Link prediction
• Low-order data representations → Dimensionality reduction
• Movies recommendation → Similarity matching
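To make co-occurrence grouping concrete, here is a minimal market basket sketch on made-up baskets: it counts how often pairs of items are bought together and reports support (fraction of baskets containing both items) and confidence (estimated probability of the second item given the first), the usual association-rule quantities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

item_counts = Counter(item for basket in baskets for item in basket)
# Sort items so that each unordered pair is always counted under the same key
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

n = len(baskets)
for (first, second), count in pair_counts.most_common(3):
    support = count / n                      # fraction of baskets with both items
    confidence = count / item_counts[first]  # estimate of P(second | first)
    print(f"{first} & {second}: support {support:.2f}, "
          f"confidence {first}->{second} {confidence:.2f}")
```

Real market basket analysis usually relies on dedicated algorithms (e.g., Apriori) to avoid enumerating every pair in large catalogs.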
Selecting data-driven projects
Focus on data science and machine learning projects that are both valuable and feasible

[Diagram: candidate projects lie in the intersection of "What data-driven methods can do" (data science experts) and "Valuable for your business" (domain experts)]

Think about automating tasks rather than automating jobs

What are the main drivers of business value?

What are the main pain points in your business?
Selecting data-driven projects
MANUFACTURING LINE MANAGER
Data science: Mix clay → Shape mug → Add glaze → Fire kiln → Final inspection
• Optimize production yield
Machine learning: inspection images labeled NO DEFECT / NO DEFECT / DEFECT
• Automatic visual inspection
Selecting data-driven projects
RECRUITING
Data science: Email outreach → Phone screen → Onsite interview → Offer
• Optimize recruiting process
Machine learning: resumes (e.g., Mario Rossi: Personal info, Education, Employment) classified as YES or NO
• Automatic resume screening
Selecting data-driven projects
MARKETING
Data science: website Version A vs Version B
• A/B testing websites
Machine learning:
• Recommendation system
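One simple way a recommendation system can use similarity matching is item-based recommendation: movies whose rating columns point in similar directions (high cosine similarity) are recommended together. This is a minimal sketch on a made-up rating matrix; treating unrated entries as 0 is a simplifying assumption, and production recommenders are considerably more elaborate.

```python
import numpy as np

# Hypothetical user-by-movie rating matrix (0 = not rated)
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between movie columns
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)

def most_similar(movie_idx):
    """Return the movie most similar to movie_idx (excluding itself)."""
    scores = similarity[movie_idx].copy()
    scores[movie_idx] = -np.inf
    return movies[int(np.argmax(scores))]

print(most_similar(0))
```

Users who liked Movie A would be recommended Movie B, since the same users rated both highly.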
7. The data mining life cycle (CRISP-DM)
CRISP-DM process
Cross Industry Standard Process for Data Mining (CRISP-DM); picture taken from [1]

Iteration is the rule rather than the exception. The phases are:
• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment
CRISP-DM: Business understanding
Cast the business problem into one or more data science problems:
• Regression
• Classification
• Causal modeling
• Clustering
• Co-occurrence grouping
• Profiling
• Link prediction
• Dimensionality reduction
• Similarity matching

Think carefully about the use scenario:
• What exactly do we want to do?
• How exactly would we do it?
• What parts of this use scenario constitute possible data mining models?
CRISP-DM: Data understanding
Identify the available and needed data

Costs/benefits of acquiring each source of data

Are the data at our disposal related to the business problem?

Can we use a proxy for the data that we do not have?

As data understanding progresses, the solution paths may change
CRISP-DM: Data preparation
Clean and prepare the data for use

Usually, data mining algorithms require data in a specific format, which differs from the format in which the data are readily available
• Convert strings to numbers, infer missing data, import data from Excel files, …
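A minimal pandas sketch of two of the preparation steps listed above, on a made-up table: converting strings to numbers (unparseable entries become missing) and inferring missing values with each column's median. Median imputation is only one simple choice; `pd.read_excel` could load such a table from an Excel file, but it needs an engine such as openpyxl installed, so it is not used here.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as strings, with invalid/missing entries
raw = pd.DataFrame({
    "area_ft2": ["523", "645", "n/a", "708"],
    "bedrooms": ["1", "1", "2", None],
    "price_k": ["115", "150", "180", "210"],
})

# Convert strings to numbers; unparseable entries ("n/a", None) become NaN
clean = raw.apply(pd.to_numeric, errors="coerce")

# Infer (impute) missing values with each column's median
clean = clean.fillna(clean.median())
print(clean)
```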

Data preprocessing/cleaning/labeling: most of the time in a data science project is spent here [5]

Be careful not to use historical data that will not be available when decisions need to be made
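A sketch of one common safeguard against using data that would not be available at decision time: build features only from the past (here a one-day lag) and split the data chronologically rather than at random. The daily demand values and the cutoff date are hypothetical.

```python
import pandas as pd

# Hypothetical daily demand records
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "demand": [12, 15, 14, 18, 20, 19, 23, 25, 24, 28],
})

# Feature built only from the past: the previous day's demand.
# shift(1) moves each value one row down, so day t never sees its own demand.
df["demand_lag1"] = df["demand"].shift(1)

# Assumed decision time: train only on what was known before it,
# evaluate on what happened afterwards
cutoff = pd.Timestamp("2023-01-08")
train = df[df["date"] < cutoff].dropna()
test = df[df["date"] >= cutoff]
print(len(train), len(test))
```

A random split would mix future days into the training set, giving an optimistic picture that will not hold when the model is used for real decisions.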
CRISP-DM: Modeling
Estimate a mathematical model to extract patterns from data

In most cases, standard algorithms can be applied directly to the data

The aim is to find a model that performs well on unseen data

The type of model is chosen based on:
• What data science task we want to solve
• Performance measures
• Availability of libraries for deployment
CRISP-DM: Evaluation
Assess the validity of the results

We could find patterns that exist only in the particular dataset at our disposal (overfitting)

Does the model satisfy the original business goals?

The devised solution and the model's decisions should be comprehensible to the stakeholders

Usually, evaluation is performed before deployment; in this case, build environments that closely mimic the real use scenario
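A minimal sketch of how overfitting shows up during evaluation, on synthetic data (a straight line plus Gaussian noise; all values are assumptions): a degree-9 polynomial fitted to 12 points always fits the training data at least as well as a straight line, but it typically does worse on fresh data from the same process.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: a straight line plus noise
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = make_data(12)   # small dataset at our disposal
x_test, y_test = make_data(200)    # "unseen" data from the same process

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Comparing errors on held-out data, rather than on the training data, is what exposes patterns that exist only in the particular dataset.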
CRISP-DM: Deployment
Put the model (or the data mining steps) into production

Deployment usually requires re-coding the model to make it compatible with existing technologies

This step can require a notable investment of time. Usually, the data science team builds a prototype that is then passed on to the development team

For this reason, it is advisable to include a member of the development team in the early phases of the data science project

Deployment can involve not only the final model, but also the previous phases (data collection, model building, evaluation)
Workflow of a machine learning project
Example: build a home assistant device (e.g., Amazon Echo, Google Home, Apple Siri, Baidu DuerOS)
1. Collect data
   • Audio recordings of words such as «Alexa» and «Hello»
2. Train model
   • Learn the mapping A → B (e.g., Audio #1 → «Alexa», Audio #2 → «Hello»)
   • Iterate many times until good enough
3. Deploy model
   • Get data back
   • Maintain/update model
Workflow of a data science project
Example: optimize a manufacturing line (Mix clay → Shape mug → Add glaze → Fire kiln → Final inspection)
1. Collect data

   Clay batch #   Supplier     Mixing time [minutes]
   001            Supplier 1   35
   034            Supplier 1   22
   109            Supplier 2   28

   Mug batch #   Humidity   Temperature in kiln [F]   Time in kiln [hours]
   001           0.002%     1410                      22
   034           0.003%     1520                      24
   109           0.002%     1420                      22

2. Analyze data
   • Iterate many times to get good insights
3. Deploy model
   • Deploy changes
   • Re-analyze new data periodically
References
1. Provost, F., and Fawcett, T. "Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking". O'Reilly Media, Inc., 2013. Chapters 1-2.
2. Brynjolfsson, E., Hitt, L. M., and Kim, H. H. "Strength in numbers: How does data-driven decision making affect firm performance?". Tech. rep., available at SSRN: http://ssrn.com/abstract=1819486, 2011.
3. Nucleus Research, 2014. http://bit.ly/XQFDbv.
4. "Notes from the AI frontier: Modeling the impact of AI on the world economy", 2018.
5. Pyle, D. "Data Preparation for Data Mining". Morgan Kaufmann, 1999. Chapter 1.
6. James, G., Witten, D., Hastie, T., and Tibshirani, R. "An Introduction to Statistical Learning". 2nd Edition, Springer, 2021. Chapters 1-2.
7. "Data Scientist: The Sexiest Job of the 21st Century", 2012.
8. "Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025", 2022.
9. "Correlation does not imply causation: 5 real-world examples", 2021.