EEL 6935 Data Analytics: Introduction To Data Science & Machine Learning
LECTURE 1
Jan. 9, 2018
Data Science
• Highly interdisciplinary
• No. 1 job on the market *
[Figure: Data Science diagram. Data domains: biological sciences, health care, physical sciences, social sciences, business, finance, Internet, IoT, sports, cybersecurity. Hardware and software. Big Data: volume, variety, velocity, veracity.]
What does a Data Scientist do?
• Understands the physical process (science) that generates data
• e.g., how a transmitted signal travels through air (wireless communications), how people
behave in the stock market (economics), how DNA is transcribed into RNA (genetics), how a
planet moves in its orbit (astronomy)
• Clustering: group similar objects together
• Density estimation: estimate the distribution of data within the space of possible values
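The two tasks above can be sketched in a few lines of NumPy. This is a minimal illustration, not a method taken from the lecture: the synthetic two-blob data, the helper name `kmeans`, the choice of k-means for clustering, and the choice of a normalized histogram for density estimation are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Synthetic data (illustrative only): two well-separated 2-D blobs.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Clustering: group similar points together.
centers, labels = kmeans(points, k=2)

# Density estimation: a normalized histogram of the first coordinate.
density, edges = np.histogram(points[:, 0], bins=10, density=True)
```

With `density=True`, the histogram heights times the bin widths sum to one, so `density` can be read as a piecewise-constant estimate of the distribution.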
[Figure: three fits to the same data. Left: poor fit and generalization (model too simple). Middle: good fit and generalization (model good enough). Right: perfect fit, poor generalization (model too complex, fits noise: OVER-FITTING).]
$$E_{RMS} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$$
$$y(x_n, \mathbf{w}) = \sum_{j=0}^{M} w_j x_n^j$$
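The effect of model complexity can be tried out numerically. The sketch below fits polynomials of order M = 0, 1, 3, 9 to ten noisy samples and compares the error on the training points with the error on fresh points. The target curve sin(2πx), the noise level, the sample sizes, and the mean-squared form of the error are assumptions chosen for illustration; they are not given in the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of an assumed target curve, sin(2*pi*x), on [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, t

def mean_squared_error(model, x, t):
    """(1/N) * sum over n of (y(x_n, w) - t_n)^2."""
    return float(np.mean((model(x) - t) ** 2))

x_train, t_train = make_data(10)    # small training set
x_test, t_test = make_data(100)     # fresh data to probe generalization

train_err, test_err = {}, {}
for M in (0, 1, 3, 9):
    # Least-squares fit of a polynomial of order M to the training set.
    model = Polynomial.fit(x_train, t_train, deg=M)
    train_err[M] = mean_squared_error(model, x_train, t_train)
    test_err[M] = mean_squared_error(model, x_test, t_test)

for M in train_err:
    print(f"M={M}: train={train_err[M]:.4f}  test={test_err[M]:.4f}")
```

With ten training points, the order-9 polynomial has enough coefficients to pass through every point, so its training error is essentially zero. The error on fresh data is the quantity that shows whether the model generalizes.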
What is Machine Learning?
• Regularization: avoid over-fitting by adding a penalty term to the error function that
shrinks the coefficients (shrinkage)
$$E_{RMS} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$$
• Validation set: partition the available data into a training set and a validation set, and
use the validation set to optimize model complexity (the polynomial order M from the previous slide)
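Both ideas can be combined in a short sketch. It assumes the penalized error is the mean squared error plus (λ/2)‖w‖², which has a closed-form minimizer, and it picks the polynomial order M with the lowest validation error. The helper names (`design_matrix`, `fit_regularized`), the synthetic data, the 20/10 split, and the λ values are hypothetical choices for illustration only.

```python
import numpy as np

def design_matrix(x, M):
    """Columns x^0, x^1, ..., x^M, so that y(x_n, w) = sum_j w_j * x_n^j."""
    return np.vander(x, M + 1, increasing=True)

def fit_regularized(x, t, M, lam):
    """Minimize (1/N) * sum (y - t)^2 + (lam/2) * ||w||^2 in closed form."""
    Phi = design_matrix(x, M)
    N = len(x)
    # Zero gradient gives (Phi^T Phi + (N * lam / 2) I) w = Phi^T t.
    A = Phi.T @ Phi + 0.5 * N * lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

def mean_squared_error(w, x, t):
    """Mean squared error of the polynomial with coefficients w."""
    return float(np.mean((design_matrix(x, len(w) - 1) @ w - t) ** 2))

# Synthetic data (assumed for illustration): noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

# Partition: x is already in random order, so a simple slice is a random split.
x_train, t_train = x[:20], t[:20]
x_val, t_val = x[20:], t[20:]

# Validation: fit each order M on the training set, score on the validation set.
lam = 1e-3
val_err = {}
for M in range(10):
    w = fit_regularized(x_train, t_train, M, lam)
    val_err[M] = mean_squared_error(w, x_val, t_val)
best_M = min(val_err, key=val_err.get)

# Shrinkage: a larger penalty yields a smaller coefficient vector.
w_weak = fit_regularized(x_train, t_train, 9, 1e-6)
w_strong = fit_regularized(x_train, t_train, 9, 1e-2)
print(best_M, np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

The same loop could search over λ instead of M, since both control the effective complexity of the model. Note that this form of the penalty also shrinks the constant coefficient w_0.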
What is Machine Learning?