EEL 6935 Data Analytics: Introduction To Data Science & Machine Learning

This document provides an introduction to a lecture on data science and machine learning. It defines data science as a highly interdisciplinary field that uses mathematics, computer science, engineering, and other areas. It also discusses what data scientists do, such as understanding data generation processes, modeling data using statistics and probability, and developing algorithms to learn from data and discover patterns. The document then defines machine learning as using algorithms to discover patterns in data and infer information about the data source. It provides examples of different machine learning techniques like supervised, unsupervised, and semi-supervised learning. Finally, it discusses concepts like model complexity, regularization, and cross validation that are important for machine learning.
EEL 6935 Data Analytics

LECTURE 1

Introduction to Data Science & Machine Learning

Jan. 9, 2018
Data Science
• Highly interdisciplinary
• No. 1 job on the market*

* According to McKinsey, Forbes, Harvard Business Review, Glassdoor, CareerCast


What is Data Science?
• Mathematics: Probability, Statistics, Optimization, Linear Algebra
• Computer Science: Machine Learning, Data Mining, Database, High-Performance Computing
• Engineering: Signal Processing, Pattern Recognition, Operations Research, Data Compression
• Library & Info. Science: Information Retrieval, Info. Management, Ontology, Knowledge Representation

[Figure: Data Science in context]
• Data domains: Biological Sciences, Health Care, Physical Sciences, Social Sciences, Business, Finance, Internet, IoT, Sports, Cybersecurity
• Hardware and Software
• Big Data: Volume, Variety, Velocity, Veracity
What does a Data Scientist do?
• Understands the physical process (science) that generates data
• e.g., how a transmitted signal propagates through the air (wireless communications), how people behave in the stock market (economics), how DNA is transcribed into RNA (genetics), how a planet moves in its orbit (astronomy)

• Models data using probability & statistics

• Develops algorithms that
• learn from data
• make inferences about the data source (i.e., generalize the information contained in the data to the data source)

• Discovers patterns/regularities in data
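As a minimal sketch of modeling data with probability & statistics and inferring about the data source, suppose (hypothetically) that the source is a Gaussian. The code below draws samples from it and recovers its mean and variance by maximum likelihood. The function name `fit_gaussian_mle` and all numeric values are illustrative assumptions, not part of the lecture.

```python
import random

def fit_gaussian_mle(samples):
    """Maximum-likelihood estimates of a Gaussian's mean and variance."""
    n = len(samples)
    mean = sum(samples) / n
    # The ML variance estimate divides by n (not n - 1), so it is slightly biased.
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

# Hypothetical data source: Gaussian with mean 5 and standard deviation 2 (variance 4).
rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(10_000)]

mu_hat, var_hat = fit_gaussian_mle(data)
print(f"estimated mean = {mu_hat:.3f}, estimated variance = {var_hat:.3f}")
```

With more samples the estimates move closer to the source parameters, which is one concrete sense in which information in the data generalizes to its source.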


What is Machine Learning?
• Use algorithms to discover patterns in data, and use those patterns to make inferences about the data source, e.g.,
• Feature extraction: extract the meaningful part
from each object/instance in data
• hand-designed for a specific application or
• learned from data in an unsupervised fashion
• very important: an active research area and often the hardest part of big data problems
• Supervised Learning: learn a model from training data for which the ground truth is available, then apply the learned model to new (test) data
• Classification: assign each object to a category, e.g.,
handwritten digit recognition, face recognition
• Regression: estimate relationships between response
and explanatory variables, e.g., prediction of travel times
in traffic, estimation of class probabilities
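Supervised classification can be illustrated with one of the simplest classifiers, a nearest-centroid rule, chosen here purely for illustration: training computes the mean feature vector of each labeled class, and a new instance is assigned to the class whose centroid is closest. The toy 2-D data and the function names `train_nearest_centroid` and `predict` are hypothetical.

```python
from math import dist

def train_nearest_centroid(X, y):
    """Return {label: centroid}, each centroid being the mean of that class's training vectors."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: tuple(sum(coords) / len(pts) for coords in zip(*pts))
        for label, pts in groups.items()
    }

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Hypothetical 2-D training data with ground-truth labels.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = ["A", "A", "A", "B", "B", "B"]

model = train_nearest_centroid(X_train, y_train)
X_test = [(1.2, 1.4), (6.8, 6.1)]
print([predict(model, x) for x in X_test])  # ['A', 'B']
```

The learned model is just the two centroids, (1.5, 1.5) for class A and (6.5, 6.5) for class B; the test points never appear during training.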
What is Machine Learning?
• Unsupervised Learning: no ground truth in training data

• Clustering: group similar objects together
• Density estimation: estimate the distribution of the data over the space of possible values

• Semi-supervised Learning: labeled and unlabeled data are used together in training


• Anomaly detection: detect instances that significantly deviate from standard patterns
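Clustering can be illustrated with k-means (Lloyd's algorithm), one common clustering method; the slides do not prescribe a particular algorithm. No labels are used: the algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its points. The data, the number of clusters k = 2, and the random seed are assumptions for the example.

```python
import random
from math import dist

def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))

def k_means(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means clustering."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        assign = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # centers stopped moving, so assignments will not change
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]

# Hypothetical unlabeled data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
print(labels)
```

Cluster indices are arbitrary (which group is called 0 depends on initialization), so only the grouping itself is meaningful.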
What is Machine Learning?
• Objective: select a model that generalizes well to unseen data

[Figure: three models fitted to the same data. Left: poor fit & generalization (model too simple!). Middle: good fit & generalization (model good enough!). Right: perfect fit, poor generalization (model too complex, fits noise: OVER-FITTING!).]
$$E_{\mathrm{RMS}} = \frac{1}{N}\sum_{n=1}^{N}\{y(x_n,\mathbf{w}) - t_n\}^2$$

$$y(x_n,\mathbf{w}) = \sum_{j=0}^{M} w_j\, x_n^{\,j}$$
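The polynomial model and its squared-error criterion can be tried out in a short sketch. It assumes a hypothetical data source t = sin(2πx) plus Gaussian noise (a common textbook choice, not taken from these slides), fits y(x, w) = Σ_j w_j x^j by least squares for several orders M, and reports the mean squared error (1/N) Σ_n {y(x_n, w) − t_n}² on the 10 training points and on 100 fresh test points. The helper names are illustrative.

```python
import numpy as np

def design_matrix(x, M):
    """Columns x**0, x**1, ..., x**M, so that y(x, w) = design_matrix(x, M) @ w."""
    return np.vander(x, M + 1, increasing=True)

def fit_polynomial(x, t, M):
    """Least-squares weights w minimizing sum_n {y(x_n, w) - t_n}^2."""
    w, *_ = np.linalg.lstsq(design_matrix(x, M), t, rcond=None)
    return w

def error(x, t, w):
    """Mean squared error (1/N) * sum_n {y(x_n, w) - t_n}^2."""
    residual = design_matrix(x, len(w) - 1) @ w - t
    return float(np.mean(residual ** 2))

# Hypothetical data source: t = sin(2*pi*x) + Gaussian noise with std 0.3.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = rng.uniform(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, x_test.size)

for M in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, M)
    print(f"M={M}: train E={error(x_train, t_train, w):.4f}, "
          f"test E={error(x_test, t_test, w):.4f}")
```

Training error can only decrease as M grows (the models are nested) and is essentially zero at M = 9, where the polynomial passes through all 10 points; the test error typically rises again for large M, which is the over-fitting regime.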
What is Machine Learning?
• Regularization: avoid over-fitting by adding a penalty term to the error function that shrinks the coefficients toward zero (shrinkage)

$$E_{\mathrm{RMS}} = \frac{1}{N}\sum_{n=1}^{N}\{y(x_n,\mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2$$

• Validation set: partition the available data into a training set and a validation set; fit models on the training set and choose the model complexity (M in the previous slide) that gives the lowest error on the validation set
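Regularization and validation can be combined in a small sketch: a deliberately flexible polynomial (M = 9) is fitted with a quadratic penalty for several values of λ, and λ is chosen by error on a held-out validation set. For a closed form, this sketch minimizes the common variant ½ Σ_n {y(x_n, w) − t_n}² + (λ/2)‖w‖², whose minimizer is w = (ΦᵀΦ + λI)⁻¹Φᵀt with Φ the polynomial design matrix; scaling the data term by 1/N instead only rescales λ. The data source, split sizes, candidate λ values, and helper names are all assumptions.

```python
import numpy as np

def design_matrix(x, M):
    """Polynomial features x**0, ..., x**M."""
    return np.vander(x, M + 1, increasing=True)

def fit_ridge(x, t, M, lam):
    """Closed-form minimizer of 1/2*sum_n {y(x_n,w) - t_n}^2 + lam/2*||w||^2.
    All coefficients, including w_0, are penalized in this sketch."""
    Phi = design_matrix(x, M)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

def mse(x, t, w):
    """Mean squared error of the polynomial with weights w on (x, t)."""
    r = design_matrix(x, len(w) - 1) @ w - t
    return float(np.mean(r ** 2))

# Hypothetical data: t = sin(2*pi*x) + noise, split 20 / 10 into training / validation.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
x_tr, t_tr, x_val, t_val = x[:20], t[:20], x[20:], t[20:]

M = 9  # flexible model; lambda controls its effective complexity
candidates = [1e-8, 1e-6, 1e-4, 1e-2, 1.0]
val_err = {lam: mse(x_val, t_val, fit_ridge(x_tr, t_tr, M, lam)) for lam in candidates}
best_lam = min(val_err, key=val_err.get)
print("validation MSE per lambda:", val_err)
print("selected lambda:", best_lam)
```

Larger λ gives smaller coefficients; the validation set, not the training error, decides how much shrinkage to apply.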
What is Machine Learning?

K-fold Cross Validation
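K-fold cross-validation reuses all of the data: it splits the data into K folds, uses each fold once as the validation set while training on the remaining K − 1 folds, and averages the K validation errors. The minimal sketch below applies it to choosing the polynomial order M. K = 5, the fixed shuffling seed (so every M is scored on the same folds), the hypothetical data source, and the helper names are assumptions.

```python
import numpy as np

def k_fold_indices(n, K, seed=0):
    """Shuffle indices 0..n-1 and split them into K nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), K)

def cv_error(x, t, M, K=5):
    """Average validation MSE of a degree-M polynomial over K folds."""
    folds = k_fold_indices(len(x), K)
    errors = []
    for k in range(K):
        val = folds[k]  # fold k is held out for validation
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = np.polynomial.polynomial.polyfit(x[train], t[train], M)
        pred = np.polynomial.polynomial.polyval(x[val], w)
        errors.append(np.mean((pred - t[val]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: t = sin(2*pi*x) + noise.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

scores = {M: cv_error(x, t, M) for M in range(8)}
best_M = min(scores, key=scores.get)
print("selected M:", best_M)
```

Compared with a single validation split, every point is used for validation exactly once, at the cost of fitting each candidate model K times.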
