
Fundamentals of Machine Learning

This document provides an overview of machine learning fundamentals. It discusses the background and affiliations of the instructor, Ekpe Okorafor. The objectives are to explain what machine learning is, describe three common techniques (collaborative filtering, clustering, and classification), show how organizations apply these techniques, and examine the relationship between algorithms and data volume. More data is usually preferable to a better algorithm.


Fundamentals of Machine Learning

Instructor: Ekpe Okorafor


1. Accenture – Big Data Academy
2. Computer Science, African University of Science & Technology
Ekpe Okorafor, PhD
Affiliations:
• Accenture Digital – Big Data Academy
– Principal, Big Data & Analytics
• African University of Science & Technology
– Professor, Computer Science / Data Science
– Research Professor, High Performance Computing Center of Excellence

Research Interests:
• Big Data, Predictive & Adaptive Analytics
• Statistical Machine Learning
• Performance Modelling and Analysis
• Information Assurance and Cybersecurity
• High Performance Computing & Network Architectures
• Distributed Storage & Processing
• Massively Parallel Processing & Programming
• Fault-tolerant Systems

Email: [email protected]; [email protected]


Twitter: @EkpeOkorafor; @Radicube

Objectives
• What machine learning is
• What the three common machine learning techniques are
• How organizations are applying these techniques
• How algorithms relate to data volume

Outline

• Overview
• The three C’s of machine learning
• Importance of data and algorithms
• Essential points
• Conclusion

Fundamentals of Computer Programming

• Let’s first consider how a typical program works
– Hardcoded conditional logic
– Predefined reactions when those conditions are met

$ cat spam-filter.py
#!/usr/bin/env python3

import sys

# Hardcoded rules: every trigger phrase and its reaction is fixed at design time
for line in sys.stdin:
    if "Make MONEY Fa$t At Home!!!" in line:
        print("This message is likely spam")
    if "Happy Birthday from Aunt Betty" in line:
        print("This message is probably OK")

• The programmer must consider all possibilities at design time
• An alternative technique is to have computers learn what to do
What is Machine Learning

• Machine learning is a field within artificial intelligence (AI)
– AI: the science and engineering of making intelligent machines
• Machine learning focuses on automated knowledge acquisition
– Primarily through the design and implementation of algorithms
– These algorithms require empirical data as input
• Machine learning algorithms learn based on the input provided
– The amount of data is often more important than the algorithm itself

What is Machine Learning (cont’d)

• The output produced varies by application
– Product recommendations
– Items grouped based on similarity
– Possible diagnosis of a disease
• These are examples of ‘the Three C’s’ of machine learning

The ‘Three C’s’

• Three established categories of machine learning techniques:
– Collaborative filtering (recommendations)
– Clustering
– Classification

Collaborative Filtering

• Collaborative filtering is a technique for recommendations
– It is one of the primary types of recommender systems
– We’ll cover it in detail today
• Helps users find items of relevance
– Among a potentially vast number of choices
– Based on comparison of preferences between users
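Comparing preferences between users can be sketched as user-based collaborative filtering. This is a minimal sketch, assuming a small in-memory dictionary of hypothetical 1-to-5 ratings and Pearson correlation as the similarity measure (the slides do not name one); the names `RATINGS`, `pearson`, and `recommend`, the users, and the ratings are all illustrative. Items a user has not rated are scored by a similarity-weighted average of ratings from users whose tastes correlate positively with theirs.

```python
from math import sqrt

# Hypothetical ratings on a 1-to-5 scale: user -> {item: rating}
RATINGS = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob": {"Heat": 4, "Alien": 5, "Up": 2, "Jaws": 5},
    "carol": {"Heat": 1, "Alien": 2, "Up": 5, "Coco": 4},
}


def pearson(a, b):
    """Pearson correlation of two users' ratings over the items both have rated."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0  # too little overlap to compare preferences
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    dev_a = [a[i] - mean_a for i in common]
    dev_b = [b[i] - mean_b for i in common]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))
    return numerator / denominator if denominator else 0.0


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated, using only positively correlated users."""
    mine = ratings[user]
    weighted_sums, similarity_sums = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        similarity = pearson(mine, theirs)
        if similarity <= 0:
            continue  # ignore users whose tastes do not resemble this user's
        for item, rating in theirs.items():
            if item not in mine:
                weighted_sums[item] = weighted_sums.get(item, 0.0) + similarity * rating
                similarity_sums[item] = similarity_sums.get(item, 0.0) + similarity
    scores = {item: weighted_sums[item] / similarity_sums[item] for item in weighted_sums}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", RATINGS))
```

Here alice's ratings correlate positively with bob's and exactly inversely with carol's, so the sketch recommends bob's Jaws and ignores carol's Coco, printing `['Jaws']`. Nothing in the code is specific to movies, which is what makes the technique domain agnostic.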

Applications Involving Collaborative Filtering
• Collaborative filtering is domain-agnostic
• The same algorithm can recommend practically anything
– Movies (MovieLens, Netflix, etc.)
– Television (TiVo suggestions)
– Music (several popular music download and streaming services)
– Colleges (applying to several colleges can be a daunting task)
• Amazon uses collaborative filtering to recommend a variety of products

Clustering

• Clustering algorithms discover structure in collections of data
– Where no formal structure previously existed
• They discover which clusters (‘groupings’) naturally occur in data
– By examining various properties of the input data
• Clustering is often used for exploratory analysis
– Divide a huge amount of data into smaller groups
– Can then tune analysis for each group
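One widely used clustering algorithm is k-means, which the slides do not name; it appears here only as an example of discovering groupings without labels. The sketch below is a minimal pure-Python version of Lloyd's iteration, assuming numeric feature vectors (for instance, two hypothetical customer attributes) and a number of clusters `k` chosen in advance; the function names and sample points are illustrative. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points, stopping once the centroids stop changing.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, max_iterations=100, seed=0):
    """Group points into k clusters with Lloyd's k-means iteration."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct starting points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The result can depend on the starting centroids, so practical implementations usually run several initializations; with these two well-separated hypothetical groups, the sketch recovers them.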

Applications Involving Clustering

• Market segmentation
– Group similar customers in order to target them effectively
• Finding related news articles
– Google News
• Epidemiological studies
– For example, identifying cancer clusters and finding their root cause
• Computer vision (groups of pixels that cohere into objects)
– Related pixels clustered to recognize faces or license plates

Classification

• The previous two techniques are forms of unsupervised learning
– The algorithm discovers recommendations or groups
• Classification is a form of ‘supervised’ learning
– Requires training with data that has known labels
• These are healthy cells, those are cancerous
– Learns how to label new records based on that information
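To make the supervised idea concrete, the sketch below labels a new record by majority vote among its k nearest labeled training records (k-nearest neighbors). k-NN is one simple classification algorithm chosen here for brevity, not one the slides prescribe; the two numeric features, their values, and the names `TRAINING` and `knn_classify` are hypothetical stand-ins for measurements of healthy and cancerous cells.

```python
from collections import Counter
from math import dist

# Hypothetical labeled training records: (feature vector, known label)
TRAINING = [
    ((1.0, 1.2), "healthy"),
    ((1.3, 0.9), "healthy"),
    ((0.8, 1.1), "healthy"),
    ((4.0, 4.5), "cancerous"),
    ((4.4, 3.9), "cancerous"),
    ((3.8, 4.2), "cancerous"),
]


def knn_classify(record, training, k=3):
    """Label a new record by majority vote of its k nearest training records."""
    nearest = sorted(training, key=lambda example: dist(record, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((1.1, 1.0), TRAINING))
print(knn_classify((4.1, 4.0), TRAINING))
```

With these made-up values the two queries print `healthy` and `cancerous`. An odd `k` avoids tied votes when there are only two classes.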

Applications Involving Classification

• Spam filtering
– Train using a set of spam and non-spam messages
– System will eventually learn to detect unwanted e-mail
• Oncology
– Train using images of benign and malignant tumors
– System will eventually learn to identify cancer
• Risk analysis
– Train using financial records of customers who do/don’t default
– System will eventually learn to identify high-risk customers
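Spam filtering is the learned counterpart of the hardcoded `spam-filter.py` script: instead of fixed phrases, the program estimates from labeled examples how likely each word is in spam and in legitimate mail. The sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing, one common choice for this task that the slides do not specify; the tokenizer, the class names `spam`/`ham`, the `NaiveBayesSpamFilter` class, and the six training messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, text, label):
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            if self.message_counts[label] == 0:
                continue  # no training data for this class yet
            # log P(label) + sum over words of log P(word | label)
            score = math.log(self.message_counts[label] / total_messages)
            denominator = sum(counts.values()) + len(vocabulary)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


spam_filter = NaiveBayesSpamFilter()
for text in ["Make MONEY fast at home", "Win cash now, claim your prize", "Cheap loans, make money today"]:
    spam_filter.train(text, "spam")
for text in ["Happy birthday from Aunt Betty", "Meeting moved to Monday at noon", "Lunch at home tomorrow?"]:
    spam_filter.train(text, "ham")

print(spam_filter.predict("Make MONEY Fa$t At Home!!!"))
print(spam_filter.predict("Happy birthday, see you at lunch"))
```

Summing log probabilities avoids numeric underflow on long messages, and smoothing keeps a word never seen in training (such as `fa$t`) from zeroing out a class. Adding more labeled messages refines the word estimates without changing any code.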

Relationship of Algorithms and Data Volume
• There are many algorithms for each type of machine learning
– There is no overall best algorithm
– Each algorithm has advantages and limitations
• Algorithm choice is often related to data volume
– Some scale better than others
• Most algorithms offer better results as volume increases
– Best approach = simple algorithm + lots of data
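The claim that results improve with volume can be checked on a toy problem. The sketch below, under assumptions chosen purely for illustration, draws synthetic one-dimensional data from two overlapping classes (normal distributions with means -1 and +1 and unit variance), trains a deliberately simple nearest-centroid classifier on training sets of increasing size, and measures accuracy on a fixed held-out set. The helper names are hypothetical; the sketch shows one simple algorithm improving with data and does not reproduce the Banko and Brill experiments.

```python
import random


def make_data(n, rng):
    """n synthetic labeled points; classes alternate so both are always present."""
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center, 1.0), label))
    return data


def train_centroids(data):
    """Nearest-centroid training: the mean x value of each class."""
    means = []
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        means.append(sum(xs) / len(xs))
    return means


def accuracy(means, data):
    """Fraction of points whose nearest class mean matches their true label."""
    correct = 0
    for x, y in data:
        predicted = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
        correct += predicted == y
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    test_set = make_data(20_000, rng)  # fixed held-out evaluation set
    for n in (2, 20, 200, 2_000, 20_000):
        means = train_centroids(make_data(n, rng))
        print(f"training size {n:>6}: accuracy {accuracy(means, test_set):.3f}")
```

Because the classes overlap, no classifier can exceed roughly 84% accuracy in this setup (the optimal threshold is 0, giving Φ(1) ≈ 0.841). As the training set grows, the estimated means approach ±1 and accuracy approaches that ceiling, while very small training sets can fall noticeably short of it depending on the seed.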

Relationship of Algorithms and Data Volume (cont’d)
“It’s not who has the best algorithms that wins.
It’s who has the most data.” [Banko and Brill, 2001]

Essential Points

• Machine learning algorithms learn based on the data provided
• Collaborative filtering recommends items
• Clustering discovers how to group a set of items into subsets
• Classification is supervised learning that can identify item types
• More data is usually preferable to a better algorithm

Conclusion

In this section, you have learned:
• What machine learning is
• What the three common machine learning techniques are
• How organizations are applying these techniques
• How algorithms relate to data volume
