
Data Science ppt


DATA SCIENCE

Using Python

INTRODUCTION TO PYTHON

• Python: A Versatile Programming Language

• Created by Guido van Rossum in the late 1980s and first released in 1991

• Open-source and widely used for various applications

• Known for its simplicity and readability


CURRICULUM FOR PYTHON

• Objects and data structures (strings, lists, tuples, dictionaries, sets and booleans)

• Statements in Python (if/else, for loops and while loops)

• Functions in Python

• Modules and packages

• Basic built-in Python modules
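The curriculum items above can be illustrated in a few lines. This is only a minimal sketch: the variable names, the sample values and the `describe` helper are made up for illustration and are not part of the original slides.

```python
import math  # one of the basic built-in modules

# Core built-in objects named in the curriculum.
name = "data science"                  # string
scores = [72, 88, 95]                  # list (ordered, mutable)
point = (3, 4)                         # tuple (ordered, immutable)
ages = {"ana": 31, "ben": 27}          # dictionary (key -> value)
tags = {"python", "numpy", "python"}   # set (the duplicate is dropped)
is_ready = len(scores) > 0             # boolean


def describe(values):
    """Return (number of values >= 80, number of values < 80, total)."""
    high = 0
    low = 0
    for v in values:            # for loop
        if v >= 80:             # if / else statement
            high += 1
        else:
            low += 1

    total = 0
    i = 0
    while i < len(values):      # while loop
        total += values[i]
        i += 1
    return high, low, total


high, low, total = describe(scores)
hypotenuse = math.hypot(*point)  # function imported from a module
print(name.title(), high, low, total, hypotenuse, sorted(tags), is_ready)
```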


NECESSARY MODULES FOR DATA SCIENCE

• NumPy
• Pandas
• Matplotlib
• Seaborn
• Scikit-learn
1. NUMPY

o NumPy is a Python library used for working with arrays.

o It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

o NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely.

o NumPy stands for Numerical Python.
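A short sketch of the array, linear algebra and Fourier transform features mentioned above. The matrices and the signal are arbitrary example values chosen for illustration.

```python
import numpy as np

# A 2x2 matrix and a right-hand-side vector (arbitrary example values).
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

doubled = a * 2              # element-wise arithmetic on the whole array
product = a @ a              # matrix multiplication
x = np.linalg.solve(a, b)    # linear algebra: solve a @ x = b

# Fourier transform of a short sampled signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print(doubled, product, x, spectrum, sep="\n")
```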
2. PANDAS

• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
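The cleaning, exploring and manipulating steps could look roughly like this. The small table of cities and temperatures is invented for the example; a real project would usually load data with something like `pd.read_csv`.

```python
import pandas as pd

# A tiny made-up data set with two missing entries.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", None],
    "temp_c": [31.0, None, 29.0, 35.0, 30.0],
})

# Exploring: count the missing values in each column.
missing_per_column = df.isna().sum()

# Cleaning: drop rows without a city, then fill missing temperatures
# with the mean of the remaining temperatures.
cleaned = df.dropna(subset=["city"]).copy()
cleaned["temp_c"] = cleaned["temp_c"].fillna(cleaned["temp_c"].mean())

# Analyzing: average temperature per city.
mean_by_city = cleaned.groupby("city")["temp_c"].mean()

print(missing_per_column)
print(mean_by_city)
```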
3. MATPLOTLIB

• Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

• Matplotlib was created by John D. Hunter.

• Matplotlib is open source and we can use it freely.

• Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
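A minimal plotting sketch, assuming a script that runs without a display (hence the `Agg` backend). The data points are arbitrary, and the figure is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y_line = [v ** 2 for v in x]
y_points = [v + 1 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y_line, marker="o", label="y = x^2")             # line plot
ax.scatter(x, y_points, color="red", label="y = x + 1")     # scatter plot
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line and scatter plot")
ax.legend()

# Render the figure as PNG bytes (use fig.savefig("plot.png") to write a file).
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```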
4. SEABORN

Seaborn is a widely used Python data visualization library that is commonly used for data science and machine learning tasks. It is built on top of the Matplotlib data visualization library and can be used to perform exploratory analysis. You can create plots to answer questions about your data.
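As a rough illustration of exploratory plotting with Seaborn, the sketch below draws a scatter plot colored by group. The DataFrame and its column names (`hours`, `score`, `group`) are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is needed
import pandas as pd
import seaborn as sns

# Made-up data: study hours and exam scores for two groups.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 61, 70, 74, 81],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# Seaborn works directly on the DataFrame and returns a Matplotlib Axes,
# so the usual Matplotlib methods remain available afterwards.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score versus hours studied")
```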
5. SCIKIT-LEARN

Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.
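The "consistent interface" means that different estimators share the same `fit`, `predict` and `score` methods. A small sketch, using the iris data set bundled with scikit-learn and two classifiers picked only as examples:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two different models, trained and scored with identical calls.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for model_name, model in models.items():
    model.fit(X_train, y_train)                         # learn from the training split
    scores[model_name] = model.score(X_test, y_test)    # accuracy on held-out data

print(scores)
```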
DATA PREPROCESSING

• Acquiring and importing the dataset.
• Handling the missing values.
• Handling categorical features.
• Scaling and normalizing the data.
• Data decomposition and split.
• Feature engineering and feature selection (importance).
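Several of the preprocessing steps listed above can be chained with scikit-learn. This is a sketch under stated assumptions: the data set, its column names (`age`, `city`, `bought`) and the choice of mean imputation, one-hot encoding and standard scaling are illustrative picks, not the method from the slides. Feature engineering and feature selection are not covered here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Acquire the data (made up here; normally read from a file).
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 37, 52, 46],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "bought": [0, 1, 0, 1, 0, 1, 1, 1],
})
X = df[["age", "city"]]
y = df["bought"]

# Split first so that statistics are learned from the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Numeric column: fill missing values with the mean, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical column: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
print(X_train_ready.shape, X_test_ready.shape)
```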
EDA (EXPLORATORY DATA ANALYSIS)

• Answering questions through data.
• Data visualization (line, scatter plots).
• Analyzing various aspects of the data.
• Statistical analysis.
• Correlation analysis (positive and negative correlation, multicollinearity).
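Statistical and correlation analysis can be sketched with pandas. The data set below is invented so that one pair of columns is positively correlated and another negatively correlated; the 0.9 threshold used to flag possible multicollinearity is an arbitrary illustrative choice.

```python
import pandas as pd

# Made-up data: two predictors and one target.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_gaming": [6, 5, 5, 3, 2, 1],
    "exam_score": [52, 58, 61, 70, 74, 81],
})

summary = df.describe()   # count, mean, std, min, quartiles, max
corr = df.corr()          # pairwise Pearson correlation coefficients

positive = corr.loc["hours_studied", "exam_score"]     # close to +1
negative = corr.loc["hours_gaming", "exam_score"]      # close to -1

# Two predictors that are strongly correlated with each other hint at
# multicollinearity (threshold chosen only for illustration).
predictor_corr = corr.loc["hours_studied", "hours_gaming"]
possible_multicollinearity = abs(predictor_corr) > 0.9

print(summary)
print(corr.round(3))
```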
MODELING (MACHINE LEARNING MODELS)

Creating all machine learning models from scratch and with modules (theory and code implementation).

Regression:
• Linear Regression
• Polynomial Regression
• Multiple Linear Regression

Classification:
• KNN (k-Nearest Neighbor)
• SVM (Support Vector Machine)
• Logistic Regression
• Decision Tree
• Random Forest
• Naïve Bayes

Parameter optimization using Grid Search
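Parameter optimization with grid search can be sketched using scikit-learn's `GridSearchCV`. The KNN classifier, the iris data set and the candidate values in `param_grid` are example choices, not the settings used in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every combination of these values is tried with 5-fold cross-validation.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_               # best combination found
test_accuracy = search.score(X_test, y_test)    # accuracy of the refit best model

print(best_params, round(test_accuracy, 3))
```

The same pattern applies to the other classifiers listed above: swap in the estimator and a grid of its own parameters.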
EVALUATION
