21284254 Python Module 5
NumPy - Basics, Creating arrays, Arithmetic, Slicing, Matrix Operations, Random numbers.
Plotting and visualization. Matplotlib - Basic plot, Ticks, Labels, and Legends. Working with
CSV files. Pandas - Reading, Manipulating, and Processing Data. Introduction to
Microservices using Flask.
13/06/23
Tuesday
NumPy: It provides the data structures, algorithms, and library glue needed
for most scientific applications involving numerical data in Python.
Pandas: It provides high-level data structures and functions designed to make working
with structured or tabular data fast, easy, and expressive.
Matplotlib: It is the most popular Python library for producing plots and other
two-dimensional data visualizations.
NumPy - Basics, Creating arrays, Arithmetic, Slicing, Matrix Operations, Random numbers
* NumPy, short for Numerical Python, is one of the most important foundational packages for
numerical computing in Python.
* It has fast array-processing capabilities and is used in data analysis as a container for data to be
passed between algorithms and libraries. Most computational packages providing scientific
functionality use NumPy’s array objects for data exchange.
* One of the reasons NumPy is so important for numerical computations in Python is that it is
designed for efficiency on large arrays of data:
- NumPy internally stores data in a contiguous block of memory, independent of other built-in
Python objects
- NumPy operations perform complex computations on entire arrays without the need for Python
for loops
* Mathematical functions for fast operations on entire arrays of data without having to write loops.
* Tools for reading/writing array data to disk and working with memory-mapped files.
* N-dimensional array object, or ndarray, is a fast, flexible container for large datasets in Python.
* An ndarray is a generic multidimensional container for homogeneous data; that is, all of the
elements must be the same type
import numpy as np
data = np.random.randn(2, 3)
data
* Creating ndarrays
- The easiest way to create an array is to use the array function. This accepts any
sequence-like object (including other arrays) and produces a new NumPy array containing the
passed data.
* Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array
* NumPy infers the shape of the array from the data
* ndim and shape attributes show number of dimensions and size of the arrays
* To find the data type of the array, use its dtype attribute (a metadata object)
* In addition to np.array , there are a number of other functions for creating new arrays.
EX:
- zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape.
- empty creates an array without initializing its values to any particular value.
- To create a higher dimensional array with these methods, pass a tuple for the shape
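The creation functions above can be tried out with a short sketch like the following. The sample values and variable names are only illustrative assumptions.

```python
import numpy as np

# A flat list becomes a one-dimensional array.
arr1 = np.array([6, 7.5, 8, 0, 1])

# A list of equal-length lists becomes a two-dimensional array.
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr2.ndim)               # number of dimensions
print(arr2.shape)              # size along each dimension
print(arr1.dtype, arr2.dtype)  # dtypes inferred from the data

zeros_1d = np.zeros(10)        # ten 0.0 values
ones_2d = np.ones((3, 6))      # shape passed as a tuple
scratch = np.empty((2, 3, 2))  # uninitialized; contents are arbitrary
```

Note that the values in an array returned by empty should not be relied on; only its shape and dtype are meaningful until you fill it.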
Data Types for ndarrays
* The data type or dtype is a special object containing the information (or metadata, data about data)
the ndarray needs to interpret a chunk of memory as a particular type of data.
* You can explicitly convert or cast an array from one dtype to another using ndarray’s astype
method
* Calling astype always creates a new array (a copy of the data), even if the new dtype is the same
as the old dtype.
* If we convert floating-point numbers to be of integer dtype, the decimal part will be truncated
* If casting were to fail for some reason (like a string that cannot be converted to float64 ), a
ValueError will be raised.
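A minimal sketch of the astype behaviour described above, using made-up sample values:

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])

# Casting float to integer drops the fractional part (no rounding).
ints = floats.astype(np.int32)

# Casting to the same dtype still returns a new array by default.
same = floats.astype(floats.dtype)

# Strings that represent numbers can be cast to float64.
numeric = np.array(["1.25", "-9.6", "42"]).astype(np.float64)

# A string that cannot be parsed raises ValueError.
try:
    np.array(["1.25", "abc"]).astype(np.float64)
    cast_failed = False
except ValueError:
    cast_failed = True

print(ints, numeric, cast_failed)
```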
Arithmetic with NumPy Arrays
* Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization.
* Any arithmetic operation between equal-size arrays is applied element-wise.
* Illustration
* Arithmetic operations with scalars propagate the scalar argument to each element in
the array.
* Comparisons between arrays of the same size yield boolean arrays:
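The three cases above (array with array, array with scalar, and comparison) might look like this in practice. The arrays are illustrative only.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Element-wise operations between equal-size arrays.
product = arr * arr
difference = arr - arr

# A scalar is applied to every element.
inverse = 1 / arr
scaled = arr * 0.5

# Comparing same-size arrays yields a boolean array.
arr2 = np.array([[0.0, 4.0, 1.0], [7.0, 2.0, 12.0]])
mask = arr2 > arr

print(product)
print(mask)
```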
* This part covers the different ways you may want to select a subset of your data or
individual elements.
Illustration 1
* As Illustration 1 shows, if you assign a scalar value to a slice, as in arr[5:8] = 12,
the value is propagated (or broadcast) to the entire selection.
* Note: Array slices are views on the original array. This means that the data is not copied, and any
modifications to the view will be reflected in the source array.
Illustration 2
* If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the
array, for example, arr[5:8].copy()
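The difference between a view and a copy can be checked with a sketch along these lines (values chosen only for illustration):

```python
import numpy as np

arr = np.arange(10)

# Assigning a scalar to a slice fills the whole selection.
arr[5:8] = 12

# A slice is a view: writing to it changes the original array.
arr_slice = arr[5:8]
arr_slice[1] = 12345

# An explicit copy is independent of the original.
independent = arr[5:8].copy()
independent[:] = 0

print(arr)
print(independent)
```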
* Two-dimensional array: In a 2-D array the elements at each index are no longer scalars, but rather
one-dimensional arrays:
* Three-dimensional array: arr3d[1, 0] gives you all of the values whose indices start with (1, 0),
forming a 1-dimensional array
Indexing with Slices
* 2-D array slicing
* You can pass multiple slices just like you can pass multiple indices
* You can mix integer indexes and slices
- To select the second row but only the first two columns:
- To select the third column but only the first two rows
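As a rough illustration of the indexing and slicing rules above, assuming small sample arrays named arr2d and arr3d:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[2]         # indexing a 2-D array gives a 1-D row
element = arr2d[0, 2]  # two indices select a single element

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
vector = arr3d[1, 0]   # all values whose indices start with (1, 0)

block = arr2d[:2, 1:]      # multiple slices: first two rows, last two columns
second_row = arr2d[1, :2]  # second row, first two columns
third_col = arr2d[:2, 2]   # third column, first two rows

print(block)
print(second_row, third_col)
```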
Transposing Arrays
* Transposing is a special form of reshaping that similarly returns a view on the underlying
data without copying anything. Arrays have the transpose method and also the
special T attribute.
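A short sketch of both spellings, which also checks that the result is a view rather than a copy:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

transposed = arr.T              # attribute form
also_transposed = arr.transpose()  # method form, same result

# The transpose shares memory with arr, so a write shows up in both.
transposed[0, 1] = -1

print(transposed.shape)
print(arr[1, 0])
```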
Linear Algebra
* NumPy provides dot, available both as an array method and as a function in the numpy
namespace, for matrix multiplication:
* numpy.linalg has a standard set of matrix decompositions and things like inverse
and determinant.
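The following sketch shows dot in both forms together with a few numpy.linalg routines (inv, det, and the QR decomposition). The matrices are arbitrary examples.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = np.array([[6.0, 23.0], [-1.0, 7.0], [8.0, 9.0]])

# Matrix multiplication: method form and function form.
p_method = x.dot(y)
p_function = np.dot(x, y)

m = np.array([[4.0, 7.0], [2.0, 6.0]])
m_inv = np.linalg.inv(m)  # inverse
m_det = np.linalg.det(m)  # determinant
q, r = np.linalg.qr(m)    # QR decomposition, so that q @ r equals m

print(p_method)
print(m_inv)
```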
26/06/2023
Monday
* pandas: It contains data structures and data manipulation tools designed to make data cleaning
and analysis fast and easy in Python. The high-level data structures and functions of pandas make
working with structured or tabular data fast, easy, and expressive.
* The primary objects in pandas are the DataFrame , a tabular, column-oriented data structure with
both row and column labels, and the Series , a one-dimensional labeled array object.
* pandas adopts significant parts of NumPy’s idiomatic style of array-based computing, especially
array-based functions and a preference for data processing without for loops
* pandas blends the high-performance, array-computing ideas of NumPy with the flexible data
manipulation capabilities of spreadsheets and relational databases (such as SQL).
* The biggest difference between pandas and NumPy is that pandas is designed for working with
tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous
numerical array data.
* Importing pandas
import pandas as pd
from pandas import Series, DataFrame
* Series
A Series is a one-dimensional array-like object containing a sequence of values (of similar types
to NumPy types) and an associated array of data labels, called its index.
* Since we did not specify an index for the data, a default one consisting of the integers 0 through
N - 1 (where N is the length of the data) is created.
* You can get the array representation and index object of the Series via its values and index
attributes, respectively
* To create a Series with an index identifying each data point with a label
* You can use labels in the index when selecting single values or a set of values
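The Series points above can be sketched as follows. The values and labels are illustrative assumptions.

```python
import pandas as pd

# No index given: pandas assigns 0 .. N-1.
obj = pd.Series([4, 7, -5, 3])
print(obj.values)  # underlying array
print(obj.index)   # default integer index

# An explicit index labels each data point.
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

single = obj2["a"]              # select one value by label
subset = obj2[["c", "a", "d"]]  # select several values by a list of labels

print(single)
print(subset)
```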
DataFrame
* The DataFrame has both a row and column index; it can be thought of as a dict of Series all
sharing the same index.
Under the hood, the data is stored as one or more two-dimensional blocks rather than a list, dict,
or some other collection of one-dimensional arrays.
* There are many ways to construct a DataFrame, though one of the most common is from a dict of
equal-length lists or NumPy arrays:
* The resulting DataFrame will have its index assigned automatically as with Series. In current
pandas versions the columns keep the order of the dict keys (older versions placed them in sorted order):
* For large DataFrames, the head method selects only the first five rows
* If you specify a sequence of columns, the DataFrame’s columns will be arranged in
that order
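A minimal sketch of building a DataFrame from a dict of equal-length lists, using made-up sample data:

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}

frame = pd.DataFrame(data)   # index 0..5 is assigned automatically
first_rows = frame.head()    # first five rows by default

# Passing columns arranges the columns in exactly that order.
reordered = pd.DataFrame(data, columns=["year", "state", "pop"])

print(first_rows)
print(reordered.columns.tolist())
```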
Accessing data is a necessary first step for using most of the tools. pandas features a number of
functions for reading tabular data as a DataFrame object.
* All these functions are meant to convert text data into a DataFrame.
Working with CSV: CSV - (Comma Separated value)
* CSV (comma-separated value) files are a common file format for transferring and storing data.
- CSV is a standard for storing tabular data in text format, where commas are used to separate the
different columns, and newlines (carriage return / pressing Enter) are used to separate rows. Typically, the
first row in a CSV file contains the names of the columns for the data.
- The ability to read, manipulate, and write data to and from CSV files using Python is a key skill to
master for any data scientist or business analyst.
* A CSV file is a file with a “.csv” file extension, e.g. “data.csv”, “super_information.csv”. The
“CSV” in this case lets the computer know that the data contained in the file is in “comma separated
value” format
* A “CSV” file, that is, a file with a “csv” filetype, is a basic text file. Any text editor, such as
Notepad on Windows or TextEdit on Mac, can open a CSV file and show the contents.
* You can create a text file in a text editor, save it with a .csv extension, and open that file in Excel
or Google Sheets to see the table form.
* Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas
data type for storing tabular 2D data.
* A file may not always have a header. Consider such a CSV file , ex2.csv
* You can allow pandas to assign default column names, or you can specify names yourself:
* Suppose you wanted the message column to be the index of the returned DataFrame. You can
either indicate you want the column at index 4 or named 'message' using the index_col argument:
* You can skip the first, third, and fourth rows of a file with skiprows
* Some frequently used options in pandas.read_csv and pandas.read_table. (It has more than 50
options)
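The read_csv options mentioned above (header, names, index_col, skiprows) can be tried without any file on disk. In this sketch the CSV text is a hypothetical stand-in for a header-less file such as ex2.csv, held in memory with io.StringIO.

```python
import io
import pandas as pd

# Hypothetical contents of a CSV file without a header row.
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# Let pandas assign default column names 0, 1, 2, ...
default_names = pd.read_csv(io.StringIO(raw), header=None)

# Or supply the column names yourself.
cols = ["a", "b", "c", "d", "message"]
named = pd.read_csv(io.StringIO(raw), names=cols)

# Use the 'message' column as the row index.
indexed = pd.read_csv(io.StringIO(raw), names=cols, index_col="message")

# Hypothetical file with comment lines on rows 0, 2, and 3.
commented = "# note\na,b,c,d,message\n# more\n# notes\n1,2,3,4,hello\n5,6,7,8,world\n"
skipped = pd.read_csv(io.StringIO(commented), skiprows=[0, 2, 3])

print(indexed)
print(skipped)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.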
Reading Text Files in Pieces
* When processing very large files or figuring out the right set of arguments to correctly process a
large file, you may only want to read in a small piece of a file or iterate through smaller chunks of
the file.
* If you want to only read a small number of rows (avoiding reading the entire file), specify that
with nrows :
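A small sketch of both approaches: nrows to read only the first few rows, and the chunksize argument of read_csv to iterate over a file in pieces. The ten-row CSV text here is generated only for illustration.

```python
import io
import pandas as pd

# Hypothetical data: a header plus ten rows.
raw = "key,value\n" + "".join(f"{'abc'[i % 3]},{i}\n" for i in range(10))

# Read only the first four data rows.
first_rows = pd.read_csv(io.StringIO(raw), nrows=4)

# Iterate over the file three rows at a time.
chunk_lengths = []
running_total = 0
for chunk in pd.read_csv(io.StringIO(raw), chunksize=3):
    chunk_lengths.append(len(chunk))
    running_total += int(chunk["value"].sum())

print(first_rows)
print(chunk_lengths, running_total)
```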
* Using DataFrame’s to_csv method, we can write the data out to a comma-separated
file.
* Missing values appear as empty strings in the output. You might want to denote them
by some other sentinel value:
(import sys: writing to sys.stdout so it prints the text result to the console)
* You can also write only a subset of the columns, and in an order of your choosing:
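The to_csv options above might be exercised like this. The small DataFrame is a made-up example, and calling to_csv without a path is used here so the text can be inspected as a string.

```python
import sys
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1, 5], "b": [2.0, np.nan], "c": ["x", None]})

# With no path, to_csv returns the CSV text; missing values are empty.
default_text = frame.to_csv(index=False)

# Write to the console, marking missing values with a sentinel.
frame.to_csv(sys.stdout, na_rep="NULL")

# Write only some columns, in a chosen order.
subset_text = frame.to_csv(index=False, columns=["c", "a"], na_rep="NULL")

print(default_text)
print(subset_text)
```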
Q1. Read a CSV file named courses.csv using pandas and do the following tasks:
# 0 1 2 3
#0 Pandas 20000 35 Days 1000
#1 Java 15000 NaN 800
#2 Python 15000 30 Days 500
#3 PHP 18000 30 Days 800
By default, pandas considers the first row of the file as a header and uses it for the
DataFrame column names. If you want the first row to be treated as a data
record instead, use the header=None param and use the names
param to specify the column names
* To import matplotlib’s pyplot interface (conventionally under the name plt):
import matplotlib.pyplot as plt
* Plots in matplotlib reside within a Figure object. You can create a new
figure with plt.figure :
* You can’t make a plot with a blank figure. You have to create one or
more subplots using add_subplot :
- For example, to plot x versus y with green dashes, you would execute:
ax.plot(x, y, 'g--')
- The same plot could also have been expressed more explicitly as:
* Line plots can additionally have markers to highlight the actual data
points. The marker can be part of the style string, which must have color
followed by marker type and line style.
* This could also have been written more explicitly as:
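A hedged sketch of the figure, subplot, and style-string steps described above. The data, the 2 x 1 subplot layout, and the Agg backend line are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend; remove it and call plt.show() to see a window
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()              # new, empty figure
ax1 = fig.add_subplot(2, 1, 1)  # first of two stacked subplots
ax2 = fig.add_subplot(2, 1, 2)

x = np.arange(10)
y = x ** 2

# Style string: green dashed line.
short = ax1.plot(x, y, "g--")[0]
# The same style spelled out with keyword arguments.
explicit = ax1.plot(x, y + 10, linestyle="--", color="g")[0]

# Color, then marker, then line style in one string.
marked = ax2.plot(x, y, "ko--")[0]
# Explicit equivalent.
marked_explicit = ax2.plot(x, y + 10, color="k", marker="o", linestyle="dashed")[0]
```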
2. Calling the function with parameters sets the parameter value (e.g., plt.xlim([0, 10])
sets the x-axis range to 0 to 10)
To change the x-axis ticks, it’s easiest to use set_xticks and set_xticklabels .
* The set_xticks() and set_yticks() functions take a list object as an argument. The
elements in the list denote the positions on the corresponding axis where ticks will be
displayed.
- set_xticks instructs matplotlib where to place the ticks along the data range. By
default these locations will also be the labels.
- we can set any other values (other than the default tick values) as the labels using
set_xticklabels :
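The tick functions above, together with the two ways of calling plt.xlim, could be sketched as follows. The random-walk data, the label text, and the Agg backend line are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.arange(1000), np.random.randn(1000).cumsum())

# Where the ticks go along the x-axis.
ax.set_xticks([0, 250, 500, 750, 1000])

# What text to show at those positions instead of the numbers.
tick_labels = ax.set_xticklabels(
    ["one", "two", "three", "four", "five"], rotation=30, fontsize="small"
)

# Called with a parameter, plt.xlim sets the range of the current axes...
plt.xlim([0, 1000])
# ...and called with no arguments, it returns the current range.
current_range = plt.xlim()
print(current_range)
```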
Title
Adding legends
y = np.sin(x)
- values from two arrays are plotted using the plot() function.
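A possible sketch of adding a title, axis labels, and a legend to a sine plot like the one above. The label strings, the added cosine curve, and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 2 * np.pi, 0.05)
y = np.sin(x)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# The label argument is what the legend displays for each line.
ax.plot(x, y, "k", label="sine")
ax.plot(x, np.cos(x), "k--", label="cosine")

ax.set_title("Sine and cosine waves")
ax.set_xlabel("angle (radians)")
ax.set_ylabel("value")

# loc="best" lets matplotlib pick a position that avoids the lines.
legend = ax.legend(loc="best")
```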
Matplotlib – Bar Plot
* A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars
with heights or lengths proportional to the values that they represent. The bars can be plotted
vertically or horizontally.
Demonstration
A simple example of the Matplotlib bar plot is given below. It shows the number of students
enrolled for various courses offered at an institute
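One way such a bar plot could be written is sketched below. The course names and enrolment numbers are hypothetical, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical enrolment figures per course.
langs = ["C", "C++", "Java", "Python", "PHP"]
students = [23, 17, 35, 29, 12]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# One vertical bar per category; ax.barh would draw them horizontally.
bars = ax.bar(langs, students)

ax.set_xlabel("Course")
ax.set_ylabel("Students enrolled")
ax.set_title("Enrolment by course")
```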
Matplotlib – Pie Chart
* A pie chart can only display one series of data. Pie charts show the size of items
(called wedges) in one data series, proportional to the sum of the items. The data
points in a pie chart are shown as a percentage of the whole pie.
Demonstration
* The following code uses the pie() function to display the pie chart of the list of students
enrolled for various computer language courses. The proportionate percentage is
displayed inside the respective wedge with the help of the autopct parameter, which is set
to the format string '%1.2f%%' (the doubled %% prints a literal percent sign).
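A sketch of a pie chart with percentages printed inside the wedges. The enrolment numbers are hypothetical, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical enrolment figures per language course.
langs = ["C", "C++", "Java", "Python", "PHP"]
students = [23, 17, 35, 29, 12]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# autopct formats each wedge's share; %% is an escaped percent sign.
wedges, texts, autotexts = ax.pie(students, labels=langs, autopct="%1.2f%%")

# Equal aspect ratio keeps the pie circular.
ax.axis("equal")
```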
Matplotlib - Scatter Plot
* Scatter plots are used to plot data points on horizontal and vertical axes in an attempt to show
how much one variable is affected by another. Each row in the data table is represented by a marker
whose position depends on its values in the columns set on the X and Y axes. A third variable can be
set to correspond to the color or size of the markers, thus adding yet another dimension to the plot.
Demonstration
* The script below plots a scatter diagram of grades range vs grades of boys and girls in two
different colors.
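A scatter plot of this kind might be sketched as follows. All grade values are hypothetical, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: grade ranges on x, grades obtained on y.
grades_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
girls_grades = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
boys_grades = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# One scatter call per group, each with its own color and legend label.
girls = ax.scatter(grades_range, girls_grades, color="r", label="girls")
boys = ax.scatter(grades_range, boys_grades, color="b", label="boys")

ax.set_xlabel("Grades range")
ax.set_ylabel("Grades scored")
ax.set_title("Scatter plot")
ax.legend()
```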
Python Flask
* Flask is a web framework: a Python module that lets you develop web applications easily.
* Flask is a web application framework written in Python. It was developed by Armin Ronacher,
who led a team of international Python enthusiasts called Pocoo. Flask is based on the Werkzeug
WSGI toolkit and the Jinja2 template engine. Both are Pocoo projects.
* It has a small and easy-to-extend core: it is a microframework that doesn’t include an ORM
(Object-Relational Mapper) or similar features.
* It does have many useful features, such as URL routing and a template engine. It is a WSGI web app
framework.
Flask Components
* WSGI
The Web Server Gateway Interface (WSGI) has been used as a
standard for Python web application development. WSGI is the specification of a common interface
between web servers and web applications.
* Werkzeug
Werkzeug is a WSGI toolkit that implements request and response objects and utility functions. This
enables a web framework to be built on it. The Flask framework uses Werkzeug as one of its bases.
* Jinja2
Jinja2 is a popular template engine for Python. A web template system combines a template with a
specific data source to render a dynamic web page.
* It then starts a web server which is available only on your computer. In a web browser open
localhost on port 5000 (the url) and you’ll see “Hello World” show up.
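A minimal "Hello World" Flask application of the kind described here might look like the sketch below. The function name hello is illustrative. Instead of starting the server, the sketch checks the route with Flask's built-in test client so that it finishes immediately.

```python
from flask import Flask

app = Flask(__name__)

# URL routing: requests for "/" are handled by this function.
@app.route("/")
def hello():
    return "Hello World"

# Exercise the route without starting a server, using the test client.
response = app.test_client().get("/")
print(response.status_code, response.get_data(as_text=True))

# To serve the page to a browser instead, call app.run(); by default the
# development server listens on http://127.0.0.1:5000/ .
```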
* It’s a microframework, but that doesn’t mean your whole app should be inside one single Python
file. You can and should use many files for larger programs, to handle complexity.
* "Micro" means that the Flask framework is simple but extensible.