Ch No 3 Advance Python
Summary: This unit introduces students to the fundamentals of the Python programming
language: its history, evolution, operators, variables, constants, lists, strings, and
iterative and selection statements. Students will learn how Python is used to create
programs and will explore three essential Python libraries: NumPy for numerical
computing, Pandas for data manipulation and analysis, and Scikit-learn for
implementing machine learning algorithms.
Learning Objectives:
Students will be able to
Key concepts:
1. Basics of the Python programming language
2. Understanding of character sets, tokens, modes, operators and data types
3. Control Statements
4. CSV Files
5. Libraries NumPy, Pandas, Scikit-learn
Learning Outcomes:
Students will be able to
1. Explain the basics of the Python programming language and write programs using the
basic concepts of tokens.
2. Use selective and iterative statements effectively.
3. Gain practical knowledge on how to use the libraries efficiently.
Features of Python
Python Editors
There are various editors and Integrated Development Environments (IDEs) that you
can use to work with Python. Some popular options are PyCharm, Spyder, Jupyter
Notebook, and IDLE. Let us look at how we can work with Jupyter Notebook.
Jupyter Notebook is an open-source web application that allows you to create and share
documents containing live code, equations, visualizations, and narrative text. It's widely
used in data science and research. It can be installed using Anaconda or with pip.
For more details on installation, see
https://docs.jupyter.org/en/latest/install/notebook-classic.html
Those who are familiar with Python can open the command prompt in administrative mode
and type
pip install notebook
To run the notebook, open the command prompt and type
jupyter notebook
[Figure: Tokens in Python. Source: https://www.studytrigger.com/wp-content/uploads/2022/08/Tokens-in-Python.jpg]
Keywords
Keywords are reserved words that have a special purpose in Python. The list of keywords is given below.
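The keywords of the Python version you are running can also be printed from a program. The sketch below uses the standard keyword module; the exact list differs slightly between Python versions, and the variable names is_reserved and not_reserved are only illustrative.

```python
import keyword

# kwlist holds every reserved word of the running interpreter
print(len(keyword.kwlist), "keywords")
print(keyword.kwlist)

# iskeyword() reports whether a given word is reserved
is_reserved = keyword.iskeyword("while")
not_reserved = keyword.iskeyword("marks")
print(is_reserved, not_reserved)  # True False
```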
Identifier
An identifier is a name used to identify a variable, function, class, module or other
object. Generally, keywords (list given above) are not used as variables. Identifiers cannot
start
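The rules for identifiers can be checked from within Python itself. The sketch below combines str.isidentifier() with keyword.iskeyword(); the helper name is_valid_identifier and the sample names are made up for illustration.

```python
import keyword

def is_valid_identifier(name):
    """Return True if name can be used as a variable name."""
    # isidentifier() checks the spelling rules; keywords must be excluded separately
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["total_marks", "_count", "2marks", "for", "roll no"]:
    print(candidate, "->", is_valid_identifier(candidate))
```

Only the first two names are accepted: "2marks" starts with a digit, "for" is a keyword, and "roll no" contains a space.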
Literals:
Literals are the raw data values that are explicitly specified in a program. Different
types of Literals in Python are String Literal, Numeric Literal (Numbers), Boolean Literal
(True & False), Special Literal (None) and Literal Collections.
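As a quick illustration of the literal types listed above, the following sketch assigns one literal of each kind and prints its type. All variable names and values are illustrative.

```python
name = "Asmita"            # string literal
count = 25                 # numeric literal (integer)
price = 99.5               # numeric literal (floating point)
z = 3 + 4j                 # numeric literal (complex)
passed = True              # Boolean literal
result = None              # special literal
marks = [80, 75, 92]       # literal collection: list
point = (3, 4)             # literal collection: tuple
vowels = {"a", "e"}        # literal collection: set
student = {"rollno": 1}    # literal collection: dictionary

for value in (name, count, price, z, passed, result, marks, point, vowels, student):
    print(repr(value), "is of type", type(value).__name__)
```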
Operators:
Operators are symbols or keywords that perform operations on operands to produce a
result. Python supports a wide range of operators:
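A short sketch of the most commonly used operator groups is given here. The grouping and the values 17 and 5 are chosen only for illustration.

```python
a, b = 17, 5

# Arithmetic operators
print(a + b, a - b, a * b)    # 22 12 85
print(a / b)                  # 3.4 (true division always gives a float)
print(a // b, a % b)          # 3 2 (floor division and remainder)
print(a ** 2)                 # 289 (exponent)

# Relational (comparison) operators produce Boolean values
print(a > b, a == b, a != b)  # True False True

# Logical operators combine Boolean expressions
print(a > 10 and b > 10)      # False
print(a > 10 or b > 10)       # True
print(not a > 10)             # False

# Assignment and membership operators
a += 3                        # same as a = a + 3
print(a)                      # 20
print(b in [1, 5, 9])         # True
```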
Punctuators:
Common punctuators in Python include
: ( ) [ ] { } , ; . ` ' ' " " / \ & @ ! ? | ~ etc.
Example
output
Tokens in the above program are given below
In the above program
Sample Program-1
- on the screen
Sample Program-2
Write a program to calculate the area of a rectangle given the length and breadth are 50
and 20 respectively.
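One possible solution is sketched below; the variable names are a free choice.

```python
length = 50
breadth = 20

# Area of a rectangle = length x breadth
area = length * breadth
print("Area of the rectangle =", area)  # Area of the rectangle = 1000
```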
Data Types:
Data types classify or categorize data items. A data type represents the kind of
value and determines what operations can be performed on that data. Python supports
dynamic typing: a variable pointing to a value of a certain data type can later be made
to point to a value/object of another data type.
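Dynamic typing can be seen by rebinding one variable to values of different types. This is a minimal sketch; the variable name value is arbitrary.

```python
value = 10
print(type(value))   # <class 'int'>

value = "ten"        # the same name now points to a string
print(type(value))   # <class 'str'>

value = [1, 0]       # and now to a list
print(type(value))   # <class 'list'>
```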
The following are the standard or built-in data types in Python:
Data Type: Description (Example)
Integer: Stores whole numbers. (a=10)
Boolean: Used to represent the truth values of expressions. It has two values, True and False. (Result = True)
Floating point: Stores numbers with a fractional part. (x=5.5)
Complex: Stores a number having a real and an imaginary part. (num=a+bj)
String: Immutable sequence (after creation, values cannot be changed in place). Stores text enclosed in single or double quotes.
List: Mutable sequence (after creation, values can be changed in place). Stores a list of comma-separated values of any data type between square brackets [ ].
Tuple: Immutable sequence (after creation, values cannot be changed in place). Stores a list of comma-separated values of any data type between parentheses ( ).
Set: An unordered collection of values, of any type, with no duplicate entries. (s = { 25, 3, 3.5})
Dictionary: Stores key: value pairs between curly braces { }.
Sample Program-3
Write a program to read name and marks of a student and display the total mark.
output
In the above example, float( ) is used to convert the value to the floating-point data type. The
explicit conversion of an operand to a specific type is called type casting.
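One possible version of Sample Program-3 is sketched below. The helper name total_marks and the sample marks are assumptions. In an interactive program the values would come from input(), which always returns a string; fixed strings stand in for that here so that the sketch runs unattended.

```python
def total_marks(mark_strings):
    """Convert each mark from text to float (type casting) and add them up."""
    return sum(float(m) for m in mark_strings)

name = "Asmita"                     # interactive version: name = input("Enter name: ")
entered = ["78.5", "81", "90.25"]   # interactive version: values returned by input()

total = total_marks(entered)
print("Name:", name)
print("Total marks:", total)        # Total marks: 249.75
```

Without float(), the + operator would join the strings instead of adding the numbers.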
Control flow statements in Python
Till now, the programs you've created have followed a basic, step-by-step progression,
where each statement executes in sequence, every time. However, there are many
practical programs where we have to selectively execute specific sections of the code or
iterate over parts of the program. This capability is achieved through selective statements
and looping statements.
Selection Statement
The if / if..else statement evaluates a test expression. If the condition is true, the
statements written below if are executed; otherwise, the statements below else are executed.
Indentation is used to separate the blocks.
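Syntax:
if <test expression>:
    <statements executed when the condition is true>
else:
    <statements executed when the condition is false>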
Sample Program-4
Asmita with her family went to a restaurant. Determine the choice of food according to the
options she chooses from the main menu.
Case 1: All Members are vegetarians. They prefer to have veg food. No other options.
(menu-veg)
Program & Output
Case 2: Family Members may choose non-vegetarian foods also if veg foods are not
available. (menu-veg/Nonveg)
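Both cases can be sketched with if and if..else statements as shown below. This is only one possible reading of the problem: the function name choose_food, the menu strings "veg" and "veg/nonveg", and the veg_available flag are all assumptions.

```python
def choose_food(menu, veg_available=True):
    """Return the family's food choice for the given menu."""
    if menu == "veg":
        # Case 1: vegetarian food is the only option
        return "Veg food"
    # Case 2: non-vegetarian food is chosen only when veg food is not available
    if veg_available:
        return "Veg food"
    else:
        return "Non-veg food"

print(choose_food("veg"))
print(choose_food("veg/nonveg", veg_available=True))
print(choose_food("veg/nonveg", veg_available=False))
```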
Sample Program-5
Write a program to get the lengths of the sides of a triangle and determine whether it is
an equilateral, isosceles, or scalene triangle.
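A possible solution using if..elif..else is sketched below. The function name triangle_type is an assumption, and the sketch does not check whether the three lengths can actually form a triangle.

```python
def triangle_type(a, b, c):
    """Classify a triangle by comparing the lengths of its sides."""
    if a == b == c:
        return "equilateral"
    elif a == b or b == c or a == c:
        return "isosceles"
    else:
        return "scalene"

print(triangle_type(5, 5, 5))  # equilateral
print(triangle_type(5, 5, 8))  # isosceles
print(triangle_type(3, 4, 5))  # scalene
```

In an interactive version, the three sides would be read with input() and converted with float() before calling the function.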
Looping Statements
Looping statements in programming languages allow you to execute a block of code
repeatedly. In Python, there are mainly two types of looping statements: for loop and while
loop.
For loop
A for loop repeats a portion of a program once for each item in a sequence, which is an ordered
collection of items.
The keyword for is used to start the loop. The loop variable takes on each value in the specified
sequence (e.g., list, string, range). The colon (:) at the end of the for statement indicates the start of
the loop body. The statements within the loop body are executed for each iteration. Indentation is
used to define the scope of the loop body. All statements indented under the for statement are
considered part of the loop. It is advisable to use a for loop when the exact number of iterations
is known in advance.
Syntax
for <control-variable> in <sequence/items in range>:
    <statements inside body of the loop>
Example -1 Example-2
The for loop iterates over each item in the sequence until it reaches the end of the sequence
or until the loop is terminated using a break statement. It's a powerful construct for iterating
over collections of data and performing operations on each item.
Sample Program-6
Write a program to display even numbers and their squares between 100 and 110.
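A possible solution is sketched below. It assumes that both 100 and 110 are included in the range; the list evens is kept only so that the results can be inspected afterwards.

```python
evens = []
# range(100, 111, 2) produces 100, 102, ..., 110
for number in range(100, 111, 2):
    evens.append((number, number ** 2))
    print(number, number ** 2)
```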
Sample Program-7
Write a program to read a list, display each element and its type. (use type( ) to display the
data type.)
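A possible solution is sketched below. For simplicity, it uses a fixed list with illustrative values instead of reading the list from the user.

```python
items = [10, 3.5, "AI", True, None, [1, 2]]

# type() returns the class of each element
for element in items:
    print(element, type(element))

type_names = [type(element).__name__ for element in items]
```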
Sample Program-8
Write a program to read a string. Split the string into list of words and display each word.
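A possible solution is sketched below. The sentence is fixed here so that the sketch runs unattended; an interactive version would read it with input().

```python
sentence = "Python is easy to learn"

# split() with no argument breaks the string at whitespace
words = sentence.split()

for word in words:
    print(word)
```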
Sample Program-9
Write a simple program to display the values stored in a dictionary.
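A possible solution is sketched below. The dictionary student and its contents are made up for illustration.

```python
student = {"rollno": 12, "name": "Asmita", "mark": 480}

# Looping over a dictionary gives its keys; student[key] gives the matching value
for key in student:
    print(key, ":", student[key])

# values() gives the stored values directly
values = list(student.values())
print(values)  # [12, 'Asmita', 480]
```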
CSV Files
CSV (Comma-Separated Values) files are delimited text files that store tabular data (data
stored in rows and columns). A CSV file looks similar to a spreadsheet, but it is stored as
plain text, with the values in each row separated by commas. Datasets used in AI programming
are often saved in CSV format. Each line in a CSV file is a data record, and each record
consists of one or more fields (columns). The csv module of Python provides functionality to
read and write tabular data in CSV format.
Let us see the statements used for opening, reading and writing a file student.csv with
a file object named file. student.csv contains the columns rollno, name and mark.
Operation                  Statement
Importing the library      import csv
Opening in reading mode    file = open(
Opening in writing mode
Closing a file             file.close( )
Writing rows               wr = csv.writer(file)
                           480] )
Reading rows               details = csv.reader(file)
                           for rec in details:
                               print(rec)
Sample Program-10
Write a Program to open a csv file students.csv and display its details
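A possible solution is sketched below. Since a ready-made students.csv may not exist, the sketch first writes a small file with made-up records into a temporary folder and then opens and displays it. The records, the temporary path, and the use of the with statement (which closes the file automatically) are assumptions of this sketch.

```python
import csv
import os
import tempfile

# Hypothetical records; in practice students.csv would already exist
rows = [
    ["rollno", "name", "mark"],
    [1, "Asmita", 480],
    [2, "Ravi", 455],
]
path = os.path.join(tempfile.mkdtemp(), "students.csv")

# Writing: newline="" prevents extra blank lines on some platforms
with open(path, "w", newline="") as file:
    wr = csv.writer(file)
    wr.writerows(rows)

# Reading: each record comes back as a list of strings
with open(path, "r", newline="") as file:
    details = list(csv.reader(file))

for rec in details:
    print(rec)
```

Note that the csv module returns every field as a string, so numeric columns such as mark must be converted with int() or float() before calculations.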
INTRODUCING LIBRARIES
In Python, functions are organized within libraries, much as library books are arranged
by subjects such as physics, computer science, and economics. For example, the "math"
library contains numerous functions like sqrt(), pow(), fabs(), and sin(), which facilitate
mathematical operations and calculations. To use a library in a program, it must be
imported. For example, if we wish to use the sqrt() function in our program, we include the
statement "import math". This allows us to access and use the functions provided
by the math library.
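A small sketch of importing and using the math library is given below. The values are arbitrary. Note that math provides fabs() for absolute values, while abs() is a built-in function that needs no import.

```python
import math

print(math.sqrt(49))     # 7.0
print(math.pow(2, 5))    # 32.0
print(math.fabs(-3.2))   # 3.2
print(math.sin(0))       # 0.0

# A single function can also be imported by name
from math import sqrt
root = sqrt(16)
print(root)              # 4.0
```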
Python offers a vast array of libraries for various purposes, making it a versatile language for
different domains such as web development, data analysis, machine learning, scientific
computing, and more. Now, let us explore some libraries that are incredibly valuable in the
realm of Artificial Intelligence.
NUMPY
NumPy, which stands for Numerical Python, is a powerful library in Python used for
numerical computing. It is a general-purpose array-processing package. NumPy provides
the ndarray (N-dimensional array) data structure, which represents arrays of any
dimension. These arrays are homogeneous: all elements of an array are of the same data
type, which can be any of various numerical types (integers, floats, etc.).
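The following sketch creates a one-dimensional and a two-dimensional ndarray and inspects their basic properties. It assumes NumPy is installed (pip install numpy); the array contents are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])                # 1-D array of integers
m = np.array([[1.5, 2.0], [3.0, 4.5]])    # 2-D array of floats

print(a.ndim, a.shape, a.dtype)           # integer dtype depends on the platform
print(m.ndim, m.shape, m.dtype)           # 2 (2, 2) float64

# Arrays are homogeneous: mixing int and float gives an all-float array
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)
```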
Where and why do we use the NumPy library in Artificial Intelligence?
Suppose you have a dataset containing exam scores of students in various subjects, and you want
to perform some basic analysis on this data. You can utilize NumPy arrays to store exam scores
for different subjects efficiently. With NumPy's array operations, you can perform various
calculations such as calculating average scores for each subject, finding total scores for each
student, calculating the overall average score across all subjects, identifying the highest and
lowest scores. NumPy's array operations streamline these computations, making them both
efficient and convenient. This makes NumPy an indispensable tool for data manipulation and
analysis in data science applications.
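The exam-score analysis described above can be sketched as follows. The scores are made up; each row is assumed to be one student and each column one subject.

```python
import numpy as np

# Rows = students, columns = subjects (hypothetical data)
scores = np.array([
    [78, 85, 90],
    [62, 70, 58],
    [95, 88, 92],
])

subject_avg = scores.mean(axis=0)     # average of each column (subject)
student_total = scores.sum(axis=1)    # total of each row (student)
overall_avg = scores.mean()           # average over all values
highest = scores.max()
lowest = scores.min()

print("Average per subject:", subject_avg)
print("Total per student:", student_total)   # [253 190 275]
print("Overall average:", overall_avg)
print("Highest:", highest, "Lowest:", lowest)  # Highest: 95 Lowest: 58
```

The axis argument decides the direction of the calculation: axis=0 works down the columns and axis=1 works across the rows.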
PANDAS
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". Pandas is a
powerful and versatile library that simplifies data manipulation tasks in Python. It is
built on top of the NumPy library, which means that many NumPy structures are used or
replicated in Pandas. Pandas is particularly well-suited for working with tabular data,
such as spreadsheets or SQL tables. Its versatility and ease of use make it an essential
tool for data analysts, scientists, and engineers working with structured data in Python.
Where and why do we use the Pandas library in Artificial Intelligence?
Suppose you have a dataset containing information about various marketing campaigns
conducted by the company, such as campaign type, budget, duration, reach, engagement metrics,
and sales performance. We use Pandas to load the dataset, display summary statistics, and
perform group-wise analysis to understand the performance of different marketing campaigns.
We then visualize the sales performance and average engagement metrics for each campaign type
using Matplotlib, a popular plotting library in Python.
Pandas provides powerful data manipulation and aggregation functionalities, making it easy to
perform complex analysis and generate insightful visualizations. This capability is invaluable in AI
and data-driven decision-making processes, allowing businesses to gain actionable insights from
their data.
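The marketing-campaign analysis described above can be sketched as follows. It assumes Pandas is installed (pip install pandas); the column names and all figures are invented for illustration.

```python
import pandas as pd

# Hypothetical campaign data
campaigns = pd.DataFrame({
    "campaign_type": ["Email", "Social", "Email", "TV", "Social"],
    "budget": [1000, 2500, 1200, 8000, 3000],
    "engagement": [0.12, 0.30, 0.10, 0.05, 0.26],
    "sales": [5000, 9000, 5500, 15000, 11000],
})

# Summary statistics of the numeric columns
print(campaigns.describe())

# Group-wise analysis: average sales and engagement per campaign type
summary = campaigns.groupby("campaign_type")[["sales", "engagement"]].mean()
print(summary)
```

If Matplotlib is installed, the grouped result can be drawn as a bar chart with summary.plot(kind="bar").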
train, etc.
A DataFrame is a two-dimensional labeled data structure, like a table in MySQL. It
contains rows and columns, and therefore has both a row index and a column index. Each
column can have a different type of value, such as numeric, string, or Boolean, as in
the tables of a database.
Creation of DataFrame
There are several methods to create a DataFrame in Pandas, but here we will discuss two
common approaches:
Using NumPy ndarrays-
DataFrame: Result
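A minimal sketch of creating a DataFrame from NumPy ndarrays is given below. The marks, row labels, and column labels are hypothetical; only the DataFrame name Result is taken from the text.

```python
import numpy as np
import pandas as pd

# Each 1-D ndarray becomes one row of the DataFrame
array1 = np.array([90, 85, 78])
array2 = np.array([88, 76, 95])

Result = pd.DataFrame(
    [array1, array2],
    index=["Arnab", "Ramit"],                 # row labels (hypothetical)
    columns=["Maths", "Science", "Hindi"],    # column labels (hypothetical)
)
print(Result)
```

If index and columns are left out, Pandas numbers the rows and columns from 0.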
Attributes of DataFrames
Attributes are the properties of a DataFrame that can be used to fetch data or any
information related to a particular DataFrame.
The syntax of writing an attribute is:
DataFrame_name.attribute
Let us understand the attributes of DataFrames with the help of DataFrame Teacher
DataFrame:Teacher
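Some commonly used attributes are sketched below on a small stand-in for the DataFrame Teacher. The rows, columns, and labels used here are invented for illustration.

```python
import pandas as pd

# Hypothetical contents for the DataFrame Teacher
Teacher = pd.DataFrame(
    {
        "Name": ["Anita", "Rahul", "Meena"],
        "Subject": ["Maths", "Physics", "English"],
        "Experience": [12, 8, 15],
    },
    index=["T1", "T2", "T3"],
)

print(Teacher.index)     # row labels
print(Teacher.columns)   # column labels
print(Teacher.dtypes)    # data type of each column
print(Teacher.shape)     # (3, 3): rows, columns
print(Teacher.size)      # 9: total number of values
print(Teacher.ndim)      # 2: number of dimensions
print(Teacher.T)         # transpose: rows and columns swapped
```

Attributes are written without parentheses because they are properties of the DataFrame, not functions.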
read_csv() is used to read a CSV file from the given path.
The sep parameter specifies whether the values are separated by a comma, semicolon, tab, or any
other character. The default value for sep is a comma.
The header parameter specifies the row that holds the column names, which also marks the start
of the data to be fetched. header=0 implies that column names are taken from the first line of the
file. By default, the column names are inferred from the first line, which is equivalent to header=0.
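The effect of sep and header can be tried out with the sketch below. To keep it self-contained, the CSV text is held in strings and passed through io.StringIO instead of a file path; the records are made up.

```python
import io
import pandas as pd

comma_text = "rollno,name,mark\n1,Asmita,480\n2,Ravi,455\n"
semi_text = "rollno;name;mark\n1;Asmita;480\n2;Ravi;455\n"

# Defaults: comma separator, column names taken from the first line
df1 = pd.read_csv(io.StringIO(comma_text))

# Semicolon-separated values need sep=";"
df2 = pd.read_csv(io.StringIO(semi_text), sep=";")

# header=None treats the first line as data and numbers the columns 0, 1, 2
df3 = pd.read_csv(io.StringIO(comma_text), header=None)

print(df1)
print(df3)
```

With a real file, the first argument would simply be its path, for example a string ending in .csv.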
Exporting a DataFrame to a CSV file
We can use the to_csv() method of a DataFrame to save it to a text or CSV file.
For example, to save the DataFrame Teacher into csv file resultout, we should write
Teacher.to_csv(path_or_buf='C:/PANDAS/resultout.csv', sep=',')
When we open this file in any text editor or a spreadsheet, we will find the above data along
with the row labels and the column headers, separated by comma.
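The layout of the exported data can be previewed without creating a file: when to_csv() is called without a path, it returns the CSV text as a string. The sketch below uses a small stand-in for the DataFrame Teacher with invented values.

```python
import pandas as pd

# Hypothetical contents for the DataFrame Teacher
Teacher = pd.DataFrame(
    {"Name": ["Anita", "Rahul"], "Experience": [12, 8]},
    index=["T1", "T2"],
)

# With no path, to_csv() returns the text that would be written to the file
csv_text = Teacher.to_csv(sep=",")
print(csv_text)

# index=False leaves out the row labels
no_index = Teacher.to_csv(index=False)
print(no_index)
```

The first line of the output holds the column headers; the empty first field in that line is the unnamed column of row labels.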
Scikit-learn (Learn)
Note for Teachers: This topic can be taught after teaching the Machine Learning Unit.
Scikit-learn (Sklearn) is a widely used and robust library for machine learning in Python.
It provides a selection of efficient tools for machine learning and statistical modeling via a
consistent interface in Python. Sklearn is built on (relies heavily on) NumPy, SciPy and
Matplotlib.
Key Features:
Offers a wide range of supervised and unsupervised learning algorithms.
Provides tools for model selection, evaluation, and validation.
Supports various tasks such as classification, regression, clustering, dimensionality
reduction, and more.
Integrates seamlessly with other Python libraries like NumPy, SciPy, and Pandas.
Install scikit-learn using the statement
pip install scikit-learn
load_iris (In sklearn.datasets)
The Iris dataset is a classic and widely used dataset in machine learning, particularly for
classification tasks. It comprises measurements of various characteristics of iris flowers,
such as sepal length, sepal width, petal length, and petal width, along with the
corresponding species of iris to which they belong. The dataset typically includes three
species: setosa, versicolor, and virginica.
Here, each row represents a sample (i.e., an iris flower), and each column represents a
feature (i.e., a measurement of the flower).
For example, the first row [ 5.1 3.5 1.4 0.2] corresponds to an iris flower with the
following measurements:
Sepal length: 5.1 cm
Sepal width: 3.5 cm
Petal length: 1.4 cm
Petal width: 0.2 cm
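The dataset can be loaded and inspected as sketched below, assuming scikit-learn is installed. The variable names X and y are a common convention for the features and the labels, not something fixed by the library.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # measurements: one row per flower, one column per feature
y = iris.target    # species of each flower, coded as 0, 1 or 2

print(X.shape)               # (150, 4)
print(iris.feature_names)    # names of the four measurements
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
print(X[:5])                 # first five samples
print(y[:5])                 # [0 0 0 0 0]
```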
Output-
Using this model, we can identify the type of flower in the iris dataset. By analyzing the length and
width of the sepals and petals, we can compare them with the features of the setosa, versicolor, and
virginica species to determine the flower's species.
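A complete classification workflow on the Iris dataset can be sketched as follows. The choice of a k-nearest neighbours classifier, the 80/20 split, n_neighbors=3, and random_state=42 are assumptions made for this sketch; other classifiers from scikit-learn could be used in the same way.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Keep 20% of the samples aside for testing the trained model
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train a k-nearest neighbours classifier (assumed choice of model)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Evaluate on the unseen test samples
accuracy = accuracy_score(y_test, model.predict(X_test))
print("Accuracy:", accuracy)

# Predict the species of one flower from its four measurements
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted = iris.target_names[model.predict(sample)[0]]
print("Predicted species:", predicted)
```

The model compares the measurements of the new flower with those of the nearest flowers in the training data and assigns the species that most of those neighbours belong to.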
-------------------------------------------------------------------------------------------------
Tutorials
1. https://www.programiz.com/python-programming
2. https://www.analyticsvidhya.com/blog/2021/05/data-types-in-python/
3. https://www.w3schools.com/python/default.asp
4. https://www.geeksforgeeks.org/pandas-tutorial/
5. https://www.learnpython.org/en/Pandas_Basics
6. https://www.geeksforgeeks.org/python-programming-language/
7. https://scikit-learn.org/stable/tutorial/basic/tutorial.html
8. https://pandas.pydata.org/docs/user_guide/10min.html
Courses
1. https://aistudent.community/single_course/2021
2. https://www.kaggle.com/learn/pandas
3. https://www.udemy.com/course/pandas-with-python/
Step-by-Step guide for students to use the IBM Skills Build website to learn Python:
D. Practice Programs