pandas_notes

The document provides an introduction to the Pandas library, covering its installation, key data structures (Series and DataFrame), and various functionalities such as data manipulation, cleaning, aggregation, and visualization. It includes examples of creating Series and DataFrames, handling missing values, filtering data, and performing operations like merging and grouping. Additionally, it discusses working with time series data and offers basic plotting techniques for data visualization.

Uploaded by

ranjeet verma
Copyright
© All Rights Reserved

1. Introduction to Pandas
Pandas is an open-source Python library for data manipulation and analysis.
Imagine you run a lemonade stand and keep a table of customers and what they bought. You can use Pandas to:

See who bought the most
Add a new column: "Did they tip?"
Save your customer list to a file
Find out the average number of lemonades bought

And you don't need to count on your fingers: Pandas can do each of these in about one line of code.
Install and Import:

pip install pandas

import pandas as pd

pd.__version__

Key Data Structures:

Series = one labeled column (a list of values with an index)

DataFrame = full table (rows and columns, where each column is a Series)
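
To make the two structures concrete, here is a minimal sketch that builds one of each. The sample names and numbers are made up for illustration.

```python
import pandas as pd

# A Series is a single labeled column of values.
lemonades = pd.Series([2, 3, 5], name="Lemonades")

# A DataFrame is a table: several columns sharing one row index.
table = pd.DataFrame({
    "Name": ["Anna", "Ben", "Chris"],
    "Lemonades": [2, 3, 5],
})

# Selecting one column of a DataFrame gives back a Series.
column = table["Lemonades"]
print(type(column).__name__)
print(table.shape)
```

The shape is reported as (rows, columns), so this table has three rows and two columns.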

2. Pandas Series
Create Series:

sales = pd.Series([2, 3, 5])

Indexing and Slicing:

sales[1]

sales[0:2]

Attributes:

sales.index

sales.values

sales.dtype
Methods:

sales.head()

sales.tail()

sales.sort_values()

sales.count()

sales.isnull()

Operations:

sales + 1

sales * 2

Missing Values:

sales_with_missing = pd.Series([2, None, 5])

sales_with_missing.isnull()

sales_with_missing.fillna(0)
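
A short sketch of what happens to the missing value above. It assumes the same three-element Series; the variable names `n_missing` and `filled` are only for illustration.

```python
import pandas as pd

sales_with_missing = pd.Series([2, None, 5])

# None is stored as NaN, which makes the Series floating point.
print(sales_with_missing.dtype)

# isnull() marks the gap with True; summing the marks counts the gaps.
n_missing = int(sales_with_missing.isnull().sum())

# fillna returns a new Series; the original still contains the gap.
filled = sales_with_missing.fillna(0)
print(filled.tolist())
```

Because `fillna` does not modify the Series in place, assign the result back if you want to keep it.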

3. Pandas DataFrame
Create DataFrame:

data = pd.DataFrame({

'Name': ['Anna', 'Ben'],

'Lemonades': [2, 3]

})

s1 = pd.Series(data=[1,2,3,4,5])

s2 = pd.Series(data=[10,20,30,40,50])

si = pd.Series(data=[100,200,300,400,500],index=list('abcde'))

numbers = pd.DataFrame({'C1':s1, 'C2':s2})

Attributes:

data.shape

data.columns

data.dtypes

Accessing Data:

# Using labels (row label 1, column 'C1')

numbers.loc[1,'C1']

# Using integer positions (second row, first column)

numbers.iloc[1,0]
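
With the default 0, 1, 2, ... index, labels and positions look identical, so the difference is easier to see on the letter-indexed Series `si` defined earlier in this section. A minimal sketch:

```python
import pandas as pd

# Letter labels make the difference between loc and iloc visible.
si = pd.Series([100, 200, 300, 400, 500], index=list("abcde"))

by_label = si.loc["b"]       # look up the row labeled 'b'
by_position = si.iloc[1]     # look up the second row, whatever its label

# Label slices include the end label; position slices exclude the end.
label_slice = si.loc["a":"c"]     # rows a, b and c
position_slice = si.iloc[0:2]     # rows a and b only
```

The inclusive end on label slices is a common source of off-by-one surprises when switching between `loc` and `iloc`.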

Methods:

data.head()

data.tail()

data.info()

data.describe()

data.sort_values(by='Lemonades')

data.count()

data.isnull()

Filtering:

data[data['Lemonades'] > 2]
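
Filtering works by building a True/False Series and using it to pick rows. The sketch below shows that intermediate step and how two conditions might be combined; the third customer is added only to make the result more interesting.

```python
import pandas as pd

data = pd.DataFrame({"Name": ["Anna", "Ben", "Chris"], "Lemonades": [2, 3, 5]})

# The comparison produces a boolean Series, one value per row.
mask = data["Lemonades"] > 2
big = data[mask]

# Combine conditions with & (and) or | (or); wrap each side in parentheses.
middle = data[(data["Lemonades"] > 2) & (data["Lemonades"] < 5)]
print(middle)
```

Python's `and` and `or` keywords do not work element by element here, which is why `&` and `|` are used.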

Adding, Updating, Deleting:

data['Tips'] = [1, 2]

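# Update 'Tips' only in the rows where Name is 'Anna'

data.loc[data['Name'] == 'Anna', 'Tips'] = 6

# rename returns a new DataFrame; assign the result to keep the change

data.rename(columns={'Name':'NAMEE'})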

del data['Tips']

data = data.drop(columns=['Lemonades'])
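
A point that is easy to miss in the snippets above: some operations change the DataFrame directly, while others return a new one. The sketch below illustrates the difference on the same two-customer table; the column name `Customer` and the variables `renamed` and `trimmed` are illustrative choices.

```python
import pandas as pd

data = pd.DataFrame({"Name": ["Anna", "Ben"], "Lemonades": [2, 3]})

# Column assignment and .loc assignment change data itself.
data["Tips"] = [1, 2]
data.loc[data["Name"] == "Anna", "Tips"] = 6

# rename and drop return new DataFrames and leave data untouched.
renamed = data.rename(columns={"Name": "Customer"})
trimmed = data.drop(columns=["Tips"])

print(list(data.columns))
print(list(renamed.columns))
print(list(trimmed.columns))
```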
4. Indexing and Selection
Set Index:

data.set_index('Name', inplace=True)

Reset Index:

data.reset_index(inplace=True)

MultiIndex Example:

multi_data = pd.DataFrame({

'Park': ['A', 'A', 'B', 'B'],

'Name': ['Anna','Ben','Chris','Daisy'],

'Lemonades': [2, 3, 4, 5]

}).set_index(['Park', 'Name'])

Selecting with Conditions:

multi_data[multi_data['Lemonades'] > 3]

Query Method:

multi_data.query('Lemonades > 3')
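
Beyond filtering on values, a MultiIndex can also be used to look rows up by their index levels. A minimal sketch using the same `multi_data` table (the variable names for the results are illustrative):

```python
import pandas as pd

multi_data = pd.DataFrame({
    "Park": ["A", "A", "B", "B"],
    "Name": ["Anna", "Ben", "Chris", "Daisy"],
    "Lemonades": [2, 3, 4, 5],
}).set_index(["Park", "Name"])

# Selecting on the first level returns that park's rows, indexed by Name.
park_a = multi_data.loc["A"]

# A tuple addresses both levels at once.
one_cell = multi_data.loc[("B", "Daisy"), "Lemonades"]

# The boolean filter and query() select the same rows.
big_orders = multi_data.query("Lemonades > 3")
same_rows = multi_data[multi_data["Lemonades"] > 3]
```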


5. Working with Data
CSV Files:

# index=False keeps the row numbers out of the file

data.to_csv('lemonade_sales.csv', index=False)

data = pd.read_csv('lemonade_sales.csv')

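Create from Dict, List, or NumPy Array:

import numpy as np

# The scalar 50 is repeated for every row

dictt = {'x': [1, 2, 3], 'y': np.array([10, 20, 30]), 'z': 50}

pd.DataFrame(dictt)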

listt = [[1, 2, 100], [2, 4, 100], [3, 8, 100]]

pd.DataFrame(listt, columns=['x', 'y', 'z'])

narr = np.array([[1, 2, 100], [2, 4, 100], [3, 8, 100]])

pd.DataFrame(narr, columns=['x', 'y', 'z'])
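
One detail of the CSV round trip is worth seeing once: by default `to_csv` also writes the row index, which comes back as an extra column when the file is read. The sketch below uses an in-memory buffer (`io.StringIO`) instead of a real file so it leaves nothing on disk; that substitution is just for the example.

```python
import io
import pandas as pd

data = pd.DataFrame({"Name": ["Anna", "Ben"], "Lemonades": [2, 3]})

# Default behavior: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(data.to_csv()))
print(list(with_index.columns))

# index=False writes only the real columns, so the round trip is clean.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(list(restored.columns))
```

An alternative is to keep the index in the file and read it back with `pd.read_csv(..., index_col=0)`.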

6. Data Cleaning and Preprocessing


Handling Missing Data:

data.dropna()

data.fillna(0)

data.isnull()

Remove Duplicates:

data.drop_duplicates()

Replace Values:

data.replace('lemondae', 'lemonade')

Type Conversion:

data['Lemonades'] = data['Lemonades'].astype(int)

String Operations:
data['Name'].str.upper()
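
The cleaning steps above are usually chained together. Here is a minimal sketch on a small, deliberately messy table; the `raw` data and its `Drink` column are invented for the example.

```python
import pandas as pd

raw = pd.DataFrame({
    "Name": ["anna", "ben", "ben", "chris"],
    "Drink": ["lemonade", "lemondae", "lemondae", "lemonade"],
    "Lemonades": [2.0, 3.0, 3.0, None],
})

clean = (
    raw.drop_duplicates()                  # remove the repeated 'ben' row
       .replace("lemondae", "lemonade")    # fix the misspelled value
       .fillna({"Lemonades": 0})           # treat a missing count as zero
       .astype({"Lemonades": int})         # safe once no NaN remains
)
clean["Name"] = clean["Name"].str.upper()
print(clean)
```

The order matters in one place: converting to `int` before filling the missing value would fail, because NaN cannot be stored in an integer column.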

7. Data Aggregation and Grouping


GroupBy Example:

grouped = data.groupby('Name')

grouped.sum()

Aggregations:

grouped['Lemonades'].mean()

grouped['Lemonades'].min()

grouped['Lemonades'].max()

Pivot Tables:

data.pivot_table(values='Lemonades', index='Name')

Crosstab:

pd.crosstab(data['Name'], data['Lemonades'])
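
Grouping is more informative when names repeat, so the sketch below assumes each customer bought lemonade in two parks. The sample rows and the `Park` column (borrowed from the MultiIndex example) are assumptions made for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Anna", "Ben", "Anna", "Ben"],
    "Park": ["A", "A", "B", "B"],
    "Lemonades": [2, 3, 4, 5],
})

# One total per customer.
totals = data.groupby("Name")["Lemonades"].sum()

# Several aggregations at once.
summary = data.groupby("Name")["Lemonades"].agg(["mean", "min", "max"])

# Pivot table: customers as rows, parks as columns, summed lemonades as cells.
pivot = data.pivot_table(values="Lemonades", index="Name",
                         columns="Park", aggfunc="sum")

# Crosstab counts how many rows fall into each Name/Park combination.
counts = pd.crosstab(data["Name"], data["Park"])
```

Note that `pivot_table` averages by default; `aggfunc="sum"` is passed here to get totals instead.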

8. Merging, Joining and Concatenating


Concatenation:

pd.concat([data1, data2])

Append:

# DataFrame.append was removed in pandas 2.0; use pd.concat instead

pd.concat([data1, data2], ignore_index=True)

Merging:

pd.merge(data1, data2, on='Name')

Join on Index:
data1.join(data2.set_index('Name'), on='Name')
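
The snippets above use `data1` and `data2` without defining them. The sketch below assumes two small tables that share a `Name` column, to show how stacking differs from matching on a key.

```python
import pandas as pd

# Hypothetical inputs: purchases in one table, tips in another.
data1 = pd.DataFrame({"Name": ["Anna", "Ben"], "Lemonades": [2, 3]})
data2 = pd.DataFrame({"Name": ["Ben", "Chris"], "Tips": [1, 2]})

# concat stacks rows; columns missing from one table are filled with NaN.
stacked = pd.concat([data1, data2], ignore_index=True)

# merge matches rows on the key; the default keeps only names in both tables.
inner = pd.merge(data1, data2, on="Name")

# how='left' keeps every row of data1, with NaN where data2 has no match.
left = pd.merge(data1, data2, on="Name", how="left")

# join behaves like a left merge against the other table's index.
joined = data1.join(data2.set_index("Name"), on="Name")
```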

9. Working with Time Series


DateTime Index:

data['Date'] = pd.to_datetime(data['Date'])

data.set_index('Date', inplace=True)

Converting Strings to Dates:

pd.to_datetime('2025-04-29')

Resampling:

data.resample('W').sum()

Shifting/Lagging:

data['Previous Lemonades'] = data['Lemonades'].shift(1)

Rolling Calculations:

data['Rolling Average'] = data['Lemonades'].rolling(window=2).mean()
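
The time series steps fit together as shown in this minimal sketch. The four dates are invented sample data, chosen to span two calendar weeks.

```python
import pandas as pd

data = pd.DataFrame({
    "Date": ["2025-04-28", "2025-04-29", "2025-05-05", "2025-05-06"],
    "Lemonades": [2, 3, 4, 5],
})

# Parse the strings and use the dates as the index.
data["Date"] = pd.to_datetime(data["Date"])
data = data.set_index("Date")

# Weekly totals; each 'W' bin is labeled with the Sunday that ends the week.
weekly = data["Lemonades"].resample("W").sum()

# shift(1) moves values down one row, so each row sees the previous sale.
data["Previous Lemonades"] = data["Lemonades"].shift(1)

# A 2-row rolling mean; the first row has no full window and stays NaN.
data["Rolling Average"] = data["Lemonades"].rolling(window=2).mean()
print(weekly)
```

Resampling only works once the index holds dates, which is why the `to_datetime` and `set_index` steps come first.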

10. Data Visualization


Basic Plotting:

data['Lemonades'].plot()

Different Kinds of Plots:


data['Lemonades'].plot(kind='line')

data['Lemonades'].plot(kind='bar')

data['Lemonades'].plot(kind='barh')

data['Lemonades'].plot(kind='hist')

data['Lemonades'].plot(kind='box')

data.plot(kind='scatter', x='Lemonades', y='Tips') # scatter needs a DataFrame and two numeric column names

data['Lemonades'].plot(kind='area')

data['Lemonades'].plot(kind='pie')

Customizations:

data['Lemonades'].plot(title='Lemonade Sales Over Time')
