pandas_notes
1. Introduction to Pandas
Pandas is an open-source Python library for
data manipulation and analysis.
You can use Pandas to:
import pandas as pd
pd.__version__
2. Pandas Series
Create Series:
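The notes index into `sales` without showing how it is built. A minimal sketch of creating it, assuming made-up lemonade counts (the values are illustrative and not from the original notes):

```python
import pandas as pd

# Hypothetical lemonade counts; any list of numbers would do.
sales = pd.Series([5, 3, 8, 6])

second = sales[1]        # lookup by label on the default integer index
first_two = sales[0:2]   # slicing is positional: rows 0 and 1
```

With the default index, labels and positions coincide, so `sales[1]` returns the second value.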
sales[1]
sales[0:2]
Attributes:
sales.index
sales.values
sales.dtype
Methods:
sales.head()
sales.tail()
sales.sort_values()
sales.count()
sales.isnull()
Operations:
sales + 1
sales * 2
Missing Values:
sales_with_missing.isnull()
sales_with_missing.fillna(0)
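`sales_with_missing` is not defined anywhere in the notes. One way to build it, assuming a single missing day marked with `np.nan` (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical counts with one missing entry.
sales_with_missing = pd.Series([5, np.nan, 8])

mask = sales_with_missing.isnull()       # True where the value is missing
filled = sales_with_missing.fillna(0)    # returns a new Series; the original is unchanged
```

Note that `fillna` does not modify `sales_with_missing` unless the result is assigned back.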
3. Pandas DataFrame
Create DataFrame:
data = pd.DataFrame({
'Name': ['Anna', 'Ben'],  # later examples refer to a 'Name' column
'Lemonades': [2, 3]
})
s1 = pd.Series(data=[1,2,3,4,5])
s2 = pd.Series(data=[10,20,30,40,50])
si = pd.Series(data=[100,200,300,400,500],index=list('abcde'))
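The three Series above differ only in their index, which is probably meant to illustrate index alignment. A short sketch of what happens when they are added, under that assumption:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4, 5])
s2 = pd.Series(data=[10, 20, 30, 40, 50])
si = pd.Series(data=[100, 200, 300, 400, 500], index=list('abcde'))

# s1 and s2 share the labels 0..4, so values are added label by label.
same_index = s1 + s2

# s1 and si share no labels, so every entry of the result is NaN.
mixed = s1 + si

# Dropping the index with to_numpy() adds by position instead.
positional = s1 + si.to_numpy()
```

Arithmetic between Series matches on labels, not on position, which is why `mixed` holds only missing values.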
Attributes:
data.shape
data.columns
data.dtypes
Accessing Data:
# Using labels (row label, column name)
data.loc[1, 'Lemonades']
# Using integer positions (row position, column position)
data.iloc[1, 0]
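The difference between `loc` and `iloc` only becomes visible when the row labels are not 0, 1, 2, and so on. A small sketch, assuming a hypothetical frame with row labels 10 and 20:

```python
import pandas as pd

# Hypothetical frame; the index labels 10 and 20 are chosen to differ from positions 0 and 1.
data = pd.DataFrame(
    {'Name': ['Anna', 'Ben'], 'Lemonades': [2, 3]},
    index=[10, 20],
)

by_label = data.loc[20, 'Lemonades']   # row labelled 20, column 'Lemonades'
by_position = data.iloc[1, 1]          # second row, second column
```

Both lookups reach the same cell here, but `data.loc[1, 'Lemonades']` would raise a `KeyError` because no row is labelled 1.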
Methods:
data.head()
data.tail()
data.info()
data.describe()
data.sort_values(by='Lemonades')  # a DataFrame needs the column(s) to sort by
data.count()
data.isnull()
Filtering:
data[data['Lemonades'] > 2]
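Filters can be combined. A minimal sketch with two conditions, assuming hypothetical rows that reuse the column names from these notes:

```python
import pandas as pd

# Hypothetical rows for illustration.
data = pd.DataFrame({'Name': ['Anna', 'Ben', 'Chris'], 'Lemonades': [2, 3, 4]})

# Each condition needs its own parentheses; use & for "and", | for "or".
busy = data[(data['Lemonades'] > 2) & (data['Name'] != 'Chris')]
```

Python's `and` / `or` keywords do not work on boolean Series, so the element-wise operators are required.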
Add Column:
data['Tips'] = [1, 2]
Rename Column:
data.rename(columns={'Name': 'NAMEE'})
Drop Column:
del data['Tips']
data = data.drop(columns=['Lemonades'])
4. Indexing and Selection
Set Index:
data.set_index('Name', inplace=True)
Reset Index:
data.reset_index(inplace=True)
MultiIndex Example:
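multi_data = pd.DataFrame({
'Park': ['North', 'North', 'South', 'South'],  # hypothetical values; set_index needs a 'Park' column
'Name': ['Anna','Ben','Chris','Daisy'],
'Lemonades': [2, 3, 4, 5]
}).set_index(['Park', 'Name'])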
multi_data[multi_data['Lemonades'] > 3]
Query Method:
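No example survives under this label. A minimal sketch of `DataFrame.query`, assuming a hypothetical frame with the same column names used elsewhere in these notes:

```python
import pandas as pd

# Hypothetical rows for illustration.
data = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Chris', 'Daisy'],
    'Lemonades': [2, 3, 4, 5],
})

# The condition is written as a string; column names are used directly.
result = data.query('Lemonades > 3')
```

This selects the same rows as `data[data['Lemonades'] > 3]`.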
data = pd.read_csv('lemonade_sales.csv')
import numpy as np
dictt = {'x': [1, 2, 3], 'y': np.array([10, 20, 30]), 'z': 50}
pd.DataFrame(dictt)  # the scalar 50 is repeated for every row
data.dropna()
data.fillna(0)
data.isnull()
Remove Duplicates:
data.drop_duplicates()
Replace Values:
data.replace('lemondae', 'lemonade')
Type Conversion:
data['Lemonades'] = data['Lemonades'].astype(int)
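`astype(int)` raises an error if the column holds missing values or text that is not a number. One possible workaround, assuming unparseable entries should count as 0 (that choice is an assumption, not something the notes specify):

```python
import pandas as pd

# Hypothetical raw column read as text, with one unparseable entry.
raw = pd.Series(['2', '3', 'n/a'])

numeric = pd.to_numeric(raw, errors='coerce')  # 'n/a' becomes NaN
as_int = numeric.fillna(0).astype(int)         # replace NaN, then convert
```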
String Operations:
data['Name'].str.upper()
grouped = data.groupby('Name')
grouped.sum()
Aggregations:
grouped['Lemonades'].mean()
grouped['Lemonades'].min()
grouped['Lemonades'].max()
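The three aggregations above can also be computed in one call with `agg`. A sketch, assuming hypothetical rows with two sales per name:

```python
import pandas as pd

# Hypothetical rows: each name appears twice.
data = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Anna', 'Ben'],
    'Lemonades': [2, 3, 4, 5],
})
grouped = data.groupby('Name')

# One row per name, one column per aggregation.
summary = grouped['Lemonades'].agg(['mean', 'min', 'max'])
```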
Pivot Tables:
data.pivot_table(values='Lemonades', index='Name')
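By default `pivot_table` averages the values for each index entry. A sketch that sums instead and spreads a second column across the table, assuming hypothetical rows with a `Park` column:

```python
import pandas as pd

# Hypothetical rows for illustration.
data = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Anna', 'Ben'],
    'Park': ['North', 'North', 'South', 'South'],
    'Lemonades': [2, 3, 4, 5],
})

# Rows are names, columns are parks, cells are summed lemonade counts.
table = data.pivot_table(values='Lemonades', index='Name',
                         columns='Park', aggfunc='sum')
```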
Crosstab:
pd.crosstab(data['Name'], data['Lemonades'])
pd.concat([data1, data2])
Append:
# DataFrame.append() was removed in pandas 2.0; use pd.concat() instead
pd.concat([data1, data2], ignore_index=True)
Merging:
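No example survives under this label. A minimal sketch of `pd.merge`, assuming two hypothetical frames that share a `Name` column:

```python
import pandas as pd

# Hypothetical frames for illustration.
data1 = pd.DataFrame({'Name': ['Anna', 'Ben', 'Chris'], 'Lemonades': [2, 3, 4]})
data2 = pd.DataFrame({'Name': ['Anna', 'Ben', 'Daisy'], 'Tips': [1, 2, 3]})

# Default is an inner merge: only names present in both frames are kept.
inner = pd.merge(data1, data2, on='Name')

# A left merge keeps every row of data1; missing matches become NaN.
left = pd.merge(data1, data2, on='Name', how='left')
```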
Join on Index:
data1.join(data2.set_index('Name'), on='Name')
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
pd.to_datetime('2025-04-29')
Resampling:
data.resample('W').sum()
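Resampling requires a datetime index. A self-contained sketch, assuming one lemonade sold per day over two full weeks (the dates and counts are made up):

```python
import pandas as pd

# Hypothetical data: 14 consecutive days starting on a Monday.
dates = pd.date_range('2025-04-28', periods=14, freq='D')
data = pd.DataFrame({'Lemonades': [1] * 14}, index=dates)

# 'W' groups rows into weekly bins that end on Sunday.
weekly = data.resample('W').sum()
```

Each weekly row is labelled with the Sunday that closes the bin.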
Shifting/Lagging:
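No example survives under this label. A minimal sketch of `shift`, assuming a hypothetical `Lemonades` column:

```python
import pandas as pd

# Hypothetical counts for three consecutive days.
data = pd.DataFrame({'Lemonades': [2, 3, 5]})

# shift(1) moves values down one row, so each row sees the previous day's count.
data['Previous'] = data['Lemonades'].shift(1)
data['Change'] = data['Lemonades'] - data['Previous']
```

The first row has no previous value, so `Previous` and `Change` are NaN there.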
Rolling Calculations:
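No example survives under this label. A minimal sketch of a rolling mean, assuming a hypothetical `Lemonades` column and a window of two rows:

```python
import pandas as pd

# Hypothetical counts for four consecutive days.
data = pd.DataFrame({'Lemonades': [2, 4, 6, 8]})

# Average of the current row and the one before it.
rolling_mean = data['Lemonades'].rolling(window=2).mean()
```

The first row has fewer than two values in its window, so its result is NaN.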
data['Lemonades'].plot()
data['Lemonades'].plot(kind='bar')
data['Lemonades'].plot(kind='barh')
data['Lemonades'].plot(kind='hist')
data['Lemonades'].plot(kind='box')
data['Lemonades'].plot(kind='area')
data['Lemonades'].plot(kind='pie')
Customizations: