
Uploaded by Abhiuday Sharma
Copyright © All Rights Reserved

Data Analysis Project on Customer Purchases Dataset

In this data analysis project, we will perform exploratory data analysis (EDA) on a customer
purchases dataset obtained from Kaggle. The dataset contains customer transaction records,
including invoice numbers and dates, product descriptions, quantities, unit prices, customer
IDs, and customer countries. Our goal is to extract meaningful insights from the data using
Python and popular data analysis libraries.

Dataset:
For this project, we will use the "Online Retail" dataset available on Kaggle. This dataset
contains transactional data from an online retail store, with attributes such as
InvoiceDate, CustomerID, Quantity, UnitPrice, and Description (the product description).
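Since the later steps select columns by name, a quick check right after loading can catch a renamed or missing column early. The following is a minimal sketch, assuming the column names listed above; `missing_columns` is a hypothetical helper (not a pandas function), and the sample rows are made up for illustration.

```python
import pandas as pd

# Column names this project relies on (as listed in the dataset description)
EXPECTED_COLUMNS = ("InvoiceDate", "CustomerID", "Quantity", "UnitPrice", "Description")


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# Made-up rows standing in for the loaded workbook
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-01-03", "2011-01-04"]),
    "CustomerID": [1.0, 2.0],
    "Quantity": [6, 2],
    "UnitPrice": [2.55, 1.85],
    "Description": ["PRODUCT A", "PRODUCT B"],
})

print(missing_columns(sample))                            # → []
print(missing_columns(sample.drop(columns="UnitPrice")))  # → ['UnitPrice']
```

With the real file, calling `missing_columns(data)` after `pd.read_excel(...)` should return an empty list if the download matches the description above; a non-empty result suggests a different version of the dataset.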

Tools and Libraries:
We will use the following tools and libraries for this project:
Python programming language
Jupyter Notebook
Pandas: Data manipulation and analysis library
Matplotlib and Seaborn: Data visualization libraries

Code and Analysis:
The code provided below demonstrates the initial steps of an exploratory data analysis of
the "Online Retail" dataset. It loads the dataset, displays basic information, and performs
simple data cleaning. It then visualizes the distribution of total sales per customer and
the top-selling products. Remember to replace the file path with the actual path to the
"Online Retail.xlsx" file downloaded from Kaggle.

# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
# Replace this path with the location of the downloaded file;
# reading .xlsx files with pandas requires the openpyxl package.
file_path = "Online Retail.xlsx"
data = pd.read_excel(file_path)

# Display basic information about the dataset
# (DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print)
data.info()

# Display the first few rows of the dataset
print(data.head())

# Summary statistics of numerical variables
print(data.describe())

# Data cleaning: removing rows with missing values
# (in this dataset these are mostly rows without a CustomerID)
data.dropna(inplace=True)

# Exploratory Data Analysis

# Total sales per customer: each row's revenue is Quantity * UnitPrice
data["TotalSales"] = data["Quantity"] * data["UnitPrice"]
sales_per_customer = data.groupby("CustomerID")["TotalSales"].sum()

# Plotting a histogram of sales per customer
plt.figure(figsize=(10, 6))
sns.histplot(sales_per_customer, bins=50, kde=True)
plt.title("Distribution of Total Sales per Customer")
plt.xlabel("Total Sales")
plt.ylabel("Frequency")
plt.show()

# Top selling products: total units sold per product description
top_products = data.groupby("Description")["Quantity"].sum().nlargest(10)

# Plotting a horizontal bar chart of top selling products
# (sorted ascending so the best seller appears at the top of the chart)
plt.figure(figsize=(12, 8))
top_products.sort_values().plot(kind="barh", color="skyblue")
plt.title("Top 10 Selling Products")
plt.xlabel("Number of Units Sold")
plt.ylabel("Product Description")
plt.show()
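The script needs the downloaded workbook to run. To check the cleaning and aggregation logic on its own, the same steps can be sketched against a small, made-up DataFrame. This sketch also adds one cleaning step that the script above does not perform: it keeps only rows with a positive Quantity and UnitPrice, assuming (as in the commonly distributed version of the file) that cancellations and returns appear as rows with a negative Quantity. The function names `clean`, `total_sales_per_customer`, and `top_products_by_units` are illustrative, not part of pandas.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values, then keep only positive quantities and prices.

    Assumption: cancellations/returns show up as rows with a negative Quantity.
    """
    out = df.dropna()
    # .copy() so callers can add columns without a SettingWithCopyWarning
    return out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)].copy()


def total_sales_per_customer(df: pd.DataFrame) -> pd.Series:
    """Total revenue per customer: sum of Quantity * UnitPrice over that customer's rows."""
    revenue = df["Quantity"] * df["UnitPrice"]
    return revenue.groupby(df["CustomerID"]).sum()


def top_products_by_units(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Products ranked by total units sold (not by number of order lines)."""
    return df.groupby("Description")["Quantity"].sum().nlargest(n)


# Made-up rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 2.0, 2.0, None],
    "Description": ["PRODUCT A", "PRODUCT B", "PRODUCT A", "PRODUCT B", "PRODUCT C"],
    "Quantity": [2, 10, 3, -10, 5],
    "UnitPrice": [1.50, 0.25, 1.50, 0.25, 4.00],
})

cleaned = clean(toy)
print(total_sales_per_customer(cleaned))
print(top_products_by_units(cleaned, n=2))
print(cleaned["Description"].value_counts())
```

On these toy rows, the missing-CustomerID row and the negative-quantity row are dropped, leaving customer 1.0 with 5.5 in sales and customer 2.0 with 4.5. Ranking by units puts PRODUCT B first (10 units), while `value_counts()` puts PRODUCT A first because it appears on more order lines, so the two rankings can disagree when the chart's axis is labelled "Number of Units Sold".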
