
Sample Sales Data Analysis

This document presents an analysis of sample sales data, focusing on statistical techniques to derive insights into sales patterns and customer behavior. It includes objectives, methods for data analysis, challenges faced, and suggestions for future work. Key findings highlight significant correlations and predictors of sales, along with recommendations for data enrichment and advanced modeling.

Uploaded by suryanshu
Copyright © All Rights Reserved

SAMPLE SALES DATA ANALYSIS

Submission Date:

SURYANSHU KUMAR
2023000776

Table of Contents
1. Project Title Page
2. Table of Contents
3. Introduction
4. Requirements
5. Code Structure
6. Challenges & Solutions
7. Conclusion & Future Work
8. References

Introduction
Objectives
The primary objectives of this analysis are:
• To perform descriptive, bivariate, and multivariate statistical analyses on the Sample Sales Data.
• To derive insights into sales patterns, customer behavior, and shipping performance.
• To identify factors influencing sales and customer satisfaction.
Scope and Limitations
• Scope: The analysis applies descriptive statistics, normality and hypothesis testing, correlation analysis, regression analysis, principal component analysis (PCA), and exploratory factor analysis (EFA).
• Limitations: The quality and completeness of the dataset may affect the results. The findings are also limited to the data provided and may not generalize beyond it.

Requirements
Software & Libraries
• Python 3.x
• Libraries:
o pandas
o numpy
o matplotlib
o seaborn
o scipy
o statsmodels
o scikit-learn
o factor_analyzer
Hardware Requirements
• Standard computing hardware capable of running Python and the libraries listed above.
Installation Instructions
To install the required libraries, execute:
pip install pandas numpy matplotlib seaborn scipy statsmodels scikit-learn factor_analyzer

Code Structure
a. Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.formula.api import ols
from sklearn.decomposition import PCA
from factor_analyzer import FactorAnalyzer
b. Inputs (Data)
• Dataset: Sample Sales Data
• Source: Kaggle Dataset
c. Process (Methods)
Data Loading and Cleaning
# Load the dataset
df = pd.read_csv('sample_sales_data.csv')

# Display basic information (column types and non-null counts)
df.info()

# Handle missing values by dropping incomplete rows
df.dropna(inplace=True)
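Because df.dropna() discards every row that has a missing value in any column, it can be worth checking how much data it would remove before relying on it. The sketch below does this on a small hypothetical DataFrame; the Region column and all values are invented for illustration, and only the Sales and Quantity names come from this report.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for sample_sales_data.csv
df = pd.DataFrame({
    "Sales": [250.0, np.nan, 410.5, 380.0, 512.3],
    "Quantity": [5, 3, np.nan, 7, 9],
    "Region": ["East", "West", "West", None, "East"],  # hypothetical column
})

# Count missing values per column before deciding how to handle them
missing_per_column = df.isna().sum()
print(missing_per_column)

# Share of rows that df.dropna() would discard (any missing value drops the row)
rows_before = len(df)
rows_after = len(df.dropna())
dropped_share = 1 - rows_after / rows_before
print(f"dropna() keeps {rows_after} of {rows_before} rows ({dropped_share:.0%} dropped)")
```

If the dropped share turns out to be large, that is a signal to inspect which columns drive it before discarding the rows.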

1. Descriptive/Univariate Analysis
• Summaries:
# Summary statistics for the numeric columns
print(df.describe())

• Plots:
# Histogram of sales
df['Sales'].hist()
plt.title('Sales Distribution')
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.show()

# Boxplot (points beyond the whiskers are potential outliers)
sns.boxplot(x=df['Sales'])
plt.title('Sales Boxplot')
plt.show()

# Correlation heatmap: coerce columns to numeric and drop columns that are entirely non-numeric
df_numeric = df.apply(pd.to_numeric, errors='coerce')
df_numeric = df_numeric.dropna(axis=1, how='all')
corr_matrix = df_numeric.corr()
plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
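The boxplot draws its whiskers with the usual 1.5 × IQR rule (the seaborn default, whis=1.5), so the points it marks as potential outliers can also be listed programmatically. A minimal sketch, assuming a hypothetical Sales series; iqr_outliers is an illustrative helper, not part of any library.

```python
import pandas as pd

# Hypothetical sales values; the last one is deliberately extreme
sales = pd.Series([120, 135, 150, 160, 170, 180, 195, 210, 900], name="Sales")

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

print(iqr_outliers(sales))
```

Here Q1 = 150 and Q3 = 195, so the upper fence is 262.5 and only 900 is flagged.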

• Normality Tests:
# Shapiro-Wilk test (null hypothesis: Sales is normally distributed)
stat, p = stats.shapiro(df['Sales'])
print('Statistics=%.3f, p=%.3f' % (stat, p))
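The Shapiro-Wilk test rejects normality when p falls below the chosen significance level; read that way, the reported result (Statistics=0.927, p=0.000) rejects normality at the 5% level. The sketch applies the same decision rule to a hypothetical right-skewed sample; the log-normal data, alpha=0.05, and the shapiro_decision helper are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical right-skewed "sales" sample drawn from a log-normal distribution
skewed_sales = rng.lognormal(mean=6.0, sigma=0.8, size=500)

def shapiro_decision(sample, alpha=0.05):
    """Run Shapiro-Wilk and report whether normality is rejected at level alpha."""
    stat, p = stats.shapiro(sample)
    return stat, p, p < alpha

stat, p, rejected = shapiro_decision(skewed_sales)
print(f"W={stat:.3f}, p={p:.3g}, normality rejected: {rejected}")
```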

• Hypothesis Tests:
# One-sample t-test of whether the mean of Sales differs from 500
t_stat, p_val = stats.ttest_1samp(df['Sales'], popmean=500)
print('t-statistic=%.3f, p-value=%.3f' % (t_stat, p_val))
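The one-sample t-statistic is t = (x̄ - μ0) / (s / √n), where x̄ is the sample mean, μ0 the hypothesized mean, s the sample standard deviation, and n the sample size. As a sanity check, the sketch below computes it by hand on hypothetical sales figures and compares it with stats.ttest_1samp; only popmean=500 comes from the report.

```python
import numpy as np
from scipy import stats

# Hypothetical sales sample; mu0 = 500 mirrors the hypothesized mean used above
sales = np.array([520.0, 610.0, 480.0, 700.0, 655.0, 590.0, 530.0, 640.0])
mu0 = 500.0

n = sales.size
mean = sales.mean()
s = sales.std(ddof=1)                                 # sample standard deviation (n - 1 denominator)
t_manual = (mean - mu0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)    # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(sales, popmean=mu0)
print(f"manual: t={t_manual:.3f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.3f}, p={p_scipy:.4f}")
```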

2. Bivariate Analysis
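• Correlation:
# Correlation matrix of the numeric columns
# (numeric_only=True is required in pandas 2.x when text columns are present)
corr_matrix = df.corr(numeric_only=True)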
sns.heatmap(corr_matrix, annot=True)
plt.title('Correlation Matrix')
plt.show()
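Pearson's r is the covariance of two variables divided by the product of their standard deviations, and stats.pearsonr also returns a p-value for the null hypothesis of zero correlation, which is what a claim of "significant correlation" rests on. A sketch on hypothetical Quantity and Sales data (the linear relationship and noise level are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical data: sales rise with quantity, plus noise
quantity = rng.integers(1, 50, size=200).astype(float)
sales = 35.0 * quantity + rng.normal(0, 150, size=200)

# Pearson r from its definition: co-deviation sum over the product of deviation norms
dq, ds = quantity - quantity.mean(), sales - sales.mean()
r_manual = np.sum(dq * ds) / np.sqrt(np.sum(dq ** 2) * np.sum(ds ** 2))

# scipy returns r and a two-sided p-value for H0: rho = 0
r_scipy, p_value = stats.pearsonr(quantity, sales)
print(f"manual r={r_manual:.3f}, scipy r={r_scipy:.3f}, p={p_value:.3g}")
```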

• Simple Linear Regression:
# Regression analysis: Sales on Quantity
model = ols('Sales ~ Quantity', data=df).fit()
print(model.summary())
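For a single predictor, ordinary least squares has a closed-form solution: the slope is the co-deviation of Quantity and Sales divided by the squared deviations of Quantity, and the intercept is mean(Sales) - slope × mean(Quantity). The sketch below, on hypothetical data using the report's column names, checks that these match the coefficients statsmodels reports.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
# Hypothetical data frame with the same column names used in the report
df = pd.DataFrame({"Quantity": rng.integers(1, 50, size=100).astype(float)})
df["Sales"] = 120.0 + 30.0 * df["Quantity"] + rng.normal(0, 40, size=100)

# Closed-form least-squares estimates for Sales = b0 + b1 * Quantity
x, y = df["Quantity"].to_numpy(), df["Sales"].to_numpy()
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

model = ols("Sales ~ Quantity", data=df).fit()
print(f"closed form: b0={b0:.3f}, b1={b1:.3f}")
print(model.params)
```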

3. Multivariate Analysis
• Multiple Regression:
# Multiple regression
model = ols('Sales ~ Quantity + Discount', data=df).fit()
print(model.summary())

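• Principal Component Analysis (PCA):
# PCA on standardized features (the large-scale Sales column would otherwise tend to dominate)
features = ['Sales', 'Quantity', 'Discount']
x = df[features]
x_std = (x - x.mean()) / x.std()
pca = PCA(n_components=2)
principal_components = pca.fit_transform(x_std)
print('Explained variance ratio:', pca.explained_variance_ratio_)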
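PCA finds the directions of maximum variance, and the explained-variance ratios are the eigenvalues of the data's covariance matrix divided by their sum. The sketch below reproduces scikit-learn's explained_variance_ratio_ from a direct eigen-decomposition on hypothetical standardized data; the three columns only stand in for Sales, Quantity, and Discount, and svd_solver="full" is set so the comparison is exact.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical three-feature matrix (columns stand in for Sales, Quantity, Discount)
quantity = rng.integers(1, 50, size=250).astype(float)
sales = 30.0 * quantity + rng.normal(0, 100, size=250)
discount = rng.uniform(0, 0.3, size=250)
X = StandardScaler().fit_transform(np.column_stack([sales, quantity, discount]))

# PCA by hand: eigenvalues of the covariance matrix, sorted largest first
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]
ratio_manual = eigvals / eigvals.sum()

pca = PCA(n_components=2, svd_solver="full").fit(X)
print("manual: ", ratio_manual[:2])
print("sklearn:", pca.explained_variance_ratio_)
```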
• Exploratory Factor Analysis (EFA):
# EFA on the numeric columns
df_numeric = df.select_dtypes(include=[np.number]).dropna()
# Unrotated fit, used only to obtain eigenvalues for the Kaiser criterion (eigenvalue > 1)
fa_no_rotation = FactorAnalyzer(rotation=None)
fa_no_rotation.fit(df_numeric)
eigenvalues, _ = fa_no_rotation.get_eigenvalues()
n_factors = int(sum(eigenvalues > 1))
# Refit with the selected number of factors and a varimax rotation
fa = FactorAnalyzer(n_factors=n_factors, rotation='varimax')
fa.fit(df_numeric)
loadings = fa.loadings_
print("\nFactor Loadings:")
print(pd.DataFrame(loadings, index=df_numeric.columns))
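The EFA step keeps as many factors as there are eigenvalues greater than 1 (the Kaiser criterion). The first array returned by get_eigenvalues() should correspond to the eigenvalues of the correlation matrix, so the rule can be illustrated with NumPy alone. A sketch on hypothetical data generated from two latent factors, where the criterion is expected to retain two:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical data: two latent factors, each driving three observed columns
n = 400
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(3)]
)

# Kaiser criterion: count correlation-matrix eigenvalues greater than 1
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # largest first
n_factors = int(np.sum(eigenvalues > 1))
print(np.round(eigenvalues, 2), "->", n_factors, "factors")
```

Within each group the columns correlate at about 0.8, giving two eigenvalues near 2.6 and four near 0.2.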

d. Outputs (Results – Numeric, Plots)


• Descriptive Statistics:
• Visualization:
• Statistical Test Results:
Shapiro-Wilk test: Statistics=0.927, p=0.000
One-sample t-test: t-statistic=20.791, p-value=0.000
• Regression Analysis:
• PCA Results:
• EFA Results:

Challenges & Solutions
Challenges
• Data Quality: Missing values and potential outliers.
• Assumptions: Ensuring that the assumptions of the statistical tests are met.
Solutions
• Data Cleaning: Handled missing values by removing incomplete records.
• Validation: Conducted normality tests and visualizations to validate assumptions.

Conclusion & Future Work


Summary of Key Findings
• Sales Distribution: The sales data exhibited a non-normal distribution (Shapiro-Wilk statistic = 0.927, p < 0.001).
• Correlations: A significant correlation was found between sales and quantity.
• Regression Models: Quantity and discount were significant predictors of sales.
• PCA: Identified the principal components that explain the variance in the sales data.
Suggestions for Future Improvements
• Data Enrichment: Incorporate additional variables such as customer demographics.
• Advanced Models: Explore machine learning models for better prediction accuracy.
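One way to act on the suggestion to explore machine learning models is to compare a baseline linear regression with a nonlinear model under k-fold cross-validation. This is only an illustrative sketch: the random forest, 5 folds, R² scoring, and the hypothetical Quantity/Discount data with an interaction effect are all choices made for this example, not results from the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
# Hypothetical data: sales depend on quantity and a quantity-by-discount interaction
n = 300
quantity = rng.integers(1, 50, size=n).astype(float)
discount = rng.uniform(0, 0.3, size=n)
X = np.column_stack([quantity, discount])
y = 30.0 * quantity * (1 - discount) + rng.normal(0, 30, size=n)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean R^2 over 5 folds, so each model is scored on data it was not trained on
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```

Scoring on held-out folds keeps the comparison honest, since a flexible model can fit its training data almost perfectly.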

References
• Kaggle Dataset: Sample Sales Data
• Python Libraries Documentation:
o pandas
o numpy
o matplotlib
o seaborn
o scipy
o statsmodels
o scikit-learn
o factor_analyzer
