IIT_FDS_Assignment1

Uploaded by

likhita A.N

Data Analysis and Cleaning in a Retail Industry Use Case


1. Context

In the retail industry, data plays a crucial role in decision-making and strategic planning.
Companies rely on data from various sources, such as sales transactions, customer
feedback, and inventory management systems, to understand market trends, customer
preferences, and operational efficiency. Effective data management and analysis can
provide insights that lead to improved customer satisfaction, optimized inventory levels,
and increased profitability.

2. Content

This assignment focuses on key concepts related to data and databases, including the types
of data and attributes. You will learn how to import and export data using Python, load data
from various formats such as CSV, Excel, JSON, and HTML, and perform descriptive statistics
and data cleaning operations. These tasks will be integrated into a comprehensive analysis
of a retail dataset to simulate a real-world industry scenario.

3. Data Description

We will use a publicly available retail dataset for this assignment. The dataset contains
information about sales transactions, including the following attributes:
InvoiceNo: Invoice number
StockCode: Product code
Description: Product description
Quantity: Quantity of products sold
InvoiceDate: Date of the invoice
UnitPrice: Price per unit of the product
CustomerID: Customer identification number
Country: Country where the customer resides
You can download the dataset from the following link:
Retail Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx

4. Objective

The objective of this assignment is to gain hands-on experience with data analysis and
cleaning using Python. By the end of this assignment, you will be able to:
1. Identify types of data and attributes in a dataset.
2. Import and export data using Python.
3. Perform descriptive statistics to understand the dataset.
4. Clean the dataset by handling missing values, duplicate entries, and outliers.

5. Tasks

1. Identify Data Types and Attributes


Load the retail dataset and identify the types of data (numeric, categorical) and types of
attributes (nominal, ordinal, interval, ratio).
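
As a starting point for this task, here is a minimal sketch that separates the storage type pandas reports from the measurement scale of each attribute. The rows in `sample` are made-up stand-ins that only reuse the column names from the data description, `storage_kind` is a hypothetical helper, and the `scales` mapping is one reasonable reading of the attributes rather than an official answer.

```python
import pandas as pd

# Made-up rows that reuse the column names from the data description.
sample = pd.DataFrame({
    "InvoiceNo": ["10001", "10001", "10002"],
    "StockCode": ["A100", "B200", "A100"],
    "Description": ["mug", "lamp", "mug"],
    "Quantity": [6, 2, 12],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-04 10:00", "2011-01-04 10:00", "2011-01-05 09:30"]
    ),
    "UnitPrice": [2.55, 7.95, 2.55],
    "CustomerID": [17850.0, 17850.0, None],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})


def storage_kind(series):
    """Classify a column by how pandas stores it (hypothetical helper)."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "datetime"
    if pd.api.types.is_numeric_dtype(series):
        return "numeric"
    return "categorical"


kinds = {col: storage_kind(sample[col]) for col in sample.columns}

# Measurement scale is a judgment about meaning, not something pandas infers.
# This mapping is an assumption made for illustration.
scales = {
    "InvoiceNo": "nominal",
    "StockCode": "nominal",
    "Description": "nominal",
    "Quantity": "ratio",
    "InvoiceDate": "interval",
    "UnitPrice": "ratio",
    "CustomerID": "nominal",
    "Country": "nominal",
}

summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "kind": pd.Series(kinds),
    "scale": pd.Series(scales),
})
print(summary)
```

Note that CustomerID is stored as a number but behaves as a nominal identifier, so the storage type alone does not settle the attribute type.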

2. Data Import and Export with Python


Load the dataset from different formats such as CSV, Excel, JSON, and HTML into a pandas
DataFrame.
Export the cleaned dataset to CSV and Excel formats.
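
One possible way to organize the loading step is a small dispatcher that picks a pandas reader from the file extension. The function name `load_table` and the file names are hypothetical, and the demonstration round-trips a tiny made-up table through CSV and JSON only. Reading or writing Excel requires an optional engine such as openpyxl, and `read_html` requires an HTML parser such as lxml, so those paths are shown but not exercised here.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_table(path):
    """Load a file into a DataFrame based on its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path)      # needs an Excel engine, e.g. openpyxl
    if suffix == ".json":
        return pd.read_json(path)
    if suffix in (".html", ".htm"):
        return pd.read_html(path)[0]    # read_html returns a list of tables
    raise ValueError(f"unsupported format: {suffix}")


# Made-up table standing in for the cleaned dataset.
df = pd.DataFrame({
    "StockCode": ["A100", "B200"],
    "Quantity": [6, 2],
    "UnitPrice": [2.55, 7.95],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "retail_clean.csv"
    json_path = Path(tmp) / "retail_clean.json"

    # Export without the index so the files contain only the data columns.
    df.to_csv(csv_path, index=False)
    df.to_json(json_path)
    # df.to_excel(Path(tmp) / "retail_clean.xlsx", index=False)  # needs openpyxl

    from_csv = load_table(csv_path)
    from_json = load_table(json_path)

print(from_csv)
print(from_json)
```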

3. Descriptive Statistics
Calculate and interpret mean, median, mode, variance, standard deviation, skewness, and
correlation for the numeric attributes in the dataset.
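
The statistics listed above map directly onto pandas methods. The sketch below computes them on a small made-up table, not on the real dataset. Note that pandas uses the sample formulas (dividing by n - 1) for variance and standard deviation by default, and that `mode` can return several values, of which only the first is kept here.

```python
import pandas as pd

# Made-up values; one large Quantity mimics a bulk order.
sales = pd.DataFrame({
    "Quantity": [1, 2, 2, 3, 12],
    "UnitPrice": [5.0, 4.0, 4.0, 3.0, 1.0],
})

numeric = sales.select_dtypes(include="number")

stats = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "mode": numeric.mode().iloc[0],   # first mode per column
    "variance": numeric.var(),        # sample variance (ddof=1)
    "std": numeric.std(),             # sample standard deviation (ddof=1)
    "skewness": numeric.skew(),
})

# Pearson correlation between every pair of numeric columns.
correlation = numeric.corr()

print(stats)
print(correlation)
```

In this toy table the mean of Quantity (4.0) sits well above the median (2.0), which is the kind of gap that signals right skew and is worth commenting on when interpreting the real data.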

4. Data Cleaning
Handle missing values by applying appropriate techniques such as imputation or removal.
Identify and remove duplicate entries in the dataset.
Detect and handle outliers using statistical methods.
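
The three cleaning steps can be sketched as follows on a made-up table. The choices here are assumptions for illustration: rows without a CustomerID are dropped, a missing Description is filled with the placeholder "UNKNOWN", and outliers are flagged with the 1.5 x IQR rule. Other techniques (imputation by median, z-scores) may suit the real dataset better.

```python
import pandas as pd

# Made-up rows: one exact duplicate, two missing values, one extreme Quantity.
raw = pd.DataFrame({
    "InvoiceNo": ["10001", "10001", "10002", "10003", "10004", "10005", "10006"],
    "Description": ["mug", "mug", None, "lamp", "lamp", "mug", "mug"],
    "Quantity": [6, 6, 5, 5, 7, 6, 500],
    "CustomerID": [17850.0, 17850.0, 13047.0, None, 12583.0, 14688.0, 15311.0],
})

# 1. Remove rows that are exact duplicates of an earlier row.
deduped = raw.drop_duplicates()

# 2. Handle missing values: drop rows with no customer, fill missing descriptions.
filled = deduped.dropna(subset=["CustomerID"]).copy()
filled["Description"] = filled["Description"].fillna("UNKNOWN")

# 3. Flag outliers in Quantity with the 1.5 * IQR rule and keep the rest.
q1, q3 = filled["Quantity"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = filled["Quantity"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = filled[in_range]

print(f"rows: raw={len(raw)}, deduped={len(deduped)}, "
      f"filled={len(filled)}, cleaned={len(cleaned)}")
```

Printing the row count after each step makes it easy to report how many records each cleaning rule removed.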

5. Implementation Using Python


Write Python code to implement the above tasks. Ensure that your code is well-documented
and includes comments explaining each step.
