IIT_FDS_Assignment1
In the retail industry, data plays a crucial role in decision-making and strategic planning.
Companies rely on data from various sources, such as sales transactions, customer
feedback, and inventory management systems, to understand market trends, customer
preferences, and operational efficiency. Effective data management and analysis can
provide insights that lead to improved customer satisfaction, optimized inventory levels,
and increased profitability.
2. Content
This assignment focuses on key concepts related to data and databases, including the types
of data and attributes. You will learn how to import and export data using Python, load data
from various formats such as CSV, Excel, JSON, and HTML, and perform descriptive statistics
and data cleaning operations. These tasks will be integrated into a comprehensive analysis
of a retail dataset to simulate a real-world industry scenario.
3. Data Description
We will use a publicly available retail dataset for this assignment. The dataset contains
information about sales transactions, including the following attributes:
InvoiceNo: Invoice number
StockCode: Product code
Description: Product description
Quantity: Quantity of products sold
InvoiceDate: Date of the invoice
UnitPrice: Price per unit of the product
CustomerID: Customer identification number
Country: Country where the customer resides
You can download the dataset from the following link:
Retail Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx
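As a starting point, loading the workbook might look like the sketch below. It assumes pandas is installed together with an Excel engine such as openpyxl. The names DATA_URL, EXPECTED_COLUMNS, missing_columns, and load_retail are illustrative and not part of the assignment.

```python
import pandas as pd

# URL taken from the data description above.
DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00352/Online%20Retail.xlsx"
)

# Attribute names as listed in the data description.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Description", "Quantity",
    "InvoiceDate", "UnitPrice", "CustomerID", "Country",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected attribute names that are absent from df."""
    return [name for name in EXPECTED_COLUMNS if name not in df.columns]


def load_retail(source: str = DATA_URL) -> pd.DataFrame:
    """Read the Excel file from a local path or URL and check its columns."""
    # read_excel needs an Excel engine (for example openpyxl) to be installed.
    df = pd.read_excel(source)
    absent = missing_columns(df)
    if absent:
        raise ValueError(f"missing columns: {absent}")
    return df
```

Calling load_retail() downloads the file on every run. Saving a local copy once and passing its path is usually faster.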
4. Objective
The objective of this assignment is to gain hands-on experience with data analysis and
cleaning using Python. By the end of this assignment, you will be able to:
1. Identify types of data and attributes in a dataset.
2. Import and export data using Python.
3. Perform descriptive statistics to understand the dataset.
4. Clean the dataset by handling missing values, duplicate entries, and outliers.
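The first two objectives can be illustrated with a small sketch. The rows below are hypothetical sample data that only mimic a few of the dataset's columns, and the split into numeric and non-numeric attributes is a rough first pass, not a full classification of attribute types. CSV and JSON are round-tripped through in-memory buffers so the example runs without touching the disk.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "StockCode": ["85123A", "84406B", "84029G"],
    "Quantity": [6, 2, 32],
    "UnitPrice": [2.55, 1.85, 1.69],
    "Country": ["United Kingdom", "France", "United Kingdom"],
})

# Rough attribute typing: numeric columns versus everything else.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

# Export to CSV text and import it again.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Export to JSON (one object per row) and import it again.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(numeric_cols)
print(other_cols)
```

pd.read_excel and pd.read_html follow the same pattern but rely on optional packages (an Excel engine and an HTML parser, respectively), so they are left out of this sketch.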
5. Tasks
3. Descriptive Statistics
Calculate and interpret the mean, median, mode, variance, standard deviation, and skewness
of each numeric attribute in the dataset, and the correlation between pairs of numeric attributes.
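One way to compute these measures with pandas is sketched below. The values are hypothetical and chosen only to keep the arithmetic easy to follow. On the real dataset the same calls would be applied to its Quantity and UnitPrice columns.

```python
import pandas as pd

# Hypothetical numeric sample standing in for the dataset's numeric columns.
df = pd.DataFrame({
    "Quantity": [1, 2, 2, 3, 12],
    "UnitPrice": [5.0, 4.0, 4.0, 3.0, 1.0],
})

summary = pd.DataFrame({
    "mean": df.mean(),
    "median": df.median(),
    # mode() can return several rows when there are ties; keep the first.
    "mode": df.mode().iloc[0],
    # var() and std() use the sample definition (divide by n - 1) by default.
    "variance": df.var(),
    "std": df.std(),
    "skewness": df.skew(),
})

# Pairwise Pearson correlation between the numeric columns.
correlation = df.corr()

print(summary)
print(correlation)
```

In this sample the single large quantity of 12 pulls the mean (4.0) well above the median (2.0) and gives a positive skewness, which is the kind of interpretation the task asks for.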
4. Data Cleaning
Handle missing values by applying appropriate techniques such as imputation or removal.
Identify and remove duplicate entries in the dataset.
Detect and handle outliers using statistical methods.
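The three cleaning steps could be sketched as follows. The rows are hypothetical, and the specific choices here are assumptions rather than requirements of the assignment: rows without a CustomerID are removed, missing UnitPrice values are imputed with the median, and outliers in Quantity are filtered with the 1.5 x IQR rule.

```python
import pandas as pd

# Hypothetical rows with one duplicate, two missing values, and one outlier.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "A5", "A6"],
    "Quantity": [2, 2, 3, 4, 3, 500, 2],
    "UnitPrice": [1.5, 1.5, None, 2.5, 2.0, 2.0, 1.5],
    "CustomerID": [17850.0, 17850.0, 13047.0, None, 12583.0, 13748.0, 15100.0],
})

# 1. Remove exact duplicate rows.
deduped = df.drop_duplicates()

# 2a. Removal: an identifier cannot be sensibly imputed, so drop those rows.
with_ids = deduped.dropna(subset=["CustomerID"])

# 2b. Imputation: fill missing prices with the median of the observed prices.
price_median = with_ids["UnitPrice"].median()
imputed = with_ids.assign(UnitPrice=with_ids["UnitPrice"].fillna(price_median))

# 3. Outliers: keep quantities within 1.5 * IQR of the quartiles.
q1 = imputed["Quantity"].quantile(0.25)
q3 = imputed["Quantity"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = imputed[imputed["Quantity"].between(lower, upper)]

print(cleaned)
```

The order of the steps matters: the median and the quartiles are computed after duplicates are removed, so repeated rows do not bias them. Other choices, such as z-score thresholds for outliers or mean imputation, fit the same structure.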