Rape cases data analysis
PUBLIC SCHOOL
PROJECT REPORT ON
ANALYSIS OF RAPE CASES IN INDIA
SUBMITTED TO:
Mrs. Paramjeet Kaur
BY:
Raunak Singh (12 C)
Roll No: 11
Mayank Vij (12 C)
Roll No: 09
INFORMATICS
PRACTICES
PROJECT ON
ANALYSIS OF RAPE CASES
IN INDIA
TABLE OF CONTENTS
SNO CONTENT
1 ACKNOWLEDGEMENT
2 CERTIFICATE
3 INTRODUCTION
6 ABOUT MATPLOTLIB
8 SOURCE CODE
9 OUTPUT
10 FEATURES
11 BIBLIOGRAPHY
ACKNOWLEDGEMENT
We are extremely grateful and remain deeply
indebted to our guide, Mrs. Paramjeet Kaur, for being a
constant source of inspiration and support throughout
the design, implementation, and evaluation of this
project. We sincerely thank her for her invaluable
suggestions, which greatly benefited us during the
development of our project on “ANALYSIS OF RAPE CASES IN
INDIA”.
CERTIFICATE
________________________
Mrs. Paramjeet Kaur
Guru Harkrishan Public School
INTRODUCTION
Rape is a critical social issue that has significant consequences for
individuals and communities alike. In India, addressing the prevalence
of rape and understanding its trends are crucial steps toward creating a
safer society. This project, titled "Analysis of Rape Cases in India,"
aims to provide a comprehensive exploration of the patterns, trends,
and factors associated with reported rape cases across different states
and regions of India.
DEVELOPMENT ENVIRONMENT
Software Used: -
1. Python 3.8.6
2. Microsoft Excel 2019
3. Windows 10 Pro (64-bit operating system)
4. Microsoft Word 2019
5. Snipping Tool for Screenshots
Online Help: -
1. Google images
Book Reference: -
1. Informatics Practices Notes by School Teacher
2. Informatics Practices Book by Preeti Arora
Hardware Used: -
1. 12 GB RAM
2. Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz 2.30 GHz
3. 1 TB Hard Disk
ABOUT PANDAS
preprocess their data, which is crucial for any data
analysis or machine learning task.
Pandas also integrates seamlessly with other popular
libraries like Matplotlib for data visualization and Scikit-
learn for machine learning, making it an essential part of
the Python data science ecosystem. Its flexibility and
readability make it ideal for both beginners and
experienced programmers, enabling them to handle
complex data operations with concise and readable code.
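As a small illustration of the concise style described above, the sketch below counts records per state and per year with pandas. The DataFrame contents are made-up placeholder values (not the project's data), and the column names `State` and `Year` are assumed to match those used later in the source code.

```python
import pandas as pd

# Hypothetical sample records; the real project reads rape_cases_india.csv instead.
sample = pd.DataFrame({
    "State": ["State A", "State B", "State A", "State C", "State A", "State B"],
    "Year": [2019, 2019, 2020, 2020, 2021, 2021],
})

# value_counts() tallies how many rows carry each distinct value,
# ordered from the most frequent value to the least frequent.
state_counts = sample["State"].value_counts()

# Sorting by index puts the year-wise counts in chronological order.
year_counts = sample["Year"].value_counts().sort_index()

print(state_counts)
print(year_counts)
```

The resulting Series objects can be handed directly to Matplotlib, since their index supplies the labels and their values supply the heights or points to plot.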
ABOUT MATPLOTLIB
FILE 1 : rape_cases_india.csv
FRONTEND DETAILS
import pandas as pd
import matplotlib.pyplot as plt
while True:
    # Show the menu on every pass through the loop.
    print('\n\tA N A L Y S I S O F R A P E C A S E S I N I N D I A')
    print("WHAT DO YOU WANT TO DO?")
    print("1. DISPLAY DATAFRAME")
    print("2. ANALYZE CASES BY STATE")
    print("3. ANALYZE CASES BY YEAR")
    print("4. PLOT AGE-WISE DISTRIBUTION")
    print("5. DISPLAY STATISTICAL SUMMARY")
    print("6. PLOT TOP STATES WITH HIGHEST CASES")
    print("7. PLOT CASES PER 100,000 POPULATION BY STATE")
    print("8. ANALYZE TREND OF CASES BY REGION")
    print("9. PLOT VICTIM EDUCATION LEVEL DISTRIBUTION")
    print("10. EXIT")
    # Read the menu choice; non-numeric input is treated as an invalid option.
    try:
        choice = int(input("ENTER YOUR CHOICE: "))
    except ValueError:
        choice = 0
    if choice == 1:
        # Option 1: load the CSV file and print the whole DataFrame.
        df = pd.read_csv("rape_cases_india.csv")
        print(df)
    elif choice == 2:
        # Option 2: bar chart of the number of records per state.
        df = pd.read_csv("rape_cases_india.csv")
        state_counts = df['State'].value_counts()
        plt.bar(state_counts.index, state_counts.values, color='skyblue')
        plt.title('Cases by State')
        plt.xlabel('State')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()
    elif choice == 3:
        # Option 3: line chart of the number of records per year, in chronological order.
        df = pd.read_csv("rape_cases_india.csv")
        year_counts = df['Year'].value_counts().sort_index()
        plt.plot(year_counts.index, year_counts.values, marker='o', linestyle='-', color='red')
        plt.title('Cases by Year')
        plt.xlabel('Year')
        plt.ylabel('Number of Cases')
        plt.grid(True)
        plt.show()
    elif choice == 4:
        # Option 4: bar chart of the number of records per age group.
        df = pd.read_csv("rape_cases_india.csv")
        age_distribution = df['Age Group'].value_counts()
        plt.bar(age_distribution.index, age_distribution.values, color='lightgreen')
        plt.title('Age-wise Distribution')
        plt.xlabel('Age Group')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()
    elif choice == 5:
        # Option 5: print summary counts by year, state, and age group.
        df = pd.read_csv("rape_cases_india.csv")
        print("\nStatistical Summary")
        print("Total Cases: ", len(df))
        print("Cases by Year:\n", df['Year'].value_counts().sort_index())
        print("Cases by State:\n", df['State'].value_counts())
        print("Cases by Age Group:\n", df['Age Group'].value_counts())
    elif choice == 6:
        # Option 6: bar chart of the 10 states with the most records.
        df = pd.read_csv("rape_cases_india.csv")
        state_counts = df['State'].value_counts().head(10)
        plt.bar(state_counts.index, state_counts.values, color='purple')
        plt.title('Top States with Highest Cases')
        plt.xlabel('State')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()
    elif choice == 7:
        # Option 7: normalize case counts by population (cases per 100,000 people).
        df = pd.read_csv("rape_cases_india.csv")
        # Both columns are needed for the rate calculation.
        if 'Population' in df.columns and 'Total Cases' in df.columns:
            df['Cases per 100k'] = (df['Total Cases'] / df['Population']) * 100000
            cases_per_100k = df.groupby('State')['Cases per 100k'].mean().sort_values(ascending=False)
            plt.bar(cases_per_100k.index, cases_per_100k.values, color='orange')
            plt.title('Cases per 100,000 Population by State')
            plt.xlabel('State')
            plt.ylabel('Cases per 100,000')
            plt.xticks(rotation=45)
            plt.show()
        else:
            print("Population or total cases data not available in the dataset.")
    elif choice == 8:
        # Option 8: one line per region showing total cases for each year.
        df = pd.read_csv("rape_cases_india.csv")
        # Both columns are needed to build the year-by-region table.
        if 'Region' in df.columns and 'Total Cases' in df.columns:
            region_trends = df.groupby(['Year', 'Region'])['Total Cases'].sum().unstack()
            region_trends.plot(marker='o')
            plt.title('Trend of Cases by Region')
            plt.xlabel('Year')
            plt.ylabel('Total Cases')
            plt.grid(True)
            plt.legend(title='Region')
            plt.show()
        else:
            print("Region or total cases data not available in the dataset.")
    elif choice == 9:
        # Option 9: bar chart of the number of records per victim education level.
        df = pd.read_csv("rape_cases_india.csv")
        if 'Victim Education Level' in df.columns:
            education_distribution = df['Victim Education Level'].value_counts()
            plt.bar(education_distribution.index, education_distribution.values, color='lightblue')
            plt.title('Victim Education Level Distribution')
            plt.xlabel('Education Level')
            plt.ylabel('Number of Cases')
            plt.xticks(rotation=45)
            plt.show()
        else:
            print("Victim education level data not available in the dataset.")
    elif choice == 10:
        # Option 10: leave the menu loop and end the program.
        break
    else:
        print("WRONG INPUT!! Please enter a number between 1 and 10.")
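The region trend option relies on grouping followed by `unstack()`, which can be hard to picture from the code alone. The following is a minimal sketch of that reshaping step on a tiny made-up table; the figures are placeholders, and the column names `Year`, `Region`, and `Total Cases` are assumed to match the ones the program checks for.

```python
import pandas as pd

# Hypothetical sample rows; the numbers are placeholders, not real statistics.
sample = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "Region": ["North", "South", "North", "South"],
    "Total Cases": [10, 5, 12, 7],
})

# Sum the cases for every (Year, Region) pair, then move Region from the
# row index into the columns: one row per year, one column per region.
region_trends = sample.groupby(["Year", "Region"])["Total Cases"].sum().unstack()

print(region_trends)
```

Because each region ends up as its own column, calling `region_trends.plot(marker='o')` draws one line per region with the years along the x-axis.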
SCREEN SHOTS OF EXECUTION
FEATURES
Display Complete Dataframe:
Allows users to view the entire dataset in tabular format for detailed inspection.
State-wise Analysis:
Visualizes the number of cases reported in each state using bar charts.
Helps identify states with higher or lower case counts.
Statistical Summary:
Provides key statistics such as total cases, cases by year, state, and age group.
Offers a quick summary of the dataset.
Top States with Highest Cases:
Identifies and plots the top 10 states with the highest number of reported cases.
Useful for prioritizing regions for further study.
Cases per 100,000 Population:
Normalizes case data by state population and calculates cases per 100,000 people.
Facilitates fair comparisons between states with varying population sizes.
Trend of Cases by Region:
Shows trends of reported cases in different geographical regions over the years.
Useful for observing regional patterns and changes.
Data Visualization:
Employs bar charts and line charts to make data insights visually appealing and easy to understand.
Extensibility:
New features or additional columns in the dataset can be easily integrated into the program.
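The per-100,000 normalization listed among the features comes down to one line of arithmetic: rate = (cases / population) × 100,000. The sketch below works through it on two made-up states with equal case counts but different populations; all figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical figures; not taken from the project's dataset.
sample = pd.DataFrame({
    "State": ["State A", "State B"],
    "Total Cases": [500, 500],
    "Population": [10_000_000, 2_000_000],
})

# rate = (cases / population) * 100,000
sample["Cases per 100k"] = (sample["Total Cases"] / sample["Population"]) * 100000

print(sample[["State", "Cases per 100k"]])
```

Both states report the same raw count, yet State A works out to about 5 cases per 100,000 people and State B to about 25, which is why the normalized figure gives a fairer comparison than the raw totals.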
BIBLIOGRAPHY
BOOKS:
WEBSITES:
www.geeksforgeeks.org
https://docs.python.org/3/
https://www.w3schools.com/python/