Rape cases data analysis

The project report titled 'Analysis of Rape Cases in India' aims to explore patterns and trends in reported rape cases across various states and regions in India using a dataset with detailed information. Utilizing Python libraries such as Pandas and Matplotlib, the project provides visualizations and statistical analyses to assist policymakers and researchers in understanding the issue and developing strategies to combat it. Key features include data visualization of cases by state, year, age group, and victim education level, along with a user-friendly interface for interaction.

GURU HARKRISHAN PUBLIC SCHOOL

PROJECT REPORT ON
ANALYSIS OF RAPE CASES IN INDIA
SUBMITTED TO:
Mrs. Paramjeet Kaur
BY:
Raunak Singh (12 C)
Roll No: 11
Mayank Vij (12 C)
Roll No: 09

INFORMATICS
PRACTICES
PROJECT ON
ANALYSIS OF RAPE CASES
IN INDIA

TABLE OF CONTENTS
S.NO. CONTENT
1 ACKNOWLEDGEMENT
2 CERTIFICATE
3 INTRODUCTION
4 HARDWARE AND SOFTWARE REQUIREMENTS
5 ABOUT PANDAS
6 ABOUT MATPLOTLIB
7 ABOUT BACKEND IMPLEMENTATION (CSV)
8 SOURCE CODE
9 OUTPUT
10 FEATURES
11 BIBLIOGRAPHY

ACKNOWLEDGEMENT
We are extremely grateful and remain deeply
indebted to our guide, Mrs. Paramjeet Kaur, for being a
constant source of inspiration and support throughout
the design, implementation, and evaluation of this
project. We sincerely thank her for her invaluable
suggestions, which greatly benefited us during the
development of our project on "ANALYSIS OF RAPE CASES IN
INDIA".

Her encouragement and motivation inspired us to work
diligently, and her cooperative nature was a tremendous
help throughout this journey. Through this
acknowledgement, we would like to express our heartfelt
thanks to her for her unwavering guidance and support,
without which we would not have been able to
accomplish this project.

This project has been jointly submitted by:

Raunak Singh (12 C)


Mayank Vij (12 C)
Subject teacher Signature: ___________

CERTIFICATE

This is to certify that Raunak Singh and
Mayank Vij of Class XII, GURU HARKRISHAN PUBLIC SCHOOL,
Karol Bagh, New Delhi, have successfully
carried out the project work entitled "ANALYSIS
OF RAPE CASES IN INDIA" for the session 2024-2025,
under my guidance.

This project report has been prepared in
partial fulfillment of the requirements of the subject
INFORMATICS PRACTICES of Class XII under the Central
Board of Secondary Education (CBSE) for the session 2024-2025.

________________________
Mrs. Paramjeet Kaur
Guru Harkrishan Public School

INTRODUCTION
Rape is a critical social issue that has significant consequences for
individuals and communities alike. In India, addressing the prevalence
of rape and understanding its trends are crucial steps toward creating a
safer society. This project, titled "Analysis of Rape Cases in India,"
aims to provide a comprehensive exploration of the patterns, trends,
and factors associated with reported rape cases across different states
and regions of India.

The project utilizes a dataset containing detailed information about
rape cases, including attributes such as the year, state, age group of
victims, population, and the education level of victims. By leveraging
Python for data analysis and visualization, the program helps to
uncover valuable insights, such as:

• Trends in the number of reported cases over the years.
• Geographical and regional variations in the occurrence of cases.
• Distribution of cases among different age groups and education
  levels.
• A comparison of cases relative to the population size of states.

This tool is designed to assist policymakers, researchers, and social
activists in making informed decisions. By identifying patterns and
trends, stakeholders can better understand the factors contributing to
the prevalence of rape and develop strategies to combat this issue
effectively. Furthermore, the program's user-friendly interface and
visualizations make it accessible to a wide audience, encouraging
awareness and active engagement in the fight against gender-based
violence.

Through this project, we aim to contribute to the ongoing efforts to
create a safer, more equitable society in India.

DEVELOPMENT ENVIRONMENT

Software Used: -
1. Python 3.8.6
2. Microsoft Excel 2019
3. Windows 10 Pro (64-bit operating system)
4. Microsoft Word 2019
5. Snipping Tool for Screenshots
Online Help: -
1. Google images
Book Reference: -
1. Informatics Practices Notes by School Teacher
2. Informatics Practices Book by Preeti Arora
Hardware Used: -
1. 12 GB RAM
2. Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz 2.30 GHz
3. 1 TB Hard Disk

ABOUT PANDAS

The Pandas library in Python is a powerful, open-source
data analysis and manipulation tool, built on top of the
NumPy library. Known for its easy-to-use data structures
and high-performance capabilities, Pandas is widely used
in data science, finance, machine learning, and various
fields that involve large datasets. The library's two
primary data structures, Series and DataFrame, allow
users to handle and analyze structured data efficiently. A
Series is a one-dimensional labeled array, while a
DataFrame is a two-dimensional table with labeled axes,
providing functionality similar to a database or an Excel
spreadsheet.

One of the key strengths of Pandas is its ability to handle
missing data and perform operations like merging,
filtering, grouping, and reshaping datasets with ease.
With functions like read_csv and read_excel, it allows
data loading from various formats, while its powerful
methods like groupby, pivot, and merge facilitate data
aggregation, transformation, and relational joins. These
features make it easier for users to clean, explore, and
preprocess their data, which is crucial for any data
analysis or machine learning task.

Pandas also integrates seamlessly with other popular
libraries like Matplotlib for data visualization and
Scikit-learn for machine learning, making it an essential part of
the Python data science ecosystem. Its flexibility and
readability make it ideal for both beginners and
experienced programmers, enabling them to handle
complex data operations with concise and readable code.
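
As a small illustration of the operations named in this section, the sketch below applies fillna, groupby, pivot, and merge to two tiny DataFrames. The column names and all values are placeholders invented for the demonstration; they are not taken from the project's dataset.

```python
import pandas as pd

# Placeholder values for illustration only; not real statistics.
records = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "Year": [2020, 2021, 2020, 2021],
    "Count": [10, None, 5, 7],
})
regions = pd.DataFrame({"State": ["A", "B"], "Region": ["North", "South"]})

# Handle missing data: replace the missing count with 0.
records["Count"] = records["Count"].fillna(0)

# Aggregate: total count per state.
totals = records.groupby("State")["Count"].sum()

# Reshape: one row per state, one column per year.
wide = records.pivot(index="State", columns="Year", values="Count")

# Relational join: attach each state's region to every record.
merged = records.merge(regions, on="State", how="left")

print(totals)
print(wide)
print(merged)
```

Replacing a missing value with 0 is only one possible choice; depending on the data, dropping the row with dropna() may be more appropriate.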

ABOUT MATPLOTLIB

Matplotlib is a widely used plotting library in Python,
known for its versatility and capability to create a variety
of visualizations, ranging from simple line plots to
complex multi-dimensional graphs. It provides an
interface for embedding plots in Python applications and
generating publication-quality figures. The library's core
is the pyplot module, which offers a MATLAB-like
interface, making it beginner-friendly while maintaining
extensive functionality for advanced users.

One of the strengths of Matplotlib is its customizability.
Users can control every aspect of a plot, from titles,
labels, and legends to the color, size, and style of plot
elements. Commonly used visualizations include line
plots, bar charts, scatter plots, histograms, and pie
charts. For advanced visualizations, it supports features
like 3D plotting, subplots, and animations, providing a
comprehensive toolkit for data visualization.

Matplotlib works seamlessly with other libraries such as
NumPy, Pandas, and Seaborn. While Matplotlib is
excellent for creating basic visualizations, it also serves
as a foundation for more specialized libraries like
Seaborn, which builds on its capabilities for statistical
visualizations. Whether you're a beginner or a seasoned
data scientist, Matplotlib is an indispensable tool for
translating data into insights through visual storytelling.
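
The customization options described in this section can be sketched as follows. The example draws a line plot and a bar chart side by side as subplots and sets titles, axis labels, a legend, colors, and tick rotation. The numbers are placeholders, and the non-interactive "Agg" backend is an assumption made so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption so no display is needed
import matplotlib.pyplot as plt

# Placeholder values for illustration only.
years = [2019, 2020, 2021, 2022]
values = [3, 5, 4, 6]

# One figure holding two side-by-side subplots.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with a marker, line style, color, legend, and grid.
ax_line.plot(years, values, marker="o", linestyle="--", color="red", label="series")
ax_line.set_title("Line plot")
ax_line.set_xlabel("Year")
ax_line.set_ylabel("Value")
ax_line.legend()
ax_line.grid(True)

# Bar chart with rotated tick labels.
ax_bar.bar([str(y) for y in years], values, color="skyblue")
ax_bar.set_title("Bar chart")
ax_bar.set_xlabel("Year")
ax_bar.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.canvas.draw()  # render the figure in memory
```

To see the result, one could call fig.savefig with a file name of one's choice, or drop the backend line and call plt.show() in an interactive session.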

PROJECT BACKEND IMPLEMENTATION

FILE 1 : rape_cases_india.csv
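
The contents of the file are not reproduced here. Judging only from the column names that the source code looks up, the CSV is expected to contain the columns State, Year, Age Group, Total Cases, Population, Region, and Victim Education Level. The sketch below reports which of these columns a loaded DataFrame lacks. The function name missing_columns is hypothetical, and the sample row holds placeholder values.

```python
import pandas as pd

# Column names taken from the lookups in the project's source code.
EXPECTED_COLUMNS = [
    "State", "Year", "Age Group", "Total Cases",
    "Population", "Region", "Victim Education Level",
]

def missing_columns(df):
    """Return the expected column names that are absent from df."""
    return [name for name in EXPECTED_COLUMNS if name not in df.columns]

# Placeholder row for illustration only; not real data.
sample = pd.DataFrame({
    "State": ["A"], "Year": [2020], "Age Group": ["18-30"],
    "Total Cases": [10], "Population": [1000000],
})

print(missing_columns(sample))  # prints ['Region', 'Victim Education Level']
```

In the program itself, the same check could be run on the result of pd.read_csv("rape_cases_india.csv") before the menu is shown.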

FRONTEND DETAILS
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset once; every menu option below works on this DataFrame.
df = pd.read_csv("rape_cases_india.csv")

while True:
    print('\n\tA N A L Y S I S O F R A P E C A S E S I N I N D I A')
    print("WHAT DO YOU WANT TO DO?")
    print("1. DISPLAY DATAFRAME")
    print("2. ANALYZE CASES BY STATE")
    print("3. ANALYZE CASES BY YEAR")
    print("4. PLOT AGE-WISE DISTRIBUTION")
    print("5. DISPLAY STATISTICAL SUMMARY")
    print("6. PLOT TOP STATES WITH HIGHEST CASES")
    print("7. PLOT CASES PER 100,000 POPULATION BY STATE")
    print("8. ANALYZE TREND OF CASES BY REGION")
    print("9. PLOT VICTIM EDUCATION LEVEL DISTRIBUTION")
    print("10. EXIT")

    # A non-numeric entry would make int() raise ValueError, so catch it.
    try:
        choice = int(input("\nENTER YOUR CHOICE: "))
    except ValueError:
        print("WRONG INPUT!! Please enter a number between 1 and 10.")
        continue

    if choice == 1:
        print(df)

    elif choice == 2:
        # value_counts() counts the rows recorded for each state.
        state_counts = df['State'].value_counts()
        plt.bar(state_counts.index, state_counts.values, color='skyblue')
        plt.title('Cases by State')
        plt.xlabel('State')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()

    elif choice == 3:
        # Sort by year so the line runs in chronological order.
        year_counts = df['Year'].value_counts().sort_index()
        plt.plot(year_counts.index, year_counts.values, marker='o', linestyle='-', color='red')
        plt.title('Cases by Year')
        plt.xlabel('Year')
        plt.ylabel('Number of Cases')
        plt.grid(True)
        plt.show()

    elif choice == 4:
        age_distribution = df['Age Group'].value_counts()
        plt.bar(age_distribution.index, age_distribution.values, color='lightgreen')
        plt.title('Age-wise Distribution')
        plt.xlabel('Age Group')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()

    elif choice == 5:
        print("\nStatistical Summary")
        print("Total Cases: ", len(df))
        print("Cases by Year:\n", df['Year'].value_counts().sort_index())
        print("Cases by State:\n", df['State'].value_counts())
        print("Cases by Age Group:\n", df['Age Group'].value_counts())

    elif choice == 6:
        # value_counts() sorts in descending order, so head(10) is the top 10.
        state_counts = df['State'].value_counts().head(10)
        plt.bar(state_counts.index, state_counts.values, color='purple')
        plt.title('Top States with Highest Cases')
        plt.xlabel('State')
        plt.ylabel('Number of Cases')
        plt.xticks(rotation=45)
        plt.show()

    elif choice == 7:
        if 'Population' in df.columns:
            # Per-row rate, kept in a local Series so df itself is not modified.
            rate = (df['Total Cases'] / df['Population']) * 100000
            cases_per_100k = rate.groupby(df['State']).mean().sort_values(ascending=False)
            plt.bar(cases_per_100k.index, cases_per_100k.values, color='orange')
            plt.title('Cases per 100,000 Population by State')
            plt.xlabel('State')
            plt.ylabel('Cases per 100,000')
            plt.xticks(rotation=45)
            plt.show()
        else:
            print("Population data not available in the dataset.")

    elif choice == 8:
        if 'Region' in df.columns:
            # Rows become years and columns become regions, one line per region.
            region_trends = df.groupby(['Year', 'Region'])['Total Cases'].sum().unstack()
            region_trends.plot(marker='o')
            plt.title('Trend of Cases by Region')
            plt.xlabel('Year')
            plt.ylabel('Total Cases')
            plt.grid(True)
            plt.legend(title='Region')
            plt.show()
        else:
            print("Region data not available in the dataset.")

    elif choice == 9:
        if 'Victim Education Level' in df.columns:
            education_distribution = df['Victim Education Level'].value_counts()
            plt.bar(education_distribution.index, education_distribution.values,
                    color='lightblue')
            plt.title('Victim Education Level Distribution')
            plt.xlabel('Education Level')
            plt.ylabel('Number of Cases')
            plt.xticks(rotation=45)
            plt.show()
        else:
            print("Victim education level data not available in the dataset.")

    elif choice == 10:
        print("Exiting the program.")
        break

    else:
        print("WRONG INPUT!! Please enter a number between 1 and 10.")
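
Menu options 2 to 6 and 9 rely on value_counts(), which counts how many rows of the file mention each value. That equals the number of cases only if every row describes a single case. Options 7 and 8 read a 'Total Cases' column, which suggests that rows may instead be aggregates; this is an assumption, since the file is not shown. Under that assumption, the following sketch with placeholder numbers shows how the two approaches differ.

```python
import pandas as pd

# Placeholder values for illustration only; not real statistics.
df = pd.DataFrame({
    "State": ["A", "A", "B"],
    "Year": [2020, 2021, 2020],
    "Total Cases": [10, 20, 5],
})

# Counts rows per state: how many records mention each state.
rows_per_state = df["State"].value_counts()

# Sums the case column per state: how many cases were reported.
cases_per_state = df.groupby("State")["Total Cases"].sum()

print(rows_per_state)
print(cases_per_state)
```

Here state "A" appears in two rows but accounts for 30 cases, so a chart labelled "Number of Cases" would need the grouped sum rather than the row count.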

SCREENSHOTS OF EXECUTION

[Screenshots of the program output are not available in this copy.]

FEATURES
• Display Complete DataFrame:
  ◦ Allows users to view the entire dataset in tabular format for detailed inspection.

• Analyze Cases by State:
  ◦ Visualizes the number of cases reported in each state using a bar chart.
  ◦ Helps identify states with higher or lower case counts.

• Analyze Cases by Year:
  ◦ Plots year-wise trends in reported cases.
  ◦ Useful for understanding whether cases are increasing or decreasing over time.

• Age-Wise Distribution Analysis:
  ◦ Displays the distribution of cases across different victim age groups.
  ◦ Highlights which age groups are most affected.

• Statistical Summary:
  ◦ Provides key statistics such as total cases and cases by year, state, and age group.
  ◦ Offers a quick summary of the dataset.

• Top States with Highest Cases:
  ◦ Identifies and plots the top 10 states with the highest number of reported cases.
  ◦ Useful for prioritizing regions for further study.

• Cases Per 100,000 Population:
  ◦ Normalizes case data by state population and calculates cases per 100,000 people.
  ◦ Facilitates fair comparisons between states with varying population sizes.

• Trend Analysis by Region:
  ◦ Shows trends of reported cases in different geographical regions over the years.
  ◦ Useful for observing regional patterns and changes.

• Victim Education Level Distribution:
  ◦ Displays the distribution of cases based on the education level of victims.
  ◦ Offers a starting point for examining whether education level is related to vulnerability.

• Interactive and User-Friendly Menu:
  ◦ The program offers a clear and easy-to-navigate menu-driven interface.
  ◦ Users can choose specific analyses they want to perform.

• Dynamic Data Visualization:
  ◦ Employs bar charts and line charts to make data insights visually appealing and easy to understand.

• Extensible and Customizable:
  ◦ New features or additional columns in the dataset can be easily integrated into the program.

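The per-100,000 normalization listed among the features divides a case count by the population and scales the result by 100,000. A minimal sketch with placeholder numbers follows; the function name cases_per_100k is hypothetical.

```python
def cases_per_100k(total_cases, population):
    """Return the number of cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first, then divide; mathematically the same as
    # (total_cases / population) * 100000.
    return total_cases * 100_000 / population

# Placeholder numbers: 50 cases in a population of 2,000,000.
print(cases_per_100k(50, 2_000_000))  # prints 2.5
```

Two states with the same number of cases but different populations get different rates, which is why the rate rather than the raw count supports comparison between states.
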
BIBLIOGRAPHY
BOOKS:
• INFORMATICS PRACTICES WITH PYTHON by SUMITA ARORA

WEBSITES:
• www.geeksforgeeks.org
• https://docs.python.org/3/
• https://www.w3schools.com/python/
