Customer Service Requests Analysis PDF
DESCRIPTION
NYC 311's mission is to provide the public with quick and easy access to all New York City
government services and information while offering the best customer service. Each day,
NYC311 receives thousands of requests related to several hundred types of non-emergency
services, including noise complaints, plumbing issues, and illegally parked cars. These
requests are received by NYC311 and forwarded to the relevant agencies such as the police,
buildings, or transportation. The agency responds to the request, addresses it, and then
closes it.
Problem Objective :
Perform a service request data analysis of New York City 311 calls. You will focus on
data wrangling techniques to understand patterns in the data and visualize the major
complaint types. The analysis should also answer two questions:
- Is the average response time similar across complaint types (overall)?
- Are the type of complaint or service requested and the location related?
Dataset Description :
In [2]: import warnings
import pandas as pd
# Use set_option to control how many rows and columns are displayed
pd.set_option('display.max_columns',30)
pd.set_option('display.max_rows',800)
# Ignore warnings
warnings.simplefilter('ignore')
Task 1
- Import a 311 NYC service request
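The import cell itself is not shown. A minimal sketch of this step, assuming the data comes as a CSV file: the helper name `load_requests` and the in-memory sample rows are hypothetical and stand in for the real file, whose name does not appear here.

```python
import io
import pandas as pd

def load_requests(source):
    """Read a 311 service-request CSV from a path or file-like object."""
    # low_memory=False lets pandas infer each column's dtype from the whole file
    return pd.read_csv(source, low_memory=False)

# Tiny in-memory stand-in for the real file (values are made up for illustration)
sample_csv = io.StringIO(
    "Unique Key,Created Date,Closed Date,Complaint Type,City\n"
    "1,12/31/2015 11:59:45 PM,01/01/2016 12:55:00 AM,Noise - Street/Sidewalk,NEW YORK\n"
    "2,12/31/2015 11:59:44 PM,01/01/2016 01:26:00 AM,Blocked Driveway,ASTORIA\n"
    "3,12/31/2015 11:59:29 PM,,Illegal Parking,BRONX\n"
)
df = load_requests(sample_csv)
print(df.shape)
print(df.head(3))
```

With the real data, a file path would be passed to the same function in place of the `StringIO` object.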
Out[4]: (DataFrame preview, 3 rows × 53 columns; visible column headers: Unique Key, Created Date, Closed Date, Agency, Agency Name, Complaint Type, Descriptor, Location Type, Incident Zip, Incident Address, Street Name, ...)
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300698 entries, 0 to 300697
Data columns (total 53 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 300698 non-null int64
1 Created Date 300698 non-null object
2 Closed Date 298534 non-null object
3 Agency 300698 non-null object
4 Agency Name 300698 non-null object
5 Complaint Type 300698 non-null object
6 Descriptor 294784 non-null object
7 Location Type 300567 non-null object
8 Incident Zip 298083 non-null float64
9 Incident Address 256288 non-null object
10 Street Name 256288 non-null object
11 Cross Street 1 251419 non-null object
12 Cross Street 2 250919 non-null object
13 Intersection Street 1 43858 non-null object
14 Intersection Street 2 43362 non-null object
15 Address Type 297883 non-null object
16 City 298084 non-null object
17 Landmark 349 non-null object
18 Facility Type 298527 non-null object
19 Status 300698 non-null object
20 Due Date 300695 non-null object
21 Resolution Description 300698 non-null object
22 Resolution Action Updated Date 298511 non-null object
23 Community Board 300698 non-null object
24 Borough 300698 non-null object
25 X Coordinate (State Plane) 297158 non-null float64
26 Y Coordinate (State Plane) 297158 non-null float64
27 Park Facility Name 300698 non-null object
28 Park Borough 300698 non-null object
29 School Name 300698 non-null object
30 School Number 300698 non-null object
31 School Region 300697 non-null object
32 School Code 300697 non-null object
33 School Phone Number 300698 non-null object
34 School Address 300698 non-null object
35 School City 300698 non-null object
36 School State 300698 non-null object
37 School Zip 300697 non-null object
38 School Not Found 300698 non-null object
39 School or Citywide Complaint 0 non-null float64
40 Vehicle Type 0 non-null float64
41 Taxi Company Borough 0 non-null float64
42 Taxi Pick Up Location 0 non-null float64
43 Bridge Highway Name 243 non-null object
44 Bridge Highway Direction 243 non-null object
45 Road Ramp 213 non-null object
46 Bridge Highway Segment 213 non-null object
47 Garage Lot Name 0 non-null float64
48 Ferry Direction 1 non-null object
49 Ferry Terminal Name 2 non-null object
50 Latitude 297158 non-null float64
51 Longitude 297158 non-null float64
52 Location 297158 non-null object
dtypes: float64(10), int64(1), object(42)
memory usage: 121.6+ MB
Out[9]: 0
In [10]: nyc_dataset.isna().sum()
Out[10]: unique_key 0
created_date 0
closed_date 2164
agency 0
agency_name 0
complaint_type 0
descriptor 5914
location_type 131
incident_zip 2615
incident_address 44410
street_name 44410
cross_street_1 49279
cross_street_2 49779
address_type 2815
city 2614
status 0
due_date 3
resolution_description 0
resolution_action_updated_date 2187
community_board 0
borough 0
x_coordinate_(state_plane) 3540
y_coordinate_(state_plane) 3540
park_borough 0
latitude 3540
longitude 3540
location 3540
dtype: int64
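The missing-value summary above uses lowercase, underscore-separated column names and fewer columns than the original `df.info()` listing, but the cells that produced this are not shown. One way it could be done is sketched below, assuming the names were normalized with string methods and that entirely empty columns were dropped. The toy frame `raw` is illustrative only, and the actual rule used to drop columns is unknown.

```python
import numpy as np
import pandas as pd

# Toy frame with original-style column names (values are made up)
raw = pd.DataFrame({
    "Unique Key": [1, 2, 3],
    "Complaint Type": ["Blocked Driveway", "Illegal Parking", "Blocked Driveway"],
    "X Coordinate (State Plane)": [1.0, np.nan, 3.0],
    "Vehicle Type": [np.nan, np.nan, np.nan],  # entirely empty column
})

cleaned = raw.copy()
# Lowercase the names and replace spaces with underscores
cleaned.columns = (
    cleaned.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)
# Drop columns in which every value is missing
cleaned = cleaned.dropna(axis=1, how="all")

print(list(cleaned.columns))
print(cleaned.isna().sum())
```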
Task 2
- Read or convert the columns ‘Created Date’ and ‘Closed Date’ to datetime datatype
- Create a new column ‘Request_Closing_Time’ as the time elapsed between request creation and request closing.
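A minimal sketch of the two steps above on a toy frame. The date format string is an assumption about how the raw timestamps are written, and the minutes column is named `request_closing_time_mins` to match the name used in the later grouping cell.

```python
import pandas as pd

# Toy rows (made up); the second request is still open, so it has no closing date
nyc = pd.DataFrame({
    "created_date": ["12/31/2015 11:59:45 PM", "12/31/2015 11:59:44 PM"],
    "closed_date": ["01/01/2016 12:55:00 AM", None],
})

# Assumed timestamp format: month/day/year with a 12-hour clock
fmt = "%m/%d/%Y %I:%M:%S %p"
for column in ["created_date", "closed_date"]:
    nyc[column] = pd.to_datetime(nyc[column], format=fmt)

# Elapsed time as a Timedelta, then expressed in minutes
nyc["request_closing_time"] = nyc["closed_date"] - nyc["created_date"]
nyc["request_closing_time_mins"] = nyc["request_closing_time"].dt.total_seconds() / 60

print(nyc.dtypes)
print(nyc[["request_closing_time", "request_closing_time_mins"]])
```

Requests without a closing date get a missing (`NaT` / `NaN`) closing time, and pandas skips them when computing means.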
Out[13]: (DataFrame preview; visible columns: unique_key, created_date, closed_date, agency, agency_name, complaint_type, descriptor, location_type, incident_zip, incident_address, ...)
In [15]: nyc.head()
Out[15]: (first five rows of nyc; visible columns: unique_key, created_date, closed_date, agency, agency_name, complaint_type, descriptor, location_type, incident_zip, incident_address, ...)
Task 3
- Visualization
- At least 4 main conclusions
Conclusion 1
- The largest number of complaint requests is for Blocked Driveway, followed by Illegal Parking.
Conclusion 2
- The largest number of complaint requests comes from the city of Brooklyn.
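The cells behind Conclusions 1 and 2 are not shown. Counts of this kind can be obtained with `value_counts`, as in the sketch below; the toy data is made up and only the column names `complaint_type` and `city` come from the notebook.

```python
import pandas as pd

# Toy data (made up) with the notebook's column names
nyc = pd.DataFrame({
    "complaint_type": ["Blocked Driveway", "Illegal Parking", "Blocked Driveway",
                       "Noise - Commercial", "Blocked Driveway", "Illegal Parking"],
    "city": ["BROOKLYN", "BROOKLYN", "BRONX", "BROOKLYN", "ASTORIA", "BRONX"],
})

# value_counts sorts from the most to the least frequent value
complaint_counts = nyc["complaint_type"].value_counts()
city_counts = nyc["city"].value_counts()

print(complaint_counts)
print(city_counts)
```

Calling `.plot(kind='bar')` on either result would give the corresponding bar chart.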
Conclusion 3
- Manhattan Borough has the minimum average complaint response time and Bronx Borough has the maximum average complaint response time.
Conclusion 4
- Arverne has the minimum complaint response time and Floral Park has the maximum complaint response time.
Conclusion 5
- Posting Advertisement complaints receive the fastest responses and Derelict Vehicle complaints the slowest.
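Conclusions 3 to 5 compare mean response times across groups. A minimal sketch of that kind of comparison, assuming a `borough` column and the `request_closing_time_mins` column; the numbers are made up and do not reproduce the notebook's results.

```python
import pandas as pd

# Toy data (made up) with the notebook's column names
nyc = pd.DataFrame({
    "borough": ["MANHATTAN", "MANHATTAN", "BRONX", "BRONX", "QUEENS"],
    "request_closing_time_mins": [100.0, 140.0, 300.0, 340.0, 200.0],
})

# Mean response time per borough, sorted from fastest to slowest
mean_by_borough = (
    nyc.groupby("borough")["request_closing_time_mins"].mean().sort_values()
)
fastest = mean_by_borough.idxmin()
slowest = mean_by_borough.idxmax()

print(mean_by_borough)
print("fastest:", fastest, "slowest:", slowest)
```

Replacing `"borough"` with `"city"` or `"complaint_type"` gives the per-city and per-complaint-type comparisons.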
In [21]: # Visualizing cities by number of complaint requests received, broken down by complaint type
city_complaint_types = pd.crosstab(index=nyc['city'],columns=nyc['complaint_type'])
txt={'weight':'bold'}
# DataFrame.plot creates its own figure, so the size is set here and no separate plt.figure call is needed
city_complaint_types.plot(kind='barh',figsize=(15,25),stacked=True)
plt.title("City total complaint request counts with complaint types",fontdict=txt)
plt.xlabel("Total no. of complaint requests",fontdict=txt,labelpad=20)
plt.ylabel("City",fontdict=txt,labelpad=30)
plt.show()
Conclusion 6
- Brooklyn has the widest range of complaint types and also the largest number of complaint requests of any city.
Task 4
- Ordering the complaint types based on average response time for different locations
In [22]: # Grouping complaints by city and finding the mean response time for each complaint type
city_complaintype_group = nyc.groupby(['city','complaint_type'])['request_closing_time_mins'].mean().unstack(level=1)
# Transposing so that each column holds one city
city_complaintype_group = city_complaintype_group.T
col = city_complaintype_group.columns
# Sorting the mean response times for each city and binding the result to a
# variable named after the city (e.g. brooklyn); assumes city values are valid identifiers
for i in col:
    globals()[i] = city_complaintype_group[i].sort_values()
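As an illustration of the grouping step, the sketch below runs the same `groupby` / `unstack` pattern on made-up data and stores the sorted results in a dictionary keyed by city. The dictionary is an alternative design rather than what the notebook does: it avoids creating one variable per city and works for city names that are not valid Python identifiers.

```python
import pandas as pd

# Toy data (made up) with the notebook's column names
nyc = pd.DataFrame({
    "city": ["BROOKLYN", "BROOKLYN", "BROOKLYN", "ASTORIA", "ASTORIA"],
    "complaint_type": ["Blocked Driveway", "Illegal Parking", "Blocked Driveway",
                       "Illegal Parking", "Noise - Commercial"],
    "request_closing_time_mins": [300.0, 150.0, 100.0, 50.0, 150.0],
})

# Rows are cities, columns are complaint types, cells are mean minutes
mean_times = (
    nyc.groupby(["city", "complaint_type"])["request_closing_time_mins"]
    .mean()
    .unstack(level=1)
)

# One sorted Series per city; complaint types a city never received are dropped
sorted_by_city = {
    city: row.dropna().sort_values() for city, row in mean_times.iterrows()
}

print(sorted_by_city["BROOKLYN"])
```

A plot for one city is then `sorted_by_city["BROOKLYN"].plot.bar()`.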
In [23]: # Visualizing the top 6 cities with the mean response time sorted for different complaint types
plt.figure(figsize=(20,10))
plt.subplots_adjust(hspace=1.6,wspace=0.5)
plt.suptitle("Top 6 cities with more no. of complaints and Their response time",fontweight="bold",fontsize="25",y=1.1)
txt={'weight':'bold'}
# (panel title prefix, sorted mean response times) for each of the six cities
top_cities = [('Brooklyn',brooklyn),('New York',new_york),('Bronx',bronx),
              ('Staten Island',staten_island),('Jamaica',jamaica),('Astoria',astoria)]
# One panel per city on a 2 x 3 grid, all sharing the same y-axis range
for position,(name,series) in enumerate(top_cities,start=1):
    plt.subplot(2,3,position)
    plt.title(name+' average complaint response time',fontdict=txt,y=1.1)
    series.dropna().plot.bar()
    plt.xlabel('complaint type',fontdict=txt,labelpad=20)
    plt.ylabel('Average response time (mins)',fontdict=txt,labelpad=30)
    plt.ylim(0,800)
plt.show()
Task 5
Statistical Test
- Whether the average response time across complaint types is similar or not (overall)
- Are the type of complaint or service requested and location related?
F-Test
Testing at the 95% confidence level => alpha = 0.05
* Null Hypothesis : H0 : There is no significant difference in average response time across different complaint types
* Alternate Hypothesis : H1 : There is a significant difference in average response time across different complaint types
- Conclusion: H0 is rejected. There is a significant difference in average response time across different complaint types,
i.e. the average response time across different complaint types is not similar (overall).
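The test cell itself is not shown. A minimal sketch, assuming the F-test here is a one-way ANOVA run with `scipy.stats.f_oneway` on one sample of response times per complaint type. The data is made up, so the statistic and p-value say nothing about the real dataset.

```python
import pandas as pd
from scipy import stats

# Toy data (made up): response times in minutes for three complaint types
nyc = pd.DataFrame({
    "complaint_type": ["Blocked Driveway"] * 4 + ["Illegal Parking"] * 4
                      + ["Derelict Vehicle"] * 4,
    "request_closing_time_mins": [100.0, 110.0, 90.0, 105.0,
                                  200.0, 210.0, 190.0, 205.0,
                                  400.0, 420.0, 380.0, 410.0],
})

# One array of response times per complaint type, missing values removed
groups = [
    group["request_closing_time_mins"].dropna().to_numpy()
    for _, group in nyc.groupby("complaint_type")
]

# One-way ANOVA: H0 says all group means are equal
f_stat, p_value = stats.f_oneway(*groups)
alpha = 0.05
reject_null = p_value < alpha

print("F statistic:", f_stat)
print("p-value:", p_value)
print("reject H0:", reject_null)
```

H0 is rejected when the p-value is below alpha, which is the decision rule stated above.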
* Null Hypothesis : H0 : There is no significant relation between type of complaint and location
* Alternate Hypothesis : H1 : There is some significant relation between type of complaint and location
- Conclusion: H0 is rejected. There is a significant relation between type of complaint and location,
i.e. the type of complaint or service requested and the location are related.
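The source does not name the test used for the second question. A common choice for two categorical variables is a chi-square test of independence on their contingency table, sketched below with `scipy.stats.chi2_contingency`. The toy data is made up, and using `city` as the location column is an assumption.

```python
import pandas as pd
from scipy import stats

# Toy data (made up): complaint types are distributed differently in the two cities
nyc = pd.DataFrame({
    "city": ["BROOKLYN"] * 40 + ["BRONX"] * 40,
    "complaint_type": ["Blocked Driveway"] * 30 + ["Illegal Parking"] * 10
                      + ["Blocked Driveway"] * 10 + ["Illegal Parking"] * 30,
})

# Contingency table of observed counts: rows are cities, columns are complaint types
table = pd.crosstab(nyc["city"], nyc["complaint_type"])

# Chi-square test of independence: H0 says complaint type and city are unrelated
chi2, p_value, dof, expected = stats.chi2_contingency(table)
alpha = 0.05
reject_null = p_value < alpha

print(table)
print("chi-square:", chi2, "p-value:", p_value, "dof:", dof)
print("reject H0:", reject_null)
```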