CHAPTER-2 Data Visualization
DATA VISUALIZATION
➢ Data visualization is the presentation of data in graphical format. It helps people understand the significance of
data by summarizing a large amount of it in a simple, easy-to-understand form, and it helps
communicate information clearly and effectively.
➢ Plotting using Matplotlib
⚫ The Matplotlib Python library, developed by John Hunter and many other contributors, is used to create
high-quality graphs, charts and figures.
⚫ Matplotlib produces publication-quality figures in a variety of hardcopy formats and interactive environments
across platforms. It can be used in Python scripts, the Python and IPython shells, web application servers and
various graphical user interface toolkits.
⚫ To install matplotlib on operating systems such as Windows, Linux or macOS, use the following
command at the command prompt :
python -m pip install -U matplotlib
⚫ Importing matplotlib
from matplotlib import pyplot as plt
OR
import matplotlib.pyplot as plt
⚫ Plotting in Pandas is also built on matplotlib, so this brief introduction applies there as well. The matplotlib
API is imported using the standard convention shown above (pyplot as plt).
⚫ pyplot is a module in the matplotlib package. This module provides an interface that allows you to implicitly
and automatically create figures and axes to achieve the desired plot.
e.g.,
from matplotlib import pyplot as plt
plt.plot([4, 6, 2], [2, 3, 6])
plt.show()
Output
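The implicit behaviour of pyplot described above can be checked with a small sketch. It assumes a non-interactive backend ("Agg") so that it runs without a display; that line is an addition for illustration and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
from matplotlib import pyplot as plt

plt.close("all")                        # start with no open figures
lines = plt.plot([4, 6, 2], [2, 3, 6])  # pyplot creates a figure and axes for us

fig = plt.gcf()  # "get current figure": the figure pyplot created implicitly
ax = plt.gca()   # "get current axes": the axes pyplot created implicitly

print(len(fig.axes))  # number of axes inside the figure
print(len(ax.lines))  # number of lines drawn on those axes
```

Here plt.gcf() and plt.gca() only retrieve the objects that plt.plot() already created; no figure or axes was requested explicitly.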
➢ Line Plot
For all matplotlib plots, we start by creating a figure and an axes.
The figure (an instance of the class plt.Figure) can be thought of as a single container that holds all the objects
representing axes, graphics, text and labels.
The axes (an instance of the class plt.Axes) is a bounding box with ticks and labels, which will eventually contain
the plot elements that make up our visualization. If we want to create a single figure with multiple lines, we can
simply call the plot function multiple times :
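e.g.,
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()              # the figure: container for everything
ax = plt.axes()                 # the axes: box that holds the plot
x = np.linspace(0, 10, 1000)    # 1000 evenly spaced points from 0 to 10
ax.plot(x, np.sin(x))           # first line: sine
ax.plot(x, np.cos(x))           # second line on the same axes: cosine
plt.show()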
Output: sine and cosine curves on one axes; x-axis from 0 to 10, y-axis from –1.00 to 1.00.
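As a variation on the example above, the figure and the axes can be created together with plt.subplots(), and each line can be given a label so that a legend tells the curves apart. This is a minimal sketch; the "Agg" backend line and the label strings are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()          # one call returns both the figure and the axes
x = np.linspace(0, 10, 1000)

# Each call to ax.plot() adds one more line to the same axes
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()                       # uses the labels given above

print(len(ax.lines))              # number of lines on the axes
```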
➢ Plotting Bar Graph
Categorical data can be represented as rectangular blocks whose heights or lengths are proportional to the
values. This type of representation is called a bar chart. A bar chart can be plotted vertically or horizontally.
A bar graph uses bars to compare data among different categories.
e.g.,
from matplotlib import pyplot as plt
# Bars are 0.5 wide and their centres are 0.5 apart, so each pair sits side by side
plt.bar([0.25, 1.25, 2.25, 3.25, 4.25], [70, 50, 60, 75, 30],
        label="Kiyaan", width=0.5)
plt.bar([0.75, 1.75, 2.75, 3.75, 4.75], [75, 60, 50, 80, 93],
        label="Shreya", color='r', width=0.5)
plt.legend()
plt.xlabel('Month')
plt.ylabel('Salary (Thousand)')
plt.title('Details')
plt.show()
Output: bar chart titled "Details"; x-axis: Months, y-axis: Salary (Thousand).
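The text notes that a bar chart can also be drawn horizontally. A minimal sketch of that, using barh(): the category names "M1" to "M5" are hypothetical placeholders, the values reuse the first series from the example above, and the "Agg" backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
from matplotlib import pyplot as plt

months = ["M1", "M2", "M3", "M4", "M5"]  # hypothetical category names
salary = [70, 50, 60, 75, 30]            # values from the first series above

fig, ax = plt.subplots()
bars = ax.barh(months, salary)           # barh: bar length runs along the x-axis
ax.set_xlabel("Salary (Thousand)")
ax.set_ylabel("Month")
ax.set_title("Details")

print(len(bars))                         # one rectangle per category
```

With barh() the categories go on the y-axis and the values on the x-axis, so the axis labels swap places compared with bar().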
➢ Plotting Histogram
A histogram represents the distribution of numerical data. The range of values is divided into intervals (bins), and a
rectangle is drawn over each bin with a height equal to the number of values that fall in it.
Histograms are used to show a distribution. A probability distribution can be estimated using a histogram plot.
import matplotlib.pyplot as plt
days = [50, 80, 70, 80, 40, 20, 20, 20, 70, 20, 60, 20, 80, 50, 40, 50, 20, 60, 60, 60]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # evenly spaced bin edges
# rwidth only takes effect with histtype='bar', so 'bar' is used here
plt.hist(days, bins, histtype='bar', rwidth=0.88)
plt.xlabel('Distance in kms')
plt.ylabel('kilometer count')
plt.title('bike details Histogram')
plt.show()
Output
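The remark that a histogram can estimate a probability distribution can be illustrated with the density option of hist(). This is a sketch under assumptions: it reuses the sample values from the example above under the hypothetical name distances, uses evenly spaced bins of width 10, and sets the "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

distances = [50, 80, 70, 80, 40, 20, 20, 20, 70, 20,
             60, 20, 80, 50, 40, 50, 20, 60, 60, 60]
bins = list(range(0, 101, 10))   # edges 0, 10, ..., 100

fig, ax = plt.subplots()
# density=True rescales the bar heights so the total bar area equals 1
density, edges, _ = ax.hist(distances, bins=bins, density=True)

# Area of each bar = height * bin width; summed over all bars
area = float(np.sum(density * np.diff(edges)))
print(area)
```

With density=True the bar heights are no longer counts; each bar's area is the fraction of values that fall in its bin.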
➢ Customizing Plots
You can customise the charts or graphs with proper details. The graph or plot should have a proper title,
labels, legends etc.
⚫ Adding a title : To add a title to a chart or graph, the title() function is used.
Syntax
<matplotlib.pyplot>.title(title_string)
⚫ Adding Labels : To set the labels for the X-axis and Y-axis, xlabel() and ylabel() are used respectively.
⚫ Adding Legends : When we plot multiple data series on a single plot, legends are needed to tell
them apart. To add a legend to the plot, the legend() function is used.
Syntax
<matplotlib.pyplot>.legend(loc = <string or position no>)
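The customization functions listed above can be combined as in the following sketch. The data values, label strings and title are hypothetical, and the "Agg" backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 6], label="series A")  # hypothetical data
ax.plot([1, 2, 3], [1, 3, 5], label="series B")  # hypothetical data

plt.title("Sample plot")               # title of the current axes
plt.xlabel("x values")                 # label for the X-axis
plt.ylabel("y values")                 # label for the Y-axis
legend = plt.legend(loc="upper left")  # loc as a string; the number 2 means the same place

print(ax.get_title())
```

The loc argument accepts either a location string such as "upper left" or the matching position number.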