Chap 2. Data presentation
Total 18
Frequency distribution for continuous variables
• Frequency distributions present data in a relatively compact
form, give a good overall picture, and contain information that
is adequate for many purposes, but there are usually some
things which can be determined only from the original data.
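Relative frequency (%) = (absolute frequency / total) × 100

Classes   Midpoint   Absolute frequency   Relative frequency (%)   Cumulative relative frequency (%)
[30-40[   35         2090                 33.7                     76.9
[40-50[   45         870                  14.0                     90.9
[50-60[   55         399                  6.4                      97.4
[60-70[   65         127                  2.0                      99.4
[70-80[   75         37                   0.6                      100.0
Total                6209                 100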
[Figure: Histogram of age. X-axis: Age (0 to 80). Y-axes: absolute frequency (0 to 2,500) and relative frequency (0% to 40.3%). Bar labels: 2255 or 36.3%; 2090 or 33.7%; 870 or 14%; 399 or 6.4%; 311 or 5%; 127 or 2%; 120 or 1.9%; 37 or 0.6%. Total: 6209 or 100%. Mean = 33.03; Std. Dev. = 12.348; N = 6209.]
Density of relative frequency
Density of relative frequencies = relative frequency (%) / class interval, i.e. 10

Classes (ci)   Midpoints (xi)   Absolute frequencies (ni)   Relative frequencies (%)   Density of relative frequencies
[20-30[        25               2255                        36.3                       3.63
[30-40[        35               2090                        33.7                       3.37
[40-50[        45               870                         14.0                       1.4
[50-60[        55               399                         6.4                        0.64
[60-70[        65               127                         2.0                        0.2
[70-80[        75               37                          0.6                        0.06
Total                           6209                        100                        10
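The relative frequencies and densities in the table can be checked with a short calculation. The following is a minimal sketch in Python; the function name `relative_and_density` is hypothetical, and only the six classes listed in the table are used, with the stated total of 6209 passed in separately.

```python
def relative_and_density(counts, total, width):
    """Return (relative frequency in %, density of relative frequency) per class."""
    rows = []
    for n_i in counts:
        rel = 100 * n_i / total   # relative frequency, in percent
        dens = rel / width        # density = relative frequency / class interval
        rows.append((round(rel, 1), round(dens, 2)))
    return rows


classes = ["[20-30[", "[30-40[", "[40-50[", "[50-60[", "[60-70[", "[70-80["]
counts = [2255, 2090, 870, 399, 127, 37]   # absolute frequencies from the table
table = relative_and_density(counts, total=6209, width=10)

for label, n_i, (rel, dens) in zip(classes, counts, table):
    print(f"{label:8} {n_i:5d} {rel:5.1f} {dens:5.2f}")
```

Because all classes here share the same interval (10), the density is simply the relative frequency divided by 10.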
N.B.
When classes have the same intervals, we can use
• Absolute frequencies
• Relative frequencies
• Density of frequency
3.2. Graphs
• Frequency distributions can often be displayed effectively
using graphs or diagrams
• Diagrams give a very clear picture of data
• The relationship between numbers of various magnitudes
can usually be seen more quickly and easily from a graph
than from a table.
• They have greater attraction and facilitate comparison.
• But they should not be used when comparison is either not
possible or not necessary.
• Diagrammatic representation is not an alternative to
tabulation.
• It can give only an approximate idea, so diagrams are not
suitable where greater accuracy is needed.
Histogram
• For quantitative continuous data.
• Put the observations in ascending order
• Take a number of classes close to √Ntot (the square root of the total number of observations)
• Define classes [1-2] [3-4] or [1-2[ [2-3[...
• Calculate the frequency (absolute, relative, cumulative) or
the frequency density for each class
• Draw a rectangle for each class.
• The base of the rectangle = the interval of the class
• The height of each bar gives the frequency in each interval.
• The area of the rectangle is proportional (not necessarily
equal) to the number of observations of that class
• The total area equals 100% of all observations
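The counting step behind a histogram can be sketched in a few lines of Python. This is an illustration only: the function name `bin_counts` and the class limits (width 4) are assumptions chosen for the example, and the data are the 20 marks out of 20 used elsewhere in this chapter. Classes are half-open, [a-b[, so a value equal to an upper limit falls in the next class.

```python
def bin_counts(values, edges):
    """Count observations in half-open classes [edges[k], edges[k+1][."""
    counts = [0] * (len(edges) - 1)
    for v in sorted(values):                  # observations in ascending order
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:  # lower limit included, upper excluded
                counts[k] += 1
                break
    return counts


marks = [15, 7, 12, 10, 8, 11, 14, 10, 11, 11,
         15, 6, 9, 8, 14, 16, 13, 11, 12, 10]
edges = [4, 8, 12, 16, 20]                    # hypothetical class limits, width 4
counts = bin_counts(marks, edges)

# Text histogram: one bar of '#' per class, bar length = absolute frequency
for k, n_k in enumerate(counts):
    print(f"[{edges[k]}-{edges[k + 1]}[ {'#' * n_k} {n_k}")
```

With equal class widths, the bar height (frequency) and the bar area are proportional, which is why either can be read off the plot.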
[Figure: Histogram of a continuous variable. Y-axes: frequency (0 to 2,500) and density of relative frequencies (0 to 3.5).]
9 15 15 7 11 12 14 10 11 8
8 11 11 14 8 10 11 11 10 11
7 15 12 6 14 9 15 8 8 14
15 10 11 13 11 11 15 12 15 10
11 9 8 13 9 8 13 14 15 15
10 10 7 15 15 7 14 9 3 10
15 10 15 8 15 8 14 9 6 13
12 11 9 9 13 14 8 13 8 5
Make a table of 10 classes of equal width (0-2; 2-4; 4-6; … 18-20) giving the absolute,
relative and cumulative frequencies and the density of relative frequencies.
Exercise n° 3
Classes   Absolute freq   Relative freq (%)   Cumulative freq (%)   Density of rel freq
[Figure: Bar chart of marital status. X-axis: Marital status (Single, Married, Divorced, Widowed); Y-axis ticks from 0 to 30.]
C. Component (or sub-divided) Bar Diagram
• Bars are sub-divided into component parts of
the figure.
• These sorts of diagrams are constructed when
each total is built up from two or more
component figures.
Component bar diagram
4. Pie-chart
• For displaying the relative frequency distribution of
qualitative or quantitative discrete data
• It is a circle divided into sectors so that the areas of the
sectors are proportional to the frequencies.
3.3. Summarizing data
a) The Mean
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Example
Example 1: The systolic blood pressures of seven
patients were as follows:
151, 124, 132, 170, 146, 124 and 113.
The mean is
$$\bar{x} = \frac{151 + 124 + 132 + 170 + 146 + 124 + 113}{7} = \frac{960}{7} = 137.14$$
$$\bar{x} = \frac{\sum x}{n}$$
where $\sum x$ is the sum of the observations and $n$ is the number of observations.
Example 2. Marks out of 20 for 20 students:
15 7 12 10 8
11 14 10 11 11
15 6 9 8 14
16 13 11 12 10
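As a quick check of the mean formula on both examples, here is a minimal Python sketch. The helper name `mean` is chosen for illustration; it implements the sum of the observations divided by their number.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


pressures = [151, 124, 132, 170, 146, 124, 113]   # Example 1
marks = [15, 7, 12, 10, 8, 11, 14, 10, 11, 11,
         15, 6, 9, 8, 14, 16, 13, 11, 12, 10]     # Example 2

print(round(mean(pressures), 2))   # 960 / 7
print(round(mean(marks), 2))       # 223 / 20
```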
2. Median:
Position: 1 2 3 4 5
Value:    7 8 10 12 15
n = 5
Position of the median = 3
Value of the median = 10
N.B.: Median = 50th percentile = P50
Example 1 – n is odd
The reordered systolic blood pressure data seen earlier are:
113, 124, 124, 132, 146, 151, 170.
The median is the middle (4th) value, 132.
Example 2 – n is even
Six men with high cholesterol participated in a study to
investigate the effects of diet on cholesterol level. At the
beginning of the study, their cholesterol levels (mg/dL) were as
follows:
366, 327, 274, 292, 274 and 230.
Rearrange the data in numerical order as follows:
230, 274, 274, 292, 327, 366.
The median is halfway between the middle two readings, i.e.
(274+292) ÷ 2 = 283.
Two men have the same cholesterol level- the Mode is 274.
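The odd and even cases can be handled by one small function. The sketch below is illustrative (the name `median` is chosen here; Python's standard library also provides `statistics.median`, which follows the same rule).

```python
def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


pressures = [151, 124, 132, 170, 146, 124, 113]   # n = 7 (odd)
cholesterol = [366, 327, 274, 292, 274, 230]      # n = 6 (even)

print(median(pressures))
print(median(cholesterol))
```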
1. Mode: the value or class with the highest frequency
in the sample / population
Marks out of 20 of 20 students (MCQ):
15 7 12 10 8
11 14 10 11 11
15 6 9 8 14
16 13 11 12 10
Mode = 11
For a continuous variable: modal class
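The mode of the marks can be verified by counting how often each value occurs. This sketch uses `collections.Counter` and `statistics.multimode` from the Python standard library (Python 3.8 or later is assumed for `multimode`); `multimode` returns a list because a sample may have more than one mode.

```python
from collections import Counter
from statistics import multimode

marks = [15, 7, 12, 10, 8, 11, 14, 10, 11, 11,
         15, 6, 9, 8, 14, 16, 13, 11, 12, 10]

frequencies = Counter(marks)      # value -> absolute frequency
modes = multimode(marks)          # all values with the highest frequency

print(frequencies.most_common(3))
print(modes)
```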
Examples of unimodal distributions (one mode)
[Figure: Symmetric and unimodal distribution; the mean and the median coincide.]
[Figure: Unimodal distribution with negative skewness; the mean lies to the left of the median.]
[Figure: Unimodal distribution with positive skewness; the median lies to the left of the mean.]
Skewness
• If extremely low or extremely high
observations are present in a distribution,
then the mean tends to shift towards those
scores.
• Based on the type of skewness, distributions
can be:
a) Negatively skewed distribution: occurs when the majority
of scores are at the right end of the curve and a few
small scores are scattered at the left end.
b) Positively skewed distribution: occurs when the
majority of scores are at the left end of the curve and a
few extreme large scores are scattered at the right
end.
c) Symmetrical distribution: it is neither positively nor
negatively skewed. A curve is symmetrical if one half of
the curve is the mirror image of the other half.
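Since the mean shifts towards extreme scores while the median does not, comparing the two gives a rough indication of the direction of skewness. The following Python sketch is only a heuristic under that assumption, not a formal skewness coefficient; the function name `skew_direction` and the second data set are hypothetical.

```python
from statistics import mean, median


def skew_direction(values):
    """Rough indication of skewness from the position of the mean vs the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"      # mean pulled towards large values on the right
    if m < med:
        return "negative"      # mean pulled towards small values on the left
    return "none indicated"    # mean equals median


pressures = [151, 124, 132, 170, 146, 124, 113]   # mean 137.14, median 132
low_tail = [1, 8, 9, 9, 10]                       # hypothetical data, mean 7.4, median 9

print(skew_direction(pressures))
print(skew_direction(low_tail))
```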
c) Geometric mean
1. Range:
The difference between the maximum and the
minimum value in the data set
Range = Max – Min
Eg. data: -4 -3 -1 1 3 5
Range = 5 – (-4) = 9
• easy to calculate
• useful for “best” or “worst” case scenarios
• sensitive to extreme values
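2. Variance: the mean of squared deviations from
the mean. It is always positive (or zero when all values are equal).
Population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $N$ is the population size.
Sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$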
Eg. data: 4 3 1 2 3 5
Mean: 18/6 = 3
Squared deviations from the mean: 1 0 4 1 0 4
Sum of squared deviations from the mean: 10
Variance: s² = 10/(6 − 1) = 10/5 = 2
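The worked example can be reproduced step by step in Python. This is a minimal sketch; the function name `variance` and its `sample` flag are chosen here for illustration (the standard library offers `statistics.variance` and `statistics.pvariance` for the same two formulas).

```python
def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = [(x - m) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_deviations) / divisor


data = [4, 3, 1, 2, 3, 5]

print(variance(data))                 # sample variance, divisor n - 1 = 5
print(variance(data, sample=False))   # population variance, divisor N = 6
```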
3. Standard Deviation
• The sample standard deviation, s, is the square root
of the variance
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Example
Data          Deviation     Deviation²
151           13.86         192.02
124           -13.14        172.73
132           -5.14         26.45
170           32.86         1079.59
146           8.86          78.45
124           -13.14        172.73
113           -24.14        582.88
Sum = 960.0   Sum = 0.00    Sum = 2304.86
$\bar{x} = 137.14$
Example (contd.)
$$\sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86$$
Therefore,
$$s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6$$
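The same calculation for the blood pressure data, written out as a short Python sketch. The function name `sample_sd` is hypothetical; `statistics.stdev` from the standard library computes the same quantity.

```python
import math


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(sum_sq / (n - 1))


pressures = [151, 124, 132, 170, 146, 124, 113]
s = sample_sd(pressures)

print(round(s, 1))
```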
4. Coefficient of Variation
• In some cases the variance of a variable changes with its mean
• The coefficient of variation (CV) or relative standard deviation (RSD) is a measure of
relative variability.
• It is the ratio of data dispersion (standard deviation) to the mean and shows the extent of
variability in relation to the mean
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$
• The CV is not affected by multiplicative changes in scale
• Consequently, it is a useful way of comparing the dispersion of variables
independently of the unit in which the measurements were taken
• Generally small values of CV are considered best, since that means that the
variability in measurements is small relative to their mean (measurements are
consistent in their magnitudes).
• i.e. the higher the CV, the greater the dispersion
Example
The CV of the blood pressure data is:
$$CV = 100 \times \left(\frac{19.6}{137.1}\right)\% = 14.3\%$$
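A short Python sketch of the CV calculation, which also illustrates that a multiplicative change of scale leaves the CV unchanged. The function name `coefficient_of_variation` and the rescaling factor 10 are chosen for illustration only.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return 100 * stdev(values) / mean(values)


pressures = [151, 124, 132, 170, 146, 124, 113]
cv = coefficient_of_variation(pressures)

# Multiplying every value by the same factor (hypothetical change of unit)
rescaled = [10 * x for x in pressures]
cv_rescaled = coefficient_of_variation(rescaled)

print(round(cv, 1))
print(round(cv_rescaled, 1))
```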
5. Inter-quartile range
• The median divides a distribution into two halves.
• The first and third quartiles (denoted Q1 and Q3) are defined
as follows:
– 25% of the data lie below Q1 (and 75% lie above Q1),
– 25% of the data lie above Q3 (and 75% lie below Q3)
Q1 Q2 Q3
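The inter-quartile range is the distance between the first and third quartiles, Q3 − Q1. The sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later) with its default "exclusive" method. This is an assumption: several conventions for computing quartiles exist, so other software may return slightly different values for small samples.

```python
from statistics import quantiles

pressures = [151, 124, 132, 170, 146, 124, 113]

# n=4 splits the data into four equal parts and returns the three cut points
q1, q2, q3 = quantiles(pressures, n=4)
iqr = q3 - q1

print(q1, q2, q3)
print(iqr)
```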
Exercise
In one class, the marks (out of 20) obtained in biostatistics by a
sample of students are as follows:
Mean 14
Mode 14
Median 13.5
Variance 14.67
Std dev 3.83
Range 11
2.3.2. Box-plots
• A box-plot is a visual description of the
distribution based on
– Minimum
– Q1
– Median
– Q3
– Maximum
• Useful for comparing large sets of data
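The five values on which a box-plot is based can be computed as follows. This is a sketch under stated assumptions: the function name `five_number_summary` is hypothetical, quartiles come from `statistics.quantiles` with its default "exclusive" method (other programs may use a different quartile convention), and the data are the six cholesterol levels from the median example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


cholesterol = [366, 327, 274, 292, 274, 230]
summary = five_number_summary(cholesterol)

for name, value in zip(["Minimum", "Q1", "Median", "Q3", "Maximum"], summary):
    print(f"{name:8} {value}")
```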
Building a box plot
1. Calculate important values
Example 1: Box-plot
Remarks
• The box is always limited by Q1 and Q3
• But the whiskers can represent several things according to
different authors/programs:
– the minimum and the maximum
– the low and high subsequent values
– a standard deviation above and below the mean
http://en.wikipedia.org/wiki/Box_plot
QUIZ 1/ 2 marks