Lecture 2

The document provides an overview of organizing and graphing data in probability and statistics, focusing on both qualitative and quantitative data. It discusses various methods of data representation, including frequency distributions, bar graphs, pie charts, histograms, and box-and-whisker plots. Additionally, it explains concepts such as relative frequency, percentage distributions, and frequency density, along with examples for clarity.

Uploaded by

Cynosure Wolf
Copyright
© All Rights Reserved

Probability and Statistics

Lecture # 2

Organizing and Graphing Data


Raw Data
 Raw data are data recorded in the sequence in which they are collected, before they
are processed or ranked.
 E.g., suppose we collect information on the ages of 50 students selected from a
university.
From Raw to Organized – Qualitative Data
 Several ways of representing qualitative data in a table:

• Frequency Distribution

• Relative Frequency Distribution

• Percentage Distribution
Frequency Distribution
Frequency Distribution: A frequency distribution for qualitative data lists
all categories and the number of elements that belong to each of the
categories.
Relative Frequency
 Relative Frequency (R.F.)
• For a category, R.F. = (Frequency of that category) / (Sum of all frequencies)
• Percentage = (Relative Frequency) x 100
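The two formulas above can be sketched in a few lines of Python. This is only an illustration: the list of responses and the function name qualitative_distribution are assumptions made for the example, not data from the lecture.

```python
from collections import Counter

# Hypothetical categorical responses (illustrative only, not from the lecture).
responses = ["A", "B", "A", "O", "A", "B", "O", "O", "A", "AB"]

def qualitative_distribution(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())  # sum of all frequencies
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}

table = qualitative_distribution(responses)
for cat, (f, rf, pct) in sorted(table.items()):
    print(f"{cat:<3} {f:>3} {rf:.2f} {pct:.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any table built this way.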
Graphing Qualitative Data

 There are several ways qualitative data can be represented:

• Bar Graphs: A bar graph gives an impression of the distribution of a discrete
or categorical data set. It is a graph made of bars whose heights represent the
frequencies of the respective categories. To construct a bar graph, place the
categories (or values of the variable) along the x-axis and draw a bar over each
category with height equal to its frequency.

• Pie Charts: A circle divided into portions that represent the relative frequencies or
percentages of a population or a sample belonging to different categories.
Bar Graph
Pie Chart
Graphing Qualitative Data
 MULTIPLE BAR DIAGRAM:
It is an extension of the simple bar diagram and is used to represent two or more
related sets of data in the form of groups of simple bars. Its main purpose is to
compare the same characteristic across the related sets.
 Example of Multiple Bar Diagram:
The following data give the production of wheat in different localities of the Punjab
for the years 1987 to 1989.
Graphing Qualitative Data
 SUB-DIVIDED BAR DIAGRAM:
When each bar of a simple bar diagram represents a total, the bar can be divided
further into segments that show the components of that total.

• Example:
There were 500 people of blood group A (kind 1), 300 of blood group B (kind 2) and
400 of blood group O (kind 3). After classification, it was observed that there were
200 females in kind 1, 100 females in kind 2 and 200 females in kind 3.
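The segment heights for this example can be worked out as in the sketch below. The totals and female counts are the ones stated above; treating everyone else in each group as a single "other" segment is an assumption of this sketch, since the lecture only gives the female counts.

```python
# Totals and female counts from the blood group example.
totals = {"A": 500, "B": 300, "O": 400}
females = {"A": 200, "B": 100, "O": 200}

# Each bar has height equal to the total and is split into two segments.
# Assumption: the non-female remainder forms one "other" segment.
segments = {
    group: {"female": females[group], "other": totals[group] - females[group]}
    for group in totals
}

for group, parts in segments.items():
    print(group, parts, "bar height =", sum(parts.values()))
```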
Organizing and graphing quantitative data
 Frequency distributions:
 A frequency distribution for quantitative data lists all the classes and the number of
values that belong to each class.
 Data presented in the form of a frequency distribution are called grouped data.
CONT…

 Class Limits
• Specify the span of data values that fall within a class.

 Class boundary
• The class boundary is given by the midpoint of the upper limit of one class and the lower
limit of the next class.

 Class width
• Class width = Upper boundary - Lower boundary

 Calculating Class Midpoint or Mark


• Class midpoint or mark = (Lower limit + upper limit)/2
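The definitions above can be checked with a short sketch. The class limits 10-19, 20-29, 30-39 and the helper name class_summary are hypothetical, chosen only for illustration; the sketch also assumes the gap between consecutive classes is the same throughout.

```python
def class_summary(limits):
    """limits: list of (lower limit, upper limit) for consecutive classes."""
    # Gap between one class's upper limit and the next class's lower limit
    # (assumed to be the same between every pair of classes).
    gap = limits[1][0] - limits[0][1]
    rows = []
    for lower, upper in limits:
        lower_b = lower - gap / 2   # boundary = midpoint of adjacent limits
        upper_b = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_b, upper_b),
            "width": upper_b - lower_b,       # upper boundary - lower boundary
            "midpoint": (lower + upper) / 2,  # (lower limit + upper limit) / 2
        })
    return rows

rows = class_summary([(10, 19), (20, 29), (30, 39)])
for row in rows:
    print(row)
```

For the class 10-19 this gives boundaries 9.5 and 19.5, a width of 10 and a midpoint of 14.5.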
Example
 The class midpoints for the frequency distribution of Table 2.7 are listed in the fourth
column of Table 2.8.
Relative Frequency and Percentage Distributions
 Relative frequency
• R.F. of a class = (Frequency of that class) / (Sum of all frequencies)
 Percentage = (Relative frequency) x 100
EXAMPLE OF QUANTITATIVE DATA
 The following data give the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2004 season. Construct a frequency distribution
table.
135, 178, 169, 222, 235, 242, 194, 184, 202, 201, 148, 187, 150, 162, 203, 135, 191, 151,
185, 242, 189, 215, 142, 214, 139, 183, 136, 145, 227, 145.
Number of classes = 1 + 3.3 log(n), where n is the number of observations and log is the base-10 logarithm
Class interval (width) = (Maximum value - Minimum value) / Number of classes
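One possible way to apply these two rules to the home run data is sketched below. The rounding choices (number of classes rounded to the nearest integer, width rounded up, first class starting at the minimum value) are assumptions of this sketch; a hand-built table may reasonably use a different number of classes or a more convenient width.

```python
import math

home_runs = [135, 178, 169, 222, 235, 242, 194, 184, 202, 201, 148, 187,
             150, 162, 203, 135, 191, 151, 185, 242, 189, 215, 142, 214,
             139, 183, 136, 145, 227, 145]

n = len(home_runs)
k = round(1 + 3.3 * math.log10(n))  # number of classes (base-10 log)
width = math.ceil((max(home_runs) - min(home_runs)) / k)  # rounded up
start = min(home_runs)  # assumption: first class starts at the minimum

counts = [0] * k
for x in home_runs:
    counts[min((x - start) // width, k - 1)] += 1

for i, f in enumerate(counts):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {f}")
```

With these choices the sketch produces 6 classes of width 18, starting at 135.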
LESS THAN METHOD FOR WRITING CLASSES

 Construct the frequency distribution table using the less-than method to write
classes. Take 0 as the lower boundary of the first class and 6 as the width of
each class.
 Calculate the relative frequencies and percentages for all classes.
 Draw a histogram for the frequency distribution.

4.95, 27.99, 8.00, 5.80, 4.50, 2.99, 4.85, 6.00, 9.00, 15.75,
9.50, 3.05, 5.65, 21.00, 16.60, 18.00, 21.77, 12.35, 7.75,
10.45, 3.85, 28.45, 8.35, 17.70, 19.50, 11.65, 11.45, 3.00,
6.55, 16.50.
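A minimal sketch of this exercise follows. The lower boundary 0 and width 6 come from the exercise; the variable names and the text-based histogram (one asterisk per observation instead of a drawn chart) are assumptions made to keep the example self-contained.

```python
values = [4.95, 27.99, 8.00, 5.80, 4.50, 2.99, 4.85, 6.00, 9.00, 15.75,
          9.50, 3.05, 5.65, 21.00, 16.60, 18.00, 21.77, 12.35, 7.75,
          10.45, 3.85, 28.45, 8.35, 17.70, 19.50, 11.65, 11.45, 3.00,
          6.55, 16.50]
lower, width = 0, 6  # given in the exercise

# A value v falls in the class "lo to less than lo + width".
n_classes = int((max(values) - lower) // width) + 1
freq = [0] * n_classes
for v in values:
    freq[int((v - lower) // width)] += 1

total = sum(freq)
for i, f in enumerate(freq):
    lo = lower + i * width
    label = f"{lo} to less than {lo + width}"
    # frequency, relative frequency, percentage, text histogram bar
    print(f"{label:<20} {f:>2}  {f / total:.3f}  {100 * f / total:5.1f}%  {'*' * f}")
```

Because the classes are written as "a to less than b", a value such as 6.00 belongs to the second class, not the first.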
Single-Valued Classes
 The administration in a large city wanted to know the distribution of vehicles owned
by households in that city. A sample of 40 randomly selected households from this city
produced the following data on the number of vehicles owned.
5 1 1 2 0 1 1 2 1 1 1
3 3 0 2 5 1 2 3 4 2 1
2 2 1 2 2 1 1 1 4 2 1
1 2 1 1 4 1 3
 Construct a frequency distribution table for these data using single-valued classes.
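With single-valued classes, each distinct value is its own class, so the table is simply a count of each value. A short sketch using the 40 observations above (the variable names are illustrative):

```python
from collections import Counter

vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1, 1,
            3, 3, 0, 2, 5, 1, 2, 3, 4, 2, 1,
            2, 2, 1, 2, 2, 1, 1, 1, 4, 2, 1,
            1, 2, 1, 1, 4, 1, 3]

freq = Counter(vehicles)
total = sum(freq.values())

print("Vehicles owned  Households  Relative frequency")
for value in sorted(freq):
    print(f"{value:>14}  {freq[value]:>10}  {freq[value] / total:>18.3f}")
```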
Histograms
 A histogram is an accurate graphical representation of the distribution of numerical
data.
 It is an estimate of the probability distribution of a continuous variable (quantitative
variable) and was first introduced by Karl Pearson. It resembles a bar graph in its
shape.
Shapes of Histograms

Three common shapes of a histogram are:


• Symmetric histogram
• Left-skewed histogram
• Right-skewed histogram
Shapes of Histograms

Symmetric

Right-Skewed Left-Skewed
Difference between Histogram and bar Chart

 Histograms are used to show distributions of variables while bar charts are used
to compare variables.
 Histograms plot quantitative data with ranges of the data grouped into intervals
while bar charts plot categorical data.
 Bars can be reordered in bar charts but not in histograms.
Histogram With unequal Class Intervals

 When constructing a histogram with unequal class widths, we must ensure that the
areas of the rectangles are proportional to the class frequencies, because in a
histogram it is the area that represents the frequency.
 A histogram represents a frequency distribution by means of rectangles whose widths
represent the class intervals and whose areas are proportional to the corresponding
frequencies; the height of each rectangle is the average frequency density for the
interval.
 A histogram differs from a bar chart in that it is the area of the bar that denotes
the value, not the height. This means that we need to consider the widths in order
to determine the height of each rectangle.
Histogram With unequal Class Intervals
 The following frequency distribution gives the masses of 48 objects measured to
the nearest gram. Draw a histogram to illustrate the data.
Concept Of Frequency Density With An Example
 Suppose you have the following data about the ages of people in a group. The data is
grouped into age ranges (or classes), and the frequency represents the number of
people in each range:

 Step-by-step Calculation of Frequency Density:
1. For the age range 10–19:
• Frequency = 20
• Class Width = 10
• Frequency Density = Frequency ÷ Class Width = 20 ÷ 10 = 2
2. For the age range 20–24:
• Frequency = 15
• Class Width = 5
• Frequency Density = Frequency ÷ Class Width = 15 ÷ 5 = 3
3. For the age range 25–34:
• Frequency = 30
• Class Width = 10
• Frequency Density = Frequency ÷ Class Width = 30 ÷ 10 = 3
4. For the age range 35–44:
• Frequency = 10
• Class Width = 10
• Frequency Density = Frequency ÷ Class Width = 10 ÷ 10 = 1
Histogram:
 In a histogram representing this data:-
 The height of each bar is the frequency density
 The width of each bar represents the class width
 The area of each bar (height × width) represents the frequency of each class.

 This approach ensures that intervals of different widths are comparable, as a smaller
interval (like 20–24) will have a higher frequency density if the frequency is high
relative to the width.
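The age-group calculation can be reproduced with the sketch below. The frequencies and class widths are the ones used in the example; the tuple layout and variable names are illustrative.

```python
# (age range, frequency, class width) from the age-group example.
classes = [("10-19", 20, 10), ("20-24", 15, 5), ("25-34", 30, 10), ("35-44", 10, 10)]

# Bar height in the histogram = frequency density = frequency / class width.
densities = {label: f / w for label, f, w in classes}

for label, f, w in classes:
    height = densities[label]
    # The bar's area (height x width) gives back the class frequency.
    print(f"{label}: height = {height}, area = {height * w}")
```

Note that the classes 20–24 and 25–34 have the same bar height (3) even though their frequencies differ (15 and 30), because the second class is twice as wide.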

POLYGON

 A polygon is another device that can be used to present quantitative data in graphic
form.
 To draw a frequency polygon, we first mark a dot above the midpoint of each class at a
height equal to the frequency of that class.
 This is the same as marking the midpoint at the top of each bar in a histogram.
 Next we mark two more classes, one at each end, and mark their midpoints.
 Note that these two classes have zero frequencies.
 In the last step, we join the adjacent dots with straight lines.
 The resulting line graph is called a frequency polygon or simply a polygon.
 Frequency polygons are a graphical device for understanding the shapes of
distributions. They serve the same purpose as histograms, but are especially helpful
for comparing sets of data.
OGIVE

 An ogive is a curve drawn for the cumulative frequency distribution by joining with
straight lines the dots marked above the upper boundaries of classes at heights
equal to the cumulative frequencies of the respective classes.
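The points that an ogive joins can be computed as in the sketch below. The upper boundaries and frequencies are assumed values for illustration (five classes of width 6 starting at 0). Starting the curve at the lower boundary of the first class with cumulative frequency 0 is a common convention and an assumption of this sketch.

```python
from itertools import accumulate

# Assumed illustrative distribution: classes 0-6, 6-12, 12-18, 18-24, 24-30.
upper_boundaries = [6, 12, 18, 24, 30]
frequencies = [9, 10, 5, 4, 2]

cumulative = list(accumulate(frequencies))  # running totals

# Each dot sits above an upper boundary at the cumulative frequency.
points = [(0, 0)] + list(zip(upper_boundaries, cumulative))
print(points)
```

The last cumulative frequency equals the total number of observations.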
Stem and Leaf Displays
 A stem-and-leaf display is a device for representing quantitative data in a graphical
format, similar to a histogram, to assist in visualizing the shape of the distribution.
In a stem-and-leaf display of quantitative data, each value is divided into two
portions: a stem and a leaf. The stem-and-leaf method can also be used to arrange
data in order.

 An advantage of a stem-and-leaf display over a frequency distribution is that by
preparing a stem-and-leaf display we do not lose information on individual
observations.
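A minimal sketch of building a stem-and-leaf display is given below. The scores are hypothetical two-digit values, and splitting each value into a tens digit (stem) and a units digit (leaf) is the assumption that fits such data; other data may need a different split.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in order
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical test scores (illustrative only).
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95, 69, 72]
display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Every original value can be read back from the display (stem 7 with leaf 5 is 75), which is the advantage noted above.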
CONT…

 A few examples:
BOX-AND-WHISKER PLOT

 A box-and-whisker plot gives a graphic presentation of data using
five measures: the median, the first quartile, the third quartile, and
the smallest and the largest values in the data set between the
lower and the upper inner fences.
 A box-and-whisker plot can help us visualize the center, the spread,
and the skewness of a data set.
 It also helps to detect outliers.
 We can compare different distributions by making box-and-whisker
plots for each of them.
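The five measures can be computed as in the sketch below. Two conventions here are assumptions of the sketch: the quartiles are taken as the medians of the lower and upper halves of the sorted data, and the inner fences are placed 1.5 x IQR below the first quartile and above the third quartile. Other quartile conventions give slightly different values. The data set is hypothetical.

```python
from statistics import median

def five_measures(data, k=1.5):
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])            # median of the lower half
    q3 = median(xs[half + n % 2:])    # upper half; skips the middle value if n is odd
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "median": median(xs),
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),  # smallest/largest within fences
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }

# Hypothetical data with one unusually large value.
result = five_measures([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30])
print(result)
```

Values outside the inner fences, such as 30 here, are the ones the plot flags as outliers.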
Thank you!
