STATISTICS (Organizing Data)
ORGANIZING DATA
WHAT:
✔ Frequency Tables
- partitions data into classes (intervals) and shows how many data values are in each class. The classes are constructed so that each data value falls into exactly one class.
- the classes do not overlap and leave no gaps where data could fall between classes, so each value is in exactly one class
- the sum of all the class frequencies is the sample size
HOW:
1. Decide how many classes: 5-15 (20 max for large data)
2. Find the class width:
class width = (largest data value - smallest data value) / desired number of classes
Increase to the next whole number (integer data).
Sample solution based on the given data: (47 - 1) / 6 = 46 / 6 ≈ 7.67, which increases to 8
3. Find the lower class limits:
a. The first lower class limit is the smallest data value
b. Add the class width to the lower class limit to find the next lower class limit
c. Repeat until all lower class limits are computed
4. Find the upper class limits:
a. To find the first upper class limit, subtract 1 from the 2nd lower class limit
b. Add the class width to the upper class limit to find the next one
c. Repeat until all upper class limits are computed
5. Tally the data into classes. Each data value should fall into exactly one class
6. Total the tallies to find each class frequency
7. Compute midpoints (class mark) for each class
midpoint = (lower class limit + upper class limit) / 2
8. Find the class boundaries:
a. Lower class boundary: subtract 0.5 from the lower class limit
b. Upper class boundary: add 0.5 to the upper class limit
9. Calculate relative frequency
rf = class frequency / total of all frequencies = f / n
● If we want to compare two samples that have different sample sizes, then we need to standardize our values
● One way to do this is to use relative frequencies because their total is always equal to one
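The steps under HOW can be sketched in Python as below. This is only a minimal sketch for integer data: the function name `frequency_table` and the sample `data` list are made up for illustration (the list was chosen so that its smallest value is 1 and its largest is 47, matching the sample solution in step 2). It also assumes that "increase to the next whole number" applies even when the division comes out exact, so that the largest value always fits in the last class.

```python
def frequency_table(data, num_classes):
    """Build frequency-table rows for integer data, following steps 2-9."""
    smallest, largest = min(data), max(data)
    # Step 2: class width, increased to the next whole number.
    # Integer division plus 1 also bumps an exact quotient up (an assumption
    # made here so that the largest value always lands in the last class).
    width = (largest - smallest) // num_classes + 1
    n = len(data)
    rows = []
    for i in range(num_classes):
        lower = smallest + i * width   # step 3: lower class limit
        upper = lower + width - 1      # step 4: 1 below the next lower limit
        freq = sum(1 for x in data if lower <= x <= upper)  # steps 5-6: tally
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,            # step 7
            "boundaries": (lower - 0.5, upper + 0.5),   # step 8
            "frequency": freq,
            "relative_frequency": freq / n,             # step 9: rf = f / n
        })
    return width, rows


# Hypothetical sample: 20 integer values ranging from 1 to 47.
data = [1, 5, 8, 9, 12, 13, 13, 17, 18, 21,
        22, 25, 26, 30, 33, 34, 38, 41, 45, 47]

width, table = frequency_table(data, 6)
print("class width:", width)  # class width: 8
for row in table:
    print(row)
```

Because the classes neither overlap nor leave gaps, the frequencies in `table` add up to `len(data)` and the relative frequencies add up to one.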
* A histogram is a graph of the frequency table above: frequency on the Y-axis and distance on the X-axis
* A relative frequency histogram is similar to a histogram, but its Y-axis shows the values from the relative frequency table. On its X-axis, aside from the distance (class boundaries), class midpoints or class limits can also be used.
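To see how the two histograms relate, here is a rough text-only sketch (sideways bars, one row per class). The class boundaries and frequencies are hypothetical, and `text_histogram` is an illustrative helper, not a library function. The point it tries to show is that both histograms have the same shape; only the scale of the height axis changes.

```python
# Hypothetical class boundaries and frequencies (not from the notes' data set).
boundaries = [(0.5, 8.5), (8.5, 16.5), (16.5, 24.5),
              (24.5, 32.5), (32.5, 40.5), (40.5, 48.5)]
frequencies = [3, 4, 4, 3, 3, 3]
n = sum(frequencies)
relative = [f / n for f in frequencies]


def text_histogram(boundaries, heights, unit):
    """One row per class; each bar holds one '#' per `unit` of height."""
    lines = []
    for (low, high), h in zip(boundaries, heights):
        bar = "#" * round(h / unit)
        lines.append(f"{low:>5} to {high:<5} | {bar} {h}")
    return lines


# Frequency histogram: one '#' per data value.
freq_hist = text_histogram(boundaries, frequencies, unit=1)
# Relative frequency histogram: one '#' per 1/n of the sample.
rel_hist = text_histogram(boundaries, relative, unit=1 / n)

print("\n".join(freq_hist))
print()
print("\n".join(rel_hist))
```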
Distribution Shapes
a) Mound Shape
b) Uniform/Rectangular
c) Skewed Left - long tail on left
d) Skewed Right - long tail on right
e) Bimodal - two peaks
✔ Cumulative Frequency Tables
* Many times we want to know the number of individuals that have a value below some level
* We use the upper boundary and calculate the cumulative frequency to that point
* Cumulative frequency for a class - the sum of frequencies for that class and all previous classes
* Adding that column to a frequency table makes it a cumulative frequency table
* A graph that displays the cumulative frequencies is an ogive
* To get the cumulative frequencies, start with the first frequency, which is also the first cumulative frequency, and add each next frequency to the running total: (23 + 43 = 66), (66 + 51 = 117). Continue until you reach the total of all frequencies
* No need to total the cumulative frequency column, since its last value is already the total of the frequencies
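The running total described above can be sketched with `itertools.accumulate` from the Python standard library. The first three frequencies (23, 43, 51) are the ones used in the notes; the last two are hypothetical values added only to finish the column.

```python
from itertools import accumulate

# 23, 43, 51 come from the worked example; 27 and 6 are hypothetical.
frequencies = [23, 43, 51, 27, 6]

# Each entry is the sum of that class's frequency and all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [23, 66, 117, 144, 150]

# The last cumulative frequency is already the total of all frequencies.
print(cumulative[-1] == sum(frequencies))  # True
```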
✔ Ogive
* For the lower class boundary of the first class, place a dot at zero height
* For each upper boundary, place a dot at the height of the cumulative frequency for that class
* Connect each dot with a straight line
Note: It is often useful in business to make a graph like an ogive except using the
cumulative relative frequencies
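As a sketch of the dots an ogive connects, the helper below pairs each class boundary with its cumulative frequency. The name `ogive_points`, the boundaries, and the last two frequencies are assumptions for illustration. Setting `relative=True` gives the cumulative relative frequency version mentioned in the note.

```python
from itertools import accumulate


def ogive_points(boundaries, frequencies, relative=False):
    """Return (boundary, height) dots for an ogive.

    `boundaries` lists the lower boundary of the first class followed by
    the upper boundary of every class, so it has one more entry than
    `frequencies`.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than frequencies")
    # First dot sits at zero height; the rest at the cumulative frequencies.
    heights = [0] + list(accumulate(frequencies))
    if relative:
        total = heights[-1]
        heights = [h / total for h in heights]
    return list(zip(boundaries, heights))


# Hypothetical class boundaries and frequencies.
boundaries = [0.5, 8.5, 16.5, 24.5, 32.5, 40.5]
frequencies = [23, 43, 51, 27, 6]

points = ogive_points(boundaries, frequencies)
rel_points = ogive_points(boundaries, frequencies, relative=True)
print(points)
print(rel_points)
```

Joining consecutive dots in `points` with straight lines gives the ogive. In `rel_points` the last dot always sits at height 1.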
✔ Bar Graphs
Pareto Chart: a bar graph with the bars in decreasing order, where the height of each bar represents the frequency of an event
Pie Chart: the total quantity (100%) is represented by the entire circle, and each slice takes up a percentage of the area
Note: pie charts are generally misleading. It is never okay to represent a linear quantity using area; it exaggerates the difference
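The two chart types can be prepared from the same category counts. The sketch below is illustrative only: the counts are hypothetical, and `decreasing_bars` and `pie_slices` are made-up helper names. The first puts bars in decreasing order (the arrangement commonly called a Pareto chart), and the second turns each count into a percentage of the whole and a slice angle out of 360 degrees.

```python
def decreasing_bars(counts):
    """Order (category, frequency) pairs from tallest bar to shortest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def pie_slices(counts):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {name: (100 * f / total, 360 * f / total)
            for name, f in counts.items()}


# Hypothetical event counts.
counts = {"late delivery": 12, "damaged item": 30, "wrong item": 18}

bars = decreasing_bars(counts)
slices = pie_slices(counts)
print(bars)    # [('damaged item', 30), ('wrong item', 18), ('late delivery', 12)]
print(slices)
```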
✔ Time Series Graph
- Data sets composed of similar measurements taken at regular intervals over time are
time series.
- Put time on the horizontal axis and the variable measured on the vertical axis. We place a dot for each time at the height of the variable and connect the dots
- These are often used in economics, finance, sociology, medicine, and any other situation in which we want to study or monitor a similar measure over a period of time. A time series graph can reveal some of the main features of a time series