Topic 7 - Sample
Prerequisite Content Knowledge: The teacher-participants recall the concept of correlation analysis.
Prerequisite Skill: The teacher-participants compute, interpret, and test the significance of Pearson’s r.
The trainer gives the following questions to assess the teacher-participants’ prior knowledge and then processes the results.
Prerequisite Assessment: 5-Item Fill in the Blanks and Solving (Online/Offline)
Content Knowledge Skills
Listed below are the heights in centimeters and weights in kilograms of six teachers. Solve for
the Pearson Product Moment Correlation Coefficient and construct a scatterplot. Perform the
steps of hypothesis testing using α = 0.05. (25pts.)
Teacher A B C D E F
Height (in cm) 160 162 167 158 167 170
Weight (in kg) 50 59 63 52 65 68
Fill in the blanks. Complete the statement below by choosing your answers in the box
provided above.
Note:
Teacher-participants with an internet connection:
- can use MS Excel or other online platforms.
Teacher-participants without an internet connection:
- may use graphing paper and other tools available offline.
$$|t_c| > t_{\alpha/2}(n-2)$$
$$|t_c| > t_{0.05/2}(6-2) = t_{0.025}(4) = 2.776$$
$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.9552\sqrt{6-2}}{\sqrt{1-(0.9552)^2}} \approx 6.45$$
Terminologies:
Bivariate data - examines the relationship between two variables.
Example: The relationship between math and physics grades of the senior high school students.
PRIVATE EDUCATION ASSISTANCE COMMITTEE
Scatter Plot - a graph of plotted points that show the relationship between two sets of data. It consists of a series of points plotted on a rectangular coordinate
plane.
● No Correlation - the points are scattered, meaning there is no pattern showing the relationship of the two variables.
Linear Correlation - a statistical method used to determine the existence, strength, and direction of the relationship between two variables.
Pearson Product Moment Correlation Coefficient or Pearson’s r - quantifies the strength and direction of the linear relationship between two variables.
The correlation coefficient is always a value from -1 to 1. The sign indicates the direction of the relationship while its magnitude (i.e., distance from 0) suggests
the strength of the relationship (Canlapan, 2016).
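As an illustration of the definition, here is a minimal Python sketch (standard library only) that computes Pearson’s r from its deviation-score form, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], using the heights and weights of teachers A to F from the prerequisite assessment. The function name `pearson_r` is a hypothetical helper chosen for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

height = [160, 162, 167, 158, 167, 170]   # teachers A to F, in cm
weight = [50, 59, 63, 52, 65, 68]         # teachers A to F, in kg
r = pearson_r(height, weight)
print(f"r = {r:.4f}")   # r = 0.9552
```

The sign of the numerator gives the direction of the relationship, and dividing by the spread of both variables keeps the result between −1 and 1.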
Interpretation:
If the teacher-participants still have difficulty with correlation, they may watch the following videos.
The teacher-participants recall the steps of hypothesis testing involving Pearson’s r.
Example: A doctor wants to verify whether there is a significant correlation between a person’s age and blood pressure. She randomly selects 6 subjects to test this at the 0.05 level of significance. Perform the steps of hypothesis testing for this situation.
Subject Age (x) Blood Pressure (y)
A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152
Solution:
There is a very strong positive linear relationship between age and blood pressure (as age increases, blood pressure increases).
$$|t_c| > t_{\alpha/2}(n-2)$$
$$|t_c| > t_{0.05/2}(6-2) = t_{0.025}(4) = 2.776$$
$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.897\sqrt{6-2}}{\sqrt{1-(0.897)^2}} \approx 4.05$$
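The test statistic used in both examples, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, can be checked with the short Python sketch below. The critical value 2.776 is the two-tailed t-table value for α = 0.05 and df = 4; it is hard-coded rather than computed because the Python standard library has no t-distribution. The helper name `t_statistic` is illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), used to test H0: rho = 0 with n - 2 df."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

T_CRITICAL = 2.776  # t-table value for alpha/2 = 0.025 and df = 6 - 2 = 4

for label, r in [("height vs weight", 0.9552), ("age vs blood pressure", 0.897)]:
    t_c = t_statistic(r, n=6)
    decision = "reject H0" if abs(t_c) > T_CRITICAL else "fail to reject H0"
    print(f"{label}: t = {t_c:.2f}, {decision}")
```

Both statistics exceed 2.776, so H0 is rejected in each case. With r rounded to 0.897 the second statistic comes out at about 4.06; the 4.05 in the worked solution corresponds to the unrounded r ≈ 0.8967.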
Introduction
Note: The trainer can now proceed to this once the learning needs of the teacher-participants have been met.
Through integrated approaches (Real-Life Data Analysis, Socratic Method, and Guided Approach), an understanding of regression analysis will help the
teacher-participants forecast for sound decision-making in different disciplines. (Drawing Attention to Meaning)
Should the teacher-participants have questions, clarifications, or concerns, they can contact the trainer through [email protected] or through mobile number +63
91x-xxx-xxxx.
The trainer engages the teacher-participants by posing the problem and questions below.
Mrs. Jocelyn’s Pizza Shop sells a pizza that costs ₱95.00 with the first topping, and then an additional ₱25.00 for each
additional topping.
3. In computing the amount you will pay for a pizza, what do you think should always be checked and considered?
Ans.: The number of toppings added to the pizza.
4. How do you describe the relationship between the amount you pay when buying pizza and the number of toppings on a
pizza?
Ans.: There is a positive correlation between the price of the pizza and the number of toppings.
5. Can you make an equation that shows the relationship between the amount you pay and the number of toppings
added?
Ans.: Let x = number of toppings and y = amount to be paid for the pizza (in ₱).
y = 95 + 25(x - 1)
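Expanding the answer to question 5 rewrites it in slope-intercept form, which makes the rate of change explicit:

```latex
y = 95 + 25(x - 1) = 95 + 25x - 25 = 70 + 25x, \qquad x \ge 1
```

The slope, 25, is the price increase (₱25.00) for each added topping. The intercept, 70, anchors the line but does not correspond to a real order, because every pizza comes with at least one topping. For example, a pizza with 3 toppings costs y = 70 + 25(3) = ₱145.00, the same as 95 + 25(2).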
The teacher-participants recall the concepts of correlation by describing the relationship between the amount they have to pay
and the number of additional toppings that they add. Then the teacher-participants will create an equation that will describe their
observation.
Guide Questions:
● How can understanding correlation and the slope-intercept form help us make predictions or analyze trends in real-life
scenarios beyond pizza pricing? Possible Answer: If we know that two variables are associated and there is a formula that
connects the two variables, we can estimate the value of one variable given another variable.
● Can you think of any situations where we can predict one element in terms of another related element? (Prompting for
Effortful Thinking)
Introduction: Forensic anthropology is the application of the science of physical or biological anthropology to the legal process.
Physical or biological anthropologists who specialize in forensics primarily focus their studies on the human skeleton. This activity
highlights the concept that certain physical attributes (like femur length) can be used as predictors for another variable (height).
Instructions:
1. The teacher-participants form groups of 3 to 4 members.
2. The teacher-participants measure their height and the length of their femur.
3. The formula for estimating height given a femur length is presented.
where:
P represents the person’s height
F stands for the known length of the bone (femur) through measurement
Reference: http://www.most.org/wp-content/uploads/2016/04/Bones_Can_Tell_Us_More.pdf
Based on the previous activities, the teacher-participants answer the following questions leading to the discussion of the
lesson:
● What do you think is the main goal of our lesson today based on the given situations? Do you think this has a connection to our previous lesson about correlation?
● What is regression analysis?
● What is the importance of identifying dependent and independent variables in a regression problem?
The trainer will lead the teacher-participants to making sense of the word regression. (Drawing Attention to Meaning)
History (Optional): The term "regression" was coined by Francis Galton in the 19th century to describe a biological phenomenon:
the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as
regression toward the mean) (Galton, reprinted 1989). For Galton, regression had only this biological meaning (Galton, 1887), but his
work was later extended by Udny Yule and Karl Pearson to a more general statistical context (Pearson, 1903).
Reference: Why are regression problems called “regression” problems? (n.d.). Cross Validated.
https://stats.stackexchange.com/questions/11087/why-are-regression-problems-called-regression-problems#:~:text=%22Regression%22%20comes%20from%20%22regress,clearer%20and%20more%20meaningful%20model.
The trainer introduces the lesson on regression analysis. The teacher-participants come up with a definition of regression
analysis: a statistical method that determines the nature of the relationship between variables, whether positive or negative,
linear or nonlinear. The teacher-participants realize that regression analysis should be used only if there is a significant
relationship between the variables. In addition, they understand that the regression equation enables us to predict the value of
the dependent variable given a value of the independent variable.
Simple linear regression is a regression model that uses a straight line to describe the association between two variables: one
independent (explanatory) variable and one dependent (response) variable. In a nutshell, simple linear regression is used when there is
only one explanatory variable, while multiple regression is used when there are several explanatory variables.
The purpose of simple linear regression is to forecast or predict. The method creates a predictive model from observed x and y
values; once the model is obtained, it can be used to estimate y for other values of x within the observed range.
Process Questions:
● What do you think are the factors that we need to establish in order to use regression analysis?
● Can regression analysis be applied in interpreting any bivariate data? (Prompting Connections to Prior Knowledge)
When bivariate data are shown on the xy-coordinate plane as a scatter plot, each data point represents a pair of values of the two
variables. The independent variable is taken to influence the dependent variable, but the dependent variable is not taken to influence
the independent variable.
Try These:
Consider which variables are affected and which contribute to the change and determine which is the independent (I) or
dependent (D) variable in the statements below.
1. The time it takes to run a mile depends on the person’s running speed.
I: running speed D: time
2. The height of bean plants depends on the amount of water they receive.
I: amount of water D: height of bean plants
3. The higher the temperature of the air in the oven, the faster a cake will bake.
I: oven temperature D: baking time
Guide question: What is the importance of identifying dependent and independent variables in a regression problem? (Prompting
for Effortful Thinking)
46 187
29 142
35 161
38 164
30 140
27 131
Process Questions:
● How do you describe the distance of the points in the graph?
● Can we really form one line using all the points in the graph? (Prompting for Effortful Thinking)
Although many lines can be drawn through a set of points, only one of them is the regression line, or line of best fit. The term
"line of best fit" refers to the line that runs through a scatter plot of data points and best reflects the relationship between them.
For the least-squares regression line, "best" means that the sum of the squared vertical distances (residuals) from the points to the
line is as small as possible; as a result, the points above and below the line balance out, with positive and negative residuals
summing to zero.
Note that the closer r is to 1 or −1, the closer the points lie to the regression line, and the better the fit and the predictions
will be. Also, before determining the equation of the regression line, there should be a significant, strong linear relationship
between the two variables.
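The equation of the regression line is y' = a + bx, where the y-intercept a and the slope b are computed as follows:
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2}$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}$$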
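The two formulas translate directly into code. The minimal Python sketch below builds the Σx, Σy, Σx², and Σxy totals from raw data and then applies the formulas for a (y-intercept) and b (slope); it is run on the Example 1 heights and weights. The function name `regression_coefficients` is our own, not part of any library.

```python
def regression_coefficients(x, y):
    """Return (a, b) for the regression line y' = a + b*x, using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)                # Σx²
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # Σxy
    denominator = n * sum_x2 - sum_x ** 2            # n(Σx²) − (Σx)²
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return a, b

height = [58, 59, 61, 62, 63]   # Example 1 heights (inches)
weight = [48, 50, 47, 54, 55]   # Example 1 weights (kilograms)
a, b = regression_coefficients(height, weight)
print(f"y' = {a:.2f} + {b:.2f}x")   # y' = -25.30 + 1.26x
```

The guard against a zero denominator reflects a real limitation of the formulas: when every x value is the same, no unique line fits the data.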
Note: If the slope is positive, an increase in x is associated with an increase in y; if the slope is negative, an increase in x is
associated with a decrease in y. The larger the absolute value of the regression coefficient (slope), the larger the predicted change
in y for each one-unit change in x.
Using the Gradual Release of Responsibility (GRR) approach, the following examples will be given by the trainer.
Example 1: A researcher wants to know whether weight depends on a person’s height. To find out, he collected data on the height
(in inches) and weight (in kilograms) of a random sample of five high school students.
Student 1 2 3 4 5
Height (x) 58 59 61 62 63
Weight (y) 48 50 47 54 55
a. Construct a scatter plot and determine the regression line that best fits the data.
b. Compute the y-intercept (a) and slope (b), then determine the equation of the regression line.
c. Find the predicted value of the weight when the height is 60 inches.
Solution:
a. Construct a scatter plot and determine the regression line that best fits the data.
b. Compute the y-intercept (a) and slope (b), then determine the equation of the regression line.
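Here n = 5, Σx = 303, Σy = 254, Σx² = 18379, and Σxy = 15414.
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{(254)(18379) - (303)(15414)}{5(18379) - (303)^2} \approx -25.30 \quad \text{(y-intercept)}$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{5(15414) - (303)(254)}{5(18379) - (303)^2} \approx 1.26 \quad \text{(slope)}$$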
Therefore, the equation of the regression line is y' = −25.30 + 1.26x. Hence, for each 1-inch increase in height, the predicted
weight increases by about 1.26 kg.
c. Find the predicted value of the weight when the height is given as 60 inches.
y' = −25.30 + 1.26x
y' = −25.30 + 1.26(60)
y' = −25.30 + 75.60
y' = 50.30
Therefore, the predicted value of the weight is 50.30 kilograms when the height is 60 inches.
The coefficient of determination (r²) measures the predictive power of a linear regression model. It is a value between 0 and 1
that gives the proportion (often expressed as a percentage) of the variance in the dependent variable that is predictable from the
independent variable.
Note that a coefficient of determination of 0 means that the linear regression model cannot predict the dependent variable from the
independent variable. On the other hand, a coefficient of determination of 1 means that the dependent variable can be predicted
perfectly from the independent variable. The table below shows the strength of prediction of the linear model.
To compute the coefficient of determination for a simple linear regression model, simply square the correlation coefficient.
In Example 1, the value of r is about 0.731. What is the coefficient of determination of the regression model?
$$r^2 = (0.731)^2 \approx 0.5344$$
The computed value means that the linear regression model has a predictive power of about 53.44%. The independent variable,
height, therefore accounts for a little over half of the variation in the dependent variable, weight.
CAUTION: In predicting the value of the dependent variable, consider the realistic range (minimum and maximum values) of the
independent variable. This is vital for accurate predictions. The model's accuracy is compromised when predicting values
outside the observed range, because the data provide no evidence that the linear relationship continues there. Thus, understanding
and respecting the boundaries of the independent variable(s) helps ensure the model's validity and reliability in practical applications.
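The caution above can be built into a prediction routine. The Python sketch below is one possible design, assuming we prefer to stop with an error rather than silently extrapolate; the function name `predict_within_range` is hypothetical, and it uses the Example 1 equation y' = −25.30 + 1.26x with the observed heights of 58 to 63 inches.

```python
def predict_within_range(x_new, a, b, x_observed):
    """Predict y' = a + b*x_new, refusing to extrapolate beyond the observed x values."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(f"x = {x_new} is outside the observed range [{lo}, {hi}]")
    return a + b * x_new

heights = [58, 59, 61, 62, 63]   # Example 1 observed heights (inches)
print(f"{predict_within_range(60, -25.30, 1.26, heights):.2f}")   # 50.30
try:
    predict_within_range(72, -25.30, 1.26, heights)
except ValueError as err:
    print(err)   # x = 72 is outside the observed range [58, 63]
```

Raising an error is a strict choice; issuing a warning and still returning the estimate is a reasonable alternative when a rough extrapolation is acceptable.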
Example 2: A marketing manager conducted a study to determine whether there is a linear relationship between money spent (x in
thousands) on advertising and company sales (y in thousands). He then gathered data from advertising companies about this. The
table below shows the data he gathered.
Company Sales (in thousands), y: 225 184 220 240 180 184 186 215
a. Find the equation of the regression line for the money spent on advertising and company sales.
b. Interpret the slope and the y-intercept in the context of the problem.
c. Compute for the coefficient of determination of the linear regression line.
Solution:
a.
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{(1634)(32.44) - (15.8)(3289.8)}{8(32.44) - (15.8)^2} \approx 104.06$$
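$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{8(3289.8) - (15.8)(1634)}{8(32.44) - (15.8)^2} \approx 50.73$$
Therefore, the equation of the regression line is y' = 104.06 + 50.73x.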
b. Interpretation:
The y-intercept (a = 104.06) indicates that when a company does not spend on advertising (x = 0), its predicted mean sales are
104.06 (thousand).
The slope (b = 50.73) indicates that for each increase of one (thousand) in advertising spending, company sales are predicted to
change by +50.73 (thousand) on average.
c. $$r^2 = (0.913)^2 \approx 0.8336$$
This means that the linear regression model has a predictive power of about 83.36%. The independent variable, advertising cost,
has a strong influence in predicting the dependent variable, sales.
A sociologist found that there is a linear relationship between family income level (in thousands of dollars) and the percent of
income donated to charities. Find the equation of the regression line for income level and percent donated. Interpret the
slope and y-intercept in the context of the problem. Determine the coefficient of determination, given r = −0.916.
Percent donated, y 9 10 8 5 6 3
Solution:
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{(41)(19458) - (336)(2159)}{6(19458) - (336)^2} \approx 18.78$$
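$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{6(2159) - (336)(41)}{6(19458) - (336)^2} \approx -0.21$$
Therefore, the equation of the regression line is y' = 18.78 − 0.21x.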
b. Interpretation:
The y-intercept (a = 18.78) indicates that when family income is zero (x = 0), the predicted mean percent of income donated is 18.78%.
The slope (b = −0.21) indicates that for each increase of one (thousand dollars) in income, the percent of income donated is
predicted to change by −0.21 percentage points.
c. $$r^2 = (-0.916)^2 \approx 0.8391$$
This means that the linear regression model has a predictive power of about 83.91%. The independent variable, income,
has a strong influence in predicting the dependent variable, the percent of income donated.
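Because the x values for Example 2 and the charity exercise are given here only through their totals, a sketch that works directly from n, Σx, Σy, Σx², and Σxy is handy for checking the arithmetic. The Python below (standard library only; `regression_from_sums` is a hypothetical helper) recomputes both regression lines from the sums used in the two solutions above and squares the given r values.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for y' = a + b*x from the summary totals n, Σx, Σy, Σx², Σxy."""
    denominator = n * sum_x2 - sum_x ** 2                 # n(Σx²) − (Σx)²
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return a, b

# (label, n, Σx, Σy, Σx², Σxy, given r) as used in the two solutions above
cases = [
    ("advertising vs sales", 8, 15.8, 1634, 32.44, 3289.8, 0.913),
    ("income vs percent donated", 6, 336, 41, 19458, 2159, -0.916),
]

for label, n, sx, sy, sx2, sxy, r in cases:
    a, b = regression_from_sums(n, sx, sy, sx2, sxy)
    print(f"{label}: y' = {a:.2f} + ({b:.2f})x, r^2 = {r ** 2:.4f}")
```

Squaring removes the sign of r, so a negative correlation such as −0.916 still gives a positive coefficient of determination.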
Using Example 1, the trainer demonstrates the use of Jamovi as indicated below.
I. Using Jamovi:
a. Construct a scatter plot and draw the line of best fit.
iii. Drag and drop your independent variable to the x-axis and dependent variable to the y-axis and click
“Linear”.
ii. Drag and drop your outcome variable to the Dependent Variable and your predictor variable to Covariates if it is
continuous or to Factors if it is categorical.
iv. Use the estimates for the Intercept and for X as the values of “a” and “b”, respectively, in the formula y’ = a + bx.
c. Using the derived regression equation, predict the weight when the height is 60 inches.
Solution:
y’ = -25.30 + 1.26x
y’ = -25.30 + 1.26(60)
y’ = -25.30 + 75.60
y’ = 50.30 kilograms
Show the results of Example 2 and the practice exercise (Try This: Think-Pair-Share Activity) using Jamovi.
Example 3. The table below shows the number of hours students spent studying for a test in Statistics and Probability and their
test scores.
Solution:
a. Graph the data using a scatter plot and find the best-fit line.
x (hours) y (score) xy x²
1.00 78 78.00 1.00
0.00 75 0.00 0.00
1.50 86 129.00 2.25
2.00 89 178.00 4.00
3.00 92 276.00 9.00
2.75 87 239.25 7.5625
1.00 80 80.00 1.00
0.50 77 38.50 0.25
Σx = 11.75 Σy = 664.00 Σxy = 1018.75 Σx² = 25.0625
$$a = \frac{(\Sigma y)(\Sigma x^2) - (\Sigma x)(\Sigma xy)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{(664)(25.0625) - (11.75)(1018.75)}{8(25.0625) - (11.75)^2} \approx 74.81$$
$$b = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{8(1018.75) - (11.75)(664)}{8(25.0625) - (11.75)^2} \approx 5.57$$
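The xy and x² columns and their totals can be generated rather than filled in by hand. The Python sketch below (standard library only) rebuilds the Example 3 worksheet from the raw hours and scores and then applies the same formulas for a and b. Every hour value here is a sum of powers of two (such as 2.75 = 2 + 0.5 + 0.25), so the column totals come out exact in floating point.

```python
hours = [1.00, 0.00, 1.50, 2.00, 3.00, 2.75, 1.00, 0.50]   # x, hours studied
scores = [78, 75, 86, 89, 92, 87, 80, 77]                  # y, test score

# Worksheet rows: x, y, xy, x²
print(f"{'x':>5} {'y':>3} {'xy':>7} {'x^2':>7}")
for x, y in zip(hours, scores):
    print(f"{x:5.2f} {y:3d} {x * y:7.2f} {x * x:7.4f}")

n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
print(f"Σx = {sum_x}, Σy = {sum_y}, Σxy = {sum_xy}, Σx² = {sum_x2}")

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
print(f"y' = {a:.2f} + {b:.2f}x")   # y' = 74.81 + 5.57x
```

Keeping Σx² at its exact value 25.0625 rather than 25.06 matters here: with 25.06 the slope would round to 5.58 instead of 5.57.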
c. How do you interpret the slope and the y-intercept in the context of the problem?
The y-intercept (a = 74.81) indicates that when a student does not study for the test (x = 0), his/her predicted mean
score is 74.81.
The slope (b = 5.57) indicates that for each additional hour of studying, the test score is predicted to increase by 5.57
points on average.
d. Using the simple linear regression equation, predict the score of the student who spent 3 hours and 30 minutes for
the same test in Statistics and Probability.
Solution:
y' = 74.81 + 5.57x
= 74.81 + 5.57(3.5)
= 74.81 + 19.50
= 94.31
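Therefore, the student who studied for 3 hours and 30 minutes for the same test in Statistics and Probability may get a score
of about 94. Since 3.5 hours is slightly beyond the observed range of 0 to 3 hours, this prediction should be treated with caution.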
vi. Drag and drop your outcome variable to Dependent Variable and your predictor variable to Covariates if it is
continuous or to Factors, if it is categorical.
Therefore, the equation of the regression line is y' = 74.81 + 5.57x. This means that each additional hour a student spends
studying for the Statistics and Probability test is associated with a predicted increase of 5.57 points in the score.
c. Using the simple linear regression equation, predict the score of the student who spent 3 hours and 30 minutes for the same
test in Statistics and Probability.
Solution:
y' = 74.81 + 5.57x
= 74.81 + 5.57(3.5)
= 74.81 + 19.50
= 94.31
Therefore, the student who studied for 3 hours and 30 minutes to take the same test in Statistics and Probability
may have a score of 94.
[See the following videos as additional material on solving simple linear regression using different statistical applications:]
Using a calculator:
Simple Linear Regression using Casio fx-991-ES Plus
Linear Regression using a calculator (Casio fx-991Ms)
How to do linear regression on the Sharp EL-W535SA
Correlation and Regression Calculator Shortcut using CANON F-789SGA
Synthesis:
In this lesson, the teacher-participants learn that regression analysis is a statistical technique used to quantify the relationship between a dependent variable and one or more
predictor (independent) variables. Linear regression supports sound decision-making because it provides a systematic, data-based way to make predictions. It gives us a basic
framework for understanding how different factors relate to each other and lets us make educated estimates about what might happen in the future. It is used in fields such as
economics, finance, the social sciences, and healthcare for decision-making and forecasting.
As the saying often attributed to Abraham Lincoln goes, "The best way to predict the future is to create it!" Linear regression empowers us to create a better future by
understanding the relationships between variables and using that knowledge to forecast potential outcomes. In essence, linear regression not only helps us anticipate what
might happen but also empowers us to actively influence and shape the future we envision.
Towards the end, the trainer may now proceed to assess the teacher-participants’ learning to check their mastery of the topic.
RUA of Student’s Learning:
For teacher-participants in Online Modality: Uploading of accomplished worksheets in the Learning Management System or online platforms such as Shared Google Drive or OneDrive.
For teacher-participants in Onsite Modality: Submitting handwritten work.
The teacher-participants answer the following questions.
2. Given the following situations, identify the dependent and independent variables.
a. The blood pressure depends on the age of the person.
b. The student test scores depend on the hours spent in studying.
c. Jeepney fare increase depends on the oil price increase.
3. The table below shows the estimated world population (in billions) for the years after 2010.
a. Find the slope and intercept of the equation of the regression line and interpret them in the context of the problem.
b. Determine the regression equation of the line.
c. Using the simple linear regression equation, predict the population in 2024.
4. Read and analyze the given situations. Write the letter of the correct answer on your answer sheet and justify your answer.
B. Linear Correlation
C. Linear Regression
a. A business analyst wants to determine the relationship between advertising spending and product sales.
b. The researcher wants to determine if the actual weight of a bottled fruit juice is the same as the claimed weight.
c. An educational institution aims to predict students' overall academic performance (GPA) based on variables like study hours, attendance, and extracurricular
involvement.
d. A guidance counselor wants to know if there is an association between the number of absences of a student and the final grades of the grade 11 students.
6. Research (Performance Task). Search for a study with a correlational research design. Conduct a survey that will produce data similar to those in the study you
chose. Using Jamovi, determine the Pearson Product Moment Correlation Coefficient of your data and conduct a hypothesis test for r. Given the results, can you proceed
with regression analysis? If yes, determine the equation of the regression line and interpret the slope and intercept in the context of the study.
1. Given the following situations, identify the dependent and independent variables.
a. The weight of the person depends on the time spent exercising.
b. The time spent on social media depends on the hours of sleep.
c. The amount of savings and the hours spent working.
2. According to Giles and Vallandigham (1991), foot length is correlated with a person’s height. Based on this, shoe size may be predicted from a person’s height.
The table below shows the height and shoe size of six randomly selected individuals.
a. Find the slope and intercept of the equation of the regression line and interpret in the context of the problem.
b. Determine the regression equation of the line.
c. Predict the shoe size of a man whose height is 155 cm.
Reference: Giles, E., & Vallandigham, P. (1991). Height estimation from foot and shoe print length. Journal of Forensic Sciences (ASTM International), 36(4),
1134–1151. https://doi.org/10.1520/JFS13129J
3. Read and analyze the given situations. Write the letter of the correct answer on your answer sheet and justify your answer.
1. A business analyst wants to determine the relationship between advertising spending and product sales.
2. The researcher wants to determine if the actual weight of a bottled fruit juice is the same as the claimed weight.
3. An educational institution aims to predict students' overall academic performance (GPA) based on variables like study hours, attendance, and extracurricular
involvement.
4. A guidance counselor wants to know if there is an association between the number of absences of a student and the final grades of the grade 11 students.