STAT22209 - Chapter 03-Multiple Regression - 2022


Advanced Statistics II

(PST22209 / FST22209 / ESNRM22209)

R.M. KAPILA RATHNAYAKA


B.Sc. Special (Math. & Stat.) (Ruhuna), M.Sc. (Industrial Mathematics) (USJ),
M.Sc. (Stat.) (WHUT, China),
Ph.D. (Applied Statistics, WHUT)
Why do we need an alternative to linear regression?
Polynomial Regression
• In situations where the functional relationship between the response Y and the independent variable x cannot be adequately approximated by a linear relationship, it is sometimes possible to obtain a reasonable fit by considering a polynomial relationship

  Y = β0 + β1x + β2x² + … + βh x^h + ε

• where β0, β1, …, βh are regression coefficients that would have to be estimated.

• h is called the degree of the polynomial.


• To determine these estimators, we take partial derivatives of the sum of squares with respect to β0, β1, …, βh, and then set these equal to 0 so as to determine the minimizing values.

• On doing so, and then rearranging the resulting equations, we obtain that the least-squares estimators satisfy the following set of linear equations, called the normal equations:

  Σyi        = nβ̂0      + β̂1 Σxi        + … + β̂h Σxi^h
  Σxi yi     = β̂0 Σxi   + β̂1 Σxi²       + … + β̂h Σxi^(h+1)
  ⋮
  Σxi^h yi   = β̂0 Σxi^h + β̂1 Σxi^(h+1)  + … + β̂h Σxi^(2h)
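The normal equations above can be assembled and solved numerically. A minimal sketch in Python/numpy, with made-up data and degree h = 2 chosen purely for illustration:

```python
import numpy as np

# Sketch: solving the polynomial normal equations directly.
# The data points and the degree h are assumptions made for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 0.5, 2.0, 5.5, 11.0, 20.5])
h = 2  # degree of the polynomial

# Design matrix with columns 1, x, x^2, ..., x^h
X = np.vander(x, h + 1, increasing=True)

# Normal equations: (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # estimated coefficients beta0, beta1, ..., beta_h
```

The same coefficients are returned (in reverse order) by `np.polyfit(x, y, h)`, which solves the equivalent least-squares problem.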
Degree of the polynomial

• where h is called the degree of the polynomial. For lower degrees, the relationship has a specific name:
• h = 2 is called quadratic
• h = 3 is called cubic,
• h = 4 is called quartic, and so on.
Second-degree Polynomial – Quadratic Trend
• Practically, most real-world data patterns are best described by curves, not straight lines. In these instances, the linear trend model does not adequately describe the change in the variable as time changes.

• To overcome this problem, we often use a parabolic curve, which is described mathematically by a second-degree equation.

• The general form of an estimated second-degree equation is

  ŷ = a + bx + cx²

• where
  – ŷ = estimate of the dependent variable,
  – x = the independent (time) variable,
  – a, b, c = numerical constants.

• We can determine the values of the numerical constants from the following three equations:

  ΣY   = na    + bΣx   + cΣx²
  ΣxY  = aΣx   + bΣx²  + cΣx³
  Σx²Y = aΣx²  + bΣx³  + cΣx⁴
Second-degree Polynomial – Quadratic Trend: Application
• Fit a second-degree polynomial to the following data.

  X   0   1   2   3   4
  Y   0   0   2   6   12

• We can determine the values of the numerical constants a, b and c from the three normal equations.
Example
• Substituting the sums

  n = 5, Σx = 10, Σx² = 30, Σx³ = 100, Σx⁴ = 354, ΣY = 20, ΣxY = 70, Σx²Y = 254

  into the three normal equations gives

  20  = 5a  + 10b + 30c
  70  = 10a + 30b + 100c
  254 = 30a + 100b + 354c

• Solving, a = 0, b = −1 and c = 1, so the estimated quadratic regression equation is

  ŷ = x² − x

Matrix notation to solve the equation system
• The normal equations can be written as a single matrix equation Aβ = g, with

  A = [[5, 10, 30], [10, 30, 100], [30, 100, 354]],   g = (20, 70, 254)ᵀ,

  which has the solution (a, b, c)ᵀ = A⁻¹g = (0, −1, 1)ᵀ.


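As a check on this worked example, the same fit can be reproduced with numpy's polyfit on the five data points given above:

```python
import numpy as np

# Checking the worked example: fit a quadratic to the five data points.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 2.0, 6.0, 12.0])

# polyfit returns coefficients highest degree first: [c, b, a]
c, b, a = np.polyfit(x, y, 2)
print(a, b, c)  # → approximately 0, -1, 1, i.e. y-hat = x^2 - x
```

Because ŷ = x² − x passes through all five points exactly, the least-squares fit recovers it with zero residual error.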
Example 2
• You are studying the relationship between a particular
machine setting and the amount of energy consumed.
• A log transformation of the response variable will produce a more symmetric error distribution.
Multiple Linear Regression
• Multiple regression is an extension of simple linear regression.

• It is used when we want to predict the value of a variable based on the values of two or more other variables.

• Suppose that we have a linear model

  Y = β0 + β1x1 + β2x2 + … + βkxk + ε
Example
• You could use multiple regression to understand whether
exam performance can be predicted based on
– revision time,
– test anxiety,
– lecture attendance
– gender.
• Alternatively, you could use multiple regression to understand
whether daily cigarette consumption can be predicted based
on
– smoking duration,
– age when started smoking,
– smoker type,
– income
– gender.
Assumption #1:
• Your dependent variable should be measured on a continuous scale (i.e., it is either an interval or ratio variable).

• Example:
– revision time (measured in hours),
– intelligence (measured using IQ score),
– exam performance (measured from 0 to 100),
– weight (measured in kg)
Assumption #2:
• You should have two or more independent variables, which can be either continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable).

• Examples of nominal variables include:


– gender (male and female),

– ethnicity (Caucasian, African American and Hispanic),

– physical activity level (sedentary, low, moderate and high),

– profession (surgeon, doctor, nurse, dentist, therapist),


Numerical Data (Data that is Numbers):
Continuous Random Variables
• A continuous variable is a variable whose value is obtained by measuring.
• Examples:
  – height of students in class
  – weight of students in class
  – time it takes to get to school
  – distance traveled between classes
Numerical Data (Data that is Numbers) :
Discrete Random Variables
• A discrete variable is a variable whose value is obtained by
counting.

• All continuous variables are numeric, but not all numeric


variables are continuous.

• Examples:
– number of students present
– number of red marbles in a jar
– number of heads when flipping three coins
– students’ grade level
Categorical Data (Data that is not
numbers) : Nominal Variable
• Sometimes there is no hierarchy in categorical data.
• If eye colour was coded
– 0-- “Blue”
– 1 --“Green”
– 2 --“Brown”

we have to randomly choose which option gets which


number.
• It doesn’t matter whether blue is coded as zero, one, or two, because there is no hierarchy in eye colour.
Categorical Data (Data that is not
numbers) : Ordinal Variable
• Annoying surveys often ask you to answer with the options
“Strongly Disagree”, “Disagree”, “Neutral”, “Agree” or
“Strongly agree”.
• This data has a special structure: the categories have a natural order, so they can be coded from 0 ("Strongly Disagree") to 4 ("Strongly agree"):
– 0 = Strongly Disagree
– 1 = Disagree
– 2 = Neutral
– 3 = Agree
– 4 = Strongly agree
Assumption #3:
• Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move
along the line.

Assumption #4:
• Data must not show multicollinearity, which occurs when
you have two or more independent variables that are highly
correlated with each other.
What is Multicollinearity?
Consider the following data on 20 individuals with high blood pressure:
1. blood pressure (y = BP, in mm Hg)

2. age (x1 = Age, in years)

3. weight (x2 = Weight, in kg)

4. body surface area (x3 = BSA, in sq m)

5. duration of hypertension (x4 = Dur, in years)

6. basal pulse (x5 = Pulse, in beats per minute)

7. stress index (x6 = Stress)


          BP      Age     Weight  BSA     Dur     Pulse
  Age     0.659
  Weight  0.950   0.407
  BSA     0.866   0.378   0.875
  Dur     0.293   0.344   0.201   0.131
  Pulse   0.721   0.619   0.659   0.465   0.402
  Stress  0.164   0.368   0.034   0.018   0.312   0.506

• Cell Contents: Pearson correlation


• Blood pressure appears to be related fairly strongly to Weight (r = 0.950)
and BSA (r = 0.866), and hardly related at all to Stress level (r = 0.164).
• Weight and BSA appear to be strongly related (r = 0.875)

• The high correlation among some of the predictors suggests that data-
based multicollinearity exists.
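Multicollinearity like the Weight–BSA case above is often quantified with the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch on simulated data (the blood-pressure data itself is not reproduced here):

```python
import numpy as np

# Illustration with invented data: x2 is nearly collinear with x1,
# x3 is an unrelated predictor.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)               # unrelated predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """Variance inflation factor of column j of X."""
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(Z)), Z])   # add intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # VIFs for x1, x2 are large
```

A common rule of thumb flags VIF values above 10 as evidence of serious multicollinearity; here x1 and x2 are flagged while x3 is not.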
Assumption #5:
• There should be no
  – significant outliers,
  – high leverage points, or
  – highly influential points.
• These different classifications of unusual points reflect the different impact they have on the regression line.
What are outliers in the data?
• An outlier is an observation that lies an abnormal distance from other values
in a random sample from a population.

• The box plot is a useful graphical display for describing the behavior of the
data in the middle as well as at the ends of the distributions.

• The following quantities (called fences) are needed for identifying extreme values in the tails of the distribution (IQR = Q3 − Q1):
  – lower inner fence: Q1 − 1.5·IQR
  – upper inner fence: Q3 + 1.5·IQR
  – lower outer fence: Q1 − 3·IQR
  – upper outer fence: Q3 + 3·IQR

• A point beyond an inner fence on either side is considered a mild outlier. A point beyond an outer fence is considered an extreme outlier.
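The fence calculation can be sketched directly; the sample below is an invented illustration:

```python
import numpy as np

# Sketch: computing the inner and outer fences for outlier screening.
# The data sample is an assumption made for illustration.
data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 30])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_outer, upper_outer = q1 - 3.0 * iqr, q3 + 3.0 * iqr

# Points beyond an inner fence are flagged as outliers
outliers = data[(data < lower_inner) | (data > upper_inner)]
print(outliers)  # → [30]
```

In this sample the value 30 lies beyond the upper inner fence (and, in fact, beyond the outer fence as well, making it an extreme outlier).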
Assumption #6:
• You should have independence of observations (i.e., independence of residuals), which you can easily check using the Durbin–Watson statistic.

Assumption #7:
• There needs to be a linear relationship between
  – the dependent variable and each of your independent variables.

Assumption #8:
• Finally, you need to check that the residuals (errors) are approximately normally distributed.

• Two common methods to check this assumption include using:


– histogram (with a superimposed normal curve) and a
Normal P-P Plot;
– Normal Q-Q Plot of the studentized residuals.
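The Durbin–Watson check mentioned for the independence assumption can also be computed directly as DW = Σ(e_t − e_{t−1})² / Σe_t², with values near 2 suggesting no first-order autocorrelation. A sketch on simulated residuals (the two residual series are assumptions for illustration):

```python
import numpy as np

# Sketch: Durbin-Watson statistic on simulated residuals.
rng = np.random.default_rng(0)
e_indep = rng.normal(size=500)                   # independent residuals
e_auto = np.cumsum(rng.normal(size=500)) / 10.0  # strongly autocorrelated

def durbin_watson(e):
    """DW = sum of squared successive differences over sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson(e_indep))  # close to 2
print(durbin_watson(e_auto))   # well below 2, signalling autocorrelation
```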
Multiple Linear Regression
• Suppose that we have the linear model

  Y = β0 + β1x1 + β2x2 + … + βkxk + ε,

  and we make n independent observations y1, y2, …, yn on Y.

• We can write the i-th observation as

  yi = β0 + β1xi1 + β2xi2 + … + βkxik + εi,   i = 1, 2, …, n,

• where xij is the setting of the j-th independent variable for the i-th observation, j = 1, 2, …, k.

• We now define the following matrices, with xi0 = 1:

  Y = (y1, y2, …, yn)ᵀ (n × 1),
  X = the n × (k + 1) matrix whose i-th row is (1, xi1, xi2, …, xik),
  β = (β0, β1, …, βk)ᵀ,
  ε = (ε1, ε2, …, εn)ᵀ.

• Thus, the n equations representing yi as a function of the x’s, β’s, and ε’s can be simultaneously written as

  Y = Xβ + ε.
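The matrix form Y = Xβ + ε leads directly to the least-squares solution β̂ = (XᵀX)⁻¹XᵀY. A sketch on simulated data (the coefficients and noise level are assumptions for illustration):

```python
import numpy as np

# Sketch: OLS in matrix form, beta-hat = (X^T X)^{-1} X^T Y.
# The true coefficients and data are simulated assumptions.
rng = np.random.default_rng(42)
n, k = 50, 2
x = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), x])        # first column of 1s for beta0
beta_true = np.array([1.0, 2.0, -3.0])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equations rather than forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to [1, 2, -3]
```

Solving the normal equations with `np.linalg.solve` (or `np.linalg.lstsq`) is numerically preferable to computing (XᵀX)⁻¹ explicitly.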
Regression with Two Independent Variables
• For n observations from a simple linear regression model of the form

  yi = β0 + β1xi + εi,   i = 1, 2, …, n,

• the least-squares equations for β̂0 and β̂1 were given in the previous section as

  Σyi    = nβ̂0 + β̂1 Σxi
  Σxi yi = β̂0 Σxi + β̂1 Σxi²
Regression with Two Independent Variables
• Assume the production-function model below,

  Q = β0 + β1L + β2K + ε,

• where Q is total production, L is labor input and K is total capital. The information about each factor is given below for the 15-year period from 2001 to 2015.

  Year  1   2   3   4   5   6   7   8   9   10   11   12   13   14   15
  Q     20  35  30  47  60  68  76  90  100 105  130  140  125  120  135
  L     10  15  21  26  40  37  42  33  30  38   60   65   50   35   42
  K     12  10  9   8   5   7   4   5   7   5    3    4    3    1    2

• By using the above data, estimate the β0, β1 and β2 parameters of the model by using the ordinary least squares (OLS) method.
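A sketch of the OLS estimation for this exercise (the row labels Q, L and K are assumed from the order in which the variables are introduced on the slide):

```python
import numpy as np

# OLS estimation of Q = b0 + b1*L + b2*K with the data from the slide.
Q = np.array([20, 35, 30, 47, 60, 68, 76, 90, 100, 105, 130, 140, 125, 120, 135.0])
L = np.array([10, 15, 21, 26, 40, 37, 42, 33, 30, 38, 60, 65, 50, 35, 42.0])
K = np.array([12, 10, 9, 8, 5, 7, 4, 5, 7, 5, 3, 4, 3, 1, 2.0])

# Design matrix with an intercept column
X = np.column_stack([np.ones(len(Q)), L, K])
beta, *_ = np.linalg.lstsq(X, Q, rcond=None)
print(beta)  # [b0-hat, b1-hat, b2-hat]
```

As a sanity check, with an intercept in the model the fitted values have the same mean as the observed Q (the residuals sum to zero).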
