DECS Cheat Sheet

The document covers fundamental concepts in probability, random variables, and statistical inference, including mutually exclusive events, conditional probability, and Bayes' Rule. It also discusses regression analysis, including linear regression, multiple regression, and the importance of understanding relationships between variables, as well as techniques for estimating and benchmarking. Additionally, it addresses issues like multicollinearity, influential observations, and the use of log and linear models in data analysis.

Probability

Mutually Exclusive / Disjoint Events - the events have no common elements and cannot happen at the same time. P(A or B) = P(A) + P(B)
Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)

Conditional Probability - The probability that event A happens, given event B has occurred
P(A|B) = P(A and B)/P(B)
If these are independent events, then P(A|B) = P(A)

Independent Events - when the outcome of one event does not change the likelihood of the other; P(A and B) = P(A)*P(B)

Bayes’ Rule
P(B|A) = P(A|B)*P(B) / [P(A|B)*P(B) + P(A|Bc)*P(Bc)]
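A quick numeric check of Bayes' Rule in Python (a minimal sketch; the prevalence and test rates below are made-up numbers, not from the course):

# Hypothetical example: B = condition of interest, A = positive signal.
p_B = 0.01            # P(B)
p_A_given_B = 0.95    # P(A|B)
p_A_given_Bc = 0.10   # P(A|Bc)

p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)   # total probability of A
p_B_given_A = p_A_given_B * p_B / p_A                # Bayes' Rule
print(round(p_B_given_A, 4))                         # ~0.0876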

Probability Trees

Random Variables
An uncertain numerical outcome
E(aX) = a*E(X)
E(X+Y) = E(X) + E(Y)

Expected Value in Excel = SUMPRODUCT(A range of x values, B range of probability)

Total Revenue Example: X sells for $500 and Y sells for $1200, so total revenue = 500X + 1200Y and E(500X + 1200Y) = 500E(X) + 1200E(Y)
Percentage Returns Example: We invest 60% of the portfolio in X and 40% in Y, so the portfolio return is 0.60X + 0.40Y and E(0.60X + 0.40Y) = 0.60E(X) + 0.40E(Y)

Variance - measures how spread out the outcomes are around the expected value


Var(X) = p1[x1 - E(X)]^2 + p2[x2 - E(X)]^2 + ... + pn[xn - E(X)]^2
Var(X+Y) = Var(X) + Var(Y) (only if X and Y are independent)
Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y) (in general)
Var(aX+bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
Var(aX) = a^2 Var(X)

Two possible investments with independent returns


Asset X: rate of return X, with E(X) = .12; Var(X) = .04
Asset Y: rate of return Y, with E(Y) = .05; Var(Y) = .01
Consider a 20-80 mix of X and Y
We define P, our portfolio's return, to be: P = .2X + .8Y
E(P) = .2E(X) + .8E(Y) = .2(.12) + .8(.05) = .064
Var(P) = .2^2 Var(X) + .8^2 Var(Y) = .04(.04) + .64(.01) = .008
Compared with holding Y alone, the combined portfolio has less risk and a greater expected return. Take the square root of the variance to get the standard deviation - the measure of risk.
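The same portfolio arithmetic in Python (a sketch reproducing the 20-80 example above; returns are assumed independent, so there is no covariance term):

e_x, var_x = 0.12, 0.04   # Asset X
e_y, var_y = 0.05, 0.01   # Asset Y
a, b = 0.2, 0.8           # portfolio weights

e_p = a * e_x + b * e_y               # E(P) = 0.064
var_p = a**2 * var_x + b**2 * var_y   # Var(P) = 0.008 (Cov = 0 by assumption)
sd_p = var_p ** 0.5                   # standard deviation = measure of risk, ~0.089
print(e_p, var_p, sd_p)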

Correlation, Gauss and CLT


Covariance of Random Variables
Covariance measures how X and Y vary together - the direction and strength of their co-movement. It is the basic way we measure relationships between random variables.
Cov(X,Y) = Σ P(X=x,Y=y) [x-E(X)][y-E(Y)]
- Can be small when the two random variables do not tend to move together very much, or they just don’t vary much at all
- Can be large when the two random variables generally move together quite closely or they vary an awful lot.
=COVARIANCE.P(A2:A6,B2:B6)

Correlation
A more reliable and intuitive (unit-free) measure of how strongly two variables move together
Corr(X,Y) = Cov(X,Y) / [SD(X)*SD(Y)]
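An illustrative check in Python (the x and y values are made up; this mirrors Excel's COVARIANCE.P and the Corr formula above):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

cov_p = np.cov(x, y, bias=True)[0, 1]        # population covariance, like COVARIANCE.P
corr = cov_p / (x.std() * y.std())           # Corr(X,Y) = Cov(X,Y) / [SD(X)*SD(Y)]
print(cov_p, corr, np.corrcoef(x, y)[0, 1])  # the last two values should match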

Normal Distribution
A normal random variable is completely described by its mean and standard deviation
- Adding or subtracting a constant keeps it normal: X + b ~ Normal(μ + b, σ)
- Multiplying/dividing by a constant keeps it normal: aX ~ Normal(aμ, aσ)
- Combining: Y = aX + b ~ Normal(aμ + b, aσ)
P(X ≤ 2) = NORM.DIST( 2, 5, 3.6, 1) or [x, μ, σ, Cumulative?]
P(X ≥ 20) = 1 –NORM.DIST(20, 10, 3, 1)
How short would a call have to be to be in the quickest 2% of all calls?
=NORM.INV(.02, 10, 3) or (p, μ, σ)
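Python equivalents of the three Excel calls above (a sketch using scipy, with the same μ and σ values):

from scipy.stats import norm

p_le_2 = norm.cdf(2, loc=5, scale=3.6)       # =NORM.DIST(2, 5, 3.6, 1)
p_ge_20 = 1 - norm.cdf(20, loc=10, scale=3)  # =1 - NORM.DIST(20, 10, 3, 1)
cutoff = norm.ppf(0.02, loc=10, scale=3)     # =NORM.INV(.02, 10, 3)
print(p_le_2, p_ge_20, cutoff)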

Z = (X - μ)/σ ~ Normal(0,1); P(-1.96 ≤ Z ≤ 1.96) = .9500


Standard Normal

CLT - Central Limit Theorem


For any distribution of X, when n is large the total of n independent trials of X is approximately normally distributed.

Sum ~ Normal(nμ, sqrt(n)·σ); where E(Sum) = nμ and Var(Sum) = nσ^2 - use when we only care about the SUM TOTAL of the trials.

Average ~ Normal(μ, σ/sqrt(n)); where E(Xbar) = μ and Var(Xbar) = σ^2/n

Sampling Distributions

Xbar~ Normal (μ, 𝜎/sqrt(n))
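A small simulation of the sampling distribution (a sketch; the exponential population and all numbers are arbitrary): even for a skewed population, Xbar is roughly Normal(μ, σ/sqrt(n)) when n is large.

import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 50, 10_000
samples = rng.exponential(scale=mu, size=(reps, n))  # skewed population with mean = sd = 2
xbar = samples.mean(axis=1)                          # 10,000 sample means

print(xbar.mean(), mu)                # close to μ
print(xbar.std(), mu / np.sqrt(n))    # close to σ/sqrt(n)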


Confidence Intervals

Suppose we know the average of n trials, Xbar. Then we are 95% confident that
the true population mean, μ, falls within the range: Xbar ± 1.96·σ/sqrt(n)

T-distribution - used when we don't know the population standard deviation.
The 95% confidence interval in the unknown-population-standard-deviation case is: Xbar ± t·s/sqrt(n)

You can get the t-value from Excel: T.INV.2T(two-tailed probability, n-1), e.g. T.INV.2T(0.05, n-1) for a 95% CI
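A t-based 95% CI in Python (a sketch that borrows the sample numbers from the hypothesis-testing example below: Xbar = 10,362, s = 2,287, n = 40):

import numpy as np
from scipy import stats

xbar, s, n = 10362, 2287, 40
t_crit = stats.t.ppf(0.975, df=n - 1)         # same t-value as T.INV.2T(0.05, 39)
half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)   # roughly (9630, 11093)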

Statistical Inference / Hypothesis Testing


If a certain condition (the null hypothesis) were met, how surprising would it be to see data like this? That degree of surprise is the significance level, or p-value.
Claim: The population mean exceeds $10,000.
Supporting evidence:
1. X bar= $10,362, s= $2,287, n= 40.
2. “Such a high sample mean would be surprising if the population mean did not exceed $10,000”
The null hypothesis (H0): the “default” claim.
The alternative hypothesis (Ha): the claim one is trying to establish (or “demonstrate” or “prove”).
For us:
H0: μ≤ 10,000
Ha: μ> 10,000
=T.DIST.RT(t, n-1) gives the probability of getting an average this high (or higher)
Conclude: If the population mean were 10,000, there would only be a 16.1% chance of getting a sample mean this large or larger. In our
case a p-value of 16% is not considered “small enough” to reject the null hypothesis. p > α, so we do not reject the null hypothesis.
The p-value of our test is p = 16.1%.

Other tests
=T.DIST(t, n-1, 1) if testing Ha: μ < the H0 value (left-tailed)
=T.DIST.2T(ABS(t), n-1) if testing Ha: μ ≠ the H0 value (two-tailed)
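Reproducing the example's p-value in Python (a sketch; same numbers as above, H0: μ ≤ 10,000 vs Ha: μ > 10,000):

import numpy as np
from scipy import stats

xbar, s, n, mu0 = 10362, 2287, 40, 10000
t = (xbar - mu0) / (s / np.sqrt(n))   # test statistic
p_right = stats.t.sf(t, df=n - 1)     # right-tail probability, like T.DIST.RT(t, n-1)
print(t, p_right)                     # p ≈ 0.161, so do not reject H0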

Linear Regression: Prediction


What single factor is the biggest determinant in the price of a home?

The Regression MODEL: Y = β0 + β1·X1 + ε

β0 - Y-intercept
β1 - Slope / rate of change between X and Y

Best-Fit Line - Least Squares Method


The Regression LINE –
ŷ = b0 + b1x1
Degrees of freedom = n-k-1 where k is the number of x variables
Statistical Significance ≠ Economic Significance

Root MSE - Root Mean Squared Error


Uses our data set to estimate the standard deviation of the error term

R-squared
Used to measure regression performance, but is NOT AS IMPORTANT AS IT SEEMS. It only measures the percentage of the variance of
Y that can be explained by the value of X.
- If high, it means the regression equation does a much better job predicting y than simply using the sample mean of y
- It is based on observations included in the regression -- only useful if the model that generates future observations is the same as the
model that generated your data.

Prediction Interval
When we want to be 95% confident about the outcome of one particular instance. KPREDINT: Estimate a single observation – only
LEVEL questions also gives prediction interval Whereas the confidence interval is for the mean. Kpredint - creates a prediction interval
for the sales in any one specific store with X=x. Prediction interval is always bigger than CI because SE(induv pred)2 = SE(mean) 2 +
SE(regression)2. The actual value of Y in a line of data (e.g. one specific store) is equal to the mean of the Y variable for other specific
stores with the same X variable levels plus an error term, which measures the effect of other variables impacting that specific data point.
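A sketch in Python/statsmodels of the CI-for-the-mean vs. prediction-interval distinction (the store data and variable names below are made up; KPREDINT itself is a Stata command):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"price": [30, 40, 50, 60, 70, 80],
                   "sales": [210, 190, 178, 150, 140, 120]})
fit = smf.ols("sales ~ price", data=df).fit()

pred = fit.get_prediction(pd.DataFrame({"price": [50]}))
# mean_ci_* columns = CI for the mean; obs_ci_* columns = (wider) prediction interval
print(pred.summary_frame(alpha=0.05))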

Regression: Estimating and Benchmarking


Rate of Change / Estimating
Instead of using a regression equation to make predictions of Y for a particular value of X, we want to use it to understand how changes
in X relate to changes in Y. The key here is the slope. It answers the question when X1 increases by ONE, by how much does Y
increase? Use Klincom.
- CAPM or firm specific risk, where we compare to a beta of 1
- Economics of Pricing

Linear Combinations
Klincom - lets you run hypotheses about linear combinations of regression coefficients. The output is the mean of y across the
population with that specific x.
KLINCOM: estimate the mean of a variable while controlling for other factors (for a LEVEL question (include the constant) or a CHANGE
question (don't include the constant) about the mean, OR for a CHANGE question about a SINGLE OBS), e.g. “klincom _b[_cons] +
_b[VAR1 Name]*(value level)”
To determine if a mean Y value is above e.g. 10: “klincom _b[_cons] + _b[VAR1]*(VALUE) - 10”; this returns the p-value for mean > 10

Benchmarking
Using the regression model to benchmark certain situations. Can use residuals as rankings to see how performance pans out given what
was expected.
- Run the regression
- predict residuals, residuals
- sort residuals - smallest to largest
- gsort -residuals - largest to smallest
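The same residual-ranking idea in Python (a sketch with made-up store data; in Stata you would use predict and gsort as above):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"store": ["A", "B", "C", "D"],
                   "sales": [250, 180, 300, 210],
                   "shoppers": [60, 50, 70, 55]})
fit = smf.ols("sales ~ shoppers", data=df).fit()
df["residual"] = fit.resid                            # performance vs. what the model expects
print(df.sort_values("residual", ascending=False))    # largest residual = best performance given expectations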

Multiple Regression
To solve for omitted variable bias and isolate the effect of a specific variable on the dependent variable, we include control variables. We
use this to approximate the conditions of a Randomized Controlled Trial.
Interpreting Coefficients [continuous variables]
Constant = the true mean of the y variable when the x variables are all equal to 0
Beta 1 = how much the true mean of the y variable increases when x1 increases by 1 unit, while all other x variables remain constant

Dummy Variables
Used to test whether the difference between the 2 categories is statistically significant. Like having different CONSTANTS for the 0 and 1
categories, BUT (without a slope dummy) any continuous variables have the same IMPACT (slope) across categories.
Generate a dummy in Stata: “generate black = 0”, then “replace black = 1 if color == "black"”. If there are N categories, create N-1 dummy
variables.
Interpreting coefficients when there is only one dummy variable: B0 = average (mean) of the Y variable when the dummy category = 0;
B0 + B1 = average (mean) of the Y variable when the dummy category = 1; B1 = difference in the Y variable between the two categories.
To get a CI for the omitted category: klincom _b[_cons]; for category 1: klincom _b[_cons] + _b[X1]*1.
Each coefficient is the difference between the mean of Y for that category and the mean of Y for the excluded category, holding all other
X-variables fixed. The choice of excluded category lets us easily TEST the significance of the difference between the excluded category
and each included one.
Slope Dummy Variables
Used to measure differences in slopes between the categories; in effect it combines two linear equations. Enables a DIFFERENT
IMPACT (slope) for the different categories; include one when you believe the change in y for a change in the continuous x-variable
differs across the categories of the dummy (if the impact is the same, just use a normal dummy). Stata: “generate
SlopeDummyName = (continuous variable name)*(dummy variable name)”
What is the difference between the estimated rooms for tourists and locals when price equals 50? Here we are asking if we think the
impact of being a tourist adds a price differential. Klincom tourist + 50*touristprice
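A sketch of the dummy + slope-dummy setup in Python (the rooms/price/tourist data are invented to mirror the question above; the t-test plays the role of klincom):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "rooms":   [20, 18, 15, 12, 30, 26, 21, 16],
    "price":   [40, 50, 60, 70, 40, 50, 60, 70],
    "tourist": [0, 0, 0, 0, 1, 1, 1, 1],
})
df["touristprice"] = df["tourist"] * df["price"]       # the slope dummy

fit = smf.ols("rooms ~ price + tourist + touristprice", data=df).fit()
print(fit.params)
print(fit.t_test("tourist + 50*touristprice = 0"))     # tourist-vs-local gap at price = 50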

Multicollinearity
The opposite of omitted variable bias: it is caused by variables included in the regression that vary with each other / are strongly related
to one another. Occurs when correlation among the independent variables is too high. It makes it more difficult to estimate the
impact of each variable individually on the Y variable, therefore p-values and SEs may be high.
- Test using VIF or Variance Inflation Factor to identify excessive multicollinearity. If the VIF > 10 then you have a serious
multicollinearity problem (.vif)
- Use the partial F-test (.testparm) to find out whether the collinear variables are jointly significant. If the p-value is small, then we can say that at least
one of the variables is significantly related to Y. But don't remove both variables.
- F-TEST: hypothesis test (returns a p-value) that at least one x-variable coefficient ≠ 0; like a p-value for the whole
regression; it takes into account the number of variables being tested and requires stronger evidence when there are more variables. If
the overall F-test is insignificant but an individual coefficient's p-value is low, that individual significance is likely spurious correlation
- As you add variables to your regression, the likelihood that one of the “junk” variables is statistically significant increases
- PARTIAL F-TEST: Tests if some of the variables in a regression are jointly statistically significant; e.g. if there is enough evidence
that X2 OR X3 are related to Y
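A VIF check in Python (a sketch; x1 and x2 are simulated to be nearly collinear so their VIFs come out large; Stata's .vif does the same after a regression):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # almost a copy of x1
x3 = rng.normal(size=100)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))   # x1 and x2 VIFs come out >> 10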

Influential Observations
Outlier (extreme y values) - if the y value of our data point is far away from the line. We can identify outliers by looking at residuals. We
define this as being more than about 1.96 standard deviations away from the line. About 5% of our data should be outliers.

.inflobs - command that helps identify the data points that may be having disproportionate influence on your regression.

Leverage (extreme x values) - a high-leverage point may have a residual near zero, but it is lonely in its x value. Such points tend to shape the regression line more than other
data points.

Cook’s D - How much a single data point moves the regression line.

Extrapolation
Trying to predict a value of Y for an X outside the range of the data used to fit the model

Log v. Linear Models


If you plot rvfplot and there is a pattern, the relationship is likely not linear. Two ways to deal with nonlinear functions:
quadratic (applied to one or more x's) and logarithmic (applied to y alone, or to both y and the x's).

Quadratic

Semi-log - a log tends to pop up when the impact of X is multiplicative rather than additive. Only log the Y variable. Typical cases:
- Diminishing returns
- Learning curves
- Growth rates
- Price elasticity

The semi-log model is: logY = β0 + β1·X


To use KLINCOM or KPREDINT in a semi-log model, use the NORMAL UNITS for the X variables in your command.
REMEMBER! Then convert your prediction back to original units AND multiply by the correction factor exp((SE of regression)^2 / 2) (this
applies to estimates and CI levels). Only estimates around the mean need the correction factor, NOT individual predictions!
In Stata: di exp(prediction#)*exp((e(rmse)^2)/2)
Interpreting the coefficient in a semi-log model: each 1-unit increase in the X1 variable is associated with an approximately (100*B1)% change in
the Y variable
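A semi-log sketch in Python (made-up x/y data; it shows the coefficient interpretation and the exp(RMSE^2/2) correction when converting a mean estimate back to original units):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [110, 135, 160, 205, 240, 300, 360, 450]})
fit = smf.ols("np.log(y) ~ x", data=df).fit()

print(fit.params["x"] * 100)                               # approx. % change in y per 1-unit change in x
log_pred = fit.predict(pd.DataFrame({"x": [5]})).iloc[0]   # prediction in log units
rmse = np.sqrt(fit.mse_resid)                              # SE of the regression
print(np.exp(log_pred) * np.exp(rmse**2 / 2))              # corrected estimate of mean y at x = 5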

Log-Log
The log-log model is: logY = β0 + β1·logX
To use KLINCOM or KPREDINT you need to take the log of the X-variable value you want to test (e.g. if you want to test 2,
multiply the coefficient by ln(2) ≈ 0.69).
REMEMBER! Then convert your prediction back to original units AND multiply by the correction factor exp((SE of regression)^2 / 2) (this
applies to estimates and CI levels). Only estimates around the mean need the correction factor, NOT individual predictions!
Interpreting the coefficient in a log-log model: each 1% increase in X1 is associated with an approx. B1% change in the Y variable.
B1 is also called the elasticity of the Y variable with respect to the X variable.
UnitCost = a(Experience)b --> ln(UnitCost) = ln(a) + b ln(Experience)

Correction Factor
Logging and exponentiating creates a bias in the average, so we can correct that bias by multiplying fitted values and CI (not PI) by a
correction factor
. display exp(e(rmse)^2/2)

Error Assumptions in Regression


In order to compute the standard errors of the estimated coefficients in regressions, we have to make two assumptions about the errors:

Homoskedasticity and Cluster-Robust Standard Errors


The distribution (variation of individual observations around the regression line) of each observation's error is identical. The model
violates this and is heteroskedastic if the magnitudes of the errors are related to any x. The coefficient estimates are still fine, but SEs,
CIs/PIs and p-values are wrong.
- .rvfplot, and see if there is a pattern / cone shape
- .hettest is a better way. If the p-value is significant, the model is heteroskedastic
- Logs can correct for this. If logs still don't help, add vce(robust) to the end of the regression command
- . regress sales shoppers, vce(robust)
- Cluster-robust SEs (see Independence below) reduce the effective sample size to the number of groups, so you typically need a large
number of groups (>20). You still need linearity, homoskedasticity, and independence within groups after using cluster-robust SEs

Independence
Assumption: residuals are independent; violated when data is “groupy” (panel) AND you aren't using a fixed effects regression. Each error
must be independent of every other error. This fails under autocorrelation, e.g. in a time series.
- Identify it via a pattern in the residual plot
- Can correct by making the dependent variable the change in y from period to period, or by clustering. In Stata: add “, cluster(GroupVariableName)” at the
end of your regression command
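A sketch of these Stata options in Python/statsmodels (sales, shoppers and store are made-up names and data, and only two clusters are used for brevity; in practice you want many more groups, as noted above). The coefficients are identical; only the standard errors change:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":    [200, 220, 250, 260, 150, 160, 180, 190],
    "shoppers": [50, 55, 60, 65, 30, 33, 40, 42],
    "store":    [1, 1, 1, 1, 2, 2, 2, 2],
})
model = smf.ols("sales ~ shoppers", data=df)
robust = model.fit(cov_type="HC1")                        # like regress ..., vce(robust)
clustered = model.fit(cov_type="cluster",
                      cov_kwds={"groups": df["store"]})   # like regress ..., cluster(store)
print(robust.bse)
print(clustered.bse)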

Omitted Variable Bias (OVB)


OVB occurs when the coefficient of an x-variable is interpreted incorrectly because the regression leaves out an important factor that is related to both
the X1 variable and the Y variable (therefore, the coefficient captures not only the direct impact of X1, but also the impact of X1 on Y through X2)
Consider these regressions:
- Y = B0* +B1*(X1 var)
- Y = B0 + B1(X1 var) + B2(X2 var)
- X2 var = A0 + A1*(X1 var)
OVB = B1* - B1 = A1*B2
OVB = (relationship b/w X1 and omitted variable)*(relationship b/w Y and omitted variable)
- If negative OVB, then B1* - B1 < 0 ; B1* < B1
- The coefficient on an X variable measures DIFFERENT things depending on what other variables are in a regression
- Business question may mean only include one variable (i.e. don’t care how a company reduces energy cost, just care that it does, so
would use just energy cost as the X variable)
Example:
scores = 30 + 8*study
scores = -70 + 12*study + 30*IQ
What can you conjecture about the relationship between IQ and the hours devoting to studying among your older peers?

We can say that there is a negative relationship (quantified more exactly below) between how long a student studied for the midterm and the
student’s IQ. Explanation: Comparing the two regressions, we see that the OVB on the coefficient of study from omitting IQ is equal to 8 – 12 = -4,
which is negative. This OVB is the product of (1) the relation between study time and IQ and (2) the relation between IQ and scores when study time
is held fixed. The latter is given by the 30 estimated coefficient on IQ in the second regression. Thus -4 = (?)*(30), so that the ? representing the
relation between study time and IQ must equal -4/30 ≈ -0.133, meaning that in this data an additional hour of study time is associated with an IQ
lower by 0.133 points. In other words, in our data, students with lower IQs tended to study more for the midterm.

Solutions to OVB:
- Find data for the omitted variables (however, there may be omitted factors we don't think of)
- Use fixed effects - requires panel data (see the Panel Data & Fixed Effects section below)
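A simulation of the OVB identity above (a sketch; the coefficients 0.5, 2, and 3 are arbitrary): omitting x2 biases the coefficient on x1 by roughly A1*B2.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)             # A1 = 0.5
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)   # B1 = 2, B2 = 3

short = sm.OLS(y, sm.add_constant(x1)).fit()                        # omits x2
long = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # includes x2
print(short.params[1] - long.params[1], 0.5 * 3)                    # both roughly 1.5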

Panel Data Fixed Effects Models


Cross-sectional data is data collected from one point in time across different subjects; panel data is data with multiple dimensions (often
many subjects) over many periods of time (e.g. multiple stock values over time)
- Cross-section studies are particularly vulnerable to OVB because the data come from a single point in time
- This can be corrected by using panel data and implementing a fixed effects regression. When have panel data, we can use
dummy variables (Fixed Effect) to eliminate OVB. Fixed effects models focus on this within-group variation
- Need to assume the effect of the X variable on the Y variable is the SAME across the different 'panels' (e.g. people, regions, stocks,
etc.); then you can create dummy variables for each of these categories/panels/groups. If you DO NOT want this assumption, you also
need to use a slope dummy
- We regress sales on bonus and include in the regression dummy variables for the regions (remembering to omit one region). Look for
groupiness.
- xi: regress sales bonus i.region i.year
- The fixed effects (the region dummies) soak up all between group variation. In doing so, they pick up the combined effects of all
factors that differ across groups but remain constant over time within each group.
- Regressions based on cross-section data are vulnerable to OVB
- Panel data includes both between-group and “within”-group variation. Between variation is the difference in the mean values of
variables between each group of data (e.g., each individual, firm, etc.)
- Within variation is the variation around the mean (over time) within each group
- FE regression focuses on the effects of within-group variation. FE regression eliminates OVB due to the effects of unobserved
between-group differences
- Some OVB may remain, but only if unobservables vary over time within groups, and this variation is correlated with the variation over
time of the variables in your regression
- When you create a fixed effect, the interpretation of the coefficients on your variables of interest is now “controlled for within one
group” (e.g. within one store, one stock, one person, etc.)
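A fixed-effects-by-dummies sketch in Python (region/year names and numbers are invented; C() builds the dummy variables and drops one category per group, like Stata's i.region i.year):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":  [100, 120, 90, 110, 140, 160, 130, 150],
    "bonus":  [10, 15, 8, 12, 20, 25, 18, 22],
    "region": ["N", "N", "S", "S", "N", "N", "S", "S"],
    "year":   [2020, 2021, 2020, 2021, 2020, 2021, 2020, 2021],
})
fit = smf.ols("sales ~ bonus + C(region) + C(year)", data=df).fit()
print(fit.params)   # the bonus coefficient is now a within region/year effect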

Building Models
Convincing results have five features:
1. The coefficients are unbiased (or known biases do not pose a problem)
2. The model is parsimonious
3. The standard errors (and associated confidence and prediction intervals) are correctly estimated
4. The results are robust
5. When required, the model is well identified, meaning that we can sort out cause and effect

Problem 3: Simple re-greasin’


Run a regression of netprofits versus repairs and write out the regression equation below.

The shop is considering installing new “puncture resistant” tires on their Simple bikes. The new tires will cost an extra $45 per pair but will reduce the average number of visits in the first year by 2.5. Is this upgrade a good
idea for the bike shop? Justify your recommendation.

. klincom -2.5*repairs - 45

( 1) - 2.5*repairs = 45

------------------------------------------------------------------------------
netprofits | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 10.55826 5.146492 2.05 0.043 .3621264 20.75438
------------------------------------------------------------------------------

If Ha: < then Pr(T < t) = .979

If Ha: not = then Pr(|T| > |t|) = .043

If Ha: > then Pr(T > t) = .021


We should reject the null hypothesis. This upgrade is a good idea. We expect profits to go up by $10.56 per customer.

The non-profit organization is considering an aggressive new marketing campaign to encourage attendance at the farmer's market. They believe that the campaign will increase attendance by 50 people per week on average,
but they will have to increase rents for each seller. The Aloha Honey Co would have to pay an extra $150 per week to support the campaign. Should the company support this new campaign? Explain.

. klincom 50*shoppers -150
