• The slope is the difference in the expected values of Y, for two
values of X that differ by one unit
• The estimated regression can be used either for:
– causal inference (learning about the causal effect on Y of a change in X)
– prediction (predicting the value of Y given X, for an observation not in
the data set)
• Causal inference and prediction place different requirements on the data, but both use the same regression toolkit.
The problem of statistical inference for linear regression is, at a general
level, the same as for estimation of the mean or of the differences between
two means. Statistical, or econometric, inference about the slope entails:
• Estimation:
– How should we draw a line through the data to estimate the population
slope?
Answer: ordinary least squares (OLS).
– What are the advantages and disadvantages of OLS?
• Hypothesis testing:
– How to test whether the slope is zero?
• Confidence intervals:
– How to construct a confidence interval for the slope?
The Linear Regression Model (SW Section 4.1)
The population regression line:

    Test Score = β₀ + β₁ × STR

β₁ = slope of the population regression line

Why are β₀ and β₁ “population” parameters?
• We would like to know the population value of β₁.
• We don’t know β₁, so we must estimate it using data.
The Population Linear Regression Model
    Yᵢ = β₀ + β₁Xᵢ + uᵢ,  i = 1, …, n

• We have n observations, (Xᵢ, Yᵢ), i = 1, …, n.
• X is the independent variable or regressor
• Y is the dependent variable
• β₀ = intercept
• β₁ = slope
• uᵢ = the regression error
• The regression error consists of omitted factors. In general, these omitted
factors are other factors that influence Y, other than the variable X. The
regression error also includes error in the measurement of Y.
The population regression model in a picture: observations on Y and X (n = 7); the population regression line; and the regression error (the “error term”). [Figure omitted.]
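As a concrete illustration (not part of the original slides), here is a minimal Python sketch that simulates the population model Yᵢ = β₀ + β₁Xᵢ + uᵢ for n = 7 districts, as in the picture. The parameter values are borrowed from the estimates reported later in these slides; everything else is made up.

import numpy as np

rng = np.random.default_rng(0)

beta_0, beta_1 = 698.9, -2.28      # illustrative values from the estimates below
n = 7                              # matches the n = 7 observations in the picture

X = rng.uniform(15.0, 25.0, size=n)   # hypothetical student-teacher ratios
u = rng.normal(0.0, 18.6, size=n)     # regression error (spread ~ the SER reported later)
Y = beta_0 + beta_1 * X + u           # the population linear regression model

for x, y in zip(X, Y):
    print(f"STR = {x:5.2f}   Test Score = {y:6.1f}")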
The Ordinary Least Squares Estimator
(SW Section 4.2)
How can we estimate β₀ and β₁ from data?

Recall that Ȳ was the least squares estimator of μ_Y: Ȳ solves

    min_m Σ_{i=1}^n (Yᵢ − m)²

By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters β₀ and β₁. The OLS estimator solves

    min_{b₀,b₁} Σ_{i=1}^n [Yᵢ − (b₀ + b₁Xᵢ)]²
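To see that this really is just a minimization problem, the sketch below (an illustration on simulated data, not the textbook’s method) hands the sum of squared residuals to a generic numerical optimizer; the closed-form OLS formulas in Key Concept 4.2 below give the same answer.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

def ssr(b):
    """Sum of squared residuals for a candidate intercept b[0] and slope b[1]."""
    return np.sum((Y - (b[0] + b[1] * X)) ** 2)

# Start the search at (sample mean of Y, slope 0) and let the optimizer run.
result = minimize(ssr, x0=np.array([Y.mean(), 0.0]))
print("numerical OLS estimates (b0, b1):", result.x)  # close to (698.9, -2.28)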
Mechanics of OLS

The population regression line: E(Test Score | STR) = β₀ + β₁ × STR

    β₁ = slope = ??

The OLS estimator solves:

    min_{b₀,b₁} Σ_{i=1}^n [Yᵢ − (b₀ + b₁Xᵢ)]²
• The OLS estimator minimizes the average squared difference between the actual values of Yᵢ and the prediction (“predicted value”) based on the estimated line.
• This minimization problem can be solved using calculus (App. 4.2).
• The result is the OLS estimators of β₀ and β₁.
Key Concept 4.2: The OLS Estimator,
Predicted Values, and Residuals
The OLS estimators of the slope β₁ and the intercept β₀ are

    β̂₁ = Σ_{i=1}^n (Xᵢ − X̄)(Yᵢ − Ȳ) / Σ_{i=1}^n (Xᵢ − X̄)²  =  s_XY / s²_X        (4.7)

    β̂₀ = Ȳ − β̂₁X̄.        (4.8)

The OLS predicted values Ŷᵢ and residuals ûᵢ are

    Ŷᵢ = β̂₀ + β̂₁Xᵢ,  i = 1, …, n        (4.9)

    ûᵢ = Yᵢ − Ŷᵢ,  i = 1, …, n.        (4.10)

The estimated intercept (β̂₀), slope (β̂₁), and residuals (ûᵢ) are computed from a sample of n observations of Xᵢ and Yᵢ, i = 1, …, n. These are estimates of the unknown true population intercept (β₀), slope (β₁), and error term (uᵢ).
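Equations (4.7)–(4.10) translate line by line into code. The following sketch uses simulated data (not the California data set) purely to show the mechanics:

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

Xbar, Ybar = X.mean(), Y.mean()

beta1_hat = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)  # eq. (4.7)
beta0_hat = Ybar - beta1_hat * Xbar                                    # eq. (4.8)

Y_hat = beta0_hat + beta1_hat * X   # predicted values, eq. (4.9)
u_hat = Y - Y_hat                   # residuals, eq. (4.10)

print(f"intercept = {beta0_hat:.1f}, slope = {beta1_hat:.2f}")
print(f"sum of residuals = {u_hat.sum():.2e}")  # zero up to rounding error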
Application to the California Test Score – Class Size data

• Estimated slope: β̂₁ = −2.28
• Estimated intercept: β̂₀ = 698.9
• Estimated regression line: TestScore^ = 698.9 − 2.28 × STR (the “^” denotes an OLS predicted value)
Interpretation of the estimated slope and
intercept
    TestScore^ = 698.9 − 2.28 × STR

• Districts with one more student per teacher on average have test scores that are 2.28 points lower.
• That is, ΔE(Test Score | STR) / ΔSTR = −2.28
• The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9. But this interpretation of the intercept makes no sense: it extrapolates the line outside the range of the data; here, the intercept is not economically meaningful.
Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and Test Score = 657.8
    predicted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8

    residual: û_Antelope = 657.8 − 654.8 = 3.0
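The same arithmetic in a few lines of Python, using the coefficients of the estimated line above:

beta0_hat, beta1_hat = 698.9, -2.28            # from the estimated regression line
str_antelope, score_antelope = 19.33, 657.8    # Antelope, CA

y_hat = beta0_hat + beta1_hat * str_antelope   # predicted value: about 654.8
u_hat = score_antelope - y_hat                 # residual: about 3.0
print(f"predicted = {y_hat:.1f}, residual = {u_hat:.1f}")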
OLS regression: STATA output
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
    TestScore^ = 698.9 − 2.28 × STR

(We’ll discuss the rest of this output later.)
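For readers working in Python rather than Stata, roughly the same regression could be run with statsmodels, as sketched below. The file name caschool.csv is an assumption about how the data might be stored, and Stata’s `, robust` option corresponds to HC1 heteroskedasticity-robust standard errors.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("caschool.csv")  # hypothetical file containing testscr and str

# cov_type="HC1" reproduces Stata's robust standard errors; the column named
# "str" shadows Python's built-in str inside the formula, which is fine here.
model = smf.ols("testscr ~ str", data=df).fit(cov_type="HC1")
print(model.summary())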
Measures of Fit (SW Section 4.3)
Two regression statistics provide complementary measures of how
well the regression line “fits” or explains the data:
• The regression R² measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit).
• The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.
The regression R² is the fraction of the sample variance of Yᵢ “explained” by the regression:

    Yᵢ = Ŷᵢ + ûᵢ  =  OLS prediction + OLS residual

    sample var(Yᵢ) = sample var(Ŷᵢ) + sample var(ûᵢ)   (why?)

    total sum of squares = “explained” SS + “residual” SS

Definition of R²:

    R² = ESS / TSS = [ Σ_{i=1}^n (Ŷᵢ − Ȳ)² ] / [ Σ_{i=1}^n (Yᵢ − Ȳ)² ]

(The sample mean of the Ŷᵢ equals Ȳ, so Ȳ appears in both sums.)

• R² = 0 means ESS = 0
• R² = 1 means ESS = TSS
• 0 ≤ R² ≤ 1
• For regression with a single X, R² = the square of the correlation coefficient between X and Y
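As a quick check of the definition, the sketch below computes R² = ESS/TSS on simulated data (np.polyfit is just a convenient way to get the OLS fit) and verifies the last bullet above: with a single regressor, R² equals the squared correlation of X and Y.

import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

b1, b0 = np.polyfit(X, Y, 1)   # OLS slope and intercept
Y_hat = b0 + b1 * X

ESS = np.sum((Y_hat - Y_hat.mean()) ** 2)  # explained sum of squares
TSS = np.sum((Y - Y.mean()) ** 2)          # total sum of squares
print(f"R^2 = {ESS / TSS:.3f}")
print(f"corr(X, Y)^2 = {np.corrcoef(X, Y)[0, 1] ** 2:.3f}")  # identical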
The Standard Error of the Regression (SER)
The SER measures the spread of the distribution of u. The SER is
(almost) the sample standard deviation of the OLS residuals:
    SER = √[ (1/(n − 2)) Σ_{i=1}^n (ûᵢ − mean(û))² ]  =  √[ (1/(n − 2)) Σ_{i=1}^n ûᵢ² ]

The second equality holds because mean(û) = (1/n) Σ_{i=1}^n ûᵢ = 0.
The SER:
• has the units of u, which are the units of Y
• measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

    RMSE = √[ (1/n) Σ_{i=1}^n ûᵢ² ]

This measures the same thing as the SER; the minor difference is division by 1/n instead of 1/(n − 2).
Technical note: why divide by n − 2 instead of n − 1?

    SER = √[ (1/(n − 2)) Σ_{i=1}^n ûᵢ² ]

• Division by n − 2 is a “degrees of freedom” correction, just like division by n − 1 in s²_Y, except that for the SER two parameters have been estimated (β₀ and β₁, by β̂₀ and β̂₁), whereas in s²_Y only one has been estimated (μ_Y, by Ȳ).
• When n is large, it doesn’t matter whether n, n − 1, or n − 2 is used, although the conventional formula uses n − 2 when there is a single regressor.
• For details, see Section 18.4.
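A short sketch on simulated data makes the point: the SER and the RMSE differ only in the divisor, and for n = 100 the difference is already negligible.

import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.uniform(15.0, 25.0, size=n)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=n)  # simulated data

b1, b0 = np.polyfit(X, Y, 1)
u_hat = Y - (b0 + b1 * X)                     # OLS residuals

SER = np.sqrt(np.sum(u_hat ** 2) / (n - 2))   # degrees-of-freedom correction
RMSE = np.sqrt(np.sum(u_hat ** 2) / n)        # divide by n instead
print(f"SER = {SER:.2f}, RMSE = {RMSE:.2f}")  # nearly identical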
Example of the R² and the SER

    TestScore^ = 698.9 − 2.28 × STR,  R² = .05,  SER = 18.6

STR explains only a small fraction of the variation in test scores. Does this make sense? Does this mean that STR is unimportant in a policy sense?
The Least Squares Assumptions for
Causal Inference (SW Section 4.4)
• So far we have treated OLS as a way to draw a straight line
through the data on Y and X. Under what conditions does the
slope of this line have a causal interpretation? That is, when will
the OLS estimator be unbiased for the causal effect on Y of X?
• What is the variance of the OLS estimator over repeated
samples?
• To answer these questions, we need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme).
• These assumptions (there are three) are known as the Least Squares Assumptions for Causal Inference.
Definition of Causal Effect
• The causal effect on Y of a unit change in X is the expected
difference in Y as measured in a randomized controlled
experiment
– For a binary treatment, the causal effect is the expected difference in
means between the treatment and control groups, as discussed in Ch. 3
• With a binary treatment, for the difference in means to measure a causal effect, random assignment or as-if random assignment is required.
– Random assignment ensures that …