The omitted variable bias formula (2 of 2)
• If an omitted variable Z is both:
1. a determinant of Y (that is, it is contained in u); and
2. correlated with X,
then ρ_Xu ≠ 0 and the OLS estimator β̂_1 is biased and is not consistent.
• For example, districts with few ESL students (1) do better on standardized tests and (2) have smaller classes (bigger budgets), so ignoring the ESL-student factor would result in overstating the class size effect. Is this actually going on in the CA data?
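To see the bias mechanically, here is a minimal simulation sketch (Python with numpy and statsmodels; all variable names and parameter values are illustrative, not from the text). Z determines Y and is correlated with X, so the short regression of Y on X alone misses the true coefficient:

import numpy as np
import statsmodels.api as sm

# Illustrative data-generating process: Z satisfies both OVB conditions.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)                   # Z is correlated with X
y = 1.0 + 2.0 * x - 1.5 * z + rng.normal(size=n)   # Z determines Y; true beta_1 = 2.0

short = sm.OLS(y, sm.add_constant(x)).fit()                       # omits Z
both = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # includes Z

print(short.params[1])  # about 1.4: biased, and the bias does not vanish as n grows
print(both.params[1])   # about 2.0: including Z removes the bias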
TABLE 6.1 Differences in Test Scores for California School Districts with Low and High Student–Teacher Ratios, by the Percentage of English Learners in the District

                                  STR < 20                 STR ≥ 20                 Difference, Low vs. High STR
                                  Average Test Score   n   Average Test Score   n   Difference   t-statistic
All districts                     657.4              238   650.0              182    7.4          4.04
Percentage of English learners:
  < 1.9%                          664.5               76   665.4               27   −0.9         −0.30
  1.9–8.8%                        665.2               64   661.8               44    3.3          1.13
  8.8–23.0%                       654.9               54   649.7               50    5.2          1.72
  > 23.0%                         636.7               44   634.8               61    1.9          0.68
• Districts with fewer English Learners have higher test scores
• Districts with lower percent EL (PctEL) have smaller classes
• Among districts with comparable PctEL, the effect of class size is small
(recall overall “test score gap” = 7.4)
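The cross tabulation above can be reproduced in a few lines. A sketch, assuming the California data are loaded in a pandas DataFrame df with columns testscr, str, and el_pct (hypothetical column names):

import pandas as pd

# Stratify districts by PctEL, then compare mean test scores for
# small (STR < 20) vs. large (STR >= 20) classes within each stratum.
df["small_class"] = df["str"] < 20
df["el_group"] = pd.cut(df["el_pct"], bins=[0, 1.9, 8.8, 23.0, 100],
                        labels=["< 1.9%", "1.9-8.8%", "8.8-23.0%", "> 23.0%"])

table = df.groupby(["el_group", "small_class"])["testscr"].mean().unstack()
table["difference"] = table[True] - table[False]  # small-minus-large, as in Table 6.1
print(table)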
Using regression to estimate causal effects
• The test score/STR/fraction English Learners example shows that, if an omitted variable satisfies the two conditions for omitted variable bias, then the OLS estimator in the regression omitting that variable is biased and inconsistent. So, even if n is large, β̂_1 will not be close to β_1.
• We have distinguished between two uses of regression: for prediction,
and to estimate causal effects.
– Regression also can be used simply to summarize the data without attaching any
meaning to the coefficients or for any other purpose, but we won’t focus on this
use.
• In the class size application, we clearly are interested in a causal effect:
what do we expect to happen to test scores if the superintendent reduces
the class size?
What, precisely, is a causal effect?
• “Causality” is a complex concept!
• In this course, we take a practical approach to defining causality:
A causal effect is defined to be the effect measured in an ideal
randomized controlled experiment.
Ideal Randomized Controlled Experiment
• Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in reporting, etc.!
• Randomized: subjects from the population of interest are
randomly assigned to a treatment or control group (so there are
no confounding factors)
• Controlled: having a control group permits measuring the
differential effect of the treatment
• Experiment: the treatment is assigned as part of the experiment:
the subjects have no choice, so there is no “reverse causality” in
which subjects choose the treatment they think will work best.
Back to class size:
Imagine an ideal randomized controlled experiment for measuring
the effect on Test Score of reducing STR…
• In that experiment, students would be randomly assigned to
classes, which would have different sizes.
• Because they are randomly assigned, all student characteristics (and thus u_i) would be distributed independently of STR_i.
• Thus, E(u_i|STR_i) = 0 – that is, LSA #1 holds in a randomized controlled experiment.
How does our observational data differ from
this ideal? (1 of 2)
• The treatment is not randomly assigned
• Consider PctEL – the percent of English learners in the district. It plausibly satisfies the two criteria for omitted variable bias: Z = PctEL is:
1. a determinant of Y; and
2. correlated with the regressor X.
• Thus, the “control” and “treatment” groups differ in a systematic way, so corr(STR, PctEL) ≠ 0.
How does our observational data differ from
this ideal? (2 of 2)
• Randomization implies that any differences between the treatment and control groups are random – not systematically related to the treatment.
• We can eliminate the difference in PctEL between the large class
(control) and small class (treatment) groups by examining the
effect of class size among districts with the same PctEL.
– If the only systematic difference between the large and small class size groups is in PctEL, then we are back to the randomized controlled experiment – within each PctEL group.
– This is one way to “control” for the effect of PctEL when estimating the
effect of STR.
Return to omitted variable bias
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment (STR) is
randomly assigned: then PctEL is still a determinant of TestScore, but
PctEL is uncorrelated with STR. (This solution to OV bias is rarely
feasible.)
2. Adopt the “cross tabulation” approach, with finer gradations of STR and PctEL – within each group, all classes have the same PctEL, so we control for PctEL. (But soon you will run out of data, and what about other determinants like family income and parental education?)
3. Use a regression in which the omitted variable (PctEL) is no longer
omitted: include PctEL as an additional regressor in a multiple
regression.
The Population Multiple Regression Model
(SW Section 6.2)
• Consider the case of two regressors:
Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i,  i = 1,…,n
• Y is the dependent variable
• X_1, X_2 are the two independent variables (regressors)
• (Y_i, X_1i, X_2i) denote the i-th observation on Y, X_1, and X_2.
• β_0 = unknown population intercept
• β_1 = effect on Y of a change in X_1, holding X_2 constant
• β_2 = effect on Y of a change in X_2, holding X_1 constant
• u_i = the regression error (omitted factors)
Interpretation of coefficients in multiple
regression (1 of 2)
Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i,  i = 1,…,n
Consider the difference in the expected value of Y for two values of X_1, holding X_2 constant:
Population regression line when X_1 = X_1,0:
Y = β_0 + β_1 X_1,0 + β_2 X_2
Population regression line when X_1 = X_1,0 + ΔX_1:
Y + ΔY = β_0 + β_1(X_1,0 + ΔX_1) + β_2 X_2
Interpretation of coefficients in multiple
regression (2 of 2)
Before:     Y = β_0 + β_1 X_1,0 + β_2 X_2
After:      Y + ΔY = β_0 + β_1(X_1,0 + ΔX_1) + β_2 X_2
Difference: ΔY = β_1 ΔX_1
So:
β_1 = ΔY/ΔX_1, holding X_2 constant
β_2 = ΔY/ΔX_2, holding X_1 constant
β_0 = predicted value of Y when X_1 = X_2 = 0.
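For example, using the California estimates reported below (β_1 ≈ −1.10 on STR, holding PctEL constant), cutting the student–teacher ratio by 2 changes the predicted test score by ΔY = β_1·ΔX_1 = (−1.10)(−2) = +2.2 points.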
The OLS Estimator in Multiple Regression
(SW Section 6.3)
• With two regressors, the OLS estimator solves:
min over (b_0, b_1, b_2) of Σ_{i=1}^n [Y_i − (b_0 + b_1 X_1i + b_2 X_2i)]²
• The OLS estimator minimizes the average squared difference between the actual values of Y_i and the prediction (predicted value) based on the estimated line.
• This minimization problem is solved using calculus.
• This yields the OLS estimators of β_0, β_1, and β_2.
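In matrix form the solution to this minimization is β̂ = (X′X)⁻¹X′Y. A minimal numpy sketch on simulated data (parameter values are illustrative, loosely echoing the California estimates below; this is not the actual dataset):

import numpy as np

# Simulate Y_i = beta_0 + beta_1*X_1i + beta_2*X_2i + u_i with known betas.
rng = np.random.default_rng(1)
n = 420
x1 = rng.normal(20, 2, size=n)    # e.g., a student-teacher ratio
x2 = rng.normal(15, 10, size=n)   # e.g., a percent English learners
y = 686.0 - 1.1 * x1 - 0.65 * x2 + rng.normal(0, 14, size=n)

# OLS via the normal equations (X'X) b = X'Y.
X = np.column_stack([np.ones(n), x1, x2])      # intercept plus two regressors
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                                # approximately (686, -1.1, -0.65)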
Example: the California test score data
Regression of TestScore against STR:
TestScore-hat = 698.9 − 2.28·STR
Now include percent English Learners in the district (PctEL):
TestScore-hat = 686.0 − 1.10·STR − 0.65·PctEL
• What happens to the coefficient on STR?
• (Note: corr(STR, PctEL) = 0.19)
Multiple regression in STATA
reg testscr str pctel, robust;
Regression with robust standard errors Number of obs = 420
F( 2, 417) = 223.82
Prob > F = 0.0000
R-squared = 0.4264
Root MSE = 14.464
-----------------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+--------------------------------------------------------------------------
str | −1.101296 .4328472 −2.54 0.011 −1.95213 −.2504616
pctel | −.6497768 .0310318 −20.94 0.000 −.710775 −.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
-----------------------------------------------------------------------------------------
TestScore-hat = 686.0 − 1.10·STR − 0.65·PctEL
More on this printout later…
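A rough Python equivalent of this Stata command, assuming the same data sit in a pandas DataFrame df with columns testscr, str, and pctel (hypothetical names); Stata's robust option corresponds to HC1 standard errors in statsmodels:

import statsmodels.formula.api as smf

# Sketch: reproduce `reg testscr str pctel, robust`.
results = smf.ols("testscr ~ str + pctel", data=df).fit(cov_type="HC1")
print(results.summary())  # coefficients and robust SEs comparable to the printout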
Measures of Fit for Multiple Regression
(SW Section 6.4) (1 of 2)
Actual = predicted + residual: Y_i = Ŷ_i + û_i
SER = std. deviation of û_i (with d.f. correction)
RMSE = std. deviation of û_i (without d.f. correction)
R² = fraction of variance of Y explained by X
R̄² = “adjusted R²” = R² with a degrees-of-freedom correction that adjusts for estimation uncertainty; R̄² < R²
SER and RMSE
As in regression with a single regressor, the SER and the RMSE are
measures of the spread of the Ys around the regression line:
SER = √[ (1/(n − k − 1)) Σ_{i=1}^n û_i² ]
RMSE = √[ (1/n) Σ_{i=1}^n û_i² ]
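A direct numpy translation of the two formulas, assuming a residual vector u_hat from a regression with k regressors (names are illustrative):

import numpy as np

def ser(u_hat, k):
    # with degrees-of-freedom correction: divide by n - k - 1
    n = len(u_hat)
    return np.sqrt(np.sum(u_hat**2) / (n - k - 1))

def rmse(u_hat):
    # without degrees-of-freedom correction: divide by n
    return np.sqrt(np.mean(u_hat**2))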
R² and R̄² (adjusted R²) (1 of 2)
The R² is the fraction of the variance explained – same definition as in regression with a single regressor:
R² = ESS/TSS = 1 − SSR/TSS,
where ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)², SSR = Σ_{i=1}^n û_i², TSS = Σ_{i=1}^n (Y_i − Ȳ)².
• The R² always increases when you add another regressor (why?) – a bit of a problem for a measure of “fit”
R² and R̄² (adjusted R²) (2 of 2)
The R̄² (the “adjusted R²”) corrects this problem by “penalizing” you for including another regressor – the R̄² does not necessarily increase when you add another regressor.
Adjusted R²: R̄² = 1 − [(n − 1)/(n − k − 1)] × (SSR/TSS)
Note that R̄² < R²; however, if n is large the two will be very close.
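Both fit measures, written out from the ESS/SSR/TSS definitions above (a sketch; y is the outcome vector, y_hat the fitted values, k the number of regressors):

import numpy as np

def r_squared(y, y_hat):
    ssr = np.sum((y - y_hat) ** 2)       # sum of squared residuals
    tss = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1 - ssr / tss

def adj_r_squared(y, y_hat, k):
    # penalizes extra regressors; always below r_squared, converging to it as n grows
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1 - (n - 1) / (n - k - 1) * (ssr / tss)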