Class size           Average score (Ȳ)   Standard deviation (s_Y)   n
Small (STR < 20)     657.4               19.4                       238
Large (STR ≥ 20)     650.0               17.9                       182

Difference in means: $\bar Y_{small} - \bar Y_{large} = 657.4 - 650.0 = 7.4$

Standard error: $SE = \sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}} = \sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}} = 1.8$
Summary: regression when X_i is binary (0/1)

$Y_i = \beta_0 + \beta_1 X_i + u_i$

• $\beta_0$ = mean of Y when X = 0
• $\beta_0 + \beta_1$ = mean of Y when X = 1
• $\beta_1$ = difference in group means, X = 1 minus X = 0
• $SE(\hat\beta_1)$ has the usual interpretation
• t-statistics, confidence intervals constructed as usual
• This is another way (an easy way) to do difference-in-means analysis
• The regression formulation is especially useful when we have additional regressors (see the sketch below)
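To see the equivalence concretely, here is a minimal sketch (assuming numpy and statsmodels are available; the data are simulated to match the group means and standard deviations in the table above, not the actual class size dataset):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_small, n_large = 238, 182

# Simulate scores with the group means/SDs from the table above (illustrative)
y_small = rng.normal(657.4, 19.4, n_small)
y_large = rng.normal(650.0, 17.9, n_large)

y = np.concatenate([y_small, y_large])
d = np.concatenate([np.ones(n_small), np.zeros(n_large)])  # D = 1 if small class

fit = sm.OLS(y, sm.add_constant(d)).fit(cov_type="HC1")  # robust SEs

print(fit.params[0])                    # beta0_hat = mean of Y when D = 0
print(fit.params[1])                    # beta1_hat = difference in group means
print(y_small.mean() - y_large.mean())  # identical to beta1_hat

Because the regressor is binary, OLS fits the two group means exactly, so the estimated slope coincides with the difference in means to machine precision.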
Heteroskedasticity and Homoskedasticity, and
Homoskedasticity-Only Standard Errors (Section 5.4)
What…?
1. What do these two terms mean?
2. Consequences of homoskedasticity
3. Implication for computing standard errors
What do these two terms mean?

If var(u|X = x) is constant – that is, if the variance of the conditional distribution of u given X does not depend on X – then u is said to be homoskedastic. Otherwise, u is heteroskedastic.
Example: hetero/homoskedasticity in the case of a binary regressor (that is, the comparison of means)

• Standard error when group variances are unequal:

  $SE = \sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}$

• Standard error when group variances are equal:

  $SE = s_p\sqrt{\dfrac{1}{n_s} + \dfrac{1}{n_l}}$

  where $s_p^2 = \dfrac{(n_s - 1)s_s^2 + (n_l - 1)s_l^2}{n_s + n_l - 2}$ (SW, Sect 3.6)

  $s_p^2$ is the “pooled estimator of $\sigma^2$” when $\sigma_l^2 = \sigma_s^2$

• Equal group variances = homoskedasticity
• Unequal group variances = heteroskedasticity
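As a check, a minimal sketch (plain Python, standard library only) that plugs the summary statistics from the class size table into both formulas:

import math

s_s, n_s = 19.4, 238   # small classes (STR < 20)
s_l, n_l = 17.9, 182   # large classes (STR >= 20)

# Unequal-variance (heteroskedasticity-appropriate) standard error
se_unequal = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)

# Pooled (equal-variance / homoskedasticity-only) standard error
s_p2 = ((n_s - 1) * s_s**2 + (n_l - 1) * s_l**2) / (n_s + n_l - 2)
se_pooled = math.sqrt(s_p2) * math.sqrt(1 / n_s + 1 / n_l)

print(round(se_unequal, 2))  # ~1.83, the 1.8 reported above
print(round(se_pooled, 2))   # ~1.85, close here because the group variances are similar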
Homoskedasticity in a picture:

• E(u|X = x) = 0 (so $\beta_0 + \beta_1 X$ is the population regression line)
• The variance of u does not depend on x
A real-data example from labor economics: average
hourly earnings vs. years of education (data source:
Current Population Survey):
Heteroskedastic or homoskedastic?
The class size data:
Heteroskedastic or homoskedastic?
So far we have (without saying so) assumed
that u might be heteroskedastic.
Recall the three least squares assumptions:
1. E(u|X = x) = 0
2. (X_i, Y_i), i = 1,…,n, are i.i.d.
3. Large outliers are rare
Heteroskedasticity and homoskedasticity concern var(u|X=x).
Because we have not explicitly assumed homoskedastic errors, we
have implicitly allowed for heteroskedasticity.
What if the errors are in fact homoskedastic? (1 of 2)

• You can prove that OLS has the lowest variance among estimators that are linear in Y – a result called the Gauss-Markov theorem that we will return to shortly.
• The formula for the variance of $\hat\beta_1$ and the OLS standard error simplifies:

  If $\mathrm{var}(u_i|X_i = x) = \sigma_u^2$, then

  $\mathrm{var}(\hat\beta_1) = \dfrac{\mathrm{var}[(X_i - \mu_X)u_i]}{n(\sigma_X^2)^2}$  (general formula)

  $\qquad\qquad\;\;\, = \dfrac{\sigma_u^2}{n\sigma_X^2}$  (simplification if u is homoskedastic)

Note: $\mathrm{var}(\hat\beta_1)$ is inversely proportional to var(X): more spread in X means more information about $\hat\beta_1$ – we discussed this earlier, but it is clearer from this formula.
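One way to convince yourself of the simplified formula is a small Monte Carlo; the sketch below (numpy; all parameter values are made up for illustration) compares the simulated variance of $\hat\beta_1$ to $\sigma_u^2/(n\sigma_X^2)$:

import numpy as np

rng = np.random.default_rng(1)
n, sigma_u, beta0, beta1 = 200, 2.0, 1.0, 0.5
sigma_X2 = 4.0  # var(X)

draws = []
for _ in range(20_000):
    X = rng.normal(0.0, np.sqrt(sigma_X2), n)
    u = rng.normal(0.0, sigma_u, n)  # homoskedastic errors
    Y = beta0 + beta1 * X + u
    b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)  # OLS slope
    draws.append(b1)

print(np.var(draws))                # simulated var(beta1_hat)
print(sigma_u**2 / (n * sigma_X2))  # formula: 0.005

The two numbers agree up to simulation noise.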
What if the errors are in fact homoskedastic? (2 of 2)

• Along with this homoskedasticity-only formula for the variance of $\hat\beta_1$, we have homoskedasticity-only standard errors:

  Homoskedasticity-only standard error formula:

  $SE(\hat\beta_1) = \sqrt{\dfrac{\frac{1}{n-2}\sum_{i=1}^{n}\hat u_i^2}{n \cdot \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2}}$

Some people (e.g. Excel programmers) find the homoskedasticity-only formula simpler – but it is wrong unless the errors really are homoskedastic.
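For concreteness, a minimal sketch (numpy and statsmodels, simulated illustrative data) that evaluates this formula by hand and checks it against the software default:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
X = rng.normal(0, 1, n)
Y = 2.0 + 1.5 * X + rng.normal(0, 1, n)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
u_hat = fit.resid

num = (u_hat**2).sum() / (n - 2)  # (1/(n-2)) * sum of squared residuals
den = ((X - X.mean())**2).sum()   # n * (1/n) * sum (X_i - X_bar)^2
se_by_hand = np.sqrt(num / den)

print(se_by_hand, fit.bse[1])     # identical: the default SE is this formula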
We now have two formulas for standard errors for $\hat\beta_1$.

• Homoskedasticity-only standard errors – these are valid only if the errors are homoskedastic.
• The usual standard errors – to differentiate the two, it is conventional to call these heteroskedasticity-robust standard errors, because they are valid whether or not the errors are heteroskedastic.
• The main advantage of the homoskedasticity-only standard
errors is that the formula is simpler. But the disadvantage is that
the formula is only correct if the errors are homoskedastic.
Practical implications…

• The homoskedasticity-only formula for the standard error of $\hat\beta_1$ and the “heteroskedasticity-robust” formula differ – so in general, you get different standard errors using the different formulas.
• Homoskedasticity-only standard errors are the default setting in regression software – sometimes the only setting (e.g. Excel). To get the general “heteroskedasticity-robust” standard errors you must override the default.
• If you don’t override the default and there is in fact heteroskedasticity, your standard errors (and t-statistics and confidence intervals) will be wrong – typically, homoskedasticity-only SEs are too small.
Heteroskedasticity-robust standard errors
in STATA
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
------------------------------------------------------------------------
Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------
     str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
   _cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
------------------------------------------------------------------------
• If you use the “, robust” option, STATA computes heteroskedasticity-robust standard errors
• Otherwise, STATA computes homoskedasticity-only standard errors
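If you work in Python rather than STATA, statsmodels offers the same choice; a sketch with simulated stand-in data (not the actual California dataset), where cov_type="HC1" plays the role of STATA's ", robust":

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the class size data: simulate STR and test
# scores with heteroskedastic errors so the two SEs actually differ.
rng = np.random.default_rng(2)
n = 420
str_ = rng.uniform(14, 26, n)
u = rng.normal(0, 0.8 * str_)  # error variance grows with STR
df = pd.DataFrame({"str_": str_, "testscr": 698.9 - 2.28 * str_ + u})

default_fit = smf.ols("testscr ~ str_", data=df).fit()               # homoskedasticity-only SEs
robust_fit = smf.ols("testscr ~ str_", data=df).fit(cov_type="HC1")  # analogue of ", robust"

print(default_fit.bse["str_"], robust_fit.bse["str_"])  # the two SEs differ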
The bottom line:
• If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are OK
• If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong (the homoskedasticity-only estimator of the variance of $\hat\beta_1$ is inconsistent if there is heteroskedasticity).
• The two formulas coincide (when n is large) in the special case
of homoskedasticity
• So, you should always use heteroskedasticity-robust standard
errors.
Some Additional Theoretical Foundations of OLS (Section 5.5)

We have already learned a very great deal about OLS: OLS is unbiased and consistent; we have a formula for heteroskedasticity-robust standard errors; and we can construct confidence intervals and test statistics.

Also, a very good reason to use OLS is that everyone else does – so by using it, others will understand what you are doing. In effect, OLS is the language of regression analysis, and if you use a different estimator, you will be speaking a different language.
Still, you may wonder…

• Is this really a good reason to use OLS? Aren’t there other estimators that might be better – in particular, ones that might have a smaller variance?
• Also, what happened to our old friend, the Student t distribution?

So we will now answer these questions – but to do so we will need to make some stronger assumptions than the three least squares assumptions already presented.
The Homoskedastic Normal Regression Assumptions

These consist of the three LS assumptions, plus two more:
1. E(u|X = x) = 0.
2. (X_i, Y_i), i = 1,…,n, are i.i.d.
3. Large outliers are rare ($E(Y^4) < \infty$, $E(X^4) < \infty$).
4. u is homoskedastic
5. u is distributed $N(0, \sigma^2)$

• Assumptions 4 and 5 are more restrictive – so they apply to fewer cases in practice. However, if you make these assumptions, then certain mathematical calculations simplify and you can prove strong results – results that hold if these additional assumptions are true.
• We start with a discussion of the efficiency of OLS
Efficiency of OLS, part I: The Gauss-Markov Theorem (1 of 2)

Under assumptions 1-4 (the basic three, plus homoskedasticity), $\hat\beta_1$ has the smallest variance among all linear estimators (estimators that are linear functions of $Y_1,\dots,Y_n$). This is the Gauss-Markov theorem.

Comments
• The GM theorem is proven in SW Appendix 5.2
Efficiency of OLS, part I: The Gauss-Markov Theorem (2 of 2)

• $\hat\beta_1$ is a linear estimator, that is, it can be written as a linear function of $Y_1,\dots,Y_n$:

  $\hat\beta_1 - \beta_1 = \dfrac{\sum_{i=1}^{n}(X_i - \bar X)u_i}{\sum_{i=1}^{n}(X_i - \bar X)^2} = \dfrac{1}{n}\sum_{i=1}^{n} w_i u_i$,

  where $w_i = \dfrac{X_i - \bar X}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2}$.

• The G-M theorem says that among all possible choices of $\{w_i\}$, the OLS weights yield the smallest $\mathrm{var}(\hat\beta_1)$
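A numerical sanity check of this linearity (numpy, made-up data): the OLS slope equals a fixed weighted sum of the $Y_i$, with weights equal to the slide's $w_i$ divided by n:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(5.0, 2.0, 50)
Y = 1.0 + 0.5 * X + rng.normal(0.0, 1.0, 50)

# Weights depend on X only: wtilde_i = (X_i - X_bar) / sum_j (X_j - X_bar)^2
w = (X - X.mean()) / ((X - X.mean()) ** 2).sum()
b1_linear = (w * Y).sum()           # a linear combination of the Y_i

b1_ols = np.polyfit(X, Y, 1)[0]     # ordinary OLS slope
print(np.isclose(b1_linear, b1_ols))  # True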
Efficiency of OLS, part II:

• Under all five homoskedastic normal regression assumptions – including normally distributed errors – $\hat\beta_1$ has the smallest variance of all consistent estimators (linear or nonlinear functions of $Y_1,\dots,Y_n$), as $n \to \infty$.
• This is a pretty amazing result – it says that, if (in addition to LSA 1-3) the errors are homoskedastic and normally distributed, then OLS is a better choice than any other consistent estimator. And because an estimator that isn’t consistent is a poor choice, this says that OLS really is the best you can do – if all five extended LS assumptions hold. (The proof of this result is beyond the scope of this course and isn’t in SW – it is typically done in graduate courses.)
Some not-so-good things about OLS (1 of 2)

The foregoing results are impressive, but these results – and the OLS estimator – have important limitations.

1. The GM theorem really isn’t that compelling:
– The condition of homoskedasticity often doesn’t hold (homoskedasticity is special)
– The result is only for linear estimators