Standard errors in multiple regression in STATA
reg testscr str pctel, robust;
Regression with robust standard errors Number of obs = 420
F( 2, 417) = 223.82
Prob > F = 0.0000
R-squared = 0.4264
Root MSE = 14.464
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
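As a quick sanity check on how the output columns fit together (a minimal Python sketch, not part of the STATA session), the t column is just the coefficient divided by its robust standard error:

```python
# Reproduce the reported t-statistic for the STR row of the output above:
# t = coefficient / (robust standard error).
coef_str = -1.101296
se_str = 0.4328472

t_str = coef_str / se_str
print(round(t_str, 2))  # -2.54, matching the t column
```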
TestScore^ = 686.0 − 1.10·STR − 0.650·PctEL
            (8.7)   (0.43)     (0.031)
We use heteroskedasticity-robust standard errors for exactly the same reason as
in the case of a single regressor.
Tests of Joint Hypotheses (SW Section 7.2)
(1 of 2)
Let Expn = expenditures per pupil and consider the population
regression model:
TestScore_i = β0 + β1·STR_i + β2·Expn_i + β3·PctEL_i + u_i
The null hypothesis that “school resources don’t matter,” and the
alternative that they do, corresponds to:
H0: β1 = 0 and β2 = 0
vs. H1: either β1 ≠ 0 or β2 ≠ 0, or both
TestScore_i = β0 + β1·STR_i + β2·Expn_i + β3·PctEL_i + u_i
Tests of Joint Hypotheses (SW Section 7.2)
(2 of 2)
• H0: β1 = 0 and β2 = 0
• vs. H1: either β1 ≠ 0 or β2 ≠ 0, or both
• A joint hypothesis specifies a value for two or more coefficients,
that is, it imposes a restriction on two or more coefficients.
• In general, a joint hypothesis will involve q restrictions. In the
example above, q = 2, and the two restrictions are β1 = 0 and β2 = 0.
• A “common sense” idea is to reject if either of the individual
t-statistics exceeds 1.96 in absolute value.
• But this “one at a time” test isn’t valid: the resulting test rejects too
often under the null hypothesis (more than 5%)!
Why can’t we just test the coefficients one
at a time?
Because the rejection rate under the null isn't 5%. We'll calculate the
probability of incorrectly rejecting the null using the "common sense"
test based on the two individual t-statistics. To simplify the calculation,
suppose that β̂1 and β̂2 are independently distributed (this isn't true in
general, just in this example). Let t1 and t2 be the t-statistics:
t1 = (β̂1 − 0)/SE(β̂1)  and  t2 = (β̂2 − 0)/SE(β̂2)
The "one at a time" test is:
reject H0: β1 = β2 = 0 if |t1| > 1.96 and/or |t2| > 1.96
What is the probability that this "one at a time" test rejects H0, when
H0 is actually true? (It should be 5%.)
Suppose t1 and t2 are independent (for this example).
The probability of incorrectly rejecting the null hypothesis using
the "one at a time" test is
Pr_H0[|t1| > 1.96 and/or |t2| > 1.96]
= 1 − Pr_H0[|t1| ≤ 1.96 and |t2| ≤ 1.96]
= 1 − Pr_H0[|t1| ≤ 1.96] × Pr_H0[|t2| ≤ 1.96]
  (because t1 and t2 are independent by assumption)
= 1 − (.95)²
= .0975 = 9.75%, which is not the desired 5%!
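The calculation above can be verified numerically. This is a small Python sketch (not part of the course's STATA material) that recomputes 1 − (.95)² and then confirms it by Monte Carlo, drawing independent standard-normal t-statistics under the null:

```python
import random

# Exact size of the "one at a time" test when t1 and t2 are independent:
size_exact = 1 - 0.95**2
print(round(size_exact, 4))  # 0.0975, i.e. 9.75%, not 5%

# Monte Carlo check: under H0, draw independent standard-normal t-statistics
# and count how often at least one exceeds 1.96 in absolute value.
random.seed(0)
n_sims = 200_000
rejections = sum(
    1 for _ in range(n_sims)
    if abs(random.gauss(0, 1)) > 1.96 or abs(random.gauss(0, 1)) > 1.96
)
print(rejections / n_sims)  # close to 0.0975
```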
The size of a test is the actual rejection
rate under the null hypothesis.
• The size of the “common sense” test isn’t 5%!
• In fact, its size depends on the correlation between t1 and t2
(and thus on the correlation between β̂1 and β̂2).
Two Solutions:
• Use a different critical value in this procedure, not 1.96 (this is
the "Bonferroni method"; see SW App. 7.1). (This method is rarely
used in practice, however.)
• Use a different test statistic designed to test both β1 and β2 at
once: the F-statistic (this is common practice).
The F-statistic
The F-statistic tests all parts of a joint hypothesis at once.
Formula for the special case of the joint hypothesis β1 = β1,0 and
β2 = β2,0 in a regression with two regressors:

F = (1/2) · (t1² + t2² − 2·ρ̂_t1,t2·t1·t2) / (1 − ρ̂²_t1,t2)

where ρ̂_t1,t2 estimates the correlation between t1 and t2.
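The formula above transcribes directly into code. The t-values and correlation below are made-up illustrative numbers, not estimates from the California data:

```python
# Two-regressor F-statistic: F = (1/2)*(t1^2 + t2^2 - 2*rho*t1*t2)/(1 - rho^2),
# where rho is the estimated correlation between the two t-statistics.
def f_stat(t1: float, t2: float, rho: float) -> float:
    return 0.5 * (t1**2 + t2**2 - 2 * rho * t1 * t2) / (1 - rho**2)

# When the t-statistics are uncorrelated (rho = 0), F is just the
# average of the two squared t-statistics:
print(f_stat(2.0, 1.0, 0.0))  # (4 + 1)/2 = 2.5
```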
Reject when F is large (how large?)
The F-statistic testing β1 and β2:

F = (1/2) · (t1² + t2² − 2·ρ̂_t1,t2·t1·t2) / (1 − ρ̂²_t1,t2)
• The F-statistic is large when t1 and/or t2 is large.
• The F-statistic corrects (in just the right way) for the correlation
between t1 and t2.
• The formula for more than two β's is nasty unless you use matrix
algebra.
• This gives the F-statistic a nice large-sample approximate
distribution, which is…
Large-sample distribution of the F-statistic
Consider the special case that t1 and t2 are independent, so
ρ̂_t1,t2 →p 0; in large samples the formula becomes

F = (1/2) · (t1² + t2² − 2·ρ̂_t1,t2·t1·t2) / (1 − ρ̂²_t1,t2) ≅ (1/2)(t1² + t2²)
• Under the null, t1 and t2 have standard normal distributions that,
in this special case, are independent.
• The large-sample distribution of the F-statistic is the distribution
of the average of two independently distributed squared standard
normal random variables.
The chi-squared distribution
The chi-squared distribution with q degrees of freedom (χ²_q) is
defined to be the distribution of the sum of q independent squared
standard normal random variables.
In large samples, F is distributed as χ²_q/q.
Selected large-sample critical values of χ²_q/q
q 5% critical value
1 3.84 (why?)
2 3.00 (the case q = 2 above)
3 2.60
4 2.37
5 2.21
Computing the p-value using the F-statistic:
p-value = tail probability of the χ²_q/q distribution beyond
the F-statistic actually computed.
Implementation in STATA
Use the “test” command after the regression
Example: Test the joint hypothesis that the population coefficients
on STR and expenditures per pupil (expn_stu) are both zero, against
the alternative that at least one of the population coefficients is
nonzero.
F-test example, California class size data:
reg testscr str expn_stu pctel, r;
Regression with robust standard errors Number of obs = 420
F( 3, 416) = 147.20
Prob > F = 0.0000
R-squared = 0.4366
Root MSE = 14.353
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -.2863992 .4820728 -0.59 0.553 -1.234001 .661203
expn_stu | .0038679 .0015807 2.45 0.015 .0007607 .0069751
pctel | -.6560227 .0317844 -20.64 0.000 -.7185008 -.5935446
_cons | 649.5779 15.45834 42.02 0.000 619.1917 679.9641
------------------------------------------------------------------------------
NOTE
test str expn_stu; The test command follows the regression
( 1) str = 0.0 There are q=2 restrictions being tested
( 2) expn_stu = 0.0
F( 2, 416) = 5.43 The 5% critical value for q=2 is 3.00
Prob > F = 0.0047 Stata computes the p-value for you
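The reported p-value can be approximated by hand using the large-sample χ²_q/q distribution from the previous slides. For q = 2 the chi-squared tail probability has a closed form, Pr(χ²₂ > x) = exp(−x/2). A minimal Python check (the F value 5.43 is taken from the output above):

```python
import math

# Large-sample p-value: Pr(chi2_2/2 > F) = Pr(chi2_2 > 2F) = exp(-F).
F, q = 5.43, 2
p_value = math.exp(-q * F / 2)
print(round(p_value, 4))  # 0.0044
```

This is close to STATA's 0.0047, which uses the exact F(2, 416) distribution rather than the large-sample chi-squared approximation.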
More on F-statistics.
There is a simple formula for the F-statistic that holds only under
homoskedasticity (so it isn’t very useful) but which nevertheless
might help you understand what the F-statistic is doing.
The homoskedasticity-only F-statistic
When the errors are homoskedastic, there is a simple formula for
computing the “homoskedasticity-only” F-statistic:
• Run two regressions, one under the null hypothesis (the
“restricted” regression) and one under the alternative hypothesis
“unrestricted”
(the regression).
• Compare the fits of the regressions (the R²s): if the "unrestricted"
model fits sufficiently better, reject the null.
The "restricted" and "unrestricted" regressions
Example: are the coefficients on STR and Expn zero?
Unrestricted population regression (under H1):
TestScore_i = β0 + β1·STR_i + β2·Expn_i + β3·PctEL_i + u_i
Restricted population regression (that is, under H0):
TestScore_i = β0 + β3·PctEL_i + u_i (why?)
• The number of restrictions under H0 is q = 2 (why?).
• The fit will be better (R² will be higher) in the unrestricted
regression (why?)
By how much must the R² increase for the coefficients on STR and
Expn to be judged statistically significant?
Simple formula for the homoskedasticity-only F-statistic:

F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]

where:
R²_restricted = the R² for the restricted regression
R²_unrestricted = the R² for the unrestricted regression
q = the number of restrictions under the null
k_unrestricted = the number of regressors in the unrestricted regression.
• The bigger the difference between the restricted and unrestricted
R²s (the greater the improvement in fit by adding the variables in
question), the larger is the homoskedasticity-only F.
Example:
Restricted regression:
TestScore^ = 644.7 − 0.671·PctEL,  R²_restricted = 0.4149
            (1.0)   (0.032)
Unrestricted regression:
TestScore^ = 649.6 − 0.29·STR + 3.87·Expn − 0.656·PctEL
            (15.5)  (0.48)     (1.59)      (0.032)
R²_unrestricted = 0.4366, k_unrestricted = 3, q = 2
So F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]
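Plugging the numbers from the example into the homoskedasticity-only F formula (n = 420 observations, as in the regression output earlier) gives a value well above the 5% critical value of 3.00 for q = 2. A quick Python check of the arithmetic:

```python
# Homoskedasticity-only F for the California test score example:
# F = [(R2_unr - R2_res)/q] / [(1 - R2_unr)/(n - k_unr - 1)]
r2_unrestricted = 0.4366
r2_restricted = 0.4149
q = 2
k_unrestricted = 3
n = 420

F = ((r2_unrestricted - r2_restricted) / q) / (
    (1 - r2_unrestricted) / (n - k_unrestricted - 1)
)
print(round(F, 2))  # 8.01, well above the 5% critical value of 3.00
```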