for small ΔX,

  β₁ ≈ ΔY / (ΔX/X)

Now 100 × (ΔX/X) = percentage change in X, so a 1% increase in X (multiplying X by 1.01) is associated with a .01β₁ change in Y.

(1% increase in X → .01 increase in ln(X) → .01β₁ increase in Y)
Example: TestScore vs. ln(Income)
• First defining the new regressor, ln(Income)
• The model is now linear in ln(Income), so the linear-log model
can be estimated by OLS:
TestScore^ = 557.8 + 36.42 ln(Incomeᵢ)
             (3.8)    (1.40)
(standard errors in parentheses)
so a 1% increase in Income is associated with an increase in
TestScore of 0.36 points on the test.
• Standard errors, confidence intervals, R² – all the usual tools of regression apply here.
• How does this compare to the cubic model?
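A minimal STATA sketch of how the linear-log regression above could be run (not part of the original slides; it assumes the California dataset with the variables testscr and avginc used in the nl example later is in memory, and lnavginc is a name chosen here for the new regressor):

* linear-log model: TestScore on ln(Income), robust SEs
generate lnavginc = ln(avginc)
regress testscr lnavginc, r

The ", r" option matches the heteroskedasticity-robust standard errors used elsewhere in these slides.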
The linear-log and cubic regression functions
II. Log-linear population regression function (1 of 2)

ln(Y) = β₀ + β₁X   (b)

Now change X:  ln(Y + ΔY) = β₀ + β₁(X + ΔX)   (a)

Subtract (a) – (b):  ln(Y + ΔY) – ln(Y) = β₁ΔX

so  ΔY/Y ≈ β₁ΔX

or  β₁ ≈ (ΔY/Y) / ΔX   (small ΔX)
II. Log-linear population regression function (2 of 2)

ln(Yᵢ) = β₀ + β₁Xᵢ + uᵢ

for small ΔX,  β₁ ≈ (ΔY/Y) / ΔX

• Now 100 × (ΔY/Y) = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β₁% change in Y.
• 1 unit increase in X → β₁ increase in ln(Y) → 100β₁% increase in Y
• Note: What are the units of uᵢ and the SER?
  o fractional (proportional) deviations
  o for example, SER = .2 means…
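(As a quick numerical illustration, not from the original slides: with a hypothetical β₁ = 0.05, a one-unit increase in X multiplies Y by approximately e^0.05 ≈ 1.051, i.e. roughly a 5% increase, which matches the 100β₁% approximation.)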
III. Log-log population regression function (1 of 2)

ln(Yᵢ) = β₀ + β₁ln(Xᵢ) + uᵢ   (b)

Now change X:  ln(Y + ΔY) = β₀ + β₁ln(X + ΔX)   (a)

Subtract:  ln(Y + ΔY) – ln(Y) = β₁[ln(X + ΔX) – ln(X)]

so  ΔY/Y ≈ β₁ (ΔX/X)

or  β₁ ≈ (ΔY/Y) / (ΔX/X)   (small ΔX)
III. Log-log population regression function (2 of 2)

ln(Yᵢ) = β₀ + β₁ln(Xᵢ) + uᵢ

for small ΔX,  β₁ ≈ (ΔY/Y) / (ΔX/X)

Now 100 × (ΔY/Y) = percentage change in Y, and 100 × (ΔX/X) = percentage change in X, so a 1% change in X is associated with a β₁% change in Y.

In the log-log specification, β₁ has the interpretation of an elasticity.
Example: ln(TestScore) vs. ln(Income) (1 of 2)
• First defining a new dependent variable, ln(TestScore), and the
new regressor, ln(Income)
• The model is now a linear regression of ln(TestScore) against
ln(Income), which can be estimated by OLS:
ln(TestScore)^ = 6.336 + 0.0554 ln(Incomeᵢ)
                 (0.006)  (0.0021)

A 1% increase in Income is associated with an increase of .0554% in TestScore (Income up by a factor of 1.01, TestScore up by a factor of 1.000554)
Example: ln(TestScore) vs. ln(Income) (2 of 2)
ln(TestScore)^ = 6.336 + 0.0554 ln(Incomeᵢ)
                 (0.006)  (0.0021)

• For example, suppose income increases from $10,000 to $11,000, or by 10%. Then TestScore increases by approximately .0554 × 10% = .554%. If TestScore = 650, this corresponds to an increase of .00554 × 650 = 3.6 points.
• How does this compare to the log-linear model?
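Similarly, a sketch of how the log-log regression above could be estimated in STATA (again assuming testscr and avginc are in memory; lntestscr and lnavginc are names chosen here):

* log-log model: ln(TestScore) on ln(Income), robust SEs
generate lntestscr = ln(testscr)
generate lnavginc = ln(avginc)
regress lntestscr lnavginc, r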
The log-linear and log-log specifications:
• Note vertical axis
• The log-linear model doesn’t seem to fit as well as the log-log model, based on visual inspection.
Summary: Logarithmic transformations
• Three cases, differing in whether Y and/or X is transformed by
taking logarithms.
• The regression is linear in the new variable(s) ln(Y ) and/or ln(X ),
and the coefficients can be estimated by OLS.
• Hypothesis tests and confidence intervals are now implemented
and interpreted “as usual.”
• The interpretation of β₁ differs from case to case.
• The choice of specification (functional form) should be guided by judgment (which interpretation makes the most sense in your application?), tests, and plotting predicted values.
Other nonlinear functions (and nonlinear
least squares) (SW Appendix 8.1)
The foregoing regression functions have limitations…
• Polynomial: test score can decrease with income
• Linear-log: test score increases with income, but without bound
• Here is a nonlinear function in which Y always increases with X
and there is a maximum (asymptote) value of Y:
Y = β₀ – αe^(–β₁X)

β₀, β₁, and α are unknown parameters. This is called a negative exponential growth curve. The asymptote as X → ∞ is β₀.
Negative exponential growth
We want to estimate the parameters of,
Yᵢ = β₀ – αe^(–β₁Xᵢ) + uᵢ

or  Yᵢ = β₀[1 – e^(–β₁(Xᵢ – β₂))] + uᵢ   (*)

where α = β₀e^(β₁β₂) (why would you do this???)
Compare model (*) to linear-log or cubic models:
Yᵢ = β₀ + β₁ln(Xᵢ) + uᵢ

Yᵢ = β₀ + β₁Xᵢ + β₂Xᵢ² + β₃Xᵢ³ + uᵢ

The linear-log and polynomial models are linear in the parameters β₀ and β₁ – but the model (*) is not.
Nonlinear Least Squares
• Models that are linear in the parameters can be estimated by OLS.
• Models that are nonlinear in one or more parameters can be
estimated by nonlinear least squares (NLS) (but not by OLS)
• The NLS problem for the proposed specification:
min over β₀, β₁, β₂ of  Σᵢ₌₁ⁿ [Yᵢ – β₀(1 – e^(–β₁(Xᵢ – β₂)))]²
This is a nonlinear minimization problem (a “hill-climbing”
problem). How could you solve this?
– Guess and check
– There are better ways…
– Implementation in STATA…
. nl (testscr = {b0=720}*(1 - exp(-1*{b1}*(avginc-{b2})))), r
(obs = 420)
Iteration 0: residual SS = 1.80e+08
Iteration 1: residual SS = 3.84e+07
Iteration 2: residual SS = 4637400
Iteration 3: residual SS = 300290.9
Iteration 4: residual SS = 70672.13
Iteration 5: residual SS = 66990.31
Iteration 6: residual SS = 66988.4
Iteration 7: residual SS = 66988.4
Iteration 8: residual SS = 66988.4
STATA is “climbing the hill” (actually, minimizing the SSR)
Nonlinear regression with robust standard errors Number of obs = 420
F( 3, 417) = 687015.55
Prob > F = 0.0000
R-squared = 0.9996
Root MSE = 12.67453
Res. dev. = 3322.157
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
b0 | 703.2222 4.438003 158.45 0.000 694.4986 711.9459
b1 | .0552339 .0068214 8.10 0.000 .0418253 .0686425
b2 | -34.00364 4.47778 -7.59 0.000 -42.80547 -25.2018
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
Negative exponential growth; RMSE = 12.675
Linear-log; RMSE = 12.618 (slightly better!)
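One way to visualize the fitted negative exponential growth function is a sketch like the following (not from the original slides; it plugs the rounded point estimates from the nl output above into STATA's function plot, and the income range 0 to 60 is an assumption):

* plot the fitted negative exponential growth curve using rounded estimates
* note b2 ≈ –34.0, so (avginc – b2) becomes (x + 34.0)
twoway function y = 703.2*(1 - exp(-0.0552*(x + 34.0))), range(0 60)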
Interactions Between Independent Variables
(SW Section 8.3)
• Perhaps a class size reduction is more effective in some
circumstances than in others…
• Perhaps smaller classes help more if there are many English
learners, who need individual attention
• That is, ΔTestScore/ΔSTR might depend on PctEL
• More generally, ΔY/ΔX₁ might depend on X₂
• How to model such “interactions” between X₁ and X₂?
• We first consider binary X’s, then continuous X’s
(a) Interactions between two binary variables

Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + uᵢ

• D₁ᵢ, D₂ᵢ are binary
• β₁ is the effect of changing D₁ = 0 to D₁ = 1. In this specification, this effect doesn’t depend on the value of D₂.
• To allow the effect of changing D₁ to depend on D₂, include the “interaction term” D₁ᵢ × D₂ᵢ as a regressor:

Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃(D₁ᵢ × D₂ᵢ) + uᵢ
Interpreting the coefficients
Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃(D₁ᵢ × D₂ᵢ) + uᵢ

General rule: compare the various cases

E(Yᵢ | D₁ᵢ = 0, D₂ᵢ = d₂) = β₀ + β₂d₂   (b)

E(Yᵢ | D₁ᵢ = 1, D₂ᵢ = d₂) = β₀ + β₁ + β₂d₂ + β₃d₂   (a)

Subtract (a) – (b):

E(Yᵢ | D₁ᵢ = 1, D₂ᵢ = d₂) – E(Yᵢ | D₁ᵢ = 0, D₂ᵢ = d₂) = β₁ + β₃d₂

• The effect of D₁ depends on d₂ (what we wanted)
• β₃ = increment to the effect of D₁ when D₂ = 1
Example: TestScore, STR, English learners (1 of 2)
Let
HiSTR = 1 if STR ≥ 20, = 0 if STR < 20   and   HiEL = 1 if PctEL ≥ 10, = 0 if PctEL < 10

TestScore^ = 664.1 – 18.2 HiEL – 1.9 HiSTR – 3.5 (HiSTR × HiEL)
             (1.4)   (2.3)       (1.9)        (3.1)

• “Effect” of HiSTR when HiEL = 0 is –1.9
• “Effect” of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4
• Class size reduction is estimated to have a bigger effect when the
percent of English learners is large
• This interaction isn’t statistically significant: t = 3.5/3.1 = 1.13 < 1.96
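A sketch of how this interacted regression could be set up in STATA (not from the original slides; testscr appears in the nl example above, while str and el_pct are assumed names for the student–teacher ratio and percent English learners):

* build the dummies and their interaction, then run OLS with robust SEs
generate HiSTR = (str >= 20)
generate HiEL = (el_pct >= 10)
generate HiSTRxHiEL = HiSTR*HiEL
regress testscr HiEL HiSTR HiSTRxHiEL, r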
Example: TestScore, STR, English learners (2 of 2)
Let

HiSTR = 1 if STR ≥ 20, = 0 if STR < 20   and   HiEL = 1 if PctEL ≥ 10, = 0 if PctEL < 10

TestScore^ = 664.1 – 18.2 HiEL – 1.9 HiSTR – 3.5 (HiSTR × HiEL)
             (1.4)   (2.3)       (1.9)        (3.1)

• Can you relate these coefficients to the following table of group (“cell”) means?

            Low STR    High STR
  Low EL     664.1      662.2
  High EL    645.9      640.5
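One way to see the correspondence (simple arithmetic with the estimates above):
  Low STR, Low EL:   664.1 (the intercept)
  High STR, Low EL:  664.1 – 1.9 = 662.2
  Low STR, High EL:  664.1 – 18.2 = 645.9
  High STR, High EL: 664.1 – 18.2 – 1.9 – 3.5 = 640.5
Each predicted cell mean is the intercept plus the coefficients on whichever dummies (and their interaction) equal one.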
(b) Interactions between continuous and binary variables

Yᵢ = β₀ + β₁Dᵢ + β₂Xᵢ + uᵢ