for small ΔX,

  β₁ ≈ ΔY / (ΔX/X)

Now 100 × (ΔX/X) = percentage change in X, so a 1% increase in X (multiplying X by 1.01) is associated with a .01β₁ change in Y.

(1% increase in X → .01 increase in ln(X) → .01β₁ increase in Y)
Example: TestScore vs. ln(Income)
• First defining the new regressor, ln(Income)
• The model is now linear in ln(Income), so the linear-log model
can be estimated by OLS:
TestScore^ = 557.8 + 36.42 ln(Incomeᵢ)
             (3.8)    (1.40)
(standard errors in parentheses)
so a 1% increase in Income is associated with an increase in
TestScore of 0.36 points on the test.
• Standard errors, confidence intervals, R² – all the usual tools of regression apply here.
• How does this compare to the cubic model?
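A minimal STATA sketch of how the linear-log regression above could be run (not part of the original slides; it assumes the California dataset with the variables testscr and avginc used in the nl example later is in memory, and lnavginc is a name chosen here for the new regressor):

* linear-log model: TestScore on ln(Income), robust SEs
generate lnavginc = ln(avginc)
regress testscr lnavginc, r

The ", r" option matches the heteroskedasticity-robust standard errors used elsewhere in these slides.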
The linear-log and cubic regression functions
II. Log-linear population regression function (1 of 2)

ln(Y) = β₀ + β₁X   (b)

Now change X:  ln(Y + ΔY) = β₀ + β₁(X + ΔX)   (a)

Subtract (a) – (b):  ln(Y + ΔY) – ln(Y) = β₁ΔX

so  ΔY/Y ≈ β₁ΔX

or  β₁ ≈ (ΔY/Y) / ΔX   (small ΔX)
II. Log-linear population regression function (2 of 2)

ln(Yᵢ) = β₀ + β₁Xᵢ + uᵢ

for small ΔX,  β₁ ≈ (ΔY/Y) / ΔX

• Now 100 × (ΔY/Y) = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β₁% change in Y.
• 1 unit increase in X → β₁ increase in ln(Y) → 100β₁% increase in Y
• Note: What are the units of uᵢ and the SER?
  o fractional (proportional) deviations
  o for example, SER = .2 means…
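(As a quick numerical illustration, not from the original slides: with a hypothetical β₁ = 0.05, a one-unit increase in X multiplies Y by approximately e^0.05 ≈ 1.051, i.e. roughly a 5% increase, which matches the 100β₁% approximation.)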
III. Log-log population regression function (1 of 2)

ln(Yᵢ) = β₀ + β₁ln(Xᵢ) + uᵢ   (b)

Now change X:  ln(Y + ΔY) = β₀ + β₁ln(X + ΔX)   (a)

Subtract:  ln(Y + ΔY) – ln(Y) = β₁[ln(X + ΔX) – ln(X)]

so  ΔY/Y ≈ β₁ (ΔX/X)

or  β₁ ≈ (ΔY/Y) / (ΔX/X)   (small ΔX)
III. Log-log population regression function (2 of 2)

ln(Yᵢ) = β₀ + β₁ln(Xᵢ) + uᵢ

for small ΔX,  β₁ ≈ (ΔY/Y) / (ΔX/X)

Now 100 × (ΔY/Y) = percentage change in Y, and 100 × (ΔX/X) = percentage change in X, so a 1% change in X is associated with a β₁% change in Y.

In the log-log specification, β₁ has the interpretation of an elasticity.
Example: ln(TestScore) vs. ln(Income) (1 of 2)
• First defining a new dependent variable, ln(TestScore), and the
new regressor, ln(Income)
• The model is now a linear regression of ln(TestScore) against
ln(Income), which can be estimated by OLS:
ln(TestScore)^ = 6.336 + 0.0554 ln(Incomeᵢ)
                 (0.006)  (0.0021)

A 1% increase in Income is associated with an increase of .0554% in TestScore (Income up by a factor of 1.01, TestScore up by a factor of 1.000554)
Example: ln(TestScore) vs. ln(Income) (2 of 2)
ln(TestScore)^ = 6.336 + 0.0554 ln(Incomeᵢ)
                 (0.006)  (0.0021)

• For example, suppose income increases from $10,000 to $11,000, or by 10%. Then TestScore increases by approximately .0554 × 10% = .554%. If TestScore = 650, this corresponds to an increase of .00554 × 650 = 3.6 points.
• How does this compare to the log-linear model?
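Similarly, a sketch of how the log-log regression above could be estimated in STATA (again assuming testscr and avginc are in memory; lntestscr and lnavginc are names chosen here):

* log-log model: ln(TestScore) on ln(Income), robust SEs
generate lntestscr = ln(testscr)
generate lnavginc = ln(avginc)
regress lntestscr lnavginc, r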
The log-linear and log-log specifications:
• Note vertical axis
• The log-linear model doesn’t seem to fit as well as the log-log model, based on visual inspection.
Summary: Logarithmic transformations
• Three cases, differing in whether Y and/or X is transformed by
taking logarithms.
• The regression is linear in the new variable(s) ln(Y ) and/or ln(X ),
and the coefficients can be estimated by OLS.
• Hypothesis tests and confidence intervals are now implemented
and interpreted “as usual.”
• The interpretation of β₁ differs from case to case.
• The choice of specification (functional form) should be guided by judgment (which interpretation makes the most sense in your application?), tests, and plotting predicted values.
Other nonlinear functions (and nonlinear
least squares) (SW Appendix 8.1)
The foregoing regression functions have limitations…
• Polynomial: test score can decrease with income
• Linear-log: test score increases with income, but without bound
• Here is a nonlinear function in which Y always increases with X
and there is a maximum (asymptote) value of Y:
Y = β₀ – αe^(–β₁X)

β₀, β₁, and α are unknown parameters. This is called a negative exponential growth curve. The asymptote as X → ∞ is β₀.
Negative exponential growth
We want to estimate the parameters of,
Yᵢ = β₀ – αe^(–β₁Xᵢ) + uᵢ

or  Yᵢ = β₀[1 – e^(–β₁(Xᵢ – β₂))] + uᵢ   (*)

where α = β₀e^(β₁β₂) (why would you do this???)
Compare model (*) to linear-log or cubic models:
Yᵢ = β₀ + β₁ln(Xᵢ) + uᵢ

Yᵢ = β₀ + β₁Xᵢ + β₂Xᵢ² + β₃Xᵢ³ + uᵢ

The linear-log and polynomial models are linear in the parameters β₀ and β₁ – but the model (*) is not.
Nonlinear Least Squares
• Models that are linear in the parameters can be estimated by OLS.
• Models that are nonlinear in one or more parameters can be
estimated by nonlinear least squares (NLS) (but not by OLS)
• The NLS problem for the proposed specification:
min over β₀, β₁, β₂ of  Σᵢ₌₁ⁿ [Yᵢ – β₀(1 – e^(–β₁(Xᵢ – β₂)))]²
This is a nonlinear minimization problem (a “hill-climbing”
problem). How could you solve this?
– Guess and check
– There are better ways…
– Implementation in STATA…
. nl (testscr = {b0=720}*(1 - exp(-1*{b1}*(avginc-{b2})))), r
(obs = 420)
Iteration 0: residual SS = 1.80e+08
Iteration 1: residual SS = 3.84e+07
Iteration 2: residual SS = 4637400
Iteration 3: residual SS = 300290.9
Iteration 4: residual SS = 70672.13
Iteration 5: residual SS = 66990.31
Iteration 6: residual SS = 66988.4
Iteration 7: residual SS = 66988.4
Iteration 8: residual SS = 66988.4
STATA is “climbing the hill” (actually, minimizing the SSR)
Nonlinear regression with robust standard errors Number of obs = 420
F( 3, 417) = 687015.55
Prob > F = 0.0000
R-squared = 0.9996
Root MSE = 12.67453
Res. dev. = 3322.157
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
b0 | 703.2222 4.438003 158.45 0.000 694.4986 711.9459
b1 | .0552339 .0068214 8.10 0.000 .0418253 .0686425
b2 | -34.00364 4.47778 -7.59 0.000 -42.80547 -25.2018
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
Negative exponential growth; RMSE = 12.675
Linear-log; RMSE = 12.618 (slightly better!)
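One way to visualize the fitted negative exponential growth function is a sketch like the following (not from the original slides; it plugs the rounded point estimates from the nl output above into STATA's function plot, and the income range 0 to 60 is an assumption):

* plot the fitted negative exponential growth curve using rounded estimates
* note b2 ≈ –34.0, so (avginc – b2) becomes (x + 34.0)
twoway function y = 703.2*(1 - exp(-0.0552*(x + 34.0))), range(0 60)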
Interactions Between Independent Variables
(SW Section 8.3)
• Perhaps a class size reduction is more effective in some
circumstances than in others…
• Perhaps smaller classes help more if there are many English
learners, who need individual attention
• That is, ΔTestScore/ΔSTR might depend on PctEL
• More generally, ΔY/ΔX₁ might depend on X₂
• How to model such “interactions” between X₁ and X₂?
• We first consider binary X’s, then continuous X’s
(a) Interactions between two binary variables

Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + uᵢ

• D₁ᵢ, D₂ᵢ are binary
• β₁ is the effect of changing D₁ = 0 to D₁ = 1. In this specification, this effect doesn’t depend on the value of D₂.
• To allow the effect of changing D₁ to depend on D₂, include the “interaction term” D₁ᵢ × D₂ᵢ as a regressor:

Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃(D₁ᵢ × D₂ᵢ) + uᵢ
Interpreting the coefficients
Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃(D₁ᵢ × D₂ᵢ) + uᵢ

General rule: compare the various cases

E(Yᵢ | D₁ᵢ = 0, D₂ᵢ = d₂) = β₀ + β₂d₂   (b)

E(Yᵢ | D₁ᵢ = 1, D₂ᵢ = d₂) = β₀ + β₁ + β₂d₂ + β₃d₂   (a)

Subtract (a) – (b):

E(Yᵢ | D₁ᵢ = 1, D₂ᵢ = d₂) – E(Yᵢ | D₁ᵢ = 0, D₂ᵢ = d₂) = β₁ + β₃d₂

• The effect of D₁ depends on d₂ (what we wanted)
• β₃ = increment to the effect of D₁ when D₂ = 1
Example: TestScore, STR, English learners (1 of 2)
Let
HiSTR = 1 if STR ≥ 20, = 0 if STR < 20   and   HiEL = 1 if PctEL ≥ 10, = 0 if PctEL < 10

TestScore^ = 664.1 – 18.2 HiEL – 1.9 HiSTR – 3.5 (HiSTR × HiEL)
             (1.4)   (2.3)       (1.9)        (3.1)

• “Effect” of HiSTR when HiEL = 0 is –1.9
• “Effect” of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4
• Class size reduction is estimated to have a bigger effect when the
percent of English learners is large
• This interaction isn’t statistically significant: t = 3.5/3.1 = 1.13 < 1.96
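A sketch of how this interacted regression could be set up in STATA (not from the original slides; testscr appears in the nl example above, while str and el_pct are assumed names for the student–teacher ratio and percent English learners):

* build the dummies and their interaction, then run OLS with robust SEs
generate HiSTR = (str >= 20)
generate HiEL = (el_pct >= 10)
generate HiSTRxHiEL = HiSTR*HiEL
regress testscr HiEL HiSTR HiSTRxHiEL, r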
Example: TestScore, STR, English learners (2 of 2)
Let

HiSTR = 1 if STR ≥ 20, = 0 if STR < 20   and   HiEL = 1 if PctEL ≥ 10, = 0 if PctEL < 10

TestScore^ = 664.1 – 18.2 HiEL – 1.9 HiSTR – 3.5 (HiSTR × HiEL)
             (1.4)   (2.3)       (1.9)        (3.1)

• Can you relate these coefficients to the following table of group (“cell”) means?

            Low STR    High STR
  Low EL     664.1      662.2
  High EL    645.9      640.5
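One way to see the correspondence (simple arithmetic with the estimates above):
  Low STR, Low EL:   664.1 (the intercept)
  High STR, Low EL:  664.1 – 1.9 = 662.2
  Low STR, High EL:  664.1 – 18.2 = 645.9
  High STR, High EL: 664.1 – 18.2 – 1.9 – 3.5 = 640.5
Each predicted cell mean is the intercept plus the coefficients on whichever dummies (and their interaction) equal one.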
(b) Interactions between continuous and binary variables

Yᵢ = β₀ + β₁Dᵢ + β₂Xᵢ + uᵢ