Two Stage Least Squares, ctd.

Suppose you have a valid instrument, $Z_i$.

Stage 1: Regress $X_i$ on $Z_i$, obtain the predicted values $\hat{X}_i$.

Stage 2: Regress $Y_i$ on $\hat{X}_i$; the coefficient on $\hat{X}_i$ is the TSLS estimator, $\hat{\beta}_1^{TSLS}$.

$\hat{\beta}_1^{TSLS}$ is a consistent estimator of $\beta_1$.

The IV Estimator, one X and one Z, ctd.

Explanation #2: a little algebra…

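$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

Thus,

$$\begin{aligned}
\mathrm{cov}(Y_i, Z_i) &= \mathrm{cov}(\beta_0 + \beta_1 X_i + u_i,\, Z_i) \\
&= \mathrm{cov}(\beta_0, Z_i) + \mathrm{cov}(\beta_1 X_i, Z_i) + \mathrm{cov}(u_i, Z_i) \\
&= 0 + \mathrm{cov}(\beta_1 X_i, Z_i) + 0 \\
&= \beta_1\, \mathrm{cov}(X_i, Z_i)
\end{aligned}$$

where $\mathrm{cov}(u_i, Z_i) = 0$ (instrument exogeneity hypothesis); thus

$$\beta_1 = \frac{\mathrm{cov}(Y_i, Z_i)}{\mathrm{cov}(X_i, Z_i)}$$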

The IV Estimator, one X and one Z, ctd.

$$\beta_1 = \frac{\mathrm{cov}(Y_i, Z_i)}{\mathrm{cov}(X_i, Z_i)}$$

The IV estimator replaces these population covariances with sample covariances:

$$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}},$$

where $s_{YZ}$ and $s_{XZ}$ are the sample covariances.

This is the same as the TSLS estimator, just a different derivation!
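The equivalence of the two derivations can be checked numerically. The following is a minimal sketch on simulated data, not part of the original slides: the data-generating values (intercepts, slopes, sample size, seed) and the helper names `sample_cov` and `ols_fit` are assumptions chosen for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_fit(y, x):
    """Intercept and slope from regressing y on x and a constant."""
    slope = sample_cov(x, y) / sample_cov(x, x)
    return mean(y) - slope * mean(x), slope

# Simulated data; every numeric value below is an illustrative assumption.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + rng.gauss(0, 1) for vi in v]        # u is correlated with x through v
x = [1.0 + 0.5 * zi + vi for zi, vi in zip(z, v)]   # first-stage relation
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]   # true beta_1 = 1.5

# Route 1: ratio of sample covariances, s_YZ / s_XZ.
beta_ratio = sample_cov(y, z) / sample_cov(x, z)

# Route 2: explicit two stages.
a0, a1 = ols_fit(x, z)                  # stage 1: regress X on Z
x_hat = [a0 + a1 * zi for zi in z]      # predicted values X-hat
_, beta_two_stage = ols_fit(y, x_hat)   # stage 2: regress Y on X-hat

print(beta_ratio, beta_two_stage)
```

The two numbers agree up to floating-point rounding, because the stage 2 slope is $s_{\hat{X}Y}/s_{\hat{X}\hat{X}}$ and $\hat{X}_i$ is a linear function of $Z_i$.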

Consistency of the TSLS estimator

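$$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$$

The sample covariances are consistent: $s_{YZ} \xrightarrow{p} \mathrm{cov}(Y,Z)$ and $s_{XZ} \xrightarrow{p} \mathrm{cov}(X,Z)$. Thus,

$$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}} \xrightarrow{p} \frac{\mathrm{cov}(Y,Z)}{\mathrm{cov}(X,Z)} = \beta_1$$

• The instrument relevance condition, $\mathrm{cov}(X,Z) \neq 0$, ensures that you don't divide by zero.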
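Consistency can be illustrated by letting the sample size grow in a simulation. This is a sketch under assumed values (true $\beta_1 = 1.5$, first-stage slope 0.5, fixed seed); the function name `iv_estimate` is hypothetical.

```python
import random

def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def iv_estimate(n, beta1=1.5, seed=0):
    """Draw n observations from an assumed model and return s_YZ / s_XZ."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    u = [0.8 * vi + rng.gauss(0, 1) for vi in v]   # endogeneity: cov(x, u) != 0
    x = [0.5 * zi + vi for zi, vi in zip(z, v)]    # relevance: cov(x, z) = 0.5
    y = [2.0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    return sample_cov(y, z) / sample_cov(x, z)

# The estimate should settle near the true value 1.5 as n grows.
for n in (100, 10_000, 200_000):
    print(n, iv_estimate(n))
```

Setting the first-stage slope close to zero in this sketch makes $s_{XZ}$ small and the ratio erratic, which is the practical face of the relevance condition.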

Example: Supply and demand for butter

IV regression was originally developed to estimate demand

elasticities for agricultural goods, for example butter:

$$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$$

• $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price (recall log-log specification discussion)

• Data: observations on price and quantity of butter for different years

• The OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ suffers from simultaneous causality bias (why?)

Simultaneous causality bias in the OLS regression of $\ln(Q_i^{butter})$ on $\ln(P_i^{butter})$ arises because price and quantity are determined by the interaction of demand and supply.

This interaction of demand and supply produces…

Would a regression using these data produce the demand curve?

But…what would you get if only supply shifted?

• TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply.

• Z is a variable that shifts supply but not demand.

TSLS in the supply-demand example:

$$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$$

Let Z = rainfall in dairy-producing regions.

Is Z a valid instrument?

(1) Exogenous? $\mathrm{corr}(rain_i, u_i) = 0$?

Plausibly: whether it rains in dairy-producing regions shouldn't affect demand

(2) Relevant? $\mathrm{corr}(rain_i, \ln(P_i^{butter})) \neq 0$?

Plausibly: insufficient rainfall means less grazing means less butter

TSLS in the supply-demand example, ctd.

$$\ln(Q_i^{butter}) = \beta_0 + \beta_1 \ln(P_i^{butter}) + u_i$$

$Z_i = rain_i$ = rainfall in dairy-producing regions.

Stage 1: regress $\ln(P_i^{butter})$ on rain, get $\widehat{\ln(P_i^{butter})}$

$\widehat{\ln(P_i^{butter})}$ isolates changes in log price that arise from supply (part of supply, at least)

Stage 2: regress $\ln(Q_i^{butter})$ on $\widehat{\ln(P_i^{butter})}$

The regression counterpart of using shifts in the supply curve to trace out the demand curve.
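The supply-and-demand logic can be sketched with a small simulated market. Nothing here is butter data: the demand elasticity of -1, the supply elasticity of 1, the unit effect of rain on supply, the intercepts and the shock variances are all assumptions made for illustration. Log price and log quantity are obtained by solving the two equations for the market-clearing point.

```python
import random

def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

rng = random.Random(3)
n = 5000
beta0, beta1 = 5.0, -1.0    # demand: ln_q = beta0 + beta1 * ln_p + u   (assumed)
alpha0, gamma = 1.0, 1.0    # supply: ln_q = alpha0 + gamma * ln_p + delta * rain + e
delta = 1.0                 # more rain shifts supply out (assumed)

rain = [rng.gauss(0, 1) for _ in range(n)]   # supply shifter, unrelated to demand
u = [rng.gauss(0, 1) for _ in range(n)]      # demand shock
e = [rng.gauss(0, 1) for _ in range(n)]      # supply shock

# Market clearing: set demand equal to supply and solve for log price.
ln_p = [(beta0 - alpha0 + ui - delta * ri - ei) / (gamma - beta1)
        for ui, ri, ei in zip(u, rain, e)]
ln_q = [beta0 + beta1 * pi + ui for pi, ui in zip(ln_p, u)]

# OLS mixes demand and supply shifts; IV uses only the rain-driven price changes.
ols_slope = sample_cov(ln_q, ln_p) / sample_cov(ln_p, ln_p)
iv_slope = sample_cov(ln_q, rain) / sample_cov(ln_p, rain)

print("OLS slope:", ols_slope)   # pulled away from the demand elasticity
print("IV slope: ", iv_slope)    # close to the assumed demand elasticity of -1
```

In this design the OLS slope converges to about -1/3 rather than -1, because the demand shock moves price and quantity in the same direction.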

Inference using TSLS

• In large samples, the sampling distribution of the TSLS estimator is normal

• Inference (hypothesis tests, confidence intervals) proceeds in the usual way, e.g. ± 1.96×SE

• The idea behind the large-sample normal distribution of the TSLS estimator is that, like all the other estimators we have considered, it involves an average of mean zero i.i.d. random variables, to which we can apply the CLT.

• Here is a sketch of the math (see SW App. 12.3 for the details)...

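$$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})(Z_i - \bar{Z})}{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\sum_{i=1}^{n} Y_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$$

Substitute in $Y_i = \beta_0 + \beta_1 X_i + u_i$ and simplify (the $\beta_0$ term drops out because $\sum_{i=1}^{n}(Z_i - \bar{Z}) = 0$):

$$\hat{\beta}_1^{TSLS} = \frac{\beta_1 \sum_{i=1}^{n} X_i (Z_i - \bar{Z}) + \sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$$

so…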

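$$\hat{\beta}_1^{TSLS} = \beta_1 + \frac{\sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}.$$

so

$$\hat{\beta}_1^{TSLS} - \beta_1 = \frac{\sum_{i=1}^{n} u_i (Z_i - \bar{Z})}{\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$$

Multiply through by $\sqrt{n}$:

$$\sqrt{n}\,(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (Z_i - \bar{Z})\, u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$$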

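$$\sqrt{n}\,(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (Z_i - \bar{Z})\, u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i (Z_i - \bar{Z})}$$

• $\frac{1}{n}\sum_{i=1}^{n} X_i (Z_i - \bar{Z}) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z}) \xrightarrow{p} \mathrm{cov}(X,Z) \neq 0$

• $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (Z_i - \bar{Z})\, u_i$ is distributed $N(0, \mathrm{var}[(Z - \mu_Z)u])$ (CLT)

so: $\hat{\beta}_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1^{TSLS}})$,

where

$$\sigma^2_{\hat{\beta}_1^{TSLS}} = \frac{1}{n}\,\frac{\mathrm{var}[(Z_i - \mu_Z)\, u_i]}{[\mathrm{cov}(Z_i, X_i)]^2},$$

where $\mathrm{cov}(X,Z) \neq 0$ because the instrument is relevant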
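The large-sample variance formula can be compared with the spread of the estimator across repeated simulated samples. This is a rough Monte Carlo sketch, not a proof: the design (standard normal $Z$, first-stage slope 1, $u = 0.8v + e$), the sample size, the number of replications and the seed are all assumptions.

```python
import random

def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def one_draw(rng, n, beta1=1.5, pi1=1.0):
    """One simulated sample of size n; returns the estimate s_YZ / s_XZ."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    u = [0.8 * vi + rng.gauss(0, 1) for vi in v]
    x = [pi1 * zi + vi for zi, vi in zip(z, v)]
    y = [beta1 * xi + ui for xi, ui in zip(x, u)]
    return sample_cov(y, z) / sample_cov(x, z)

rng = random.Random(1)
n, reps = 400, 600
draws = [one_draw(rng, n) for _ in range(reps)]
mc_mean = sum(draws) / reps
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (reps - 1)

# In this design Z and u are independent with mean zero, so
# var[(Z - mu_Z) u] = var(Z) * var(u) = 1 * (0.8**2 + 1), and cov(Z, X) = pi1 = 1.
formula_var = (1.0 * (0.8 ** 2 + 1.0)) / (n * 1.0 ** 2)

print("Monte Carlo variance:", mc_var)
print("Formula variance:    ", formula_var)
```

The two variances should be of similar size; they will not match exactly, both because the formula is a large-sample approximation and because the Monte Carlo variance is itself estimated from a finite number of replications.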

Inference using TSLS, ctd.

$\hat{\beta}_1^{TSLS}$ is approx. distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1^{TSLS}})$,

• Statistical inference proceeds in the usual way.

• The justification is (as usual) based on large samples

• This all assumes that the instruments are valid; we'll discuss what happens if they aren't valid shortly.

• Important note on standard errors:

o The OLS standard errors from the second stage regression aren't right: they don't take into account the estimation in the first stage ($\hat{X}_i$ is estimated).

o Instead, use a single specialized command that computes the TSLS estimator and the correct SEs.

o as usual, use heteroskedasticity-robust SEs
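To see why the second-stage OLS standard error is not the right one, the sketch below computes both on simulated data for the one-X, one-Z case. It is an illustration under assumptions, not the output of any particular software command: the robust SE plugs sample analogs into the variance formula from the derivation, with TSLS residuals computed from the actual $X_i$; the $n - 2$ degrees-of-freedom divisor and all data-generating values are choices made here. The "naive" SE is the homoskedasticity-only OLS formula applied to the regression of $Y_i$ on $\hat{X}_i$.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

# Simulated data; all numeric values are illustrative assumptions.
rng = random.Random(2)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + rng.gauss(0, 1) for vi in v]
x = [0.5 * zi + vi for zi, vi in zip(z, v)]
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]

zbar, xbar, ybar = mean(z), mean(x), mean(y)
s_zx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / (n - 1)
s_zy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / (n - 1)
s_zz = sum((zi - zbar) ** 2 for zi in z) / (n - 1)

b1 = s_zy / s_zx            # TSLS slope
b0 = ybar - b1 * xbar       # TSLS intercept

# Robust SE: sample analog of (1/n) var[(Z - mu_Z) u] / cov(Z, X)^2,
# using residuals built from the actual X_i.
u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
var_zu = sum(((zi - zbar) * ui) ** 2 for zi, ui in zip(z, u_hat)) / (n - 2)
se_robust = math.sqrt(var_zu / (n * s_zx ** 2))

# Naive SE: treat the second stage (Y on X-hat) as an ordinary OLS regression.
pi1 = s_zx / s_zz
x_hat = [xbar + pi1 * (zi - zbar) for zi in z]
resid2 = [yi - b0 - b1 * xh for xh, yi in zip(x_hat, y)]
ssx_hat = sum((xh - xbar) ** 2 for xh in x_hat)
se_naive = math.sqrt(sum(r ** 2 for r in resid2) / (n - 2) / ssx_hat)

print("TSLS slope:", b1)
print("robust SE: ", se_robust)
print("naive SE:  ", se_naive)
```

The second-stage residuals $Y_i - \hat{\beta}_0 - \hat{\beta}_1 \hat{X}_i$ contain $\hat{\beta}_1 (X_i - \hat{X}_i)$ in addition to the TSLS residual, so the naive SE measures the wrong error variance. In this particular design it comes out larger than the robust SE; in other designs it could be smaller.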

Example: Cigarette demand, ctd.

$$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + u_i$$

• Annual cigarette consumption and average prices paid (including tax)

• n=48 continental US states

Proposed instrumental variable:

• $Z_i$ = general sales tax per pack in the $i$th state = $SalesTax_i$

• Is this a valid instrument?

(1) Relevant? $\mathrm{corr}(SalesTax_i, \ln(P_i^{cigarettes})) \neq 0$?

(2) Exogenous? $\mathrm{corr}(SalesTax_i, u_i) = 0$?

Cigarette demand, ctd.

Although the data set is a panel, use data from 1995 only.

First stage OLS regression:

$$\widehat{\ln(P_i^{cigarettes})} = 4.62 + .031\,SalesTax_i, \quad n = 48$$

Second stage OLS regression:

$$\widehat{\ln(Q_i^{cigarettes})} = 9.72 - 1.08\,\widehat{\ln(P_i^{cigarettes})}, \quad n = 48$$

Combined regression with correct, heteroskedasticity-robust standard errors:

$$\widehat{\ln(Q_i^{cigarettes})} = \underset{(1.53)}{9.72} - \underset{(0.32)}{1.08}\,\widehat{\ln(P_i^{cigarettes})}, \quad n = 48$$
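As a worked example of the "± 1.96×SE" rule, the reported slope and its robust standard error give a 95% confidence interval and a t-statistic for the price elasticity. The sketch only does arithmetic on the two numbers reported above; the variable names are made up for the illustration.

```python
# Reported TSLS slope and heteroskedasticity-robust SE from the combined regression.
beta1_hat = -1.08
se_beta1 = 0.32

half_width = 1.96 * se_beta1
ci_low = beta1_hat - half_width
ci_high = beta1_hat + half_width
t_stat = beta1_hat / se_beta1     # test of H0: beta_1 = 0

print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"t-statistic: {t_stat:.3f}")
```

The interval is roughly [-1.71, -0.45] and the t-statistic is about -3.38, so a zero elasticity is rejected at the 5% level, while a unit elasticity of -1 lies inside the interval.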

Summary of IV Regression with a Single X and Z

• A valid instrument Z must satisfy two conditions:

(1) relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$

(2) exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$

• TSLS proceeds by first regressing X on Z to get $\hat{X}$, then regressing Y on $\hat{X}$.

• The key idea is that the first stage isolates part of the variation in X that is uncorrelated with u

• If the instrument is valid, then the large-sample sampling distribution of the TSLS estimator is normal, so inference proceeds as usual

The General IV Regression Model

(SW Section 12.2)

• So far we have considered IV regression with a single endogenous regressor (X) and a single instrument (Z).

• We need to extend this to:

  o multiple endogenous regressors ($X_1,\dots,X_k$)

  o multiple included exogenous variables ($W_1,\dots,W_r$). These need to be included for the usual OV reason

  o multiple instrumental variables ($Z_1,\dots,Z_m$). More (relevant) instruments can produce a smaller variance of TSLS: the $R^2$ of the first stage increases, so you have more variation in $\hat{X}$.

• Terminology: identification & overidentification

Identification

• In general, a parameter is said to be identified if different values of the parameter would produce different distributions of the data.

• In IV regression, whether the coefficients are identified depends on the relation between the number of instruments (m) and the number of endogenous regressors (k)

• Intuitively, if there are fewer instruments than endogenous regressors, we can't estimate $\beta_1,\dots,\beta_k$

  o For example, suppose k = 1 but m = 0 (no instruments)!

Identification, ctd.

The coefficients $\beta_1,\dots,\beta_k$ are said to be:

• exactly identified if m = k.
There are just enough instruments to estimate $\beta_1,\dots,\beta_k$.

• overidentified if m > k.
There are more than enough instruments to estimate $\beta_1,\dots,\beta_k$. If so, you can test whether the instruments are valid (a test of the "overidentifying restrictions"); we'll return to this later

• underidentified if m < k.
There are too few instruments to estimate $\beta_1,\dots,\beta_k$. If so, you need to get more instruments!

The general IV regression model: Summary of jargon

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$$

• $Y_i$ is the dependent variable

• $X_{1i},\dots,X_{ki}$ are the endogenous regressors (potentially correlated with $u_i$)

• $W_{1i},\dots,W_{ri}$ are the included exogenous variables or included exogenous regressors (uncorrelated with $u_i$)

• $\beta_0, \beta_1,\dots,\beta_{k+r}$ are the unknown regression coefficients

• $Z_{1i},\dots,Z_{mi}$ are the m instrumental variables (the excluded exogenous variables)

• The coefficients are overidentified if m > k; exactly identified if m = k; and underidentified if m < k.
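The jargon above maps onto a short computation. The sketch below is one way to code TSLS for the general model using only the standard library: each endogenous regressor is regressed on a constant, the W's and the Z's, and then Y is regressed on a constant, the fitted X's and the W's. The function names (`solve`, `ols`, `identification`, `tsls`) and the simulated design with k = 1, r = 1, m = 2 are assumptions for illustration. It returns point estimates only and does not compute standard errors.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * p
    for r in reversed(range(p)):
        x[r] = (M[r][p] - sum(M[r][j] * x[j] for j in range(r + 1, p))) / M[r][r]
    return x

def ols(rows, y):
    """OLS coefficients via the normal equations; rows must include the constant."""
    p = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(XtX, Xty)

def identification(m, k):
    """Classify by number of instruments m and endogenous regressors k."""
    if m > k:
        return "overidentified"
    if m == k:
        return "exactly identified"
    return "underidentified"

def tsls(y, X, W, Z):
    """X, W, Z are lists of rows, one row (a list) per observation."""
    k, m = len(X[0]), len(Z[0])
    if m < k:
        raise ValueError("underidentified: fewer instruments than endogenous regressors")
    first = [[1.0] + w + z for w, z in zip(W, Z)]        # constant, W's, Z's
    x_hat_cols = []
    for j in range(k):                                   # one first stage per X
        coef = ols(first, [x[j] for x in X])
        x_hat_cols.append([sum(c * val for c, val in zip(coef, row)) for row in first])
    second = [[1.0] + [col[i] for col in x_hat_cols] + W[i] for i in range(len(y))]
    return ols(second, y)    # [beta_0, beta_1..beta_k, beta_{k+1}..beta_{k+r}]

# Simulated example: true coefficients (1, 2, 3) are assumptions.
rng = random.Random(4)
n = 3000
X, W, Z, y = [], [], [], []
for _ in range(n):
    z1, z2, w, v = (rng.gauss(0, 1) for _ in range(4))
    u = 0.8 * v + rng.gauss(0, 1)            # u correlated with x through v
    x = z1 + 0.5 * z2 + 0.5 * w + v
    X.append([x])
    W.append([w])
    Z.append([z1, z2])
    y.append(1.0 + 2.0 * x + 3.0 * w + u)

coef = tsls(y, X, W, Z)
print(identification(m=2, k=1), coef)
```

Including the W's in the first stage as well as the second is what keeps the fitted X's uncorrelated with the second-stage error in this construction.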


Author: Atreyu
Degree program: Corso di laurea in economia, mercati e istituzioni (Economics, Markets and Institutions)
University: Bologna - Unibo
Academic year: 2011-2012

The contents of this page are the publisher Atreyu's personal reworking of material learned by attending the Econometria applicata (Applied Econometrics) lectures and through independent study of any reference books, in preparation for the final exam or thesis. They should not be taken as official material of the University of Bologna (Unibo) or of Prof. Roberto Golinelli.