
The Least Squares Assumptions

$Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1,\dots,n$

1. The conditional distribution of u given X has mean zero, that is, $E(u \mid X = x) = 0$, which in turn implies that $E(u_i) = 0$ and that $\mathrm{Cov}(u_i, X_i) = 0$

• This implies that $\hat{\beta}_1$ is unbiased

2. $(X_i, Y_i)$, $i = 1,\dots,n$, are i.i.d.

• This is true if X, Y are collected by simple random sampling

• This delivers the sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$

3. Large outliers in X and/or Y are rare.

• Technically, X and Y have finite fourth moments

• Outliers can result in meaningless values of $\hat{\beta}_1$

Least squares assumption #1: E(u|X = x) = 0.

For any given value of X, the mean of u is zero:

Example: $\text{Test Score}_i = \beta_0 + \beta_1\,\text{STR}_i + u_i$, where $u_i$ = other factors

• What are some of these “other factors”?

• Is E(u|X = x) = 0 plausible for these other factors?

Least squares assumption #1, ctd.

A benchmark for thinking about this assumption is to consider an ideal randomized controlled experiment:

• X is randomly assigned to people (students randomly assigned to different size classes; patients randomly assigned to medical treatments). Randomization is done by computer – using no information about the individual.

• Because X is assigned randomly, all other individual characteristics – the things that make up u – are distributed independently of X

• Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0 (that is, LSA #1 holds)

• In actual experiments, or with observational data, we will need to think hard about whether E(u|X = x) = 0 holds.
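The contrast between random assignment and observational data can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficients (700 and -2), the class sizes, the variable `other`, and the function names are all hypothetical and are not taken from the slides. When X is assigned without reference to the other factor, the OLS slope lands near the true value. When X depends on that factor, E(u|X = x) = 0 fails and the slope is pulled away from it.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def simulated_slope(randomized, n=20000, beta0=700.0, beta1=-2.0, seed=0):
    """Estimate beta1 when X is randomly assigned vs. related to u."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        other = rng.gauss(0.0, 1.0)  # hypothetical "other factor" inside u
        if randomized:
            # Assignment ignores every individual characteristic.
            xi = rng.choice([15.0, 20.0, 25.0])
        else:
            # X depends on the other factor, so u and X are related.
            xi = 20.0 - 3.0 * other + rng.gauss(0.0, 1.0)
        u = 5.0 * other + rng.gauss(0.0, 1.0)
        x.append(xi)
        y.append(beta0 + beta1 * xi + u)
    return ols_slope(x, y)


slope_randomized = simulated_slope(randomized=True)
slope_confounded = simulated_slope(randomized=False)
print("true beta1:            -2.0")
print("randomized assignment:", round(slope_randomized, 3))
print("X related to u:       ", round(slope_confounded, 3))
```

In this design the population covariance between X and u is -15 and the variance of X is 10 in the non-randomized case, so the slope is biased by about -1.5.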

Least squares assumption #2: $(X_i, Y_i)$, $i = 1,\dots,n$ are i.i.d.

This arises automatically if the entity (individual, district) is sampled by simple random sampling: the entity is selected then, for that entity, X and Y are observed (recorded).

The main place we will encounter non-i.i.d. sampling is when data are recorded over time (“time series data”) – this will introduce some extra complications.

Least squares assumption #3: Large outliers are rare

Technical statement: $0 < E(X^4) < \infty$ and $0 < E(Y^4) < \infty$

• A large outlier is an extreme value of X or Y

• On a technical level, if X and Y are bounded, then they have finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too).

• However, the substance of this assumption is that a large outlier can strongly influence the results
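The step from "bounded" to "finite fourth moments" takes one line. A sketch, where the bound $M$ is a symbol introduced here for illustration only:

```latex
\[
|X| \le M \ \text{with probability one}
\quad\Longrightarrow\quad
0 \le E(X^4) \le M^4 < \infty .
\]
```

The strict inequality $0 < E(X^4)$ in the technical statement additionally requires that X is not identically zero. The same argument applies to Y.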

OLS can be sensitive to an outlier:

• Is the lone point an outlier in X or Y?

• In practice, outliers often are data glitches (coding/recording problems) – so check your data for outliers! The easiest way is to produce a scatterplot.
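A small numerical illustration of this sensitivity follows. The data are made up for this sketch: ten points lying exactly on a line with slope 2, and then the same data with one hypothetical recording slip in which 21.0 was entered as 210.0.

```python
def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


x = [float(i) for i in range(1, 11)]      # X = 1, 2, ..., 10
y_clean = [1.0 + 2.0 * xi for xi in x]    # exactly on the line Y = 1 + 2X

y_glitch = list(y_clean)
y_glitch[-1] = 210.0                      # hypothetical slip: 21.0 typed as 210.0

slope_clean = ols_slope(x, y_clean)
slope_glitch = ols_slope(x, y_glitch)
print("slope without the glitch:", slope_clean)
print("slope with one bad point:", round(slope_glitch, 2))
```

A single bad observation out of ten moves the estimated slope from 2 to more than 12, which is why checking the data before running the regression is worth the effort.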

The Sampling Distribution of the OLS Estimator

(SW Section 4.5)

The OLS estimator is computed from a sample of data; a different sample gives a different value of $\hat{\beta}_1$. This is the source of the “sampling uncertainty” of $\hat{\beta}_1$. We want to:

• quantify the sampling uncertainty associated with $\hat{\beta}_1$

• use $\hat{\beta}_1$ to test hypotheses such as $\beta_1 = 0$

• construct a confidence interval for $\beta_1$

• All these require figuring out the sampling distribution of the OLS estimator. Two steps to get there…

  o Probability framework for linear regression

  o Distribution of the OLS estimator

Probability Framework for Linear Regression

The probability framework for linear regression is summarized by the three least squares assumptions.

Population

The group of interest (ex: all possible school districts)

Random variables: Y, X

Ex: (Test Score, STR)

Joint distribution of (Y, X)

The population regression function is linear

$E(u \mid X) = 0$ (1st Least Squares Assumption)

X, Y have finite fourth moments (3rd LSA)

Data Collection by simple random sampling:

$\{(X_i, Y_i)\}$, $i = 1,\dots,n$, are i.i.d. (2nd LSA)

The Sampling Distribution of $\hat{\beta}_1$

Like $\bar{Y}$, $\hat{\beta}_1$ has a sampling distribution.

• What is $E(\hat{\beta}_1)$? (where is it centered?)

  o If $E(\hat{\beta}_1) = \beta_1$, then OLS is unbiased – a good thing!

• What is $\mathrm{var}(\hat{\beta}_1)$? (measure of sampling uncertainty)

• What is the distribution of $\hat{\beta}_1$ in small samples?

  o It can be very complicated in general

• What is the distribution of $\hat{\beta}_1$ in large samples?

  o It turns out to be relatively simple – in large samples, $\hat{\beta}_1$ is normally distributed.

The mean and variance of the sampling distribution of $\hat{\beta}_1$

Some preliminary algebra:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$

so $Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (u_i - \bar{u})$. Thus, given the formula:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},$$

we can substitute:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\left[\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})\right]}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
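To make the slope formula concrete, here is a minimal worked example on four made-up data points (the numbers are illustrative and not from the slides). It computes the numerator and denominator of the formula separately and then takes their ratio.

```python
def ols_slope_parts(x, y):
    """Return (sxy, sxx, slope) for the OLS slope formula."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # numerator
    sxx = sum((xi - xbar) ** 2 for xi in x)                       # denominator
    return sxy, sxx, sxy / sxx


x = [1.0, 2.0, 3.0, 4.0]   # xbar = 2.5
y = [2.0, 4.0, 5.0, 8.0]   # ybar = 4.75

sxy, sxx, beta1_hat = ols_slope_parts(x, y)
print("numerator   :", sxy)        # 9.5
print("denominator :", sxx)        # 5.0
print("beta1_hat   :", beta1_hat)  # 9.5 / 5.0 = 1.9
```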

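$$\hat{\beta}_1 = \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

so

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$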

Now

$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X})u_i - \left[\sum_{i=1}^{n} (X_i - \bar{X})\right]\bar{u}$$

$$= \sum_{i=1}^{n} (X_i - \bar{X})u_i - \left[\left(\sum_{i=1}^{n} X_i\right) - n\bar{X}\right]\bar{u} = \sum_{i=1}^{n} (X_i - \bar{X})u_i$$

Substitute $\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X})u_i$ into the expression for $\hat{\beta}_1 - \beta_1$:

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

so

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
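The algebra above can be checked numerically. In a simulation, unlike in real data, the errors $u_i$ and the true $\beta_1$ are known, so both sides of each identity can be computed directly. The parameter values and the seed below are arbitrary choices for this sketch.

```python
import random

rng = random.Random(1)
n, beta0, beta1 = 200, 3.0, 0.5   # arbitrary illustrative values

x = [rng.uniform(0.0, 10.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

xbar, ybar, ubar = sum(x) / n, sum(y) / n, sum(u) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
beta1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# Step 1: deviations from the sample mean sum to zero.
sum_dev = sum(xi - xbar for xi in x)

# Step 2: centering u makes no difference to the cross product.
cross_centered = sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u))
cross_raw = sum((xi - xbar) * ui for xi, ui in zip(x, u))

# Step 3: the estimation error equals sum((X_i - Xbar) u_i) / sum((X_i - Xbar)^2).
estimation_error = beta1_hat - beta1
ratio = cross_raw / sxx

print("sum of deviations :", sum_dev)
print("centered vs raw   :", cross_centered, cross_raw)
print("beta1_hat - beta1 :", estimation_error)
print("ratio formula     :", ratio)
```

The paired quantities agree up to floating-point rounding, as the derivation says they must.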

Now we can calculate $E(\hat{\beta}_1)$ and $\mathrm{var}(\hat{\beta}_1)$:

$E(\hat{\beta}_1 - \beta_1) = E(\hat{\beta}_1) - E(\beta_1)$ then:

$$E(\hat{\beta}_1) = \beta_1 + E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right]$$

$$= \beta_1 + E\left\{E\left[\left.\frac{\sum_{i=1}^{n} (X_i - \bar{X})u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\right|\,X_1,\dots,X_n\right]\right\} = \beta_1 + 0$$

because $E(u_i \mid X_i = x) = 0$ by LSA #1

• Thus LSA #1 implies that $E(\hat{\beta}_1) = \beta_1$

• That is, $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.

• For details see App. 4.3
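Unbiasedness is a statement about the average of $\hat{\beta}_1$ over many samples, so it can be illustrated with a Monte Carlo sketch. The design here is an assumption made for illustration: X and u are drawn independently from standard normal distributions (so E(u|X = x) = 0 holds by construction), with true values $\beta_0 = 1$ and $\beta_1 = 2$.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def mean_of_slopes(reps=2000, n=50, beta0=1.0, beta1=2.0, seed=42):
    """Average the OLS slope over `reps` independent samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # u is drawn independently of X, so E(u | X = x) = 0 by construction.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        total += ols_slope(x, y)
    return total / reps


average_slope = mean_of_slopes()
print("true beta1             : 2.0")
print("average of beta1_hat   :", round(average_slope, 3))
```

Individual estimates scatter around 2, but their average across samples sits very close to it, even though each sample has only 50 observations.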

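Next calculate $\mathrm{var}(\hat{\beta}_1)$:

write

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right)s_X^2}$$

where $v_i = (X_i - \bar{X})u_i$. If n is large, $s_X^2 \approx \sigma_X^2$ and $\frac{n-1}{n} \approx 1$, so

$$\hat{\beta}_1 - \beta_1 \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2},$$

where $v_i = (X_i - \bar{X})u_i$ (see App. 4.3). Thus,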

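$$\hat{\beta}_1 - \beta_1 \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}$$

so

$$\mathrm{var}(\hat{\beta}_1 - \beta_1) = \mathrm{var}(\hat{\beta}_1) = \frac{\mathrm{var}(v)/n}{(\sigma_X^2)^2}$$

so

$$\mathrm{var}(\hat{\beta}_1 - \beta_1) = \frac{1}{n} \times \frac{\mathrm{var}[(X_i - \mu_X)u_i]}{\sigma_X^4}.$$

Summary so far

• $\hat{\beta}_1$ is unbiased: $E(\hat{\beta}_1) = \beta_1$ – just like $\bar{Y}$!

• $\mathrm{var}(\hat{\beta}_1)$ is inversely proportional to n – just like $\mathrm{var}(\bar{Y})$!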
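The large-n variance formula can be compared with a Monte Carlo estimate. The sketch below assumes a design chosen for convenience, not taken from the slides: X is normal with standard deviation 2, and u is standard normal and independent of X, so that $\mathrm{var}[(X_i - \mu_X)u_i] = \sigma_X^2 \sigma_u^2$ is known exactly.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def mc_variance(n, reps=1000, sigma_x=2.0, sigma_u=1.0, beta1=1.0, seed=7):
    """Sample variance of beta1_hat across `reps` simulated samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
        y = [beta1 * xi + rng.gauss(0.0, sigma_u) for xi in x]
        slopes.append(ols_slope(x, y))
    mean = sum(slopes) / reps
    return sum((b - mean) ** 2 for b in slopes) / (reps - 1)


def formula_variance(n, sigma_x=2.0, sigma_u=1.0):
    """(1/n) * var[(X - mu_X) u] / sigma_X^4 for this simulated design."""
    var_v = sigma_x ** 2 * sigma_u ** 2   # valid because X and u are independent
    return var_v / (n * sigma_x ** 4)


var_100 = mc_variance(100)
var_400 = mc_variance(400)
for n, mc in ((100, var_100), (400, var_400)):
    print(f"n={n}: simulated {mc:.6f}, formula {formula_variance(n):.6f}")
print("ratio var(n=100) / var(n=400):", round(var_100 / var_400, 2))
```

Quadrupling n should cut the variance to roughly a quarter, and the simulated variances should sit close to the formula, with the match improving as n grows.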

What is the sampling distribution of $\hat{\beta}_1$?

The exact sampling distribution is complicated – it depends on the population distribution of (Y, X) – but when n is large we get some simple (and good) approximations:

(1) Because $\mathrm{var}(\hat{\beta}_1)$ shrinks in proportion to 1/n and $E(\hat{\beta}_1) = \beta_1$, $\hat{\beta}_1 \xrightarrow{p} \beta_1$

(2) When n is large, the sampling distribution of $\hat{\beta}_1$ is well approximated by a normal distribution (CLT)

Recall the CLT: suppose $\{v_i\}$, $i = 1,\dots,n$ is i.i.d. with $E(v) = 0$ and $\mathrm{var}(v) = \sigma_v^2$. Then, when n is large, $\frac{1}{n}\sum_{i=1}^{n} v_i$ is approximately distributed $N(0, \sigma_v^2/n)$.
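The reasoning behind point (1) can be spelled out with Chebyshev's inequality. This is a heuristic sketch rather than a full proof, because it plugs in the large-n approximation to the variance; $\varepsilon > 0$ is an arbitrary tolerance introduced here.

```latex
\[
P\left(\left|\hat{\beta}_1 - \beta_1\right| \ge \varepsilon\right)
\;\le\; \frac{E\left[(\hat{\beta}_1 - \beta_1)^2\right]}{\varepsilon^2}
\;=\; \frac{\mathrm{var}(\hat{\beta}_1)}{\varepsilon^2}
\;\approx\; \frac{1}{n}\cdot\frac{\mathrm{var}[(X_i - \mu_X)u_i]}{\sigma_X^4\,\varepsilon^2}
\;\longrightarrow\; 0
\quad \text{as } n \to \infty .
\]
```

The equality in the middle uses unbiasedness, $E(\hat{\beta}_1) = \beta_1$, so that the mean squared error equals the variance.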

Large-n approximation to the distribution of $\hat{\beta}_1$:

$$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right)s_X^2} \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}, \quad \text{where } v_i = (X_i - \bar{X})u_i$$

• When n is large, $v_i = (X_i - \bar{X})u_i \approx (X_i - \mu_X)u_i$, which is i.i.d. (why?) and $\mathrm{var}(v_i) < \infty$ (why?). So, by the CLT, $\frac{1}{n}\sum_{i=1}^{n} v_i$ is approximately distributed $N(0, \sigma_v^2/n)$.

• Thus, for n large, $\hat{\beta}_1$ is approximately distributed

$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma_v^2}{n\sigma_X^4}\right), \quad \text{where } v_i = (X_i - \mu_X)u_i$$
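One way to see the normal approximation at work is to check how often $\hat{\beta}_1$ falls within 1.96 approximate standard deviations of $\beta_1$. If the approximation is good, this should happen in about 95% of samples. The sketch assumes a deliberately non-normal design chosen for illustration: X is uniform on (0, 10) and u is a centered exponential (skewed, with mean 0 and variance 1), drawn independently of X.

```python
import math
import random


def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def share_within_band(reps=2000, n=100, beta1=1.0, seed=3):
    """Share of samples with |beta1_hat - beta1| <= 1.96 * approximate sd."""
    rng = random.Random(seed)
    var_x = 10.0 ** 2 / 12.0     # variance of Uniform(0, 10)
    var_v = var_x * 1.0          # var[(X - mu_X) u] with independent u, var(u) = 1
    sd = math.sqrt(var_v / (n * var_x ** 2))   # sqrt(sigma_v^2 / (n sigma_X^4))
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Skewed, mean-zero errors: exponential(1) shifted by its mean.
        y = [beta1 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        if abs(ols_slope(x, y) - beta1) <= 1.96 * sd:
            hits += 1
    return hits / reps


share = share_within_band()
print("share within +/- 1.96 sd:", round(share, 3), "(normal theory: about 0.95)")
```

Even though the errors are far from normal, the share should come out near 0.95 at n = 100, which is the practical content of the large-n result.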

The larger the variance of X, the smaller the variance of $\hat{\beta}_1$

The math

$$\mathrm{var}(\hat{\beta}_1 - \beta_1) = \frac{1}{n} \times \frac{\mathrm{var}[(X_i - \mu_X)u_i]}{\sigma_X^4}$$

where $\sigma_X^2 = \mathrm{var}(X_i)$. The variance of X appears squared in the denominator – so increasing the spread of X decreases the variance of $\hat{\beta}_1$.

The intuition

If there is more variation in X, then there is more information in the data that you can use to fit the regression line. This is most easily seen in a figure…
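In place of the figure, the effect can be sketched by simulation. The design is assumed for illustration: u is standard normal and independent of X, and X is normal with standard deviation 1 in one case and 3 in the other. With independent errors the numerator $\mathrm{var}[(X_i - \mu_X)u_i]$ equals $\sigma_X^2\sigma_u^2$, so the formula predicts that tripling the standard deviation of X cuts the variance of $\hat{\beta}_1$ by a factor of about 9.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def slope_variance(sigma_x, reps=1000, n=50, beta1=1.0, seed=11):
    """Sample variance of beta1_hat when X has standard deviation sigma_x."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
        y = [beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    mean = sum(slopes) / reps
    return sum((b - mean) ** 2 for b in slopes) / (reps - 1)


var_narrow = slope_variance(sigma_x=1.0)   # little variation in X
var_wide = slope_variance(sigma_x=3.0)     # three times the spread in X
print("var(beta1_hat), sd(X) = 1:", round(var_narrow, 5))
print("var(beta1_hat), sd(X) = 3:", round(var_wide, 5))
print("ratio:", round(var_narrow / var_wide, 2))
```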


Author: Atreyu

Description

Teaching material for the Applied Econometrics (Econometria applicata) course taught by Prof. Roberto Golinelli. These are English-language slides prepared by the instructor, covering the following topics: linear regression; the Ordinary Least Squares (OLS) estimator; the standard error of the regression (SER).

Details
Degree program: Economics, Markets and Institutions (Corso di laurea in economia, mercati e istituzioni)
University: Bologna - Unibo
Academic year: 2011-2012

The contents of this page are the publisher Atreyu's personal reworking of information learned by attending the Applied Econometrics lectures and by independent study of any reference books, in preparation for the final exam or thesis. They should not be taken as official material of the University of Bologna (Unibo) or of Prof. Roberto Golinelli.
