• The slope is the difference in the expected values of Y, for two
values of X that differ by one unit
• The estimated regression can be used either for:
– causal inference (learning about the causal effect on Y of a change in X)
– prediction (predicting the value of Y given X, for an observation not in
the data set)
• Causal inference and prediction place different requirements on the data, but both use the same regression toolkit.
The problem of statistical inference for linear regression is, at a general
level, the same as for estimation of the mean or of the differences between
two means. Statistical, or econometric, inference about the slope entails:
• Estimation:
– How should we draw a line through the data to estimate the population
slope?
Answer: ordinary least squares (OLS).
– What are the advantages and disadvantages of OLS?
• Hypothesis testing:
– How to test whether the slope is zero?
• Confidence intervals:
– How to construct a confidence interval for the slope?
The Linear Regression Model (SW Section 4.1)
The population regression line:

    Test Score = β₀ + β₁ × STR

β₁ = slope of the population regression line

Why are β₀ and β₁ “population” parameters?
• We would like to know the population value of β₁.
• We don’t know β₁, so we must estimate it using data.
The Population Linear Regression Model
    Yᵢ = β₀ + β₁Xᵢ + uᵢ,  i = 1, …, n

• We have n observations, (Xᵢ, Yᵢ), i = 1, …, n.
• X is the independent variable or regressor
• Y is the dependent variable
• β₀ = intercept
• β₁ = slope
• uᵢ = the regression error
• The regression error consists of omitted factors. In general, these omitted
factors are other factors that influence Y, other than the variable X. The
regression error also includes error in the measurement of Y.
The population regression model in a picture: observations on Y and X (n = 7); the population regression line; and the regression error (the “error term”). [Figure omitted.]
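As a concrete illustration (not part of the original slides), here is a minimal Python sketch that simulates the population model Yᵢ = β₀ + β₁Xᵢ + uᵢ for n = 7 districts, as in the picture. The parameter values are borrowed from the estimates reported later in these slides; everything else is made up.

import numpy as np

rng = np.random.default_rng(0)

beta_0, beta_1 = 698.9, -2.28      # illustrative values from the estimates below
n = 7                              # matches the n = 7 observations in the picture

X = rng.uniform(15.0, 25.0, size=n)   # hypothetical student-teacher ratios
u = rng.normal(0.0, 18.6, size=n)     # regression error (spread ~ the SER reported later)
Y = beta_0 + beta_1 * X + u           # the population linear regression model

for x, y in zip(X, Y):
    print(f"STR = {x:5.2f}   Test Score = {y:6.1f}")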
The Ordinary Least Squares Estimator
(SW Section 4.2)
How can we estimate β₀ and β₁ from data?

Recall that Ȳ was the least squares estimator of μ_Y: Ȳ solves

    min_m Σ_{i=1}^n (Yᵢ − m)²

By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters β₀ and β₁. The OLS estimator solves

    min_{b₀,b₁} Σ_{i=1}^n [Yᵢ − (b₀ + b₁Xᵢ)]²
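To see that this really is just a minimization problem, the sketch below (an illustration on simulated data, not the textbook’s method) hands the sum of squared residuals to a generic numerical optimizer; the closed-form OLS formulas in Key Concept 4.2 below give the same answer.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

def ssr(b):
    """Sum of squared residuals for a candidate intercept b[0] and slope b[1]."""
    return np.sum((Y - (b[0] + b[1] * X)) ** 2)

# Start the search at (sample mean of Y, slope 0) and let the optimizer run.
result = minimize(ssr, x0=np.array([Y.mean(), 0.0]))
print("numerical OLS estimates (b0, b1):", result.x)  # close to (698.9, -2.28)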
Mechanics of OLS

The population regression line: E(Test Score | STR) = β₀ + β₁ × STR

    β₁ = slope = ??

The OLS estimator solves:

    min_{b₀,b₁} Σ_{i=1}^n [Yᵢ − (b₀ + b₁Xᵢ)]²
• The OLS estimator minimizes the average squared difference between the actual values of Yᵢ and the prediction (“predicted value”) based on the estimated line.
• This minimization problem can be solved using calculus (App. 4.2).
• The result is the OLS estimators of β₀ and β₁.
Key Concept 4.2: The OLS Estimator,
Predicted Values, and Residuals
The OLS estimators of the slope β₁ and the intercept β₀ are

    β̂₁ = Σ_{i=1}^n (Xᵢ − X̄)(Yᵢ − Ȳ) / Σ_{i=1}^n (Xᵢ − X̄)²  =  s_XY / s²_X        (4.7)

    β̂₀ = Ȳ − β̂₁X̄.        (4.8)

The OLS predicted values Ŷᵢ and residuals ûᵢ are

    Ŷᵢ = β̂₀ + β̂₁Xᵢ,  i = 1, …, n        (4.9)

    ûᵢ = Yᵢ − Ŷᵢ,  i = 1, …, n.        (4.10)

The estimated intercept (β̂₀), slope (β̂₁), and residuals (ûᵢ) are computed from a sample of n observations of Xᵢ and Yᵢ, i = 1, …, n. These are estimates of the unknown true population intercept (β₀), slope (β₁), and error term (uᵢ).
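Equations (4.7)–(4.10) translate line by line into code. The following sketch uses simulated data (not the California data set) purely to show the mechanics:

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

Xbar, Ybar = X.mean(), Y.mean()

beta1_hat = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)  # eq. (4.7)
beta0_hat = Ybar - beta1_hat * Xbar                                    # eq. (4.8)

Y_hat = beta0_hat + beta1_hat * X   # predicted values, eq. (4.9)
u_hat = Y - Y_hat                   # residuals, eq. (4.10)

print(f"intercept = {beta0_hat:.1f}, slope = {beta1_hat:.2f}")
print(f"sum of residuals = {u_hat.sum():.2e}")  # zero up to rounding error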
Application to the California Test Score – Class Size data

• Estimated slope: β̂₁ = −2.28
• Estimated intercept: β̂₀ = 698.9
• Estimated regression line: TestScore^ = 698.9 − 2.28 × STR (the “^” denotes an OLS predicted value)
Interpretation of the estimated slope and
intercept
    TestScore^ = 698.9 − 2.28 × STR

• Districts with one more student per teacher on average have test scores that are 2.28 points lower.
• That is, ΔE(Test Score | STR) / ΔSTR = −2.28
• The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9. But this interpretation of the intercept makes no sense: it extrapolates the line outside the range of the data; here, the intercept is not economically meaningful.
Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and Test Score = 657.8
    predicted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8

    residual: û_Antelope = 657.8 − 654.8 = 3.0
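The same arithmetic in a few lines of Python, using the coefficients of the estimated line above:

beta0_hat, beta1_hat = 698.9, -2.28            # from the estimated regression line
str_antelope, score_antelope = 19.33, 657.8    # Antelope, CA

y_hat = beta0_hat + beta1_hat * str_antelope   # predicted value: about 654.8
u_hat = score_antelope - y_hat                 # residual: about 3.0
print(f"predicted = {y_hat:.1f}, residual = {u_hat:.1f}")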
OLS regression: STATA output
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
    TestScore^ = 698.9 − 2.28 × STR

(We’ll discuss the rest of this output later.)
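For readers working in Python rather than Stata, roughly the same regression could be run with statsmodels, as sketched below. The file name caschool.csv is an assumption about how the data might be stored, and Stata’s `, robust` option corresponds to HC1 heteroskedasticity-robust standard errors.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("caschool.csv")  # hypothetical file containing testscr and str

# cov_type="HC1" reproduces Stata's robust standard errors; the column named
# "str" shadows Python's built-in str inside the formula, which is fine here.
model = smf.ols("testscr ~ str", data=df).fit(cov_type="HC1")
print(model.summary())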
Measures of Fit (SW Section 4.3)
Two regression statistics provide complementary measures of how
well the regression line “fits” or explains the data:
• The regression R² measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit).
• The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.
The regression R² is the fraction of the sample variance of Yᵢ “explained” by the regression:

    Yᵢ = Ŷᵢ + ûᵢ  =  OLS prediction + OLS residual

    sample var(Yᵢ) = sample var(Ŷᵢ) + sample var(ûᵢ)   (why?)

    total sum of squares = “explained” SS + “residual” SS

Definition of R²:

    R² = ESS / TSS = [ Σ_{i=1}^n (Ŷᵢ − Ȳ)² ] / [ Σ_{i=1}^n (Yᵢ − Ȳ)² ]

(The sample mean of the Ŷᵢ equals Ȳ, so Ȳ appears in both sums.)

• R² = 0 means ESS = 0
• R² = 1 means ESS = TSS
• 0 ≤ R² ≤ 1
• For regression with a single X, R² = the square of the correlation coefficient between X and Y
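As a quick check of the definition, the sketch below computes R² = ESS/TSS on simulated data (np.polyfit is just a convenient way to get the OLS fit) and verifies the last bullet above: with a single regressor, R² equals the squared correlation of X and Y.

import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(15.0, 25.0, size=100)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=100)  # simulated data

b1, b0 = np.polyfit(X, Y, 1)   # OLS slope and intercept
Y_hat = b0 + b1 * X

ESS = np.sum((Y_hat - Y_hat.mean()) ** 2)  # explained sum of squares
TSS = np.sum((Y - Y.mean()) ** 2)          # total sum of squares
print(f"R^2 = {ESS / TSS:.3f}")
print(f"corr(X, Y)^2 = {np.corrcoef(X, Y)[0, 1] ** 2:.3f}")  # identical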
The Standard Error of the Regression (SER)
The SER measures the spread of the distribution of u. The SER is
(almost) the sample standard deviation of the OLS residuals:
    SER = √[ (1/(n − 2)) Σ_{i=1}^n (ûᵢ − mean(û))² ]  =  √[ (1/(n − 2)) Σ_{i=1}^n ûᵢ² ]

The second equality holds because mean(û) = (1/n) Σ_{i=1}^n ûᵢ = 0.
The SER:
• has the units of u, which are the units of Y
• measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)

The root mean squared error (RMSE) is closely related to the SER:

    RMSE = √[ (1/n) Σ_{i=1}^n ûᵢ² ]

This measures the same thing as the SER; the minor difference is division by 1/n instead of 1/(n − 2).
Technical note: why divide by n − 2 instead of n − 1?

    SER = √[ (1/(n − 2)) Σ_{i=1}^n ûᵢ² ]

• Division by n − 2 is a “degrees of freedom” correction, just like division by n − 1 in s²_Y, except that for the SER two parameters have been estimated (β₀ and β₁, by β̂₀ and β̂₁), whereas in s²_Y only one has been estimated (μ_Y, by Ȳ).
• When n is large, it doesn’t matter whether n, n − 1, or n − 2 is used, although the conventional formula uses n − 2 when there is a single regressor.
• For details, see Section 18.4.
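A short sketch on simulated data makes the point: the SER and the RMSE differ only in the divisor, and for n = 100 the difference is already negligible.

import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.uniform(15.0, 25.0, size=n)
Y = 698.9 - 2.28 * X + rng.normal(0.0, 18.6, size=n)  # simulated data

b1, b0 = np.polyfit(X, Y, 1)
u_hat = Y - (b0 + b1 * X)                     # OLS residuals

SER = np.sqrt(np.sum(u_hat ** 2) / (n - 2))   # degrees-of-freedom correction
RMSE = np.sqrt(np.sum(u_hat ** 2) / n)        # divide by n instead
print(f"SER = {SER:.2f}, RMSE = {RMSE:.2f}")  # nearly identical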
Example of the R² and the SER

    TestScore^ = 698.9 − 2.28 × STR,  R² = .05,  SER = 18.6

STR explains only a small fraction of the variation in test scores. Does this make sense? Does this mean that STR is unimportant in a policy sense?
The Least Squares Assumptions for
Causal Inference (SW Section 4.4)
• So far we have treated OLS as a way to draw a straight line
through the data on Y and X. Under what conditions does the
slope of this line have a causal interpretation? That is, when will
the OLS estimator be unbiased for the causal effect on Y of X?
• What is the variance of the OLS estimator over repeated
samples?
• To answer these questions, we need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme).
• These assumptions (there are three) are known as the Least Squares Assumptions for Causal Inference.
Definition of Causal Effect
• The causal effect on Y of a unit change in X is the expected
difference in Y as measured in a randomized controlled
experiment
– For a binary treatment, the causal effect is the expected difference in
means between the treatment and control groups, as discussed in Ch. 3
• With a binary treatment, for the difference in means to measure a causal effect, random assignment or as-if random assignment is required.
– Random assignment ensures that …