ECONOMETRICS - Lecture 1

The error term is a fundamental concept in econometrics. It represents everything that is omitted from the model specification: in a wage equation, for example, it collects all the factors other than education that can affect someone's wage. Every model must include an error term, and on average the error should be equal to 0. If the error is not purely random, something is wrong with the specification and the model has to be changed until the error behaves as pure noise with zero mean. In short, the error term captures all the omitted factors and reveals incorrect assumptions.

Once the model is deemed good enough, we can use it to test hypotheses on the parameters. The econometric modelling process consists of three steps: specification (writing the empirical model), estimation, and testing. An example of a model is yi = α + β'xi + εi, where xi collects the explanatory variables and β is the vector of coefficients to be estimated.
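As a minimal sketch of the three steps in Stata (the variable names, sample size and coefficient values are invented for illustration, not taken from the course):

    clear
    set obs 200
    set seed 1
    generate x = rnormal()         // explanatory variable
    generate e = rnormal()         // error term with zero mean
    generate y = 1 + 0.5*x + e     // specification: the assumed data generating process
    regress y x                    // estimation: OLS
    test x = 0                     // testing: is the slope statistically zero?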

For OLS to be valid, the explanatory variables must be exogenous, i.e. unrelated with the error. If the explanatory variables are endogenous it is not possible to use the OLS method; when there is a correlation with the error, part of the systematic relation we want to capture ends up in the error. The errors must also not be correlated with each other; what this means changes according to the kind of data: with time series, for example, the error in period t must be unrelated with the error of the previous period. The hypotheses that the errors have zero mean, constant variance and are not correlated are condensed in the notation εi ~ i.i.d.(0, σε^2). Under these hypotheses OLS is the most precise estimator, the one with the smallest variance: it is the best linear unbiased estimator. The variance of the error must be constant; this is the homoscedasticity assumption. When the errors are heteroscedastic, OLS is no longer the best estimator and we use GLS instead. If the errors are correlated, neither OLS nor GLS is a good method.
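Spelled out, the hypotheses condensed in the i.i.d. notation are (a standard restatement, not taken verbatim from the lecture):

\[
E(\varepsilon_i)=0, \qquad
\operatorname{Var}(\varepsilon_i)=\sigma_\varepsilon^2 \ \text{(homoscedasticity)}, \qquad
\operatorname{Cov}(\varepsilon_i,\varepsilon_j)=0 \ \text{for } i\neq j .
\]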

ECONOMETRICS - Lecture 2

In cross-sectional data the order of the observations is not important, so random sampling is a reasonable assumption. The explanatory (independent) variables are random variables. DGP = data generating process.

Identically distributed means that the DGP is the same for every unit in our sample.

Independently distributed means that every unit has the same chance/probability of being extracted; in this case the covariance between different observations is equal to 0.

PDF = probability density function

Heteroskedasticity is one of the problems we must face with cross-sectional data.
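As an illustration (Stata's built-in auto dataset and these regressors are just a convenient example, not from the lecture), a common way to check for heteroskedasticity after an OLS regression is the Breusch-Pagan test:

    sysuse auto, clear          // built-in cross-sectional example dataset
    regress price mpg weight    // OLS regression
    estat hettest               // Breusch-Pagan test for heteroskedasticity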

Another type of data is TIME SERIES DATA, which consist of observations on variables observed over a stretch of time (e.g. 2008, 2010, 2012, …).

Unlike cross-sectional data, where each observation can be treated as uncorrelated with the others, with time series it is not possible to assume that the observations are independent.

Another difference is that in time series data we can observe the business cycle: the economy can go very well in some periods and badly in others.

These cycles can also be tracked through some index such as GDP.

We also have seasonality: for example, production increases during November/December and decreases during the summer due to vacations.

If I want to analyze the movements in stock data, I need high-frequency observations, not annual observations such as those available for GDP.
It is not possible to reshuffle the observations, because we must preserve the temporal line in order to check whether there are cycles or not.
The observations of period t are correlated with the observations of the previous period.
The assumption of identical distribution is replaced by the stationarity assumption.
In time series it is also very important to carry out an autocorrelation analysis. Another difference is that in time series even a static model needs to take into account the temporal dependence of the variables.
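A small sketch of this in Stata (the series is simulated, so the numbers are invented): each period is built from the previous one, and corrgram then reports the autocorrelations.

    clear
    set obs 200
    set seed 1
    generate t = _n
    tsset t                                          // declare the data as a time series
    generate y = rnormal()
    replace y = 0.7*y[_n-1] + rnormal() if _n > 1    // today depends on yesterday
    corrgram y                                       // autocorrelations at several lags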
We have to take into consideration the learning by doing process.
Independently pooled cross sections are obtained by collecting cross-sectional data at different points in time and pooling them together.
Panel data are a sample where many units are observed over time and the same units are followed across periods. Some units are not observed in every period but join later.
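A minimal Stata sketch of how a panel is declared (the nlswork dataset is Stata's standard online example, assumed here just for illustration):

    webuse nlswork, clear   // women observed repeatedly over several years
    xtset idcode year       // unit identifier and time variable
    xtdescribe              // shows which units are observed in which years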
SIMPLE REGRESSION MODEL
Two variables: dependent and explanatory combined in a linear model.
When the average is lower than the median, the distribution has a bigger left tail than a normal distribution and will be negatively skewed. On the other hand, when the average is higher than the median, the distribution will have a bigger right tail, with extreme values tending to the positive, and will be positively skewed.

When the mean is smaller than the median, the negative deviations from the mean are large while the positive ones will be small numbers. Vice versa, when the mean is larger than the median, we will have more positive deviations. The perfect case is where the median is quite close to the average, so the skewness tends to 0 and the distribution is symmetric.

The first quartile is the value below which 25% of the cases fall, so it is a measure regarding the lowest part of our observations. The third quartile corresponds to 75% of the cases, and the median to 50%. Comparing the value at 75% with the value at 25% I obtain the interquartile range, a measure that sometimes plays the same role as the standard deviation.

This range is a measure of variability similar to the SD, with the advantage that it is unaffected by outliers. Outliers can inflate the SD even more than they inflate the mean. Comparing the two also tells us something about the kurtosis: if the SD is higher than the interquartile range, the kurtosis will be higher than 3, with more probability in the tails than I would expect in a normal distribution.

In the box plot, the line inside the box is the median, the red line is the mean, and the box itself covers the interquartile range.
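As an illustration in Stata (the life expectancy example dataset is assumed just for convenience), these commands report the quartiles, interquartile range, SD, skewness and kurtosis and draw the box plot described above:

    sysuse lifeexp, clear
    tabstat lexp, statistics(mean p25 p50 p75 iqr sd skewness kurtosis)
    graph box lexp          // box = interquartile range, line inside the box = median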

In the variance I cannot treat all n observations as free pieces of information, because I already used all of them to obtain the average: once the average is known, the last observation is determined by the other n − 1. This is why the sample variance divides by n − 1.
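In formulas, the standard definitions consistent with this reasoning are:

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 .
\]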

The covariance compares two variables: it is positive if, when one increases, the other tends to increase as well, and negative when one increases while the other decreases.

The correlation is derived from the covariance.

The variance can be considered as the covariance of a variable with itself.
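In formulas (standard definitions, added here for reference):

\[
\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{s_X\, s_Y}, \qquad
\operatorname{Var}(X) = \operatorname{Cov}(X,X).
\]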

The whiskers are important to understand whether or not you have outliers.

If we don't have mild outliers, we won't have severe outliers.

ECONOMETRICS - Lecture 3

The trimmed mean allows us to exclude extreme values.

The Jarque-Bera test is based on skewness and kurtosis; it is a test of normality that allows us to know whether, for example, life expectancy is distributed as a normal.

It is a test based on two hypotheses; a normal distribution has a skewness equal to 0.

The first hypothesis is that the skewness is equal to 0: the idea is that the number we obtained can be statistically considered as 0.

The other hypothesis is that the kurtosis must be equal to 3, from a statistical point of view.

The JB test computes both skewness and kurtosis and checks whether the skewness and the excess kurtosis are equal to 0; if the answer is yes, this means our distribution is symmetric and the kurtosis is equal to 3.

In Stata, sktest variablename performs the skewness/kurtosis test.

In the last two columns we test the joint null hypothesis that both are equal to 0 together; in this case the two hypotheses together are rejected.

Sometimes we have problems with only one of the two, sometimes with both.

We know that if the estimated probability (p-value) is equal to or lower than 5%, we reject the null hypothesis.
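A minimal Stata sketch of this test on the life expectancy example mentioned above (the dataset name is the standard built-in one, assumed here for illustration):

    sysuse lifeexp, clear   // built-in dataset containing life expectancy (lexp)
    sktest lexp             // skewness/kurtosis test for normality
    // reject normality if the joint Prob>chi2 in the last column is below 0.05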

This test is based on a chi-square distribution, which is used when we want to test something that has to do with variances, standard deviations, or kurtosis.

The t test, instead, is used for averages. The moments condense some characteristics of a distribution: the first moment is the mean and the second moment is the variance (or the SD). When we want to test an average or a parameter, we use the Student t test. In this case the chi-square distribution has 2 degrees of freedom because we are jointly testing kurtosis and skewness.

If the computed statistic falls to the left of the critical line, we are under the null; if it falls outside (to the right of) the critical line, we do not accept the null.

The third column compares the value with the chi-square critical value, and in this case we reject the null.

The critical value is a benchmark that we can read from the plot.

The computed value is not compatible with the chi-square distribution under the null, so we must reject the null.

The first possibility is to compare the computed value in the adj chi2(2) column with the critical value; in this case we are far away from it.

The second possibility uses the p-value: we know that the rejection area is 5%, so when we have to decide whether to accept or reject a hypothesis we look at the probability associated with the computed value. Here the probability of making an error by rejecting the null is essentially 0, so if we reject the null we are not wrong. In terms of the plot, our probability of making an error is equal to 0, so of course we reject the null.

Also for the kurtosis the probability of making an error is 0, whereas for the skewness it is 20%: if we reject that null, the probability of making an error is 20%, much higher than for the kurtosis.

di invchi2tail(2, .05) allows us to obtain the critical value

di chi2tail(2, 44.19) gives us the p-value
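Putting the two together (the 5% critical value of a chi-square with 2 degrees of freedom is about 5.99):

\[
\chi^2_{\text{computed}} = 44.19 \;>\; \chi^2_{2,\,0.05} \approx 5.99
\quad\Longrightarrow\quad \text{reject the null of normality.}
\]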

Why we want to know whether a variable is normally distributed:

  1. Outliers influence our distribution and the results.
  2. We are using linear models, so it is essential to know whether the distribution fits the underlying assumptions.

The Stata command reg estimates the model with OLS; if I do not include any other term, only the constant term (which is included by default) is estimated.

The constant term is then estimated to be equal to the average, which is the last line of the results we obtained.
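A quick way to see this in Stata (the dataset is just the built-in example used above):

    sysuse lifeexp, clear
    summarize lexp   // the sample average of life expectancy
    regress lexp     // OLS with only the constant: _cons equals that average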

But why the average?

A simple regression model is a model where we would like to explain the dependent variable with an explanatory variable.

Simple regression

The model is Yi = α + βxi + εi. Ordinary least squares (OLS) regression uses the estimated values of α and β, denoted with a hat: these are the estimators. If we apply the formula to our data, we obtain the estimate; in this case the mean, which is the coef, is the estimate. β1 is the slope, and the simple regression model is linear in the parameters. The error is quite important: this term contains all the omitted variables and the mistakes we made by writing a model that is linear in the parameters. The error u is unknown and we use an estimate of it, the residuals, if we repeat…
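For reference, the textbook OLS formulas for the simple regression model, consistent with the notation above:

\[
\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}, \qquad
\hat{\varepsilon}_i = y_i - \hat{\alpha} - \hat{\beta} x_i .
\]

With no explanatory variable the model reduces to yi = α + εi and the OLS estimate of α is the sample average, which is why regressing on the constant alone returns the mean.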