Consider the t-statistic testing the null hypothesis that the mean of Y is μ_{Y,0}, using data Y_1, ..., Y_n. The formula for this statistic is given by Equation (3.10), where the standard error of Ȳ is given by Equation (3.8). Substituting the latter expression into the former yields the formula for the t-statistic:

\[ t = \frac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2 / n}} \qquad (3.22) \]

where s_Y² is given in Equation (3.7).
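As a quick illustration (not from the text), the following Python sketch evaluates Equation (3.22) for a small made-up sample; the data values and the null value mu_0 are hypothetical:

import numpy as np

# Hypothetical sample and null value, purely illustrative
y = np.array([3.1, 2.4, 3.8, 2.9, 3.5, 3.0, 2.7, 3.3])
mu_0 = 2.5                      # null hypothesis: E(Y) = mu_0

n = y.size
y_bar = y.mean()                # sample average Y-bar
s2_y = y.var(ddof=1)            # sample variance s_Y^2, as in Equation (3.7)
se_y_bar = np.sqrt(s2_y / n)    # standard error of Y-bar, as in Equation (3.8)

t = (y_bar - mu_0) / se_y_bar   # t-statistic, Equation (3.22)
print(f"t = {t:.2f}")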
As discussed in Section 3.2, under general conditions the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true [see Equation (3.12)]. Although the standard normal approximation to the t-statistic is reliable for a wide range of distributions if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: if Y is normally distributed, then the t-statistic in Equation (3.22) has a Student t distribution with n − 1 degrees of freedom, and critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals.

As an example, consider a hypothetical problem in which t^act = 2.15 and n = 20, so that the degrees of freedom are n − 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the t_19 distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for μ_Y, constructed using the t_19 distribution, would be Ȳ ± 2.09 SE(Ȳ). This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96. (The mathematics behind this result is provided in Sections 17.4 and 18.4.)

What is the link between the p-value and the significance level? The significance level is prespecified. For example, if the prespecified significance level is 5%, you reject the null hypothesis if |t| ≥ 1.96; equivalently, you reject if p ≤ 0.05. The p-value is sometimes called the marginal significance level. Often it is better to communicate the p-value than simply whether a test rejects or not, because the p-value contains more information than the "yes/no" statement about whether the test rejects.
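A minimal sketch, assuming SciPy is available, that reproduces the hypothetical t_19 example above together with the equivalent p-value decision:

from scipy import stats

n, t_act = 20, 2.15                         # hypothetical values from the example
dof = n - 1                                 # 19 degrees of freedom

t_crit = stats.t.ppf(0.975, dof)            # ~2.09, two-sided 5% critical value
p_value = 2 * stats.t.sf(abs(t_act), dof)   # two-sided p-value

print(f"critical value = {t_crit:.2f}")
print(f"reject H0 at 5%: {abs(t_act) > t_crit}, p-value = {p_value:.3f}")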
Confidence bands

Confidence bands are used for symmetric distributions. For instance, a 90% confidence band is constructed by finding the critical value c such that

\[ \Pr(\mu - c < x < \mu + c) = 0.9. \qquad (36) \]

Replace 0.9 by 0.95 to get a 95% confidence band. For example, if x ∼ N(μ, σ²), then

\[ \Pr(\mu - 1.65\sigma < x < \mu + 1.65\sigma) = 0.9, \qquad (37) \]
\[ \Pr(\mu - 1.96\sigma < x < \mu + 1.96\sigma) = 0.95. \]
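A small numerical check of (37), assuming SciPy; the coverage probabilities do not depend on μ or σ, so the standard normal distribution suffices:

from scipy import stats

# Coverage Pr(mu - c*sigma < x < mu + c*sigma) for x ~ N(mu, sigma^2)
for c, target in [(1.65, 0.90), (1.96, 0.95)]:
    coverage = stats.norm.cdf(c) - stats.norm.cdf(-c)
    print(f"c = {c}: coverage = {coverage:.3f} (target {target})")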
The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. (The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.)

4 - LINEAR REGRESSION WITH ONE REGRESSOR

The linear regression model with a single regressor is

\[ Y_i = \beta_0 + \beta_1 X_i + u_i, \]

where Y is the dependent variable and X is the independent variable, or regressor. The intercept β_0 and the slope β_1 are the coefficients of the regression line. The slope is the change in Y associated with a unit change in X, and the intercept is the value of the regression line when X = 0 (its intersection with the Y axis).

The error term u_i incorporates all the factors other than X that determine the value of Y_i. It is responsible for any difference between the observed value of Y_i and the value predicted by the regression line: an unpredictable element of randomness in human responses, omitted variables, and measurement error in Y.
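A minimal simulation sketch (not from the text) of the model above; the coefficient values, sample size, and error distribution are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)

beta_0, beta_1, n = 2.0, 0.5, 200   # illustrative true coefficients and sample size
x = rng.uniform(0, 10, size=n)      # regressor
u = rng.normal(0, 1, size=n)        # error term: everything other than X
y = beta_0 + beta_1 * x + u         # Y_i = beta_0 + beta_1 * X_i + u_i

# The slope beta_1 is the change in Y associated with a one-unit change in X,
# so these two bin averages differ by roughly beta_1 = 0.5, up to noise:
print("average Y near X=2:", y[(x > 1.5) & (x < 2.5)].mean())
print("average Y near X=3:", y[(x > 2.5) & (x < 3.5)].mean())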
4.2 Estimating the Coefficients of the Linear Regression Model

Given n observations on X and Y, one way to draw a line through the data would be to take out a pencil and draw it by eye. That is very unscientific, and different people will create different estimated lines. So how do you choose among the many possible lines? The most common way is to choose the line that produces the "least squares" fit to the data, that is, the line that minimizes the sum of the squared differences between the observed values of Y and the values predicted by the line.

The OLS (ordinary least squares) estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average Ȳ is the least squares estimator of the population mean, E(Y): Ȳ minimizes the total squared estimation mistakes

\[ \sum_{i=1}^{n} (Y_i - m)^2 \]

among all possible estimators m.
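To see why Ȳ is the minimizer, a standard first-order-condition argument (not quoted in the excerpt):

\[ \frac{d}{dm}\sum_{i=1}^{n}(Y_i - m)^2 = -2\sum_{i=1}^{n}(Y_i - m) = 0 \quad\Longrightarrow\quad m = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}. \]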
The OLS estimator extends this idea to the linear regression model. Let b_0 and b_1 be some estimators of β_0 and β_1. The regression line based on these estimators is b_0 + b_1 X, so the value of Y_i predicted using this line is b_0 + b_1 X_i. Thus the mistake made in predicting the ith observation is Y_i − (b_0 + b_1 X_i). The sum of these squared prediction mistakes over all n observations (the sum of squared errors, SSE) is

\[ \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2. \]

The estimators of the intercept and slope that minimize this sum of squared mistakes are the OLS estimators of β_0 and β_1.
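A minimal sketch, assuming NumPy, of the minimizing values; the closed-form expressions used here are the standard OLS formulas for the single-regressor model (not quoted in this excerpt), and the data are made up for the example:

import numpy as np

def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1*x + error."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Illustrative data (made up for the example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 3.6, 3.9, 4.8, 5.1])

b0_hat, b1_hat = ols_fit(x, y)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")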
The OLS estimators also have desirable theoretical properties, analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of the population mean. Under the assumptions introduced in Section 4.4, the OLS estimators are unbiased and consistent, and they are efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional conditions. Further discussion of these properties is deferred until Section 5.5.

4.3 Measures of Fit

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The regression R² and the standard error of the regression measure how well the OLS regression line fits the data. The R² (the coefficient of determination) ranges between 0 and 1 and is the fraction of the sample variance of Y_i explained by (or predicted by) X_i. The standard error of the regression measures how far Y_i typically is from its predicted value, that is, how tightly the observations are clustered around the regression line. Both measures are built from the definitions of the predicted value and the residual (see Key Concept 4.2).
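A short sketch, assuming NumPy and reusing the hypothetical data from the ols_fit example above, of how R² and the standard error of the regression could be computed from the residuals; dividing the sum of squared residuals by n − 2 is a common degrees-of-freedom convention for a regression with one regressor:

import numpy as np

# Hypothetical data and fitted coefficients (see the ols_fit sketch above)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 3.6, 3.9, 4.8, 5.1])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x     # predicted values
u_hat = y - y_hat       # residuals

r2 = 1 - np.sum(u_hat ** 2) / np.sum((y - y.mean()) ** 2)   # fraction of the sample variance of Y explained by X
ser = np.sqrt(np.sum(u_hat ** 2) / (len(y) - 2))            # standard error of the regression

print(f"R^2 = {r2:.3f}, SER = {ser:.3f}")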