Overview: Regression Analysis with Cross-Sectional Data
Definition of the multiple linear regression model
Motivation for multiple regression
Incorporate more explanatory factors into the model
Explicitly hold fixed other factors that otherwise would be in the error term
Allow for more flexible functional forms
Example: Wage equation
Interpretation of the multiple regression model
The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
"Ceteris paribus" interpretation
It has still to be assumed that unobserved factors do not change if the explanatory variables are changed
Example: Determinants of college GPA
Interpretation
Holding ACT fixed, another point on the high school grade point average is associated with .453 points more on the college grade point average
Or: If we compare two students
with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
Holding high school grade point average fixed, another 10 points on the ACT are associated with less than one-tenth of a point on college GPA (see the sketch below)
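As a minimal sketch, this regression can be run in Python with statsmodels. The data below are simulated for illustration only; the variable names colGPA, hsGPA and ACT follow the example, but the estimated coefficients will not match the textbook's .453 and .0094.

    # Sketch: multiple regression colGPA ~ hsGPA + ACT on simulated data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 141
    hsGPA = rng.uniform(2.0, 4.0, n)
    ACT = rng.integers(16, 34, n).astype(float)
    colGPA = 1.3 + 0.45 * hsGPA + 0.01 * ACT + rng.normal(0, 0.3, n)
    df = pd.DataFrame({"colGPA": colGPA, "hsGPA": hsGPA, "ACT": ACT})

    res = smf.ols("colGPA ~ hsGPA + ACT", data=df).fit()
    # Each slope is a ceteris paribus effect: the hsGPA coefficient is the
    # predicted change in colGPA for one more point of hsGPA, holding ACT fixed.
    print(res.params)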
Standard assumptions for the multiple regression model
Assumption MLR.1 (Linear in parameters)
Assumption MLR.2 (Random sampling)
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.3 (No perfect collinearity)
Remarks on MLR.3
The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
If an explanatory variable is a perfect linear combination of other explanatory variables it is superfluous and may be eliminated
Constant variables are also ruled out (collinear with intercept)
Example for perfect collinearity: small sample
Example for perfect collinearity: relationships between regressors
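A small numerical illustration of MLR.3 (simulated regressors, assumed names): when one regressor is an exact linear combination of the others, the design matrix loses full column rank and X'X cannot be inverted.

    # Sketch: perfect collinearity makes X'X singular.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 2.0 * x1 - 0.5 * x2               # exact linear combination: MLR.3 fails
    X = np.column_stack([np.ones(n), x1, x2, x3])

    print(np.linalg.matrix_rank(X))        # 3, not 4: X has deficient rank
    # np.linalg.inv(X.T @ X) would raise LinAlgError here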
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.4 (Zero conditional mean)
In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error
Example: Average test scores
Discussion of the zero conditional mean assumption
Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous
Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators
Theorem 3.1 (Unbiasedness of OLS)
Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values
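This average property can be illustrated with a small Monte Carlo sketch (a simulated model with an assumed true slope of 0.5): individual estimates scatter around the true value, but their average across many samples is close to it.

    # Sketch: unbiasedness of OLS as an average over repeated samples.
    import numpy as np

    rng = np.random.default_rng(2)
    beta1 = 0.5                            # true slope, assumed for the simulation
    estimates = []
    for _ in range(5000):                  # 5000 independent samples
        x = rng.normal(size=100)
        u = rng.normal(size=100)           # error independent of x: MLR.4 holds
        y = 1.0 + beta1 * x + u
        X = np.column_stack([np.ones(100), x])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

    print(np.mean(estimates))              # close to 0.5; single draws can be far off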
Including irrelevant variables in a regression model
Omitting relevant variables: the simple case
Conclusion: All estimated coefficients will be biased
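A companion sketch for the simple omitted-variable case (simulated data, assumed coefficients): in the short regression, the estimate converges to beta1 + beta2 * delta1, where delta1 is the slope from regressing the omitted x2 on the included x1.

    # Sketch: omitted variable bias in the two-regressor case.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000                            # large n, so the bias shows up cleanly
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)     # x2 correlated with x1 (delta1 = 0.8)
    y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

    X_short = np.column_stack([np.ones(n), x1])            # x2 omitted
    b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
    print(b_short[1])                      # about 0.5 + 0.3 * 0.8 = 0.74, not 0.5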
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.5 (Homoscedasticity)
Example: Wage equation
Short hand notation
Assumption MLR.6 (Normality of error terms)
Theorem 3.2 (Sampling variances of OLS slope estimators)
An example for multicollinearity
Discussion of the multicollinearity problem
In the above example, it would probably be better to lump all expenditure categories together because effects cannot be disentangled
In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of other effects may be very precise
Note that multicollinearity is not a violation of MLR.3 in the strict sense
Multicollinearity may be detected through "variance inflation factors"; a sketch follows below
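As a sketch of that diagnostic (simulated regressors; variance_inflation_factor is the actual statsmodels helper): VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other regressors.

    # Sketch: variance inflation factors for a set of regressors.
    import numpy as np
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(4)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + 0.3 * rng.normal(size=n)     # strongly correlated with x1
    x3 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2, x3])

    for j in range(1, 4):                  # skip the intercept column
        print(j, variance_inflation_factor(X, j))
    # large VIFs for x1 and x2 signal multicollinearity between them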
Estimating the error variance
Theorem 3.3 (Unbiased estimator of the error variance)
Efficiency of OLS: The Gauss-Markov Theorem
Under assumptions MLR.1 - MLR.5, OLS is unbiased
However, under these assumptions there may be many other estimators that are unbiased
Which one is the unbiased estimator with the smallest variance?
To answer this question, one usually restricts attention to linear estimators, i.e. estimators that are linear in the dependent variable
Theorem 3.4 (Gauss-Markov Theorem)
Under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e.
OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroscedasticity, for example, there are better estimators.
Estimation of the sampling variances of the OLS estimators
Note that these formulas are only valid under assumptions MLR.1-MLR.5 (in particular, there has to be homoscedasticity)
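A sketch that reproduces the formula by hand and checks it against statsmodels (simulated data): Var(beta_j-hat) = sigma^2 / (SST_j * (1 - R_j^2)), estimated with sigma2_hat = SSR / (n - k - 1).

    # Sketch: standard error of a slope from the textbook formula.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 500
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 0.3 * x1 - 0.2 * x2 + rng.normal(size=n)

    res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    sigma2_hat = res.ssr / res.df_resid                  # SSR / (n - k - 1)
    r2_1 = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared
    sst_1 = np.sum((x1 - x1.mean()) ** 2)
    se_b1 = np.sqrt(sigma2_hat / (sst_1 * (1 - r2_1)))

    print(se_b1, res.bse[1])                             # the two should agree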
Terminology
Theorem 4.1 (Normal sampling distributions)
Testing hypotheses about a single population parameter
Theorem 4.2 (t-distribution for standardized estimators)
Null hypothesis (for more general hypotheses, see below)
t-statistic (or t-ratio)
Distribution of the t-statistic if the null hypothesis is true
Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small probability (= significance level, e.g. 5%)
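A sketch of the rejection rule with scipy (all numbers made up): for a one-sided test against "greater than zero" at the 5% level, reject H0 if the t-statistic exceeds the 95th percentile of the t distribution with n - k - 1 degrees of freedom.

    # Sketch: one-sided t-test at the 5% level (hypothetical numbers).
    from scipy import stats

    b, se = 0.0041, 0.0017                 # hypothetical estimate and standard error
    df = 522                               # hypothetical n - k - 1
    t_stat = b / se                        # t-statistic for H0: beta = 0

    crit = stats.t.ppf(0.95, df)           # one-sided 5% critical value
    print(t_stat, crit, t_stat > crit)     # reject H0 if t_stat > crit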
Testing against one-sided alternatives (greater than zero)
Example: Wage equation
Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages
Example: Wage equation (cont.)
Testing against one-sided alternatives (less than zero)
Example: Student performance and school size
Test whether smaller school size leads to better student performance
Example: Student performance and school size (cont.)
Example: Student performance and school size (cont.)
Alternative specification of functional form:
Example: Student performance and school size (cont.)
Testing against two-sided alternatives
Example: Determinants of college GPA
"Statistically significant" variables in a regression
If a regression coefficient is significantly different from zero in a two-sided test, the corresponding variable is said to be "statistically significant"
If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply:
Guidelines for discussing economic and statistical significance
If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified
If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression
If the sample size is small, effects might be imprecisely estimated so that the case for dropping insignificant variables is less strong
Testing more general hypotheses about a regression coefficient
Null hypothesis
t-statistic
The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic; see the sketch after the example below
Example: Campus crime and enrollment
An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent
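A sketch of such a test in statsmodels (simulated data; only the structure follows the example, and the true elasticity is set to 1.27 so there is something to detect): the hypothesized value 1 is subtracted when forming the t-statistic, or t_test does it directly.

    # Sketch: test H0: beta = 1 in a log-log regression of crime on enrollment.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n = 97
    lenroll = rng.uniform(7, 11, n)
    lcrime = -6.6 + 1.27 * lenroll + rng.normal(0, 0.8, n)   # assumed DGP
    df = pd.DataFrame({"lcrime": lcrime, "lenroll": lenroll})

    res = smf.ols("lcrime ~ lenroll", data=df).fit()
    t_manual = (res.params["lenroll"] - 1) / res.bse["lenroll"]
    print(t_manual)
    print(res.t_test("lenroll = 1"))       # same test, done by statsmodels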
Computing p-values for t-tests
If the significance level is made smaller and smaller, there will be a point where the null hypothesis cannot be rejected anymore
The reason is that, by lowering the significance level, one increasingly guards against the error of rejecting a correct H0
The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels
A large p-value is evidence in favor of the null hypothesis
P-values are more informative than tests at fixed significance levels
How the p-value is computed (here: two-sided test)
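A sketch of that computation with scipy (hypothetical numbers): for a two-sided test, p = 2 * P(T_{n-k-1} > |t|).

    # Sketch: two-sided p-value for a t-statistic (hypothetical numbers).
    from scipy import stats

    t_stat, df = 1.85, 137
    p_value = 2 * stats.t.sf(abs(t_stat), df)   # 2 * P(T > |t|)
    print(p_value)   # H0 is rejected at any significance level above this value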
Confidence intervals
Simple manipulation of the result in Theorem 4.2 implies that
Interpretation of the confidence interval
The bounds of the interval are random
In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases
Confidence intervals for typical confidence levels
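A sketch of the 95% interval (hypothetical numbers): beta_j-hat +/- c * se(beta_j-hat), where c is the 97.5th percentile of the t distribution with n - k - 1 degrees of freedom.

    # Sketch: 95% confidence interval for a coefficient (hypothetical numbers).
    from scipy import stats

    b, se, df = 0.25, 0.10, 200
    c = stats.t.ppf(0.975, df)             # two-sided 5% critical value
    print(b - c * se, b + c * se)          # in repeated samples, such intervals
                                           # cover the true beta_j 95% of the time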
Relationship between confidence intervals and hypotheses tests
Example: Model of firms' R&D expenditures
Testing hypotheses about a linear combination of parameters
Example: Return to education at 2 year vs. at 4 year colleges
Impossible to compute with standard regression output because the covariance of the two estimates, which enters the standard error of their difference, is usually not reported
Alternative method
Estimation results
This method always works for single linear hypotheses
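A sketch of the reparametrization for this example (simulated data; the names jc, univ, exper follow the example, the coefficients are assumed): to test H0: beta_jc = beta_univ, substitute theta = beta_jc - beta_univ and regress on jc and totcoll = jc + univ, so theta-hat and its standard error appear directly in the output.

    # Sketch: testing a linear combination via reparametrization.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 6763
    jc = rng.uniform(0, 3, n)
    univ = rng.uniform(0, 5, n)
    exper = rng.uniform(0, 30, n)
    lwage = (1.47 + 0.067 * jc + 0.077 * univ + 0.005 * exper
             + rng.normal(0, 0.4, n))                    # assumed DGP
    df = pd.DataFrame({"lwage": lwage, "jc": jc, "univ": univ,
                       "exper": exper, "totcoll": jc + univ})

    # theta = beta_jc - beta_univ becomes the coefficient on jc:
    res = smf.ols("lwage ~ jc + totcoll + exper", data=df).fit()
    print(res.params["jc"], res.bse["jc"])               # theta_hat and its se
    # equivalent one-liner on the original model:
    # smf.ols("lwage ~ jc + univ + exper", data=df).fit().t_test("jc = univ")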
Testing multiple linear restrictions: The F-test
Testing exclusion restrictions
Estimation of the unrestricted model
Estimation of the restricted model
Test statistic
Rejection rule (Figure 4.7)
Test decision in example
Discussion
The three variables are "jointly significant"
They were not significant when tested individually
The likely reason is multicollinearity between them
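A numerical sketch of the F-statistic from the preceding slides (simulated data, q = 3 exclusion restrictions): F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)].

    # Sketch: F-test of q = 3 exclusion restrictions from two SSRs.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(8)
    n = 353
    X = rng.normal(size=(n, 5))            # five regressors, simulated
    y = 1.0 + X @ np.array([0.4, 0.2, 0.05, 0.04, 0.03]) + rng.normal(size=n)

    ur = sm.OLS(y, sm.add_constant(X)).fit()             # unrestricted model
    r = sm.OLS(y, sm.add_constant(X[:, :2])).fit()       # last three excluded

    q = 3
    F = ((r.ssr - ur.ssr) / q) / (ur.ssr / ur.df_resid)
    print(F, stats.f.sf(F, q, ur.df_resid))              # F-statistic and p-value
    # with statsmodels' default names: ur.f_test("x3 = x4 = x5 = 0")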
Test of overall significance of a regression
The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected
Testing general linear restrictions with the F-test
Example: Test whether house price assessments are rational
Unrestricted regression
Restricted regression
Test statistic
Regression output for the unrestricted regression
The F-test works for general multiple linear hypotheses
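As a sketch, such general restrictions can be handed to statsmodels' f_test as a constraint string (simulated data; the variable names loosely follow the house price example, and the restriction tested is that the slope on the log assessment is 1 while the other slopes are 0).

    # Sketch: F-test of general linear restrictions via f_test.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(9)
    n = 88
    lassess = rng.normal(11.0, 0.4, n)
    llotsize = rng.normal(9.0, 0.5, n)
    lsqrft = rng.normal(7.5, 0.3, n)
    bdrms = rng.integers(2, 6, n).astype(float)
    lprice = lassess + rng.normal(0, 0.15, n)            # assumed: rational assessments

    df = pd.DataFrame({"lprice": lprice, "lassess": lassess,
                       "llotsize": llotsize, "lsqrft": lsqrft, "bdrms": bdrms})
    res = smf.ols("lprice ~ lassess + llotsize + lsqrft + bdrms", data=df).fit()
    print(res.f_test("lassess = 1, llotsize = 0, lsqrft = 0, bdrms = 0"))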
For all tests and confidence intervals, validity of assumptions MLR.1 – MLR.6 has been assumed. Tests may be invalid otherwise.
Models with interaction terms
Interaction effects complicate interpretation of parameters
Reparametrization of interaction effects
Advantages of reparametrization
Easy interpretation of all parameters
Standard errors for partial effects at the mean values available
If necessary, interaction may be centered at other interesting values
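A sketch of this reparametrization (simulated data, assumed coefficients): centering both variables at their sample means before interacting makes each main-effect coefficient the partial effect at the mean of the other variable, with its standard error reported directly.

    # Sketch: interaction with mean-centered regressors.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(10)
    n = 300
    x1 = rng.normal(2.0, 1.0, n)
    x2 = rng.normal(5.0, 2.0, n)
    y = 1.0 + 0.5 * x1 + 0.2 * x2 + 0.1 * x1 * x2 + rng.normal(size=n)
    df = pd.DataFrame({"y": y, "x1c": x1 - x1.mean(), "x2c": x2 - x2.mean()})

    res = smf.ols("y ~ x1c * x2c", data=df).fit()        # main effects + interaction
    # coefficient on x1c = partial effect of x1 at the mean of x2, with its se:
    print(res.params["x1c"], res.bse["x1c"])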
Qualitative Information
Examples: gender, race, industry, region, rating grade, …
A way to incorporate qualitative information is to use dummy variables
They may appear as the dependent or as independent variables
A single dummy independent variable
Dummy variable trap
Estimated wage equation with intercept shift
Does that mean that women are discriminated against?
Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.
Using dummy explanatory variables in equations for log(y)
Using dummy variables for multiple categories
1) Define membership in each category by a dummy variable
2) Leave out one category (which becomes the base category)
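A sketch of both steps with statsmodels (simulated categories and an assumed wage process): the C() term creates one dummy per category and drops a base category automatically; including all categories plus an intercept would be the dummy variable trap.

    # Sketch: dummies for multiple categories with one base category left out.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(11)
    n = 400
    region = rng.choice(["north", "south", "east", "west"], n)
    educ = rng.uniform(8, 18, n)
    lwage = 1.0 + 0.08 * educ - 0.1 * (region == "south") + rng.normal(0, 0.3, n)
    df = pd.DataFrame({"lwage": lwage, "educ": educ, "region": region})

    res = smf.ols("lwage ~ educ + C(region)", data=df).fit()
    # each region coefficient = ceteris paribus difference from the base category
    print(res.params)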