Overview: Regression Analysis with Cross-Sectional Data
Definition of the multiple linear regression model
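In standard textbook notation (the original slide showed the formula only as an image), the population model is

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u $$

where $y$ is the dependent variable, $x_1, \dots, x_k$ are the explanatory variables, and $u$ is the error term.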
Motivation for multiple regression
Incorporate more explanatory factors into the model
Explicitly hold fixed other factors that otherwise would be in the error term
Allow for more flexible functional forms
Example: Wage equation
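The exact specification on the original slide is not recoverable; a typical textbook form of this example is

$$ \log(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + u $$

so that experience is held fixed explicitly when measuring the return to education, instead of being left in the error term.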
Interpretation of the multiple regression model
The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
"Ceteris paribus" interpretation
It still has to be assumed that unobserved factors do not change when the explanatory variables are changed
Example: Determinants of college GPA
Interpretation
Holding ACT fixed, one more point of high school grade point average is associated with .453 more points of college grade point average
Or: If we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 points higher than that of student B
Holding high school grade point average fixed, another 10 points on ACT are associated with less than one point on college GPA
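A minimal sketch of how such a regression could be estimated, assuming a hypothetical file gpa.csv with columns colGPA, hsGPA, and ACT (file name and data are illustrative, not taken from the slides):

```python
# Sketch: multiple regression of college GPA on high school GPA and ACT score.
# Assumes a hypothetical data file "gpa.csv" with columns colGPA, hsGPA, ACT.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa.csv")

# OLS with two explanatory variables; each slope is a ceteris paribus effect,
# i.e. the partial association holding the other regressor fixed.
model = smf.ols("colGPA ~ hsGPA + ACT", data=df).fit()
print(model.summary())
```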
Standard assumptions for the multiple regression model
Assumption MLR.1 (Linear in parameters)
Assumption MLR.2 (Random sampling)
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.3 (No perfect collinearity)
Remarks on MLR.3
The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated
Constant variables are also ruled out (collinear with intercept)
Example for perfect collinearity: small sample
Example for perfect collinearity: relationships between regressors
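As a small illustration of why MLR.3 is needed, the following sketch with made-up data shows that a regressor which is an exact linear combination of other regressors leaves the design matrix rank-deficient, so the OLS coefficients are not uniquely determined:

```python
# Sketch: perfect collinearity makes the design matrix rank deficient.
import numpy as np

n = 20
rng = np.random.default_rng(0)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - 3 * x2          # exact linear combination of x1 and x2

X = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X))   # prints 3, not 4: MLR.3 is violated
```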
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.4 (Zero conditional mean)
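In the usual notation, the assumption states that the error has zero mean given any values of the explanatory variables:

$$ E(u \mid x_1, x_2, \dots, x_k) = 0 $$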
In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error term
Example: Average test scores
Discussion of the zero conditional mean assumption
Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous
Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators
Theorem 3.1 (Unbiasedness of OLS)
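In standard form, the theorem states that under assumptions MLR.1 – MLR.4

$$ E(\hat{\beta}_j) = \beta_j, \qquad j = 0, 1, \dots, k $$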
Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values
Including irrelevant variables in a regression model
Omitting relevant variables: the simple case
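For this simple case (the true model contains $x_1$ and $x_2$, but $x_2$ is omitted from the regression), the textbook bias result is

$$ E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1 $$

where $\tilde{\delta}_1$ is the slope from a regression of the omitted $x_2$ on the included $x_1$; the bias is zero only if $\beta_2 = 0$ or $x_1$ and $x_2$ are uncorrelated.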
Conclusion: All estimated coefficients will be biased
Standard assumptions for the multiple regression model (cont.)
Assumption MLR.5 (Homoscedasticity)
Example: Wage equation
Shorthand notation
Assumption MLR.6 (Normality of error terms)
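In standard notation, these last two assumptions read

$$ \text{MLR.5:}\quad Var(u \mid x_1, \dots, x_k) = \sigma^2 \qquad\qquad \text{MLR.6:}\quad u \sim N(0, \sigma^2), \ \text{independent of } x_1, \dots, x_k $$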
Theorem 3.2 (Sampling variances of OLS slope estimators)
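Under MLR.1 – MLR.5, the sampling variance of a slope estimator takes the familiar form

$$ Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \qquad j = 1, \dots, k $$

where $SST_j$ is the total sample variation in $x_j$ and $R_j^2$ is the R-squared from regressing $x_j$ on all other explanatory variables.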
An example for multicollinearity
Discussion of the multicollinearity problem
In the above example, it would probably be better to lump all expenditure categories together because the effects cannot be disentangled
In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of other effects may be very precise
Note that multicollinearity is not a violation of MLR.3 in the strict sense
Multicollinearity may be detected through "variance inflation factors"
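The variance inflation factor for regressor $j$ is built from the same $R_j^2$ as in the variance formula above:

$$ VIF_j = \frac{1}{1 - R_j^2} $$

Values well above 10 are a common informal warning sign, although there is no formal threshold.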
Estimating the error variance
Theorem 3.3 (Unbiased estimator of the error variance)
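The estimator adjusts the sum of squared residuals for the degrees of freedom used up in estimation:

$$ \hat{\sigma}^2 = \frac{SSR}{n - k - 1}, \qquad E(\hat{\sigma}^2) = \sigma^2 \ \text{under MLR.1 – MLR.5} $$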
Efficiency of OLS: The Gauss-Markov Theorem
Under assumptions MLR.1 – MLR.5, OLS is unbiased
However, under these assumptions there may be many other estimators that are also unbiased
Which one is the unbiased estimator with the smallest variance?
In order to answer this question, one usually limits oneself to linear estimators, i.e. estimators that are linear in the dependent variable
Theorem 3.4 (Gauss-Markov Theorem)
Under assumptions MLR.1 – MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients
OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroscedasticity, for example, there are better estimators
Estimation of the sampling variances of the OLS estimators
Note that these formulas are only valid under assumptions MLR.1-MLR.5 (in particular, there has to be homoscedasticity)
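Replacing the unknown $\sigma$ by $\hat{\sigma}$ gives the standard errors reported by regression software:

$$ se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j\,(1 - R_j^2)}} $$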
Terminology
Theorem 4.1 (Normal sampling distributions)
Testing hypotheses about a single population parameter
Theorem 4.2 (t-distribution for standardized estimators)
Null hypothesis (for more general hypotheses, see below)
t-statistic (or t-ratio); see the formula below
Distribution of the t-statistic if the null hypothesis is true
Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small probability (= significance level, e.g. 5%)
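Restating the pieces above in standard textbook notation (the originals appeared only as images):

$$ H_0: \beta_j = 0, \qquad t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-k-1} \ \text{under } H_0 $$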
Testing against one-sided alternatives (greater than zero)
Example: Wage equation
Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages
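A sketch of how this one-sided test could be run in practice, assuming a hypothetical file wage.csv with columns wage, educ, exper, and tenure (file and column names are illustrative, not from the slides):

```python
# Sketch: one-sided test that more experience raises log(wage),
# holding educ and tenure fixed. Data file and columns are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("wage.csv")
res = smf.ols("np.log(wage) ~ educ + exper + tenure", data=df).fit()

t_stat = res.tvalues["exper"]                              # t-ratio for exper
p_one_sided = 1 - stats.t.cdf(t_stat, df=res.df_resid)     # H1: beta_exper > 0
print(t_stat, p_one_sided)
```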
Example: Wage equation (cont.)
Testing against one-sided alternatives (less than zero)
Example: Student performance and school size
Test whether smaller school size leads to better student performance
Example: Student performance and school size (cont.)
Example: Student performance and school size (cont.)
Alternative specification of functional form
Example: Student performance and school size (cont.)
Testing against two-sided alternatives
Example: Determinants of college GPA
"Statistically significant" variables in a regression
If a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be "statistically significant"
If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply (restored below)
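These are the standard normal critical values, which the original slide showed only as an image; for two-sided alternatives, reject $H_0: \beta_j = 0$ at the given level if

$$ |t| > 1.645 \ (10\%), \qquad |t| > 1.96 \approx 2 \ (5\%), \qquad |t| > 2.576 \ (1\%) $$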
Guidelines for discussing economic and statistical significance
If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified
If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression
If the sample size is small, effects might be imprecisely estimated so that the case for dropping insignificant variables is less strong
Testing more general hypotheses about a regression coefficient
Null hypothesis
t-statistic
The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic (see the formula below)
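For a null hypothesis $H_0: \beta_j = a_j$, the statistic is

$$ t = \frac{\hat{\beta}_j - a_j}{se(\hat{\beta}_j)} \sim t_{n-k-1} \ \text{under } H_0 $$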
Example: Campus crime and enrollment
An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent
Computing p-values for t-tests
If the significance level is made smaller and smaller, there will be a point where the null hypothesis cannot be rejected anymore
The reason is that, by lowering the significance level, one increasingly guards against the error of rejecting a correct H0
The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels
A large p-value is evidence in favor of the null hypothesis
p-values are more informative than tests at fixed significance levels
How the p-value is computed (here: two-sided test)
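A minimal sketch of the two-sided computation; the t-statistic and degrees of freedom below are made-up numbers for illustration:

```python
# Sketch: two-sided p-value for a t-statistic (illustrative numbers only).
from scipy import stats

t_stat = 1.85        # hypothetical t-ratio
df_resid = 522       # hypothetical residual degrees of freedom (n - k - 1)

# p-value = probability of observing a |T| at least as large as |t_stat| under H0
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=df_resid))
print(p_value)
```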
Confidence intervals
Simple manipulation of the result in Theorem 4.2 implies the confidence interval shown below
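In standard form, with $c$ the appropriate critical value of the $t_{n-k-1}$ distribution (e.g. the 97.5th percentile for a 95% interval):

$$ \hat{\beta}_j \pm c \cdot se(\hat{\beta}_j) $$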
Interpretation of the confidence interval
The bounds of the interval are random
In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases
Confidence intervals for typical confidence levels
Relationship between confidence intervals and hypothesis tests
Example: Model of firms' R&D expenditures
Testing hypotheses about a linear combination of parameters
Example: Return to education at 2-year vs. 4-year colleges
Impossible to compute with standard regression output alone, because the covariance between the two estimates, which is needed for the standard error of the difference, is not reported
Alternative method: reparametrize the model so that the difference appears as a single coefficient (see the sketch below)
Estimation results
This method always works for single linear hypotheses
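A sketch of the reparametrization on the textbook two-year vs. four-year college example (the variable names jc, univ, exper follow that example; the fitted numbers from the slide are not reproduced). To test $H_0: \beta_1 = \beta_2$ in

$$ \log(wage) = \beta_0 + \beta_1\, jc + \beta_2\, univ + \beta_3\, exper + u, $$

define $\theta_1 = \beta_1 - \beta_2$ and substitute $\beta_1 = \theta_1 + \beta_2$:

$$ \log(wage) = \beta_0 + \theta_1\, jc + \beta_2\,(jc + univ) + \beta_3\, exper + u $$

Regressing $\log(wage)$ on $jc$, total college credits $jc + univ$, and $exper$ then delivers $\hat{\theta}_1$ and its standard error directly.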
Testing multiple linear restrictions: The F-test
Testing exclusion restrictions
Estimation of the unrestricted model
Estimation of the restricted model
Test statistic
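The F statistic compares the sums of squared residuals of the restricted and unrestricted models, with $q$ the number of restrictions:

$$ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\, n-k-1} \ \text{under } H_0 $$

An equivalent form uses the R-squareds of the two models: $F = \dfrac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$.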
Rejection rule (Figure 4.7)
Test decision in example
Discussion
The three variables are "jointly significant"
They were not significant when tested individually
The likely reason is multicollinearity between them
Test of overall significance of a regression
The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected
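For this test the null hypothesis is that all slope coefficients are zero, and the statistic can be written in terms of the R-squared of the regression:

$$ H_0: \beta_1 = \dots = \beta_k = 0, \qquad F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} $$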
Testing general linear restrictions with the F-test
Example: Test whether house price assessments are rational
Unrestricted regression
Restricted regression
Test statistic
Regression output for the unrestricted regression
The F-test works for general multiple linear hypotheses
For all tests and confidence intervals, validity of assumptions MLR.1 – MLR.6 has been assumed. Tests may be invalid otherwise.
Models with interaction terms
Interaction effects complicate the interpretation of parameters
Reparametrization of interaction effects (sketched below)
Advantages of reparametrization
Easy interpretation of all parameters
Standard errors for partial effects at the mean values are available
If necessary, the interaction may be centered at other interesting values
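A sketch of the model and the reparametrization in generic notation (not the slide's specific example):

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u \quad\Rightarrow\quad \frac{\partial E(y \mid x_1, x_2)}{\partial x_1} = \beta_1 + \beta_3 x_2 $$

so $\beta_1$ alone is the effect of $x_1$ only at $x_2 = 0$. Replacing the interaction by $(x_1 - \mu_1)(x_2 - \mu_2)$, with $\mu_1, \mu_2$ the sample means, makes the coefficients on $x_1$ and $x_2$ the partial effects at the mean values, with standard errors reported directly.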
Qualitative Information
Examples: gender, race, industry, region, rating grade, …
A way to incorporate qualitative information is to use dummy variables
They may appear as the dependent or as independent variables
A single dummy independent variable
Dummy variable trap
Estimated wage equation with intercept shift
Does that mean that women are discriminated against?
Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.
Using dummy explanatory variables in equations for log(y)
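For a dummy variable $d$ in a log(y) equation, the coefficient has the usual approximate percentage interpretation; the exact percentage difference between the two groups is

$$ 100 \cdot \left( e^{\hat{\beta}_d} - 1 \right)\,\% $$

which matters when $\hat{\beta}_d$ is large in absolute value.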
Using dummy variables for multiple categories
1) Define membership in each category by a dummy variable
2) Leave out one category (which becomes the base category)
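A minimal sketch of both steps with pandas and statsmodels, assuming a hypothetical data file with columns wage, educ, and a multi-category variable region (all names illustrative):

```python
# Sketch: dummy variables for a multi-category variable, dropping one
# category as the base to avoid the dummy variable trap.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wages_by_region.csv")          # hypothetical file

# Option 1: create the dummies explicitly; drop_first=True leaves out the base category
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
df = pd.concat([df, dummies], axis=1)

# Option 2: let the formula interface handle it; C(region) omits one category automatically
res = smf.ols("wage ~ educ + C(region)", data=df).fit()
print(res.params)
```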