ECON3360/7360 endogeneity problem

    ECON3360/7360: Homework I
    2016 Semester 2
    Due: 5pm, 16th Sep 2016
    Part A. True/False/Uncertain Questions
    Please provide a short explanation (at most three lines) to justify your answer to
    each question in Part A.
    1. If a linear regression has omitted variables, robust variance estimation is misleading.
    2. FE models automatically control for endogeneity.
    3. The FE estimator is consistent if the RE assumptions hold.
    4. There is no case for using a cluster-robust variance estimator in an FE regression.
    5. An RCT provides the gold standard for estimating causal effects.
    6. In the exactly identified case, the IV and 2SLS estimators are the same.
    7. In the over-identified case under the homoskedasticity assumption, the 2SLS and GMM
    estimators are the same.
    8. We can always statistically test whether instruments are strong and valid for IV
    regression.
    9. Suppose that we only care about consistency or inconsistency. For panel data analysis
    with a valid IV, using both fixed effects and IV could produce greater inconsistency (in
    absolute terms) than using FE alone.
    10. As the variance of measurement error increases to infinity, the bias of OLS also
    increases.
    11. The 3SLS estimator is better than 2SLS because 3SLS uses more information and
    estimates parameters more precisely.
    12. If there is an endogeneity problem, FE is preferred to pooled OLS.
    13. Increasing the sample size helps mitigate the multicollinearity problem.
    14. Including more covariates helps mitigate the multicollinearity problem.
    15. It is more costly to correct problems of internal validity than those of external
    validity in an experimental setting.
    16. In a cross-sectional data setting, we can only condition on observed variables.
    17. In a repeated cross-section data setting, we can only condition on observed variables.
    18. The IV method can be combined with FE but cannot be combined with RE.
    19. Cluster-robust SEs allow more flexibility (i.e., are valid under fewer assumptions) than
    heteroskedasticity-robust SEs in both cross-section and panel data.
    20. IV helps to capture both the direct and indirect effects of the endogenous variable on
    the outcome variable.
    21. RE can identify the coefficient on a time-invariant variable.
    22. An IV estimate larger than the OLS estimate in absolute value could indicate endogeneity
    of the variable of interest due to measurement error.
    23. The over-identification test is a test of whether the variable of interest is endogenous.
    24. We can obtain IV estimates from OLS on the first-stage and OLS on the reduced-form
    equations.
    25. The Hausman-Taylor estimator is an RE IV estimator for a panel data model.
    26. The Arellano-Bond estimator is an FE IV estimator for a dynamic panel data model.
    27. In the RD design, we only use observations around cut-off points.
    28. We don’t want a kink in density around cut-off points for a forcing variable.
    29. We don’t want a kink in outcome values around cut-off points for a forcing variable.
    30. We don’t want a kink in control values around cut-off points for a forcing variable.
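Statements 6 and 24 can be explored numerically. The sketch below is only an illustration on simulated data: the variable names, sample size, and coefficient values are assumptions and do not come from the assignment. With one binary instrument and one endogenous regressor, it computes the IV estimator, the two-step 2SLS estimator, and the ratio of the reduced-form slope to the first-stage slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (all names and parameter values are illustrative assumptions).
z = rng.integers(0, 2, size=n).astype(float)   # binary instrument
a = rng.normal(size=n)                         # unobserved confounder
x = 0.5 * z + a + rng.normal(size=n)           # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * a + rng.normal(size=n)


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
Z = np.column_stack([const, z])   # instrument matrix (with constant)
X = np.column_stack([const, x])   # regressor matrix (with constant)

# (a) IV estimator: (Z'X)^{-1} Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

# (b) 2SLS: regress x on Z, then regress y on the fitted values of x
x_hat = Z @ ols(Z, x)
beta_2sls = ols(np.column_stack([const, x_hat]), y)

# (c) Reduced-form slope divided by first-stage slope
first_stage = ols(Z, x)[1]
reduced_form = ols(Z, y)[1]
beta_ratio = reduced_form / first_stage

# OLS for comparison; it is contaminated by the omitted confounder a.
beta_ols = ols(X, y)

print("IV:", beta_iv[1], "2SLS:", beta_2sls[1], "ratio:", beta_ratio, "OLS:", beta_ols[1])
```

In this just-identified setup the three IV-type numbers coincide up to floating-point error, while the OLS slope differs because of the omitted confounder.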
    Short-Answer Questions (2 marks for each subquestion)
    I. IV Estimator
    We consider the following regression model:
    $\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + \beta_3 \cdot \text{ability} + u$
    where we are interested in the return to education in terms of wages. Suppose that ability is
    unobserved. Thus, we consider the following equation instead:
    $\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + v$
    For the endogenous variable educ, a dummy variable, z, is constructed using information on
    the quarter of birth, where z is 0 if born in the 1st quarter and 1 otherwise. You are trying
    to use this dummy variable as an instrument to estimate $\beta_1$.
    1.  Derive the omitted variable bias of the OLS estimator of $\beta_1$.
    2.  Card (1995) instead uses  college4 4 (distance from student's home to nearest 4-year
    college) as an IV for  educ. Can we test the validity/relevance of  college4 4 as an IV for
    educ?
    3.  If you can perform the test in 2, provide a Stata procedure to test the relevance of
    college4 4 as an IV for  educ.
    4.  Numerous studies report that the IV estimate of $\beta_1$ is greater than the OLS
    estimate of $\beta_1$. Infer the sign and the magnitude of ? ? by comparing the OLS and IV
    estimates of $\beta_1$.
    5.  Evaluate and compare the OLS and IV standard errors for $\beta_1$.
    6.  Provide a Stata procedure to perform a test for the validity of IV.
    7.  Suppose you have panel data. How would you change the model to avoid the endogeneity
    problem?
    8.  Continuing from 7, what new assumption does the IV method require? Provide a procedure
    to implement IV with panel data.
    II. Panel Data Estimator
    Consider the following unobserved effects model:
    $\text{murders}_{it} = \beta_0 + \beta_1 \cdot \text{exec}_{it} + \beta_2 \cdot \text{unem}_{it} + \text{state}_i + \text{year}_t + u_{it}$ (1)
    where $\text{murders}_{it}$ is the number of murders per 100,000 people, $\text{exec}_{it}$ is the number of
    executions, and $\text{unem}_{it}$ is the unemployment rate for state $i$ in year $t$. The data set is
    state-level data (50 US states) for two years (1990 and 1991).
    1.  How many variables should be included for $\text{state}_i$? Interpret $\text{state}_i$ in the
    equation.
    2.  Provide fixed effects (FE) transformation for the equation. [Hint: No derivation is
    required. All you need to provide is a transformed equation.]
    3.  Using the fixed effects transformed equation, state the condition(s) for the FE
    estimator of $\beta_1$ to be consistent.
    4.  Explain how the source of variation for identifying $\beta_1$ changes if we estimate the
    following equation instead:
    $\text{murders}_{it} = \gamma_0 + \gamma_1 \cdot \text{exec}_{it} + \gamma_2 \cdot \text{unem}_{it} + \text{year}_t + v_{it}$ (2)
    5.  Suppose the estimates of $\beta_1$ and $\gamma_1$ differ substantially. Explain the source of
    the difference in the estimates.
    6.  Write down the estimating equation for the first-differenced estimator of $\beta_1$ in (1).
    7.  Write down the estimating equation in 6 with state fixed effects. How can we interpret
    the state fixed effects here?
    8.  Compare the FE and FD estimates of $\beta_1$ in equation (1).
    9.  Construct a Hausman test statistic that compares $\hat{\beta}_{1,FE}$ and $\hat{\beta}_{1,RE}$.
    10.  Suppose the FE and RE estimates are substantially different (the null hypothesis is
    rejected in 9). What does this imply for the endogeneity of $\text{exec}_{it}$ in (2)?
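For the comparison of FE and FD estimates in a two-period panel, a small simulation can help fix ideas. The sketch below is not based on the actual murder-rate data: the panel is simulated, and the variable names and coefficient values are assumptions. It computes the within (FE) estimator with a year dummy and the first-difference estimator with a constant for 50 units observed in two years.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 50

# Illustrative simulated panel (names and coefficients are assumptions).
alpha = rng.normal(size=n_states)                     # state fixed effects
x = rng.normal(size=(n_states, 2)) + alpha[:, None]   # regressor correlated with alpha
year2 = np.array([0.0, 1.0])                          # dummy for the second year
y = 0.3 * year2 + 1.5 * x + alpha[:, None] + rng.normal(size=(n_states, 2))

# Fixed effects (within) estimator: demean y, x, and the year dummy by state,
# then run OLS on the demeaned data without an intercept.
y_w = (y - y.mean(axis=1, keepdims=True)).ravel()
x_w = (x - x.mean(axis=1, keepdims=True)).ravel()
d_w = np.tile(year2 - year2.mean(), n_states)
b_fe = np.linalg.lstsq(np.column_stack([x_w, d_w]), y_w, rcond=None)[0]

# First-difference estimator: regress the change in y on the change in x
# and a constant (the constant is the differenced year dummy).
dy = y[:, 1] - y[:, 0]
dx = x[:, 1] - x[:, 0]
b_fd = np.linalg.lstsq(np.column_stack([dx, np.ones(n_states)]), dy, rcond=None)[0]

print("FE slope, year effect:", b_fe)
print("FD slope, year effect:", b_fd)
```

With exactly two periods, the demeaned data are just the differences scaled by one half, so the two sets of coefficients agree up to floating-point error. With more than two periods they generally differ.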
    III. Simultaneous equations models
    A model to estimate the effect of smoking on annual income, equation (1), is:
    $\log(\text{income}) = \beta_0 + \beta_1 \cdot \text{cigs} + \beta_2 \cdot \text{educ} + \beta_3 \cdot \text{?????} + u_1$ (1)
    where cigs is the average number of cigarettes smoked per day and educ is years of
    education.
    To reflect the fact that cigarette consumption might be jointly determined with income, a
    demand for cigarettes equation (2) is also considered:
    $\text{cigs} = \gamma_0 + \gamma_1 \cdot \log(\text{income}) + \gamma_2 \cdot \text{educ} + \gamma_3 \cdot \log(\text{cigpric}) + \gamma_4 \cdot \text{restaurn} + u_2$ (2)
    where cigpric is the price of a pack of cigarettes (in cents), and restaurn is a binary variable
    equal to unity if the person lives in a state with restaurant smoking restrictions.
    1.  How do you interpret the OLS estimator of $\beta_1$ in equation (1)?
    2.  Under what assumption(s) is the demand equation for cigarettes (2) identified?
    3.  Provide a procedure to test the relevance of the exclusion restrictions in
    estimating (1).
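    Suppose we collect panel data and instead estimate the following equations for income (3)
    and the demand for cigarettes (4):
    $\log(\text{income})_{it} = \beta_0 + \beta_1 \cdot \text{cigs}_{it} + \beta_2 \cdot \text{educ}_{it} + \beta_3 \cdot \text{?????}_{it} + a_{1i} + u_{1it}$ (3)
    $\text{cigs}_{it} = \gamma_0 + \gamma_1 \cdot \log(\text{income})_{it} + \gamma_2 \cdot \text{educ}_{it} + \gamma_3 \cdot \log(\text{cigpric})_{it} + a_{2i} + u_{2it}$ (4)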
    4.  Under what assumption(s) is the demand equation for cigarettes (4) identified?
    5.  Provide a Stata procedure to test the relevance of the exclusion restrictions
    using the reduced form regression for (4).
    6.  Provide the first-differenced version of the demand equation for cigarettes (4).
    7.  Provide the dependent and explanatory variables for the reduced form equation for
    $\Delta \text{cigs}$.
    8.  Provide a Stata procedure for first-differenced IV estimation of (3).
    IV. Regression Discontinuity Design (RD)
    The 1988 Education Act allowed English state schools to opt out of local authority
    control and become "Grant Maintained" (GM) schools. GM schools are directly
    funded by the central government and are governed by a governing body and the
    head teacher rather than the local authority. To become a GM school, parents of
    current students had to hold a secret ballot. If more than 50% of the parents cast
    their vote in favour of converting to GM status the school would essentially be
    automatically converted into a GM school. You have been asked to evaluate the
    impact of becoming a GM school on student achievement. For this purpose, you
    have collected data on student test results at age 16 for the year 1997. You use
    this data to estimate a regression of the form:
    $\text{score} = \beta_0 + \beta_1 \cdot \text{GM} + u$ (7)
    where score is the result of a student in a standardized math test and GM is a dummy
    variable that takes the value 1 if the student is enrolled in a GM school and zero
    otherwise.
    1.  Is $\beta_1$ in equation (7) likely to capture the causal effect of becoming a GM
    school on student achievement?
    After 1988, a large number of schools held ballots about conversion to GM status.
    While many ballots were successful, there were also a large number of ballots where
    the majority of parents opposed converting the school into a GM school. Suppose
    your dataset also includes the percentage of votes that were cast in favour of
    conversion to a GM school.
    2.  How would you exploit that additional information for an alternative estimation
    strategy to measure the causal effect of GM status on student achievement?
    3.  Provide the appropriate regression specification. What is the key coefficient of
    interest?
    4.  What are the assumptions that are required to hold for the approach in question
    3 to work?
    5.  How do you test the validity of the method in question 3? Describe a Stata
    procedure for a test.
    6.  Suppose that you add control variables to the regression equation
    in question 3. How does this affect the key coefficient estimates and their
    standard errors?
    7.  Describe the potential threats to the identification strategy in question 3.
    8.  Describe how you can address those concerns in question 7.
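As a rough illustration of how a vote-share cutoff can be exploited, the sketch below runs a local linear regression on simulated ballots. Everything in it is an assumption made for the example: the data are simulated, treatment is assumed to switch sharply at 50% (the text says conversion was only "essentially" automatic), and the bandwidth of 10 percentage points is arbitrary. It is one possible specification, not the expected answer.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
cutoff, bandwidth, true_jump = 50.0, 10.0, 2.0   # illustrative assumptions

# Simulated ballots: vote share in favour (%), GM status assigned sharply at the cutoff.
vote = rng.uniform(0.0, 100.0, size=n)
gm = (vote > cutoff).astype(float)
score = 40.0 + 0.1 * vote + true_jump * gm + rng.normal(size=n)

# Keep only observations within the bandwidth around the cutoff.
near = np.abs(vote - cutoff) <= bandwidth
v = vote[near] - cutoff            # centred running variable
d = gm[near]

# Local linear regression with separate slopes on each side of the cutoff:
# score = b0 + b1*GM + b2*v + b3*GM*v + error, where b1 is the jump at the cutoff.
X = np.column_stack([np.ones(v.size), d, v, d * v])
coef = np.linalg.lstsq(X, score[near], rcond=None)[0]
jump_hat = coef[1]

print("estimated jump at the cutoff:", jump_hat)
```

Re-running the sketch with different bandwidths, or with a pre-determined covariate in place of the test score, is one informal way to probe how sensitive such an estimate is.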