STAC51H3: Categorical Data Analysis
Assign 1
Due: Thu Sep 28, 2017 in class
All relevant work must be shown for credit.
Note: In any question, if you are using R, all R code and R output must be
included in your answers. You should assume that the reader is not familiar
with R output, so explain all your findings, quoting the necessary values from
your output.
Whenever you use an R command for generating random numbers, set the
seed to 123. This can be done by simply adding the command set.seed(123)
before your R command for generating the random numbers.
Please note that academic integrity is fundamental to learning and scholarship.
You may discuss questions with other students. However, the work you submit
should be your own. If I feel suspicious of any assignment (e.g. if your work
doesn’t appear to be consistent with what we have discussed in class), I will not
mark the assignment. Instead, I will ask you to present your work in my office
and your grade will be assigned based on your presentation.
Total points for this assignment: 45
1. Let Y ∼ Bin(n,π), where n = 20 and π = 0.8. Y can be interpreted as the number of
successes in a sample of size n = 20 from a Bernoulli distribution with probability of
success π = 0.8.
(a) (12 points) y = 15 is an observed value of Y, where Y ∼ Bin(n, π) with n = 20
and π = 0.8. Calculate the Wald, score (i.e. Wilson's method), Agresti-Coull and
Clopper-Pearson 95 percent confidence intervals for π.
In this part (i.e. for all confidence intervals in part (a)), do not use R or any computer
package. Show your work clearly.
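For reference, the standard textbook forms of the four intervals are collected below, with $\hat\pi = y/n$, $\alpha = 0.05$ and $z = z_{\alpha/2} \approx 1.96$. The symbols $\tilde n$, $\tilde\pi$ and $B_q(a, b)$ (the $q$ quantile of a Beta$(a, b)$ distribution) are our own notation; the Clopper-Pearson bounds can equivalently be written with F quantiles, and by convention the lower bound is 0 when y = 0 and the upper bound is 1 when y = n.

```latex
\begin{aligned}
\text{Wald:}\quad & \hat\pi \pm z\sqrt{\frac{\hat\pi(1-\hat\pi)}{n}} \\
\text{Wilson (score):}\quad & \frac{\hat\pi + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat\pi(1-\hat\pi)}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}} \\
\text{Agresti--Coull:}\quad & \tilde\pi \pm z\sqrt{\frac{\tilde\pi(1-\tilde\pi)}{\tilde n}},
  \qquad \tilde n = n + z^2,\quad \tilde\pi = \frac{y + z^2/2}{\tilde n} \\
\text{Clopper--Pearson:}\quad & \Bigl[\, B_{\alpha/2}(y,\ n-y+1),\ \ B_{1-\alpha/2}(y+1,\ n-y) \,\Bigr]
\end{aligned}
```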
(b) (3 points) Calculate a 95% confidence interval for π based on the likelihood ratio test.
(For this part you may use the R code we discussed in class, but do not use any R
functions that give the confidence interval directly.)
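The likelihood ratio interval is the set of values p for which -2 log(L(p)/L(p̂)) does not exceed the 0.95 quantile of a chi-square distribution with 1 degree of freedom (which equals z² ≈ 3.841). The sketch below is not the R code discussed in class; it is a Python illustration that assumes a simple grid search over (0, 1), and the helper names binom_loglik and lr_interval are hypothetical.

```python
import math
from statistics import NormalDist


def binom_loglik(p, y, n):
    """Binomial log-likelihood in p, dropping the constant log C(n, y).
    Terms of the form 0 * log(0) are treated as 0, so y = 0 or y = n work."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if y < n:
        ll += (n - y) * math.log(1 - p)
    return ll


def lr_interval(y, n, level=0.95, grid_size=100_000):
    """Grid-search sketch: keep every grid value p in (0, 1) whose
    likelihood-ratio statistic -2 log(L(p) / L(p_hat)) is at most the
    chi-square(1) quantile, then report the smallest and largest kept p."""
    crit = NormalDist().inv_cdf((1 + level) / 2) ** 2  # chi-square(1) quantile = z^2
    ll_max = binom_loglik(y / n, y, n)  # log-likelihood at the MLE p_hat = y / n
    kept = [
        k / grid_size
        for k in range(1, grid_size)
        if 2 * (ll_max - binom_loglik(k / grid_size, y, n)) <= crit
    ]
    return min(kept), max(kept)


lo, hi = lr_interval(15, 20)
print(f"LR 95% CI: ({lo:.4f}, {hi:.4f})")
```

The grid resolution (1/100000 here) limits the precision of the endpoints; a root finder on each side of p̂ would be a natural refinement.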
2. The observed (or true) coverage probability of a confidence interval is not necessarily
equal to its targeted coverage probability. In this question we will calculate the observed
(or true) coverage probability of Wald confidence intervals using two methods: Monte Carlo
simulation and direct calculation.
(a) (5 points) (Monte Carlo simulation) Generate N = 1000000 observations on Y
where Y ∼ Bin(n,π), where n = 20 and π = 0.8. From each observation gener-
ated, calculate a Wald 95% confidence interval for the population proportion (π).
(Note: This means you are calculating 1000000 confidence intervals). Calculate the
proportion of these Wald intervals that contain 0.8 (the value of π). Comment on
your results.
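A minimal Python sketch of the simulation follows; it is not the R solution, and Python's seed 123 does not reproduce the random stream of R's set.seed(123). Because Y can only take the values 0, ..., n, the sketch (as a design shortcut) computes the Wald interval once for each possible y and looks it up for every simulated observation, which is equivalent to computing N intervals. Binomial draws use inverse-transform sampling; the helper names wald_interval and mc_wald_coverage are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # standard normal quantile, about 1.96


def wald_interval(y, n, z=Z):
    p_hat = y / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def mc_wald_coverage(n=20, pi=0.8, N=1_000_000, seed=123):
    """Monte Carlo estimate of the true coverage of the 95% Wald interval."""
    rng = random.Random(seed)
    # Cumulative Bin(n, pi) probabilities, used for inverse-transform sampling.
    cdf, total = [], 0.0
    for y in range(n + 1):
        total += math.comb(n, y) * pi**y * (1 - pi) ** (n - y)
        cdf.append(total)
    # Whether the Wald interval computed from each possible y contains pi.
    covers = [lo <= pi <= hi for lo, hi in (wald_interval(y, n) for y in range(n + 1))]
    hits = 0
    for _ in range(N):
        # One Bin(n, pi) draw: the smallest y with cdf[y] > u. The min() guards
        # against the last cdf entry falling just below 1 due to rounding.
        y = min(bisect.bisect_right(cdf, rng.random()), n)
        hits += covers[y]
    return hits / N


print(f"Estimated coverage: {mc_wald_coverage():.4f}")
```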
(b) (5 points) (Direct calculation) In order to calculate the coverage probability for a
known value of π, calculate a confidence interval for every possible value of y (y =
0, ..., n) and check whether the true value of the parameter is in the confidence interval
calculated. Identify those confidence intervals that contain the true parameter. For
example, if the interval with n = 20, y = 5 contains the true value of π (say
π = 0.8), then the probability for that interval is
P(Y = 5) = $\binom{20}{5} \times 0.8^{5} \times 0.2^{20-5}$.
The coverage probability is the sum of these probabilities over all intervals that
contain π (in this example 0.8). Use this method to calculate the coverage probability
of 95% Wald confidence intervals based on a sample of size n = 20 if the true value
of π is 0.8.
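The direct calculation described above can be sketched in a few lines of Python (an illustration only; the helper names wald_covers and wald_coverage are hypothetical). For each y = 0, ..., n it checks whether the Wald interval contains π and, if so, adds P(Y = y).

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_covers(y, n, pi, z=Z):
    """True if the 95% Wald interval computed from y successes contains pi."""
    p_hat = y / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half <= pi <= p_hat + half


def wald_coverage(n, pi):
    """Sum P(Y = y) over the values of y whose Wald interval contains pi."""
    return sum(
        math.comb(n, y) * pi**y * (1 - pi) ** (n - y)
        for y in range(n + 1)
        if wald_covers(y, n, pi)
    )


print(round(wald_coverage(20, 0.8), 4))
```

Comparing this exact value with the Monte Carlo estimate from part (a) is a useful check on both computations.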
3. In this question we will also calculate and plot the true coverage probabilities of Wald
confidence intervals for proportions (i.e. the Binomial parameter) based on a sample of
given size (n), but this time we calculate the coverage probabilities for many values of
π and make a plot of coverage probability versus π.
(a) (5 points) For a Bernoulli sample of size n = 25, use the method in part (b) of
the previous question (i.e. direct calculation) to calculate the coverage probability
of a 95% Wald confidence interval for π = 0.01, 0.02, ..., 0.99 and plot them against π.
Draw a horizontal line at the target probability 0.95. Comment on what you
learned from your plot.
(b) (5 points) Repeat part (a) above with n = 100 and plot both the curves on the
same plot. Compare and comment on your findings.
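Parts (a) and (b) only repeat the direct calculation over a grid of π values. The Python sketch below (illustrative; wald_coverage, cov25 and cov100 are hypothetical names) computes both coverage curves and prints a few values; in R the curves could then be drawn with plot() and lines(), and the target line added with abline(h = 0.95).

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_coverage(n, pi, z=Z):
    """Exact coverage of the 95% Wald interval when the true proportion is pi."""
    total = 0.0
    for y in range(n + 1):
        p_hat = y / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half <= pi <= p_hat + half:
            total += math.comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return total


pis = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99
cov25 = [wald_coverage(25, p) for p in pis]
cov100 = [wald_coverage(100, p) for p in pis]

# Print every tenth value (pi = 0.10, 0.20, ..., 0.90) for a quick comparison.
for p, c25, c100 in zip(pis[9::10], cov25[9::10], cov100[9::10]):
    print(f"pi = {p:.2f}   n = 25: {c25:.3f}   n = 100: {c100:.3f}")
```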
(c) (10 points) Repeat part (a) for the Wald, Wilson, Agresti-Coull and Clopper-Pearson
confidence intervals and plot the coverage probabilities versus π for all four confidence
intervals on one graph (i.e. all four curves on the same system of axes). Use
four different colours for easy comparison. Compare and comment on your results.
(Note that in this part we are using the same values as in part (a) above, i.e. n = 25,
95% confidence intervals and π = 0.01, 0.02, ..., 0.99.)
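A Python sketch of the four coverage curves is given below as an illustration, not as the expected R solution. All function names are hypothetical. Since the Python standard library has no Beta quantile function, the Clopper-Pearson bounds are found by bisection on the binomial CDF: the lower bound solves P(Y ≥ y | p) = α/2 and the upper bound solves P(Y ≤ y | p) = α/2.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96


def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def binom_cdf(y, n, p):
    return sum(binom_pmf(k, n, p) for k in range(y + 1))


def solve_increasing(f, lo=0.0, hi=1.0, iters=60):
    """Bisection for the root of a function that increases on (lo, hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wald(y, n):
    p = y / n
    h = Z * math.sqrt(p * (1 - p) / n)
    return p - h, p + h


def wilson(y, n):
    p = y / n
    denom = 1 + Z**2 / n
    centre = (p + Z**2 / (2 * n)) / denom
    h = (Z / denom) * math.sqrt(p * (1 - p) / n + Z**2 / (4 * n**2))
    return centre - h, centre + h


def agresti_coull(y, n):
    n_t = n + Z**2
    p_t = (y + Z**2 / 2) / n_t
    h = Z * math.sqrt(p_t * (1 - p_t) / n_t)
    return p_t - h, p_t + h


def clopper_pearson(y, n):
    # Lower bound solves P(Y >= y | p) = alpha/2; upper solves P(Y <= y | p) = alpha/2.
    lower = 0.0 if y == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(y - 1, n, p)) - ALPHA / 2)
    upper = 1.0 if y == n else solve_increasing(
        lambda p: ALPHA / 2 - binom_cdf(y, n, p))
    return lower, upper


def coverage_curve(method, n, pis):
    intervals = [method(y, n) for y in range(n + 1)]  # one interval per possible y
    return [
        sum(binom_pmf(y, n, p) for y, (lo, hi) in enumerate(intervals) if lo <= p <= hi)
        for p in pis
    ]


n = 25
pis = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99
curves = {m.__name__: coverage_curve(m, n, pis)
          for m in (wald, wilson, agresti_coull, clopper_pearson)}
for name, cov in curves.items():
    print(f"{name:16s} min = {min(cov):.3f}   mean = {sum(cov) / len(cov):.3f}")
```

The printed minimum and mean coverages give a quick numerical summary to accompany the plot; the Clopper-Pearson curve should never fall below 0.95, since that interval is exact (conservative) by construction.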