DO NOT use SAS for any of these problems.
HOMEWORK #1 due Tuesday 9-20
Table 1 - Survival distributions for two samples (Example 3.3 from p29 of text)
6-MP (m=21): 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+
Placebo (n=21): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23
NOTES: (a) The data for problem 4.1 is in Table 3.1 on page 20. The data for problem 4.4 is in Table 1 above. The data for problem 4.7 is on page 103. Read pages 19-20 in the text before doing problem 4.1. The data for problem 4.1 came from an experiment in which cancer patients were exposed to two different kinds of bacteria in an attempt to stimulate their immune systems.
NOTES ADDED 9-17 (e) Typo in Problem 4.7: ``a life-table like Table 4.5'' should be ``a life-table like Table 4.6''. (The difference is that Table 4.5 is a census-like table in which mortality rates in different age ranges are measured for different groups of people, while Table 4.6 is a ``cohort table'' that follows the same group of people across several age ranges. Censoring makes no sense for a census-like table, since different groups of people are involved, while censoring is very important for cohort tables. The 2nd edition of the textbook does refer to Table 4.6 in this problem.)
APPENDIX I: MANTEL'S TABLEAU FOR THE GEHAN-WILCOXON TEST
In both the method discussed in the text (and class) and the method of Table 5.1, the numerator of the Gehan-Wilcoxon test statistic is
    U = \sum_{i=1}^{n1} U_i
where n1 is the first sample size and U_i is essentially the inner sum of the double sum in (5.1.1). (``Essentially'' since U_i above has the sum of both XX and XY interactions for Xs in the first sample, while the inner sum in (5.1.1) only has XY interactions. The sum of the XX interactions cancels out in the sum for U. The expanded definition of U_i, with XX, XY, and YY interactions, is needed for the formula for the variance.)
The first step is to rewrite U_i above, for X_i in the first sample, as
    U_i = T1i - T2i
where T1i is the number of observations in both samples that are ``known to be strictly less than'' X_i and T2i is the number of observations that are ``known to be strictly greater than'' X_i (in the sense of (5.1.1) ). Note that T2i=0 for any censored value X_i, since no value can be known to be greater than a censored value. In particular, U_i>=0 for censored values.
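As a minimal sketch of these counts, the Python below computes T1i, T2i and U_i = T1i - T2i by direct pairwise comparison. The function names (known_less, gehan_scores) and the toy data are hypothetical and are not taken from the text. The sketch also assumes the usual Gehan convention for ties: an uncensored value t counts as ``known to be strictly less than'' a censored value c+ whenever t <= c. Check this against (5.1.1) before relying on it.

```python
def known_less(a, b):
    """True if observation a is known to be strictly less than b.

    Each observation is a (time, censored) pair.
    Assumed tie rule: uncensored t is known to be less than censored t+.
    """
    (ta, ca), (tb, cb) = a, b
    if ca:
        # A censored value is never known to be less than anything.
        return False
    return ta < tb or (ta == tb and cb)


def gehan_scores(sample1, sample2):
    """Return (T1i, T2i, U_i) for each observation in sample1."""
    pooled = sample1 + sample2
    scores = []
    for x in sample1:
        t1 = sum(known_less(o, x) for o in pooled)  # known to be less than x
        t2 = sum(known_less(x, o) for o in pooled)  # known to be greater than x
        scores.append((t1, t2, t1 - t2))
    return scores


# Toy data (illustrative only): True marks a censored value.
xs = [(3, False), (5, True), (7, False)]   # 3, 5+, 7
ys = [(2, False), (5, False), (6, True)]   # 2, 5, 6+

scores = gehan_scores(xs, ys)
U = sum(u for _, _, u in scores)
print(scores)  # [(1, 4, -3), (3, 0, 3), (3, 0, 3)]
print(U)       # 3
```

In this toy example the censored value 5+ has T2i=0 and U_i>=0, as noted above, and the XX interactions sum to zero, so U=3 is also what the XY-only double sum gives.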
In fact, in Table 5.1, R1i=T1i+1 and R2i=T2i+1, so that U_i = R1i-R2i = T1i-T2i. That is, R1i=T1i+1 is always the number of observations in both samples that are ``known to be strictly less than'' X_i, plus one, and R2i=T2i+1 is always the number of observations in both samples that are ``known to be strictly greater than'' X_i, plus one. In particular, R2i=1 for all censored values, since T2i=0 for them. Mantel apparently doesn't like 0s, so he has arranged things so that the R1i and R2i are always positive.
To see this for R1i, note that censored values aren't used in the ``ranks'' of uncensored X_i in Step 1 in Table 5.1. This is because a censored value X_j or Y_j cannot be known to be strictly less than X_i, whether X_i is censored or not. Thus the number of X_j or Y_j that are strictly less than an untied uncensored value X_i, plus one, is exactly the rank of X_i among uncensored values X_j and Y_j. The same rule holds for tied uncensored values if you take the smallest rank in the tie group. The number of observations ``known to be strictly less than'' a censored value X_i is equal to the number of uncensored observations that are ``strictly less'' than X_i. This number, plus one, is the same as the ``rank'' of the next highest uncensored value, as in Table 5.1. More precisely, this is the ``rank'' of the smallest uncensored value that is greater than X_i. In particular, the members of a tie group of censored values are all given the ``rank'' of the next highest uncensored value. If there are no uncensored values that are larger than a censored value X_i, then its ``rank'' is the total number of uncensored values, plus one.
In contrast, for R2i, both censored and uncensored values X_j,Y_j count for ``known to be strictly greater than'' in Step 5, so that (descending) ranks appear for all observations. The rules ``known to be strictly less than (or greater than)'' plus one account for the handling of ties in both Step 1 and Step 5 in Table 5.1. Since censored values always have ``rank''=+1 in Step 5, it doesn't matter whether they are tied or not, nor with what. Thus, R1i=T1i+1 and R2i=T2i+1 in all cases.
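The ranking rules described in this appendix can be sketched in Python as follows. This is only an illustration of the rules as stated here, not a reproduction of Table 5.1. The function name mantel_ranks and the toy data are hypothetical, and the tie rule (an uncensored t is treated as less than a censored t+) is an assumption that should be checked against (5.1.1).

```python
from bisect import bisect_left, bisect_right


def mantel_ranks(sample1, sample2):
    """Return (R1i, R2i) for each (time, censored) observation in sample1."""
    pooled = sample1 + sample2
    unc = sorted(t for t, c in pooled if not c)    # uncensored times
    cens = sorted(t for t, c in pooled if c)       # censored times
    all_times = sorted(t for t, _ in pooled)
    ranks = []
    for t, c in sample1:
        if c:
            # "Rank" of the next highest uncensored value, or
            # (number of uncensored values) + 1 if there is none.
            r1 = bisect_right(unc, t) + 1
            # Nothing is known to be greater than a censored value.
            r2 = 1
        else:
            # Smallest rank in the tie group among uncensored values.
            r1 = bisect_left(unc, t) + 1
            # Descending rank: every observation with a larger time,
            # plus censored values tied at t (assumed tie rule), plus one.
            larger = len(all_times) - bisect_right(all_times, t)
            tied_censored = bisect_right(cens, t) - bisect_left(cens, t)
            r2 = larger + tied_censored + 1
        ranks.append((r1, r2))
    return ranks


# Toy data (illustrative only): True marks a censored value.
xs = [(3, False), (5, True), (7, False)]   # 3, 5+, 7
ys = [(2, False), (5, False), (6, True)]   # 2, 5, 6+

ranks = mantel_ranks(xs, ys)
u = [r1 - r2 for r1, r2 in ranks]
print(ranks)   # [(2, 5), (4, 1), (4, 1)]
print(sum(u))  # 3
```

Every R1i and R2i here is at least 1, and subtracting one from each recovers the counts T1i and T2i for the same toy data: (1, 4), (3, 0) and (3, 0).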