Math 434 Homework 1 - Fall 2005

  • Text references are to
    Statistical Methods for Survival Data Analysis, 3rd edition, by Lee and Wang

    DO NOT use SAS for any of these problems.

    HOMEWORK #1 due Tuesday 9-20

         Table 1 - Survival distributions for two samples
         (Example 3.3 from p29 of text)
         6-MP (m=21)    6,6,6,6+, 7, 9+, 10,10+, 11+, 13, 16, 17+,
              19+,20+, 22, 23, 25+, 32+,32+, 34+, 35+
         Placebo (n=21) 1,1, 2,2, 3, 4,4, 5,5, 8,8,8,8, 11,11,
              12,12, 15, 17, 22, 23
    

  • Problems 1,2,3 -- (Text p102) problems #4-1ac, #4-4ac, #4-7.

    NOTES: (a) The data for problem 4.1 is in Table 3.1 on page 20. The data for problem 4.4 is in Table 1 above. The data for problem 4.7 is on page 103. Read pages 19-20 in the text before doing problem 4.1. The data for problem 4.1 came from an experiment in which cancer patients were exposed to two different kinds of bacteria in an attempt to stimulate their immune systems.

    (b) Problem 4.1a says to compare your results with those in Table 3.2 on page 21 in the text. The previous edition of the text had an error in Table 3.2, although the other values in the table were correct. Do all of your results agree?
    (c) Note that Problem 4.1 asks about survival times, not remission times. You may not need to use all of the data in the table.
    (d) Use the text's linear interpolation method (p66-67) to estimate median survival times.

    NOTES ADDED 9-17 (e) Typo in Problem 4.7: ``a life-table like Table 4.5'' should be ``a life-table like Table 4.6''. (The difference is that Table 4.5 is a census-like table in which mortality rates in different age ranges are measured for different groups of people, while Table 4.6 is a ``cohort table'' that follows the same group of people across several age ranges. Censoring makes no sense for a census-like table, since different groups of people are involved, while censoring is very important for cohort tables. The 2nd edition of the textbook does refer to Table 4.6 in this problem.)

    See pp87-90 for the definitions of the terms in Exercise Table 4.1. Note that l_i and w_i are treated exactly the same and so could be included in a single ``number censored'' c_i=l_i+w_i. Note also that n_i' is the true number ``at risk'' at the beginning of the i-th time interval and that n_i=n_i'-(1/2)c_i  is a derived quantity. The reason for the customary ``actuarial correction'' n_i=n_i'-(1/2)c_i  and not the more intuitive n_i=n_i'-(1/2)(d_i+c_i) is explained in a handout (``Actuarial Estimates for Life Tables'') on the Math434 Web site. The difference is basically due to the difference between a differential equation and a difference equation.
    The ``hazard function'' or ``hazard rate'' is the death rate for people who are alive during that time interval, as opposed to the proportion of the total population that die during that time interval (see p90). We will talk more about hazard rates, etc., on Tuesday.
    (f) ``by hand'' in the next three problems means don't use a statistical package (like SAS). Using a spreadsheet program is OK if you enter the spreadsheet links yourself.
    (g) COMMENT ADDED 9-19 ABOUT THE GEHAN-WILCOXON TEST: The textbook discusses two apparently different methods for calculating the Gehan-Wilcoxon statistic: the method based on equation (5.1.1), which we did in class, and Mantel's tableau in Table 5.1. In fact, these methods are exactly equivalent (see Appendix I below).
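
    The life-table quantities in note (e) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the text's worked solution. The function name actuarial_life_table and the counts passed to it are hypothetical. The interval width b_i defaults to 1. The hazard is estimated at the interval midpoint as d_i/(b_i(n_i-(1/2)d_i)), a common actuarial form that should be checked against p90 of the text.

```python
def actuarial_life_table(n_start, deaths, lost, withdrawn, width=1.0):
    """Cohort life table with the actuarial correction n_i = n_i' - c_i/2.

    n_start   : number at risk at the start of the first interval (n_1')
    deaths    : d_i for each interval
    lost      : l_i for each interval
    withdrawn : w_i for each interval
    width     : common interval width b_i (assumed equal for all intervals)
    """
    rows = []
    n_enter = n_start      # n_i', the true number at risk entering interval i
    surv = 1.0             # estimated survival at the start of interval i
    for d, l, w in zip(deaths, lost, withdrawn):
        c = l + w                      # l_i and w_i are treated the same
        n_eff = n_enter - 0.5 * c      # actuarial correction
        q = d / n_eff                  # conditional probability of dying
        hazard = d / (width * (n_eff - 0.5 * d))  # midpoint hazard (assumed form)
        rows.append({"n_enter": n_enter, "c": c, "n_eff": n_eff,
                     "q": q, "S_start": surv, "hazard": hazard})
        surv *= 1.0 - q                # survival at the start of the next interval
        n_enter -= d + c               # deaths and censored leave the risk set
    return rows


# Hypothetical counts, NOT data from the text.
table_rows = actuarial_life_table(100, deaths=[10, 8, 6],
                                  lost=[2, 1, 0], withdrawn=[2, 3, 4])
for row in table_rows:
    print(row)
```

    Only the censored counts enter the correction. The deaths d_i reduce n_i' for the next interval but are not halved out of n_i.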

  • Problem 4 -- (Like problem #5-4 on text p131). Apply the generalized Wilcoxon test by hand to the 6-MP/placebo data in Table 1 above.

  • Problem 5 -- Problem #5-9 (Text p132). This asks you to test significance with two 2x2 tables that are stratified by sex. The wording suggests the Mantel-Haenszel (strata) test of Section 5.2 p121. Carry out the test by hand.

  • Problem 6 -- Apply the Cox-Mantel test by hand to the 6-MP/placebo data in Table 1 above. How do the results compare with the results in Problem 4?
    
    
    
    APPENDIX I:  MANTEL'S TABLEAU FOR THE GEHAN-WILCOXON TEST

    In both the method discussed in the text (and class) and the method of Table 5.1, the numerator of the Gehan-Wilcoxon test statistic is

    U = Sum(i=1,n1) U_i

    where n1 is the first sample size and U_i  is essentially the inner sum of the double sum in (5.1.1). (``Essentially'' since U_i above has the sum of both XX and XY interactions for Xs in the first sample, while the inner sum in (5.1.1) only has XY interactions. The sum of the XX interactions cancels out in the sum for U. The expanded definition of U_i, with XX, XY, and YY interactions, is needed for the formula for the variance.)

    The first step is to rewrite U_i above, for X_i in the first sample, as

    U_i = T1i - T2i

    where T1i is the number of observations in both samples that are ``known to be strictly less than'' X_i and T2i is the number of observations that are ``known to be strictly greater than'' X_i (in the sense of (5.1.1) ). Note that T2i=0 for any censored value X_i, since no value can be known to be greater than a censored value. In particular, U_i>=0 for censored values.

    In fact, in Table 5.1, R1i=T1i+1 and R2i=T2i+1, so that U_i = R1i-R2i = T1i-T2i. That is, R1i=T1i+1 is always the number of observations in both samples that are ``known to be strictly less than'' X_i, plus one, and R2i=T2i+1 is always the number of observations in both samples that are ``known to be strictly greater than'' X_i, plus one. In particular, R2i=1 for all censored values, since T2i=0. Mantel apparently doesn't like 0s, so he has arranged things so that the R1i,R2i are always positive.

    To see this for R1i, note that censored values aren't used in the ``ranks'' of uncensored X_i in Step 1 in Table 5.1. This is because a censored value X_j or Y_j cannot be known to be strictly less than X_i, whether X_i is censored or not. Thus the number of X_j or Y_j that are strictly less than an untied uncensored value X_i, plus one, is exactly the rank of X_i among uncensored values X_j and Y_j. The same rule holds for tied uncensored values if you take the smallest rank in the tie group. The number of observations ``known to be strictly less than'' a censored value X_i is equal to the number of uncensored observations that are ``strictly less'' than X_i. This number, plus one, is the ``rank'' of the next highest uncensored value, as in Table 5.1. More precisely, this is the ``rank'' of the smallest uncensored value that is greater than X_i. In particular, a tie group of censored values are all given the ``rank'' of the next highest uncensored value. If there are no uncensored values that are larger than a censored value X_i, then its ``rank'' is the total number of uncensored values, plus one.

    In contrast, for R2i, both censored and uncensored values X_j,Y_j count for ``known to be strictly greater than'' in Step 5, so that (descending) ranks appear for all observations. The rule ``known to be strictly less than (or greater than)'' plus one accounts for the handling of ties in both Step 1 and Step 5 in Table 5.1. Since censored values always have ``rank''=+1 in Step 5, it doesn't matter whether they are tied or not, nor with what. Thus, R1i=T1i+1 and R2i=T2i+1 in all cases.
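
    The equivalence argued in this appendix can be checked numerically with a short Python sketch. It is an illustration under stated assumptions, not Mantel's or the text's procedure. The function names known_less, gehan_scores and gehan_xy_only are hypothetical, and so are the two small samples. Each observation is a pair (time, censored). An uncensored time t is assumed to count as less than a censored time t+ at the same value, which is how the sketch reads (5.1.1).

```python
def known_less(a, b):
    """True if observation a is known to be strictly less than observation b.

    Assumed convention: an uncensored time t is less than a censored t+.
    """
    ta, a_censored = a
    tb, b_censored = b
    if a_censored:          # a censored value is never known to be less
        return False
    return ta < tb or (ta == tb and b_censored)


def gehan_scores(sample1, sample2):
    """T1i, T2i, R1i, R2i and U_i for each X_i, counted over BOTH samples."""
    pooled = sample1 + sample2
    rows = []
    for x in sample1:
        t1 = sum(known_less(o, x) for o in pooled)  # known to be less than x
        t2 = sum(known_less(x, o) for o in pooled)  # known to be greater than x
        rows.append({"obs": x, "T1": t1, "T2": t2,
                     "R1": t1 + 1, "R2": t2 + 1, "U": t1 - t2})
    return rows


def gehan_xy_only(sample1, sample2):
    """Double sum over XY pairs only, as in the inner sum of (5.1.1)."""
    return sum(known_less(y, x) - known_less(x, y)
               for x in sample1 for y in sample2)


# Hypothetical samples, NOT the 6-MP/placebo data: (time, censored)
sample1 = [(3, False), (5, True), (7, False), (9, True)]
sample2 = [(2, False), (5, False), (6, True), (8, False)]

score_rows = gehan_scores(sample1, sample2)
u_pooled = sum(r["U"] for r in score_rows)
u_xy = gehan_xy_only(sample1, sample2)
for r in score_rows:
    print(r)
print(u_pooled, u_xy)
```

    In this example the censored values get R2i=1, each R1i matches the Step 1 ``rank'' described above, and the pooled sum of the U_i equals the XY-only double sum because the XX interactions cancel.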
