TAKEHOME FINAL due on or before Thu 12-22 by 5:30 P.M.
(Return to Prof. Sawyer or
to math receptionist in Cupples I Room 100.)
NOTE: There should be NO COLLABORATION on the takehome final,
other than for the mechanics of using the computer.
Open textbook and notes (including course handouts).
In general where the results of a statistical test are asked for,
(i) EXPLAIN CLEARLY what the hypothesis H0 is and what
alternative you are testing against, (ii) find the P-value for the test
indicated (and state what test you used), and (iii) state whether the
results are significant (P<0.05), highly significant (P<0.01), or
not significant (P >= 0.05). If the P-value is based on a Student's t
or Chi-square or F distribution, also give the degrees of freedom.
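The reporting thresholds above can be sketched as a small helper. This is only an illustration in Python (the course itself uses SAS), and the function name classify_p_value is hypothetical, not part of any course material.

```python
def classify_p_value(p: float) -> str:
    """Label a P-value with the thresholds stated in the instructions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"  # P >= 0.05


# Illustrative P-values only; these are not results from any problem.
for p in (0.003, 0.03, 0.05):
    print(p, classify_p_value(p))
```

Note that a highly significant result is also significant; the helper returns the strongest label that applies.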
ORGANIZE YOUR WORK in the following manner: (i) your answers
to all questions, (ii) all your SAS programs, and (iii) your
SAS output. ADD CONSECUTIVE PAGE NUMBERS to your homework so that you
can make references from part (i) to part (iii). (For example,
so that you can say things like, ``The answer in part (a) is 57.75.
The scatterplot for part (b) is on page #Y below.'') If you do
different SAS problems at different times, it may be easiest to write
page numbers yourself on the SAS output.
Different parts of problems may not be equally weighted.
5 problems.
Problem 1. Walloopia is a small, apocryphal country that is famous
for its pure water and mild climate. A total of 1391 Walloopians died
during the previous year, amounting to a crude death rate of 1.77 per
thousand. The elders of the country feel that this death rate is too high
given the relatively young Walloopian population and are concerned about
what this says about the Walloopian health infrastructure.
Census data for Walloopia in the previous year, along with death rates
(per individual per year) for a climatically comparable U.S. population,
are given in Table 1.
Table 1. Census Data for Walloopia
Age        U.S.          Walloopian
range      death rate    census
0 to 15    0.0016        212000
15 to 30   0.0011        188000
30 to 45   0.0013        162000
45 to 60   0.0029        143000
60 to 75   0.0057         83000
----------------------------------------
Total:     0.0032        788000
The crude death rate in the climatically matched U.S. population was
0.0032, or 3.2 per thousand, which was nearly twice the Walloopian crude
death rate of 1391/788000=0.00177.
(i) The U.S. crude death rate (3.2 per thousand) times the total
Walloopian population yields 2521.6 deaths, which is considerably higher
than the observed Walloopian 1391 deaths. Why is this an inappropriate
comparison of the public-health institutions in the two countries?
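The crude figures quoted so far can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the problem; the variable names are illustrative, and Python is used here only for convenience.

```python
# Walloopian census counts by age range, as listed in Table 1.
census = [212_000, 188_000, 162_000, 143_000, 83_000]
deaths = 1391            # Walloopian deaths in the previous year
us_crude_rate = 0.0032   # crude death rate in the matched U.S. population

population = sum(census)
walloopian_crude_rate = deaths / population
deaths_at_us_crude_rate = us_crude_rate * population

print(population)                          # 788000
print(round(walloopian_crude_rate, 5))     # 0.00177
print(round(deaths_at_us_crude_rate, 1))   # 2521.6
```

This reproduces the crude comparison only; it does not adjust for the age distribution.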
(ii) Using the Walloopian population distribution as the standard, what
was the (Walloopian-population-standardized) death rate in the U.S.
population? Was it higher or lower than the observed (crude) U.S. death
rate?
(iii) Using the climatically comparable U.S. population as the standard,
what was the (US-population-standardized) death rate in Walloopia during
that year? Was it higher or lower than the observed crude death rate of
1.77 per thousand? Assuming that the health infrastructure is comparable,
do the pure water and mild climate appear to help or hurt the Walloopians?
Why?
(iv) Which population-standardization method, direct or indirect
standardization, did you use in part (ii)? in part (iii)?
Problem 2. Disease remission times for 40 patients, some of whom
were treated and some of whom were not treated, are given in Table 2.
A trailing + in Table 2 marks a right-censored value (for example,
a patient who withdrew from the study at that time or died of unrelated
causes).
Table 2. Remission times for two groups.
Control Group (not treated)
14 43 45 52 67 83 111 145 169 175 196 225 103+
108+ 113+ 158+ 164+
Treated Group
20 24 25 25 30 31 41 42 42 45 45 68 70 75 75
91 107 131 9+ 50+ 62+ 63+ 148+
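As a minimal sketch of the trailing-+ convention, the values in Table 2 can be split into a time and a censoring flag before any analysis. The helper name parse_time and the record layout are assumptions made for illustration; the course's own SAS programs may read the data differently.

```python
def parse_time(token: str) -> tuple:
    """Split a Table 2 entry such as '103+' into (time, censored)."""
    censored = token.endswith("+")
    return int(token.rstrip("+")), censored


control = "14 43 45 52 67 83 111 145 169 175 196 225 103+ 108+ 113+ 158+ 164+"
treated = ("20 24 25 25 30 31 41 42 42 45 45 68 70 75 75 "
           "91 107 131 9+ 50+ 62+ 63+ 148+")

# One (group, time, censored) record per patient.
records = []
for group, line in (("control", control), ("treated", treated)):
    for token in line.split():
        time, censored = parse_time(token)
        records.append((group, time, censored))

n_censored = sum(1 for _, _, censored in records if censored)
print(len(records), n_censored)  # 40 10
```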
(i) Is there a significant difference in survival time between the two
groups, as determined by a Cox regression? A highly significant
difference? What is the P-value?
(ii) What is the relative risk of the Control group in comparison with
the Treated group? Is it greater than one or smaller than one?
(iii) In general, if the relative risk of one group with respect to a
second group is greater than one, does this mean that a typical subject
from the first group will tend to live longer than a typical subject from
the second group, or that the subject will tend to die sooner? (Be
careful!)
Problem 3. Survival times in days are given in Table 3 below
for patients who had been diagnosed with a particular disease and had
either been given a particular treatment (Treat=1) or no treatment
(Treat=0). Measurements for morphness (Morph), spatility (Spat), and
hypochronicity (Hypo) were also recorded at the time of diagnosis and
are given in Table 3.
The columns in Table 3 are (i) a subject number,
(ii) survival time in days, (iii) censoring status,
(iv) treatment state (1 if Treated, 0 if Control), and values
for (v) Morphness, (vi) Spatility, and
(vii) Hypochronicity.
Table 3: Survival times in days in terms of treatment status,
morphness, and two other variables.
(Status: 1 if censored, 0 if observed.)
(Treatment: Treat=1 if treated, Treat=0 if not treated.)
Subj Time Status Treat Morph Spat Hypo
1. 35 0 1 496 62 279
2. 60 0 1 838 24 179
3. 96 0 1 740 72 252
4. 114 0 1 511 106 165
5. 165 0 1 982 112 160
6. 173 0 1 607 127 257
7. 178 0 1 1021 115 226
8. 182 0 0 745 21 239
9. 185 0 1 531 76 148
10. 220 0 0 569 47 192
11. 240 0 1 368 93 117
12. 254 0 0 1013 54 145
13. 262 0 1 588 63 210
14. 275 0 0 881 52 144
15. 314 0 0 902 86 236
16. 339 0 1 842 56 201
17. 385 0 1 947 28 51
18. 394 0 0 994 85 221
19. 425 0 0 822 77 194
20. 474 0 1 926 23 104
21. 484 0 1 1238 31 181
22. 595 0 0 1469 48 169
23. 605 0 1 1239 67 146
24. 638 0 1 1321 40 226
25. 732 0 0 1025 89 220
26. 782 0 1 1168 99 155
27. 884 0 0 650 99 114
28. 38 1 0 1171 49 235
29. 75 1 0 436 74 176
30. 165 1 1 543 100 141
31. 179 1 0 522 68 179
32. 219 1 1 893 103 90
33. 321 1 0 906 112 269
34. 493 1 0 1197 48 182
35. 539 1 1 1011 75 173
(i) Analyze the data in the table using the Cox PH regression model. Is
there an overall significant effect of the four covariates together?
What is the P-value? Which version of the model test did you use?
(ii) Which of the four variables (Treatment status, Morphness,
Spatility, and Hypochronicity) individually have a significant effect on
survival time? Which have a highly significant effect? For those with
significant P-values, what are the P-values? For each variable with a
significant effect, does increasing the value of that variable imply a
higher death rate or a lower death rate?
(iii) Suppose that it is known that the average morphness level in the
general population is 1000. Suppose that a given patient has a Morphness
level of 1500. What is her estimated increased or decreased survival rate
or risk due to her increased morphness level? Is she under increased or
decreased risk due to her increased morphness?
Problem 4. Samples of two groups were followed over 17 years. The
numbers of deaths and censoring events (that is, individuals who were last
seen at that time) over the 17 years are recorded in Table 4. All
individuals in Groups O and X are accounted for in Table 4, so
that the last 8 individuals in the combined dataset were recorded as
censored in Year 17.
Table 4: Survival times in years for two groups.
           Group O            Group X
Year   deaths  censored   deaths  censored
   1       21         0      114         0
   2        8         2       57        10
   3        7         2       38         6
   4        6         2       43         6
   5        7         2       34         6
   6        6         7       31        27
   7        5         8       21        33
   8        3         8       19        26
   9        4         6       13        17
  10        3         7       11        16
  11        1         6       11        11
  12        1         6        9        13
  13        1         5        5         8
  14        1         2        2         7
  15        1         2        2         6
  16        1         3        0         0
  17        0         0        0         8
(i) Using a Cox regression model, is there a significant difference
between the lifetimes of the two groups? What is the P-value? What is the
estimated relative hazard rate of Group X with respect to
Group O? Is Group X at greater hazard than Group O, or vice
versa? Use the default Breslow tie-handling method for the ties in the
data.
(Hints: See ltangina.sas on the Math434 Web site for clues about how to
read tabled data of this form into a useful SAS dataset. If num is the
name of your variable for the counts in Table 4, DON'T FORGET to include
freq num in SAS procedures that need to know that your data set is
describing groups of individuals and not individual records. See
Section 12.1 in the text for a discussion of tie-correction methods.
(See also phresid.sas on the Math434 Web site.))
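The idea behind a frequency variable can be sketched outside SAS as follows: each cell of Table 4 becomes one record carrying a count num, rather than num identical records. The Python below is only an illustration using the first three rows of Table 4; the function name to_long_format, the record layout, and the 0/1 censoring code (borrowed from the Status convention of Tables 3 and 5) are assumptions, and this is not the method used in ltangina.sas.

```python
# (year, deaths_O, censored_O, deaths_X, censored_X): first three rows of Table 4.
table_rows = [
    (1, 21, 0, 114, 0),
    (2, 8, 2, 57, 10),
    (3, 7, 2, 38, 6),
]


def to_long_format(rows):
    """Turn each table row into up to four (group, year, censored, num) records."""
    long_rows = []
    for year, d_o, c_o, d_x, c_x in rows:
        for group, censored, num in (("O", 0, d_o), ("O", 1, c_o),
                                     ("X", 0, d_x), ("X", 1, c_x)):
            if num > 0:  # a zero-count cell describes nobody
                long_rows.append((group, year, censored, num))
    return long_rows


long_rows = to_long_format(table_rows)

# Each record stands for num individuals, so the number of people
# is the sum of the counts, not the number of records.
individuals = sum(num for *_, num in long_rows)
print(len(long_rows), individuals)  # 10 265
```

A procedure that ignores the count would treat these 10 records as 10 individuals instead of 265, which is the mistake the hint warns against.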
(ii) Since the ties in Table 4 result from individuals dying at
different times of the year and then being grouped by year, the ``exact''
tie-correction method should be more accurate than the Breslow method in
this case. Redo the analysis in part (i) using the exact
tie-correction method instead of the Breslow method. How do your results
change? What is the estimated relative hazard rate using the more accurate
``exact'' method? (See the hints in part (i).)
(iii) Does Group (that is, Group X or Group O) have a
time-dependent effect on mortality in Table 4? Test for a
time-dependent effect of Group using either the Breslow or the exact
tie-correction method. Recall that by ``time-dependent'' we mean an effect
that is the same for both individuals and their risk sets at any
particular time and not the result of a covariate that can be attached to
records. (Hint: See comments about time-dependent variables in
ph2samp.sas and in other example SAS datasets on the Math434 Web site.)
Problem 5. Forty (40) subjects were recruited for a study of the
effectiveness of a particular treatment. Remission times for the subjects
were recorded over a period of 90 days with all surviving subjects
recorded as censored on day 91. It is known that remission is also
strongly affected by a variable called X that can vary over time.
The value of X was recorded for each subject initially (X=X0), at day
30 (X=X30), and also at day 60 (X=X60). It is assumed that X0 is a good
approximation for X for days 0 to 29, that X30 can be used for days
30 to 59, and that X60 is a good approximation for days 60
to 90. The data from the study are given in Table 5.
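The piecewise approximation for X described above can be written as a small lookup. This is a sketch in Python rather than SAS, and the function name x_at_day is hypothetical; it only encodes the day ranges stated in the text.

```python
def x_at_day(day: int, x0: float, x30: float, x60: float) -> float:
    """Return the value of X assumed to apply on a given study day."""
    if not 0 <= day <= 90:
        raise ValueError("the approximation is only stated for days 0 to 90")
    if day < 30:
        return x0    # days 0 to 29
    if day < 60:
        return x30   # days 30 to 59
    return x60       # days 60 to 90


# Subject 3 of Table 5 has X0=10, X30=44, X60=42.
print([x_at_day(d, 10, 44, 42) for d in (0, 29, 30, 59, 60, 90)])
# [10, 10, 44, 44, 42, 42]
```

In a Cox model the relevant value of X for a subject in a risk set is the one in force at the event time being considered, which is why X is handled as a time-dependent variable rather than as three separate covariates.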
Table 5. Remission times in terms of Sex, Treatment status, and
values of X initially (X0), at 30 days (X30), and at 60 days (X60).
(Status: 1 if censored, 0 if observed.)
(Treatment status: Treat=1 if treated, Treat=0 if not treated.)
Subj Time Status Sex Treat X0 X30 X60
1. 1 0 1 0 41 33 12
2. 2 0 1 0 42 37 30
3. 4 0 0 1 10 44 42
4. 5 0 1 0 24 29 19
5. 10 0 1 1 17 37 36
6. 24 0 0 0 33 31 7
7. 26 0 1 0 26 18 32
8. 29 0 1 1 28 32 9
9. 31 0 1 1 13 42 7
10. 32 0 1 1 8 40 20
11. 32 0 1 0 22 35 36
12. 36 0 1 0 14 11 45
13. 38 0 1 0 40 38 32
14. 44 0 0 0 31 20 44
15. 50 0 1 0 21 40 29
16. 54 0 0 1 33 43 28
17. 59 0 1 1 11 43 39
18. 61 0 1 1 15 31 45
19. 66 0 1 0 10 35 23
20. 67 0 0 0 6 6 40
21. 67 0 0 0 21 24 34
22. 68 0 1 1 7 25 45
23. 68 0 0 1 19 32 42
24. 69 0 0 1 21 23 40
25. 70 0 0 0 5 26 8
26. 74 0 0 1 37 29 20
27. 91 1 1 0 9 16 11
28. 91 1 0 1 10 33 27
29. 91 1 0 1 11 21 32
30. 91 1 0 0 25 25 20
31. 91 1 0 1 27 21 18
32. 91 1 0 0 38 7 12
33. 91 1 1 1 43 16 42
34. 91 1 0 1 44 24 35
35. 91 1 0 1 44 42 6
(i) Using the appropriate model to analyze the data, is there a
significant effect for Sex, Treatment, and X together? Which of the three
covariates have a significant effect? Which have a highly significant
effect? For each variable that has a significant effect, do larger values
of this variable tend to increase or decrease the time to remission?
(Hint: See comments in the sample programs ph2samp.sas and phresid.sas
on the Math434 Web site for remarks about modeling time-dependent
variables.)
(ii) Does Treatment have a significant effect? If so, what is the relative
risk of NOT being treated?