Math 475 Homework 2 - Fall 2005
HOMEWORK #2 due 10-4
Text references are to Cody & Smith,
``Applied statistics and the SAS programming language''
Organize your homework in the following manner:
(i) your answers to all questions,
(ii) all of your SAS programs, and
(iii) all of the SAS output that you got.
Add page numbers to your homework so that you can make references
from part (i) to part (iii): for example, so that you can say
things like, ``The answer in part (a) is 17. The scatterplot for
part (b) is on page #Y below.'' Include your SAS output even if you
don't refer to it explicitly. Except for forward references like these, a
grader should not have to look beyond part (i) of your homework
unless he or she thinks that you have done something wrong.
Include your name in a title statement so that your name
will appear at the top of each output page.
If a problem asks you to do a statistical test, EXPLAIN CLEARLY what
the null hypothesis H_0 is, what test you used, what the P-value is, and
whether the data is significant, highly significant, or neither. Include
this as part of your answer in part (i).
(See also the main page on the Math475 Web page.)
1. Twenty-five (25) individuals volunteered for a study.
Confidential identifiers for the 25 individuals are given in the following
table.
Table 1. Confidential identifiers for 25 volunteers
A11 B33 C22 D61 E88
F07 G21 H91 I37 J19
K90 L30 M98 N48 O11
P77 Q07 R54 S18 T31
U45 V11 W71 X76 Y32
(i) Write a SAS program to randomly assign these 25 individuals to a
treatment group with m=17 individuals and a control group
with n=8 individuals. In your SAS program, format the 25 identifiers in a
datalines; block exactly as they appear in the table above
(WITHOUT the ``Table 1'' line). Write the (SAS) data step so that
these identifiers appear in a single column in the output data set.
(Hint: See randtwosamp.sas.)
(ii) Which 8 individuals did you (or your program) assign to the control
group? List their confidential identifiers in alphabetical order.
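The random-assignment logic in part (i) can be sketched outside SAS. The following is a minimal Python illustration only; it is not the randtwosamp.sas program mentioned in the hint, and the function name assign_groups and the seed 475 are hypothetical choices made for this sketch.

```python
import random

# Identifiers copied from Table 1, in row order.
ids = """
A11 B33 C22 D61 E88
F07 G21 H91 I37 J19
K90 L30 M98 N48 O11
P77 Q07 R54 S18 T31
U45 V11 W71 X76 Y32
""".split()

def assign_groups(ids, m, seed):
    """Shuffle a copy of ids; the first m go to treatment, the rest to control."""
    rng = random.Random(seed)  # the seed is an arbitrary choice, for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:m]), sorted(shuffled[m:])

treatment, control = assign_groups(ids, m=17, seed=475)
print("Treatment:", " ".join(treatment))
print("Control:  ", " ".join(control))
```

Fixing the seed makes the assignment reproducible, so the same groups can be regenerated if a grader asks how they were obtained.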
2. (Similar to Problem #3-7 in the text) Some summary
statistics for the occurrence of asthma and SES (socioeconomic class) are
Asthma      Yes     No
-------------------------
LowSES       39    101
HighSES      29    137
Create a SAS data set from these data and test the hypothesis of
independence of rows and columns. Make sure that the 2x2 table appears in
your output with the same row and column order as above. For this table,
what is the Pearson chi-square P-value? The P-value for the two-sided
Fisher exact test? On the basis of these data, do you accept or reject the
hypothesis that there is no association between SES and Asthma?
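As a cross-check on the SAS output, the two P-values can be computed by hand. The sketch below is Python, not SAS, and assumes the Pearson statistic is wanted without a continuity correction; the function names are hypothetical. The two-sided Fisher P-value is taken as the sum of the probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb, erfc, sqrt

# Rows: LowSES, HighSES; columns: Asthma Yes, No (counts from the table above).
a, b = 39, 101
c, d = 29, 137

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df P-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for a 2x2 table with fixed margins."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

chi2, p_pearson = pearson_chi2_2x2(a, b, c, d)
p_fisher = fisher_two_sided(a, b, c, d)
print(f"Pearson chi-square = {chi2:.4f}, P = {p_pearson:.4f}")
print(f"Fisher exact two-sided P = {p_fisher:.4f}")
```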
3. A total of 2000 observations are made of individuals
that can have any of three different levels of Zubricity (A,B,C) and any
of four different levels of Income. The counts are
                     Income
                 1     2     3     4
            A   66    98   127   180
Zubricity   B  111   136   170   228
            C  168   193   240   283
(i) Is there an association between Zubricity and Income in this
table? Have SAS do the Pearson chi-square test on the 3 by 4 table to
find out. What is the number of degrees of freedom? What is the P-value?
(ii) Have SAS also compute the Mantel-Haenszel chi-square test for a
trend. What is its number of degrees of freedom? Why is the
P-value different? What is this test designed to detect? That is, what
alternative H_1 should one conclude if the P-value is significant?
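The two statistics in this problem can be sketched by hand to show how they differ. The Python below is an illustration, not SAS output. It assumes integer scores 1, 2, 3 for Zubricity levels A, B, C and 1, 2, 3, 4 for Income, and it computes the Mantel-Haenszel chi-square as (n - 1) r^2, where r is the correlation between row and column scores. All function names are hypothetical.

```python
from math import erfc, exp, sqrt

counts = [
    [66, 98, 127, 180],    # Zubricity A
    [111, 136, 170, 228],  # Zubricity B
    [168, 193, 240, 283],  # Zubricity C
]

def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))
    return chi2, (len(rows) - 1) * (len(cols) - 1)

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    assert df % 2 == 0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return exp(-x / 2) * total

def mh_trend_chi2(table, row_scores, col_scores):
    """Mantel-Haenszel chi-square (n - 1) * r**2 on 1 df, r = score correlation."""
    n = sum(map(sum, table))
    cells = [(row_scores[i], col_scores[j], table[i][j])
             for i in range(len(table)) for j in range(len(table[0]))]
    mean_u = sum(u * w for u, _, w in cells) / n
    mean_v = sum(v * w for _, v, w in cells) / n
    suv = sum((u - mean_u) * (v - mean_v) * w for u, v, w in cells)
    suu = sum((u - mean_u) ** 2 * w for u, _, w in cells)
    svv = sum((v - mean_v) ** 2 * w for _, v, w in cells)
    stat = (n - 1) * suv ** 2 / (suu * svv)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

chi2, df = pearson_chi2(counts)
p_pearson = chi2_sf_even(chi2, df)
mh, p_mh = mh_trend_chi2(counts, row_scores=[1, 2, 3], col_scores=[1, 2, 3, 4])
print(f"Pearson chi-square = {chi2:.4f} on {df} df, P = {p_pearson:.4g}")
print(f"Mantel-Haenszel chi-square = {mh:.4f} on 1 df, P = {p_mh:.4g}")
```

The trend statistic uses only the linear association between the scores, which is why it has a single degree of freedom regardless of the table's dimensions.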
4. Suppose that the same treatment as in Problem 1
is given to patients suffering from four different but related diseases,
which are labeled as Dis#A, Dis#B, Dis#C, and Dis#D. The numbers of
individuals surviving for or dying within six months were collected in the
following table.
Table 2. Morbidity results for four diseases
             Dis#A        Dis#B        Dis#C        Dis#D
           Surv  Die    Surv  Die    Surv  Die    Surv  Die
Treated     250  107     390  702     218  141     317  757
Control     454  240     173  390     488  436     113  348
Note that Dis#B and Dis#D appear to be more severe than the others,
although all four diseases have high mortality rates in both treatment
groups.
(i) Does the treatment have an overall positive or negative effect on
mortality over the four strata? Carry out a test that gives you a single
P-value and that is not subject to Simpson's Paradox. (For example, the
Mantel-Haenszel (strata) test.) Do you accept or reject the hypothesis
that treatment has no effect on survival? Do you get the same results for
each of the diseases separately?
(ii) Is the effect of the treatment positive or negative? That is, do
relatively more treated individuals survive than control individuals?
(Hint: Consider the phi coefficient for each disease.) Would you
recommend that this treatment be given for individuals with these
conditions, assuming that no other treatment was available? Would your
recommendation depend on which of the four conditions?
(iii) What is the P-value for the Breslow-Day test in the output?
Does this suggest that an instance of Simpson's Paradox might ensue if
the counts for the four diseases are combined into one table?
(iv) Combine the diseases into one 2x2 table. What is the Pearson
Chi-Square P-value for this possibly-incorrect table? Is this consistent
with your answer to part (i)? What is the phi coefficient for the
combined table? Is it consistent with your results in part (ii)? In
the combined table, do relatively more treated individuals survive than
control individuals?
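The stratified and pooled analyses in this problem can be compared with a short sketch. The Python below is an illustration, not SAS output. It assumes the Mantel-Haenszel strata statistic without a continuity correction: the squared sum over strata of (observed minus expected) Treated-Surv counts, divided by the sum of their hypergeometric variances. The names cmh_test, phi, and pooled are hypothetical.

```python
from math import erfc, sqrt

# (treated_surv, treated_die, control_surv, control_die) for each disease in Table 2.
strata = {
    "Dis#A": (250, 107, 454, 240),
    "Dis#B": (390, 702, 173, 390),
    "Dis#C": (218, 141, 488, 436),
    "Dis#D": (317, 757, 113, 348),
}

def cmh_test(tables):
    """Mantel-Haenszel strata chi-square (1 df, no continuity correction)."""
    diff = var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        diff += a - r1 * c1 / n                       # observed minus expected
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))  # hypergeometric variance
    stat = diff ** 2 / var
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

def phi(a, b, c, d):
    """Phi coefficient; positive when treated patients survive relatively more often."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

stat, p_value = cmh_test(strata.values())
print(f"Mantel-Haenszel chi-square = {stat:.4f}, P = {p_value:.4g}")

for name, table in strata.items():
    print(f"{name}: phi = {phi(*table):+.4f}")

# Pool the four strata cell by cell into a single 2x2 table, as in part (iv).
pooled = tuple(map(sum, zip(*strata.values())))
print(f"Pooled table {pooled}: phi = {phi(*pooled):+.4f}")
```

Comparing the sign of phi within each stratum with the sign for the pooled table is one direct way to see whether combining the tables changes the apparent direction of the effect.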
5. A test is made of the effects of a new drug on people
who are occasional sufferers from a newly discovered allergy that affects
people only during the winter. Eighty (80) people are enrolled in the
study. Forty (40) subjects are first asked if they had allergic symptoms
during a particular year, then given the drug, and then asked again if
they had allergic symptoms after the following year. The other half (40)
are given the drug the first year but not the second year and, again,
asked if they had allergic symptoms with and without the drug. Thus, there are
two Yes-or-No responses from each enrollee, and, in particular,
8 individuals had no symptoms with the drug but did have symptoms
without the drug. The experimenters state that this experimental design
helps to control for variable severity of the allergy among the subjects.
The results were
Numbers of individuals with allergic symptoms
with and without a drug over two seasons
                     Without Drug
                   Yes     No   Totals
With Drug   Yes     11     22     33
            No       8     39     47
---------------------------------------
Totals              19     61     80
(i) On the basis of these data, does the drug tend to change
significantly the incidence of allergy in vulnerable individuals?
(ii) If the drug has an effect, would you recommend the drug to
someone who suffers from this allergy? That is, does the drug help or
hurt?
(Warning: Although the data is in the form of a 2x2 contingency
table, the Pearson chi-square test may not be appropriate. For example, a
large number of (Yes,Yes) counts may simply mean that these particular
individuals would have allergic symptoms no matter what. Similarly, a
large number of (No,No) counts might be due to a subset of the sample who
are almost never affected. Thus all of the usable information in the table is
in the (Yes,No) and (No,Yes) counts. Before using either the Pearson or
Fisher exact tests, read about McNemar's test in the text.)
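The point of the warning, that only the discordant (Yes,No) and (No,Yes) counts matter, can be made concrete with a short sketch of McNemar's test. The Python below is an illustration, not SAS output. It assumes the uncorrected statistic (b - c)^2 / (b + c) on 1 degree of freedom, together with an exact two-sided binomial P-value; the function name mcnemar is hypothetical.

```python
from math import comb, erfc, sqrt

# Discordant counts from the table above.
b = 22  # symptoms with the drug, none without
c = 8   # no symptoms with the drug, symptoms without

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) with asymptotic and exact P-values."""
    n = b + c
    chi2 = (b - c) ** 2 / n
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_asym = erfc(sqrt(chi2 / 2))
    # Exact test: under H_0 the smaller discordant count is Binomial(n, 1/2).
    k = min(b, c)
    p_exact = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return chi2, p_asym, p_exact

chi2, p_asym, p_exact = mcnemar(b, c)
print(f"McNemar chi-square = {chi2:.4f}")
print(f"Asymptotic P = {p_asym:.4f}, exact P = {p_exact:.4f}")
```

The (Yes,Yes) and (No,No) counts never enter the calculation, which is the difference from the Pearson chi-square test on the same table.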