Click here for Prof. Sawyer's
home page
HOMEWORK #4 due Thursday 11-2
Six problems.
Text references are to the textbook, Cody & Smith,
``Applied statistics and the SAS programming language''
NOTE: See the main Math475 Web page for how to organize a homework
assignment using SAS. In particular,
ALWAYS INCLUDE YOUR NAME in a title statement in your SAS
programs, so that your name will appear at the top of each output page.
ALL HOMEWORKS MUST BE ORGANIZED in the following order:
(Part 1) First, your answers to all the problems in the homework,
whether you use SAS for that problem or not. If the problem asks you to
generate a graph or table, refer to the graph or table by page number in
the SAS output (see below). (Xeroxing a page or two from the SAS output or
cutting and pasting into a Word file or TeX source file is also OK.)
(Part 2) Second, all SAS programs that you used to obtain the output for
any of the problems. If possible, similar problems should be done with the
same SAS program. (In other words, write one SAS program for several
problems if that makes things easier, using title or title2 statements
to separate the problems in your output.)
(Part 3) Third, all output for all the SAS programs in the previous
step.
If an answer in Part 1 requires a table or a scatterplot that you need to
refer to, make sure that your SAS output has overall increasing (unique)
page numbers and make references to Part 3 by page number, such as
``The scatterplot for Problem 2 part (b) is on page #X in
the SAS output below.'' DO NOT say, ``see Page 3 in the SAS output''
if Part 3 has output from several SAS runs, each of which has its own
Page 3. In that case, either write your own (increasing) page numbers
on the SAS output, or else (for example) refer to ``Page 2-7 in the
SAS output'' (for page 7 in the second set of SAS output) and write
page numbers in the format ``2-7'' at the top of pages in your output.
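As a rough illustration of the title and page-number conventions above, a single program covering several problems might start as follows. This is only a sketch: the title text and the placeholder name are hypothetical, not prescribed by the course.

```sas
/* Number the output pages consecutively, starting from page 1. */
options number pageno=1;

/* The first title line carries your name on every output page. */
title  "Math475 Homework 4 - YOUR NAME HERE";

/* Change title2 before the procedures for each problem. */
title2 "Problem 1: one-way ANOVA for five drugs";
/* ... data step and procedures for Problem 1 go here ... */

title2 "Problem 2: two-factor factorial ANOVA";
/* ... data step and procedures for Problem 2 go here ... */
```

Since title and title2 stay in effect until they are changed, every output page between two title2 statements carries the same problem label.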
Problem 1. (30) The responses of 35 patients to 5
experimental drugs were:
Table 1. Responses to Five Experimental Drugs
A1: 13.32 18.87 14.61 15.02 15.42 16.23 14.01
A2: 17.01 18.14 18.06 18.46 15.91 16.94 14.50
H1: 17.83 18.13 19.89 19.01 16.84 19.53 14.77
C2: 20.83 19.87 21.04 17.12 20.50 17.55 20.17
C3: 19.62 19.03 20.11 20.52 21.05 20.21 25.91
(i) Was there a significant difference in the responses to the
different drugs, as measured by a one-way ANOVA? What is the
P-value? What is the estimate of the error standard deviation (that is,
sigma)? Where did you find it in the output? How did SAS calculate it?
(ii) Which PAIRS OF TREATMENTS, and HOW MANY pairs of treatments,
are significantly different at the alpha=0.05 level, according to pairwise
two-sample t-tests that use the MSE (from part (i)) to estimate the error?
(That is, using the LSD method. This should have a relatively large number
of pairwise differences since there is no multiple-comparison correction.)
(iii) Use Bonferroni's procedure to find out WHICH PAIRS differ at
alpha=0.05. How many significant pairs are there? Do you obtain different
conclusions from those from part (ii)?
(iv) Use Tukey's procedure to find out which pairs differ at
alpha=0.05. Do you obtain different conclusions from those in
part (ii)? From the Bonferroni test in part (iii)? (Usually
Tukey's procedure finds significant differences that are intermediate
between the LSD and Bonferroni procedures.)
(v) Use the REGWQ procedure to find out which pairs differ at
alpha=0.05. Do you obtain different conclusions from those of Tukey's
procedure in part (iv)?
(vi) The Duncan and SNK (Student-Newman-Keuls) procedures
used to be popular years ago, but more recent research has suggested that
in some cases they are no more reliable for multiple comparison purposes
than the LSD method. (See the SAS online help and documentation pages for
the details.) SAS supports them, but recommends that they not be used.
Use the Duncan procedure (anyway) to find out which pairs of
treatments are significantly different at the alpha=0.05 level. How do
your conclusions compare with those of the LSD method in part (ii)? With
those of the Tukey method in part (iv)? (Usually the Duncan procedure
finds differences that are intermediate between those of the LSD and Tukey
methods.)
(vii) A consultant conjectures that the two A drugs should behave
similarly in the human body due to a similar chemical structure, but that
the two C drugs should be metabolized differently. Using the same MSE as in
the previous analyses, test whether or not the AVERAGE of the two A drugs
is significantly different from the AVERAGE of the two C drugs. What is the
P-value? (Hint: Use a Contrast test. See for example OnewayMC.sas on the
Math475 Web site.)
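A minimal sketch of one way to set up Problem 1 in SAS, assuming the data in Table 1 are typed in with the colons after the drug labels dropped. The data set name drugs and the variable names drug and response are illustrative choices, not taken from the course files; OnewayMC.sas on the course Web site may organize things differently.

```sas
title2 "Problem 1: one-way ANOVA with multiple comparisons";

/* Read one drug label, then its seven responses, one record per patient. */
data drugs;
    input drug $ @;
    do i = 1 to 7;
        input response @;
        output;
    end;
    drop i;
    datalines;
A1 13.32 18.87 14.61 15.02 15.42 16.23 14.01
A2 17.01 18.14 18.06 18.46 15.91 16.94 14.50
H1 17.83 18.13 19.89 19.01 16.84 19.53 14.77
C2 20.83 19.87 21.04 17.12 20.50 17.55 20.17
C3 19.62 19.03 20.11 20.52 21.05 20.21 25.91
;
run;

proc glm data=drugs;
    class drug;
    model response = drug;
    /* One MEANS statement can request several comparison methods. */
    means drug / lsd bon tukey regwq duncan alpha=0.05;
    /* Class levels sort as A1 A2 C2 C3 H1, so H1 gets coefficient 0. */
    contrast 'A drugs vs C drugs' drug 1 1 -1 -1 0;
run;
quit;
```

The contrast coefficients must follow the sorted order of the class levels (A1, A2, C2, C3, H1), not the order in which the drugs appear in Table 1.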
Problem 2. (20) An engineer is studying the response of a
system that depends on two factors, Shrillness and Color. The first factor
can take the values Hi or Low and the second Red or Blue.
She gathers data from six experimental runs for each pair of settings of
the two factors. The results are presented in Table 2.
Table 2. Responses of System to Settings of Two Factors
Hi Red 224 255 261 214 192 232
Hi Blue 174 148 187 158 189 211
Lo Red 224 181 200 155 195 200
Lo Blue 257 204 229 200 205 233
(i) Use SAS to run a two-factor full factorial ANOVA analysis. Is the
Model Test significant? What is its P-value?
(ii) Plot the residuals against the predicted values in the model. Do the
residuals appear to be independent of the predicted value and of
Shrillness?
(iii) Which of the two main effects and one interaction are significant?
highly significant? What are the P-values of the significant interactions?
(iv) If the interaction is significant, display an interaction plot. Is
the interaction visible in the interaction plot? What can you conclude
about the interaction and how it affects the dependent variable?
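The steps in Problem 2 could be sketched in SAS roughly as follows. The data set and variable names (system, shrillness, color, y, resid2, pred, resid) are assumptions made for illustration, and the use of proc sgplot for the interaction plot assumes a SAS release that includes it; a plot of cell means from proc means would serve the same purpose.

```sas
title2 "Problem 2: two-factor factorial ANOVA";

/* Read the two factor levels, then the six runs for that setting. */
data system;
    input shrillness $ color $ @;
    do i = 1 to 6;
        input y @;
        output;
    end;
    drop i;
    datalines;
Hi Red  224 255 261 214 192 232
Hi Blue 174 148 187 158 189 211
Lo Red  224 181 200 155 195 200
Lo Blue 257 204 229 200 205 233
;
run;

proc glm data=system;
    class shrillness color;
    /* The bar notation expands to both main effects and the interaction. */
    model y = shrillness | color;
    output out=resid2 p=pred r=resid;
run;
quit;

/* Residuals against predicted values; the plotting symbol is the first
   character of Shrillness (H or L). */
proc plot data=resid2;
    plot resid*pred=shrillness / vref=0;
run;

/* Interaction plot: mean response by Shrillness, one line per Color. */
proc sgplot data=system;
    vline shrillness / response=y stat=mean group=color markers;
run;
```

Lines that are far from parallel in the last plot are the usual visual sign of an interaction.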
Problem 3. (20) An engineer is interested in the frequency
of a mechanical device as a function of three variables: Pressure, with
three levels (Press1,Press2,Press3), Drubness, with two levels
(Drub1,Drub2), and Abrasiveness, with three levels (Ab1,Ab2,Ab3). The
frequencies of two devices are measured for each set of levels of the
three variables. The resulting frequencies are listed in Table 3.
Table 3. Frequencies of a Device
            Press1                  Press2                  Press3
       Drub1       Drub2       Drub1       Drub2       Drub1       Drub2
Ab1  3839 3202    326  117   5950 1254    357 1550    484  227   1915 2924
Ab2  1313 3202    276  368   1574 8814    530  538   1046 1128   1373 2795
Ab3  2097 6417    374  429   3614 1293    238 2476    201  886   1803 1647
(i) Use SAS to run a three-factor full factorial model with the three
variables as factors. Is the Model Test significant? What is its P-value?
(ii) Plot the residuals against the predicted values in the model. In
order to get a better idea of the distribution of the residuals, include
the level of Pressure in the residual plot as the plotting symbol. (Make
sure that the plotting symbol identifies the Pressure level!)
Do the residuals appear to be independent of the predicted value and
of the Pressure? Why? Do the residuals appear to be normally distributed?
Carry out a test that provides a P-value for the normality of the
residuals.
(iii) Run the full factorial model again with the values in Table 3
replaced by their logarithms. Is the Model test now more significant?
Analyze the residuals of the log-transformed data in the same way as in
part (ii). Do they now look more independent of the predicted value
and of the value of Pressure?
(iv) Which of the main effects of Abrasiveness, Drubness, and Pressure are
significant for the log-transformed data? highly significant? Which of the
three two-way and one three-way interactions are significant? highly
significant? What are the P-values for the significant effects? For the
effects that are significant, what are the degrees of freedom for the
F-tests involved, numerator and denominator?
(v) For each of the two-way interactions that are significant, display an
interaction plot. For each such interaction, is the interaction visible in
the interaction plot? What can you conclude about the interaction and how
it affects the dependent variable (that is, the frequency of the device)?
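One possible layout for Problem 3, given as a sketch under stated assumptions: each row of Table 3 is read as twelve values in the order Press1/Drub1, Press1/Drub2, Press2/Drub1, and so on, with two devices per cell. The names device, press, drub, abras, freq, and logfreq are hypothetical. Pressure and Drubness are coded numerically here because proc plot uses the first character of the formatted value as the plotting symbol, so labels such as Press1, Press2, Press3 would all plot as P.

```sas
title2 "Problem 3: three-factor factorial ANOVA";

data device;
    input abras $ @;
    do press = 1 to 3;          /* Press1, Press2, Press3 */
        do drub = 1 to 2;       /* Drub1, Drub2 */
            do rep = 1 to 2;    /* two devices per cell */
                input freq @;
                logfreq = log(freq);   /* natural logarithm */
                output;
            end;
        end;
    end;
    datalines;
Ab1 3839 3202 326 117 5950 1254 357 1550 484 227 1915 2924
Ab2 1313 3202 276 368 1574 8814 530 538 1046 1128 1373 2795
Ab3 2097 6417 374 429 3614 1293 238 2476 201 886 1803 1647
;
run;

/* Full factorial model for the log-transformed frequencies.
   Replace logfreq by freq to fit the untransformed model. */
proc glm data=device;
    class press drub abras;
    model logfreq = press | drub | abras;
    output out=resid3 p=pred r=resid;
run;
quit;

/* Plotting symbols 1, 2, 3 identify the Pressure level. */
proc plot data=resid3;
    plot resid*pred=press / vref=0;
run;

/* The NORMAL option prints tests of normality with P-values. */
proc univariate data=resid3 normal;
    var resid;
run;
```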
Problem 4. (20) A warehouse manager is comparing motorized
carts from three different manufacturers with the idea of purchasing one
of the brands. She is primarily interested in the time (Y) that operators
take to fetch and deliver a load in a cart. She also keeps track of the
weight of each load in case that has a confounding effect. Trial runs are
made for 15 loads for each motorized cart, for a total of 45 trial runs.
Forty-five (45) different operators were used. The times and weights for
the 45 trial runs were
Table 4 - Times and Weights for Motorized Carts
A 42 104 A 38 79 A 47 75 A 44 95 A 51 102
A 44 107 A 54 110 A 39 98 A 44 106 A 56 101
A 56 120 A 43 88 A 50 99 A 59 122 A 52 99
B 56 107 B 42 85 B 49 98 B 54 106 B 44 88
B 48 110 B 40 93 B 46 104 B 45 87 B 44 86
B 44 101 B 46 86 B 46 87 B 62 121 B 55 80
C 51 87 C 47 92 C 62 97 C 57 117 C 43 85
C 66 120 C 59 101 C 52 115 C 57 107 C 46 99
C 53 109 C 54 99 C 46 91 C 41 72 C 55 105
Each triple of values in Table 4 denotes the cart type (one of the three
values A, B, C), the delivery time for that load (Y), and the weight of the
load (X).
(i) Is there a significant variation in the cart brands, as measured by
time efficiency, NOT allowing for load weight? (That is, ignoring the
weight measurements. For definiteness, use Carttype for cart brand and
Weight for the weight.) What is the P-value for the ANOVA? What is the
model R^2?
(ii) Do the carts vary significantly in efficiency, ALLOWING for load
weight and cart brand, but NOT allowing for load weight and cart brand
interactions, listing cart brand before weight in the regression for
definiteness? What is the P-value for the ANOCOVA? What is the model
R^2?
Which appears to have a stronger effect on the times, the load
weight or the type of cart? Which of the two are significant in the
Type I table? What are their P-values? Which are significant in the
Type III table? What are their P-values? Why are the P-values
different between the two tables?
Is the effect of cart brand significant in the Type I table? in
the Type III table?
Note that the significance of cart brand in the presence of a load
weight correction would be a statistically valid measure of quality of
cart brand, as long as the warehouse manager is convinced that the
distribution of load weights in Table 4 is the same as she would
encounter in practice.
(iii) Is there a significant interaction between the effect of weight and
the type of cart, ALLOWING for load weight and cart brand interaction in
the regression, using ``interaction'' in the usual sense for ANOCOVA
models? Which effects are significant in the Type I table? in the
Type III table? Do your conclusions change from part (ii)?
DO EXACTLY ONE of the following two parts, (iv)(A)
or (iv)(B). Part (iv)(B) was the part that was originally
assigned. Part (iv)(A) is about as difficult but, I think, gives a clearer
picture of what is going on. Both (iv)(A) and (iv)(B) give roughly
the same information, but the results of (iv)(B) are harder to
interpret.
(iv)(A) Construct a scatterplot of time versus weight with Carttype as
the plotting symbol. Does this picture give a clear idea of how the
relationship between time and weight varies across cart type? Also,
construct a scatterplot of the three WITHIN-CART-TYPE regression lines on
the same plot, using cart type as the plotting symbol. Does either of these
plots give a clear idea of how the relationship between time and weight
varies across cart type? Do these conclusions affect your answer to
part (ii)? (Hint: See the last analysis in AnCova.sas on the Math 475 Web
site.)
(iv)(B) Find the (Pearson) correlation coefficients for time (Y) versus
Weight within each cart type. (That is, run proc corr with by Carttype.)
Do the within-cart-type correlation coefficients vary? Also, construct a
scatterplot of time versus weight with cart type as the plotting symbol.
Do these conclusions affect your answer to part (ii)? (Hint: See
AnCova.sas on the Math 475 Web site.)
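The models in Problem 4 might be set up along the following lines. This is a sketch only, and AnCova.sas on the course Web site may differ. The data set name carts and the variable name time are assumptions; Carttype and Weight are the names suggested in part (i). Each model gets its own proc glm step, and the Type I and Type III tables are part of the default proc glm output.

```sas
title2 "Problem 4: analysis of covariance for cart times";

/* Each triple is cart type, time (Y), weight (X); @@ reads several
   triples from each data line. */
data carts;
    input carttype $ time weight @@;
    datalines;
A 42 104 A 38 79 A 47 75 A 44 95 A 51 102
A 44 107 A 54 110 A 39 98 A 44 106 A 56 101
A 56 120 A 43 88 A 50 99 A 59 122 A 52 99
B 56 107 B 42 85 B 49 98 B 54 106 B 44 88
B 48 110 B 40 93 B 46 104 B 45 87 B 44 86
B 44 101 B 46 86 B 46 87 B 62 121 B 55 80
C 51 87 C 47 92 C 62 97 C 57 117 C 43 85
C 66 120 C 59 101 C 52 115 C 57 107 C 46 99
C 53 109 C 54 99 C 46 91 C 41 72 C 55 105
;
run;

/* Part (i): cart type only, ignoring weight. */
proc glm data=carts;
    class carttype;
    model time = carttype;
run;
quit;

/* Part (ii): cart type listed before weight, no interaction. */
proc glm data=carts;
    class carttype;
    model time = carttype weight;
run;
quit;

/* Part (iii): separate slopes via the carttype*weight interaction. */
proc glm data=carts;
    class carttype;
    model time = carttype weight carttype*weight;
run;
quit;

/* Scatterplot with cart type (A, B, C) as the plotting symbol. */
proc plot data=carts;
    plot time*weight=carttype;
run;

/* Part (iv)(B): BY processing requires data sorted by carttype. */
proc sort data=carts;
    by carttype;
run;

proc corr data=carts;
    by carttype;
    var time weight;
run;
```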