1. A warehouse manager is comparing motorized carts
from three different manufacturers with the idea of purchasing one of the
brands. She is primarily interested in the time (Y) that operators take
to fetch and deliver a load in a cart. She also keeps track of the weight
of each load in case that has a confounding effect. Trial runs are made
for 15 loads for each motorized cart, for a total of 45 trial runs.
Forty-five (45) different operators were used. The times and weights for
the 45 trial runs are listed in Table 1.
Table 1 - Times and Weights for Motorized Carts
A 42 104 A 38 79 A 47 75 A 44 95 A 51 102
A 44 107 A 54 110 A 39 98 A 44 106 A 56 101
A 56 120 A 43 88 A 50 99 A 59 122 A 52 99
B 56 107 B 42 85 B 49 98 B 54 106 B 44 88
B 48 110 B 40 93 B 46 104 B 45 87 B 44 86
B 44 101 B 46 86 B 46 87 B 62 121 B 55 80
C 51 87 C 47 92 C 62 97 C 57 117 C 43 85
C 66 120 C 59 101 C 52 115 C 57 107 C 46 99
C 53 109 C 54 99 C 46 91 C 41 72 C 55 105
Each triple of values in Table 1 denotes the cart type (one of three
values A, B, C), the delivery time for that load (Y), and the weight of
the load (X).
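One way to read the triples of Table 1 into SAS is sketched below. The data set name carts and the variable name Time are assumptions made for illustration; only Carttype and Weight are names suggested by the problem. The trailing @@ lets SAS read several observations from each data line.

```sas
/* Sketch: read the (cart type, time, weight) triples of Table 1.   */
/* "carts" and "Time" are hypothetical names chosen for this sketch. */
data carts;
    input Carttype $ Time Weight @@;   /* @@ = several triples per line */
    datalines;
A 42 104 A 38 79 A 47 75 A 44 95 A 51 102
A 44 107 A 54 110 A 39 98 A 44 106 A 56 101
A 56 120 A 43 88 A 50 99 A 59 122 A 52 99
B 56 107 B 42 85 B 49 98 B 54 106 B 44 88
B 48 110 B 40 93 B 46 104 B 45 87 B 44 86
B 44 101 B 46 86 B 46 87 B 62 121 B 55 80
C 51 87 C 47 92 C 62 97 C 57 117 C 43 85
C 66 120 C 59 101 C 52 115 C 57 107 C 46 99
C 53 109 C 54 99 C 46 91 C 41 72 C 55 105
;
run;

/* Quick check that 45 observations were read, 15 per cart type. */
proc freq data=carts;
    tables Carttype;
run;
```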
(i) Is there a significant variation in the cart brands, as measured by
time efficiency, NOT allowing for load weight? (That is, ignoring the
weight measurements. For definiteness, use Carttype for cart brand and
Weight for the weight.) What is the P-value for the ANOVA? What is the
model R^2?
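A minimal sketch of the one-way ANOVA for part (i), assuming Table 1 has already been read into a data set named carts with variables Carttype, Time, and Weight (carts and Time are hypothetical names). proc glm reports the overall F test with its P-value and the model R-Square.

```sas
/* Sketch: one-way ANOVA of delivery time on cart brand, ignoring weight. */
/* Assumes a data set "carts" with variables Carttype and Time.           */
proc glm data=carts;
    class Carttype;          /* cart brand is a categorical factor */
    model Time = Carttype;   /* weight is deliberately left out    */
run;
quit;
```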
(ii) Do the carts vary significantly in efficiency, ALLOWING for load
weight and cart brand, but NOT allowing for load weight and cart brand
interactions, listing cart brand before weight in the regression for
definiteness? What is the P-value for the ANOCOVA? What is the model
R^2?
Which appears to have a stronger effect on the times, the load
weight or the type of cart? Which of the two are significant in the
Type I table? What are their P-values? Which are significant in the
Type III table? What are their P-values? Why are the P-values
different between the two tables?
Is the effect of cart brand significant in the Type I table? in
the Type III table?
Note that the significance of cart brand in the presence of a load
weight correction would be a statistically valid measure of quality of
cart brand, as long as the warehouse manager is convinced that the
distribution of load weights in Table 1 is the same as she would
encounter in practice.
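The model for part (ii) might be set up as in the following sketch, again assuming a data set named carts with variables Carttype, Time, and Weight (carts and Time are hypothetical names). Carttype is listed before Weight, as the problem specifies, and the ss1 and ss3 options request the Type I and Type III tables explicitly.

```sas
/* Sketch: analysis of covariance without an interaction term.       */
/* Assumes a data set "carts" with variables Carttype, Time, Weight. */
proc glm data=carts;
    class Carttype;
    model Time = Carttype Weight / ss1 ss3;  /* cart brand first, then weight */
run;
quit;
```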
(iii) Is there a significant interaction between the effect of weight
and the type of cart, ALLOWING for load weight and cart brand
interaction in the regression, using ``interaction'' in the usual sense
for ANOCOVA models? Which effects are significant in the Type I
table? in the Type III table? Do your conclusions change from
part (ii)?
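For part (iii), the interaction can be added as a Carttype*Weight term, which lets the slope of time on weight differ by cart type. This is a sketch under the same assumptions as before: a data set named carts with variables Carttype, Time, and Weight, where carts and Time are hypothetical names.

```sas
/* Sketch: analysis of covariance with a brand-by-weight interaction. */
/* Assumes a data set "carts" with variables Carttype, Time, Weight.  */
proc glm data=carts;
    class Carttype;
    model Time = Carttype Weight Carttype*Weight / ss1 ss3;
run;
quit;
```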
(iv) Find the (Pearson) correlation coefficients for time (Y)
versus Weight within each cart type. (That is, run proc corr with
by Carttype.) Do the within-cart-type correlation coefficients vary?
Also, construct a scatterplot of time versus weight with cart type as
the plotting symbol. Do these results affect your answer to part (ii)?
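The within-brand correlations and the scatterplot of part (iv) could be produced along the following lines, assuming a data set named carts with variables Carttype, Time, and Weight (carts and Time are hypothetical names). The sort is included because by-group processing expects the data to be ordered by Carttype.

```sas
/* Sketch: Pearson correlations within each cart type, then a scatterplot. */
/* Assumes a data set "carts" with variables Carttype, Time, Weight.       */
proc sort data=carts;
    by Carttype;
run;

proc corr data=carts pearson;
    by Carttype;             /* one correlation table per cart type */
    var Time Weight;
run;

/* The first character of Carttype (A, B, or C) is the plotting symbol. */
proc plot data=carts;
    plot Time*Weight=Carttype;
run;
quit;
```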
What would you advise the warehouse manager about the best choice
among the three brands of cart?
2. Variables AA, BB, and CC were measured for 32 subjects.
Of these subjects, 12 later developed Condition X while the
remaining 20 did not develop Condition X. The data are listed in
Table 2.
An experimenter is interested in finding which of the variables
AA, BB, and CC are significantly related to developing Condition X.
The experimenter is also interested in finding a rule that, given the
values of AA, BB, and CC for a subject, predicts the probability that
that subject will later develop Condition X.
Table 2 - Covariates AA, BB, CC for 32 subjects
that later either developed or did not develop Condition X
Developed Condition X        Did NOT develop Condition X
Subj  AA  BB  CC             Subj  AA  BB  CC
   1  69  83  51               13  36  55  39
   2  51  74  32               14  50  69  44
   3  27  68  33               15  36  59  28
   4  55  85  46               16  31  26  44
   5  27  99  34               17  31  49  47
   6  44  68  38               18  32  45  50
   7  49  88  57               19  40  59  33
   8  28  64  66               20  49  51  42
   9  32  58  46               21  38  70  47
  10  47  81  39               22  46  63  26
  11  35  77  31               23  46  64  47
  12  30  69  62               24  67  94  43
                               25  47  60  56
                               26  56  62  45
                               27  39  64  27
                               28  52  71  24
                               29  33  62  52
                               30  57  63  48
                               31  39  78  23
                               32  48  70  55
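One possible way to enter Table 2 is sketched below. The data set name xnotx follows the example used in the hints for this problem; the indicator variable CondX (1 = developed Condition X, 0 = did not) is a hypothetical name introduced here, and it is derived from the fact that subjects 1 through 12 are the ones who developed the condition.

```sas
/* Sketch: read Table 2 and build a group indicator.               */
/* "CondX" is a hypothetical variable name chosen for this sketch. */
data xnotx;
    input Subj AA BB CC @@;   /* @@ = several subjects per line  */
    CondX = (Subj <= 12);     /* 1 for subjects 1-12, 0 for 13-32 */
    datalines;
1 69 83 51   2 51 74 32   3 27 68 33   4 55 85 46
5 27 99 34   6 44 68 38   7 49 88 57   8 28 64 66
9 32 58 46   10 47 81 39  11 35 77 31  12 30 69 62
13 36 55 39  14 50 69 44  15 36 59 28  16 31 26 44
17 31 49 47  18 32 45 50  19 40 59 33  20 49 51 42
21 38 70 47  22 46 63 26  23 46 64 47  24 67 94 43
25 47 60 56  26 56 62 45  27 39 64 27  28 52 71 24
29 33 62 52  30 57 63 48  31 39 78 23  32 48 70 55
;
run;

/* Quick check: 12 subjects with CondX=1 and 20 with CondX=0. */
proc freq data=xnotx;
    tables CondX;
run;
```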
(i) Use the data in Table 2 to find a linear discriminant function
L(data) = c0 + c1 AA + c2 BB + c3 CC
with the property that L(data)>0 predicts Condition X. Use SAS's
default assumptions for proc discrim, namely that the variables
AA, BB, CC in each group have joint normal distributions with the
same covariance matrix in each group, and begin with a prior belief
of 0.50 that a randomly chosen subject has Condition X.
(Hints: (a) See plogistic.sas on the Math475 Web site. (b) Look for
Linear Discriminant Function in the SAS output for proc discrim
followed by Coefficient Vector = COV(-1)Xbar_j or something similar.
The coefficients in the linear discriminant function are the
differences between the covariate coefficients for the two groups.
The cutoff value is given by the difference between the Constant
values.)
Have SAS print out the means and standard deviations of each covariate
within each group. (Hint: The option simple tells SAS to do this, as in
(for example) proc discrim data=xnotx simple ....; . Alternatively, you
can use proc means.) Which covariates seem to be the most divergent
between the two groups as judged by the within-group means and standard
deviations?
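A sketch of a proc discrim call consistent with the assumptions stated in part (i) follows. It assumes a data set named xnotx with covariates AA, BB, CC and a 0/1 group indicator, here given the hypothetical name CondX. pool=yes and priors equal restate the SAS defaults (common covariance matrix, prior of 0.50 for each of the two groups), and a proc means step shows the alternative route to the within-group summaries.

```sas
/* Sketch: linear discriminant analysis with within-group statistics. */
/* Assumes a data set "xnotx" with AA, BB, CC and a 0/1 variable      */
/* "CondX" (hypothetical name) marking who developed Condition X.     */
proc discrim data=xnotx simple pool=yes;
    class CondX;
    var AA BB CC;
    priors equal;            /* prior of 0.50 for each group */
run;

/* Alternative: within-group means and standard deviations. */
proc means data=xnotx mean std;
    class CondX;
    var AA BB CC;
run;
```

In the Linear Discriminant Function table, subtracting the column for the group without Condition X from the column for the group with Condition X gives c0 (from the Constant row) and c1, c2, c3 (from the AA, BB, CC rows).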
(ii) Which variables have the highest coefficients in the discriminant
function? Is this consistent with your answer to the previous part?
Are the signs of these coefficients consistent with the differences in
within-group means? That is, are large (or positive) values in a
covariate suggestive of Condition X, or small (or negative) values?
(iii) Using the data in Table 2 as a test data set, how many of the
subjects are incorrectly classified? (This is called a Resubstitution
analysis.) If you enter crossvalidate on the proc discrim command line,
then SAS will also do a crossvalidation procedure in which each
subject is classified on the basis of the discrimination rule defined by
the other subjects, NOT INCLUDING that subject itself. (The
resubstitution compares each subject with the rule for all subjects,
including the subject itself, which influences the rule about what group
that subject should belong to.) How does the number of misclassified
subjects change under this crossvalidation?
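The cross-validated classification of part (iii) can be requested as in the following sketch, which assumes a data set named xnotx with covariates AA, BB, CC and a 0/1 group indicator under the hypothetical name CondX. The output then contains both the resubstitution summary and the cross-validation summary, so the two misclassification counts can be compared directly.

```sas
/* Sketch: resubstitution plus leave-one-out cross-validation.      */
/* Assumes a data set "xnotx" with AA, BB, CC and a 0/1 variable    */
/* "CondX" (hypothetical name) marking who developed Condition X.   */
proc discrim data=xnotx crossvalidate;
    class CondX;
    var AA BB CC;
    priors equal;
run;
```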
3. For the data in Table 2,
(i) Use a logistic regression to predict the probability of developing
Condition X given values of AA, BB, and CC.
Is there an overall statistically significant effect of the three
covariates together on whether or not a subject develops Condition X?
What is the P-value? (If more than one test is available, pick one of
them.) What is the number of degrees of freedom of the chi-square
statistic?
(Hint: See plogistic.sas on the Math475 Web site.)
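A minimal logistic regression setup is sketched below. It is not taken from plogistic.sas; it assumes a data set named xnotx with covariates AA, BB, CC and a 0/1 outcome under the hypothetical name CondX. The event='1' option makes SAS model the probability of developing Condition X rather than the probability of not developing it.

```sas
/* Sketch: logistic regression of Condition X on AA, BB, CC.        */
/* Assumes a data set "xnotx" with AA, BB, CC and a 0/1 variable    */
/* "CondX" (hypothetical name) marking who developed Condition X.   */
proc logistic data=xnotx;
    model CondX(event='1') = AA BB CC;   /* model P(CondX = 1) */
run;
```

The overall tests appear in the table headed Testing Global Null Hypothesis: BETA=0, and the per-variable estimates and Wald chi-square tests appear in Analysis of Maximum Likelihood Estimates.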
(ii) Which of the three variables AA, BB, and CC individually have a
significant effect on the probability of developing Condition X in
the logistic regression? Which have a highly significant effect? For the
variables that have significant effects, what is the P-value for each?
For each variable with a significant effect, does increasing the value
of that variable make Condition X more likely to occur, or less
likely? How can you tell from the output?
(iii) Are your answers to part (ii) consistent with the means of
the variables in the two groups? That is, if increasing a covariate also
increases the probability of Condition X, is this consistent with the
mean of that covariate being higher among the records with
Condition X?