Math 475 Homework 4 - Fall 2010

Click here for Prof. Sawyer's home page

HOMEWORK #4 due Thursday 11-2

Six problems.

Text references are to the textbook, Cody & Smith, ``Applied Statistics and the SAS Programming Language''.

NOTE: See the main Math475 Web page for how to organize a homework assignment using SAS. In particular,
ALWAYS INCLUDE YOUR NAME in a title statement in your SAS programs, so that your name will appear at the top of each output page.
ALL HOMEWORKS MUST BE ORGANIZED in the following order:
(Part 1) First, your answers to all the problems in the homework, whether you use SAS for that problem or not. If the problem asks you to generate a graph or table, refer to the graph or table by page number in the SAS output (see below). (Xeroxing a page or two from the SAS output or cutting and pasting into a Word file or TeX source file is also OK.)
(Part 2) Second, all SAS programs that you used to obtain the output for any of the problems. If possible, similar problems should be done with the same SAS program. (In other words, write one SAS program for several problems if that makes things easier, using SAS title or title2 statements to separate the problems in your output.)
(Part 3) Third, all output for all the SAS programs in the previous step.
If an answer in Part 1 requires a table or a scatterplot that you need to refer to, make sure that your SAS output has overall increasing (unique) page numbers and make references to Part 3 by page number, such as ``The scatterplot for Problem 2 part (b) is on page #X in the SAS output below.'' DO NOT say, ``see Page 3 in the SAS output'' if Part 3 has output from several SAS runs, each of which has its own Page 3. In that case, either write your own (increasing) page numbers on the SAS output, or else (for example) refer to ``Page 2-7 in the SAS output'' (for page 7 in the second set of SAS output) and write page numbers in the format ``2-7'' at the top of pages in your output.

Problem 1. (30) The responses of 35 patients to 5 experimental drugs were:

    Table 1. Responses to Five Experimental Drugs
    A1:   13.32   18.87   14.61   15.02   15.42   16.23   14.01
    A2:   17.01   18.14   18.06   18.46   15.91   16.94   14.50
    H1:   17.83   18.13   19.89   19.01   16.84   19.53   14.77
    C2:   20.83   19.87   21.04   17.12   20.50   17.55   20.17
    C3:   19.62   19.03   20.11   20.52   21.05   20.21   25.91

(i) Was there a significant difference in the responses to the different drugs, as measured by a one-way ANOVA? What is the P-value? What is the estimate of the error standard deviation (that is, sigma)? Where did you find it in the output? How did SAS calculate it?
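As a cross-check on the question ``How did SAS calculate it?'', the sketch below recomputes the one-way ANOVA quantities by hand in Python (not SAS). It assumes the usual definitions: the error mean square (MSE) is the within-group sum of squares divided by N - k, and the estimate of sigma is its square root, which PROC GLM labels Root MSE. The function and variable names are illustrative only. The P-value needs the F distribution and is left to SAS.

```python
from math import sqrt

# Responses from Table 1, keyed by drug label (7 patients per drug).
responses = {
    "A1": [13.32, 18.87, 14.61, 15.02, 15.42, 16.23, 14.01],
    "A2": [17.01, 18.14, 18.06, 18.46, 15.91, 16.94, 14.50],
    "H1": [17.83, 18.13, 19.89, 19.01, 16.84, 19.53, 14.77],
    "C2": [20.83, 19.87, 21.04, 17.12, 20.50, 17.55, 20.17],
    "C3": [19.62, 19.03, 20.11, 20.52, 21.05, 20.21, 25.91],
}


def one_way_anova(groups):
    """Compute one-way ANOVA sums of squares, MSE, Root MSE, and F."""
    values = [y for ys in groups.values() for y in ys]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Between-group (model) and within-group (error) sums of squares.
    ss_model = sum(len(ys) * (means[g] - grand_mean) ** 2
                   for g, ys in groups.items())
    ss_error = sum((y - means[g]) ** 2
                   for g, ys in groups.items() for y in ys)
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    df_model, df_error = k - 1, n_total - k
    mse = ss_error / df_error
    return {
        "df_model": df_model, "df_error": df_error,
        "ss_model": ss_model, "ss_error": ss_error, "ss_total": ss_total,
        "mse": mse, "root_mse": sqrt(mse),
        "f": (ss_model / df_model) / mse,
    }


anova = one_way_anova(responses)
for name, value in anova.items():
    print(f"{name:>9}: {value:.4f}")
```

The printed root_mse should agree with the Root MSE line in the SAS output, up to rounding.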

(ii) Which PAIRS OF TREATMENTS, and HOW MANY pairs of treatments, are significantly different at the alpha=0.05 level, using pairwise two-sample t-tests with the MSE from part (i) as the estimate of the error variance? (That is, using the LSD method. This should give a relatively large number of significant pairwise differences, since there is no multiple-comparison correction.)

(iii) Use Bonferroni's procedure to find out WHICH PAIRS differ at alpha=0.05. How many significant pairs are there? Do you obtain different conclusions from those in part (ii)?
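The difference between the LSD and Bonferroni methods comes down to the significance level applied to each pairwise test. The short Python sketch below (not SAS) counts the pairs among the five drugs and shows the per-comparison level, assuming the standard Bonferroni rule of dividing alpha by the number of comparisons. The variable names are illustrative.

```python
from itertools import combinations

drugs = ["A1", "A2", "H1", "C2", "C3"]  # treatment labels from Table 1
alpha = 0.05                             # overall level used in the problem

# Every unordered pair of treatments is one comparison.
pairs = list(combinations(drugs, 2))

# Bonferroni: split the overall alpha evenly across all comparisons.
per_test_alpha = alpha / len(pairs)

print(len(pairs), "pairwise comparisons")
print("LSD compares each pairwise P-value to", alpha)
print("Bonferroni compares each pairwise P-value to", per_test_alpha)
```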

(iv) Use Tukey's procedure to find out which pairs differ at alpha=0.05. Do you obtain different conclusions from those in part (ii)? From the Bonferroni test in part (iii)? (Usually Tukey's procedure finds significant differences that are intermediate between the LSD and Bonferroni procedures.)

(v) Use the REGWQ procedure to find out which pairs differ at alpha=0.05. Do you obtain different conclusions from those of Tukey's procedure in part (iv)?

(vi) The Duncan and SNK (Student-Newman-Keuls) procedures used to be popular years ago, but more recent research has suggested that in some cases they are no more reliable for multiple comparison purposes than the LSD method. (See the SAS online help and documentation pages for the details.) SAS supports them, but recommends that they not be used.

Use the Duncan procedure (anyway) to find out which pairs of treatments are significantly different at the alpha=0.05 level. How do your conclusions compare with those of the LSD method in part (ii)? With the Tukey method in part (iv)? (Usually the Duncan procedure finds differences that are intermediate between those of the LSD and Tukey methods.)

(vii) A consultant conjectures that the two A drugs should behave similarly in the human body due to a similar chemical structure, but that the two C drugs should be metabolized differently. Using the same MSE as in the previous analyses, test whether or not the AVERAGE of the two A drugs is significantly different from the AVERAGE of the two C drugs. What is the P-value? (Hint: Use a Contrast test. See for example OnewayMC.sas on the Math475 Web site.)
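To make the contrast in part (vii) concrete, here is a hand calculation in Python (not SAS). It assumes the standard formulas: the contrast estimate is the sum of c_i times the group means, and its standard error is sqrt(MSE * sum of c_i^2 / n_i), with the MSE from the one-way ANOVA. The coefficients of +1/2 for the A drugs and -1/2 for the C drugs are one choice; any nonzero multiple gives the same test. The P-value, from a t distribution with the error degrees of freedom, is left to SAS.

```python
from math import sqrt

# Responses from Table 1, keyed by drug label.
responses = {
    "A1": [13.32, 18.87, 14.61, 15.02, 15.42, 16.23, 14.01],
    "A2": [17.01, 18.14, 18.06, 18.46, 15.91, 16.94, 14.50],
    "H1": [17.83, 18.13, 19.89, 19.01, 16.84, 19.53, 14.77],
    "C2": [20.83, 19.87, 21.04, 17.12, 20.50, 17.55, 20.17],
    "C3": [19.62, 19.03, 20.11, 20.52, 21.05, 20.21, 25.91],
}

# Contrast: average of the A drugs minus average of the C drugs.
coef = {"A1": 0.5, "A2": 0.5, "H1": 0.0, "C2": -0.5, "C3": -0.5}

means = {g: sum(ys) / len(ys) for g, ys in responses.items()}
n_total = sum(len(ys) for ys in responses.values())
df_error = n_total - len(responses)

# Same MSE as in the one-way ANOVA: pooled within-group variance.
mse = sum((y - means[g]) ** 2
          for g, ys in responses.items() for y in ys) / df_error

estimate = sum(coef[g] * means[g] for g in responses)
std_error = sqrt(mse * sum(coef[g] ** 2 / len(responses[g])
                           for g in responses))
t_stat = estimate / std_error
f_stat = t_stat ** 2  # a one-degree-of-freedom contrast F equals t squared

print(f"estimate = {estimate:.4f}, SE = {std_error:.4f}")
print(f"t = {t_stat:.4f} on {df_error} df, F = {f_stat:.4f}")
```

In PROC GLM, a CONTRAST statement reports the F form of this test and an ESTIMATE statement reports the t form.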

Problem 2. (20) An engineer is studying the response of a system that depends on two factors, Shrillness and Color. The first factor can take the values Hi or Lo and the second the values Red or Blue. She gathers data from six experimental runs for each pair of settings of the two factors. The results are presented in Table 2.

    Table 2. Responses of System to Settings of Two Factors
          Hi  Red     224  255  261  214  192  232
          Hi  Blue    174  148  187  158  189  211
          Lo  Red     224  181  200  155  195  200
          Lo  Blue    257  204  229  200  205  233

(i) Use SAS to run a two-factor full factorial ANOVA analysis. Is the Model Test significant? What is its P-value?

(ii) Plot the residuals against the predicted values in the model. Do the residuals appear to be independent of the predicted value and of Shrillness?

(iii) Which of the two main effects and one interaction are significant? highly significant? What are the P-values of the significant effects?

(iv) If the interaction is significant, display an interaction plot. Is the interaction visible in the interaction plot? What can you conclude about the interaction and how it affects the dependent variable?
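An interaction plot is a plot of the cell means, with one line per level of one factor. The Python sketch below (not SAS) computes the four cell means from Table 2 and, as a check on the ANOVA table, the sums of squares for a balanced two-factor design using the textbook formulas. The names are illustrative, and the F-tests and P-values are left to SAS.

```python
# Table 2: six runs for each (Shrillness, Color) cell.
cells = {
    ("Hi", "Red"):  [224, 255, 261, 214, 192, 232],
    ("Hi", "Blue"): [174, 148, 187, 158, 189, 211],
    ("Lo", "Red"):  [224, 181, 200, 155, 195, 200],
    ("Lo", "Blue"): [257, 204, 229, 200, 205, 233],
}
shrill_levels = ["Hi", "Lo"]
color_levels = ["Red", "Blue"]
n = 6  # runs per cell (balanced design)


def mean(xs):
    return sum(xs) / len(xs)


cell_mean = {key: mean(ys) for key, ys in cells.items()}
grand = mean([y for ys in cells.values() for y in ys])
shrill_mean = {s: mean([y for (s2, _), ys in cells.items() if s2 == s
                        for y in ys]) for s in shrill_levels}
color_mean = {c: mean([y for (_, c2), ys in cells.items() if c2 == c
                       for y in ys]) for c in color_levels}

# Balanced two-way decomposition of the total sum of squares.
ss_shrill = len(color_levels) * n * sum(
    (shrill_mean[s] - grand) ** 2 for s in shrill_levels)
ss_color = len(shrill_levels) * n * sum(
    (color_mean[c] - grand) ** 2 for c in color_levels)
ss_inter = n * sum(
    (cell_mean[(s, c)] - shrill_mean[s] - color_mean[c] + grand) ** 2
    for s in shrill_levels for c in color_levels)
ss_error = sum((y - cell_mean[key]) ** 2
               for key, ys in cells.items() for y in ys)
ss_total = sum((y - grand) ** 2 for ys in cells.values() for y in ys)

# The points of the interaction plot are just the cell means.
for s in shrill_levels:
    print(s, {c: round(cell_mean[(s, c)], 2) for c in color_levels})
print("SS:", ss_shrill, ss_color, ss_inter, ss_error)
```

If the lines joining the Red and Blue means for Hi and for Lo are far from parallel, the interaction should be visible in the plot.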

Problem 3. (20) An engineer is interested in the frequency of a mechanical device as a function of three variables: Pressure, with three levels (Press1,Press2,Press3), Drubness, with two levels (Drub1,Drub2), and Abrasiveness, with three levels (Ab1,Ab2,Ab3). The frequencies of two devices are measured for each set of levels of the three variables. The resulting frequencies are listed in Table 3.

    Table 3. Frequencies of a Device

                 Press1                 Press2                    Press3
             Drub1     Drub2        Drub1      Drub2          Drub1     Drub2
    Ab1   3839 3202   326 117     5950 1254   357 1550      484  227   1915 2924
    Ab2   1313 3202   276 368     1574 8814   530  538     1046 1128   1373 2795
    Ab3   2097 6417   374 429     3614 1293   238 2476      201  886   1803 1647

(i) Use SAS to run a three-factor full factorial model with the three variables as factors. Is the Model Test significant? What is its P-value?

(ii) Plot the residuals against the predicted values in the model. In order to get a better idea of the distribution of the residuals, include the level of Pressure in the residual plot as the plotting symbol. (Make sure that the plotting symbol identifies the Pressure level!)

Do the residuals appear to be independent of the predicted value and of the Pressure? Why? Do the residuals appear to be normally distributed? Carry out a test that provides a P-value for the normality of the residuals.

(iii) Run the full factorial model again with the values in Table 3 replaced by their logarithms. Is the Model Test now more significant? Analyze the residuals of the log-transformed data in the same way as in part (ii). Do they now look more independent of the predicted value and of the value of Pressure?

(iv) Which of the main effects of Abrasiveness, Drubness, and Pressure are significant for the log-transformed data? highly significant? Which of the three two-way and one three-way interactions are significant? highly significant? What are the P-values for the significant effects? For the effects that are significant, what are the degrees of freedom for the F-tests involved, numerator and denominator?
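The degrees of freedom asked for in part (iv) follow from the design alone. The Python sketch below (not SAS) assumes the standard rules for a balanced full factorial: a main effect has (levels - 1) degrees of freedom, an interaction has the product of the degrees of freedom of its factors, and the error has whatever remains after the model and the grand mean. The names are illustrative.

```python
from itertools import combinations
from math import prod

# Number of levels of each factor in Table 3, and devices per cell.
levels = {"Pressure": 3, "Drubness": 2, "Abrasiveness": 3}
replicates = 2

# Numerator df for every main effect and interaction.
df = {}
for size in range(1, len(levels) + 1):
    for effect in combinations(levels, size):
        df["*".join(effect)] = prod(levels[f] - 1 for f in effect)

n_obs = replicates * prod(levels.values())
df_model = sum(df.values())
df_error = n_obs - 1 - df_model  # denominator df for every F-test

for effect, value in df.items():
    print(f"{effect:35s} numerator df = {value}")
print("model df =", df_model, " error df =", df_error)
```

These values can be compared with the DF column of the SAS ANOVA table.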

(v) For each of the two-way interactions that are significant, display an interaction plot. For each such interaction, is the interaction visible in the interaction plot? What can you conclude about the interaction and how it affects the dependent variable (that is, the frequency of the device)?

Problem 4. (20) A warehouse manager is comparing motorized carts from three different manufacturers with the idea of purchasing one of the brands. She is primarily interested in the time (Y) that operators take to fetch and deliver a load in a cart. She also keeps track of the weight of each load in case that has a confounding effect. Trial runs are made for 15 loads for each motorized cart, for a total of 45 trial runs. Forty-five (45) different operators were used. The times and weights for the 45 trial runs were:

    Table 4 - Times and Weights for Motorized Carts
    A  42  104    A  38   79    A  47   75    A  44   95    A  51  102
    A  44  107    A  54  110    A  39   98    A  44  106    A  56  101
    A  56  120    A  43   88    A  50   99    A  59  122    A  52   99
    B  56  107    B  42   85    B  49   98    B  54  106    B  44   88
    B  48  110    B  40   93    B  46  104    B  45   87    B  44   86
    B  44  101    B  46   86    B  46   87    B  62  121    B  55   80
    C  51   87    C  47   92    C  62   97    C  57  117    C  43   85
    C  66  120    C  59  101    C  52  115    C  57  107    C  46   99
    C  53  109    C  54   99    C  46   91    C  41   72    C  55  105

Each triple of values in Table 4 denotes the cart type (one of three values A,B,C), the delivery time for that load (Y), and the weight of the load (X).

(i) Is there a significant variation in the cart brands, as measured by time efficiency, NOT allowing for load weight? (That is, ignoring the weight measurements. For definiteness, use Carttype for cart brand and Weight for the weight.) What is the P-value for the ANOVA? What is the model R²?

(ii) Do the carts vary significantly in efficiency, ALLOWING for load weight and cart brand, but NOT allowing for load weight and cart brand interactions, listing cart brand before weight in the regression for definiteness? What is the P-value for the ANOCOVA? What is the model R²?

Which appears to have a stronger effect on the times, the load weight or the type of cart? Which of the two are significant in the Type I table? What are their P-values? Which are significant in the Type III table? What are their P-values? Why are the P-values different between the two tables?

Is the effect of cart brand significant in the Type I table? in the Type III table?

Note that the significance of cart brand in the presence of a load weight correction would be a statistically valid measure of quality of cart brand, as long as the warehouse manager is convinced that the distribution of load weights in Table 4 is the same as she would encounter in practice.
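The Type I and Type III tables in part (ii) differ because they compare different pairs of models. The Python sketch below (not SAS) illustrates this under the usual definitions: Type I sums of squares are sequential (each term adjusted only for the terms listed before it), while Type III sums of squares adjust each term for all other terms in the model. It uses the classical ANOCOVA shortcut that the residual sum of squares for the common-slope model is Wyy - Wxy^2/Wxx, where W denotes within-cart-type sums of squares and products. The names are illustrative, and the F-tests and P-values are left to SAS.

```python
# Table 4 as repeated (cart type, time Y, weight X) triples.
TABLE4 = """
A 42 104  A 38  79  A 47  75  A 44  95  A 51 102
A 44 107  A 54 110  A 39  98  A 44 106  A 56 101
A 56 120  A 43  88  A 50  99  A 59 122  A 52  99
B 56 107  B 42  85  B 49  98  B 54 106  B 44  88
B 48 110  B 40  93  B 46 104  B 45  87  B 44  86
B 44 101  B 46  86  B 46  87  B 62 121  B 55  80
C 51  87  C 47  92  C 62  97  C 57 117  C 43  85
C 66 120  C 59 101  C 52 115  C 57 107  C 46  99
C 53 109  C 54  99  C 46  91  C 41  72  C 55 105
"""
tokens = TABLE4.split()
runs = [(tokens[i], float(tokens[i + 1]), float(tokens[i + 2]))
        for i in range(0, len(tokens), 3)]


def centered_sums(rows):
    """Return (Sxx, Sxy, Syy) about the means for rows of (cart, y, x)."""
    n = len(rows)
    mean_x = sum(x for _, _, x in rows) / n
    mean_y = sum(y for _, y, _ in rows) / n
    sxx = sum((x - mean_x) ** 2 for _, _, x in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for _, y, x in rows)
    syy = sum((y - mean_y) ** 2 for _, y, _ in rows)
    return sxx, sxy, syy


# Total sums ignore cart type; within sums pool the three cart types.
txx, txy, tyy = centered_sums(runs)
carts = sorted({cart for cart, _, _ in runs})
per_cart = [centered_sums([r for r in runs if r[0] == c]) for c in carts]
wxx, wxy, wyy = (sum(s[i] for s in per_cart) for i in range(3))

# Residual sums of squares for the four nested models.
rss_null = tyy                      # intercept only
rss_cart = wyy                      # Carttype only
rss_weight = tyy - txy ** 2 / txx   # Weight only
rss_both = wyy - wxy ** 2 / wxx     # Carttype + Weight, common slope

# Model statement order: Carttype first, then Weight.
type1 = {"Carttype": rss_null - rss_cart, "Weight": rss_cart - rss_both}
type3 = {"Carttype": rss_weight - rss_both, "Weight": rss_cart - rss_both}
print("Type I  :", type1)
print("Type III:", type3)
```

The Weight rows agree because Weight is listed last. The Carttype rows differ because only the Type III row is adjusted for Weight.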

(iii) Is there a significant interaction between the effect of weight and the type of cart, ALLOWING for load weight and cart brand interaction in the regression, using ``interaction'' in the usual sense for ANOCOVA models? Which effects are significant in the Type I table? in the Type III table? Do your conclusions change from part (ii)?

DO EXACTLY ONE of the following two parts, (iv)(A) or (iv)(B). Part (iv)(B) was the part that was originally assigned. Part (iv)(A) is about as difficult but, I think, gives a clearer picture of what is going on. Both (iv)(A) and (iv)(B) give roughly the same information, but the results of (iv)(B) are harder to interpret.

(iv)(A) Construct a scatterplot of time versus weight with CartType as the plotting symbol. Does this picture give a clear idea of how the relationship between time and weight varies across cart type? Also, construct a scatterplot of the three WITHIN-CART-TYPE regression lines on the same plot, using cart type as the plotting symbol. Does either of these plots give a clear idea of how the relationship between time and weight varies across cart type? Do these conclusions affect your answer to part (ii)? (Hint: See the last analysis in AnCova.sas on the Math 475 Web site.)

(iv)(B) Find the (Pearson) correlation coefficients for time (Y) versus Weight within each cart type. (That is, run proc corr with by Carttype.) Do the within-cart-type correlation coefficients vary? Also, construct a scatterplot of time versus weight with cart type as the plotting symbol. Do these conclusions affect your answer to part (ii)? (Hint: See AnCova.sas on the Math 475 Web site.)
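For reference, the within-cart-type Pearson correlation in part (iv)(B) is the ordinary correlation coefficient computed separately from the 15 runs of each cart type. The Python sketch below (not SAS) shows that calculation with illustrative names. It can serve as a check on the proc corr output, which also supplies the P-values.

```python
from math import sqrt

# Table 4 as repeated (cart type, time Y, weight X) triples.
TABLE4 = """
A 42 104  A 38  79  A 47  75  A 44  95  A 51 102
A 44 107  A 54 110  A 39  98  A 44 106  A 56 101
A 56 120  A 43  88  A 50  99  A 59 122  A 52  99
B 56 107  B 42  85  B 49  98  B 54 106  B 44  88
B 48 110  B 40  93  B 46 104  B 45  87  B 44  86
B 44 101  B 46  86  B 46  87  B 62 121  B 55  80
C 51  87  C 47  92  C 62  97  C 57 117  C 43  85
C 66 120  C 59 101  C 52 115  C 57 107  C 46  99
C 53 109  C 54  99  C 46  91  C 41  72  C 55 105
"""
tokens = TABLE4.split()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Group the (weight, time) pairs by cart type.
by_cart = {}
for i in range(0, len(tokens), 3):
    cart, time, weight = tokens[i], float(tokens[i + 1]), float(tokens[i + 2])
    by_cart.setdefault(cart, []).append((weight, time))

corr = {cart: pearson([w for w, _ in pairs], [t for _, t in pairs])
        for cart, pairs in by_cart.items()}
for cart, r in corr.items():
    print(f"Carttype {cart}: r = {r:.4f} (n = {len(by_cart[cart])})")
```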
