HOMEWORK #4 due 11-8
Table 1: Zubricity and Covariates
-----------------------------------
OBS   Zubric   Drubn   Visc   Speed
-----------------------------------
  1     310      16     27      12
  2     210      17     36      10
  3     450      24     40      20
  4     390      24     44      15
  5     780      26     44       8
  6     330      28     53      18
  7     580      39     55      19
  8     330      22     56      24
  9     400      29     57      16
 10     230      28     58      17
 11     470      34     60      24
 12     510      35     61      17
 13     490      37     66      20
 14     450      36     68      11
 15     630      46     73      21
 16     400      38     78       6
 17     760      34     80      22
 18     590      47     83      17
 19     520      43     84      12
 20     540      44     89      17
proc reg or proc glm) to find out.
What is the model P-value? What is the model R^2? What is the
value of the F-statistic that led to the model P-value? How many degrees
of freedom does it have in its numerator and denominator?
proc reg or else one run of proc glm plus a proc print for associated
variables.)
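A minimal sketch of how Table 1 might be read into SAS and passed to proc reg. The dataset name zubric and the choice of Zubric as the response with Drubn, Visc, and Speed as covariates are assumptions made for illustration; the handout does not state the model here.

```sas
/* Sketch only: the dataset name and the model are assumptions. */
data zubric;
    input Obs Zubric Drubn Visc Speed;
    datalines;
1 310 16 27 12
2 210 17 36 10
3 450 24 40 20
4 390 24 44 15
5 780 26 44 8
6 330 28 53 18
7 580 39 55 19
8 330 22 56 24
9 400 29 57 16
10 230 28 58 17
11 470 34 60 24
12 510 35 61 17
13 490 37 66 20
14 450 36 68 11
15 630 46 73 21
16 400 38 78 6
17 760 34 80 22
18 590 47 83 17
19 520 43 84 12
20 540 44 89 17
;
run;

/* The Analysis of Variance table in the output lists the model and  */
/* error DF, the F Value, and Pr > F; R-Square is printed below it.  */
proc reg data=zubric;
    model Zubric = Drubn Visc Speed;
run;
quit;
```

The model P-value, the F-statistic, and its numerator and denominator degrees of freedom can all be read from the Analysis of Variance table that proc reg prints.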
2. A manufacturer tests the hardness of 30 alloys as a function of the amount of 5 different additives. The hardness, the date, and the amounts of the 5 additives for each of the alloys are given in Table 2.
Table 2: Alloy Hardness as a Function of Five Additives
--------------------------------------------------------
Hardness  Date     AA    BB       CC     DD   EE
--------------------------------------------------------
  2190    Mon     9.9   190   488.53   1464    4
  2475    Mon    10.8   305   372.95   1623    3
  2185    Tue    17.6   375   309.80   3106    2
  1964    Tue     2.1   261   433.25    113    4
  2115    Tue    14.6   199   529.34   2145    3
  1721    Tue    12.4   536   147.51   1717    2
  2217    Wed    13.0   322   357.55   2479    3
  2879    Wed    18.4   311   413.35   3124    2
  2523    Thu    11.2   265   452.12   2144    3
  2003    Thu     8.3   393   300.76   1308    3
  2733    Fri    19.3   213   522.37   3357    3
  2866    Fri    26.2   315   399.55   4778    1
  2295    Fri     5.2   343   418.58    477    4
  1994    Fri    12.7   249   426.77   1956    3
  2092    Fri    12.2   281   419.65   1587    3
  2345    Mon    19.1   292   433.03   2897    2
  2788    Mon    22.7   269   441.17   3958    2
  2595    Mon    21.0   416   321.95   3284    1
  2268    Tue    21.5   394   324.71   3906    2
  3032    Tue    23.8   393   292.20   3739    1
  2875    Tue    26.7   282   396.95   4777    1
  2765    Tue    14.0   123   635.74   2243    4
  1900    Tue     4.8   440   254.23    580    3
  1874    Tue    15.0   395   335.37   2270    2
  2132    Tue     9.4   318   375.88   1074    3
  2125    Tue    11.6   396   306.15   1479    2
  2145    Tue    12.2   381   334.74   2093    2
  2775    Wed    21.4   319   368.25   3094    2
  1979    Wed     9.6   470   236.03   1513    2
  2292    Wed    15.9   270   442.34   2280    3
Hardness on the 5 additives? What is the Model P-value? What is the
Model R^2? Which additives are significant in the Type I SS table? In the
Type III table? (Hint: Use proc glm.)
proc reg with / selection=adjrsq.)
Hardness on the 5 additive variables. What variables does SAS choose for
the regression? Are these the same as in part (ii)? (Hint: Run proc reg
with / selection=backward.)
Hardness on the 5 additive variables. What variables does SAS choose for
the regression? Are these the same as in part (ii)? (Hint: Run proc reg
with / selection=stepwise.)
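The three selection hints in this problem can be sketched in a single proc reg run, since proc reg accepts several model statements. The dataset name alloys and the statement labels best_adjrsq, backward, and stepwise are hypothetical names chosen for this sketch; the data lines are copied from Table 2.

```sas
/* Sketch only: dataset name and model labels are assumptions. */
data alloys;
    input Hardness Date $ AA BB CC DD EE;
    datalines;
2190 Mon 9.9 190 488.53 1464 4
2475 Mon 10.8 305 372.95 1623 3
2185 Tue 17.6 375 309.80 3106 2
1964 Tue 2.1 261 433.25 113 4
2115 Tue 14.6 199 529.34 2145 3
1721 Tue 12.4 536 147.51 1717 2
2217 Wed 13.0 322 357.55 2479 3
2879 Wed 18.4 311 413.35 3124 2
2523 Thu 11.2 265 452.12 2144 3
2003 Thu 8.3 393 300.76 1308 3
2733 Fri 19.3 213 522.37 3357 3
2866 Fri 26.2 315 399.55 4778 1
2295 Fri 5.2 343 418.58 477 4
1994 Fri 12.7 249 426.77 1956 3
2092 Fri 12.2 281 419.65 1587 3
2345 Mon 19.1 292 433.03 2897 2
2788 Mon 22.7 269 441.17 3958 2
2595 Mon 21.0 416 321.95 3284 1
2268 Tue 21.5 394 324.71 3906 2
3032 Tue 23.8 393 292.20 3739 1
2875 Tue 26.7 282 396.95 4777 1
2765 Tue 14.0 123 635.74 2243 4
1900 Tue 4.8 440 254.23 580 3
1874 Tue 15.0 395 335.37 2270 2
2132 Tue 9.4 318 375.88 1074 3
2125 Tue 11.6 396 306.15 1479 2
2145 Tue 12.2 381 334.74 2093 2
2775 Wed 21.4 319 368.25 3094 2
1979 Wed 9.6 470 236.03 1513 2
2292 Wed 15.9 270 442.34 2280 3
;
run;

proc reg data=alloys;
    /* Lists candidate subsets ranked by adjusted R-square */
    best_adjrsq: model Hardness = AA BB CC DD EE / selection=adjrsq;
    /* Starts from all 5 additives and drops variables one at a time */
    backward:    model Hardness = AA BB CC DD EE / selection=backward;
    /* Adds variables one at a time, rechecking those already entered */
    stepwise:    model Hardness = AA BB CC DD EE / selection=stepwise;
run;
quit;
```

Running the three methods side by side makes it easier to compare which additives each one retains.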
Hardness on the covariates for the model with the largest adjusted R^2?
What is the Model P-value? What is the Model R^2? How does it compare
with the Model R^2 for the full model? Which additives are significant in
the Type I SS table? In the Type III table?
3. Annual reports for the 10 largest US corporations in 1990 are given in Table 3.
Table 3 - Data for the 10 largest US Corporations in 1990
# Source: Fortune Magazine (April 23, 1990) p346-367 Co 1990 Time Inc.
# All numbers are in millions of dollars.
                     Sales   Profits   Assets
General_Motors      126974     4224    173297
Ford                 96933     3835    160893
Exxon                86656     3510     83219
IBM                  63438     3758     77734
General_Electric     55264     3939    128344
Mobil                50976     1809     39080
Philip_Morris        39069     2946     38528
Chrysler             36156      359     51038
Du_Pont              35209     2480     34715
Texaco               32416     2413     25636

(i) Do a Principal Components Analysis for these 10 corporations to explain the variability of the financial data in Table 3. How many principal components are required to explain at least 90% of the variation in the data?
(ii) As one might have expected in advance, the first Principal
Component (Prin1) is a measure of the overall size of the corporation,
since larger (or smaller) corporations are likely to have larger (or
smaller) amounts of sales, profits, and assets. Thus, in this case, one is
primarily interested in the proportion of variation that is explained by
Prin2 after the variability due to Prin1 is accounted for.
Prin2 says about the financial condition of these 10 corporations over
the year, sort and display the list of companies by descending values of
Prin2. Include profits, sales, and assets in the display, in that order.
Which companies are at the top of the list? At the bottom of the list?
What can you say about what caused them to be at the top or bottom of this
sorted list?
(iii) The analysis in part (ii) might be criticized on the grounds
that the analysis, which depends on quadratic sums of various quantities,
might be dominated by the largest companies. Note that Sales varies by
nearly a factor of four in Table 3 and Assets varies by more than a
factor of six. Repeat the analysis with Sales, Profits, and Assets
replaced by their logarithms in an attempt to ameliorate this problem.
(Use SAS commands logvar=log10(var) instead of logvar=log(var) to get
base-10 logarithms, so that displays will be more intuitive.)
Prin2 for the log-transformed data similar to the sort in part (ii)? Are
the top five companies the same?
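The log-transformed analysis and the sort by Prin2 might be set up along the following lines. The dataset names corp, pcs, and pcs_sorted and the variable names logSales, logProfits, and logAssets are assumptions introduced for this sketch. Note that proc princomp works from the correlation matrix unless the cov option is given.

```sas
/* Sketch only: dataset and log-variable names are assumptions. */
data corp;
    input Company :$16. Sales Profits Assets;
    /* Base-10 logarithms, as the hint suggests */
    logSales   = log10(Sales);
    logProfits = log10(Profits);
    logAssets  = log10(Assets);
    datalines;
General_Motors 126974 4224 173297
Ford 96933 3835 160893
Exxon 86656 3510 83219
IBM 63438 3758 77734
General_Electric 55264 3939 128344
Mobil 50976 1809 39080
Philip_Morris 39069 2946 38528
Chrysler 36156 359 51038
Du_Pont 35209 2480 34715
Texaco 32416 2413 25636
;
run;

/* out= saves the component scores Prin1, Prin2, Prin3 */
proc princomp data=corp out=pcs;
    var logSales logProfits logAssets;
run;

/* Sort companies by descending Prin2 and display them */
proc sort data=pcs out=pcs_sorted;
    by descending Prin2;
run;

proc print data=pcs_sorted;
    var Company Prin2 Profits Sales Assets;
run;
```

Replacing the var statement in proc princomp with the untransformed Sales, Profits, and Assets gives the corresponding run for part (ii).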
4. A biologist is interested in the population structure of a particular lizard (Cophosaurus texanus). Data with three different measurements from 25 lizards in this species are collected (see Table 4). The biologist would like to use these data to show that the lizards in this species are highly variable in shape, perhaps in response to specialization to different subhabitats within the home range of the lizard.
Table 4 - Dimensions of a sample of 25 lizards
# From Johnson&Wichern, ``Applied Multivariate Statistical Analysis'',
#    5th ed, Table 1.3, p17, 2002
# Source: J&W say, data courtesy of Kevin E. Bonine
# Mass is in grams. SVL (snout-vent length) and HLS (hind-limb span)
#    are in millimeters.
Obs     Mass    SVL     HLS
  1    5.526   59.0   113.5
  2   10.401   75.0   142.0
  3    9.213   69.0   124.0
  4    8.953   67.5   125.0
  5    7.063   62.0   129.5
  6    6.610   62.0   123.0
  7   11.273   74.0   140.0
  8    2.447   47.0    97.0
  9   15.493   86.5   162.0
 10    9.004   69.0   126.5
 11    8.199   70.5   136.0
 12    6.601   64.5   116.0
 13    7.622   67.5   135.0
 14   10.067   73.0   136.5
 15   10.091   73.0   135.5
 16   10.888   77.0   139.0
 17    7.610   61.5   118.0
 18    7.733   66.5   133.5
 19   12.015   79.5   150.0
 20   10.049   74.0   137.0
 21    5.149   59.5   116.0
 22    9.158   68.0   123.0
 23   12.132   75.0   141.0
 24    6.978   66.5   117.0
 25    6.890   63.0   117.0

(i) Do a Principal Components Analysis for this sample of lizards to explain the variability of the data in Table 4. How many principal components are required to explain at least 90% of the variation in the data? What percentage of the variation is explained by the first principal component?
(ii) Construct a Prin2*Prin1 plot of the data in Table 4 with the
observation number next to each point in order to illustrate the data.
Note that the scale of Prin2 is more compressed than that of Prin1. What
are the Observation numbers, as measured by Prin1, for the largest and
smallest lizards in this plot?
plot Y*X='*' $ Obs; in proc plot will put the value of Obs next to each
point in the scatterplot.)
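A sketch of how the labeled Prin2*Prin1 plot might be produced for the lizard data. The dataset names lizards and lizpc are assumptions for illustration; the plot statement follows the form given in the hint, with Prin2 and Prin1 in place of Y and X.

```sas
/* Sketch only: dataset names are assumptions. */
data lizards;
    input Obs Mass SVL HLS;
    datalines;
1 5.526 59.0 113.5
2 10.401 75.0 142.0
3 9.213 69.0 124.0
4 8.953 67.5 125.0
5 7.063 62.0 129.5
6 6.610 62.0 123.0
7 11.273 74.0 140.0
8 2.447 47.0 97.0
9 15.493 86.5 162.0
10 9.004 69.0 126.5
11 8.199 70.5 136.0
12 6.601 64.5 116.0
13 7.622 67.5 135.0
14 10.067 73.0 136.5
15 10.091 73.0 135.5
16 10.888 77.0 139.0
17 7.610 61.5 118.0
18 7.733 66.5 133.5
19 12.015 79.5 150.0
20 10.049 74.0 137.0
21 5.149 59.5 116.0
22 9.158 68.0 123.0
23 12.132 75.0 141.0
24 6.978 66.5 117.0
25 6.890 63.0 117.0
;
run;

/* out= keeps Obs alongside the component scores Prin1-Prin3 */
proc princomp data=lizards out=lizpc;
    var Mass SVL HLS;
run;

/* '*' is the plotting symbol; $ Obs labels each point with Obs */
proc plot data=lizpc;
    plot Prin2*Prin1='*' $ Obs;
run;
quit;
```

Because the sign of a principal component is arbitrary, the largest lizards may appear at either end of the Prin1 axis; checking an extreme point against its Mass in Table 4 settles which end is which.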