(The reason for the (i,ii,iii) order is to have your conclusions
first, then your SAS programs, then the SAS output on which you based
your conclusions. This will make the organization of your homework much
clearer in later assignments, which will have a larger number of more
complex problems.)
The problems:
1. Text page 19, problem #1-1.
2. Text page 19, problem #1-3, and page 64, problem #2-3.
(Do as one problem.)
Problems 3-5 depend on the following data for the 47 current employees of
Vaporlock Computer Services:
Table 1. Height (inches), weight (pounds),
and sex for 47 employees:
67 123 F 67 143 M 69 174 M 64 127 F
61 116 F 70 159 M 71 142 M 66 146 F
61 128 F 59 139 F 65 127 F 69 172 M
64 166 M 63 120 F 69 166 M 67 152 F
62 153 F 60 152 F 66 168 M 66 155 M
71 145 M 64 164 M 72 168 M 64 123 F
64 135 F 68 158 M 63 159 M 71 177 M
65 158 M 63 169 M 60 139 F 71 177 M
65 150 F 63 145 M 62 141 F 64 118 F
64 168 M 66 151 F 68 171 M 63 158 M
63 146 M 68 149 M 66 162 M 68 144 F
61 131 F 72 179 M 62 142 F
3. (i) Enter the data in Table 1 into a SAS program in a
data step with variables for height, weight, and sex. Construct a scatter
plot of heights (Y-variable) by weights (X-variable) using sex as the
plotting symbol.
Do the heights and weights appear to be correlated? (That is, do
taller individuals appear also to be heavier?) Do heights and weights
appear to be correlated within each sex; that is, for Fs only and for Ms
only?
(Hint: It may be easier (and safer) to copy and paste the
data from the Math475 Web site into your program than to enter it by
hand.)
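A minimal sketch of the data step and plot for part (i) might look like the following. The data set name `employees` and the variable names `height`, `weight`, and `sex` are assumptions, and only the first two rows of Table 1 are shown; the full table would be pasted in the same way.

```sas
/* Sketch only: data set and variable names are assumptions. */
data employees;
    /* The trailing @@ lets SAS read several observations per line. */
    input height weight sex $ @@;
    datalines;
67 123 F 67 143 M 69 174 M 64 127 F
61 116 F 70 159 M 71 142 M 66 146 F
;
run;

/* vertical*horizontal=symbol: height on the Y axis, weight on the X axis, */
/* with the first character of sex (F or M) used as the plotting symbol.   */
proc plot data=employees;
    plot height*weight=sex;
run;
quit;
```

The `@@` matters here because Table 1 lists four employees per line; without it SAS would read only the first employee on each line.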
(ii) The company's insurer is interested in the distribution
of the employees over various height and weight categories. The state
insurance commission requires that they use the following codes for
height and weight ranges:
    Height:  1: le 63        Weight:  1: le 119
             2: ge 64                 2: 120 to 137
                                      3: 138 to 170
                                      4: ge 171
where le means "less than or equal to" and ge means
"greater than or equal to".
Use SAS's proc freq to construct tables for (a) heights, (b) weights,
and (c) height by weight (a 2 by 4 table), all in terms of these
height and weight codes. (Hint: Define new
variables htcode=1,2 and wtcode=1,2,3,4 by if-then-else statements in
the data step. See the first program in Section 1C of the text for
an example.)
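One way the coding in part (ii) could be sketched is shown below. The data set name `coded` is an assumption, and only the first two rows of Table 1 are included to keep the sketch short.

```sas
/* Sketch only: the data set name 'coded' is an assumption. */
data coded;
    input height weight sex $ @@;

    /* Height code: 1 if 63 inches or less, 2 if 64 inches or more. */
    if height le 63 then htcode = 1;
    else htcode = 2;

    /* Weight code: each else-if only sees values that failed the tests above it. */
    if weight le 119 then wtcode = 1;
    else if weight le 137 then wtcode = 2;
    else if weight le 170 then wtcode = 3;
    else wtcode = 4;

    datalines;
67 123 F 67 143 M 69 174 M 64 127 F
61 116 F 70 159 M 71 142 M 66 146 F
;
run;

/* One-way tables for each code, then the 2 by 4 cross-tabulation. */
proc freq data=coded;
    tables htcode wtcode htcode*wtcode;
run;
```

Because SAS treats a missing numeric value as smaller than any number, a missing height or weight would fall into code 1 under this logic. Table 1 has no missing values, so that case does not arise here.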
4. For the data in Table 1,
(i) Are the males in this sample significantly taller than the
females? Have SAS conduct a Student's t-test to find out. What is the
P-value?
(ii) In part (i), did you use the classical t-test or the
Satterthwaite t-test? Why? (That is, pick one of the two methods and
justify it.)
(iii) What does Prob>F' = .... mean in the output? What
hypothesis H_0 is SAS testing here? Does SAS accept it or reject it? What
is the P-value?
(iv) Analyze the same data using the Wilcoxon rank-sum test. What
is the P-value? (Use the "chi-square" P-value, not the
continuity-corrected P-value.)
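The two tests in problem 4 could be requested along the following lines. This is a sketch under the assumption that the data sit in a data set called `employees` with variables `height` and `sex`; only the first two rows of Table 1 are included.

```sas
/* Sketch only: data set and variable names are assumptions. */
data employees;
    input height weight sex $ @@;
    datalines;
67 123 F 67 143 M 69 174 M 64 127 F
61 116 F 70 159 M 71 142 M 66 146 F
;
run;

/* Two-sample t-test of height between the two levels of sex.          */
/* The output reports both the pooled and the Satterthwaite versions,  */
/* along with a test for equality of the two variances.                */
proc ttest data=employees;
    class sex;
    var height;
run;

/* Wilcoxon rank-sum test of height between the two levels of sex. */
proc npar1way data=employees wilcoxon;
    class sex;
    var height;
run;
```

The `class` statement names the grouping variable, which must have exactly two levels for proc ttest; `var` names the response being compared.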
5. For the data in Table 1,
(i) Are the height and weight of these employees significantly
correlated, as measured by a Pearson correlation coefficient? What is the
correlation coefficient? (Hint: See the description of proc corr in
Chapter 5 in the text.)
(ii) Are the height and weight of employees significantly correlated
within each sex? Use a Pearson correlation coefficient within each sex.
What are the two correlation coefficients? (Hint: If you add the option
by sex; to SAS's proc corr, then SAS will stratify by sex and run
proc corr within each sex.)
(iii) Why are your answers different in parts (i) and (ii)? Can
you deduce anything from the scatterplot in problem 3?
(iv) What statistical test did SAS perform to find the P-value of the
correlations? What standard statistical distribution is this test based
on? Does this test assume that the data are normally distributed?
(NOTE for part (ii): In general if you say "by var;" in SAS, then
SAS stratifies by contiguous groups with the same value of var. Thus
to compute within-sex correlations, you must first sort the data by
sex, so that all Fs occur together and all Ms occur together. In
contrast, "class var;" in SAS will usually let you stratify by values
of var without sorting, but proc corr does not currently support
"class var;". Make sure that you get valid output.)
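The sort-then-stratify pattern described in the note can be sketched as follows. The data set name `employees` is an assumption, and only the first two rows of Table 1 are included.

```sas
/* Sketch only: data set and variable names are assumptions. */
data employees;
    input height weight sex $ @@;
    datalines;
67 123 F 67 143 M 69 174 M 64 127 F
61 116 F 70 159 M 71 142 M 66 146 F
;
run;

/* Overall Pearson correlation, all employees together. */
proc corr data=employees pearson;
    var height weight;
run;

/* BY-group processing needs the observations grouped by sex first. */
proc sort data=employees;
    by sex;
run;

/* Pearson correlation computed separately within each sex. */
proc corr data=employees pearson;
    var height weight;
    by sex;
run;
```

If the proc sort step is left out, SAS normally stops with an error in the log saying the BY variables are not properly sorted, so checking the log is one way to confirm that the stratified output is valid.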