HOMEWORK #1 due Thursday 9-16
Text references are to the textbook, Cody & Smith, ``Applied statistics and the SAS programming language''
NOTE: See the main Math475 Web page for how to organize a homework
assignment using SAS. In particular,
ALWAYS INCLUDE YOUR NAME in a title statement in your SAS
programs, so that your name will appear at the top of each output page.
ALL HOMEWORKS MUST BE ORGANIZED in the following order:
(Part 1) First, your answers to all the problems in the homework,
whether you use SAS for that problem or not. If the problem asks you to
generate a graph or table, refer to the graph or table by page number in
the SAS output (see below). (Xeroxing a page or two from the SAS output or
cutting and pasting into a Word file or TeX source file is also OK.)
(Part 2) Second, all SAS programs that you used to obtain the output for
any of the problems. If possible, similar problems should be done with the
same SAS program. (In other words, write one SAS program for several
problems if that makes things easier, using SAS title or
title2 statements to separate the problems in your output.)
(Part 3) Third, all output for all the SAS programs in the previous
step.
If an answer in Part 1 requires a table or a scatterplot that you need to
refer to, make sure that your SAS output has overall increasing (unique)
page numbers and make references to Part 3 by page number, such as
``The scatterplot for Problem 2 part (b) is on page #X in
the SAS output below.'' DO NOT say, ``see Page 3 in the SAS output''
if Part 3 has output from several SAS runs, each of which has its own
Page 3. In that case, either write your own (increasing) page numbers
on the SAS output, or else (for example) refer to ``Page 2-7 in the
SAS output'' (for page 7 in the second set of SAS output) and write
page numbers in the format ``2-7'' at the top of pages in your output.
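The use of title and title2 statements described under Part 2 can be sketched as follows. This is only an illustration: the name in the title, the data set demo, and its variables x and y are placeholders, not part of the assignment.

```sas
/* Sketch only: the name, data set, and variables are placeholders. */
title 'Your Name - Math475 Homework 1';   /* appears at the top of every page */

data demo;
   input x y @@;        /* @@ reads several observations from one data line */
   datalines;
1 2  2 4  3 5
;
run;

title2 'Problem 1';     /* second title line marks where Problem 1 begins */
proc print data=demo;
run;

title2 'Problem 2';     /* replaces the previous title2 for later output */
proc means data=demo;
run;
```

Because title stays in effect until it is changed, your name is repeated on each page, while each new title2 separates the problems in the output.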
1. (Similar to Problem #3-7 in the text) Some summary statistics for the occurrence of asthma and SES (socioeconomic class) are
                   Asthma
                 Yes     No
   -------------------------
   LowSES         39    101
   HighSES        29    137
Create a SAS data set from these data and test the hypothesis of independence of rows and columns. For this table, what is the Pearson chi-square P-value? The P-value for the two-sided Fisher exact test? On the basis of these data, do you accept or reject the hypothesis that there is no association between SES and Asthma?
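One possible way to enter a table of counts like this one is sketched below. The data set name asthma and the variable names ses, asthma, and count are assumptions made for illustration. The sketch relies on the weight statement in proc freq to treat each data line as a cell count, and on the chisq option, which for a 2x2 table also prints Fisher's exact test.

```sas
/* Sketch only: data set and variable names are placeholders. */
title 'Your Name - Problem 1';

data asthma;
   input ses $ asthma $ count;   /* one line per cell of the 2x2 table */
   datalines;
LowSES  Yes  39
LowSES  No  101
HighSES Yes  29
HighSES No  137
;
run;

proc freq data=asthma order=data;   /* order=data keeps rows as entered */
   weight count;                    /* count is a cell frequency */
   tables ses*asthma / chisq;       /* Pearson chi-square and Fisher exact */
run;
```

Reading the P-values from the output and interpreting them is still up to you.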
2. Twenty-five (25) individuals volunteered for a study. Confidential identifiers for the 25 individuals are given in the following table.
Table 1. Confidential identifiers for 25 volunteers
   A10 B33 C22 D61 E88
   F91 G21 H42 I37 J19
   K90 L30 M98 N48 O11
   P77 Q07 R54 S18 T31
   U45 V11 W71 X76 Y32
(i) Write a SAS program to randomly assign 8 of these individuals to a treatment group and the remaining 17 individuals to a control group. (Hint: See the sample program randsamp.sas on the Math475 Web site, and perhaps the use of @@ as in the sample program ctable.sas.)
(ii) Which 8 individuals did you (or your program) assign to the treatment group? List their (confidential) identifiers in alphabetical order.
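A random assignment of this kind can be sketched as follows. This is not the approach of randsamp.sas, whose contents are not shown here; it is one common technique, in which each identifier gets a uniform random number and the 8 smallest are taken as the treatment group. The data set names, the variable names, and the seed 20475 are all arbitrary placeholders.

```sas
/* Sketch only: names and the seed are placeholders. */
title 'Your Name - Problem 2';

data volunteers;
   input id $ @@;          /* @@ reads several identifiers per data line */
   u = ranuni(20475);      /* uniform(0,1) random number; seed is arbitrary */
   datalines;
A10 B33 C22 D61 E88 F91 G21 H42 I37 J19 K90 L30 M98
N48 O11 P77 Q07 R54 S18 T31 U45 V11 W71 X76 Y32
;
run;

proc sort data=volunteers;   /* random order: sort by the random numbers */
   by u;
run;

data assigned;
   set volunteers;
   length group $ 9;
   if _n_ <= 8 then group = 'Treatment';   /* first 8 in random order */
   else group = 'Control';                 /* remaining 17 */
run;

proc sort data=assigned;     /* alphabetical order within each group */
   by group id;
run;

proc print data=assigned;
   var group id;
run;
```

With a fixed positive seed the same assignment is reproduced each time the program runs; a different seed gives a different assignment.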
3. The following data was gathered for the 47 current employees of Vaporlock Computer Services. The individuals working at this company are considered to be odd in some respects.
Table 2. Height (inches), Weight (pounds), and Sex (Gender) for 47 employees:
   67 123 F   67 143 M   69 174 M   64 127 F
   61 116 F   70 159 M   71 142 M   66 146 F
   61 128 F   59 139 F   65 127 F   69 172 M
   64 166 M   63 120 F   69 166 M   67 152 F
   62 153 F   60 152 F   66 168 M   66 155 M
   71 145 M   64 164 M   72 168 M   64 123 F
   64 135 F   68 158 M   63 159 M   71 177 M
   65 158 M   63 169 M   60 139 F   71 177 M
   65 150 F   63 145 M   62 141 F   64 118 F
   64 168 M   66 151 F   68 171 M   63 158 M
   63 146 M   68 149 M   66 162 M   68 144 F
   61 131 F   72 179 M   62 142 F
(i) Construct a scatter plot of heights (Y-variable) by weights (X-variable) using sex as the plotting symbol. Do the heights and weights appear to be correlated? (That is, do taller individuals appear also to be heavier?) Do heights and weights appear to be correlated within each sex; that is, for Fs only or for Ms only?
(Hint: See proc plot; in the sample program List1.sas on the Math475
Web site. The scatter plot will look better if you precede the code with
the statement options ps=40; as in List1.sas.)
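A scatter plot with sex as the plotting symbol can be sketched as follows. This is not taken from List1.sas; the data set name employees and the variable names are assumptions, and only the first six observations of Table 2 are entered to keep the sketch short.

```sas
/* Sketch only: names are placeholders; data are the first 6 rows of Table 2. */
options ps=40;                      /* shorter pages give a more compact plot */
title 'Your Name - Problem 3(i)';

data employees;
   input height weight sex $ @@;    /* three values per employee */
   datalines;
67 123 F  67 143 M  69 174 M  64 127 F  61 116 F  70 159 M
;
run;

proc plot data=employees;
   plot height*weight = sex;        /* Y*X, plotted with the value of sex */
run;
```

In the plot statement the vertical variable comes first, and the variable after the equals sign supplies the plotting symbol (here F or M).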
(ii) Are heights and weights significantly correlated for the 47
employees? What is the P-value? Are heights and weights significantly
correlated within each sex? What are the two P-values?
(Hint: You can use proc corr; to find the Pearson correlation coefficient
between two SAS variables as well as the P-value for H_0:rho=0, where rho
is the true correlation coefficient. Use proc corr; by sex; ....; to find
the same information within strata defined by sex. (WARNING: If you use
proc corr; with ``by sex;'' the data set MUST FIRST BE SORTED by sex.
See the example program randsamps.sas on the Math475 Web site for
an example of sorting.) See either Chapter 5 in the text or the SAS
documentation for proc corr;. The P-values in the SAS output
are based on the fact that sqrt(n-2) r/sqrt(1-r^2) has a
Student-t distribution with n-2 degrees of freedom if r is the Pearson
correlation coefficient between two independent normal samples of size n.)
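The sort-then-correlate pattern in the warning above can be sketched as follows. The data set name employees and the variable names are assumptions, and only the first six observations of Table 2 are entered for brevity, so the output of this sketch does not answer the problem.

```sas
/* Sketch only: names are placeholders; data are the first 6 rows of Table 2. */
title 'Your Name - Problem 3(ii)';

data employees;
   input height weight sex $ @@;
   datalines;
67 123 F  67 143 M  69 174 M  64 127 F  61 116 F  70 159 M
;
run;

title2 'All employees';
proc corr data=employees;     /* Pearson r and P-value for H_0: rho=0 */
   var height weight;
run;

proc sort data=employees;     /* required before any by sex; statement */
   by sex;
run;

title2 'Within each sex';
proc corr data=employees;
   var height weight;
   by sex;                    /* separate correlation for F and for M */
run;
```

Omitting the proc sort step would cause the by statement to fail with an error whenever the data are not already grouped by sex.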
4. A total of 2000 observations are made of individuals that can have any of three different levels of Zubricity (A,B,C) and any of four different levels of Income. The counts are
                          Income
                    1      2      3      4
               A   66     98    127    180
   Zubricity   B  111    136    170    228
               C  168    193    240    283
(i) Is there an association between Zubricity and Income in this table? Have SAS do the Pearson chi-square test to find out. What is the number of degrees of freedom? What is the P-value? How did SAS calculate the number of degrees of freedom?
(ii) What is the P-value for the Mantel-Haenszel (trend) chi-square test? What is its number of degrees of freedom? Why is the P-value different from part (i)? What is this test designed to detect? That is, what alternative H_1 is the test sensitive to?
5. Observations are made of individuals that can have any of three different levels of Ablativeness (A,B,C) and five different ranges of height, which are referred to as the Height Index. The counts are
                          Height Index
                     1     2     3     4     5
                  --------------------------------
                  A  29    27    25    39    24
   Ablativeness   B  20    20    21    21    22
                  C  27    21    34    10    11
(i) Is there an association between Ablativeness and Height Index? Have SAS do the Pearson chi-square test on the 3 by 5 table to find out. What is the P-value? What is the number of degrees of freedom? How did SAS calculate the number of degrees of freedom?
(ii) If the P-value for the Pearson chi-square test in part (i) is
significant, is the significance due to deviations from independence among
all 15 cells in the table, or does the departure from independence appear
to be due to deviations at only two or three cells? If the latter, which
cells appear to be responsible for the lack of independence?
(Hint: The Pearson chi-square statistic Q_P is the sum of
(Obs-Expec)^2/Expec over 3x5=15 cells. If Q_P is large, it could be
because all 15 summands are large, or it could be due to a few large terms
with the remainder less than 2-3 or so. Note that if the underlying
probabilities are consistent with independence of Ablativeness and
Height Index, then each of the summands (Obs-Expec)^2/Expec should
have a distribution that is approximately chi-square with one degree of
freedom.
To have SAS display the values of (Obs-Expec)^2/Expec for each cell, use
the option cellchi2 in a table statement in proc freq
(as in table A*B / chisq cellchi2;).)
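Entering a 3 by 5 table of counts and requesting the cell contributions can be sketched as follows. The data set name ablat and the variable names ablat, hindex, and count are assumptions made for illustration; the nested do loops are just one way to attach row and column labels to counts typed in table order.

```sas
/* Sketch only: data set and variable names are placeholders. */
title 'Your Name - Problem 5';

data ablat;
   do ablat = 'A', 'B', 'C';        /* rows of the table */
      do hindex = 1 to 5;           /* columns of the table */
         input count @@;            /* next count in reading order */
         output;                    /* one observation per cell */
      end;
   end;
   datalines;
29 27 25 39 24
20 20 21 21 22
27 21 34 10 11
;
run;

proc freq data=ablat;
   weight count;                            /* count is a cell frequency */
   tables ablat*hindex / chisq cellchi2;    /* tests plus per-cell terms */
run;
```

The same layout, with four columns instead of five, would work for the Zubricity by Income table in Problem 4; the chisq output also includes the Mantel-Haenszel chi-square.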