HOMEWORK #1 due Tuesday Jan 27
Text references are to Hollander and Wolfe,
``Nonparametric Statistical Methods'', 2nd ed.
NOTE: In the following, ^ means superscript, _ (underscore) means
subscript, and Sum(i=1,9) means the sum for i=1 to 9.
IN THE FOLLOWING: Do Problems 1-4 by hand. Problem 5 asks you to
write a computer program.
1. See Table 3.7, p71 text, for data about bleeding times of
14 individuals before (X_i) and after (Y_i) taking 600mg of aspirin.
(i) Use a Student-t-test to test the hypothesis that there is no
difference in the before and after bleeding times. Is the resulting
P-value significant (P<0.05)? highly significant (P<0.01)?
(ii) Use the sign test to test the same hypothesis. Obtain two-sided
P-values using (a) the exact distribution of the binomial in
Table A2 and (b) the normal approximation. Is the resulting P-value
significant (P<0.05)? highly significant (P<0.01)? Is it the same
as in part (i)?
(iii) Use the Wilcoxon signed rank test to test the same hypothesis.
Obtain two-sided P-values using (a) the exact distribution of the signed
rank statistic in Table A4 and (b) the normal approximation. Is the
resulting P-value significant (P<0.05)? highly significant
(P<0.01)? Is it the same as in parts (i) and (ii)?
(iv) Which of the three tests would you consider most reasonable?
Why?
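
As a check on hand calculations for the sign test, the two-sided P-values in part (ii) can be sketched in Python as follows. This is only an illustrative sketch: the function name sign_test and the data are made up (they are not the values in Table 3.7), zero differences are assumed to be discarded, and the normal approximation is assumed to be used without a continuity correction.

```python
from math import comb, erfc, sqrt

def sign_test(diffs):
    """Two-sided sign test of H0: the median difference is zero."""
    nonzero = [d for d in diffs if d != 0]      # assumption: zero differences are discarded
    n = len(nonzero)
    b = sum(1 for d in nonzero if d > 0)        # B = number of positive differences
    # Exact P-value: under H0, B is binomial(n, 1/2); double the smaller tail.
    lower = sum(comb(n, k) for k in range(b + 1)) / 2 ** n
    upper = sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n
    p_exact = min(1.0, 2 * min(lower, upper))
    # Normal approximation: B* = (B - n/2) / sqrt(n/4), no continuity correction.
    b_star = (b - n / 2) / sqrt(n / 4)
    p_normal = erfc(abs(b_star) / sqrt(2))      # equals 2 * (1 - Phi(|B*|))
    return n, b, p_exact, p_normal

# Made-up differences Y_i - X_i, for illustration only (not from Table 3.7).
diffs = [1.2, -0.4, 0.8, 2.1, 0.0, 1.5, 0.3, -0.2, 0.9, 1.1]
n, b, p_exact, p_normal = sign_test(diffs)
print(n, b, round(p_exact, 4), round(p_normal, 4))
```

With small n the exact and approximate P-values can differ noticeably, which is the point of comparing (a) and (b).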
2. See Table 3.11, p83 of text, for the levels of
6-beta-hydrocortisol excreted in one 24-hour day by 10 chemical company
workers.
(i) Use the sign test to test the hypothesis that the median amounts
per day can be distinguished from 175 micrograms. Obtain two-sided
P-values using (a) the exact distribution of the binomial in Table A2
and (b) the normal approximation. (Hint: Subtract 175 from each of the
observations and see if the differences can be distinguished from zero.)
(ii) Find the nonparametric estimate of the median amount per day
using the Hodges-Lehmann sign-test estimator described in
Section 3.5.
(iii) A company executive states that while about as many values in
the original data are larger than 175 micrograms/day as are smaller, the
differences from 175 micrograms/day seem to be larger for the positive
values. As an alternative,
find the nonparametric Hodges-Lehmann estimator of the median based on
the Wilcoxon signed-rank test described in Section 3.2. Is the resulting
estimator larger than in part (ii)?
(iv) Which is largest among the sample mean of the values Z_i-175, the
sample median, or the Hodges-Lehmann Wilcoxon signed-rank test estimator?
Which would you trust the most?
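
The two Hodges-Lehmann estimators in parts (ii) and (iii) can be sketched in Python as follows, assuming the usual definitions: the sign-test estimator is the sample median, and the signed-rank estimator is the median of the Walsh averages (Z_i+Z_j)/2 for i<=j. The function names and the data are hypothetical and are not the values in Table 3.11.

```python
from statistics import median

def hl_sign(z):
    """Hodges-Lehmann estimator associated with the sign test: the sample median."""
    return median(z)

def hl_signed_rank(z):
    """Hodges-Lehmann estimator associated with the Wilcoxon signed-rank test:
    the median of the n(n+1)/2 Walsh averages (Z_i + Z_j)/2 with i <= j."""
    n = len(z)
    walsh = [(z[i] + z[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

# Made-up sample in which the positive values lie farther from zero
# than the negative ones (illustration only).
z = [-1, -2, 3, 8, 10]
print(hl_sign(z), hl_signed_rank(z), sum(z) / len(z))
```

In this made-up sample the signed-rank estimator falls between the sample median and the sample mean, since the Walsh averages are pulled toward the larger positive values.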
3. It is conjectured that tropical plants of a certain
genus tend to produce more flowers at higher altitudes than at lower
altitudes. Fifteen species in this genus are known to occur at both
altitudes in a particular country. To test the conjecture, one plant
from a lowland forest and one plant from higher altitudes were collected
from each of twelve species from this genus. The number of flowers on
each plant was counted, and the results were:
Species  LowAlt  HighAlt     Species  LowAlt  HighAlt
   1        5       19          7        3       17
   2        4       10          8        4       14
   3       12        4          9        6        3
   4        7       10         10       15        3
   5       17       17         11        9        9
   6        4       12         12        7       10
(i) Use the Wilcoxon signed rank test to test whether or not plants
from higher altitude tend to have more (or fewer) flowers than plants from
lower altitudes. What is the value of the Wilcoxon statistic T^+? What is
the associated (two-sided) P-value? Use both (a) the tables and (b) the
normal approximation. (Be sure to handle ties correctly, in particular in
the estimate of the variance. Make sure that you have no records with
Z_i=0. Recall that ties between nonzero absolute values are ignored when
using the table.)
(ii) Even though the data is from 24 different plants, why would it
be incorrect to assume that the plants from the lowlands and the plants
from higher altitude form two independent samples?
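
The handling of zeros and ties described in part (i) can be sketched in Python as follows. This is a sketch under stated assumptions, not a prescribed method: records with a zero difference are dropped, tied absolute values receive average ranks (midranks), and the null variance of T^+ is reduced by (1/48)Sum t_j(t_j-1)(t_j+1) over the tied groups. The function name and the data are hypothetical and are not the flower counts above.

```python
from collections import Counter
from math import erfc, sqrt

def signed_rank_normal(x, y):
    """Wilcoxon signed-rank statistic T+ with midranks and a tie-corrected
    normal approximation for the two-sided P-value."""
    z = [b - a for a, b in zip(x, y) if b != a]   # drop records with Z_i = 0
    n = len(z)
    counts = Counter(abs(d) for d in z)           # sizes t_j of the tied groups
    rank, start = {}, 0
    for value in sorted(counts):
        t = counts[value]
        rank[value] = start + (t + 1) / 2         # average of ranks start+1, ..., start+t
        start += t
    t_plus = sum(rank[abs(d)] for d in z if d > 0)
    mean = n * (n + 1) / 4
    tie_term = sum(t * (t - 1) * (t + 1) for t in counts.values()) / 2
    var = (n * (n + 1) * (2 * n + 1) - tie_term) / 24
    z_stat = (t_plus - mean) / sqrt(var)
    p_two_sided = erfc(abs(z_stat) / sqrt(2))     # equals 2 * (1 - Phi(|z_stat|))
    return n, t_plus, z_stat, p_two_sided

# Made-up paired counts, for illustration only.
low = [5, 4, 9, 7, 6, 3]
high = [8, 4, 7, 12, 9, 5]
n, t_plus, z_stat, p = signed_rank_normal(low, high)
print(n, t_plus, round(z_stat, 3), round(p, 3))
```

Note that n here is the number of nonzero differences, not the number of pairs collected.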
4. Change the value of X_3 in Table 3.1 on p39 of the text
(Hamilton Depression Scale Factor values) from 1.62 to 16.2.
What effect does this have on the value of Zbar=(1/9)Sum(i=1,9) Z_i
for Z_i=Y_i-X_i? (That is, compare the values of Zbar before and after
the change.) What effect does this have on the value of the
Hodges-Lehmann estimator thetahat based on the Wilcoxon signed-rank
statistic? (See Example 3.3 on page 52.) Which estimator seems
to be more strongly affected by outliers?
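
The kind of comparison asked for here can be sketched in Python as follows, using a misplaced decimal point in one X value. The data are made up (they are not the values in Table 3.1), the names zbar and thetahat are chosen only to match the notation above, and thetahat is assumed to be the median of the Walsh averages.

```python
from statistics import median

def zbar(z):
    """Sample mean of the differences."""
    return sum(z) / len(z)

def thetahat(z):
    """Hodges-Lehmann estimate: median of the Walsh averages (Z_i + Z_j)/2, i <= j."""
    n = len(z)
    return median((z[i] + z[j]) / 2 for i in range(n) for j in range(i, n))

# Made-up before (x) and after (y) values, for illustration only.
x = [2.0, 3.0, 1.5, 4.0, 2.5]
y = [1.5, 2.0, 1.0, 3.5, 1.5]
x_bad = list(x)
x_bad[2] = 15.0                      # decimal point shifted: 1.5 becomes 15.0

z = [b - a for a, b in zip(x, y)]
z_bad = [b - a for a, b in zip(x_bad, y)]
print(zbar(z), zbar(z_bad))          # mean before and after the change
print(thetahat(z), thetahat(z_bad))  # Hodges-Lehmann estimate before and after
```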
5. Write a short MATLAB program that uses the salary data
in Table 3.2 (p41) of the text to do the following. Attach a hard copy of
the output of your program to your homework.
(i) Displays the data in Table 3.2 as a matrix with Pair number in the
first column, Private salaries in the second column, and Government
salaries in the third column.
(ii) Computes the 12 salary differences and stores them as the fourth
column Z_i of the matrix in part (i) and displays the matrix.
(iii) For the 12 salary differences Z_i, computes and displays
(a) the sample mean Zbar of the fourth column Z_i, (b) the
sample standard deviation ss, and (c) the one-sample
t-statistic T.
(iv) If the salary differences Z_i are normally distributed
with unknown mean mu and variance sigma^2, then, given H_0:mu=0, the
statistic T in part (iii) has a Student-t distribution with 11
degrees of freedom. Find the two-sided P-value for H_0:mu=0 based on a
one-sample t-test. (Hint: See comments in the function file pvtdist.m on
the Math408 Web site.)
(v) The text on page 41 reports a two-sided P-value of P=0.078
using the Wilcoxon signed-rank test. How does this compare with the value
that you computed in part (iv)?
(Hint: See sample MATLAB programs on the Math408 Web site.)
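
For reference, the arithmetic in parts (iii) and (iv) can be sketched in Python as follows. This is not the MATLAB program the problem asks for and makes no claim about how pvtdist.m works. The function names and data are hypothetical, and the P-value is assumed to be obtained by integrating the Student-t density numerically with Simpson's rule.

```python
from math import gamma, pi, sqrt

def t_statistic(z):
    """Return (Zbar, s, T) for the one-sample t-test of H_0: mu = 0."""
    n = len(z)
    zbar = sum(z) / n
    s = sqrt(sum((v - zbar) ** 2 for v in z) / (n - 1))   # sample standard deviation
    return zbar, s, zbar / (s / sqrt(n))

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value P(|T_df| >= |t|), using Simpson's rule on the t density.
    steps must be even."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    density = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    total = density(0) + density(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    central = total * h / 3            # P(0 <= T_df <= |t|)
    return 1 - 2 * central

# Made-up differences, for illustration only (not the Table 3.2 salaries).
z = [1, 2, 3, 4, 5]
zbar, s, T = t_statistic(z)
print(zbar, round(s, 4), round(T, 4), round(t_two_sided_p(T, len(z) - 1), 4))
```

As a sanity check, t_two_sided_p(2.201, 11) should come out close to 0.05, since 2.201 is the tabulated two-sided 5% critical value for 11 degrees of freedom.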