TAKEHOME FINAL due Wednesday May 6 by 4:30 PM
Text references are to Hollander and Wolfe, "Nonparametric Statistical Methods", 2nd edn.
NOTES: Hand in your homework in the following order:
(a) Your written answers to all problems, with page references as
needed to part (c) below,
(b) The computer source for any computer programs that you used,
(c) All output from the programs in part (b).
This puts the emphasis on what you think the answers should be and on
your evidence for them. If a reader finds your answers reasonable, he or
she may or may not want to look at your actual output and computer
programs.
Six (6) problems. Not all parts of problems are of equal weight.
1. Failure times Y along with a predictor X were recorded in the following table for 40 trials.
Table 1: Failure times Y and a predictor X
----------------------------------
       X     Y             X     Y
     ---------           ---------
 1.   30     3      21.   40   168
 2.   21     5      22.   54   170
 3.   41     5      23.   27   180
 4.   83     6      24.   77   197
 5.   76     9      25.   90   217
 6.   89    17      26.   97   235
 7.   35    22      27.   93   250
 8.   78    23      28.   80   354
 9.   39    27      29.   73   368
10.   38    31      30.   67   441
11.   57    31      31.   72   486
12.   98    38      32.   88   622
13.   64    40      33.   94   642
14.   34    42      34.   84   659
15.   55    56      35.   86   773
16.   44    62      36.   60   850
17.   22    64      37.   99   902
18.   56    66      38.   46  1090
19.   74   142      39.   66  1658
20.   43   159      40.   75  4032
(Hint: See the program NonParmCorr on the Math408 Web site.)
2. Responses to five stress conditions were measured for four different brands of a particular product. Three measurements were made for each combination of brand and stress, for a total of 5x4x3=60 observations (see Table 2).
Table 2: Responses under Stress for Four Brands of Products
----------------------------------------------------------------------
Stress  Brand1          Brand2          Brand3          Brand4
A:      3.01,3.04,3.03  3.47,3.10,3.37  3.85,3.87,3.47  3.41,3.11,3.09
B:      2.85,2.51,2.45  3.49,3.45,3.23  3.64,3.19,3.21  3.02,3.33,3.53
C:      2.62,2.60,2.67  3.11,2.88,2.97  3.52,3.49,3.44  3.08,3.11,3.06
D:      2.63,2.64,2.51  2.83,3.15,2.81  3.21,3.65,3.22  2.96,2.97,3.11
E:      2.58,2.60,2.62  3.12,2.71,2.66  3.28,3.25,3.25  2.67,3.12,3.22
3. Consider the paired (X,Y) data in Table 1.
(i) Find the coefficients beta and mu in the least-squares
regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0,
assuming that the data (X_i,Y_i) are normal, using Student-t methods?
(ii) Find the coefficients beta and mu in the regression line
Y_i=beta*X_i+mu using Theil's nonparametric procedure. Given beta from
Theil's method, estimate the intercept mu as the median of the 820
(=40*41/2) Walsh averages of the 40 residuals. Find the P-value for
H_0:beta=0 using the large-sample approximation described in Section 9.1.
(iii) Compare the two regression lines in parts (i)
and (ii) by computing
(A) the average absolute error, which is S_1/n for S_1=Sum(i=1,n)
|Y_i-beta*X_i-mu| and
(B) the RMS error, which is the square root of S_2/n for S_2=Sum(i=1,n)
(Y_i-beta*X_i-mu)^2
Which of the two regression lines does better under criterion (A)?
Under criterion (B)?
(iv) Find the coefficients beta and mu using the rank regression
method discussed in Section 9.6 in the text. Given the slope beta,
estimate the intercept mu as the median of the 820 Walsh averages of the
40 residuals.
(v) Compare the regression line in part (iv) with the two
regression lines in part (iii). How does it compare using
criterion (A)? Using criterion (B)?
(Hint: See the programs RankRegression, TheilRegression, and NonParmCorr
on the Math408 Web site.)
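The computations in parts (i)-(iv) of Problem 3 can be sketched in Python using only the standard library. This is a sketch under stated assumptions, not the course programs RankRegression, TheilRegression, or NonParmCorr, and the helper names (least_squares, theil, rank_regression, and so on) are hypothetical. The Student-t P-value is obtained by integrating the t density numerically with Simpson's rule rather than from a table or statistics package. The large-sample Theil test uses the Kendall-type statistic C with null variance n(n-1)(2n+5)/18 and no correction for the tied Y values. For part (iv) it ASSUMES Jaeckel's rank regression with Wilcoxon scores; check that choice against Section 9.6 before relying on it.

```python
import math
from statistics import NormalDist, median

# Table 1: predictor X and failure time Y for observations 1-40.
X = [30, 21, 41, 83, 76, 89, 35, 78, 39, 38, 57, 98, 64, 34, 55, 44, 22, 56, 74, 43,
     40, 54, 27, 77, 90, 97, 93, 80, 73, 67, 72, 88, 94, 84, 86, 60, 99, 46, 66, 75]
Y = [3, 5, 5, 6, 9, 17, 22, 23, 27, 31, 31, 38, 40, 42, 56, 62, 64, 66, 142, 159,
     168, 170, 180, 197, 217, 235, 250, 354, 368, 441, 486, 622, 642, 659, 773,
     850, 902, 1090, 1658, 4032]


def t_two_sided_p(t, df, steps=4000):
    """Two-sided Student-t P-value via Simpson's rule on the t density over [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    dens = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = dens(0) + dens(a) + sum((4 if k % 2 else 2) * dens(k * h) for k in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)


def least_squares(x, y):
    """Part (i): LS slope and intercept, plus the t-test P-value for H_0: beta = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    mu = my - beta * mx
    sse = sum((b - beta * a - mu) ** 2 for a, b in zip(x, y))
    se_beta = math.sqrt(sse / (n - 2) / sxx)
    return beta, mu, t_two_sided_p(beta / se_beta, n - 2)


def walsh_median(r):
    """Median of the n(n+1)/2 Walsh averages (r_i + r_j)/2, i <= j (820 when n = 40)."""
    n = len(r)
    return median((r[i] + r[j]) / 2 for i in range(n) for j in range(i, n))


def pair_slopes(x, y):
    """(slope S_ij, weight |x_j - x_i|) for every pair i < j with x_i != x_j."""
    n = len(x)
    return [((y[j] - y[i]) / (x[j] - x[i]), abs(x[j] - x[i]))
            for i in range(n) for j in range(i + 1, n) if x[i] != x[j]]


def theil(x, y):
    """Part (ii): slope = median of pairwise slopes; intercept from the Walsh averages."""
    beta = median(s for s, _ in pair_slopes(x, y))
    return beta, walsh_median([b - beta * a for a, b in zip(x, y)])


def theil_test_p(x, y, beta0=0.0):
    """Large-sample two-sided P-value for H_0: beta = beta0 (no tie correction)."""
    n = len(x)
    d = [b - beta0 * a for a, b in zip(x, y)]
    sgn = lambda v: (v > 0) - (v < 0)
    c = sum(sgn(x[j] - x[i]) * sgn(d[j] - d[i]) for i in range(n) for j in range(i + 1, n))
    z = c / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def weighted_median(pairs):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(pairs)
    half, acc = sum(w for _, w in pairs) / 2, 0.0
    for s, w in pairs:
        acc += w
        if acc >= half:
            return s


def rank_regression(x, y):
    """ASSUMED part (iv) method: Wilcoxon-score (Jaeckel) rank regression.
    Its slope is the |x_j - x_i|-weighted median of the pairwise slopes."""
    beta = weighted_median(pair_slopes(x, y))
    return beta, walsh_median([b - beta * a for a, b in zip(x, y)])


def criteria(x, y, beta, mu):
    """Criterion (A) = S_1/n and criterion (B) = sqrt(S_2/n)."""
    e = [b - beta * a - mu for a, b in zip(x, y)]
    return sum(abs(v) for v in e) / len(e), math.sqrt(sum(v * v for v in e) / len(e))


if __name__ == "__main__":
    b, m, p = least_squares(X, Y)
    print(f"LS:    beta={b:.3f} mu={m:.2f} P={p:.3g} (A, B)={criteria(X, Y, b, m)}")
    b, m = theil(X, Y)
    print(f"Theil: beta={b:.3f} mu={m:.2f} P={theil_test_p(X, Y):.3g} (A, B)={criteria(X, Y, b, m)}")
    b, m = rank_regression(X, Y)
    print(f"Rank:  beta={b:.3f} mu={m:.2f} (A, B)={criteria(X, Y, b, m)}")
```

The weighted-median shortcut in rank_regression rests on a standard identity: with Wilcoxon scores, Jaeckel's dispersion is proportional to Sum(i<j) |e_i - e_j|, and |e_j - e_i| = |X_j - X_i| * |S_ij - beta| for the residuals e_i = Y_i - beta*X_i. Minimizing over beta therefore gives a weighted median of the S_ij. The dispersion does not determine the intercept, which is why it is estimated separately from the Walsh averages, as part (iv) asks.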
4. Consider the paired (X,Y) data in Table 1.
5. The following 100 observations of a random variable X were recorded, rounded to the nearest integer:
Table 3: Observations of a random quantity
------------------------------------------
62 36 65 28 25 80 30 51 84 17
78 29 41 65 29 25 36 28 88 23
61 36 36 41 24 83 77 24 27 71
63 50 81 60 24 64 33 29 48 30
28 68 48 23 41 20 37 74 50 27
30 36 74 25 21 19 35 69 70 40
28 57 63 24 68 73 42 76 72 60
30 60 59 28 65 69 65 37 66 32
58 67 30 39 34 75 56 78 75 73
66 75 31 66 19 84 37 82 74 61
Use these observations to estimate the density of X with a kernel density estimator based on the standard Gaussian kernel. Calculate and plot the estimated density of X using bandwidths h=1, 4, 7, and 20. Assuming that the density that generated X was smooth to begin with, which bandwidth seems to give the most reasonable estimate of the density of X? Or would you prefer a bandwidth other than these four choices? (If so, which?) (Hint: See the program DensEst.m on the Math408 Web site. Note that DensEst.m does not use any of the Y values in its dataset.)
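A minimal sketch of the Gaussian kernel density estimate for Problem 5, in plain Python with no plotting library. It evaluates f_h(t) = (1/(n h)) Sum(i=1,n) phi((t - x_i)/h), with phi the standard normal density, at t = 1, ..., 100 for each requested bandwidth. The function names and the local-maximum count (a crude gauge of how wiggly each estimate is) are illustrative additions and are not taken from DensEst.m.

```python
import math

# Table 3: the 100 observations of X (rounded to the nearest integer).
DATA = [62, 36, 65, 28, 25, 80, 30, 51, 84, 17,
        78, 29, 41, 65, 29, 25, 36, 28, 88, 23,
        61, 36, 36, 41, 24, 83, 77, 24, 27, 71,
        63, 50, 81, 60, 24, 64, 33, 29, 48, 30,
        28, 68, 48, 23, 41, 20, 37, 74, 50, 27,
        30, 36, 74, 25, 21, 19, 35, 69, 70, 40,
        28, 57, 63, 24, 68, 73, 42, 76, 72, 60,
        30, 60, 59, 28, 65, 69, 65, 37, 66, 32,
        58, 67, 30, 39, 34, 75, 56, 78, 75, 73,
        66, 75, 31, 66, 19, 84, 37, 82, 74, 61]


def gaussian_kde(data, h, grid):
    """f_h(t) = (1 / (n h)) * sum_i phi((t - x_i) / h), phi = standard normal density."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in data) for t in grid]


def count_modes(f):
    """Number of interior strict local maxima of the evaluated curve."""
    return sum(1 for k in range(1, len(f) - 1) if f[k - 1] < f[k] > f[k + 1])


if __name__ == "__main__":
    grid = list(range(1, 101))  # evaluate at t = 1, ..., 100
    for h in (1, 4, 7, 20):
        f = gaussian_kde(DATA, h, grid)
        print(f"h={h:2d}: local maxima={count_modes(f)}, sum over grid={sum(f):.3f}")
```

To draw the curves, each list f can be passed with grid to a plotting routine such as matplotlib's pyplot.plot(grid, f). A small h tends to produce many spurious local maxima, while a large h smooths away real features. The sum over the unit-spaced grid approximates the total probability mass, so it falls noticeably below 1 when h is large enough that mass spills outside 1..100.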
6. For the paired (X,Y) data in Table 1, estimate the function mu(X) in the nonlinear regression
of Y on X by using a kernel smoother based on the standard Gaussian kernel. Compute
and plot estimates of mu(X) based on the bandwidths h=4, 6, and 8.
Assuming that the true function mu(X) is smooth, which of these three
bandwidths seems to give the most reasonable estimate of mu(X)? (Or
would you prefer to use a bandwidth other than h=4, 6, or 8? If so,
which?)
(Hint: See the program NonParRegr1 on the Math408 Web site. HOWEVER,
calculate and plot mu(X) for ALL VALUES of X in the range (for example)
X=1,...,100, in each case using all observations (X_j,Y_j), instead of
calculating mu(X) only at the values X=X_j, as was done in NonParRegr1.m.
Note that the density f(X) in DensEst.m was estimated at X=1,...,100
instead of just at X=X_i. Ignore the code in NonParRegr1.m that
constructs the linear smoother.)
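A sketch of the kernel smoother for Problem 6, evaluated on the full grid X = 1, ..., 100 as the hint requires, using all 40 observations at every grid point. It assumes the Nadaraya-Watson form, a Gaussian-kernel-weighted average of the Y_j. Whether NonParRegr1.m uses exactly this form is an assumption, and the name nw_smoother is hypothetical.

```python
import math

# Table 1: predictor X and failure time Y for observations 1-40.
X = [30, 21, 41, 83, 76, 89, 35, 78, 39, 38, 57, 98, 64, 34, 55, 44, 22, 56, 74, 43,
     40, 54, 27, 77, 90, 97, 93, 80, 73, 67, 72, 88, 94, 84, 86, 60, 99, 46, 66, 75]
Y = [3, 5, 5, 6, 9, 17, 22, 23, 27, 31, 31, 38, 40, 42, 56, 62, 64, 66, 142, 159,
     168, 170, 180, 197, 217, 235, 250, 354, 368, 441, 486, 622, 642, 659, 773,
     850, 902, 1090, 1658, 4032]


def nw_smoother(x, y, h, grid):
    """Gaussian-kernel Nadaraya-Watson estimate at each t in grid:
    mu_h(t) = sum_j K((t - x_j)/h) y_j / sum_j K((t - x_j)/h).
    The normalizing constant of K cancels, so exp(-u^2/2) suffices."""
    est = []
    for t in grid:
        w = [math.exp(-0.5 * ((t - xj) / h) ** 2) for xj in x]
        est.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return est


if __name__ == "__main__":
    grid = list(range(1, 101))  # all X = 1, ..., 100, not just the observed X_j
    for h in (4, 6, 8):
        m = nw_smoother(X, Y, h, grid)
        # Summarize mu_h(t) at t = 10, 20, ..., 100.
        print(f"h={h}:", " ".join(f"{v:.0f}" for v in m[9::10]))
```

Because Y is strongly right-skewed (its largest value, 4032, occurs at X=75), a single observation can dominate the local average when h is small. Comparing the printed summaries for h=4, 6, and 8 shows this before the curves are plotted.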