HOMEWORK #2 due 10-19
Arrange your answers in three parts in the following order:
Part I: Your answers to all questions, either written by hand or using a
word processor,
Part II: The SAS programs (*.sas files) that you used for all problems in
which you used SAS,
Part III: The output from the SAS programs in Part II.
For all problems in which you use SAS, either copy or transcribe answers from the SAS output to Part I or else refer in Part I to specific pages in Part III by saying (for example) ``The scatterplot or matrix for Problem 3 is on page 17 of the SAS output (Part III).'' Make sure that you have consecutive page numbers on the SAS output in Part III by adding your own page numbers to the SAS output if necessary, so that (for example) you don't have several different page 1s in Part III. If you like, you can number pages as (for example) ``Page 3-2'' for the second page of output for Problem 3.
Do problems 1-4 by hand, and problems 5-6 using SAS.
1. (10) Let X_1 and X_2 be two real-valued random variables. Show that
Cov(X_1-X_2, X_1+X_2) = Var(X_1) - Var(X_2)
2. (10) Let A=A' be a 2 x 2 symmetric matrix with
tr(A)>0 and det(A)>0. Prove that A is positive definite.
(Hint: Use the spectral decomposition of A.)
3. (20) Let X = (X_1 X_2 X_3)' be a random vector in R^3 with (vector) mean
E(X)=0 and covariance matrix
           (  3  -4   1 )
  Cov(X) = ( -4  10  -2 )
           (  1  -2   3 )
Let Y = X_1 + 2X_2 + 3X_3 and Z = X_2 + 4X_3. Recall that Var(Z) means the
variance of Z.
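As a reminder for Problem 3, the following standard identities express the variance and covariance of linear combinations of the components of X in terms of Cov(X). The coefficient vectors a and b are illustrative symbols, not notation from the assignment; for Y and Z above they would be a = (1 2 3)' and b = (0 1 4)'.

```latex
\operatorname{Var}(a'X) = a'\,\operatorname{Cov}(X)\,a,
\qquad
\operatorname{Cov}(a'X,\; b'X) = a'\,\operatorname{Cov}(X)\,b .
```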
4. (20) Assume that X = (X_1 X_2 X_3)' is a vector-valued normal random variable with distribution N(mu_X, B) where
         (  5 )           ( 0  0  0 )
  mu_X = ( -3 )  and  B = ( 0  2  3 )
         ( -2 )           ( 0  3  5 )
Consider the random vector Y = A X for
  A = ( 1  2  3 )
      ( 0  1  2 )
(i) What is the dimension of Y? That is, if Y is R^r-valued, what is r?
5. (20) Use Proc IML in SAS to do the following:
(i) Define and display a 40x6 matrix X whose entries are realizations
of independent normally-distributed random variables with mean zero and
variance one. The 40x6=240 displayed values should be mostly in the range
-2 to 2 with a few values outside that range.
(Hints: To start proc iml, just enter proc iml; and begin using proc iml
commands, as in ThreeRegIml.sas or MLizards.sas on the Math 439 Web site.
In proc iml, the command B=J(m,n) generates an mxn matrix all of whose
entries equal one. If Y is any mxn matrix, W=normal(Y) generates an mxn
matrix W whose entries are realizations of independent normally-distributed
random variables with mean zero and variance one. Thus W=normal(Y) depends
on Y only through its dimensions, except that it also uses the value Y[1,1]
(which you can change) as the starting seed of its random numbers. That is,
consecutive runs of the program with the same starting seed Y[1,1] will
yield identical random numbers. Setting Y[1,1]=0 tells SAS to seed its
random numbers from the system clock, so that consecutive runs will yield
different results.)
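The hints above can be sketched on a deliberately small matrix. This is only an illustration of J and normal as the hints describe them: the 3x2 size and the seed value 12345 are arbitrary choices made here, not values from the assignment or from the course files.

```sas
proc iml;
   /* Y is a 3x2 matrix of ones; only its shape and Y[1,1] matter to normal() */
   Y = J(3, 2);
   /* Hypothetical fixed seed, chosen for illustration; 0 would use the system clock */
   Y[1,1] = 12345;
   /* W is a 3x2 matrix of standard normal draws */
   W = normal(Y);
   print Y, W;
quit;
```

Rerunning this program with the same value in Y[1,1] should print the same W each time.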
(ii) Show theoretically that the matrix W=X'X is an instance of a Wishart
distribution W(6,40,I_6) (or W_6(40,I_6) in the textbook's notation).
(Hints: Do this directly, or else use Problem 7(ii) on
Homework #1. The beginning of Section 9, page 19, in the
Multivariate Linear Models handout on the Math 439 Web site has a
clearer definition of a Wishart distribution than does the text.)
(iii) Display the matrix W=X'X. As noted in part (ii), this is
a 6x6 matrix that is an instance of a Wishart distribution W(6,40,I_6). In
particular, the diagonal elements will be independent realizations of a
chi-square distribution with 40 degrees of freedom while the off-diagonal
terms will be generally smaller in absolute value.
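To see why the diagonal and off-diagonal entries of W=X'X differ in size, it may help to write the entries out. The sketch below assumes only what part (i) states (the X_{ij} are independent with mean zero and variance one); the index names i, j, k are introduced here for illustration.

```latex
W_{jj} = \sum_{i=1}^{40} X_{ij}^2 \sim \chi^2_{40},
\qquad E(W_{jj}) = 40, \quad \operatorname{Var}(W_{jj}) = 80,
\\[4pt]
W_{jk} = \sum_{i=1}^{40} X_{ij} X_{ik} \quad (j \ne k),
\qquad E(W_{jk}) = 0, \quad \operatorname{Var}(W_{jk}) = 40 .
```

So the diagonal entries should scatter around 40 (standard deviation about 8.9), while the off-diagonal entries should scatter around 0 (standard deviation about 6.3).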
(iv) Find and display the 6x6 correlation matrix Q of the columns
of X.
(Hint: See the proc IML code at the end of ThreeRegIml.sas or in
MLizards.sas on the Math 439 Web site. The file ThreeRegIml.sas has been
updated since it was first handed out. As a check, the diagonal elements of
Q should all be 1s and the off-diagonal terms should be in the range of -1
to 1.)
(v) Find and display the 6 eigenvalues of Q. (Hint: Note the use of the
function eigen(evals,evecs,...) at the end of the proc IML code in
ThreeRegIml.sas on the Math 439 Web site.)
(vi) Compute la_max/la_min, where la_max is the largest and la_min is the
smallest of the 6 eigenvalues. (You can do this part by hand.)
(Remark: If you have done this correctly, then la_max/la_min should
be somewhere in the range of 2 to 8 or nearby. For many real data sets
with more than 3 or 4 covariates, the value of la_max/la_min is much
larger than this. This is an indication that the true dimensionality of
many multidimensional data sets is much smaller than the actual number of
covariates.)
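As a minimal sketch of parts (v) and (vi), the eigenvalue ratio can be computed in proc iml as follows. The 2x2 matrix Q here is a made-up example, not the 6x6 correlation matrix from part (iv), and this code is not taken from ThreeRegIml.sas; it only assumes the standard call eigen routine.

```sas
proc iml;
   /* Hypothetical 2x2 correlation matrix, used only to illustrate the calls */
   Q = {1 0.5, 0.5 1};
   /* For a symmetric matrix, evals is returned in descending order */
   call eigen(evals, evecs, Q);
   /* Ratio of the largest to the smallest eigenvalue */
   ratio = evals[1] / evals[nrow(evals)];
   print evals, ratio;
quit;
```

For this particular Q the eigenvalues are 1 + 0.5 and 1 - 0.5, so the ratio is 3.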
6. (20) Table 5.5 (page 150) in the text has four measurements on
m=19 beetles from the flea beetle species Haltica oleracea and
corresponding measurements from n=20 beetles of another flea-beetle
species, H. carduorum. (See also the data file FleaBeetles.dat.)
(i) Use SAS to carry out the Hotelling T^2 test for all four measurements
y_1,y_2,y_3,y_4 to test the hypothesis H_0:E(X)=E(W), where X_1,...,X_m
(each in R^4) represent the measurements from m=19 beetles from
H. oleracea and W_1,...,W_n the measurements from the second
flea-beetle species. Do you accept or reject H_0?
(ii) From the output, what is the value of the associated F statistic
for the multivariate test? What is the number of degrees of freedom, both
in the numerator and in the denominator? How were these derived from the
number of components in the observations (d=4) and the sample
sizes (m,n)?
(iii) Carry out two-sample t-tests on the four measurements
y_1,y_2,y_3,y_4 individually. Which of these are significantly different
between the two samples? What are the two-sided P-values?
(Hints: See MLizards.sas on the Math 439 Web site. Do not log-transform the
data. If you use proc format to assign descriptive tags to the Species
variable (=1,2), make sure that you use the correct species names. See
Section 5.4.2 (page 122) in the text and Section 10 in the Multivariate
Linear Models notes for the relationship between a Hotelling T^2 statistic
and its associated F distribution.)
(Warning: Make sure that SAS reports that you have
measurements for 39 individual beetles.)
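For reference, the usual textbook form of the two-sample Hotelling T^2 statistic and its F transformation is sketched below. It assumes multivariate normal samples with a common covariance matrix; the symbols S_X, S_W and S_p (the two sample covariance matrices and their pooled version) are introduced here for illustration and should be checked against Section 5.4.2 of the text and Section 10 of the notes.

```latex
S_p = \frac{(m-1)\,S_X + (n-1)\,S_W}{m+n-2},
\qquad
T^2 = \frac{mn}{m+n}\,(\bar X - \bar W)'\,S_p^{-1}\,(\bar X - \bar W),
\\[4pt]
\frac{m+n-d-1}{d\,(m+n-2)}\;T^2 \;\sim\; F_{d,\;m+n-d-1}
\quad \text{under } H_0 .
```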