Homework #12, Math 320, Spring 2001
Name:____________________________
Section:____
Math 320 Homework #13 --- Due 4/27
Include your name, section number, and homework number on every page that
you hand in. Enter "Section 1" for the morning class (10-11AM) and
"Section 2" for Professor Sawyer's class (12-1PM).
Begin the exposition of your work on this page. If more room is needed,
continue on sheets of paper of exactly the same size (8.5 x 11 inches),
lined or not as you wish, but not torn from a spiral notebook. You should
do your initial work and calculations on a separate sheet of paper before
you write up the results to hand in.
Output from Excel must have your name and the homework number in
cell A1.
All problems count 15 points.
1. Twenty employees in a factory are rated according to their job
satisfaction (Y) and years of service (X). Their job type ---
A for hourly employees and B for managerial --- was also recorded. The
data are
(Y) (JobType) (X)
25 A 10.3
28 A 11.3
22 A 10.0
26 A 8.6
23 A 9.9
26 A 10.7
22 A 10.3
25 A 11.7
24 A 10.2
27 A 11.7
15 B 9.1
28 B 11.1
25 B 9.7
29 B 11.6
32 B 11.1
29 B 10.5
29 B 11.3
32 B 11.7
31 B 10.9
31 B 11.2
(i) Use Excel to construct scatterplots for years of
service (X) versus job satisfaction (Y) for each job type (one
scatterplot for A and one for B). Do the slopes of the best-fitting
lines through the plots appear to be the same?
(ii) Introduce a dummy variable for job type and use Excel to
analyze a regression of Y on job type and X. What is the
model R^2 for this regression? What is the P-value of the
model test? How many degrees of freedom does the F-statistic have for
the model test (both numerator and denominator)? Which of the two
variables have coefficients that are statistically significant? What are
their P-values?
(iii) Analyze the same regression with an interaction term added.
That is, use Excel to analyze the regression of Y on job type, X, and
job type * X. What is the model R^2 for this regression?
What is the P-value of the model test? How many degrees of freedom does
the F-statistic have for the model test (both numerator and
denominator)? Which of the three variables have coefficients that are
statistically significant? What are their P-values?
(iv) Write down the estimated regression line for Y versus X within
group A and the estimated regression line within group B. (Each line
will be of the form Y = C0 + C1*X for particular numbers C0 and C1.)
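As a rough illustration of the setup in parts (ii)-(iv), the Python sketch
below fits both models by ordinary least squares. The coding D = 0 for job
type A and D = 1 for job type B is an assumption (the reverse coding works
equally well), and the helper names solve, ols, and r_squared are
hypothetical; the assignment itself asks for Excel. With this coding, the
interaction model Y = b0 + b1*D + b2*X + b3*D*X reduces to Y = b0 + b2*X
within group A and to Y = (b0 + b1) + (b2 + b3)*X within group B.

```python
# Data from problem 1: (Y, job type, X)
DATA = [
    (25, "A", 10.3), (28, "A", 11.3), (22, "A", 10.0), (26, "A", 8.6),
    (23, "A", 9.9), (26, "A", 10.7), (22, "A", 10.3), (25, "A", 11.7),
    (24, "A", 10.2), (27, "A", 11.7),
    (15, "B", 9.1), (28, "B", 11.1), (25, "B", 9.7), (29, "B", 11.6),
    (32, "B", 11.1), (29, "B", 10.5), (29, "B", 11.3), (32, "B", 11.7),
    (31, "B", 10.9), (31, "B", 11.2),
]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, coef):
    """Model R^2 = 1 - SSE/SST for a fit that includes an intercept column."""
    ybar = sum(y) / len(y)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


y = [float(row[0]) for row in DATA]
d = [1.0 if row[1] == "B" else 0.0 for row in DATA]  # assumed coding: B = 1
x = [row[2] for row in DATA]

additive = [[1.0, di, xi] for di, xi in zip(d, x)]           # part (ii)
interact = [[1.0, di, xi, di * xi] for di, xi in zip(d, x)]  # part (iii)

b_add = ols(additive, y)
b_int = ols(interact, y)
r2_add = r_squared(additive, y, b_add)
r2_int = r_squared(interact, y, b_int)

# Model F-test degrees of freedom: (number of predictors, n - predictors - 1)
df_add = (2, len(y) - 3)
df_int = (3, len(y) - 4)

# Within-group lines (C0, C1) implied by the interaction model, part (iv)
line_a = (b_int[0], b_int[2])
line_b = (b_int[0] + b_int[1], b_int[2] + b_int[3])

print("additive R^2:", r2_add, "df:", df_add)
print("interaction R^2:", r2_int, "df:", df_int)
print("group A line:", line_a, "group B line:", line_b)
```

The two within-group lines agree with what separate simple regressions on
the A rows and on the B rows would give, which is a useful check on the
Excel output. P-values are not computed here; they require the F and t
distributions, which Excel's regression output reports directly.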
2. This refers to the data about grandfather clocks on page 613 of
the text:
(i) For the regression in problem 13.35, what is the model
R^2? What is the estimate of the standard deviation of the
error terms? Which variables in the model have coefficients that are
statistically significantly different from zero? What are their
P-values? (Hint: This can be answered from the Minitab output on
page 614.)
(ii) Answer the questions in problem 13.36 on page 613.
(iii) For the regression in problem 13.37, what is the model
R^2? What is the estimate of the standard deviation of the
error terms? Is it smaller than before? Which variables in the model
have coefficients that are statistically significantly different from
zero? What are their P-values? (Hint: This can be answered from
the Minitab output on page 615.)
(iv) Answer the questions in problem 13.38 on page 613.
(v) For the regression in problem 13.39, what is the model
R^2? What is the estimate of the standard deviation of the
error terms? Is it smaller than before? Which variables in the model
have coefficients that are statistically significant? What are their
P-values? (Hint: This can be answered from the Minitab output on
page 616.)
3. Consider a regression of a variable Y on four covariates X1, X2, X3,
and X4:
Y X1 X2 X3 X4
291 43 167 279 39
354 40 173 228 29
333 53 167 214 29
301 44 166 210 29
100 15 100 169 19
192 39 171 156 19
138 17 111 217 29
280 42 179 216 27
201 45 160 217 29
392 70 231 221 31
184 44 179 100 10
221 38 168 151 20
297 43 172 211 29
300 40 171 213 28
166 42 167 162 19
355 70 240 219 29
503 100 288 215 29
318 42 169 221 29
185 10 114 216 29
269 43 153 222 29
(i) Use Excel to analyze the regression of Y on the four covariates
X1, X2, X3, X4 together. What is the model R^2 for the
regression on four variables? What is the P-value of the model test? How
many degrees of freedom does the F-statistic have for the model test?
What are the P-values of the coefficients in the regression function
corresponding to X1, X2, X3, and X4? Do these results seem paradoxical
to you?
(ii) Use the Correlation tool in the Excel Data Analysis ToolPak
to find the correlation matrix for the five variables Y, X1, X2, X3,
and X4. Which pairs of the four variables X1, X2, X3, X4 are highly
correlated with one another? (For definiteness, say that two variables
X, Y are "highly correlated" if |r|>0.85 for their sample
correlation coefficient r, "nearly uncorrelated" if |r|<0.20,
and "moderately correlated" otherwise.)
(iii) Now use Excel to analyze the regression of Y on the two
covariates X1 and X4. What is the correlation coefficient between
X1 and X4? What is the model R^2 for this regression on
two variables? What is the P-value of the model test? How many degrees
of freedom does the F-statistic have for the model test?
What are the P-values of the coefficients in the regression function
corresponding to X1 and X4? Are these results more consistent with the
model-test P-value than was the case for the regression on four
variables?
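The classification rule in part (ii) can be sketched in Python as follows.
This is only an illustration of the |r| thresholds stated above, applied to
the six pairs of covariates; the names pearson, classify, and labels are
hypothetical, and the assignment itself asks for the Excel correlation
matrix.

```python
from itertools import combinations
from math import sqrt

# Data from problem 3, one list per column
COLUMNS = {
    "Y":  [291, 354, 333, 301, 100, 192, 138, 280, 201, 392,
           184, 221, 297, 300, 166, 355, 503, 318, 185, 269],
    "X1": [43, 40, 53, 44, 15, 39, 17, 42, 45, 70,
           44, 38, 43, 40, 42, 70, 100, 42, 10, 43],
    "X2": [167, 173, 167, 166, 100, 171, 111, 179, 160, 231,
           179, 168, 172, 171, 167, 240, 288, 169, 114, 153],
    "X3": [279, 228, 214, 210, 169, 156, 217, 216, 217, 221,
           100, 151, 211, 213, 162, 219, 215, 221, 216, 222],
    "X4": [39, 29, 29, 29, 19, 19, 29, 27, 29, 31,
           10, 20, 29, 28, 19, 29, 29, 29, 29, 29],
}


def pearson(u, v):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def classify(r):
    """Apply the thresholds from part (ii): 0.85 and 0.20 on |r|."""
    if abs(r) > 0.85:
        return "highly correlated"
    if abs(r) < 0.20:
        return "nearly uncorrelated"
    return "moderately correlated"


covariates = ["X1", "X2", "X3", "X4"]
labels = {}
for a, b in combinations(covariates, 2):
    r = pearson(COLUMNS[a], COLUMNS[b])
    labels[(a, b)] = (r, classify(r))
    print(a, b, round(r, 3), labels[(a, b)][1])
```

Pairs that come out as highly correlated are the natural suspects when the
individual coefficient P-values in part (i) look inconsistent with the
model-test P-value.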
4. The lengths of trout caught in four mountain lakes were recorded in
millimeters as:
Blue Lake 139 149 157 159 162 182 206
Clear Lake 146 168 175 217 224
Crystal Lake 175 197 203 215 215 224 232
Black Lake 193 205 208 228 253
(i) Use Excel or a TI-83 to test the hypothesis H0 that
the mean lengths of the trout caught in the four lakes are the same.
(That is, H0: mu1=mu2=mu3=mu4.) What is the P-value? Do you accept or
reject H0 at alpha=0.05? How many degrees of freedom does the
F-statistic for this test have (both numerator and denominator)?
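For reference, the one-way ANOVA F-statistic behind this test can be
sketched in Python as below. This is a minimal illustration of the standard
between/within sum-of-squares decomposition, not a substitute for the Excel
or TI-83 output; the function name one_way_anova and the dictionary keys are
hypothetical. The P-value is not computed, since it requires the upper tail
of the F distribution (for example, Excel's FDIST function).

```python
# Trout lengths in millimeters, from problem 4
LAKES = {
    "Blue": [139, 149, 157, 159, 162, 182, 206],
    "Clear": [146, 168, 175, 217, 224],
    "Crystal": [175, 197, 203, 215, 215, 224, 232],
    "Black": [193, 205, 208, 228, 253],
}


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = k - 1        # numerator degrees of freedom
    df_within = n_total - k   # denominator (error) degrees of freedom
    ms_between = ss_between / df_between
    ms_error = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "ms_error": ms_error,
        "f": ms_between / ms_error,
    }


table = one_way_anova(list(LAKES.values()))
print("F =", table["f"], "with df", table["df"])
```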
(ii) Which pairs of lakes out of the six possible pairs differ
significantly in the lengths of the trout caught? What are the
corresponding P-values? Use the test statistic
T = (Ximean - Xjmean) / (root(MSError) * root(1/ni + 1/nj))
to test the differences between the ith and jth
sample means instead of pairwise Student t-tests. (The statistic T uses
the data for all four lakes to estimate the standard deviation, not just
the data in the two samples. If mui=muj, then T
has a Student's t distribution with the same number of degrees of
freedom as in MSError.)
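The statistic T can be computed for all six pairs of lakes with a short
Python sketch such as the one below. It is an illustration only: the names
pooled_ms_error and pairwise_t are hypothetical, and MSError is taken to be
the within-group sum of squares from all four lakes divided by its degrees
of freedom (total sample size minus the number of lakes). Two-sided P-values
are not computed; they come from the Student's t distribution with that many
degrees of freedom (for example, Excel's TDIST function).

```python
from itertools import combinations
from math import sqrt

# Trout lengths in millimeters, from problem 4
LAKES = {
    "Blue": [139, 149, 157, 159, 162, 182, 206],
    "Clear": [146, 168, 175, 217, 224],
    "Crystal": [175, 197, 203, 215, 215, 224, 232],
    "Black": [193, 205, 208, 228, 253],
}


def pooled_ms_error(groups):
    """Return (MSError, error degrees of freedom) pooled over all groups."""
    ss_within = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss_within += sum((v - m) ** 2 for v in g)
    df_error = sum(len(g) for g in groups) - len(groups)
    return ss_within / df_error, df_error


def pairwise_t(lakes):
    """Compute T for every pair of lakes using the pooled MSError."""
    ms_error, df_error = pooled_ms_error(list(lakes.values()))
    stats = {}
    for (name_i, xi), (name_j, xj) in combinations(lakes.items(), 2):
        diff = sum(xi) / len(xi) - sum(xj) / len(xj)
        std_err = sqrt(ms_error) * sqrt(1 / len(xi) + 1 / len(xj))
        stats[(name_i, name_j)] = diff / std_err
    return stats, df_error


t_stats, df_error = pairwise_t(LAKES)
for pair, t in t_stats.items():
    print(pair, round(t, 3), "df =", df_error)
```

Because every pair shares the same MSError, the comparisons differ only in
the difference of sample means and in the factor root(1/ni + 1/nj).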