Unfortunately, even a large number of sequences from a single
genetic locus or type of gene does not provide sufficient statistical
power. One needs sequence data from the two species at many different
genetic loci or types of genes (for example, 34 loci or 54 loci).
Classical statistical methods (``maximum likelihood'') are not well
behaved for this much data of this type. A newer statistical method
called Markov chain Monte Carlo (MCMC) is effective and does produce
results. The disadvantage is that MCMC methods can take many hours of
computer time on a fast computer, as opposed to milliseconds for classical
statistical methods, but the classical methods do not work in this case.
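To give a feeling for what an MCMC method does, here is a minimal sketch of a random-walk Metropolis sampler in Python. It is a toy illustration only, not the model or software used in the papers below: the data values, the unit-variance normal likelihood, the flat prior, and the names `log_posterior` and `metropolis` are all assumptions made for the example.

```python
import math
import random


def log_posterior(theta, data):
    # Toy assumption: flat prior on theta and a normal likelihood with
    # variance 1, so the log posterior is the log likelihood up to a constant.
    return -0.5 * sum((x - theta) ** 2 for x in data)


def metropolis(data, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a single parameter theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    samples = []
    for _ in range(n_iter):
        # Propose a nearby value by adding normal noise to the current one.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


if __name__ == "__main__":
    toy_data = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical observations
    draws = metropolis(toy_data)[2000:]   # discard the first 2000 as burn-in
    print("posterior mean estimate:", sum(draws) / len(draws))
```

The sampler never maximizes anything. It wanders through parameter space and spends time in each region in proportion to its posterior probability, which is why it keeps working on problems where a maximization step fails, and also why it needs many iterations.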
Some references are:
1. Sawyer, S. A. and D. L. Hartl (1992) Population genetics of
polymorphism and divergence. Genetics 132, 1161--1176.
(This derives the basic statistical model. Generally speaking, biology
journals do not like mathematical derivations, but this one allowed us to
put a mathematical proof in an Appendix.)
2. Hartl, D. L., E. N. Moriyama, and S. A. Sawyer (1994)
Selection intensity for codon bias. Genetics 138, 227--234.
(This applies the same theory to a slightly different problem, namely
the tendency for different DNA variants to show the effects of selection
even though they produce exactly the same gene product.)
3. Bustamante, Carlos, Rasmus Nielsen, Stanley A. Sawyer,
Kenneth M. Olsen, Michael D. Purugganan, and Daniel L. Hartl (2002) The
cost of inbreeding in Arabidopsis. Nature 416, 531--534.
(This applies the MCMC theory to a simple model of selection in which
all new mutations of genes of a particular type (for example, for a
particular enzyme) are either (i) immediately lethal or nearly lethal,
and so can be ignored, or else (ii) have exactly the same selective
advantage or disadvantage. This model is not realistic, but the single
estimated selection coefficient for a particular gene might be an average
selection coefficient of some kind for that gene. The conclusion was that
two Drosophila species appeared to be positively evolving but that two
weedy species (Arabidopsis) were going downhill.)
4. Sawyer, Stanley A., Rob J. Kulathinal, Carlos D. Bustamante,
and Daniel L. Hartl (2003) Bayesian analysis suggests that most amino
acid replacements in Drosophila are driven by positive selection. Journal
of Molecular Evolution 57, S154--S164.
(This paper generalizes the model in the previous reference so that, as
the mutations occur, the selective advantages of nonlethal mutations are
normally distributed with a mean that depends on the particular enzyme.
The variance of the normal distributions is assumed to be the same for
all loci. The model was applied to a subset of the Drosophila data.
One of the conclusions was that while only about 20% of new
nonlethal mutations were beneficial, 48% of mutations that were
polymorphic in the sample were beneficial, and 94% of mutations that
became fixed in the entire population were beneficial. This suggests that
the more pessimistic view of the downhill evolution of large populations
is incorrect, at least for Drosophila.)
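The jump from a small beneficial fraction among new mutations to a large one among fixed mutations can be illustrated with a short numerical sketch. It is not the model fitted in reference 4. It assumes a single normal distribution of scaled selection coefficients gamma = 2Ns, and it weights each mutation by the standard diffusion approximation for its fixation probability relative to a neutral mutation, 2 gamma / (1 - exp(-2 gamma)). The mean and standard deviation below are hypothetical values chosen so that about 20% of new mutations are beneficial; they are not estimates from the paper.

```python
import math


def relative_fixation_weight(gamma):
    """Fixation probability relative to a neutral mutation (diffusion
    approximation), for scaled selection coefficient gamma = 2*N*s."""
    if abs(gamma) < 1e-9:
        return 1.0  # neutral limit
    # -expm1(-2*gamma) equals 1 - exp(-2*gamma), computed accurately.
    return 2.0 * gamma / -math.expm1(-2.0 * gamma)


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def beneficial_fractions(mean, sd, steps=20000):
    """Return (fraction beneficial among new mutations,
    fraction beneficial among fixed mutations) by midpoint integration."""
    lo, hi = mean - 10.0 * sd, mean + 10.0 * sd
    h = (hi - lo) / steps
    new_pos = new_all = fix_pos = fix_all = 0.0
    for i in range(steps):
        gamma = lo + (i + 0.5) * h
        p = normal_pdf(gamma, mean, sd) * h      # share of new mutations
        w = p * relative_fixation_weight(gamma)  # share weighted by fixation
        new_all += p
        fix_all += w
        if gamma > 0.0:
            new_pos += p
            fix_pos += w
    return new_pos / new_all, fix_pos / fix_all


if __name__ == "__main__":
    # Hypothetical parameters: mean/sd = -0.8415 puts about 20% of the mass above 0.
    frac_new, frac_fixed = beneficial_fractions(mean=-1.683, sd=2.0)
    print("beneficial among new mutations:  ", round(frac_new, 3))
    print("beneficial among fixed mutations:", round(frac_fixed, 3))
```

Because the fixation weight increases with gamma, beneficial mutations are always over-represented among fixed mutations relative to new ones. How large the enrichment is depends on the assumed mean and variance.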
Last modified June 23, 2004