GENECONV: Statistical Tests for Detecting Gene Conversion -- Version 1.81
1. INTRODUCTION AND OVERVIEW
The theory behind GENECONV as well as the details of using GENECONV are
explained below, along with many examples. Most of the examples are
included on the Web site as example input and output files. See A quick start: program input and output below if you
want to start using GENECONV immediately. However, you should also
look at Assessing significance: pairwise and global
P-values at least briefly to see the difference between global and
pairwise fragments. (Global fragments have P-values that are
multiple-comparison corrected for all possible sequence pairs, while
pairwise fragments do not. Global fragments are more important than
pairwise fragments. Low P-values for pairwise fragments might be due to a
large number of pairwise comparisons. Both P-values are naturally
corrected for sequence length.)
FINDING GENE CONVERSION EVENTS:
ASSESSING SIGNIFICANCE:
(Karlin and Altschul 1990, Altschul 1993). If the BLAST score is large,
this is approximately equal to exp(-BLAST score).
INNER AND OUTER FRAGMENTS
A QUICK START: PROGRAM INPUT AND OUTPUT
in a command-line window (sometimes called a ``Dos prompt'' window). The
program GENECONV must be in either the default directory (so that you
can see it if you enter ``dir'') or else in a directory on the
system path. The sequence file myseqfile.seqs must be in the
default directory. This command will analyze myseqfile.seqs
using GENECONV's default options, which are generally reasonable.
Options can be added on the command line with prefixes ``/'' or
``-''. These can either precede or follow the sequence
file name and can be in any order. (See examples below.)
GROUP STRUCTURE: WITHIN-GROUP FRAGMENTS ONLY
POLYMORPHISMS AND POWER
COMPARISON WITH PREVIOUS PROGRAMS:
2. USING GENECONV
A FIRST EXAMPLE
(See A quick start: program input and output
above for the mechanics of using GENECONV and GENECONV_HELPER.)
PAIRWISE AND GLOBAL P-VALUES
where /n0 means no permutations. With no permutations, the
program runs in around 0.10'' (a tenth of a second) on a P400 Pentium
machine, as opposed to 10'' with 10,000 permutations. The output in this
case is written to alm1.frags with log output to
alm1.sum. We used a different output file name here rather
than overwrite the previous output for alumw.asf in
alumw.frags. The option /w123 was not entered
here since GENECONV does no randomization in this case, so that
initializing the random number generator has no effect.
The long form for the last option is -MaxKAGlobalPval=0.15.
Conditions on KA P-values for global fragments can be removed entirely by
/mkg2, where 2 can be replaced by any number that is 1 or
greater. The option /mkg2 in long form is
-MaxKAGlobalPval=None. You can substitute ``Mc'' or
``Glob'' for ``Global'', where ``Mc'' stands
for ``Multiple-comparison (corrected)''. Thus GENECONV will also
understand -MaxKAMcPval=0.01. See options for
P-value bounds in the Options List below for
the options for specifying bounds for pairwise KA P-values and
permutation P-values.
The option -listbest (short form: /lb) tells GENECONV
to ignore P-values and list the 8 most significant global fragments
overall. If pairwise lists are specified, the 3 most significant
fragments are listed for each pair of sequences. This can be useful if
you are not sure how many significant fragments there are and if you
want to make sure that you obtain at least some output. It will be
obvious from the output whether any of the listed fragments are
significant.
CHECKING GENECONV
The option -Randomize_sites tells GENECONV to permute the
polymorphic sites once before proceeding, which should destroy most
significance. (For safety, this option has no short form.) As before,
/w123 initializes the random-number seed at 123 for
reproducibility, and rantest names the output files. The main
output is written to rantest.frags and the log file is
rantest.sum.
The sample input file rand950.fasta has randomized data for a
nucleotide alignment with 500 bases, 9 sequences, and 50 polymorphic
sites. The output shows that the most significant global fragment has
permutation P-value P=0.380, which is not significant. There is only one
significant pairwise fragment with a pairwise permutation P-value of
P=0.0183. Since there are 9*8/2=36 sequence pairs with 9 sequences, this
is consistent with randomness.
AN EXAMPLE WITH GROUPS OF EQUAL SIZE
Enter /b3 instead of /b2 to tell GENECONV to consider
consecutive triples of sequences in alumw.asf as groups and
look only for within-group fragments. (Since 3 does not divide 14, this
will cause GENECONV to exit with an error message.) The long form of
/b2 is -Blocsize=2. Output in this case is written to
alumwbc.frags and alumwbc.sum.
(The long-form of /h is -Homologous.) With the same
arrangement of sequences within alumw.asf, this tells GENECONV
to consider two groups of 7 sequences each (one for each locus) and to
look for within-locus gene conversion only. No globally significant inner
fragments are found.
CONFIGURATION FILES
now produces the same results as the last displayed GENECONV
command line. (The output files are not quite identical, since the command line
and the contents of all configuration files are written to the heading
of alumwh2.frags. However, the contents of
alumwh2.frags are the same after the heading.)
(/mig0.01 is the short form for
-MaxSimGlobalPvalues=0.01.) If you entered /mig0.01
before myopts.cfg on the command line, then /mig0.01
would be overridden by -MaxSimGlobalPvalues=0.25 in
myopts.cfg and would have no effect.
GROUPS OF UNEQUAL SIZE
then GENECONV will analyze alumw.asf looking for within-group
fragments only. In this case, GENECONV finds one globally significant
within-group inner fragment and two pairwise significant within-group
fragments. The only globally significant fragment is the same
spider-monkey fragment, which now has global permutation P=0.0067 and
Bonferroni-corrected KA P=0.053. The multiple-comparison corrected
P-values are based on the 17 possible within-group sequence comparisons
for three groups of sizes 2, 5, and 4.
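The count of 17 within-group comparisons and the Bonferroni-corrected KA P-value quoted above can be checked with a few lines of Python. This is only an arithmetic sketch, not GENECONV code: the function names are hypothetical, and the pairwise KA P-value 0.00309 is the value listed for S1;S2 in the pairwise inner fragment list.

```python
from math import comb

def within_group_pairs(group_sizes):
    """Number of sequence pairs in which both sequences lie in the same group."""
    return sum(comb(n, 2) for n in group_sizes)

def bonferroni(p_pairwise, n_comparisons):
    """Bonferroni correction: pairwise P-value times the number of comparisons.

    The product is not capped at 1; values above 1 simply mean "not significant".
    """
    return p_pairwise * n_comparisons

n_pairs = within_group_pairs([2, 5, 4])   # groups of sizes 2, 5, and 4
print(n_pairs)                            # 1 + 10 + 6 within-group pairs
print(round(bonferroni(0.00309, n_pairs), 3))
```

With all 14*13/2=91 pairs of the 14 sequences, the same pairwise value gives roughly 0.281, in line with the BC KA P-value listed for the ungrouped analysis.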
CAULIFLOWER MOSAIC VIRUSES
where the configuration file camv.cfg is
MISMATCH PENALTIES
Here MM is rounded up to the nearest integer, and both ``Ndiff''
and ``Npoly'' are assumed positive. If gscale=0, then mismatches
within fragments are not allowed, which is equivalent to assuming an
infinite mismatch penalty. The value of gscale must be either
zero or a positive integer. Smaller positive gscale values mean
smaller mismatch penalties, and larger positive gscale values
mean more severe penalties. If gscale>0, the entire alignment
(viewed as a fragment) has a negative or zero score.
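The formula defining MM is not reproduced in this section. The sketch below assumes that MM is gscale*Npoly/Ndiff rounded up to an integer. That assumption fits the properties described above (an integer penalty that grows with gscale, and a non-positive score for the whole alignment), but it should be checked against the formula itself. The function names and the example counts (216 polymorphic sites, 20 differences) are hypothetical and chosen only for illustration.

```python
def mismatch_penalty(gscale, npoly, ndiff):
    """Assumed form of the penalty: ceil(gscale * npoly / ndiff).

    Returns None for gscale == 0, where mismatches are not allowed at all.
    Both npoly and ndiff are assumed positive, as in the text.
    """
    if gscale == 0:
        return None
    return -((-gscale * npoly) // ndiff)   # integer ceiling division

def whole_alignment_score(npoly, ndiff, penalty):
    """Score of the entire alignment viewed as one fragment:
    +1 for each matching polymorphic site, -penalty for each mismatch."""
    return (npoly - ndiff) - ndiff * penalty

mm = mismatch_penalty(1, 216, 20)            # hypothetical counts
print(mm, whole_alignment_score(216, 20, mm))
```

Under this assumption, closely related sequence pairs (small Ndiff) receive large penalties, and divergent pairs receive small ones.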
The long form of /g1 is -gscale=1. (See options for determining fragment lists below.) We
use a different name for the output and log files here so as not to
overwrite previous output.
FANCIER OUTPUT
The options -ExpFormat and -WideCols have no effect on
the Sim Pvalue column. If both commands are entered, then
-ExpFormat is used.
then the global inner fragment list becomes
SPREADSHEET OUTPUT
SILENT SITES AND AMINO ACIDS
GENECONV finds two significant global inner fragments (P<0.05), both
with P<0.03. Only one of these fragments is significant (P<0.05)
using the global MCF P-values of Sawyer (1989). (See
Comparison with previous programs: VTDIST and VTDIST3). In addition,
GENECONV finds 6 inner fragments with pairwise P-values of 0.05 or less.
See the sample output file gnd7.frags.
tell GENECONV to look for significant fragments with mismatches and
mismatch penalties given by Gscale=1. GENECONV now finds
three (3) significant global inner fragments, all with P<0.03.
However, GENECONV finds 5 pairwise significant fragments as opposed to 6
with Gscale=0 (P<0.05), so that allowing mismatches does not
always guarantee finding more significant fragments.
the most significant global inner fragment has P=0.170. When the analysis
is repeated with Gscale=1 --- that is, with the additional
command-line option /g1 --- there is one marginally significant
global inner fragment (P=0.076). See the sample output files
gnd7.frags, gnd7g1.frags,
coxprot.frags, and coxprotg1.frags for the
details.
AN IMMUNE-SYSTEM EXAMPLE:
then GENECONV finds one globally significant inner fragment (P=0.0011).
This is the fragment identified by Takahata (1994). This fragment is also
the only pairwise significant inner fragment, also with P=0.0011. If the
MCF score is used, the most significant global fragment has P=0.263.
Both the distantly related pair S1;S2 and the close pair
S2;S3 have fragments with 6 consecutive polymorphic sites. The
sequence pair S1;S2 differs overall at 17 sites and
S2;S3 at 8 sites. The uncorrected fragment-length score is 6 for
both fragments, but a run of 6 concordant sites out of 8 sites is not
significant for S2;S3. Thus neither fragment can be significant
for the uncorrected MCF score.
then the global score table written to mhc3g1.frags is
The long fragment between S1 and S2 has the same global
and pairwise permutation P-values as with gscale=0 (P=0.0011).
With gscale=1, the global scores of a fragment with 6 matches
and no mismatches are now 9.528 for S1;S2 and 1.537 for
S2;S3. This places an even greater premium on fragments between
the more diverse sequence pair S1;S2.
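The two global scores quoted above follow directly from the gscale=1 score table written to mhc3g1.frags. The sketch below copies the slope and intercept for each sequence pair from that table and applies the SCORE formula listed with it. The function and variable names are hypothetical; this is a check of the arithmetic, not a description of GENECONV's internals.

```python
# (slope, intercept) for each sequence pair, copied from the gscale=1 score table
COEFFS_G1 = {
    "S1;S2": (1.1600, 2.5677),
    "S1;S3": (3.1764, 3.1304),
    "S2;S3": (0.2101, 0.2766),
}

def fragment_score(npolys, ndifs, mismatch_penalty):
    """SCORE = (npolys - ndifs)*1 + ndifs*(-mismatpen), as in the table footer."""
    return (npolys - ndifs) * 1 + ndifs * (-mismatch_penalty)

def global_score(pair, score, coeffs=COEFFS_G1):
    """BLAST-like global score: slope * SCORE + intercept for the given pair."""
    slope, intercept = coeffs[pair]
    return slope * score + intercept

# A fragment with 6 matches and no mismatches; the penalty value is
# irrelevant here because ndifs is 0.
score = fragment_score(npolys=6, ndifs=0, mismatch_penalty=1)
for pair in ("S1;S2", "S2;S3"):
    print(pair, round(global_score(pair, score), 3))
```

The same fragment-length score of 6 therefore maps to very different global scores, depending on how divergent the two sequences are overall.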
but none of the four fragments individually have a pairwise permutation
P-value below 0.500. By default, GENECONV ignores fragments whose
pairwise score or polymorphism length is less than 2. The program
options above relax these constraints. There are a total of seven
fragments with these options. See Fragment
lists: Other restrictions or the Options List
below for more detail about options for controlling fragment lists.
WHEN GLOBAL (BLAST-LIKE) SCORES ARE NOT ENOUGH
tells GENECONV to analyze this alignment by permuting silent codon
positions with mismatch penalties set by Gscale=1. (See Silent sites and amino acids and Mismatch penalties.) The command -listbest
tells GENECONV to list the 8 most significant global fragments regardless
of P-value. The resulting global output in maz8act.frags is
ALIGNMENTS AT GENE CONVERSION ENDPOINTS
writes an output file camvg0.frags with 4 significant global
inner fragments (P<0.05; see Cauliflower mosaic
viruses above). The bases immediately adjacent to the endpoints
of both sets of fragments are written to the file
camvg0.jseqs. The file camvg0.jseqs
contains the following output for the 4 global fragments:
For each significant fragment, the first line begins with GI
(for ``Global Inner'' fragment), then lists the sequence names, then the
aligned offsets (beginning and end), the aligned length, and the global
P-value for that fragment. The next two lines display 10 bases before and
10 bases after the endpoints of the fragments, excluding indels. Here
< denotes the left endpoint of the fragment and >
the right endpoint.
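If you want to tabulate junction sequences from a .jseqs file, the sequence lines can be picked apart with a regular expression. The sketch below assumes the line layout shown in the camvg0.jseqs listing (a sequence name and colon, the bases before the fragment, the fragment ends between < and > separated by dots, then the bases after). The pattern and function name are hypothetical and have only been matched against that layout.

```python
import re

# Matches one sequence line of the junction listing, for example
#   CN: TATACTATAA <GCTAAGGGAA....AGCCATGAAT> CGGTTTAAAG
JSEQ_LINE = re.compile(
    r"^\s*(?P<name>\S+):\s+(?P<before>\S+)\s+"
    r"<(?P<frag_start>[^.>]+)\.+(?P<frag_end>[^.>]+)>\s+(?P<after>\S+)\s*$"
)

def parse_jseq_line(line):
    """Return a dict with name, before, frag_start, frag_end, and after,
    or None if the line is not a sequence line (e.g. a GI or # line)."""
    match = JSEQ_LINE.match(line)
    return match.groupdict() if match else None

parsed = parse_jseq_line("CD: ACATTTCCAT <AATAATGTGT....AGGCCCTGTG> TAAGGTAAGA")
print(parsed["name"], parsed["frag_start"][:3])   # first three bases of the fragment
```

Header lines such as those beginning with GI or GO do not match the pattern and are returned as None, so a whole file can be passed through line by line.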
Five of the 7 globally significant fragments begin with either
AAT or AAG . This may indicate a tendency for gene
conversion fragments to insert preferentially at AA sequences
in the CaMV chromosome. Chenault and Melcher (1994) remark that many
observed junction sequences in this alignment resemble junctions
observed in experimentally generated recombinants or initiation sites
for DNA or RNA reverse transcription.
3. MORE ABOUT USING GENECONV
INDELS AND MISSING DATA
OUTER FRAGMENTS
rounded up to the nearest integer, where gscale is the gscale
value and ``Npoly'' is the number of polymorphic sites. However, ``Ndiff''
is now the total number of polymorphic sites at which the given sequence
agrees with at least one other sequence in the alignment.
SEQUENCE FILE FORMATS
FRAGMENTS AND HSAPs
PAIRWISE AND GLOBAL P-VALUES
VTDIST AND VTDIST3
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI S1;S2 0.0213 0.28156 266 296 31 21 0 34 None
#
# One inner fragment listed.
This says that the only significant global fragment is between the two
sequences S1 and S2 (the two spider monkey sequences)
at offsets 266-296 in the alignment. The column heading BC KA
stands for ``Bonferroni-corrected Karlin-Altschul'' Pvalue. Note that it
is more conservative than the permutation-test P-value in this case.
Both P-values are multiple-comparison corrected. The rest of the
GI line says that the fragment contains 21 sites that are
polymorphic in the alignment and that the two sequences differ at 34
sites overall. The final three columns say that the fragment contains no
internal mismatches and that there is no mismatch penalty, which is the
GENECONV default.
# Seq Sim KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
PI H1;G1 0.0491 0.13001 218 280 63 33 0 12 None
PI I1;G2 0.0476 0.06919 282 296 15 9 0 49 None
PI G1;G2 0.0299 0.04969 281 296 16 10 0 47 None
PI O1;S1 0.0213 0.04325 160 212 53 15 0 34 None
PI S1;S2 0.0010 0.00309 266 296 31 21 0 34 None
#
# 5 inner fragments listed.
The last entry has the pairwise permutation P-value for the
spider-monkey fragment. There were no significant global or pairwise
outer fragments. (See Outer fragments below.)
geneconv alumw.asf alumwh2 /b2 /h /lp /n1000 /w123
/mip2 /mig0.25 /mkp0.08
This command line is also quite cryptic due to the abbreviated short-form
options. Using the equivalent long-form expressions for the options would
be clearer, but might overflow the system command-line buffer. A solution
to both problems is to write a configuration file with these options in
their long forms and then refer to the configuration file.
#GCONV_CONFIG
alumw.asf alumwh2
-Options Blocsize=2 Homologous ListPairs
Numsims=1000 Startseed=123
% P-value conditions on fragments
MaxSimPairwisePval=none MaxSimGlobalPval=0.25
MaxKAPairWisePval=0.08 Endoptions
The name of a configuration file must end with .cfg. The
first word in a configuration file must be #GCONV_CONFIG.
That is, the file must begin with zero or more spaces, then these 13
characters, then one or more additional spaces or end-of-line
characters. The matching is case insensitive, so that
#gconv_config or #Gconv_Config will also work. The
character % in a configuration file tells GENECONV to treat the
rest of that line as a comment and to ignore it. The character
% need not be the first character on that line. If the first
character after the % is ! (as in %!...),
then the comment is also written to the output file.
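The configuration-file rules just described (a case-insensitive #GCONV_CONFIG first word, % comments, and %! comments that are echoed to the output) can be illustrated with a small reader. This is a minimal sketch and not GENECONV's own parser: the function name is hypothetical, and it is slightly more lenient than the text, since it also skips leading tabs, blank lines, and comment lines before the header word.

```python
def read_gconv_config(text):
    """Split a GENECONV-style configuration text into words.

    Returns (tokens, echoed), where tokens are the words after the
    #GCONV_CONFIG header and echoed holds the text of %! comments.
    """
    tokens, echoed = [], []
    for line in text.splitlines():
        body, percent, comment = line.partition("%")
        if percent and comment.startswith("!"):
            echoed.append(comment[1:].strip())   # %! comments are kept for output
        tokens.extend(body.split())              # everything before % is content
    if not tokens or tokens[0].lower() != "#gconv_config":
        raise ValueError("first word must be #GCONV_CONFIG")
    return tokens[1:], echoed

example = """#GCONV_CONFIG
alumw.asf alumwh2
-Options Blocsize=2 Homologous ListPairs
% P-value conditions on fragments
MaxSimGlobalPval=0.25 Endoptions
"""
words, notes = read_gconv_config(example)
print(words)
```

Splitting on whitespace means that options may be spread over as many lines as is convenient, which is how the sample configuration files are laid out.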
#GCONV_CONFIG
-Startseed=123 -MaxSimGlobalPval=0.05
-group GRPI S1 S2
-group GRPII H1 C1 I2
-group GRPII O1 O2
-group GRPIII G1 G2 R1 R2
The two-letter names are the names of the individual Alu sequences in the
alignment alumw.asf. For example, S1 and
S2 are the two spider monkey Alu sequences, and G1 and
G2 are from a gorilla chromosome.
#GCONV_CONFIG
camv9.asf -Seq_skip C4 CX
-Options Circular Startseed=123 DumpJseqs
Endoptions
The option -Circular (short form /c) tells GENECONV
that the endpoints of the alignment are physically adjacent, and that
GENECONV should look for ``round-the-corner'' fragments as well as
fragments that are contiguous within the alignment. Two of the nine
sequences, named C4 and CX in camv9.asf,
are discussed by Chenault and Melcher (1994) but are not considered here.
The sequence C4 is very similar to another sequence in the
alignment, and CX is very divergent from the others. The command
-Seq_skip C4 CX tells GENECONV to ignore these two sequences,
so that only seven of the nine sequences are used in the analysis. See skipping and listing sequences below for the syntax
of Seq_skip and the related command Seq_list.
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI CN;CD 0.0000 0.00006 8018 CA 18 111 28 0 420 None
GI CD;CS 0.0000 0.00011 7564 7963 400 33 0 361 None
GI CD;CS 0.0157 0.02561 6734 6943 210 23 0 361 None
GI CM;CD 0.0165 0.02933 7564 7756 193 19 0 415 None
Here ``GI'' stands for ``global inner (fragment)'' and
``BC'' means ``Bonferroni-corrected''. ``CA'' is
``circle around'' for ``round-the-corner'' fragments that overlap the
endpoints. Note that the most significant fragment encircles the
alignment endpoints. For any two sequences, Tot Difs is the
total number of sites at which the two sequences differ. (These are the
``discordant sites'' in finding gene conversion
events above.) ``Num Poly'' is the number of polymorphic
sites within the fragment.
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI CD;CS 0.0000 0.00000 7564 CA 18 565 67 4 361 3
GI CB;C1 0.0002 0.00035 5058 7039 1982 207 25 238 4
GI CN;CD 0.0003 0.00039 8018 CA 18 111 28 0 420 3
GI CN;CS 0.0056 0.01302 7290 CA 26 847 99 16 334 3
GI C1;CS 0.0244 0.05402 7290 7731 442 46 5 362 3
GI CM;CD 0.0486 0.09692 7564 7756 193 19 0 415 3
Note that the number of globally significant fragments has increased
from 4 to 6, and the number of highly significant global fragments
(P<0.01) from 2 to 4. The most significant global fragment has 4
internal mismatches and a mismatch penalty of 3 per mismatch. The next
most significant global inner fragment has 25 internal mismatches. There
is a new, highly significant, fragment involving two new sequences that
overlaps the endpoints. Some of the significant fragments are also
longer. Other significant fragments have disappeared because they are no
longer significant once mismatch penalties are allowed.
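The counts in this comparison can be reproduced by reading the Sim Pvalue column of the fragment list. The sketch below assumes only what the listing shows: data lines start with GI, followed by the sequence names and then the permutation P-value. The function name is hypothetical, and the rows are copied from the Gscale=1 table above.

```python
GSCALE1_TABLE = """\
GI  CD;CS  0.0000  0.00000  7564 CA 18   565   67   4  361  3
GI  CB;C1  0.0002  0.00035  5058  7039  1982  207  25  238  4
GI  CN;CD  0.0003  0.00039  8018 CA 18   111   28   0  420  3
GI  CN;CS  0.0056  0.01302  7290 CA 26   847   99  16  334  3
GI  C1;CS  0.0244  0.05402  7290  7731   442   46   5  362  3
GI  CM;CD  0.0486  0.09692  7564  7756   193   19   0  415  3
"""

def sim_pvalues(frags_text, prefix="GI"):
    """Collect the permutation (Sim) P-values from lines starting with prefix.

    Only the first three fields are used, so the CA ("circle around")
    offsets later in the line do not affect the parsing.
    """
    pvalues = []
    for line in frags_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == prefix:
            pvalues.append(float(fields[2]))
    return pvalues

pvals = sim_pvalues(GSCALE1_TABLE)
print(sum(p < 0.05 for p in pvals), sum(p < 0.01 for p in pvals))
```

Comment lines beginning with # are skipped automatically, since their first field is not GI.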
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI CD;CS 0.0000 1.56e-06 7564 CA 18 565 67 4 361 3
GI CB;C1 0.0002 3.48e-04 5058 7039 1982 207 25 238 4
GI CN;CD 0.0003 3.91e-04 8018 CA 18 111 28 0 420 3
GI CN;CS 0.0056 1.30e-02 7290 CA 26 847 99 16 334 3
GI C1;CS 0.0244 5.40e-02 7290 7731 442 46 5 362 3
GI CM;CD 0.0486 9.69e-02 7564 7756 193 19 0 415 3
This gives more information about highly significant KA P-values. If an
additional two digits of accuracy in the non-exponential format would be
enough, you can enter -WideCols:
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI CD;CS 0.0000 0.0000016 7564 CA 18 565 67 4 361 3
GI CB;C1 0.0002 0.0003481 5058 7039 1982 207 25 238 4
GI CN;CD 0.0003 0.0003912 8018 CA 18 111 28 0 420 3
GI CN;CS 0.0056 0.0130236 7290 CA 26 847 99 16 334 3
GI C1;CS 0.0244 0.0540173 7290 7731 442 46 5 362 3
GI CM;CD 0.0486 0.0969171 7564 7756 193 19 0 415 3
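The three KA P-value formats shown above (the default, -ExpFormat, and -WideCols) correspond to familiar fixed-point and exponential formats. The sketch below mimics them with Python format specifications so that the same value can be compared across the three listings. The function name and keyword arguments are hypothetical, and the precedence of the exponential format when both flags are set follows the statement that -ExpFormat is used if both options are entered.

```python
def format_ka_pvalue(p, exp_format=False, wide_cols=False):
    """Format a KA P-value in the style of the three fragment listings."""
    if exp_format:
        return f"{p:.2e}"   # as in the -ExpFormat listing
    if wide_cols:
        return f"{p:.7f}"   # two extra digits, as in the -WideCols listing
    return f"{p:.5f}"       # default: five decimal places

p = 1.56e-06
print(format_ka_pvalue(p))
print(format_ka_pvalue(p, exp_format=True))
print(format_ka_pvalue(p, wide_cols=True))
```

In the default format this value is indistinguishable from zero, which is why the other two formats are useful for highly significant fragments.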
# Seq Sim BC KA Aligned Offsets In Seq1 In Seq2 Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Begin End Len Begin End Len Poly Dif Difs Pen.
GI C1;CB 0.0002 0.00035 6000 7981 1982 5943 7924 1982 5940 7920 1981 207 25 238 4
GI C1;CS 0.0244 0.05402 8232 8673 442 8175 8596 422 8177 8589 413 46 5 362 3
GI CD;CM 0.0486 0.09692 8506 8698 193 8415 8605 191 8430 8620 191 19 0 415 3
GI CD;CN 0.0003 0.00039 8960 CA 960 111 8867 CA 960 109 8882 CA 960 109 28 0 420 3
GI CD;CS 0.0000 0.00000 8506 CA 960 565 8415 CA 960 561 8424 CA 960 561 67 4 361 3
GI CN;CS 0.0056 0.01302 8232 CA 968 847 8174 CA 968 825 8177 CA 968 816 99 16 334 3
CORRECTED SCORES CAN MAKE A DIFFERENCE
# BLAST-like scores (GI) used in global comparisons (gscale=0):
# S1;S2 BLAST = 1.2321*NPOLYS + 2.8332 MaxNpolys= 6
# S1;S3 BLAST = 3.1781*NPOLYS + 3.1355 MaxNpolys= 1
# S2;S3 BLAST = 0.4055*NPOLYS + 2.0794 MaxNpolys= 6
#
# where NPOLYS is the number of polymorphisms in the fragment.
# BLAST-like scores (GI) used in global comparisons (gscale=1):
# S1;S2 BLAST = 1.1600*SCORE + 2.5677 MaxScore= 6
# S1;S3 BLAST = 3.1764*SCORE + 3.1304 MaxScore= 1
# S2;S3 BLAST = 0.2101*SCORE + 0.2766 MaxScore= 6
#
# where SCORE = (npolys-ndifs)*1 + ndifs*(-mismatpen)
# Seq Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI Maz81;Maz56 0.0000 0.00041 874 1008 135 27 0 103 3
GI Maz56;Maz63 0.1649 0.56483 1 879 879 187 3 20 11
GI Maz95;Maz89 0.3121 > 1.0 271 444 174 40 10 126 2
GI Maz83;Maz56 0.6474 > 1.0 859 1008 150 31 5 106 3
GI Maz56;Maz89 0.9555 > 1.0 655 867 213 40 4 63 4
GI Maz87;Maz89 0.9621 > 1.0 289 321 33 7 0 133 2
GI Maz81;Mac1 0.9948 > 1.0 163 264 102 15 2 114 2
GI Maz87;Maz63 0.9998 > 1.0 283 309 27 6 0 139 2
#
# 8 inner fragments listed.
# 895 overlapping fragments discarded.
#
# ADDITIONAL PAIRWISE FRAGMENTS with BC Pairwise SimPval < 0.05
# OR LISTED GLOBAL FRAGMENTS with significantly better BC SimPval
# Here BC SimPval is the Pairwise Sim Pvalue multiplied by 28.
# (Three pairwise fragments considered per pair.)
#
# Seq BC Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
AI Maz56;Maz63 0.0028 0.56483 1 879 879 187 3 20 11
#
# One inner fragment listed.
The second fragment listed is between sequences Maz56 and
Maz63 at offsets 1-879 and has 187 silent polymorphic codon
positions with three mismatched silent codons. The remaining bases in the
range 1-879 are in amino-acid polymorphic codon positions. The final
codon position is amino-acid polymorphic, so that the fragment base range
is actually 1-876. If Seqtype=SILENT, GENECONV
treats monomorphic and amino-acid polymorphic codon positions in the same
way.
# Seq BC Sim BC KA Aligned Offsets Num Num Tot MisM
# Names Pvalue Pvalue Begin End Len Poly Dif Difs Pen.
GI Maz81;Maz56 0.0028 0.00001 878 1008 131 43 1 164 3
GI Maz56;Maz63 0.0028 0.10375 117 876 760 280 5 33 12
GI Maz83;Maz89 0.2912 0.41592 181 218 38 16 0 173 3
GI Maz56;Maz89 0.3724 > 1.0 655 869 215 74 7 95 4
# Global inner fragments (8110 aligned bases):
# (Sites with indels in both sequences are ignored.)
# Display: Before <Frag...ment> After
#
GI CN v CD (8018,CA18, len=111, Sim Pvalue=0.0000):
CN: TATACTATAA <GCTAAGGGAA....AGCCATGAAT> CGGTTTAAAG
CD: TATACTATAT <GCTAAGGGAA....AGCCATGAAT> AGGTCTATGA
GI CD v CS (7564, 7963, len=400, Sim Pvalue=0.0000):
CD: ACATTTCCAT <AATAATGTGT....AGGCCCTGTG> TAAGGTAAGA
CS: AT******** <AATAATGTGT....AGGCCCTGTG> CAAGGTAAGA
GI CD v CS (6734, 6943, len=210, Sim Pvalue=0.0157):
CD: AGGACTCATT <AAGACGATCT....TAAAAAGGTA> ATTCCTACAG
CS: AGGTCTCATC <AAGACGATCT....TAAAAAGGTA> GTTCCCACTG
GI CM v CD (7564, 7756, len=193, Sim Pvalue=0.0165):
CM: TTTTCTCCGT <AATAATGTGT....AGATCTTTGT> CGTGAATATA
CD: CATT*TCCAT <AATAATGTGT....AGATCTTTGT> GGTGAATATA
# Global outer-sequence fragments (8110 aligned bases):
# (Sites with indels are ignored.)
# Display: Before <Frag...ment> After
#
GO CJ (7564, 7602, len=39, Sim Pvalue=0.0000):
CJ: TTTTCTCCAT <AAATAATGTG....GGAAATTAGG> GTTCTTATAG
GO CN (6755, 6819, len=65, Sim Pvalue=0.0002):
CN: CCCGAGTAAT <AATCTCCAGG....TAGGACCTAA> CTGCATCAAG
GO CS (6139, 6191, len=53, Sim Pvalue=0.0052):
CS: TAAAGCCATC <GGACTTCTTA....CCTGAACCTA> GCAGTTCAGT