ABSTRACT
DNA translation frames can be disrupted for several reasons, including: (i)
errors in sequence determination; (ii) RNA processing, such as intron removal
and guide RNA editing; (iii) less commonly, polymerase frameshifting during
transcription or ribosomal frameshifting during translation. Frameshifts
frequently confound computational activities involving homologous sequences,
such as database searches and inferences on structure, function or phylogeny
made from multiple alignments. A dynamic alignment algorithm is reported here
which compares a protein profile (a residue scoring matrix for one or more
aligned sequences) against the three translation frames of a DNA strand,
allowing frameshifting. The algorithm has been incorporated into a new package,
WiseTools, for comparison of biological sequences. A protein profile can be
compared against either a DNA sequence or a protein sequence. The program
PairWise may be used interactively for alignment of any two sequence inputs.
SearchWise can perform combinations of searches through DNA or protein
databases by a protein profile or DNA sequence. Routine application of the
programs has revealed a set of database entries with frameshifts caused by
errors in sequence determination.
Comparative analysis of shared characters is undertaken in all biological
fields. It is the basic activity which underpins our current understanding of
evolutionary processes. As a consequence of the growth of nucleotide and
protein sequence databases, a huge expansion has now occurred in the specific
area of molecular comparative analysis. This is deployed in the hope of
revealing functional residues in protein families, inferring function for new
genes by detecting homology with better characterized genes and for
establishing phylogenies.
Database searches to extract homologous sequences are at the heart of sequence
analysis, hence a variety of methods have been developed and applied in widely
available packages or as network servers. In general, there is a trade-off between speed and sensitivity of the algorithms. The quick wordsearch
program FASTA (
1
) and the more recent and even faster BLAST (
2
) are now the workhorses of database searching. However, because of restrictions
on opening gaps, they are found to be less sensitive than the exhaustive but
slow Smith-Waterman algorithm (
3
), which finds a local alignment between two sequences that is mathematically
optimal for a given scoring scheme. On current workstations, one can compare a
protein sequence against a protein database by Smith-Waterman. However, without a fairly powerful machine, it is impractical
to do the most exhaustive search, i.e. against all possible translations of the
DNA databases. This is highly desirable, because protein searches are more
sensitive than DNA searches, yet the protein databases have consistently under-represented the data in the DNA databases.
Insertion or deletion of one or more bases in a DNA sequence causes shifts
between the translation frames (see
4
for a review). The possibility of frameshifts in DNA means that searching all
translation frames individually, as do TFASTA and TBLASTN, is suboptimal. The
problem is severe for genomic DNA from metazoans, where introns abound: the
next exon may be in any frame and an indeterminate distance away. It also
arises whenever a base is erroneously inserted or deleted. A number of studies
(
5
-
8
) have reported that frameshift errors are uncomfortably common. For example, a
systematic study of the SWISS-PROT database (
9
) revealed that at least one in 200 sequences had severe frameshifts (
7
). The current activity in generating libraries of randomly sequenced short cDNA
sequences, known as Expressed Sequence Tags or ESTs (
10
), has exacerbated this problem, since single gel readings are unreliable.
Undetected frameshift errors can have dramatic and deleterious consequences for
comparative sequence analysis. During multiple alignment, frameshifts cause
erroneous INDEL assignment, forcing local misalignment of other sequences.
Falsely truncated sequences may lead to domain boundary misassignment. Catalytic residues, normally absolutely conserved, may be erroneously ruled out due to
false substitution. Phylogenetic trees may acquire incorrect branching orders
and improbable branch lengths.
For all these reasons, we felt that there was a need for a versatile program
which could trace segments of amino acid sequence similarity when they were
present in more than one translation frame. Frameshift errors would be revealed
early in searches so that they would not degrade subsequent analyses. The
application should also be able to use multiple alignment-based protein profiles (reviewed in
11
), as well as single sequences.
In this manuscript, we present an algorithm for finding the optimal alignment of
a protein sequence or profile against all three DNA translation frames,
allowing frameshifts. In addition, the WiseTools program package is described,
which allows for the routine application of the algorithm in sequence alignment
and database searches. The WiseTools programs perform generalized sequence
comparisons among protein sequences, protein profiles and DNA sequences. The
use of the programs PairWise and SearchWise for dealing with frameshifts caused
either by errors or by introns is then outlined. The utility of these programs
is illustrated with a set of newly revealed database entries containing
frameshift errors.
The algorithm for protein-protein comparison is similar to the dynamic programming routines
employed by many sequence analysis programs (
12
,
13
), being a variant of the Smith-Waterman best local alignment algorithm (
3
). These algorithms all belong to the class known as minimal string edit
algorithms.
To standardize the program operations for comparisons using either a protein
sequence or an aligned sequence set, the profile concept (
14
) is employed. A profile of length N holds, for each position 1 to N in a set
of one or more aligned protein sequences, 20 scores, one for each possible
amino acid. Two additional scores per
position provide position-specific gap opening (GOP) and gap extension penalties (GEP). Typically,
gap penalty reductions are supplied for positions where gaps are already
observed (
14
,
15
).
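As a minimal sketch of the profile concept, the following Python fragment builds a profile from a small alignment. The exchange scores (4 for identity, -1 otherwise), the unweighted column average and the halving of both gap penalties at gapped columns are illustrative assumptions, not the PairWise or PROFILEWEIGHT procedure; the names build_profile and exchange are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exchange(a, b):
    """Toy exchange matrix (assumption): 4 for identity, -1 otherwise."""
    return 4 if a == b else -1


def build_profile(aligned, base_penalty=100.0, gap_reduction=0.5):
    """Build one profile entry per alignment column.

    Each entry holds 20 residue scores plus position-specific GOP and GEP.
    Columns where a gap ('-') is observed get reduced penalties.
    Assumes every column contains at least one residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        residues = [r for r in column if r != "-"]
        # Average exchange score of each amino acid against the column.
        scores = {
            aa: sum(exchange(aa, r) for r in residues) / len(residues)
            for aa in AMINO_ACIDS
        }
        factor = gap_reduction if len(residues) < len(column) else 1.0
        profile.append({"scores": scores,
                        "gop": base_penalty * factor,
                        "gep": base_penalty * factor})
    return profile


profile = build_profile(["ACD-F", "ACDEF", "GCD-F"])
```

A single sequence is simply the special case of an alignment with one row, which is why the same data structure serves both kinds of query.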
The Waterman-Eggert algorithm (
16
) to extract the top
k
subalignments has also been incorporated into PairWise. This allows the program
to report repeated domains.
DNA forward frames.
For the comparison against DNA, the protein is back-translated. The concept of a
codon profile for a protein (or alignment) is introduced. This is a set of 64
scores, one for each possible codon, at each position 1 to N, plus the gap
opening and gap extension penalties. A dynamic programming matrix is
then constructed from a DNA sequence against the codon profile. The scheme in
Figure
1
illustrates how the algorithm chooses between in-frame and jumped frame paths. The core of the algorithm is the iterative
calculation for the cell in the
i
th position down the profile versus the
j
th position along the DNA sequence (Fig.
1
). Each matrix cell has a score and a state which can be either MATCH,
PROFILEGAP, SEQGAP or FRAMEGAP. The state recorded for each cell is the one whose
expression wins the max calculation. The first four expressions of the max are the
standard in-frame start, match and two gap calculations, but with an offset of three
in the DNA dimension. Other features of this algorithm differ from more
standard dynamic matrices. First, the
j
- 2, and
j
- 1 movements cause frameshifting in the alignment. These frameshifts do
not count the shifted bases/codons in the overall score. Second, only one score
is calculated per cell, rather than a score for each different state for which
the max is then taken. This single score regime prevents the fortuitous
stringing together of matching segments with the large frame gaps allowed by
the low frame extension penalty required to jump introns. The frame jumping
behaviour is controlled by a frame opening penalty (FOP) and a frame extension
penalty (FEP). These penalties can be customized depending on the particular
alignment task.
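The single-score, frame-jumping recurrence can be illustrated with a simplified Python sketch. It is not the recurrence of Figure 1, which is not reproduced here: the codon scores (identity of the translated codon instead of a true codon profile), all penalty values, the mapping of the state names onto the moves and the function names are assumptions made for illustration only.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_score(residue, codon, match=10, mismatch=-5, stop=-15):
    """Toy stand-in for one codon-profile entry (hypothetical values)."""
    aa = CODON_TABLE.get(codon)
    if aa == "*":
        return stop
    return match if aa == residue else mismatch


def align_with_frameshifts(protein, dna, gop=20, gep=5, fop=25, fep=1):
    """Best local score of protein vs DNA, keeping one score and state per cell."""
    n, m = len(protein), len(dna)
    neg = -math.inf
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    state = [[None] * (m + 1) for _ in range(n + 1)]
    prev = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = neg, None

    def cost(i, j, kind, open_pen, ext_pen):
        # Extension applies only if the source cell is already in that state.
        return ext_pen if state[i][j] == kind else open_pen

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = []
            if j >= 3:  # in-frame moves use an offset of three bases
                s = codon_score(protein[i - 1], dna[j - 3:j])
                cands.append((s, "MATCH", None))  # start of a local alignment
                cands.append((score[i - 1][j - 3] + s, "MATCH", (i - 1, j - 3)))
                cands.append((score[i][j - 3] - cost(i, j - 3, "PROFILEGAP", gop, gep),
                              "PROFILEGAP", (i, j - 3)))
            cands.append((score[i - 1][j] - cost(i - 1, j, "SEQGAP", gop, gep),
                          "SEQGAP", (i - 1, j)))
            for shift in (1, 2):  # frame jumps: the skipped bases are not scored
                if j - shift >= 0:
                    cands.append((score[i][j - shift]
                                  - cost(i, j - shift, "FRAMEGAP", fop, fep),
                                  "FRAMEGAP", (i, j - shift)))
            score[i][j], state[i][j], prev[i][j] = max(cands, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back through the stored pointers to recover the state path.
    states, cell = [], best_cell
    while cell is not None:
        states.append(state[cell[0]][cell[1]])
        cell = prev[cell[0]][cell[1]]
    return best, states[::-1]


# One extra base inserted after the third codon forces a single frame jump.
shifted = align_with_frameshifts("MKWHED", "ATGAAATGGCCATGAAGAT")
```

With these toy values the frameshifted DNA still aligns across all six residues, paying one frame opening penalty, whereas a search restricted to a single frame would recover only three residues.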
The weighted sample variance, together with a measure of the total information
in the sequences, is used to lower GOP and GEP at INDELS as follows.
GOP and GEP are set at 100 if there is no INDEL. Otherwise,
$$ \mathrm{GOP} = \mathrm{GEP} = \frac{100}{s_{wtd} \cdot \log\left(1 + \sum \mathrm{SeqWeights}\right)} \qquad (2) $$
but if GOP > 50, it is reset to 50 and if GEP > 100, it is reset to 100.
Although heuristic, these gap penalty settings have the following desirable
properties. Gap penalties are high for short INDELs, low for long INDELs and
are lowered for alignments with many divergent sequences (where it becomes less
likely that gaps will open at novel sites). Importantly, the GEP is not lowered
at INDELs, where insertions are both rare and short. For example, single
residue gaps frequently correspond to a bulged β-strand, an over- or underwound turn of α-helix or a sequencing error: in each of these
cases, it would be wrong to lower the GEP.
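The INDEL-based penalty calculation can be sketched in Python as follows. The sketch takes the weighted standard deviation of the INDEL lengths and the GOP/GEP formula as stated in the text; the use of the natural logarithm, the unweighted mean INDEL length and the function names are assumptions, and at least two sequences are required for the denominator to be non-zero.

```python
import math


def weighted_indel_sd(lengths, weights):
    """Weighted standard deviation of INDEL lengths at one site."""
    mean_length = sum(lengths) / len(lengths)  # assumption: unweighted mean
    mean_weight = sum(weights) / len(weights)
    numerator = sum(w * (x - mean_length) ** 2
                    for x, w in zip(lengths, weights))
    return math.sqrt(numerator / (sum(weights) - mean_weight))


def indel_gap_penalties(lengths, weights):
    """Return (GOP, GEP) for one alignment position.

    lengths: INDEL length observed in each sequence (0 where none).
    weights: sequence weights.
    """
    if not any(lengths):           # no INDEL at this position
        return 100.0, 100.0
    s_wtd = weighted_indel_sd(lengths, weights)
    info = math.log(1 + sum(weights))  # assumption: natural logarithm
    raw = 100.0 / (s_wtd * info) if s_wtd > 0 else math.inf
    # Resets described in the text: GOP is capped at 50, GEP at 100.
    return min(raw, 50.0), min(raw, 100.0)


short_indel = indel_gap_penalties([1, 1, 1, 0], [1.0, 1.0, 1.0, 1.0])
long_indel = indel_gap_penalties([0, 5, 10, 20], [1.0, 1.0, 1.0, 1.0])
```

In this sketch the short, uniform INDEL keeps the full GEP of 100 with a GOP of 50, while the long, variable INDEL lowers both penalties to about 7.3.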
The profiles also include a suggested setting for the overall gap penalties that
works quite well with the automatic INDEL calculation above (for globular
protein sequences). The overall values are estimated from the mean range of the
amino acid exchange scores per position. This is helpful, as the profile matrix
values drop with increasing alignment divergence, as well as being dependent on
the given exchange matrix used. However, these suggestions should be treated as
a rough guide and the user should still fine tune the overall gap penalties for
a given family, e.g. testing 1.5-2-fold higher and lower values. For non-globular sequences, especially with a strong residue composition
bias, the defaults will be a poor guide to optimal penalties.
Modified relative mutabilities.
The amino acid exchange matrix used to build the profiles in PairWise may be
normalized so that all self-scores are the same. Each position in the matrix is normalized.
$$ \mathrm{Normalised\_value}_{i,j} = \mathrm{value}_{i,j} \cdot \frac{\mathrm{mean\_of\_identity\_scores}}{\sqrt{\mathrm{identity\_score}_i \cdot \mathrm{identity\_score}_j}} \qquad (3) $$
Differences in the identity scores provide a measure of the relative mutability
of each amino acid. However, given a multiple alignment, comparing the columns
shows which amino acids are conserved and which not: the column mutabilities
are in conflict with the relative mutabilities. Therefore, profiles for use in
alignment and for dotplots with PROPLOT (
15
) perform better with this normalization. However, the normalization does not
improve database searches: in this case the normalization would introduce noise
by biasing in favour of those amino acids that are both frequent and mutable,
such as asparagine, while penalizing those that are rare and poorly mutable,
such as tryptophan. Also, the normalization should not be applied to single
sequences, where there is no column mutability information. This normalization
is similar to one applied earlier (
17
) to the Dayhoff PAM 250 matrix (
18
).
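The normalization can be illustrated with a short Python sketch on a two-residue toy matrix. The matrix values and the function name are illustrative assumptions; the calculation requires all identity scores to be positive.

```python
import math


def normalise_exchange_matrix(matrix):
    """Rescale an exchange matrix so that all self-scores become equal.

    matrix maps residue pairs (a, b) to scores and must contain every
    identity pair (a, a) with a positive score.
    """
    residues = sorted({a for a, _ in matrix})
    mean_identity = sum(matrix[(a, a)] for a in residues) / len(residues)
    return {
        (a, b): value * mean_identity
        / math.sqrt(matrix[(a, a)] * matrix[(b, b)])
        for (a, b), value in matrix.items()
    }


# Toy values (assumption): a poorly mutable residue W and a mutable residue N.
toy = {("W", "W"): 11, ("N", "N"): 6, ("W", "N"): -4, ("N", "W"): -4}
normalised = normalise_exchange_matrix(toy)
```

After normalization both self-scores equal the mean identity score (8.5 in this toy case), so the matrix no longer carries relative mutability information.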
The name of the package and programs reflects the concept of generalized
pairwise
comparisons between proteins, alignments and DNA translations. The package
currently consists of two components: PairWise for interactive sequence and
profile comparisons and SearchWise for database searches. SearchWise is
actually two interlinked programs: the SearchWise menus program, used for parameter
set-up, and the actual database search program SWise, which the menus program submits to a batch
queue. Note that the package provides no new tools for comparing nucleotide
sequences against each other and this is not currently supported.
PairWise.
PairWise is an interactive, menu-driven program for aligning a protein profile (which may be a single
sequence) against a protein or DNA sequence. The program can also find repeats
using the Waterman-Eggert algorithm (
16
). Online help is available in all menus. In the MAIN menu, a sequence and
profile are read in and sequence ranges can be set. Moving to the ALIGNMENT
menu, there are options for the gap, frameshift and stop codon penalty
settings, screen or file output, the number of top alignments to be shown and
choice of DNA strand on which to perform the alignment. In the CONFIG menu, there are
options to change the residue substitution matrix and codon table and to choose
between default parameter settings for genomic, cDNA or EST, for which
different frameshift penalties are appropriate (Table
1
).
A BUILD menu allows protein alignments to be used to build new profiles
essentially as for the PROFILEWEIGHT program (
15
), but with two new options. Profiles built in the BUILD menu can be immediately
used for alignment. By default, profiles are calculated with the BLOSUM62
matrix (
19
), sequence weighting and automatic gap penalty reduction based on INDEL
variability. Submenus in BUILD allow parameters to be modified. The MATRIX menu
allows the amino acid exchange matrix to be varied and, if appropriate, to be
normalized for relative mutability. The WEIGHTING menu allows a choice between
several weighting schemes or none. The GAP PENALTY menu allows the type of gap
penalty at INDELs to be varied, as well as how to treat end gaps in alignments.
PairWise can be linked at compile time to GCG8 (
20
), whereupon it can extract sequences directly from GCG databases. Where the GCG
package is used as the main database management facility, PairWise can be used
as a closely integrated tool.
The SearchWise front end.
SearchWise provides a menu-driven front end for managing job submission to batch queues (or UNIX
background). The menus are aimed at simplifying the use of SearchWise for the
occasional computer user. The job parameters, sequence, database, penalties,
desired outputs, etc., are set up in menu options, then the batch job is
submitted to the specified queue. Defaults for all parameters are read from
file, so that the minimum input to initiate a search is a sequence and a
database. SearchWise is only appropriate for machines with batch queues or with
background operation, e.g. OpenVMS and UNIX. Online help is provided in each
menu.
The SearchWise search program SWise.
SWise is a command line-driven program to perform database searches which would normally be run on
batch queues or in background. It can be submitted from the SearchWise menus or
in an edited script or command file. The command line options are listed when
the user simply gives the command SWise. SWise allows various permutations of
search sequence and database.
Query              Database
                   DNA    PROTEIN
DNA Seq            N      Y
PROTEIN Seq        Y      Y
PROTEIN Profile    Y      Y
Table 1
SWise output consists of one obligatory file, the high score list, and two
optional files, the corresponding top alignments and a list of HTML links to
the ENTREZ WWW facility (
21
). The latter allows the user to peruse the entries and take advantage of
further links in exploring the hits.
WiseTools programming, distribution and information.
WiseTools is written in ANSI C as a series of modules linked by front end
programs (also ANSI C), hence it is in principle portable to any suitable
hardware platform. WiseTools programs have been run on the following platforms:
DEC alpha and OpenVMS v. 6.1; DEC Vax 3000 and OpenVMS v. 6.1; DEC alpha and
OSF/1; SGI and IRIX 5.2; Sun and Solaris 2.3. Binaries are provided for these
platforms. For the alignments, at least 32 MB of main memory should be
available. Mac and PC versions are considered for the future, but are not
currently supported (multitasking is highly desirable). SearchWise can read the
following database files: GCG binary .seq files, GCG ASCII .seq files, Fasta
format and EMBL/SWISS-PROT .dat format.
The C files for WiseTools v. 1.5 are available via anonymous ftp to
nmrz.ocms.ox.ac.uk in the directory /pub/wise. Comprehensive help
(including installation instructions) is provided on the WWW at
http://www.sanger.ac.uk/~birney/wise/topwise.html.
Profiles for the PH domain, PHD finger and RNP domain were prepared with the
PairWise build menus using alignments based on those reported (
22
-
24
). The Gonnet PAM250 matrix (
25
) was used together with sequence weighting (
15
) and default INDEL penalties. The recommended gap penalty settings were then
used for preparing the alignment figures. See Bork and Gibson (
11
) for some guidelines on residue exchange matrix choice and parameter set-up in profile searches.
Frameshift errors are usually detected by comparing homologous sequences with
each other. With PairWise, a DNA sequence is compared to the protein sequences
of homologues. If one sequence consistently jumps frame in a particular region
when compared to the related sequences, majority rule assigns it to be the
guilty sequence. This verdict should be safe when aligning proteins with >50%
identity, but should be issued with caution when comparing highly divergent
homologues (e.g. <25% identity): in such cases, short random matches in other frames may
occasionally score higher than correct but low scoring sequences. Therefore, it
is particularly important not to set the FOP too low. For highly divergent
proteins, a profile prepared from the rest of the family is a more reliable
probe for frameshifting than a straight sequence comparison.
The routine application of SearchWise in database screens for domains and
proteins of interest at EMBL has resulted in the detection
en passant
of a number of frameshifted entries, almost all of which can be ascribed to
sequencing error. Figure
3
shows a PairWise comparison of two closely related G2-specific cyclins B from starfish species. For much of the cyclin box, the
sequences are >90% identical. However, the C-terminus in particular makes multiple frameshifts and there are at least
13 sites of nucleotide insertion/deletion, probably rather more. Comparison of
both sequences with other cyclins B shows that the EMBL entry APCYCLI (
31
) has all the sequencing errors. Before we noticed the frameshifts, this entry
caused us severe problems. Not only did it hinder our attempts to align the
cyclin family, but, until its removal from the search set, the introduced noise
precluded profile searches from revealing the cyclin/TFIIB/RB multidomain
family (
32
).
The Waterman-Eggert algorithm (
16
) returns the top
k
non-overlapping alignments in a sequence comparison. In a self-comparison of a highly repeated sequence, the algorithm does not
yield alignments of the individual repeats, but returns instead alignments
between subsets of the repeats. Thus the second best alignment is the set
consisting of repeat 1 to repeat
n
- 1 aligned with repeats 2 to
n
, and so on. Therefore, the algorithm reveals the existence of the repeats, but
does not return them.
In a comparison of a domain profile against a sequence containing multiple
domains, the algorithm works well. In this case, the top
k
non-overlapping alignments should correspond to the top
k
individual repeats (except in extremely awkward cases, such as very long
insertions). This facility in PairWise was used extensively in the analysis of
the PHD finger, which often occurs multiply in a protein sequence (
23
). Figure
4
illustrates the four PHD fingers in
Drosophila
Trx protein as detected by PairWise comparison with a PHD finger profile.
PairWise can detect these repeats equally well in the Trx DNA.
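The idea of extracting the top k non-overlapping alignments can be sketched in Python as follows. This is a simplified illustration, not the Waterman-Eggert implementation used in PairWise: it recomputes the whole matrix after each alignment instead of only the affected region, compares two plain sequences with hypothetical match, mismatch and linear gap scores, and uses a hypothetical function name.

```python
def top_k_local_alignments(a, b, k, match=2, mismatch=-1, gap=-2):
    """Return up to k local alignments of a against b that share no cells.

    Each result is (score, (a_start, a_end), (b_start, b_end)) with
    0-based, half-open ranges.
    """
    used = set()  # matrix cells claimed by alignments already reported
    results = []
    n, m = len(a), len(b)
    for _ in range(k):
        h = [[0] * (m + 1) for _ in range(n + 1)]
        best, best_cell = 0, None
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if (i, j) in used:
                    continue  # forbidden cell stays at zero
                sub = match if a[i - 1] == b[j - 1] else mismatch
                h[i][j] = max(0, h[i - 1][j - 1] + sub,
                              h[i - 1][j] + gap, h[i][j - 1] + gap)
                if h[i][j] > best:
                    best, best_cell = h[i][j], (i, j)
        if best_cell is None:
            break  # nothing left that scores above zero
        i, j = best_cell
        end_i, end_j = i, j
        # Trace back, marking every cell on the path as used.
        while i > 0 and j > 0 and h[i][j] > 0:
            used.add((i, j))
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if h[i][j] == h[i - 1][j - 1] + sub:
                i, j = i - 1, j - 1
            elif h[i][j] == h[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        results.append((best, (i, end_i), (j, end_j)))
    return results


# A short "domain" compared against a sequence containing it twice.
hits = top_k_local_alignments("WCH", "WCHAAAWCH", k=2)
```

Here the two reported alignments cover the two separate copies of the query in the longer sequence, mirroring the case of a domain profile compared against a multidomain protein.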
We have outlined the development and application of a general purpose sequence
comparison package, WiseTools, that is suitable for a range of comparative
analyses using amino acid sequence information, whether this is in the form of
protein sequence, protein multiple alignment or encoded in DNA. We now discuss
some general issues arising from this work.
SearchWise has been applied in profile searches with several protein domain
families (
22
,
23
,
36
-
38
). The ability to detect domains in newly submitted DNA entries helped to
provide up-to-date lists of the domains, including domains detected in unannotated
regions of DNA sequences. The frameshifting capability allowed a number of
frameshifted ESTs to be detected, while PairWise comparisons of problematic
sequences alerted us to a number of frameshifted cDNA entries (Table
2
).
SearchWise allows a DNA query to be compared to a protein database. This
facility should be useful for elucidating the coding contents of cDNAs
(including ESTs) and short regions of genomic DNA. As well as revealing
similarities to encoded proteins, we anticipate that routine application with
newly generated sequence data would be useful in revealing frameshifts before
the sequences reach the databases. Nevertheless, this is a last line of
defence, not a panacea: it cannot replace proper and conscientious sequence
determination, most particularly the full determination of a sequence on both
DNA strands.
The aim of the WiseTools package is to provide a general purpose and sensitive
approach to routine sequence comparison, which will reveal frameshifts as they
occur in matching sequences. Other methods are available which may do some of
these tasks comparably or do them faster but with less sensitivity. For
example, standard database searches with both TBLASTN and (to a lesser extent)
TFASTA are capable of detecting obvious high scoring frameshifts. In a
complementary approach to error detection by homology, there are several
programs that use coding preference statistics to assign likely reading frames.
Claverie (
7
) has developed special substitution matrices representing frameshifted codons
for protein-protein comparison using BLAST. This is a very quick method to detect
relatively long frameshifts in protein databases. The speed of BLASTP enabled
this approach to be used in an exhaustive search for frameshifts in the SWISS-PROT database.
The DETECT program of Posfai and Roberts (
6
) does a fast pattern search over all reading frames separately. The method is
sensitive to fairly short frameshifts with high identity or longer frameshifts
with lower identity and is applicable with introns. New DNA sequence can be
submitted as the query against a protein database.
States and Botstein (
5
) have developed a method for pairwise comparison in which a frameshifting Smith-Waterman algorithm uses probability tables for bases in each codon
position. Recently, Guan and Uberbacher (
39
) have implemented a jumping, three frame Smith-Waterman comparison. These methods should be comparably sensitive to
PairWise in detection of short frameshifted regions, as in the severe multiple
frameshifts of the cyclin in Figure
3
. They are not designed for intron-containing genomic sequences, having no frame extension penalty.
Several methods have been developed which use codon preferences to predict
translation frames, providing independent means of detecting frameshifts. The
Staden package has for some time provided a useful graphical representation of
reading frame codon preference (
40
) which we have used for frameshift error analysis (
41
). GRAIL II (
42
) applies coding statistics in a dynamic programming routine to find
frameshifts. Codon frequency information has also been applied in a wordsearch
algorithm (
43
). Due to the relatively weak statistical signal, these methods are likely to be
less sensitive than WiseTools comparisons under many conditions, but do not
require a homologous sequence. Therefore, they provide independent ways to
analyse frameshifts in situations where WiseTools cannot be applied or provides
ambiguous output.
Two studies have come up with estimates of the frameshift error frequencies in
database entries by comparison of homologous sequences. Claverie (
7
) reported a frameshift frequency of ≥0.5% in SWISS-PROT v. 24 (28 154 sequences). Posfai and Roberts (
6
) found 156 problems in a set of 6000 bacterial unidentified ORFs which were
thought likely to be of below average reliability.
We intend to apply WiseTools in a systematic analysis of frameshifts. So far we
have found that virtually every protein family we look at throws up problem
sequences, for example, two frameshifts in Jak kinases, one in axl-like RPTKs (
44
), one in spectrins and one in each of the galactose utilization enzymes, which
are all small database groupings. From the average size of the various families
in Table
2
, it appears that problems are occurring in at least one in 20 protein
sequences, which implies that SWISS-PROT v. 32 has ~2000 frameshifted entries. While this is hopefully an upper limit due
to sampling bias in Table
2
, it highlights the severity of the problem in protein databases.
The majority of the sequences in SWISS-PROT come from small scale, targeted gene sequencing. However, some of the
large scale sequencing projects have been shown to have high error rates. In
particular, substantial problems were found for the first sequenced yeast
chromosome, the partially sequenced
Escherichia coli
chromosome and recently for the
Haemophilus influenzae
genome (
45
). Therefore, the error rate in some genomic DNA sequence projects may be even
higher than for the derived protein databases.
There are several clear improvements which could enhance the usefulness of the
software as described here. Performance improvements would be gained by
incorporating memory-efficient alignment into PairWise and by parallelization of SearchWise.
A frequent suggestion is to lower the FOP at splice recognition sequences. So
far, we have avoided taking this step for the following reasons. Since splice
site consensi vary between organisms, and even between cellular differentiation
states, it is impossible to conduct comprehensive database searches with
accurate frame opening costs at splice sites. Furthermore, in organisms such as
yeast, the splice junction validity is contingent upon a third motif within the
intron, at the branch site. The latter motif, based on the hexamer ACTAAC, is
much more highly conserved than the splice junctions themselves. Clearly, the
logic needed to accurately pre-screen sequences for splice junctions would add a considerable
computational load. Nevertheless, penalty reduction with appropriate splice
consensi would be very useful for genomic sequence analysis with PairWise.
A desirable upgrade for SearchWise would be a capability to search libraries of
protein family profiles. Searching calibrated profile databases provides a fast
and extremely sensitive way to describe the protein and domain classes in a
sequence. (For example, see the experimental ProfileScan server at ISREC - http://ulrec3.unil.ch/.) Adding this facility would be timely, since
future releases of the PROSITE database (
46
) will supply profile matrices.
The ideas and programs presented here benefited greatly from discussions and
tests involving many colleagues. We particularly wish to thank Adrian Krainer,
Des Higgins, Rein Aasland, Kay Hoffman, Liz Cowe, Jasper Rees, Thure Etzold and
everyone on the WiseTools mailing list. We also thank Rolf Apweiler and Amos
Bairoch for prompt annotation of frameshifted SWISS-PROT entries. Finally we thank Kevin Leonard for supporting this project.
Gap penalties based on observed INDEL length in multiple alignments.
The gap penalties are variable at two points: position-specific relative values are provided with each position in the profiles,
while the overall gap parameters are set in the menus. By default, profiles
prepared with the PairWise build menu supply local gap penalties varied
according to the observed tolerance for insertions and deletions in an
alignment. These penalties are suggested for use with globular proteins, where
INDEL behaviour can be understood in the light of structural and functional
restraints: these penalties might not perform well with other classes of
protein. At each INDEL (site of insertion or deletion) the sample variance of
the INDEL lengths is obtained, correcting for sequence bias by weighting
according to sequence divergence.
$$ s_{wtd} = \sqrt{\frac{\sum_{i=1}^{n} \mathrm{SeqWeight}_i \left(\mathrm{Length}_i - \overline{\mathrm{Length}}\right)^2}{\sum \mathrm{SeqWeights} - \overline{\mathrm{SeqWeight}}}} \qquad (1) $$
             Single sequence / Profile from a multiple alignment
Penalty      Default starting   Eukaryotic     Bacterial genomic   High error
             set-up             genomic DNA    or cDNA             (ESTs etc.)
GOP          1000               1000           1000                1000
GEP          100                100            100                 100
FOP          1200               750            850                 600
FEP          2                  1 (or 0)       600                 200
Stop codon   500                500            500                 150
Genomic DNA from eukaryotes.
Most genes from multicellular organisms are interrupted by several, often many,
introns. These are less common in simpler eukaryotes, such as yeast (although
here they have been consistently under-reported). Some 20 million bases of genomic DNA from
Caenorhabditis elegans
alone have now been deposited in the EMBL database and most genes are heavily
spliced (
26
). The optimal settings for penalties are FOP > GOP, but FEP < GEP (if necessary, set FEP = ZERO for organisms with long introns) as
illustrated in Table
1
. With these settings, the alignment stays in-frame for moderate in-frame INDELS, yet it can jump frame and extend an arbitrary distance
to the next exon, almost regardless of intron length. Figure
2
shows a profile, made with the build menu from the collection of RNP domains (
24
), aligned by PairWise to the first RNP domain of the human hnRNPA1
gene, which is split by an intron. The scores for the individual exons are 4590
and 4969, compared with the score for the whole alignment, 8491. The latter
score, but not the subscores, allows the sequence to be detected in a database
search. Note that the algorithm jumps near to, but not exactly at, the splice
junctions in the RNP domain. The program currently has no intrinsic knowledge
of splice junctions. Therefore, the splice sites should be verified by
reference to splice consensi for the relevant organism. This can be done by
inspection or algorithmically, e.g. with the Staden analysis package (
27
), GRAIL (
28
), GeneParser (
29
), etc.
REFERENCES



