Nucleic Acids Research, 2002, Vol. 30, No. 17 3901-3916
© 2002 Oxford University Press

Dictionary-driven protein annotation

Isidore Rigoutsos*, Tien Huynh, Aris Floratos¹, Laxmi Parida and Daniel Platt

Bioinformatics and Pattern Discovery Group, IBM TJ Watson Research Center, Yorktown Heights, NY 10598, USA and ¹First Genetic Trust Inc., 9 Polito Avenue, Lyndhurst, NJ 07071, USA

*To whom correspondence should be addressed. Tel: +1 914 945 1384; Fax: +1 914 945 4104; Email: rigoutso{at}us.ibm.com

Computational methods that automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from its sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach, built around the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.
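
The general idea of annotating a query from a dictionary of patterns, with scores accumulated per amino acid position, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the Bio-Dictionary format or the weighting scheme, so the toy patterns, labels, weights and the function names `annotate` and `best_labels` below are all hypothetical. Each pattern uses `.` as a single-residue wildcard, and only the literal positions of a matching pattern contribute to the score.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary of (pattern, label, weight) entries.
# '.' is a wildcard that matches any single residue.
TOY_DICTIONARY = [
    ("G..G.GK", "nucleotide-binding", 2.0),
    ("LL.LL", "helix", 1.0),
    ("C..C", "metal-binding", 1.5),
]


def annotate(query, dictionary):
    """Return, for each query position, a dict of label -> accumulated weight."""
    scores = [defaultdict(float) for _ in query]
    for pattern, label, weight in dictionary:
        # A lookahead is used so that overlapping occurrences are also found.
        regex = re.compile("(?=(" + pattern + "))")
        for match in regex.finditer(query):
            start = match.start()
            for offset, symbol in enumerate(pattern):
                if symbol != ".":  # only literal positions cast a vote
                    scores[start + offset][label] += weight
    return scores


def best_labels(scores):
    """Pick the highest-scoring label at each position, or None if unscored."""
    return [max(s, key=s.get) if s else None for s in scores]


if __name__ == "__main__":
    query = "MGAAGSGKLLALLC"  # made-up sequence for illustration
    for position, label in enumerate(best_labels(annotate(query, TOY_DICTIONARY))):
        print(position, query[position], label)
```

Because every pattern occurrence is scored in one scan of the query, several kinds of annotation can be read off the same per-position table, which loosely mirrors the single-pass behavior described in the abstract.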

