Nucleic Acids Research
New features of the Blocks Database servers
ABSTRACT
INTRODUCTION
The Blocks Database was originally introduced to aid in the family classification of proteins (1). Blocks are ungapped multiple alignments corresponding to the most conserved regions of proteins. To construct the Blocks Database, lists of family members obtained from Prosite (2) are used to find representative sets of blocks with a fully automated motif-finding method that does not use Prosite patterns. Blocks v. 11.0 contains 4034 blocks representing 994 protein families. The Blocks Database can be searched for sequence similarities using protein or nucleic acid queries. For searching, the Blocks Database is augmented with 1781 blocks corresponding to the ungapped multiple alignments of 287 families from the PRINTS fingerprint database (3) that are absent from Prosite. In recent years, the Blocks Database has been enhanced with the addition of searching and analysis tools (summarized in Fig. 1).
BLOCKS-BASED ANALYSIS OF FAMILIES NOT IN THE BLOCKS DATABASE
Currently, the Blocks Database is keyed to curated catalogs of protein families: the Prosite and PRINTS databases. Families are chosen for inclusion in these catalogs manually, and because of the rigors of curation many known protein families are absent. Often a biologist is interested in a protein family that has not been included in the Blocks Database. Undoubtedly, the recent explosion of data from whole-genome sequencing has added to the difficulty of curation. Yet not all of the missing families are newly discovered: some families that have been studied and documented for many years have inexplicably been omitted from these compendia. For example, the enzyme catalyzing the fifth step in the universal pathway for de novo purine biosynthesis, aminoimidazole ribonucleotide synthetase (AIRS, EC 6.3.3.1), is not represented, even though other enzymes in the pathway that are often present on multi-enzyme proteins containing AIRS are represented in the Blocks Database. We expect soon to expand the Blocks Database by including uncurated families obtained using fully automated methods. Meanwhile, the Blocks Multiple Alignment Processor provides tools for analyzing and searching families that are not represented in the Blocks Database: user-provided multiple alignments generated by any means are converted by the Processor to blocks for analysis.
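A minimal Python sketch of that alignment-to-blocks conversion follows. It assumes the rule given for the AIRS example in the next paragraph (discard every alignment column in which any sequence has a gap, then keep ungapped runs at least 10 columns wide). The function name `carve_blocks`, the treatment of `.` as an additional gap character and the input format (a list of equal-length aligned strings) are illustrative assumptions, not the Processor's actual code.

```python
from typing import List, Tuple

# Assumption: '-' (as in the text) and '.' both count as gap characters.
GAP_CHARS = frozenset("-.")


def carve_blocks(alignment: List[str], min_width: int = 10) -> List[Tuple[int, List[str]]]:
    """Hypothetical helper: return (start_column, rows) for each ungapped block.

    A column is kept only if no sequence has a gap in it; runs of kept
    columns at least `min_width` wide become blocks.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must all have the same length")

    # True where every sequence has a residue (no gap) in that column.
    ungapped = [all(row[c] not in GAP_CHARS for row in alignment) for c in range(length)]

    blocks: List[Tuple[int, List[str]]] = []
    start = None
    # A trailing False sentinel closes a run that reaches the last column.
    for c, keep in enumerate(ungapped + [False]):
        if keep and start is None:
            start = c
        elif not keep and start is not None:
            if c - start >= min_width:
                blocks.append((start, [row[start:c] for row in alignment]))
            start = None
    return blocks


if __name__ == "__main__":
    toy = ["AB-CDEFG", "ABXCDE-G"]
    print(carve_blocks(toy, min_width=3))  # [(3, ['CDE', 'CDE'])]
```

The toy call lowers `min_width` to 3 only so that the example stays short; the 10-column default mirrors the width threshold stated for the AIRS example.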
Figure 1. Overview of the Blocks Database. Input sources (left) and applications (right) of the Blocks Database.
We use AIRS as an example of a protein family that can be analyzed with the blocks-based tools available on the Blocks WWW site. Protein family databases constructed using fully automated methods, such as ProDom (7) and Domo (8), provide multiple sequence alignments representing the AIRS domain; these alignments can be accessed via hypertext links from the database entries of sequences containing the AIRS domain, for example, the SwissProt entry for Escherichia coli AIRS, PUR5_ECOLI. When the AIRS multiple alignment from Domo is pasted into the Blocks Multiple Alignment Processor window and submitted, six blocks are returned. The blocks are carved out of the Domo multiple alignment by removing every alignment column that has a gap (-) character in one or more sequences, and the resulting blocks are retained if they are at least 10 columns wide. These blocks can be examined directly in their conventional text representation or, more informatively, displayed as sequence logos (Fig. 2).
Figure 2. Sequence logo of AIRS Block B and location of CODEHOP-designed primers. In the logo, the height of each amino acid is scaled in bits of information and is proportional to its degree of conservation. A pair of primers is schematically aligned with the two block segments from which they were designed. For each primer, the 5′ consensus clamp is depicted as an open line (corresponding to the sequence in standard text), and the 3′ degenerate core is depicted as a solid line (corresponding to the sequence in white-on-black letters, using the IUPAC code for degenerate positions). These primers were found for regions in the second AIRS block corresponding to positions 59-67 and 98-109 in SwissProt entry PUR5_ECOLI. Default parameters were used for maximum degeneracy (128) and melting temperature of the consensus clamp (60°C). A core strictness value of 0.05 was chosen, where the range is 0 to 1 (a setting of 0 stipulates that all possible coding sequences for residues in the core region of the block are present in the pool of primers). The zebrafish codon usage table was chosen.
Searching options that are available for the Blocks Database are also available for user-submitted families, including COBBLER [COnsensus Biasing By Locally Embedding Residues (10)], MAST [Multiple Alignment Searching Tool (11)] and LAMA [Local Alignment of Multiple Alignments (12)]. COBBLER-based searches are carried out by automatically embedding the blocks into a single sequence and then sending the resulting sequence to either the BLAST or the PSI-BLAST server. In the case of AIRS, the six Domo-derived blocks were embedded into PUR5_ECOLI and sent to PSI-BLAST. In addition to detecting several more AIRS-containing proteins beyond the 10 bacterial and eukaryotic members present in Domo, PSI-BLAST finds likely AIRS domains in many other organisms, including archaea, and in multifunctional eukaryotic proteins, pinpointing the location of single or tandem AIRS domains in each one. PSI-BLAST also identifies statistically significant sequence similarities between the AIRS domain and hydrogenases in bacteria and archaea, between the first half of the AIRS domain and bacterial selenophosphate synthetases, and between AIRS and part of the previous enzyme in the pathway, FGAR synthetase. Are these true homology relationships or chance similarities? One way to decide is to choose the MAST option, which sends position-specific scoring matrices corresponding to each of the six AIRS blocks to the MAST server (http://www.sdsc.edu/meme/meme.2.0/website/mast.html). In general, searching a sequence database with such a block-based query separates true from false positives better than searching with a query consisting of a single sequence representing the family (10,13).
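The idea behind a block-based query can be sketched as follows: each block is turned into a position-specific scoring matrix (PSSM) of log-odds scores, and a sequence is scored by its best ungapped window against that matrix. This is a simplified illustration under stated assumptions (uniform background frequencies, a flat pseudocount, hypothetical function names `block_to_pssm` and `best_window`). It is not MAST's own procedure, which additionally converts match scores to statistical significance and combines evidence across all of the blocks.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds score (bits) for every residue in every block column.

    Assumptions: uniform background frequencies and a flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*block):  # iterate over the block's columns
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: List[Dict[str, float]], sequence: str) -> Tuple[float, int]:
    """Highest-scoring ungapped window of `sequence` and its 0-based start.

    Residues outside AMINO_ACIDS (e.g. X) get the column's lowest score.
    Returns (-inf, -1) if the sequence is shorter than the block.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, min(col.values())) for col, aa in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    pssm = block_to_pssm(["GKT", "GKS", "GRT"])  # toy three-sequence block
    print(best_window(pssm, "AAGKTAA"))  # best window starts at index 2
```

Residues that are frequent in a block column score positively and residues never seen there score negatively, so a window matching the conserved pattern stands out; this column-by-column weighting is what a single query sequence cannot provide.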
MAST detects the same AIRS homologs as PSI-BLAST with expected values E < 10⁻¹¹, but the highest-scoring hydrogenase is detected only at E = 0.4, the highest-scoring selenophosphate synthetase at E = 6.6, and FGAR synthetase is not detected at all. This clean separation of AIRS from non-AIRS proteins makes MAST useful for identifying homologs that share the same enzymatic activity. Because the MAST results do not confirm the COBBLER/PSI-BLAST hits to non-AIRS enzymes, these hits should be viewed with extreme caution, although further evidence may yet confirm a structural similarity.
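The sequence logo described in the Figure 2 legend scales each residue by bits of information. A minimal sketch of that calculation follows, assuming the common formulation in which a column's information content is log2(20) minus the Shannon entropy of its residue frequencies, and each residue's height is its frequency times that value. The small-sample correction applied by logo programs is omitted, and `logo_heights` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def logo_heights(block: List[str], alphabet_size: int = 20) -> List[Dict[str, float]]:
    """Per-column residue heights (bits) for a sequence logo of `block`.

    Column information R = log2(alphabet_size) - H, where H is the Shannon
    entropy of the observed residue frequencies; each residue's height is
    its frequency times R. No small-sample correction is applied.
    """
    max_bits = math.log2(alphabet_size)
    heights = []
    for column in zip(*block):
        counts = Counter(column)
        n = len(column)
        freqs = {aa: c / n for aa, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        heights.append({aa: f * info for aa, f in freqs.items()})
    return heights


if __name__ == "__main__":
    # Column 0 is fully conserved (about 4.32 bits); column 1 is a 50/50 mix.
    for i, column in enumerate(logo_heights(["GA", "GC", "GA", "GC"])):
        print(i, {aa: round(h, 2) for aa, h in column.items()})
```

A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits, which is why the tallest stacks in a logo mark the most conserved positions of a block.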
CODEHOP PCR PRIMER DESIGN
Short, highly conserved regions of proteins are frequently used to isolate homologs from genomes of interest by designing PCR primers from blocks. Over the years, various rules of thumb have been applied to the design of degenerate primers for this purpose; however, the development of systematic methods has been stymied by unknown factors, such as the effect of mismatches at various positions of a primer on annealing temperature. Recently, our group introduced a new method for PCR primer design in which degeneracy is confined to a short 3′ 'core', while a non-degenerate 5′ 'clamp' stabilizes annealing of the core to the starting template (14). To maximize stabilization, the clamp consists of a consensus sequence designed from the region of the block immediately upstream of the region used to design the core. In subsequent rounds, when primers must anneal to product molecules that have incorporated a primer, high-stringency priming occurs because the clamp is a single non-degenerate sequence. This differs from conventional degenerate PCR, where low annealing temperatures are used so that all of the primers can anneal to product templates that have incorporated different degenerate primers. Moreover, the use of a short degenerate core of only 11-12 bp minimizes the length of conservation needed for successful amplification, permitting primers to be designed from blocks that are too diverged for the practical design of conventional primers. The CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primers) method has been validated by the successful amplification of products that had proven challenging to obtain using conventional methods (14).
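To illustrate what a degenerate core looks like, the sketch below back-translates a short peptide into a single IUPAC-coded oligonucleotide using the standard genetic code and reports its degeneracy (the number of distinct sequences in the primer pool). Including every codon roughly corresponds to the core strictness of 0 described in the Figure 2 legend; the actual designer also uses a codon usage table and a strictness setting to leave out rare codons, which this sketch does not model. The function names are hypothetical.

```python
import math
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_codon(aa: str) -> Tuple[str, int]:
    """IUPAC codon covering every codon for `aa`, plus its degeneracy."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    # Union of the bases seen at each of the three codon positions.
    columns = [frozenset(codon[i] for codon in codons) for i in range(3)]
    return "".join(IUPAC[col] for col in columns), math.prod(len(col) for col in columns)


def degenerate_oligo(peptide: str) -> Tuple[str, int]:
    """Back-translate `peptide` into one degenerate oligo and its total degeneracy."""
    parts = [degenerate_codon(aa) for aa in peptide]
    return "".join(p[0] for p in parts), math.prod(p[1] for p in parts)


if __name__ == "__main__":
    oligo, degeneracy = degenerate_oligo("MWKF")
    print(oligo, degeneracy)  # ATGTGGAARTTY 4
```

A four-residue peptide gives a 12-nt oligo, in the 11-12 bp range quoted for the core. A candidate could be rejected when its degeneracy exceeds the default maximum of 128 given in the Figure 2 legend. Note that the position-wise union can admit extra codons: Leu becomes YTN, which also covers the Phe codons TTT and TTC.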
CODEHOPs are designed automatically by a program that predicts optimal primers given a set of blocks. Hypertext links to the CODEHOP designer are provided from the Blocks and PRINTS Databases, BlockMaker and the Multiple Alignment Processor. Several options are available for customizing primer design, including choice of codon usage table for the target genome, choice of annealing temperature, which determines the length of the clamp, choices for the degree of degeneracy and stringency of matches to the block in the core region, and the ability to change the weights of input sequences to favor a subset of interest.
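How the annealing-temperature option could set the clamp length can be sketched as follows: starting next to the core and moving 5′ along the consensus sequence, nucleotides are added until the clamp's estimated melting temperature reaches the target (60°C by default, per the Figure 2 legend). The Wallace rule (2°C per A or T, 4°C per G or C) used here is a deliberately simple stand-in chosen for illustration. This article does not describe the CODEHOP designer's own Tm calculation, which may differ, and `design_clamp` is a hypothetical name.

```python
from typing import Optional


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def design_clamp(upstream_consensus: str, target_tm: float = 60.0) -> Optional[str]:
    """Shortest 3'-terminal piece of the upstream consensus reaching `target_tm`.

    The clamp sits immediately 5' of the core, so it is grown from the end of
    `upstream_consensus` that abuts the core. Returns None if even the whole
    consensus falls short of the target.
    """
    for length in range(1, len(upstream_consensus) + 1):
        clamp = upstream_consensus[-length:]
        if wallace_tm(clamp) >= target_tm:
            return clamp
    return None


if __name__ == "__main__":
    consensus = "ATGC" * 10  # toy upstream consensus, written 5'->3'
    clamp = design_clamp(consensus, target_tm=60)
    print(clamp, len(clamp), wallace_tm(clamp))  # ATGCATGCATGCATGCATGC 20 60
```

A higher target temperature yields a longer clamp, which is the sense in which the annealing temperature "determines the length of the clamp"; the complete primer would then be the clamp followed by the degenerate core.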
The use of the CODEHOP designer is illustrated for the AIRS blocks described above. Optimal CODEHOP primers are found for what appear to be the most highly conserved regions, which lie in the second block (Fig. 2).
ACCESS
The Blocks WWW server at http://blocks.fhcrc.org/ implements all of the routines described in this article, which should be cited when the Blocks Database servers are used. The Blocks Database can also be searched by email by submitting a DNA or protein sequence in FASTA or another common format to blocks@blocks.fhcrc.org.
ACKNOWLEDGEMENT
This work is supported by a grant from the NIH (GM29009).
REFERENCES
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.
Articles by Henikoff, J. G.
Articles by Pietrokovski, S.