Nucleic Acids Research Advance Access published online on October 21, 2009

Nucleic Acids Research, doi:10.1093/nar/gkp874

© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Database Issue

REBASE—a database for DNA restriction and modification: enzymes, genes and genomes

Richard J. Roberts*, Tamas Vincze, Janos Posfai and Dana Macelis

New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA

*To whom correspondence should be addressed. Tel: +1 978 380 7405; Fax: +1 978 380 7406; Email: roberts{at}neb.com

Received September 14, 2009. Revised September 29, 2009. Accepted September 30, 2009.


    ABSTRACT
 
REBASE is a comprehensive database of information about restriction enzymes, DNA methyltransferases and related proteins involved in the biological process of restriction–modification (R–M). It contains fully referenced information about recognition and cleavage sites, isoschizomers, neoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. Experimentally characterized homing endonucleases are also included. The fastest growing segment of REBASE contains the putative R–M systems found in the sequence databases. Comprehensive descriptions of the R–M content of all fully sequenced genomes are available including summary schematics. The contents of REBASE may be browsed from the web (http://rebase.neb.com) and selected compilations can be downloaded by ftp (ftp.neb.com). Additionally, monthly updates can be requested via email.


    OVERVIEW
 
The previous description of REBASE, in the 2007 NAR Database Issue (1), covered 3805 biochemically or genetically characterized restriction–modification (R–M) systems and included an analysis of approximately 400 bacterial and archaeal genomes that had been deposited in the RefSeq Database of GenBank (2,3). At that time, analysis of the available sequence information in GenBank led to the prediction of 2709 restriction enzyme (R) genes and 4485 DNA methyltransferase (M) genes. These numbers have now risen to 4990 R genes and 8080 M genes, of which 3511 R and 5497 M genes come from the 1050 completely sequenced bacterial and archaeal genomes. These putative R–M system genes are given systematic names according to the agreed-upon nomenclature rules (4). The names all carry the suffix ‘P’ to indicate their putative status. In many cases, the recognition specificity of these systems can be assigned with some degree of confidence because of their similarity to biochemically well-characterized enzymes.
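As a minimal sketch of how the ‘P’ suffix convention and the counts above might be used in a script, the Python fragment below tests a name for putative status and computes the share of predicted genes that come from completely sequenced genomes. The function name and the example gene name are hypothetical and chosen only for illustration; the check covers the suffix alone and is not a parser for the full nomenclature (4).

```python
def is_putative(name: str) -> bool:
    """Return True if a name carries the suffix 'P' used for putative genes.

    Assumption: only the trailing 'P' is tested; nothing else about the
    name is validated.
    """
    return name.endswith("P")


# Counts quoted in the text above.
total_r, total_m = 4990, 8080      # all predicted R and M genes
genome_r, genome_m = 3511, 5497    # those from complete genomes

share_r = genome_r / total_r
share_m = genome_m / total_m

if __name__ == "__main__":
    # "M.Xxx123ORF456P" is a made-up name used only to exercise the check.
    print(is_putative("M.Xxx123ORF456P"))  # True
    print(is_putative("EcoRI"))            # False
    print(f"R genes from complete genomes: {share_r:.1%}")  # 70.4%
    print(f"M genes from complete genomes: {share_m:.1%}")  # 68.0%
```

On these figures, roughly seven in ten predicted R genes and two in three predicted M genes originate from completely sequenced genomes.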

The REBASE web site (http://rebase.neb.com) summarizes all information known about every restriction enzyme and any associated proteins. This includes the recognition sequences, cleavage sites, source, commercial availability, sequence data, crystal structure information, isoschizomers and methylation sensitivity. Within the reference section of REBASE, links are maintained to the full text of all papers whenever they are readily available on the web. There is also extensive reciprocal cross-referencing between REBASE and NCBI, including links to GenBank, PubMed and NCBI’s LinkOut utility. Links to other major databases such as UniProt (5), PDB (6) and Pfam (7) are also maintained. There are currently 3945 biochemically or genetically characterized restriction enzymes in REBASE. Of these, 3834 are Type II restriction enzymes, among which 299 distinct specificities are known. Six hundred and forty-one restriction enzymes are commercially available, representing 235 distinct specificities.

As shown in Figure 1, the rate of discovery of new putative restriction and modification genes is rising rapidly. In contrast, the rate at which candidates are being characterized biochemically has actually dropped to the level of three decades ago. Nevertheless, because of the large number of sequenced examples of biochemically characterized restriction systems, the putative recognition sequences of predicted restriction enzymes and DNA methyltransferases can be inferred. Currently, all new sequences entering GenBank are checked using data mining techniques for the presence of R–M systems and, following extensive manual checking, the resulting inferences are all included within REBASE, where they are clearly marked as predictions. When analyzing DNA sequence data, the DNA methyltransferase genes are the more reliable indicators of an R–M system: the presence, proper order and characteristic spacing of their well-conserved motifs are used to suggest likely candidates.
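The idea of screening for motifs by presence, order and spacing can be sketched as follows. This is an illustrative Python example only, not the data mining pipeline used by REBASE: the function name, the two regular-expression patterns and the gap limit are all assumptions made for the example, and the search takes the first match of each motif rather than exploring every possible arrangement.

```python
import re
from typing import List, Optional, Sequence


def ordered_motif_hits(
    protein: str, motifs: Sequence[str], max_gap: int
) -> Optional[List[int]]:
    """Return the start position of each motif if all motifs occur in the
    given order, with at most max_gap residues between the end of one motif
    and the start of the next; otherwise return None.

    Simplification: each motif is placed at its first match after the
    previous motif, so some valid arrangements can be missed.
    """
    positions: List[int] = []
    cursor = 0
    for pattern in motifs:
        match = re.search(pattern, protein[cursor:])
        if match is None:
            return None  # motif absent, or present only out of order
        if positions and match.start() > max_gap:
            return None  # spacing to the previous motif is too large
        positions.append(cursor + match.start())
        cursor += match.end()
    return positions


if __name__ == "__main__":
    # Toy sequence and placeholder patterns, invented for this example.
    toy = "AAFAGAGAAAAAPCAA"
    print(ordered_motif_hits(toy, ["F.G.G", "PC"], max_gap=10))  # [2, 12]
    print(ordered_motif_hits(toy, ["F.G.G", "PC"], max_gap=3))   # None
    print(ordered_motif_hits(toy, ["PC", "F.G.G"], max_gap=10))  # None
```

A real screen would use curated motif profiles and family-specific spacing ranges, and its output would still be subject to the manual checking described above.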


Figure 1. The graph shows the numbers of R–M systems entering REBASE since its inception in 1975. The open bars show systems that have been characterized either biochemically or genetically. The black bars show the increasing accumulation of potential R–M systems that have been found by bioinformatic analysis of sequences in GenBank. The surge in 2004 represents the addition of metagenomic sequences from the Sargasso Sea collecting expedition (9).

 
At present, it is not possible to distinguish DNA methyltransferases from other methyltransferases reliably enough to be completely confident in the assignments. Some RNA and protein methyltransferases can be mistaken for DNA methyltransferases, as is widely reflected in the annotations found in GenBank files. In general, REBASE takes a liberal approach and includes all likely candidates; when it becomes clear that non-DNA methyltransferases have been included erroneously, they are culled from the database. The more widely divergent genes that encode the restriction enzymes always reside close to the genes for their cognate methyltransferases, but often they cannot be recognized directly because they are a rapidly evolving set of genes and frequently lack sequence similarity to any other genes in GenBank. However, other methods can sometimes be used to infer their presence, such as the analysis of shotgun sequence data, in which missing clones can be inferred to be caused by the presence of active restriction enzyme genes (8).
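One simplified reading of the shotgun-based inference is that a gene never found intact in any sequenced clone is a candidate for an active restriction enzyme gene. The Python sketch below illustrates only that reading; it is not the method of reference (8). The function name, the coordinate convention (start and end positions on the assembled genome) and the toy data are all assumptions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) on the assembled genome


def genes_never_fully_cloned(
    genes: Dict[str, Interval], clones: List[Interval]
) -> List[str]:
    """Return the names of genes that no single clone spans completely."""
    flagged = []
    for name, (g_start, g_end) in genes.items():
        spanned = any(
            c_start <= g_start and g_end <= c_end for c_start, c_end in clones
        )
        if not spanned:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Toy coordinates invented for this example.
    genes = {"geneA": (100, 900), "geneB": (1500, 2600), "geneC": (3000, 3400)}
    clones = [(0, 1200), (1000, 2000), (2200, 3500)]
    print(genes_never_fully_cloned(genes, clones))  # ['geneB']
```

In the toy data, geneB is overlapped by two clones but contained in neither, so it is the only gene flagged. In practice, gaps in clone coverage also arise by chance, so a flag of this kind is a lead to follow up rather than evidence on its own.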

Given the wealth of experimental data, both published and unpublished, contained within REBASE, it can be an especially valuable resource during the annotation of bacterial and archaeal genomes. With the plethora of restriction systems that occur in all sequenced microbial genomes, annotators are encouraged to use the resources of the REBASE database or to contact the REBASE staff if help is needed. Custom analyses of unpublished genome sequence data are carried out upon request.

The REBASE web site offers a variety of resources that facilitate the analysis of sequence information. The sequence analysis tools (REBASE tools) include NEBcutter, which finds restriction enzyme recognition sites in submitted sequences, and an implementation of BLAST that allows searching against all sequences in REBASE. Specialty lists of sequence data (REBASE lists), such as all known Type II restriction enzyme genes, all known Type I specificity subunit genes, etc., are available for download.
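To illustrate what finding recognition sites in a submitted sequence involves, the short Python sketch below expands IUPAC degenerate base codes into a regular expression and reports every match position. It is a stand-alone example and makes no claim about how NEBcutter is implemented; the function name is hypothetical, and only the strand given is searched, so a non-palindromic site would also need a search with its reverse complement.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes expanded to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(sequence: str, recognition: str) -> List[int]:
    """Return 0-based start positions of a recognition sequence in a DNA
    sequence, including overlapping occurrences, on the given strand only."""
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


if __name__ == "__main__":
    # GAATTC is the EcoRI recognition sequence.
    print(find_sites("AAGAATTCGGGAATTCTT", "GAATTC"))  # [2, 10]
    # A degenerate site: Y matches C or T, R matches A or G.
    print(find_sites("GTTAACGTCGAC", "GTYRAC"))        # [0, 6]
```

The positions returned are where each site starts; mapping them to cut positions would additionally require the cleavage offset recorded for each enzyme.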

The coming year will see some major additions to REBASE in terms of new sequence acquisitions, such as the inclusion of all metagenomic sequence data (only partially analyzed to date) and a tool to permit users to perform their own analysis of newly sequenced genomes.


    FUNDING
 
National Library of Medicine (LM04971); New England Biolabs, Inc. Funding for open access charge: New England Biolabs; National Institutes of Health grant.

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
Special thanks are due to the many individuals who have so kindly contributed their unpublished results for inclusion in this compilation and to the REBASE users who continue to guide our efforts with their helpful comments. We are especially grateful to Karen Otto for administrative help.


    REFERENCES
 

  1. Roberts RJ, Vincze T, Posfai J, Macelis D. REBASE—enzymes and genes for DNA restriction and modification. Nucleic Acids Res. (2007) 35:D269–D270.

  2. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. (2009) 37:D26–D31.

  3. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. (2007) 35:D61–D65.

  4. Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal RM, Degtyarev SK, Dryden DTF, Dybvig K, et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. (2003) 31:1805–1812.

  5. The UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res. (2009) 37:D169–D174.

  6. Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM. The RCSB PDB information portal for structural genomics. Nucleic Acids Res. (2006) 34:D302–D305.

  7. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. (2008) 36:D281–D288.

  8. Zheng Y, Posfai J, Morgan RD, Vincze T, Roberts RJ. Using shotgun sequence data to find active restriction enzyme genes. Nucleic Acids Res. (2009) 37:e1.

  9. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science (2004) 304:66–74.

