Nucleic Acids Research Advance Access originally published online on November 28, 2006
Nucleic Acids Research 2007 35(Database issue):D680-D684; doi:10.1093/nar/gkl898
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
PlasmID: a centralized repository for plasmid clone information and distribution
1 Harvard Institute of Proteomics, Harvard Medical School, 320 Charles Street, Cambridge, MA 02141, USA
2 DF/HCC DNA Resource Core, Harvard Medical School, 320 Charles Street, Cambridge, MA 02141, USA
*To whom correspondence should be addressed. Tel: +1 6173240816; Fax: +1 6173240824; Email: joshua_labaer{at}hms.harvard.edu
Received August 15, 2006. Accepted October 11, 2006.
| ABSTRACT |
|---|
The Plasmid Information Database (PlasmID; http://plasmid.hms.harvard.edu) was developed as a community-based resource portal to facilitate search and request of plasmid clones shared with the Dana-Farber/Harvard Cancer Center (DF/HCC) DNA Resource Core. PlasmID serves as a central data repository and enables researchers to search the collection online using common gene names and identifiers, keywords, vector features, author names and PubMed IDs. As of October 2006, the repository contains >46 000 plasmids in 98 different vectors, including cloned cDNA and genomic fragments from 26 different species. Moreover, the clones include plasmid vectors useful for routine and cutting-edge techniques; functionally related sets of human cDNA clones; and genome-scale gene collections for Saccharomyces cerevisiae, Pseudomonas aeruginosa, Yersinia pestis, Francisella tularensis, Bacillus anthracis and Vibrio cholerae. Information about the plasmids has been fully annotated in adherence with a high-quality standard, and clone samples are stored as glycerol stocks in a state-of-the-art automated −80°C freezer storage system. Clone replication and distribution are highly automated to minimize human error. Information about vectors and plasmid clones, including downloadable maps and sequence data, is freely available online. Researchers interested in requesting clone samples or sharing their own plasmids with the repository can visit the PlasmID website for more information.
| INTRODUCTION |
|---|
Plasmids are useful for a wide range of molecular genetic, genomic and proteomic approaches. In recent years, plasmid clone production has increased dramatically in response to the availability of genome information and new technologies. For example, various laboratories and research centers have recently produced genome-scale gene and shRNA collections (1–10). The Harvard Institute of Proteomics is among those groups engaged in the production of such clone collections. This group launched the FLEXGene (Full Length EXpression-ready) project several years ago to build comprehensive, fully sequence-verified gene collections in recombinational vectors (1,11,12). Plasmid resources, including large-scale gene collections, are necessary for full exploitation of genomic information, as they facilitate genome-scale experimentation at the level of functional and phenotypic assays, as well as protein expression, purification and analysis (13–15).
Circulation of plasmids in the scientific community is essential for scientific progress, as it facilitates both original research and independent verification of reported results (15,16). Distribution of plasmid clones has traditionally fallen to individual researchers, who receive requests for clones in response to publications or presentations, and then retrieve, prepare and send out the clones. This ad hoc system lacks standardized quality control and reliable delivery, causing inefficiency and delays between clone requests and fulfillment. Moreover, equipment failure, poor record keeping, personnel turnover and organizational changes can lead to loss of clone samples and/or supporting information over time. Given the dramatic increase in the number and scale of plasmid clone collections, the traditional method of sharing plasmids has become insufficient to handle storage, maintenance and distribution of plasmid clones and supporting information.
Centralization of plasmid resources in a repository can provide standardized quality control and faster delivery of clones, and it relieves individual investigators of the need to spend time and resources on clone storage, maintenance and distribution. Centralized repositories are better equipped to manage large clone collections in an accurate and cost-effective manner (15,16). In addition, a repository can speed delivery of clones by standardizing and simplifying the Materials Transfer Agreement (MTA) process, which can otherwise delay the sharing of resources (16,17). Finally, a centralized repository serves as a reliable archive to prevent loss of samples and information over time.
The Dana-Farber/Harvard Cancer Center (DF/HCC) DNA Resource Core recently launched an effort to store, maintain and distribute plasmid clones shared by researchers at academic and non-profit institutions. The initial repository collection includes clone sets from DF/HCC members, a group of approximately 800 principal investigators at Harvard-affiliated institutions whose research focuses on the biology and treatment of cancer. Clones received by the repository are stored in an automated freezer storage system with an integrated database that ensures accurate sample tracking and long-term viability, and are made immediately available to researchers across the globe upon request.
The Plasmid Information Database (PlasmID) was developed to facilitate the effort to collect and distribute plasmid clones. As described here, PlasmID tracks and manages all relevant information related to clone intake, storage, maintenance and distribution.
| DATABASE DESCRIPTION |
|---|
Database design
PlasmID was developed with a J2EE compliant three-tier architecture consisting of a relational database as the data storage layer, a suite of Java modules hosted by an application server as the business logic layer and a web-based presentation layer. The database was designed to manage every step from clone collection, replication and storage to clone distribution. It includes four major tracking components: core data, sample and process data, clone order data, and user account management.
Core data tracking includes all relevant aspects of plasmid information, including vector backbone, inserts, growth conditions, host strains, selectable markers, tags, mutations, associated authors or contributors, and relevant publications. Individual clones are defined as vector plus insert(s). This design obviates the need to store duplicate information for individual clones that share the same vector or insert; keeps track of the lineage from parent to child vectors; and also makes it possible to store maps, sequences and other information associated with clones, vectors and inserts separately. In addition, this design is flexible enough to accommodate vectors with any number of inserts. Most clones in the repository collection contain either no insert (empty vector clones) or a single gene insert. For the latter, the clone is linked in the database to data pertinent to the single gene insert. When the insert contains more than one gene (e.g. genomic fragments), the clone is linked to all of the genes present in the insert, such that search with any one of the genes will retrieve the clone record from the database.
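The "vector plus insert(s)" design described above can be illustrated with a minimal sketch. This is not the PlasmID schema, which is a relational database behind Java modules; the class names, fields and the `find_clones_by_gene` helper below are hypothetical and only show how storing vectors and inserts once, and linking clones to them, supports parent-to-child vector lineage and retrieval of a multi-gene clone by any one of its genes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Vector:
    """A vector backbone, stored once and shared by many clones (hypothetical model)."""
    name: str
    parent: Optional[Vector] = None  # parent vector, if this one was derived from another

    def lineage(self) -> list[str]:
        """Return vector names from this vector back to its earliest known ancestor."""
        names, current = [], self
        while current is not None:
            names.append(current.name)
            current = current.parent
        return names


@dataclass(frozen=True)
class Insert:
    """An insert and the genes it carries: one for a cDNA, several for a genomic fragment."""
    name: str
    genes: tuple[str, ...]


@dataclass
class Clone:
    """A clone is defined as a vector plus zero or more inserts."""
    clone_id: str
    vector: Vector
    inserts: list[Insert] = field(default_factory=list)  # empty for empty-vector clones

    def genes(self) -> set[str]:
        """Collect every gene present in any insert of this clone."""
        return {gene for insert in self.inserts for gene in insert.genes}


def find_clones_by_gene(clones: list[Clone], gene: str) -> list[Clone]:
    """Return all clones whose inserts contain the given gene."""
    return [clone for clone in clones if gene in clone.genes()]
```

In this sketch, a clone carrying a genomic fragment with two genes is returned by a search for either gene, while an empty-vector clone simply has no inserts and therefore matches no gene search.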
PlasmID was also designed to track clone sample processing steps via integration with robotic liquid-handling instruments and the freezer storage system. This integration enforces the use of barcode labels and automation, and thus helps ensure accurate tracking and clone sample integrity.
Quality control
Research success demands that clone repositories deliver exactly what was requested. To establish and maintain a high-quality standard, several quality control procedures are enforced by PlasmID, including a stringent clone annotation process; automated clone storage and retrieval; and automated sample handling.
When plasmid clones are received from researchers, information about the clones is carefully annotated and imported into the database. Annotation of clone data from different sources may require specialized handling to ensure the best clone information quality. For clones shared by most laboratories, information is curated from user-provided files by a PhD-level scientist. For large collections, information can be extracted and parsed automatically from other databases. The annotation process relies on controlled vocabularies and standardized terms used by public databases such as EntrezGene, FlyBase, WormBase or the Saccharomyces Genome Database (18–22). The annotated information is then uploaded into the database via formatted files. Once the clone records are in the database, the contributing researchers are encouraged to review the records and make changes where needed. In addition, feedback from users who have requested clones is solicited and reviewed to help validate clone information.
Immediately upon intake, plasmids are replicated in a T1, T5 phage-resistant bacterial host strain, and working and archival glycerol samples are created. The archival copies are stored in 96- or 384-well format in a standard −80°C freezer. Working copies are stored in individually 2D barcode-labeled tubes (Micronic BV) in an automated BioBank freezer (Thermo Electron Co.). The BioBank comprises a pair of −80°C storage units with liquid CO2 back-up and a centralized automated picking station that reads the 2D barcodes on all tubes and re-arrays any requested samples at −15°C. The system has a capacity of 160 000 samples, which is expandable. Once samples are in storage, the integrated software alerts the operator after a clone has experienced six freeze-thaw cycles or four years in storage, so that the clone can be re-grown and a new working sample created to maintain clone viability. Retrieval of samples from the BioBank is fully automated and integrated with PlasmID. Upon clone request in PlasmID, work lists are automatically generated and stored in a common file server. The operator uploads the work lists directly to the BioBank software to retrieve the clones. Clones can be retrieved and arranged in a default format that maximizes efficiency, or in a user-defined format that facilitates downstream experimentation.
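The viability alert described above amounts to a simple threshold rule, sketched below. The two limits (six freeze-thaw cycles, four years) come from the text; the function name, its parameters and the choice to treat both limits as inclusive are assumptions made for illustration, and the sketch does not reflect the actual BioBank software.

```python
from __future__ import annotations

from datetime import date

MAX_FREEZE_THAW_CYCLES = 6  # limit stated in the text
MAX_YEARS_IN_STORAGE = 4    # limit stated in the text


def needs_regrowth(freeze_thaw_cycles: int, stored_on: date, today: date) -> bool:
    """Return True when a working sample should be re-grown and replaced.

    Assumption: a sample is flagged as soon as it reaches either limit.
    """
    years_in_storage = (today - stored_on).days / 365.25
    return (
        freeze_thaw_cycles >= MAX_FREEZE_THAW_CYCLES
        or years_in_storage >= MAX_YEARS_IN_STORAGE
    )
```

Either condition alone is enough to raise the alert, so a rarely requested clone is still refreshed on age, and a heavily requested clone is refreshed well before the four-year limit.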
Clone replication is managed with workstation automation. Each step is tracked by PlasmID via direct integration with the liquid-handling instruments. This helps avoid human error by enforcing automated cross-checks of barcode labels and by collecting log files that verify each process step.
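A barcode cross-check of the kind mentioned above might look like the following sketch. The mapping of well positions to barcodes, the function name and the log-line format are all hypothetical; the text does not describe how PlasmID or the instruments represent this data.

```python
from __future__ import annotations


def cross_check(expected: dict[str, str], scanned: dict[str, str]) -> list[str]:
    """Compare expected and scanned barcodes position by position.

    Both arguments map a position (for example a well such as "A1") to a barcode.
    Returns one log line per mismatch, including positions missing on either side.
    """
    problems = []
    for position in sorted(expected.keys() | scanned.keys()):
        want, got = expected.get(position), scanned.get(position)
        if want != got:
            problems.append(f"{position}: expected {want}, scanned {got}")
    return problems
```

An empty result means every scanned label matched the work list, and a process step could be allowed to proceed only in that case, with the returned lines kept as part of the step's log.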
Database content
PlasmID currently contains information pertaining to more than 46 000 plasmids in 98 different vectors (Table 1). The set of empty vector clones includes plasmids useful for routine and cutting-edge cloning, and plasmids useful for cell-free and cell-based expression. The set of insert-containing clones includes genes from 26 distinct species of viruses, bacteria and eukaryotes. The insert types represented include cDNAs, shRNAs, genomic fragments and whole viral genomes. Moreover, a large group of plasmids in the repository comprises fully sequence-verified genes in recombinational cloning vectors that were shared with the repository by the Harvard Institute of Proteomics. These plasmids include complete or near-complete gene collections for Saccharomyces cerevisiae, Pseudomonas aeruginosa, Yersinia pestis, Francisella tularensis, Bacillus anthracis and Vibrio cholerae, in addition to a large collection of human genes. Related clones have been organized into specific clone collections that include sets of approximately 500 human kinases and sets of approximately 1000 breast cancer-associated genes (4,23). Detailed information about clones and their component parts, vectors and inserts, is freely available to all researchers via the PlasmID website. Moreover, users can download maps, gene lists and sequence data relevant to specific vectors, inserts and clones.
Database queries
PlasmID can be searched online using four basic search types: gene identifiers, clone identifiers, vector properties, and a combined text string search by gene, vector, author and/or publication.
Searches by gene or clone identifier are useful for users who know specifically which genes or clones they are interested in. Searches can be performed using common gene identifiers such as Gene IDs, GenBank accession or GI numbers, Gene Symbols (including official symbols and aliases), specific clone identifiers (e.g. PlasmID clone IDs and FLEXGene IDs), and in some cases, specialized database identifiers relevant to well-annotated model organisms (e.g. SGD, FlyBase or WormBase IDs). Searches can be done in batch mode using a list of identifiers.
For researchers who wish to find vectors that carry specific functionalities (e.g. mammalian expression and epitope tags), a more complex query tool was developed that searches vectors by the features they harbor. The vector backbones have been extensively annotated using controlled vocabularies for consistency. Each vector is assigned one or more properties that are further organized into broader categories such as assay type, cloning system and expression (Table 2). In response to a search, the system identifies all the vectors, and the clones in those vectors, that meet the criteria selected by the researcher. The user then has the option to view the list of clones or to further limit the search results by selecting specific vectors.
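The vector property search can be pictured as a set-containment query, sketched below. The vector names, property terms and function names are invented for illustration, and the sketch assumes that a vector must carry all selected properties to match; the text does not state whether PlasmID combines criteria this way.

```python
from __future__ import annotations


def vectors_with(required: set[str], catalog: dict[str, set[str]]) -> list[str]:
    """Return names of vectors whose annotated properties include every required one."""
    return sorted(name for name, props in catalog.items() if required <= props)


def clones_in(vectors: list[str], clone_to_vector: dict[str, str]) -> list[str]:
    """Return IDs of clones built in any of the given vectors."""
    wanted = set(vectors)
    return sorted(cid for cid, vec in clone_to_vector.items() if vec in wanted)


# Hypothetical annotations drawn from a controlled vocabulary.
catalog = {
    "pExampleA": {"mammalian expression", "epitope tag"},
    "pExampleB": {"bacterial expression", "epitope tag"},
    "pExampleC": {"mammalian expression"},
}
clone_to_vector = {"clone1": "pExampleA", "clone2": "pExampleB", "clone3": "pExampleA"}

matching = vectors_with({"mammalian expression", "epitope tag"}, catalog)
print(matching, clones_in(matching, clone_to_vector))
```

Keeping the property terms in a controlled vocabulary is what makes the exact set comparison workable, since free-text annotations would need fuzzy matching instead.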
The combined text string search (Advanced Text Search) is the most versatile search tool at PlasmID. This tool facilitates full-text search for all possible matches to gene or vector name(s), vector feature(s), author or contributor name(s), or PubMed ID(s), either alone or in combination. To further increase the flexibility of the tool, partial terms can be used. For example, users can search with CDK to identify all the CDK-related genes, including CDK2, CDK3, etc. Moreover, search results can be limited to display only those results that are relevant to one species.
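A partial-term search with an optional species filter, as in the CDK example above, can be sketched as a case-insensitive substring match. The record layout, field names and the `text_search` function are assumptions for illustration only; how PlasmID actually indexes and matches text is not described in the article.

```python
from __future__ import annotations

from typing import Optional

SEARCH_FIELDS = ("gene", "vector", "author", "pubmed_id")  # hypothetical field names


def text_search(records: list[dict], term: str, species: Optional[str] = None) -> list[dict]:
    """Return records in which any searchable field contains the term (case-insensitive).

    If a species is given, only records for that species are considered.
    """
    needle = term.lower()
    hits = []
    for record in records:
        if species is not None and record.get("species") != species:
            continue
        if any(needle in str(record.get(name, "")).lower() for name in SEARCH_FIELDS):
            hits.append(record)
    return hits
```

Under these assumptions, searching for "cdk" would return records for CDK2 and CDK3 alike, and passing a species restricts the hits to that organism.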
In all cases, search results are displayed in tabular format with active links to Clone Detail and Vector Detail pages, and to external databases (Figure 1). Results can be sorted alphabetically by specific columns (e.g. vector name and gene name) and if desired, results can be downloaded as an MS-Excel spreadsheet for local storage and analysis.
| PLASMID SHARING AND DISTRIBUTION |
|---|
Plasmids are currently accepted as purified DNA in solution, which is then transformed into an appropriate phage-resistant bacterial host strain. At a minimum, researchers provide growth conditions, vector and insert-related information, and the PubMed IDs of any relevant publications. Vector and/or clone maps and sequences are also included when they are available.
Most plasmids received by the repository are distributed world-wide to researchers at any academic or non-profit laboratory. However, for some clones, intellectual property or biological safety concerns restrict distribution to specific sub-groups and/or geographical regions. In most cases, plasmid clone samples are distributed as glycerol stocks. In some cases, legal restrictions or shipping concerns require that plasmids be shipped as purified DNA spotted onto filter paper. In keeping with accepted practices and in compliance with granting agencies, the DNA Resource Core charges a minimal handling fee for plasmid clone samples and asks that requestors cover shipping charges (16).
Most clones are distributed under the terms of a standardized and simple MTA. The DF/HCC DNA Resource Core has had considerable success in gaining pre-approval for the MTA from specific institutions in order to streamline clone distribution. Members of institutions that have granted pre-approval do not need to obtain signatures on an MTA to receive most clones. We encourage additional institutions to participate in this effort.
All clone requests are handled online. Users can monitor clone request status and retrieve clone information related to a given request via PlasmID at any time.
| FUTURE PERSPECTIVES |
|---|
Plasmid clones are tools for a vast array of cloning, expression and DNA manipulation approaches; thus, centralization and open access to plasmid samples and information help facilitate biological research. We have successfully built a robust, highly quality-controlled and flexible database of plasmid clone information. As in the past, clones produced in high-throughput cloning efforts will continue to serve as a major source for the repository. Moreover, recruitment of diverse types of plasmids from the scientific community will help expand the scope and utility of the repository collection. Researchers who wish to submit their plasmids to the repository are invited to visit the PlasmID website (http://plasmid.hms.harvard.edu) for more information about how they can help in this effort. At the same time, we will continue to improve the quality of the existing collection based on user feedback, and to improve the web interface to make PlasmID easier to use and to enhance its search capability.
Note Added in Proof
Our group was recently named the site of the Protein Structure Initiative (PSI) Materials Repository for storage and distribution of plasmid clones generated at PSI site laboratories (NIH Grant 1U01GM079610-01). This will increase the number of sequence-verified, unique plasmid clones available at PlasmID by approximately 50 000–100 000 over the next five years. The clone inserts in the PSI collections will include open reading frames from several different eukaryotic and prokaryotic species.
| ACKNOWLEDGEMENTS |
|---|
The authors would like to thank Michael Collins for system support and Mauricio Fernandez for help with integration of automated instruments. This work was supported by the DF/HCC, which is a designated Comprehensive Cancer Center of the National Cancer Institute (5P30 CA06516-06). Additional support was provided by the Harvard Skin Disease Research Center of Brigham and Women's Hospital and Harvard Medical School (P30 AR42689-11). Funding for the automated freezer storage system was provided by the National Institutes of Health (1S10 RR19310-01). Funding to pay the Open Access publication charges for this article was provided by the National Cancer Institute (5P30 CA06516-06).
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Labaer, J., Qiu, Q., Anumanthan, A., Mar, W., Zuo, D., Murthy, T.V., Taycher, H., Halleck, A., Hainsworth, E., Lory, S., et al. (2004) The Pseudomonas aeruginosa PA01 gene collection. Genome Res., 14, 2190–2200.
2. Martzen, M.R., McCraith, S.M., Spinelli, S.L., Torres, F.M., Fields, S., Grayhack, E.J., Phizicky, E.M. (1999) A biochemical genomics approach for identifying genes by the activity of their products. Science, 286, 1153–1155.
3. Paddison, P.J., Silva, J.M., Conklin, D.S., Schlabach, M., Li, M., Aruleba, S., Balija, V., O'Shaughnessy, A., Gnoj, L., Scobie, K., et al. (2004) A resource for large-scale RNA-interference-based screens in mammals. Nature, 428, 427–431.
4. Park, J., Hu, Y., Murthy, T.V., Vannberg, F., Shen, B., Rolfs, A., Hutti, J.E., Cantley, L.C., Labaer, J., Harlow, E., et al. (2005) Building a human kinase gene repository: bioinformatics, molecular cloning and functional validation. Proc. Natl Acad. Sci. USA, 102, 8114–8119.
5. Reboul, J., Vaglio, P., Tzellas, N., Thierry-Mieg, N., Moore, T., Jackson, C., Shin-i, T., Kohara, Y., Thierry-Mieg, D., Thierry-Mieg, J., et al. (2001) Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C.elegans. Nature Genet., 27, 332–336.
6. Silva, J.M., Li, M.Z., Chang, K., Ge, W., Golding, M.C., Rickles, R.J., Siolas, D., Hu, G., Paddison, P.J., Schlabach, M.R., et al. (2005) Second-generation shRNA libraries covering the mouse and human genomes. Nature Genet., 37, 1281–1288.
7. Stapleton, M., Carlson, J., Brokstein, P., Yu, C., Champe, M., George, R., Guarin, H., Kronmiller, B., Pacleb, J., Park, S., et al. (2002) A Drosophila full-length cDNA resource. Genome Biol., 3, RESEARCH0080.
8. Strausberg, R.L., Feingold, E.A., Grouse, L.H., Derge, J.G., Klausner, R.D., Collins, F.S., Wagner, L., Shenmen, C.M., Schuler, G.D., Altschul, S.F., et al. (2002) Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl Acad. Sci. USA, 99, 16899–16903.
9. Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.
10. Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T., et al. (2001) Global analysis of protein activities using proteome chips. Science, 293, 2101–2105.
11. Brizuela, L., Braun, P., LaBaer, J. (2001) FLEXGene repository: from sequenced genomes to gene repositories for high-throughput functional biology and proteomics. Mol. Biochem. Parasitol., 118, 155–165.
12. Brizuela, L., Richardson, A., Marsischky, G., Labaer, J. (2002) The FLEXGene repository: exploiting the fruits of the genome projects by creating a needed resource to face the challenges of the post-genomic era. Arch. Med. Res., 33, 318–324.
13. Braun, P. and LaBaer, J. (2003) High throughput protein production for functional proteomics. Trends Biotechnol., 21, 383–388.
14. Pearlberg, J. and LaBaer, J. (2004) Protein expression clone repositories for functional proteomics. Curr. Opin. Chem. Biol., 8, 98–102.
15. Hayashizaki, Y. and Kawai, J. (2004) A new approach to the distribution and storage of genetic resources. Nature Rev. Genet., 5, 223–228.
16. Weaver, T., Maurer, J., Hayashizaki, Y. (2004) Sharing genomes: an integrated approach to funding, managing and distributing genomic clone resources. Nature Rev. Genet., 5, 861–866.
17. Cozzarelli, N.R. (2004) UPSIDE: Uniform principle for sharing integral data and materials expeditiously. Proc. Natl Acad. Sci. USA, 101, 3721–3722.
18. Drysdale, R.A. and Crosby, M.A. (2005) FlyBase: genes and gene models. Nucleic Acids Res., 33, D390–D395.
19. Schwarz, E.M., Antoshechkin, I., Bastiani, C., Bieri, T., Blasiar, D., Canaran, P., Chan, J., Chen, N., Chen, W.J., Davis, P., et al. (2006) WormBase: better software, richer content. Nucleic Acids Res., 34, D475–D478.
20. Chen, N., Harris, T.W., Antoshechkin, I., Bastiani, C., Bieri, T., Blasiar, D., Bradnam, K., Canaran, P., Chan, J., Chen, C.K., et al. (2005) WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res., 33, D383–D389.
21. Dwight, S.S., Balakrishnan, R., Christie, K.R., Costanzo, M.C., Dolinski, K., Engel, S.R., Feierbach, B., Fisk, D.G., Hirschman, J., Hong, E.L., et al. (2004) Saccharomyces genome database: underlying principles and organisation. Brief. Bioinformatics, 5, 9–22.
22. Cherry, J.M., Adler, C., Ball, C., Chervitz, S.A., Dwight, S.S., Hester, E.T., Jia, Y., Juvik, G., Roe, T., Schroeder, M., et al. (1998) SGD: Saccharomyces Genome Database. Nucleic Acids Res., 26, 73–79.
23. Witt, A.E., Hines, L.M., Collins, N.L., Hu, Y., Gunawardane, R.N., Moreira, D., Raphael, J., Jepson, D., Koundinya, M., Rolfs, A., et al. (2006) Functional proteomics approach to investigate the biological activities of cDNAs implicated in breast cancer. J. Proteome Res., 5, 599–610.

