Nucleic Acids Research Advance Access published online on November 7, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn817
Database Issue
siRecords: a database of mammalian RNAi experiments and efficacies
Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455, USA
*To whom correspondence should be addressed. Tel: +1 612 3481; Fax: +1 612 626 5009; Email: toli{at}biocompute.umn.edu
Received September 4, 2008. Revised October 10, 2008. Accepted October 13, 2008.
| ABSTRACT |
|---|
RNAi-based gene-silencing techniques offer a fast and cost-effective way of knocking down gene functions in an easily regulated manner. Exciting progress has been made in recent years in the application of these techniques in basic biomedical research and therapeutic development. However, it remains difficult to design siRNA experiments with high efficacy and specificity. We present siRecords, an extensive database of mammalian RNAi experiments with consistent efficacy ratings. This database serves two purposes. First, it provides a large and diverse dataset of siRNA experiments. This dataset faithfully represents the general, diverse RNAi experimental practice, and allows more reliable siRNA design tools to be developed while curbing the overfitting problem. Second, the database helps experimental RNAi researchers directly by providing them with the efficacy and other information about the siRNA experiments previously designed and conducted against the genes of their interest. The current release of siRecords contains the records of 17 192 RNAi experiments targeting 5086 genes.
| INTRODUCTION |
|---|
RNA interference (or RNAi) is a recently discovered, naturally occurring mechanism for sequence-specific, post-transcriptional down-regulation of gene expression (1). Because RNAi-based gene knockdown techniques (using siRNAs, or small interfering RNAs) offer a fast and cost-effective way of disrupting gene functions in an easily regulated manner, rapid progress has been made in recent years in the application of these techniques in basic biomedical research and clinical development. In the basic research domain, siRNAs have become a standard gene knockdown tool routinely used in molecular genetics and functional genomics laboratories (2,3). In the clinical domain, several RNAi-based therapies against ocular diseases (e.g. AMD or age-related macular degeneration), virus infection (by Hepatitis B and C, and HIV), cancers (e.g. solid tumors) and inflammatory diseases have reached the clinical or pre-clinical trial stage in development (4–6), and a large number of other RNAi-based potential therapeutic agents are actively being explored (7,8).
The successful employment of an RNAi-based gene knockdown technique depends on the proper design or selection of the siRNAs, and the adoption of an effective strategy to deliver the siRNAs to the target cells or tissues (4,9). The purpose of designing siRNAs is to choose from a large number of candidate siRNA sites the ones likely to achieve high potency/efficacy and good specificity (against off-target activity). A properly devised delivery system (using, e.g. viral or non-viral vectors, conjugates, cationic liposomes, or complexes with peptides, polymers, antibodies and aptamers) helps to improve the stability of the siRNA agent, and reduce or eliminate the innate immune response and/or other harmful side-effects induced by the siRNA agent (5,7,10).
The issue of how to design siRNAs that produce high efficacy is the focus of a large body of recent research work [see recent reviews, e.g. (11–16)]. Since it was discovered that not all siRNAs are equally potent in their ability to silence the gene products (17), a series of studies have pointed to a large number of features that might be correlated with higher efficacy of RNAi experiments. These features can be roughly classified into three categories. The first category comprises sequence features, including direct sequence features, which are defined based on the nucleotide identity at particular positions of the siRNA, e.g. the 6th nucleotide of the siRNA sequence is an A (18,19), and sequence-derived features, e.g. the G/C content of the siRNA is between 30% and 52% (20), and there are no occurrences of more than three identical nucleotides in consecutive positions (21,22). The second category includes features defined based on the thermodynamics of the siRNA, e.g. the binding energy in the n7–n11 region is between –1.97 and –1.65 kcal/mol (23), and features surrounding the concept of siRNA duplex terminal asymmetry, e.g. the difference in binding energy between the n16–n19 region and the n1–n4 region is greater than 1 kcal/mol (24). The third category comprises features defined based on the target sites on the mRNA, including target location-related features, e.g. the target site is outside of the third quartile of the coding region of the mRNA (25), and features focusing on the target site accessibility (26,27), e.g. the local free energy of the most stable structure is greater than or equal to –20.9 kcal/mol (28). Moreover, recent studies suggested that factors related to experimental settings, e.g. the types of siRNA constructs (29,30), the types of cells used (30–34) and the methods applied in examining gene products (35), might also influence the efficacy of the RNAi experiments.
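The three sequence features quoted above are simple enough to check programmatically. The following is a minimal sketch, not the implementation used by any of the cited tools; the function names are hypothetical, and it assumes the input is a single siRNA strand written 5' to 3' with positions counted from 1 (which strand the cited rules refer to is not specified here).

```python
import re


def gc_content(sirna: str) -> float:
    """Return the fraction of G/C nucleotides in the sequence (0.0 to 1.0)."""
    seq = sirna.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def has_long_run(sirna: str, max_run: int = 3) -> bool:
    """Return True if any nucleotide occurs more than max_run times in a row."""
    # (.)\1{3,} matches one character followed by 3 or more copies of itself,
    # i.e. a run of at least max_run + 1 identical nucleotides.
    pattern = r"(.)\1{%d,}" % max_run
    return re.search(pattern, sirna.upper()) is not None


def sequence_features(sirna: str) -> dict:
    """Evaluate the three example sequence features on one siRNA strand."""
    seq = sirna.upper().replace("T", "U")  # treat DNA-style input as RNA
    return {
        "a_at_position_6": seq[5] == "A",              # 6th nucleotide is an A
        "gc_30_to_52": 0.30 <= gc_content(seq) <= 0.52,  # G/C content window
        "no_run_over_3": not has_long_run(seq),        # no 4+ identical in a row
    }
```

For an illustrative 19-nt sequence such as `GCAUGAACUUGGAUCCAUU` (made up for this example), all three checks pass. The thermodynamic and target-site features would require an energy model or a folding program and are not covered by this sketch.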
A number of siRNA design tools were established in which various combinations of these features were implemented [see recent reviews, e.g. (15,36)]. However, the controversy continues as to which of these features are truly helpful in selecting high-efficacy siRNAs. Meanwhile, it has been increasingly recognized that many earlier siRNA design studies suffered from the overfitting problem (14,37,38). This term, commonly used in the machine learning field, refers to situations where excessive training makes a classifier perform increasingly well on the training data but worse on testing data. The only practical way to overcome the overfitting problem is to make use of a large and diverse training dataset (one that approximates the ultimate testing data, namely the general siRNA experimental practice as a whole) when investigating features or factors associated with higher siRNA efficacy.
We present siRecords (http://siRecords.umn.edu/siRecords), an extensive database of mammalian RNAi experiments with consistent efficacy ratings. Because siRecords hosts the records of all kinds of siRNA experiments conducted with various laboratory techniques and experimental settings, it is a faithful representation of the general, diverse siRNA experimental practice. Recently, using a dataset compiled from siRecords, we analyzed a large number of reported features for their ability to improve RNAi effectiveness. Through carefully combining the most significant features, we derived a bundle of siRNA design rule sets (called the DRM rule sets) which were subsequently shown to outperform a number of established siRNA design tools in selecting effective siRNAs (14). This work demonstrated the usefulness of the siRecords database.
In this article, we outline the design considerations of the siRecords database, its structure and features, and describe the recent improvements made in the siRecords project.
| DATABASE CONTENT |
|---|
siRecords is designed to serve two different purposes: (i) it provides a large and diverse dataset of experimentally validated siRNAs with consistent efficacy ratings, and this dataset can be used by bioinformatics scientists in developing more reliable siRNA design tools, and (ii) it helps experimental RNAi researchers directly by providing the information about what siRNAs have been tested by other researchers against the genes of their interest, and what efficacy levels were achieved in those previous RNAi experiments.
The literature curation and data recording procedures have remained unchanged over the past four years. First, queries are sent to the PubMed database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed) for publications related to RNAi and siRNA. Then, the abstracts of the publications are screened, and the full text articles likely to contain information about RNAi gene silencing experiments are retrieved and further examined. Next, for each article containing descriptions of RNAi experiments, the siRNA sequences, the target genes and other key information about experimental conditions are recorded. This information includes: the cells or tissues in which the RNAi experiments were conducted, the forms of the siRNA agents—chemically synthesized oligos or vector transfected shRNAs, and the methods applied in testing the efficacy of the siRNAs—western blot, RT–PCR or others. The siRNA sequences are aligned with the mRNA sequences of the target genes using bl2seq (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), and the aligned sequences are recorded.
Moreover, an efficacy rating is assigned to each RNAi experiment, based on the description of the result of the gene silencing experiment given in the article. The efficacy rating scheme was designed with balanced considerations. A very coarse-grained rating scheme (for example, a binary scheme that rates siRNAs as effective or ineffective) would limit the usefulness of the database because of the limited information it provides. On the other hand, a very fine-grained rating scheme (for example, one that classifies siRNAs into 10 efficacy categories) would make accurate ratings difficult to obtain, resulting in a less reliable database. We balanced these two factors and chose a four-level rating scheme, where the efficacy of an RNAi experiment is rated as very high if the gene product is reduced by more than 90%; high if the gene product is reduced by 70–90%; medium if between 50% and 70% of gene knockdown is achieved; and low if less than 50% of gene knockdown is obtained. The informative sentences in the original articles describing the siRNA efficacy are copied and stored in the original_assessment field in the database. When adequate textual descriptions of the siRNA efficacy are not available, best efforts are made to assign the efficacy rating based on the figures (gel images or summary bar graphs) presented in the articles, and this information (the basis of the efficacy rating assignment) is also kept in the original_assessment field in the database.
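The four-level scheme can be written as a small mapping from percent knockdown to a rating label. This is an illustrative sketch only; the function name is hypothetical, and the handling of the exact boundary at 70% is an assumption, since the text above does not say whether 70% counts as high or medium.

```python
def efficacy_rating(knockdown_percent: float) -> str:
    """Map percent reduction of the gene product to a four-level rating."""
    if not 0 <= knockdown_percent <= 100:
        raise ValueError("knockdown_percent must be between 0 and 100")
    if knockdown_percent > 90:    # more than 90% reduction
        return "very high"
    if knockdown_percent >= 70:   # 70-90% (assumption: exactly 70% is 'high')
        return "high"
    if knockdown_percent >= 50:   # 50-70% (less than 50% is 'low', so 50% is 'medium')
        return "medium"
    return "low"                  # less than 50% reduction
```

In practice the curated ratings are derived from textual descriptions and figures rather than from a single number, so a function of this kind only applies when the article reports a quantitative knockdown value.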
During the data deposition process, the siRNA sequence maintained in the database may differ from the sequence given in the original publication in two ways. First, DNA bases in the published source may have been deposited as RNA, so that one or more bases represented as T are transformed into U. Second, the sense (passenger) strand of the siRNA may have been deposited rather than the guide strand. These are known issues that are being actively corrected, but the data are currently heterogeneous as to whether these transformations have occurred or have been corrected. Future releases of siRecords will contain estimates of the degree to which we believe the contents are clean or contain specific kinds of contaminating or transformed data.
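Users who work with the raw sequences may want to normalize them and determine which strand was deposited. A minimal sketch follows, under stated assumptions: the function names are hypothetical, the siRNA is assumed to match its target mRNA exactly (an alignment tool such as bl2seq would be needed to tolerate mismatches), and sequences are assumed to contain only A, C, G, T and U.

```python
# Translation table for Watson-Crick complements on RNA letters.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case the sequence and write every T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of the sequence, in RNA letters."""
    return to_rna(seq).translate(_COMPLEMENT)[::-1]


def deposited_strand(sirna: str, mrna: str) -> str:
    """Classify a deposited siRNA strand relative to its target mRNA.

    Returns 'sense' if the sequence occurs in the mRNA as written (passenger
    strand), 'antisense' if its reverse complement occurs in the mRNA (guide
    strand), and 'unmatched' otherwise.
    """
    query, target = to_rna(sirna), to_rna(mrna)
    if query in target:
        return "sense"
    if reverse_complement(query) in target:
        return "antisense"
    return "unmatched"
```

Because both inputs are converted to RNA letters first, a record deposited with T and one deposited with U are treated identically by this check.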
There are four major tables in the database schema: SiRecord, which stores the siRNA sequence, key experimental conditions (cell or tissue type, host species, method of making/delivering siRNAs, method of testing efficacy and the test object), original efficacy assessment (sentences related to efficacy assessment in the original articles), and the efficacy rating assigned by the siRecords curator; Gene, which stores information about the genes targeted by the siRNAs, including Genbank accession, organism and description of the gene; Correspondent, which stores the contact information of the siRNA origin; and Publication, which stores key information, including PubMed ID and citation data of the original publication.
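To make the four-table layout concrete, the sketch below builds an analogous schema in SQLite. It is only an approximation for illustration: the table names and the original_assessment field come from the description above, whereas all other column names, the key columns, the foreign-key links and the use of SQLite are assumptions.

```python
import sqlite3

# Hypothetical column layout; only the table names and original_assessment
# are taken from the text.
SCHEMA = """
CREATE TABLE Gene (
    gene_id     INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,      -- GenBank accession
    organism    TEXT,
    description TEXT
);
CREATE TABLE Correspondent (
    correspondent_id INTEGER PRIMARY KEY,
    contact          TEXT           -- contact information of the siRNA origin
);
CREATE TABLE Publication (
    publication_id INTEGER PRIMARY KEY,
    pubmed_id      TEXT,
    citation       TEXT
);
CREATE TABLE SiRecord (
    record_id           INTEGER PRIMARY KEY,
    sequence            TEXT NOT NULL,
    cell_or_tissue      TEXT,
    host_species        TEXT,
    delivery_method     TEXT,
    test_method         TEXT,
    test_object         TEXT,
    original_assessment TEXT,
    efficacy_rating     TEXT CHECK (efficacy_rating IN
                            ('very high', 'high', 'medium', 'low')),
    gene_id             INTEGER REFERENCES Gene(gene_id),
    correspondent_id    INTEGER REFERENCES Correspondent(correspondent_id),
    publication_id      INTEGER REFERENCES Publication(publication_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four illustrative tables on the given connection."""
    conn.executescript(SCHEMA)


def records_for_accession(conn: sqlite3.Connection, accession: str) -> list:
    """Return (sequence, efficacy_rating) rows for one GenBank accession."""
    cursor = conn.execute(
        "SELECT s.sequence, s.efficacy_rating "
        "FROM SiRecord s JOIN Gene g ON g.gene_id = s.gene_id "
        "WHERE g.accession = ? ORDER BY s.record_id",
        (accession,),
    )
    return cursor.fetchall()
```

Calling `create_schema(sqlite3.connect(":memory:"))` yields an empty database against which `records_for_accession` can be tried; the query mirrors a lookup of all records for a given gene accession.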
The current release of siRecords hosts the records of 17 192 RNAi experiments targeting 5086 unique genes, curated from 6122 research articles. The size of the database has more than quadrupled when compared to the first release of the database (Figure 1).
The web interface of the database has recently been rewritten. The improved interface includes an siRNA Input Wizard that guides data contributors through submitting their own records of RNAi experiments. Moreover, the primitive siRNA design tool incorporated in the previous release of siRecords has been replaced by siDRM, a recently developed full-featured siRNA design program in which updated DRM rule sets are implemented (39).
| UTILITY |
|---|
siRecords can be accessed at http://siRecords.umn.edu/siRecords/. On the main page, the user can query a gene by entering its Genbank accession number or GI number, and the matching records are presented. After the user selects a record, the record display page presents all relevant information about the record, including the siRNA sequence, experimental setting, efficacy rating and the source of the record. Links to all other records targeting the same gene, and to all other records obtained from the same source, are also displayed.
Data contributors can submit their own records of RNAi experiments with the help of the siRNA Input Wizard shown in the left panel of the web site (registration is required).
| DATA ACCESS |
|---|
The siRecords web site is publicly accessible through the URL http://siRecords.umn.edu/siRecords. Academic users can obtain a copy of the current release of the dataset by sending an email to siRecords{at}biocompute.umn.edu.
| IMPLEMENTATION |
|---|
The siRecords database is a relational database implemented with MySQL on a Fedora II Linux system running on an Intel Core 2 Duo computer. The front-end web interface is implemented as a PHP project running under Apache 2.0.
| FUNDING |
|---|
University of Minnesota Graduate School and Minnesota Medical Foundation (partial); NIH/NCI (1R21CA126209, 4R33CA126209) (to T.L.). Funding for open access charge: NIH/NCI.
Conflict of interest statement. None declared.
| ACKNOWLEDGEMENTS |
|---|
We thank Q. Xu, X. Zheng, and D. Lin for participating in the curation work in an earlier phase of the project, and the Supercomputing Institute, University of Minnesota for providing computing resources.
| Footnotes |
|---|
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.
| REFERENCES |
|---|
1. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature (1998) 391:806–811.
2. Gunsalus KC, Piano F. RNAi as a tool to study cell biology: building the genome-phenome bridge. Curr. Opin. Cell Biol. (2005) 17:3–8.
3. Xia XG, Zhou H, Xu Z. Promises and challenges in developing RNAi as a research tool and therapy for neurodegenerative diseases. Neurodegener. Dis. (2005) 2:220–231.
4. Kim DH, Rossi JJ. Strategies for silencing human disease using RNA interference. Nat. Rev. Genet. (2007) 8:173–184.
5. Hadj-Slimane R, Lepelletier Y, Lopez N, Garbay C, Raynaud F. Short interfering RNA (siRNA), a novel therapeutic tool acting on angiogenesis. Biochimie (2007) 89:1234–1244.
6. de Fougerolles A, Vornlocher HP, Maraganore J, Lieberman J. Interfering with disease: a progress report on siRNA-based therapeutics. Nat. Rev. Drug Discov. (2007) 6:443–453.
7. Kuhn R, Streif S, Wurst W. RNA interference in mice. Handbook Exp. Pharmacol. (2007) 149–176.
8. Gaither A, Iourgenko V. RNA interference technologies and their use in cancer research. Curr. Opin. Oncol. (2007) 19:50–54.
9. Inoue A, Sawata SY, Taira K. Molecular design and delivery of siRNA. J. Drug Target (2006) 14:448–455.
10. Devi GR. siRNA-based approaches in cancer therapy. Cancer Gene Ther. (2006) 13:819–829.
11. Kurreck J. siRNA Efficiency: Structure or Sequence-That Is the Question. J. Biomed. Biotechnol. (2006) 2006:83757.
12. Mittal V. Improving the efficiency of RNA interference in mammals. Nat. Rev. Genet. (2004) 5:355–365.
13. Peek AS, Behlke MA. Design of active small interfering RNAs. Curr. Opin. Mol. Ther. (2007) 9:110–118.
14. Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, Li T. Integrated siRNA design based on surveying of features associated with high RNAi effectiveness. BMC Bioinform. (2006) 7:516.
15. Patzel V. In silico selection of active siRNA. Drug Discov. Today (2007) 12:139–148.
16. Matveeva O, Nechipurenko Y, Rossi L, Moore B, Saetrom P, Ogurtsov AY, Atkins JF, Shabalina SA. Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Nucleic Acids Res. (2007) 35:e63.
17. Holen T, Amarzguioui M, Wiiger MT, Babaie E, Prydz H. Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor. Nucleic Acids Res. (2002) 30:1757–1766.
18. Amarzguioui M, Prydz H. An algorithm for selection of functional siRNA sequences. Biochem. Biophys. Res. Commun. (2004) 316:1050–1058.
19. Takasaki S, Kotani S, Konagaya A. An effective method for selecting siRNA target sequences in mammalian cells. Cell Cycle (2004) 3:790–795.
20. Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A. Rational siRNA design for RNA interference. Nat. Biotechnol. (2004) 22:326–330.
21. Wang L, Mu FY. A Web-based design center for vector-based siRNA and siRNA cassette. Bioinformatics (2004) 20:1818–1820.
22. Cui W, Ning J, Naik UP, Duncan MK. OptiRNAi, an RNAi design tool. Comput. Methods Programs Biomed. (2004) 75:67–73.
23. Poliseno L, Evangelista M, Mercatanti A, Mariani L, Citti L, Rainaldi G. The energy profiling of short interfering RNAs is highly predictive of their activity. Oligonucleotides (2004) 14:227–232.
24. Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell (2003) 115:209–216.
25. Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S, Sellers WR. A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens. Nucleic Acids Res. (2004) 32:893–901.
26. Tafer H, Ameres SL, Obernosterer G, Gebeshuber CA, Schroeder R, Martinez J, Hofacker IL. The impact of target site accessibility on the design of effective siRNAs. Nat. Biotechnol. (2008) 26:578–583.
27. Shao Y, Chan CY, Maliyekkel A, Lawrence CE, Roninson IB, Ding Y. Effect of target secondary structure on RNAi efficiency. RNA (2007) 13:1631–1640.
28. Schubert S, Grunweller A, Erdmann VA, Kurreck J. Local RNA target structure influences siRNA efficacy: systematic analysis of intentionally designed binding regions. J. Mol. Biol. (2005) 348:883–893.
29. McManus MT, Haines BB, Dillon CP, Whitehurst CE, van Parijs L, Chen J, Sharp PA. Small interfering RNA-mediated gene silencing in T lymphocytes. J. Immunol. (2002) 169:5754–5760.
30. McManus MT, Sharp PA. Gene silencing in mammals by small interfering RNAs. Nat. Rev. Genet. (2002) 3:737–747.
31. Elmaagacli AH, Koldehoff M, Peceny R, Klein-Hitpass L, Ottinger H, Beelen DW, Opalka B. WT1 and BCR-ABL specific small interfering RNA have additive effects in the induction of apoptosis in leukemic cells. Haematologica (2005) 90:326–334.
32. Nicholson LJ, Philippe M, Paine AJ, Mann DA, Dolphin CT. RNA interference mediated in human primary cells via recombinant baculoviral vectors. Mol. Ther. (2005) 11:638–644.
33. Guan R, Tapang P, Leverson JD, Albert D, Giranda VL, Luo Y. Small interfering RNA-mediated Polo-like kinase 1 depletion preferentially reduces the survival of p53-defective, oncogenic transformed cells and inhibits tumor growth in animals. Cancer Res. (2005) 65:2698–2704.
34. Spankuch-Schmitt B, Bereiter-Hahn J, Kaufmann M, Strebhardt K. Effect of RNA silencing of polo-like kinase-1 (PLK1) on apoptosis and spindle formation in human cancer cells. J. Natl Cancer Inst. (2002) 94:1863–1877.
35. Atkinson PJ, Young KW, Ennion SJ, Kew JN, Nahorski SR, Challiss RA. Altered expression of G(q/11alpha) protein shapes mGlu1 and mGlu5 receptor-mediated single cell inositol 1,4,5-trisphosphate and Ca(2+) signaling. Mol. Pharmacol. (2006) 69:174–184.
36. Birmingham A, Anderson E, Sullivan K, Reynolds A, Boese Q, Leake D, Karpilow J, Khvorova A. A protocol for designing siRNAs with high functionality and specificity. Nat. Protoc. (2007) 2:2068–2078.
37. Chalk AM, Wahlestedt C, Sonnhammer EL. Improved and automated prediction of effective siRNA. Biochem. Biophys. Res. Commun. (2004) 319:264–274.
38. Saetrom P, Snove O. A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun. (2004) 321:247–253.
39. Gong W, Ren Y, Zhou H, Wang Y, Kang S, Li T. siDRM: an effective and generally applicable online siRNA design tool. Bioinformatics (2008) 24:2405–2406.