Nucleic Acids Research, 2003, Vol. 31, No. 1 397-399
© 2003 Oxford University Press
NLSdb: database of nuclear localization signals
1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, BB217, 650 West 168th Street, New York, NY 10032, USA
2 Department of Physics, Columbia University, 538 West 120th Street, New York, NY 10027, USA
3 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St Nicholas Avenue, New York, NY 10032, USA
4 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, BB217, 650 West 168th Street, New York, NY 10032, USA
*To whom correspondence should be addressed. Tel: +1 212 305 3773; Fax: +1 212 305 7932; Email: nair{at}cubic.bioc.columbia.edu
Received August 15, 2002; Accepted September 11, 2002
ABSTRACT
NLSdb is a database of nuclear localization signals (NLSs) and of nuclear proteins. NLSs are short stretches of residues that mediate the transport of nuclear proteins into the nucleus. The database contains 114 experimentally determined NLSs that were obtained through an extensive literature search. Using in silico mutagenesis, this set was extended to 308 experimental and potential NLSs. This final set matches over 43% of all known nuclear proteins and matches no currently known non-nuclear protein. NLSdb contains over 6000 predicted nuclear proteins and their targeting signals from the PDB and SWISS-PROT/TrEMBL databases. The database also contains over 12 500 predicted nuclear proteins from six entirely sequenced eukaryotic proteomes (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae). NLS motifs often co-localize with DNA-binding regions. This observation was also used to annotate over 1500 DNA-binding proteins. NLSdb can be accessed via the web site: http://cubic.bioc.columbia.edu/db/NLSdb/.
INTRODUCTION
Extraction and testing of NLS motifs
Proteins are actively transported into the nucleus by binding to specific molecules such as importins and karyopherins that recognize distinct targeting signals (1). The targeting signal is usually a short stretch of consecutive residues and is commonly referred to as the nuclear localization signal (NLS). Experimentally best characterized are mono-partite and bi-partite motifs. Most mono-partite motifs are characterized by a cluster of positively charged residues preceded by a helix-breaking residue. Most bi-partite motifs consist of two clusters of basic residues separated by 9–12 residues. Over the last few years a large number of distinct NLSs have been experimentally implicated in nuclear transport (2,3). However, NLSs have been experimentally determined for fewer than 10% of known nuclear proteins. To remedy this situation we devised a procedure of in silico mutagenesis to discover new NLSs (4). Briefly, this procedure works as follows. (i) Change or remove some residues from the experimentally characterized NLS motifs and monitor the resulting true (nuclear) and false (non-nuclear) matches. Allowing alternative residues at particular positions increased the number of nuclear proteins found, but it often also increased the number of matching non-nuclear proteins. (ii) Discard any potential NLSs that are found in known non-nuclear proteins (false matches). (iii) Require that potential NLSs be found in at least two distinct nuclear families. The 194 potential NLSs discovered using this procedure increased the coverage of known nuclear proteins to 43%. All proteins in the PDB and SWISS-PROT/TrEMBL (5) databases were annotated using the full list of experimental and potential NLSs. We also annotated all sequences in the yeast, worm, fruit fly and human proteomes. Approximately 20% of the NLS motifs were observed to co-localize with experimentally determined DNA-binding regions of proteins (4,6). These motifs were used to annotate DNA-binding proteins.
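The filtering steps (ii) and (iii) and the coverage measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build NLSdb: the `Protein` record, the functions `accept_motif` and `coverage`, the toy sequences and the family labels are all hypothetical, and motifs are written as regular expressions purely for convenience. The candidate patterns are modelled on the well-known SV40 large T-antigen signal PKKKRKV.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    sequence: str
    nuclear: bool  # True if annotated as nuclear
    family: str    # hypothetical protein-family label


def matches(pattern, proteins):
    """Return the proteins whose sequence contains the motif pattern."""
    regex = re.compile(pattern)
    return [p for p in proteins if regex.search(p.sequence)]


def accept_motif(pattern, proteins, min_families=2):
    """Step (ii): reject any motif found in a non-nuclear protein.
    Step (iii): require matches in at least `min_families` nuclear families."""
    hits = matches(pattern, proteins)
    if any(not p.nuclear for p in hits):
        return False
    return len({p.family for p in hits}) >= min_families


def coverage(patterns, proteins):
    """Fraction of nuclear proteins matched by at least one pattern."""
    nuclear = [p for p in proteins if p.nuclear]
    if not nuclear:
        return 0.0
    covered = [p for p in nuclear
               if any(re.search(pat, p.sequence) for pat in patterns)]
    return len(covered) / len(nuclear)


# Toy data set (invented sequences, for illustration only).
proteins = [
    Protein("nucA", "MSPKKKRKVEDG", True, "fam1"),
    Protein("nucB", "GGAPKKKRRVLL", True, "fam2"),
    Protein("nucC", "MAAAGGSSLLTT", True, "fam3"),
    Protein("cytD", "MLKRKRSAAGGT", False, "fam4"),
]

# Step (i): hand-written variants of one motif; a real run would generate many.
candidates = ["PKKKRKV", "PKKKR[KR]V", "KR[KR]"]
accepted = [c for c in candidates if accept_motif(c, proteins)]
```

On this toy set only the relaxed pattern `PKKKR[KR]V` survives: the unmodified motif matches a single family, and the short pattern `KR[KR]` also matches the non-nuclear protein. In the procedure described above the two-family requirement applies to potential NLSs; experimentally determined signals are retained regardless.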
General interest
NLSdb is a comprehensive source of information regarding NLSs and proteins translocated into the nucleus by signal sequences. Targeting signal recognition is a key control point in the regulation of nuclear transport. A database of NLS motifs is therefore a useful resource for biologists who want to identify targeting signals in their sequences. The database describes all experimentally determined NLS motifs with links to original references in PubMed (7). The information provided by our tool has already been useful for experimental studies of nuclear targeting.
DATABASE DESCRIPTION
Interface
The data are stored and managed using the portal of the Sequence Retrieval System (SRS) (8). SRS provides a convenient and robust framework for managing molecular databases. This provides users with quick, efficient search, retrieval and display methods that work for any web browser. Using SRS, the information in NLSdb can be easily integrated with other public and proprietary databases. The database is continuously updated and refined from the primary literature.
Format and fields
NLSdb has been formatted in an EMBL-like flat-file format, thus allowing indexing of the database in SRS (8). Each NLSdb entry describes a nuclear localization signal. Each entry is organized into six major fields: (i) Origin; (ii) Annotation; (iii) Reference; (iv) Confidence; (v) Proteins; and (vi) DNA binding. The Origin field describes whether the NLS has been found by direct experiments, or whether it is a potential NLS discovered through our in silico mutagenesis. For experimentally determined NLSs, further information is provided in the fields Annotation and Reference. The Annotation field describes the protein family in which the experimental NLS was first established and the Reference field gives the primary literature citation. The Reference field also contains a link to PubMed for each citation. The Confidence field is an indicator of our confidence in the NLS; it consists of two sub-fields: Total confidence and % Nuclear. Total confidence is the number of localization-annotated proteins from SWISS-PROT/TrEMBL in which this NLS is found, and % Nuclear is the percentage of these that are annotated as nuclear in SWISS-PROT/TrEMBL. The Proteins field lists proteins from various databases that are likely to be targeted to the nucleus since they match the given NLS motif. Currently the Proteins field contains proteins from the SWISS-PROT/TrEMBL, PDB and PEP (9) databases. All protein entries are linked to the original entries in the respective databases. The DNA binding field describes whether the NLS overlaps with known DNA-binding regions of proteins. NLSdb can be browsed either starting with the NLS entries or with any of the data-fields defined above.
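The two sub-fields of the Confidence field follow directly from their definitions. The sketch below is an assumption-laden illustration rather than the NLSdb implementation: the function name `confidence_field`, the identifiers and the localization labels are hypothetical. Proteins that match the NLS but carry no localization annotation are left out of both numbers.

```python
def confidence_field(matching_ids, localization):
    """Return (Total confidence, % Nuclear) for one NLS.

    matching_ids: identifiers of proteins in which the NLS is found.
    localization: dict mapping identifier -> annotated localization;
                  proteins absent from the dict count as unannotated.
    """
    annotated = [pid for pid in matching_ids if pid in localization]
    total = len(annotated)
    nuclear = sum(1 for pid in annotated if localization[pid] == "nuclear")
    percent_nuclear = 100.0 * nuclear / total if total else 0.0
    return total, percent_nuclear


# Hypothetical example: five matching proteins, four with an annotation.
localization = {
    "P1": "nuclear",
    "P2": "nuclear",
    "P3": "cytoplasmic",
    "P4": "nuclear",
}
total, percent_nuclear = confidence_field(
    ["P1", "P2", "P3", "P4", "P5"], localization
)
```

Here `total` is 4 and `percent_nuclear` is 75.0. A potential NLS that passed the in silico mutagenesis filters would by construction have a % Nuclear of 100 on the data set used for filtering.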
Searches
All data-fields in NLSdb can be searched using standard Boolean queries. Proteins in NLSdb can be identified through their SWISS-PROT/TrEMBL, PDB or PEP identifiers. NLS motifs can be queried by providing a string of one-letter amino acid codes. Database entries can be downloaded using the save Complete entries functionality in SRS.
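For readers working with downloaded entries, a motif query of this kind can be approximated locally. The following sketch does not use SRS or any NLSdb interface; the function `find_motif` and the example sequences are hypothetical. It looks for an exact string of one-letter amino acid codes and reports 1-based start positions.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def find_motif(query, sequences):
    """Return {protein_id: [1-based start positions]} for exact motif matches."""
    query = query.strip().upper()
    if not query or not set(query) <= VALID_RESIDUES:
        raise ValueError("query must be a string of one-letter amino acid codes")
    hits = {}
    for pid, seq in sequences.items():
        # Check every possible start position, so overlapping matches are kept.
        positions = [i + 1
                     for i in range(len(seq) - len(query) + 1)
                     if seq.startswith(query, i)]
        if positions:
            hits[pid] = positions
    return hits


# Invented sequences, for illustration only.
sequences = {
    "protA": "MSPKKKRKVEDGPKKKRKV",
    "protB": "MAAAGGSSLLTT",
}
result = find_motif("pkkkrkv", sequences)
```

With these toy sequences `result` is `{"protA": [3, 13]}`; `protB` is omitted because it does not contain the motif.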
Annotations for entirely sequenced eukaryotic proteomes
Using the full set of experimental plus potential (discovered through in silico mutagenesis) NLS motifs in NLSdb, we found over 12 500 proteins with NLS in six entirely sequenced eukaryotic proteomes (Table 1).
CONCLUSIONS
NLSdb can help improve our understanding of signal-dependent nuclear transport of proteins. The potential NLS motifs discovered through in silico mutagenesis can aid in discovering new signal sequences involved in nuclear targeting. A future goal is to integrate NLSdb with all sequences in the SWISS-PROT/TrEMBL database and all proteomes in the PEP database.
NLSdb should be cited with the present publication as reference. The database can be accessed through the World Wide Web at: http://cubic.bioc.columbia.edu/db/NLSdb/.
ACKNOWLEDGEMENTS
Thank you to Jinfeng Liu (Columbia University) for computer assistance and the collection of genome data sets and to Jinfeng Liu and Dariusz Przybylski (Columbia University) for providing preliminary information and programs. P.C. and B.R. were supported by the grant 1-P50-GM62413-01 from the National Institutes of Health (NIH); R.N. and B.R. were supported by the grant DBI-0131168 from the National Science Foundation (NSF). Last, but not least, thank you to Amos Bairoch (SIB, Geneva) and Rolf Apweiler (EBI, Hinxton) and their crews for maintaining excellent databases and to all experimentalists without whom we could not have built our database.
REFERENCES
- Tinland,B., Koukolikova-Nicola,Z., Hall,M.N. and Hohn,B. (1992) The T-DNA-linked VirD2 protein contains two distinct functional nuclear localization signals. Proc. Natl Acad. Sci. USA, 89, 7442–7446.
- Mattaj,I.W. and Englmeier,L. (1998) Nucleocytoplasmic transport: the soluble phase. Annu. Rev. Biochem., 67, 265–306.
- Jans,D.A., Xiao,C.Y. and Lam,M.H. (2000) Nuclear targeting signal recognition: a key control point in nuclear transport? Bioessays, 22, 532–544.
- Cokol,M., Nair,R. and Rost,B. (2000) Finding nuclear localization signals. EMBO Rep., 1, 411–415.
- Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
- LaCasse,E.C. and Lefebvre,Y.A. (1995) Nuclear localization signals overlap DNA- or RNA-binding domains in nucleic acid-binding proteins. Nucleic Acids Res., 23, 1647–1656.
- Airozo,D., Allard,R., Brylawski,B., Canese,K., Kenton,D., Knecht,L., Krasnov,S., Sandomirskiy,V., Sirotinin,V., Starchenko,G. et al. (1999) MEDLINE. National Library of Medicine (NLM), Vol. 1999.
- Etzold,T., Ulyanov,A. and Argos,P. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol., 266, 114–128.
- Carter,P., Liu,J. and Rost,B. (2003) PEP: Predictions for Entire Proteomes. Nucleic Acids Res., 31, 410–413.