| Nucleic Acids Research | Pages |
HUGE: a database for human large proteins identified by Kazusa cDNA sequencing project
INTRODUCTION
The Kazusa DNA Research Institute has been conducting a cDNA sequencing project to predict the primary structures of unidentified human proteins. We have been particularly interested in long cDNA clones that direct the synthesis of large proteins (>50 kDa) (1). To date, we have deposited more than 700 human cDNA sequences (average size: 5.1 kb) in public databases (2). The substantial growth of our cDNA sequence data prompted us to build a database of the protein sequences predicted from the cDNA analysis, because future functional studies using these cDNAs will inevitably require a systematic and comprehensive overview of the predicted protein sequence data. This database, called 'HUGE' (Human Unidentified Gene-Encoded large protein database), was therefore constructed to provide more detailed information on the primary structures predicted by the Kazusa cDNA project than can be retrieved from the public databases. Because we make the cDNA clones publicly available for research purposes once the sequence data have been deposited in the GenBank/EMBL/DDBJ databases, HUGE is also expected to provide practically important information about the clones to clone users worldwide. While HUGE focuses mainly on the characteristics of the predicted primary structures, other important information on the cDNA clones from the genomic viewpoint is compiled in the Kazusa human cDNA database (http://www.kazusa.or.jp/cDNA).
CONTENTS OF GENE/PROTEIN CHARACTERISTIC TABLE
Because all genes newly characterized by the Kazusa cDNA project are conventionally identified by 'KIAA' plus a four-digit number, these KIAA numbers are used as the primary gene identifiers in HUGE. Gene/protein characteristic tables summarize the results of sequence analyses of the cloned cDNAs and the predicted proteins. Each HUGE entry has its own table (Fig. 1). The table begins with a section giving the accession number of the cDNA sequence in the public database, the alias name of the gene, the clone name, and the biological source of the cDNA library from which the clone was isolated. The next section describes characteristics of the cloned cDNA sequence and comprises four subsections. The first two subsections show the length of the cloned DNA sequence and the physical map constructed from the actual sequence data of the isolated cDNA clone. In the physical map, the open reading frame and the untranslated regions are shown by solid and open boxes, respectively, and the position of the first ATG codon is indicated by a triangle. Alu and other repetitive sequences detected by RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) are displayed by dotted and hatched boxes, respectively. The next subsection gives the restriction map of the isolated cDNA, generated from the list of commercially available restriction enzymes (3). The last subsection describes the prediction of the protein coding region by GeneMark analysis and is linked to the graphical output of the GeneMark-RC analysis (4). The GeneMark analysis also issues warnings for N-terminal truncation and for spurious interruption of the coding region, if detected. For clones flagged by the GeneMark analysis, we performed additional experiments using reverse transcription-coupled polymerase chain reaction (RT-PCR) to eliminate cloning artifacts. Details of the evaluation of the cDNA sequences by the GeneMark analysis will be reported elsewhere (manuscript in preparation).
Figure 1. A typical overview of a gene/protein characteristic table in HUGE. This gene/protein characteristic table is for KIAA0621. The table contains links to raw data from the sequence analyses (e.g., the GeneMark coding prediction and the FASTA homology search) performed for this cDNA and protein. See text for a description of the fields.
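As a small illustration of the identifier convention described above ('KIAA' plus a four-digit number), the following Python sketch validates and parses such identifiers. The function name `parse_kiaa` and the strict four-digit pattern are assumptions made for this example; they are not part of HUGE itself.

```python
import re

# Assumed pattern: the literal prefix "KIAA" followed by exactly four digits,
# as described in the text. Real-world identifiers may deviate from this.
KIAA_PATTERN = re.compile(r"KIAA(\d{4})")


def parse_kiaa(identifier: str) -> int:
    """Return the numeric part of a KIAA identifier, or raise ValueError."""
    match = KIAA_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a KIAA identifier: {identifier!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # KIAA0621 is the entry shown in Fig. 1.
    print(parse_kiaa("KIAA0621"))
```

Keeping the identifier as a string for display and using the parsed number only for sorting would preserve the leading zeros that the convention relies on.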
The third section, which gives an overview of various characteristics of the predicted protein sequence, is divided into five subsections. The first subsection gives the length of the predicted protein sequence. The second subsection, which is optional, states whether the translation is conceptual: when errors in cloning (e.g., a frameshift, a nonsense mutation, or intron retention) were detected experimentally in the clone actually sequenced, the translation was carried out on the experimentally corrected sequence rather than on the sequence of the isolated cDNA. The next two subsections show the results of homology searches against the OWL database (5) and against other HUGE entries. The top five entries with expectation values below 0.001 in FASTA (6) are aligned along the query sequence in a graphical overview. The numbers at the left and right ends of the black lines in the overview indicate the numbers of amino acid residues in the non-homologous N-terminal and C-terminal portions of the homologous entries, respectively. The FASTA output and the multiple alignment of these entries can be viewed through links in the table. The last subsection illustrates the results of the motif/profile analysis and the prediction of transmembrane helical segments. Although the PROSITE database (7) was used for the motif analysis, the following relaxed sequence motifs were excluded because they occur too frequently to be informative: amidation site, N-glycosylation site, cAMP- and cGMP-dependent protein kinase phosphorylation site, casein kinase II phosphorylation site, protein kinase C phosphorylation site and tyrosine kinase phosphorylation site. The profile analysis was conducted with the profile entries in the PROSITE database using the pfscan program in the pftools package (ftp://ulrec3.unil.ch/pub/pftools). Membrane-spanning regions were predicted by the SOSUI program (8).
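The hit selection and the flanking-residue numbers described above can be sketched in Python as follows. This is a minimal illustration, not the code used to build HUGE: the `Hit` record, its field names, and the 1-based inclusive coordinates are assumptions, and parsing of real FASTA output is left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    entry: str      # name of the homologous database entry (hypothetical)
    e_value: float  # FASTA expectation value
    length: int     # full length of the homologous entry, in residues
    start: int      # first aligned residue in the entry (1-based)
    end: int        # last aligned residue in the entry (1-based, inclusive)


def top_hits(hits, max_e=0.001, limit=5):
    """Keep hits with an expectation value below max_e, best first, at most `limit`."""
    significant = [h for h in hits if h.e_value < max_e]
    return sorted(significant, key=lambda h: h.e_value)[:limit]


def flanks(hit):
    """Residues of the entry lying outside the aligned region: (N-terminal, C-terminal)."""
    return hit.start - 1, hit.length - hit.end


if __name__ == "__main__":
    # Invented example values, for illustration only.
    example = [
        Hit("entry_a", 1e-30, 500, 101, 450),
        Hit("entry_b", 0.5, 300, 1, 300),
        Hit("entry_c", 1e-5, 800, 20, 700),
    ]
    for h in top_hits(example):
        print(h.entry, h.e_value, flanks(h))
```

Sorting by expectation value before truncating to five entries ensures that the weakest significant hits are the ones dropped when more than five pass the threshold.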
The last section is optional and shows the expression pattern at the mRNA level determined by RT-PCR coupled with enzyme-linked immunosorbent assay (ELISA). By using external control reactions with the authentic plasmid, the mRNA levels are expressed as equivalent amounts of the authentic plasmid DNA (fg) per ng of poly(A)+ RNA. For an at-a-glance overview, the mRNA levels are displayed in colors using the digit-color conversion panel shown in this section.
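The normalization described above (plasmid-equivalent femtograms per nanogram of poly(A)+ RNA) and a color display can be sketched as follows. The text does not specify the digit-color conversion panel, so the palette and the order-of-magnitude binning below are purely hypothetical placeholders.

```python
import math

# Hypothetical palette: one color per order of magnitude of the mRNA level.
# The actual panel used in HUGE is not described in the text.
PALETTE = ["white", "yellow", "orange", "red"]


def mrna_level(plasmid_equiv_fg: float, poly_a_rna_ng: float) -> float:
    """mRNA level as plasmid-equivalent fg per ng of poly(A)+ RNA."""
    if poly_a_rna_ng <= 0:
        raise ValueError("RNA amount must be positive")
    return plasmid_equiv_fg / poly_a_rna_ng


def level_to_color(level: float) -> str:
    """Map a level to a color by its number of integer digits (assumed scheme)."""
    if level < 1:
        return PALETTE[0]
    decade = int(math.floor(math.log10(level)))  # 1-9 gives 0, 10-99 gives 1, ...
    return PALETTE[min(decade + 1, len(PALETTE) - 1)]


if __name__ == "__main__":
    level = mrna_level(plasmid_equiv_fg=50.0, poly_a_rna_ng=2.0)
    print(level, level_to_color(level))
```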
HOW TO ACCESS GENE/PROTEIN CHARACTERISTIC TABLES OF INTEREST
The home page lets users reach a gene/protein characteristic table of interest by three different approaches. The first is to go directly to the list of gene/protein characteristic tables. The second is to search for tables that contain query keywords. The tables found in this way can be narrowed further by adding more keywords one at a time. The search can be carried out either across all fields of the gene/protein characteristic table or on a specified field, such as the motif/profile information or the FASTA results. HUGE entries can also be retrieved according to the size of the cDNA or protein and the number of predicted transmembrane segments. The third approach is a FASTA homology search against HUGE using a query sequence supplied by the user (either a nucleotide or an amino acid sequence).
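The keyword-based approach described above can be pictured as successive filters over the set of tables. The Python sketch below is only an in-memory illustration under stated assumptions: the `Entry` record, its field names, and the sample records are invented for this example and do not reflect how the HUGE server is implemented or what it contains.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    kiaa: str            # gene identifier (placeholder values below)
    protein_length: int  # predicted protein length, in residues
    tm_segments: int     # number of predicted transmembrane segments
    fields: dict = field(default_factory=dict)  # field name -> free text


def keyword_search(entries, keyword, field_name=None):
    """Entries whose text contains the keyword, in one field or in all fields."""
    kw = keyword.lower()

    def matches(entry):
        if field_name is not None:
            texts = [entry.fields.get(field_name, "")]
        else:
            texts = list(entry.fields.values())
        return any(kw in text.lower() for text in texts)

    return [e for e in entries if matches(e)]


if __name__ == "__main__":
    # Invented placeholder records, not real HUGE data.
    entries = [
        Entry("KIAA9001", 1200, 0, {"motif": "zinc finger", "fasta": "kinase homolog"}),
        Entry("KIAA9002", 850, 7, {"motif": "kinase domain", "fasta": "receptor"}),
        Entry("KIAA9003", 640, 2, {"motif": "SH3 domain", "fasta": "kinase receptor"}),
    ]
    found = keyword_search(entries, "kinase")             # search all fields
    found = keyword_search(found, "receptor", "fasta")    # narrow with a second keyword
    found = [e for e in found if e.tm_segments >= 1 and e.protein_length >= 800]
    print([e.kiaa for e in found])
```

Because each search takes the previous result list as input, adding a keyword can only shrink the result set, which matches the one-at-a-time narrowing described in the text.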
ACKNOWLEDGEMENTS
We thank Takatsugu Hirokawa, Seah Boon-Chieng, and Shigeki Mitaku for allowing us to use the SOSUI program for prediction of transmembrane helical regions. We also thank Makoto Hirosawa for providing us with the results of GeneMark-RC analysis. This work was supported by the Kazusa DNA Research Institute Foundation.
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.