Nucleic Acids Research Advance Access originally published online on December 18, 2007
Nucleic Acids Research 2008 36(Database issue):D793-D799; doi:10.1093/nar/gkm999

Nucleic Acids Research, 2008, Vol. 36, Database issue D793-D799
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue

Articles

The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts*

Genome Information Integration Project And H-Invitational 2

Received September 16, 2007. Revised October 20, 2007. Accepted October 22, 2007.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 THE ANNOTATION IN OUR...
 NEW ANNOTATED FEATURES IN...
 COMPREHENSIVE ANNOTATION...
 H-InvDB New Identifier
 H-InvDB Data Availability
 LIST OF AUTHORS FOR...
 REFERENCES
 
Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein–protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view.


    INTRODUCTION
Human transcripts represent a biologically and functionally rich format for examining the structure of human genes and alternative splicing isoforms. In particular, cloning and sequencing of full-length cDNAs (FLcDNAs) that cover all exons but no introns can facilitate the precise determination of human gene structure (1). Studies on human transcripts have thus been systematically and extensively carried out to draw the outline of the human transcriptome (2–6). The human transcriptome consists of protein-coding mRNAs and non-coding functional RNAs. Analysis of these sequences will provide insights into how genomic information is transformed into higher order biological phenomena. By comparative analysis of the transcriptome with the human genome, we will be able to determine the transcribed regions of the genome and better understand the regulatory machinery of transcription (7,8). It is therefore of great significance to collect information about human transcripts as well as their annotations. We thus held the first international workshop entitled ‘Human Full-length cDNA Annotation Invitational’ (abbreviated as H-Invitational or H-Inv) in Tokyo, Japan from 25th August to 3rd September 2002, and constructed a novel, integrative database of the human transcriptome, called H-InvDB (9,10). The initial release consisted of the annotation of 42 421 human FLcDNAs collected from six high-throughput producers of human FLcDNAs worldwide.

To cover the increased number of human FLcDNAs since the initial release of H-InvDB, we held the second international annotation meeting entitled ‘H-Invitational 2 Functional Annotation Jamboree’ (abbreviated as H-Invitational 2 or H-Inv2) in Tokyo, Japan from 15th to 20th November 2003. The second major release of H-InvDB (release 2.0) was based on the annotation carried out at the H-Inv2 annotation jamboree. After H-Inv2, we initiated the Genome Information Integration Project (GIIP) and held the third and fourth annotation meetings in October 2005 and October 2006. The products of those two annotation meetings comprised releases 3.0 and 4.0 of H-InvDB. The increases in the number of entries in H-InvDB are summarized in Table 1.


Table 1. Statistics of H-InvDB entries

 

    THE ANNOTATION IN OUR LATEST UPDATE, H-InvDB 2007
In our latest release H-InvDB_4.6, we annotated 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD) in addition to 54 978 human FLcDNAs that were available on 15th June 2006. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci, while 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. We followed essentially the same mapping technique that we described previously (9,10). We updated the annotation for the mitochondrial transcripts since the previous major release, H-InvDB_4.0, which resulted in slightly lower numbers of transcripts and clusters. We then assigned a standardized functional annotation to each H-Inv transcript by human curation, based on the results of similarity searches and InterProScan (11). The numbers of manually curated human proteins in each category are summarized in Table 2.
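The locus statistics reported for H-InvDB_4.6 can be cross-checked with simple arithmetic. The following minimal sketch uses only the counts quoted in the text; the function and variable names are illustrative and are not part of H-InvDB.

```python
# Counts quoted in the text for release H-InvDB_4.6.
TOTAL_CLUSTERS = 34_699
PROTEIN_CODING = 34_057
PSEUDOGENE_OVERLAP = 858


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Loci that are not protein-coding are taken to be the remainder.
non_coding = TOTAL_CLUSTERS - PROTEIN_CODING

print("non-protein-coding loci:", non_coding)
print("protein-coding %:", percent(PROTEIN_CODING, TOTAL_CLUSTERS))
print("non-protein-coding %:", percent(non_coding, TOTAL_CLUSTERS))
print("pseudogene-overlap %:", percent(PSEUDOGENE_OVERLAP, TOTAL_CLUSTERS))
```

The remainder is 642 loci, and the three percentages round to 98.1, 1.9 and 2.5, in agreement with the figures given in the abstract.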


Table 2. Statistics of manually curated representative H-Inv proteins

 
For these transcripts and genes, we provide comprehensive annotation including descriptions of their gene structures, alternative splicing isoforms, functional non-protein-coding RNAs, functional domains of proteins, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene-expression profiles, orthologous genes and evolutionary features in model animals, protein–protein interactions (PPI) and annotation for gene families. We have also annotated several new features related to transcript quality.


    NEW ANNOTATED FEATURES IN H-InvDB
Classification of ncRNA
We annotated transcripts that have no homology to known protein-coding genes or InterPro-domain-containing genes as non-protein-coding transcript candidates. Using homology with known ncRNA databases and discrimination analysis, we classified 1216 non-protein-coding transcripts into three classes: ‘Identical to known ncRNA’ (124), ‘Similar to known ncRNA’ (74) and ‘Putative ncRNA’ (1018).

Sequence quality features: nonsense-mediated decay (NMD), read-through, reverse orientation
Using the extended sequence quality annotation, we annotated a total of 269 transcripts as read-through candidates and 2731 as candidate targets of NMD.

Category VII: pseudogene candidates
To annotate transcribed pseudogene candidates, we used a two-step procedure. First, we filtered out functional protein-coding genes by targeting only representative category II transcripts and transcripts identified as having frameshifts and/or nonsense mutations. Second, we predicted transcribed pseudogene candidates using a support vector machine (SVM) method. In the current release, we annotated 1112 transcribed pseudogene candidates (Category VII).

Annotation of gene families/groups
We annotated four selected gene families/groups, T-cell receptor (TCR), immunoglobulin (Ig), major histocompatibility complex (MHC), also known as human leukocyte antigen (HLA), and olfactory receptor (OR), using an original pipeline based on sequence analysis against genome and protein databases, complemented by a text-mining approach. In the current release, we identified 15 TCR, 21 Ig, 72 MHC and 122 OR gene clusters.

All the annotation items and features of H-Inv transcript sequences are stored and shown in the main views or sub-databases in H-InvDB.


    COMPREHENSIVE ANNOTATION RESOURCES IN H-InvDB
The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view with the appropriate cross-links. An overview of the comprehensive annotation resources of the human gene and transcripts in H-InvDB is shown in Figure 1.


Figure 1. H-InvDB: overview of the comprehensive annotation resource for the human genes and transcripts. The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view. The Transcript view and the Locus view are the main viewers to display the annotation of each H-Invitational transcript (HIT) and H-Invitational cluster (HIX). The DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view are sub-databases to provide detailed annotation for each annotation feature. The links to related databases are provided from the appropriate viewers.

 
Transcript view
The Transcript view shows all the annotation of an H-Inv transcript in 12 section tabs: (i) gene structure, (ii) gene function, (iii) gene ontology, (iv) predicted CDS, (v) functional motif, (vi) subcellular localization, (vii) protein structure information, (viii) gene expression, (ix) disease/pathology, (x) evolutionary information, (xi) polymorphism (SNP, indel and microsatellite) and interspersed repeat information and (xii) transcript and sequence quality information. As seen in the example of a Transcript view shown in Figure 1, this view also has links to many external public databases, including DDBJ/EMBL/GenBank, RefSeq, UniProtKB, HGNC, InterPro, Ensembl, EntrezGene, PubMed, dbSNP, GO and GTOP, and to the web sites of the original data producers of the FLcDNA clones and sequences, including the Chinese National Human Genome Center (CHGC), German cDNA Consortium (DKFZ/MIPS), Helix Research Institute, Inc. (HRI), the Institute of Medical Science in the University of Tokyo (IMSUT), the Kazusa DNA Research Institute (KDRI), the Mammalian Gene Collection (MGC/NCI) and NEDO. This view was previously known as the cDNA view (mRNA view).

Locus view
The Locus view shows all the annotation of a locus in six section tabs: (i) gene structure and location in the human genome, (ii) gene function, (iii) alternative splicing pattern, (iv) gene expression, (v) disease/pathology and (vi) cluster member information. As seen in the example of a Locus view shown in Figure 1, it shows links to external public databases including DDBJ/EMBL/GenBank, RefSeq, EntrezGene, GeneCards, HGNC and OMIM.

DiseaseInfo Viewer
The DiseaseInfo Viewer is a database of known and orphan genetic diseases and their relation to H-Inv clusters with EntrezGene and OMIM cross-links. The DiseaseInfo Viewer provides two kinds of disease information related to H-Inv clusters: known disease-related genes and co-localized orphan diseases. An orphan disease is defined as a disease mapped on a chromosomal region, but for which the responsible gene has not been identified yet. Co-localization does not necessarily mean a direct relationship between gene and disease; however, genes that are cytogenetically co-localized with a disease could be possible candidate genes for that disease. The co-localized H-Inv clusters are chosen by computing the physical range of each cytogenetic band with a 1 Mbp margin.
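The co-localization rule described above (a cytogenetic band extended by a 1 Mbp margin) can be illustrated with a simple interval test. This is a minimal sketch under stated assumptions, not the H-InvDB implementation: the `Interval` type, the function names and the example coordinates are hypothetical, and treating any overlap with the extended band as co-localization is an assumption, since the text does not say whether overlap or full containment is required.

```python
from typing import NamedTuple

MARGIN_BP = 1_000_000  # the 1 Mbp margin stated in the text


class Interval(NamedTuple):
    """A genomic range on one chromosome (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def with_margin(band, margin=MARGIN_BP):
    """Extend a cytogenetic band by `margin` bp on both sides."""
    return Interval(band.chrom, max(0, band.start - margin), band.end + margin)


def overlaps(a, b):
    """True if two ranges share a chromosome and at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def colocalized(band, clusters):
    """Return IDs of clusters overlapping the band plus margin (assumed rule)."""
    window = with_margin(band)
    return [cid for cid, loc in clusters.items() if overlaps(window, loc)]


# Hypothetical coordinates, for illustration only.
disease_band = Interval("chr1", 5_000_000, 7_000_000)
example_clusters = {
    "HIX0000001": Interval("chr1", 4_200_000, 4_300_000),  # inside the margin
    "HIX0000002": Interval("chr1", 8_500_000, 8_600_000),  # beyond the margin
    "HIX0000003": Interval("chr2", 5_500_000, 5_600_000),  # other chromosome
}
print(colocalized(disease_band, example_clusters))
```

With these made-up coordinates only the first cluster is reported, because it lies within 1 Mbp of the band boundary on the same chromosome.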

Human anatomic gene expression library (H-ANGEL)
H-ANGEL is a database of expression patterns that we constructed to obtain a broad outline of such patterns for human genes (12). We collected gene-expression data in normal and adult human tissues that were generated by three types of methods and in seven different platforms, including: iAFLP, a PCR-based quantitative expression profiling method; DNA arrays (long oligomers, short oligomers and cDNA microarrays); and cDNA sequence tags (SAGE, EST, BodyMap and MPSS). The H-ANGEL database comprises the largest and most comprehensive collection of gene expression patterns so far, and it also provides a classification of human genes in terms of their expression.

Clustering Viewer
The Clustering Viewer facilitates the comparison of different clustering results. It allows users to see whether H-Inv transcripts are consistently clustered by different clustering methods. It also displays multiple alignments of transcripts generated with CLUSTALW (13). The Clustering Viewer shows all the member transcripts of the H-Inv cluster to which a query sequence belongs.

G-integra
G-integra is an integrated genome browser in which users can examine the genomic structures of the transcripts. As seen in an example view in Figure 1, the location in the human genome and the gene structure of an H-Inv transcript (green) are shown together with the corresponding RefSeq and Ensembl entries. The structures of the genes and transcripts for 11 non-human species, Pan troglodytes (chimpanzee), Macaca sp. (macaque), Mus musculus (mouse), Rattus norvegicus (rat), Canis familiaris (dog), Bos taurus (cow), Monodelphis domestica (opossum), Gallus gallus (chicken), Danio rerio (zebrafish), Tetraodon nigroviridis (tetraodon) and Takifugu rubripes (fugu), can be optionally displayed for comparison. Other options allow the results of gene prediction programs such as GenScan (14), HMMgene (15), FGENESH (16) and JIGSAW (17) to be displayed.

TOPO Viewer
The TOPO Viewer is a tool for viewing subcellular targeting signals predicted by TargetP (18) and the presence of transmembrane helices predicted by SOSUI (19) and TMHMM (20). The probabilities that a protein may be delivered to up to nine distinct subcellular locations are predicted by WoLF PSORT (21). TargetP predicts whether a protein contains a signal peptide, a mitochondrial targeting signal or any other type of signal. The TOPO Viewer consists of four tab pages: TABLE, MAP, FILE and GFP. The TABLE tab page displays the prediction results for all the programs used.

Evola
Evola is a database of evolutionary annotation of human genes (22). It provides sequence alignments and phylogenetic trees of manually curated orthologous genes among human and 11 model organisms: Pan troglodytes (chimpanzee), Macaca sp. (macaque), Mus musculus (mouse), Rattus norvegicus (rat), Canis familiaris (dog), Bos taurus (cow), Monodelphis domestica (opossum), Gallus gallus (chicken), Danio rerio (zebrafish), Tetraodon nigroviridis (tetraodon) and Takifugu rubripes (fugu). Sequence alignments and phylogenetic trees are shown for both the orthologous genes and the homologous genes.

PPI view
The PPI view displays H-InvDB human PPI information at http://www.jbirc.aist.go.jp/hinv/ppi/. We collected PPI data from five databases (BIND, DIP, MINT, HPRD and IntAct), removed redundancies among the databases based on sequence similarity and integrated the resulting data with the H-Invitational proteins.
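The integration step can be pictured as merging interaction lists from several sources and collapsing duplicates. The sketch below is only an illustration under a simplifying assumption: it presumes that every protein has already been mapped to a common identifier, whereas the text states that redundancy was resolved by sequence similarity, a step not reproduced here. The function name and the example identifiers are hypothetical.

```python
def merge_ppi(sources):
    """Merge interaction lists from several databases.

    `sources` maps a database name to an iterable of (protein_a, protein_b)
    pairs. Interactions are treated as undirected, so (a, b) and (b, a)
    collapse into one entry. Returns a dict mapping each unordered pair
    to the sorted list of databases that reported it.
    """
    merged = {}
    for name, pairs in sources.items():
        for a, b in pairs:
            key = frozenset((a, b))  # order-independent key
            merged.setdefault(key, set()).add(name)
    return {pair: sorted(names) for pair, names in merged.items()}


# Hypothetical example data; the identifiers are placeholders.
example = {
    "BIND": [("HIP000000001", "HIP000000002")],
    "DIP": [("HIP000000002", "HIP000000001"), ("HIP000000003", "HIP000000004")],
}
result = merge_ppi(example)
print(len(result))
```

Keeping the list of supporting databases for each pair is a design choice of this sketch; it makes it easy to see which interactions were reported by more than one source.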

Gene family/Group view
The Gene family/Group view provides human-curated annotation datasets for the selected gene families/groups at http://www.jbirc.aist.go.jp/hinv/ahg-db/geneFamilyIndex.jsp. For H-InvDB release 4.0, we provided detailed annotations for four selected gene families/groups: TCR, Ig, MHC and OR. Each page provides the list of genes, gene names, definitions and links for the appropriate H-InvDB views.


    H-InvDB New Identifier
We defined and assigned a unique identifier to each annotation unit: transcript, protein or cluster (7,8). An H-Invitational transcript identifier consists of the prefix ‘HIT’ followed by nine digits (e.g. HIT000000001), and an H-Invitational cluster identifier consists of the prefix ‘HIX’ followed by seven digits (e.g. HIX0000001). To track modifications to the sequence or annotation of an H-Inv entry, a version is assigned to each ID and is always stated with the ID. Additionally, we now provide a new identifier for each H-Invitational protein, consisting of the prefix ‘HIP’ followed by nine digits (e.g. HIP000000001).
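The three identifier formats can be checked mechanically. The following sketch encodes only the prefix and digit counts given in the text; the function name `classify_id` is illustrative, and the notation used for version numbers is not specified here, so versions are deliberately not modelled.

```python
import re

# Prefix and digit counts as described in the text.
ID_PATTERNS = {
    "transcript": re.compile(r"HIT[0-9]{9}"),
    "cluster": re.compile(r"HIX[0-9]{7}"),
    "protein": re.compile(r"HIP[0-9]{9}"),
}


def classify_id(identifier):
    """Return 'transcript', 'cluster' or 'protein', or None if no format fits."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return kind
    return None


for example_id in ("HIT000000001", "HIX0000001", "HIP000000001", "HIX000000001"):
    print(example_id, classify_id(example_id))
```

The last example has the cluster prefix but nine digits, so it matches none of the formats and is reported as `None`.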


    H-InvDB Data Availability
H-InvDB is freely available for both academic and commercial use and can be accessed online at http://www.h-invitational.jp/ (or hinv.jp). Annotated data can also be downloaded from HTTP and FTP servers as FASTA sequence files, original-format flat files or XML files. A mirror database is also available at http://hinvdb.ddbj.nig.ac.jp/. Minor updates are released every three months and major updates are released once a year.
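Downloaded FASTA sequence files can be read with a few lines of code. The sketch below is a generic FASTA reader and assumes nothing about H-InvDB beyond the standard FASTA layout (a header line starting with `>` followed by sequence lines); the header content and sequence in the example are made up, and the layout of the headers in the actual download files is not described in this article.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration; real files would be opened with open().
sample = io.StringIO(">HIT000000001 example\nATGGCC\nTTAG\n>HIT000000002\nATGA\n")
records = list(read_fasta(sample))
print(records)
```

Because the function is a generator, it can process large sequence files one record at a time without loading the whole file into memory.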


    LIST OF AUTHORS FOR THE GENOME INFORMATION INTEGRATION PROJECT AND H-INVITATIONAL 2 CONSORTIUM
Chisato Yamasaki1,2, Katsuhiko Murakami1,2, Yasuyuki Fujii3, Yoshiharu Sato1,2, Erimi Harada1,2, Jun-ichi Takeda1,2, Takayuki Taniya1,2, Ryuichi Sakate1,2, Shingo Kikugawa1,2, Makoto Shimada1,2, Motohiko Tanino4, Kanako O. Koyanagi5, Roberto A. Barrero6, Craig Gough1,2, Hong-Woo Chun1,2, Takuya Habara1, Hideki Hanaoka7, Yosuke Hayakawa1,8, Phillip B. Hilton1,2, Yayoi Kaneko9, Masako Kanno1,2, Yoshihiro Kawahara1,2, Toshiyuki Kawamura10, Akihiro Matsuya1,11, Naoki Nagata12, Kensaku Nishikata1,13, Akiko Ogura Noda1,2, Shin Nurimoto14, Naomi Saichi1,2, Hiroaki Sakai15, Ryoko Sanbonmatsu1,2, Rie Shiba1,2, Mami Suzuki1,2, Kazuhiko Takabayashi8, Aiko Takahashi1,2, Takuro Tamura16, Masayuki Tanaka1,2, Susumu Tanaka17, Fusano Todokoro1,18, Kaori Yamaguchi1, Naoyuki Yamamoto1,19, Toshihisa Okido20, Jun Mashima20, Aki Hashizume20, Lihua Jin20, Kyung-Bum Lee20, Yi-Chueh Lin20, Asami Nozaki20, Katsunaga Sakai20, Masahito Tada20, Satoru Miyazaki21, Takashi Makino22, Hajime Ohyanagi20,23, Naoki Osato20, Nobuhiko Tanaka20, Yoshiyuki Suzuki20, Kazuho Ikeo20, Naruya Saitou24, Hideaki Sugawara20, Claire O’Donovan25, Tamara Kulikova25, Eleanor Whitfield25, Brian Halligan26, Mary Shimoyama26, Simon Twigger26, Kei Yura27, Kouichi Kimura28, Tomohiro Yasuda28, Tetsuo Nishikawa28,29, Yutaka Akiyama30, Chie Motono30, Yuri Mukai30, Hideki Nagasaki15,30, Makiko Suwa30, Paul Horton30, Reiko Kikuno31, Osamu Ohara31, Doron Lancet32, Eric Eveno33,34, Esther Graudens33,34, Sandrine Imbeaud33,34,35, Marie Anne Debily33,34,36, Yoshihide Hayashizaki37,38, Clara Amid39, Michael Han39, Andreas Osanger39, Toshinori Endo5, Michael A. Thomas40, Mika Hirakawa41, Wojciech Makalowski42, Mitsuteru Nakao43, Nam-Soon Kim44, Hyang-Sook Yoo44, Sandro J. De Souza45, Maria de Fatima Bonaldo46, Yoshihito Niimura47, Vladimir Kuryshev48, Ingo Schupp48, Stefan Wiemann48, Matthew Bellgard6, Masafumi Shionyu49, Libin Jia50, Danielle Thierry-Mieg51, Jean Thierry-Mieg51, Lukas Wagner51, Qinghua Zhang34,52, Mitiko Go53, Shinsei Minoshima54, Masafumi Ohtsubo54, Kousuke Hanada55, Peter Tonellato56, Takao Isogai29, Ji Zhang34,57, Boris Lenhard58, Sangsoo Kim59, Zhu Chen34,60,61, Ursula Hinz62, Anne Estreicher62, Kenta Nakai63, Izabela Makalowska64, Winston Hide65, Nicola Tiffin65, Laurens Wilming66, Ranajit Chakraborty67, Marcelo Bento Soares68, Maria Luisa Chiusano69, Yutaka Suzuki70, Charles Auffray33,34, Yumi Yamaguchi-Kabata2, Takeshi Itoh2,15, Teruyoshi Hishiki2, Satoshi Fukuchi20, Ken Nishikawa20, Sumio Sugano2,70, Nobuo Nomura2, Yoshio Tateno20, Tadashi Imanishi2,5,†, Takashi Gojobori2,20


    ACKNOWLEDGEMENTS
 
We acknowledge all the members of the H-Invitational 2 consortium and Genome Information Integration Project (GIIP), especially the staff of JBIRC for construction of H-InvDB, and Ryo Aono, Tomohiro Endo, Yukie Makita, Hiromi Kubooka, Yuji Shinso, Harutoshi Maekawa, Yasuhiro Fukunaga, Hajime Nakaoka, Yoshito Ueki, Yoshihide Mimiura, Ryuzou Matsumoto, Seigo Hosoda, Yo Takahashi, Taichirou Sugisaki, Hiroki Hokari, Hiroaki Kawashima, Yasuhiro Imamizu and Makoto Ogawa for their technical assistance. This research is financially supported by the Ministry of Economy, Trade and Industry of Japan (METI), the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) and the Japan Biological Informatics Consortium (JBIC). This work is also partly supported by the Research Grant for the RIKEN Genome Exploration Research Project from MEXT to Y.H. and the Grant for the RIKEN Frontier Research System, Functional RNA research program. Funding to pay the Open Access publication charges for this article was provided by JBIC.

Conflict of interest statement. None declared.


    Footnotes
 
*A complete list of authors appears at the end of this article.

1 Japan Biological Information Research Center, Japan Biological Informatics Consortium

2 Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo

3 Graduate School Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama

4 DNA Chip Research Inc., Kanagawa

5 Hokkaido University, Hokkaido, Japan

6 Centre for Comparative Genomics, Murdoch University, WA, Australia

7 Biotechnology Research Center, The University of Tokyo

8 Hitachi Software Engineering Co., Ltd.

9 Mitsubishi Kagaku Institute of Life Sciences

10 Fujitsu Limited, Tokyo

11 Hitachi, Co., Ltd., Saitama

12 Japan Science and Technology Agency

13 NEC Soft, Ltd.

14 Mitsui Knowledge Industry Co., Ltd, Tokyo

15 National Institute of Agrobiological Sciences, Ibaraki

16 BITS Co., Ltd., Shizuoka

17 Tokyo Institute of Psychiatry, Tokyo

18 DYNACOM Co., Ltd., Chiba

19 C's Lab Co., Ltd., Hokkaido

20 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka

21 Tokyo University of Science, Chiba, Japan

22 University of Dublin, Trinity College, Dublin, Ireland

23 Mitsubishi Space Software Co., Ltd., Ibaraki

24 Division of Population Genetics, National Institute of Genetics, Shizuoka, Japan

25 EMBL Outstation-Hinxton, European Bioinformatics Institute, Cambridge, UK

26 Bioinformatics Research Center, Medical College of Wisconsin, WI, USA

27 Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto

28 Central Research Laboratory, Hitachi Ltd.

29 Reverse Proteomics Research Institute, CO., Ltd.

30 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo

31 Department of Human Gene, Kazusa DNA Research Institute, Chiba, Japan

32 Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel

33 Genexpres, Functional Genomics and Systems Biology for Health (CNRS and Pierre & Marie Curie University - Paris VI), Villejuif, France

34 Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China

35 Centre de Génétique Moléculaire, CNRS and Gif/Orsay DNA Microarray Platform, Gifs/Yvette

36 Laboratory of Genomes Functional Exploration, CEA, DSV, IRCM, Evry, France

37 Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa

38 Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, Saitama, Japan

39 GSF - National Research Center for Environment and Health, Institute for Bioinformatics, Neuherberg, Germany

40 Idaho State University, ID, USA

41 Institute for Chemical Research, Kyoto University, Kyoto, Japan

42 Institute of Bioinformatics, University of Muenster, Muenster, Germany

43 Kazusa DNA Research Institute, Chiba, Japan

44 Korea Research Institute of Bioscience & Biotechnology, Taejeon, Korea

45 Ludwig Institute for Cancer Research, Sao Paulo, Brazil

46 Medical Education and Biomedical Research Facility, University of Iowa, IA, USA

47 Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan

48 Molecular Genome Analysis, German Cancer Research Center, Heidelberg, Germany

49 Nagahama Institute of Bio-Science and Technology, Shiga, Japan

50 National Cancer Institute, National Institutes of Health, MD

51 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, USA

52 National Engineering Center for Biochips at Shanghai, Shanghai, China

53 Ochanomizu University, Tokyo

54 Photon Medical Research Center, Hamamatsu University School of Medicine, Shizuoka

55 Plant Science Center, RIKEN Yokohama Institute, Kanagawa

56 Harvard Medical School, MA, USA

57 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

58 Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden

59 Soongsil University, Seoul, Korea

60 State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine

61 Chinese National Human Genome Center at Shanghai, Shanghai, China

62 Swiss Institute of Bioinformatics, Geneva, Switzerland

63 The Institute of Medical Science, The University of Tokyo, Tokyo, Japan

64 The Pennsylvania State University, PA, USA

65 The South African National Bioinformatics Institute, University of Western Cape, Cape Town, South Africa

66 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK

67 University of Cincinnati, OH

68 Children's Memorial Research Center, Northwestern University, Feinberg School of Medicine, USA

69 University of Naples "Federico II", Naples, Italy

70 Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan

†To whom correspondence should be addressed. Tel: +81-3-3599-8800; Fax: +81-3-3599-8801; Email: t.imanishi@aist.go.jp. Correspondence may also be addressed to Takashi Gojobori. Tel: +81-55-981-6847; Fax: +81-55-981-6848; Email: tgojobor@genes.nig.ac.jp


    REFERENCES

  1. Ota T, et al. Full-length cDNA project toward a high throughput functional analysis. Microb. Comp. Genomics (1997) 2:204–205.

  2. Yudate HT, et al. HUNT: launch of a full-length cDNA database from the helix research institute. Nucleic Acids Res. (2001) 29:185–188.[Abstract/Free Full Text]

  3. Wiemann S, et al. Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs. Genome Res. (2001) 11:422–435.[Abstract/Free Full Text]

  4. Strausberg RL, et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl Acad. Sci. USA (2002) 99:16899–16903.[Abstract/Free Full Text]

  5. Kikuno R, et al. HUGE: a database for human large proteins identified in the Kazusa cDNA sequencing project. Nucleic Acids Res. (2002) 30:166–168.[Abstract/Free Full Text]

  6. Carninci P, et al. The transcriptional landscape of the mammalian genome. Science (2005) 309:1559–1563.[Abstract/Free Full Text]

  7. Frith MC, et al. Pseudo-messenger RNA: phantoms of the transcriptome. PLoS Genet (2006) 2:p. e23.

  8. Gingeras TR, et al. Origin of phenotypes: genes and transcripts. Genome Res. (2007) 17:682–690.[Abstract/Free Full Text]

  9. Imanishi T, et al. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. (2004) 2:856–875.[Web of Science]

  10. Yamasaki C, et al. Investigation of protein functions through data-mining on integrated human transcriptome database, H-Invitational database (H-InvDB). Gene (2005) 364:99–107.[CrossRef][Web of Science][Medline]

  11. Mulder NJ, et al. New developments in the InterPro database. Nucleic Acids Res. (2007) 35(Database issue):D224–D228.[Abstract/Free Full Text]

  12. Tanino M, et al. The human anatomic gene expression library (H-ANGEL), the H-Inv integrative display of human gene expression across disparate technologies and platforms. Nucleic Acids Res. (2005) 33(Database Issue):D567–D572.[Abstract/Free Full Text]

  13. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.[Abstract/Free Full Text]

  14. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. (1997) 268:78–94.[CrossRef][Web of Science][Medline]

  15. Krogh A. Two methods for improving performance of an HMM and their application for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. (1997) 5:179–186.[Medline]

  16. Salamov AA, Solovyev VV. Ab initio gene finding in Drosophila genomic DNA. Genome Res. (2000) 10:516–522.[Abstract/Free Full Text]

  17. Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics (2005) 21:3596–3603.[Abstract/Free Full Text]

  18. Emanuelsson O, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. (2000) 300:1005–1016.[CrossRef][Web of Science][Medline]

  19. Hirokawa T, Boon-Chieng S, Mitaku S. SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics (1998) 14:378–379.[Abstract/Free Full Text]

  20. Krogh A, et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. (2001) 305:567–580.[CrossRef][Web of Science][Medline]

  21. Horton P, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. (2007) 35(Web Server issue):W585–W587.[Abstract/Free Full Text]

  22. Matsuya A, et al. Evola: ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees. Nucleic Acids Res. (2008) (in press).




This article has been cited by other articles:

C. Yamasaki, K. Murakami, J.-i. Takeda, Y. Sato, A. Noda, R. Sakate, T. Habara, H. Nakaoka, F. Todokoro, A. Matsuya, et al.
H-InvDB in 2009: extended database and data mining resources for human genes and transcripts
Nucleic Acids Res., November 23, 2009; (2009) gkp1020v1.

T. Imanishi and H. Nakaoka
Hyperlink Management System and ID Converter System: enabling maintenance-free hyperlinks among major biological databases
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W17 - W22.

K. Mochida, T. Yoshida, T. Sakurai, Y. Ogihara, and K. Shinozaki
TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics
Plant Physiology, July 1, 2009; 150(3): 1135 - 1146.

M. K. Shimada, R. Matsumoto, Y. Hayakawa, R. Sanbonmatsu, C. Gough, Y. Yamaguchi-Kabata, C. Yamasaki, T. Imanishi, and T. Gojobori
VarySysDB: a human genetic polymorphism database based on all H-InvDB transcripts
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D810 - D815.

J.-i. Takeda, Y. Suzuki, R. Sakate, Y. Sato, M. Seki, T. Irie, N. Takeuchi, T. Ueda, M. Nakao, S. Sugano, et al.
Low conservation and species-specific evolution of alternative splicing in humans and mice: comparative genomics analysis using well-annotated full-length cDNAs
Nucleic Acids Res., November 1, 2008; 36(20): 6386 - 6395.

J.-T. Li, Y. Zhang, L. Kong, Q.-R. Liu, and L. Wei
Trans-natural antisense transcripts including noncoding RNAs in 10 species: implications for expression regulation
Nucleic Acids Res., September 1, 2008; 36(15): 4833 - 4844.

