
Nucleic Acids Research, 2001, Vol. 29, No. 1 296-299
© 2001 Oxford University Press

Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs

Rami Kantor1, Rhoderick Machekano1, Mathew J. Gonzales1, Kathryn Dupnik1, Jonathan M. Schapiro1,2 and Robert W. Shafer1,*

1Division of Infectious Diseases, Stanford University Medical Center, Stanford, CA 94305, USA and 2National Hemophilia Center, Tel-Hashomer Hospital, Tel Aviv, Israel

Received October 3, 2000; Accepted October 11, 2000.


    ABSTRACT
 
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.


    DATABASE RATIONALE AND HISTORY
 
Medical and biological relevance
Human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes are the molecular targets of the 15 currently licensed antiretroviral drugs. Sequence changes in the genes coding for these enzymes are directly responsible for phenotypic resistance to RT and protease inhibitors. Individuals infected with drug-susceptible HIV isolates experience reductions in morbidity and mortality with appropriate antiretroviral drug therapy. In contrast, individuals infected with drug-resistant isolates do not usually respond to drug therapy (1). Assays for sequencing HIV RT and protease are commercially available and are widely used in clinical settings. However, the optimal means of interpreting RT and protease sequences is not known, and the potential utility of such sequence results in clinical settings is an area of intense clinical investigation (2).

HIV RT and protease
HIV RT is a heterodimer composed of a p66 subunit, which contains the DNA-binding groove and the active site, and a p51 subunit, which appears to function as a scaffold for the enzymatically active p66 subunit. HIV RT is responsible for RNA-dependent DNA polymerization, RNase H activity and DNA-dependent DNA polymerization. Although it is 560 amino acids in length, almost all known drug resistance mutations are found in the 5'-polymerase coding region, which is the region sequenced by most clinical laboratories. HIV protease is responsible for the post-translational processing of the viral gag and gag-pol polyproteins to yield the structural proteins and enzymes of the virus. The enzyme is an aspartic protease composed of two non-covalently associated, structurally identical monomers 99 amino acids in length. The protease has a binding cleft that specifically recognizes and cleaves at least 10 different sequences on viral precursor polyproteins.

HIV genetic variation
Genetic analysis of HIV isolates has revealed 10 different group M (main) subtypes, differing from one another by 10–30%, and several highly divergent group O and group N outlier sequences (3). Sequences of HIV isolates collected from around the world provide evidence for naturally occurring genetic polymorphisms; sequences of isolates from persons receiving anti-HIV drug therapy indicate genetic changes selected under antiretroviral drug pressure that may be associated with drug resistance. Sequences within the HIV RT and Protease Sequence Database indicate that ~40% of the 99 amino acids in the protease are polymorphic in untreated individuals, and up to 67% are polymorphic in patients receiving antiretroviral therapy. In the RT, percentages of polymorphisms are ~25 and 40%, respectively.

Database history
The HIV RT and Protease Sequence Database is a relational database begun in 1998 as a catalog of sequences linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. The database schema has been described in detail in earlier publications (4,5). During the past year, the data model has been expanded to include phenotypic drug susceptibility data and other data pertaining to the interpretation of RT and protease sequences obtained in clinical settings. In addition, two sequence analysis programs and didactic text have been integrated via hyperlinks to specific database queries. In the sections that follow, we will review key database features with an emphasis on those that were introduced within the past year.


    DATABASE OVERVIEW
 
Table 1 contains a summary of the key web pages in the database divided into three sections: documentation, queries and sequence analysis programs. As of September 15, 2000, the database contained 7909 sequences from 2215 individuals, including 4038 RT and 3871 protease sequences. These sequences include 5820 sequences published in GenBank and 2089 sequences from published journal articles. The database also contains >3500 drug susceptibility results.


 
Table 1. Summary of the HIV RT and Protease Sequence Database Web Site
 
Database documentation
The ‘Background’ is a brief bulleted description of the database rationale targeted towards a lay audience. The ‘Primer’ is a more detailed description of the database rationale, particularly with respect to the problem of HIV drug resistance; the ‘Primer’ also describes how the HIV RT and Protease Sequence Database differs from other databases containing HIV sequences, including the International Collaboration databases and Los Alamos HIV Sequence Database (6). The ‘Data Model for Understanding HIV Drug Resistance’ describes an ontology of HIV drug resistance based on four types of correlations between genetic sequences and other types of data, including drug treatment histories of patients from whom HIV isolates are sequenced, in vitro drug susceptibility data from laboratory HIV isolates, in vitro drug susceptibility data from clinical HIV isolates and clinical outcome data from patients receiving anti-HIV therapy.

‘Summary Statistics’ is a dynamically generated page based on SQL queries that provides the number of patients, isolates and sequences in the database meeting a variety of criteria. The ‘Drug Resistance Notes’ page contains graphical didactic summaries of the most common drug resistance mutations. The graphical summaries are image maps linked to data within the database relating the mutation to each of the clinically available anti-HIV compounds (Fig. 1).
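A page such as 'Summary Statistics' can be generated from a few aggregate SQL queries. The following is a minimal sketch, not the database's actual implementation: the table names (`isolates`, `sequences`), the column names and the toy rows are all hypothetical, and an in-memory SQLite database stands in for the real relational back end.

```python
import sqlite3


def build_toy_database():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE isolates (
            isolate_id INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL
        );
        CREATE TABLE sequences (
            sequence_id INTEGER PRIMARY KEY,
            isolate_id  INTEGER NOT NULL REFERENCES isolates(isolate_id),
            gene        TEXT NOT NULL  -- 'RT' or 'PR' (hypothetical labels)
        );
        """
    )
    # Toy rows for illustration only: 2 patients, 3 isolates, 4 sequences.
    conn.executemany("INSERT INTO isolates VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2)])
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)",
                     [(1, 1, "RT"), (2, 1, "PR"), (3, 2, "RT"), (4, 3, "PR")])
    return conn


def summary_statistics(conn):
    """Count patients, isolates and sequences, overall and per gene."""
    cur = conn.cursor()
    patients, isolates, sequences = cur.execute(
        """
        SELECT COUNT(DISTINCT i.patient_id),
               COUNT(DISTINCT i.isolate_id),
               COUNT(s.sequence_id)
        FROM isolates AS i
        JOIN sequences AS s ON s.isolate_id = i.isolate_id
        """
    ).fetchone()
    per_gene = dict(
        cur.execute("SELECT gene, COUNT(*) FROM sequences GROUP BY gene").fetchall()
    )
    return {"patients": patients, "isolates": isolates,
            "sequences": sequences, "per_gene": per_gene}


if __name__ == "__main__":
    print(summary_statistics(build_toy_database()))
```

Because the counts are computed at request time, a page built this way reflects newly added sequences without manual updates; further criteria (for example, a treatment filter) would be added as `WHERE` clauses under the same assumptions.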



 
Figure 1. Protease mutations and their relationship to protease inhibitor drug susceptibilities. Drug names are shown at the top of the columns. Amino acid positions are shown at the left of each row. NFV, nelfinavir; SQV, saquinavir; IDV, indinavir; RTV, ritonavir; APV, amprenavir. Red rectangles denote primary resistance mutations. Yellow rectangles denote mutations that contribute to resistance when present with other mutations. Pale yellow rectangles represent mutations that are accessory resistance mutations which also occur as polymorphisms in untreated individuals. Rectangles containing a question mark imply that the relationship between the mutation and drug resistance has not been fully defined. Rectangles containing a star imply that the mutation causes hypersusceptibility to the drug shown at the top of the column. Each rectangle is hyperlinked to a page containing the four types of data shown in the top right box that link mutation and drug. An example of one of these types of data (e.g. SQV susceptibilities of laboratory isolates containing G48V) is shown in the bottom right box.

 
Database queries
Users can retrieve sequence sets matching any combination of selection criteria based on specific references, drug treatments, RT and protease mutations, and drug susceptibility patterns (Table 1, Database query forms). Each query returns a new table and each record in the new table contains 8–12 columns of associated data selected by the user. The data returned may include: (i) hyperlinks to MEDLINE abstracts and GenBank records; (ii) complete nucleotide sequences and translations; (iii) classification of the sequences by individual and time point; (iv) data on the HIV species, group and subtype represented by each sequence; (v) data on drug treatment including lists of drugs and complete summaries of drug regimens received by the patient from whom the isolates were obtained; and (vi) technical data on the methods of virus isolation and sequencing. Following retrieval of a sequence set, users can view or download complete sequence alignments in a variety of formats, including a composite sequence alignment summarizing the frequency and type of mutation at each position in the sequence (Fig. 2).



 
Figure 2. Composite sequence alignment. This is one of the methods for viewing the results of queries. This query retrieved sequences from HIV-1 subtype B patients receiving nelfinavir as their sole protease inhibitor. The page header shows the query parameters and the numbers of references, patients and isolates returned by the query. The figure legend shows whether the user wished to view the absolute number of mutations at each position and whether the user wished to ignore rare occurrences that were observed in only one sequence. The first line in that summary shows the numbered consensus sequence. The second line contains the total number of isolates in the data set at each position. The remaining lines show the frequency of variation at each position in the sequence data set. If this composite alignment is compared to the composite alignment of protease sequences from untreated patients, one would observe that the changes at codons 30, 46, 73, 88 and 90 occurred only in isolates from patients who had received nelfinavir and that changes at codons 36 and 71 were present in both alignments but occurred more commonly in isolates from patients receiving nelfinavir.
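The composite alignment described above can be approximated with a per-position tally. This is a minimal sketch under stated assumptions, not the database's own code: the input is taken to be a list of pre-aligned, equal-length amino acid strings, the consensus is taken to be the most common residue in the data set itself (the database may instead use a fixed reference consensus), and the function and parameter names are hypothetical. The `min_count` parameter mimics the option to ignore variants seen in only one sequence.

```python
from collections import Counter


def composite_alignment(sequences, min_count=1):
    """Summarize aligned sequences position by position.

    Returns one dict per position with the consensus residue, the number of
    sequences covering that position, and counts of non-consensus residues
    observed at least `min_count` times.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    summary = []
    for index in range(length):
        # Gaps ('-') mean the sequence does not cover this position.
        column = [seq[index] for seq in sequences if seq[index] != "-"]
        counts = Counter(column)
        consensus = counts.most_common(1)[0][0] if counts else "-"
        variants = {
            residue: n
            for residue, n in counts.items()
            if residue != consensus and n >= min_count
        }
        summary.append({
            "position": index + 1,   # 1-based, as in the figure
            "consensus": consensus,
            "total": len(column),
            "variants": variants,
        })
    return summary


if __name__ == "__main__":
    toy = ["ACDE", "ACNE", "ACDE", "AKDE", "A-DE"]  # toy data, not real isolates
    for row in composite_alignment(toy):
        print(row)
```

Running the same function on a treated and an untreated data set and comparing the `variants` entries position by position corresponds to the comparison suggested in the legend of Figure 2.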

 
Sequence analysis programs
HIV-SEQ (HIV RT and Protease Search Engine for Queries, formerly HRP-ASAP) accepts user-submitted RT and protease sequences, compares them to a reference sequence and uses the differences (mutations) as query parameters for interrogating the sequence database (7). The program allows users to discover associations between a submitted sequence and previously published sequences containing the same mutations. The web site also contains a beta test version of an anti-HIV drug resistance interpretation program, which accepts user-submitted RT and protease sequences and returns inferred levels of resistance to 15 clinically available anti-HIV drugs. Drug resistance is inferred using a comprehensive set of rules hyperlinked to the output of specific database queries.
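The compare-then-query idea behind HIV-SEQ can be illustrated in a few lines. The sketch below is not the HIV-SEQ implementation: it assumes the submitted sequence has already been translated and aligned to the reference without insertions or deletions, the sequences and record names are artificial toy strings, and `call_mutations` and `find_matching_records` are hypothetical helper names. Mutations are written in the usual form of reference residue, position, observed residue (as in G48V).

```python
def call_mutations(reference, query):
    """List differences between an aligned query and a reference sequence."""
    if len(reference) != len(query):
        raise ValueError("query must be aligned to the reference")
    mutations = []
    for position, (ref_aa, query_aa) in enumerate(zip(reference, query), start=1):
        if query_aa in ("X", "-"):
            continue  # skip unresolved or missing positions
        if query_aa != ref_aa:
            mutations.append(f"{ref_aa}{position}{query_aa}")
    return mutations


def find_matching_records(mutations, published):
    """Return names of published records that contain every given mutation.

    `published` maps a record name to its list of mutations; in a real system
    this lookup would be a database query rather than a dictionary scan.
    """
    wanted = set(mutations)
    return [name for name, muts in published.items() if wanted <= set(muts)]


if __name__ == "__main__":
    reference = "ACDEFGHIK"   # artificial reference, for illustration only
    submitted = "ACDQFGHVK"   # artificial submitted sequence
    mutations = call_mutations(reference, submitted)
    published = {             # hypothetical records
        "record_1": ["E4Q", "I8V"],
        "record_2": ["E4Q"],
        "record_3": ["A1S", "E4Q", "I8V"],
    }
    print(mutations)
    print(find_matching_records(mutations, published))
```

Requiring every mutation to match is only one possible design; a query could equally rank records by the number of shared mutations.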


    FUTURE DIRECTIONS
 
The data model will be broadened to link sequence data with additional types of in vitro and clinical phenotypic data. The model will continue to utilize relational features to represent data such as sequences and in vitro drug susceptibility results. However, object-oriented features will be incorporated to model higher-level data such as disease stage, complex treatment regimens and response to anti-HIV therapy. These changes in the data model will expand the usefulness of the database, extending the user base beyond a core group of sequence analysts to include clinical investigators studying other aspects of HIV drug resistance. Preliminary research on a third molecular target of HIV therapy, the gp41 envelope protein (8), has also begun and will be included in the database when appropriate.


    CITING THE DATABASE
 
Please refer to this article when citing the HIV RT and Protease Sequence Database.


    FOOTNOTES
 
* To whom correspondence should be addressed. Tel: +1 650 725 2946; Fax: +1 650 723 8596; Email: rshafer@cmgm.stanford.edu


    REFERENCES
 

    1 Carpenter,C.C., Cooper,D.A., Fischl,M.A., Gatell,J.M., Gazzard,B.G., Hammer,S.M., Hirsch,M.S., Jacobsen,D.M., Katzenstein,D.A., Montaner,J.A. et al. (2000) Antiretroviral therapy in adults: updated recommendations of the International AIDS Society-USA Panel. JAMA, 283, 381–390.

    2 Hirsch,M.S., Brun-Vezinet,F., D’Aquila,R.T., Hammer,S.M., Johnson,V.A., Kuritzkes,D.R., Loveday,C., Mellors,J.W., Clotet,B., Conway,B. et al. (2000) Antiretroviral drug resistance testing in adult HIV-1 infection: recommendations of an International AIDS Society-USA Panel. JAMA, 283, 2417–2426.

    3 Robertson,D.L., Anderson,J.P., Bradac,J.A., Carr,J.K., Foley,B., Funkhouser,R.K., Gao,F., Hahn,B.H., Kalish,M.L., Kuiken,C.L. et al. (2000) HIV-1 Nomenclature Proposal A Reference Guide to HIV-1 Classification. Science, 288, 55–56.

    4 Shafer,R.W., Stevenson,D. and Chan,B. (1999) Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database. Nucleic Acids Res., 27, 348–352.

    5 Shafer,R.W., Jung,D.R., Betts,B.J., Xi,Y. and Gonzales,M.J. (2000) Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res., 28, 346–348.

    6 Kuiken,C.L., Foley,B., Hahn,B.H., Marx,P., McCutchan,F.E., Mellors,J., Mullins,J., Wolinsky,S. and Korber,B. (1999) Human Retroviruses and AIDS. A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Theoretical Biology and Biophysics, Los Alamos, NM.

    7 Shafer,R.W., Jung,D.R. and Betts,B.J. (2000) Human immunodeficiency virus type 1 reverse transcriptase and protease search engine for queries. Nature Med., 6, 1290–1292.

    8 Kilby,J.M., Hopkins,S., Venetta,T.M., DiMassimo,B., Cloud,G.A., Lee,J.Y., Alldredge,L., Hunter,E., Lambert,D., Bolognesi,D. et al. (1998) Potent suppression of HIV-1 replication in humans by T-20, a peptide inhibitor of gp41-mediated virus entry. Nature Med., 4, 1302–1307.





