Nucleic Acids Research, 2003, Vol. 31, No. 13 3645-3650
© 2003 Oxford University Press

The web server of IBM's Bioinformatics and Pattern Discovery group

Tien Huynh, Isidore Rigoutsos*, Laxmi Parida, Daniel Platt and Tetsuo Shibuya¹

Bioinformatics and Pattern Discovery Group, IBM TJ Watson Research Center, PO BOX 218, Yorktown Heights, NY 10598, USA
¹ Exploratory Technology, IBM Tokyo Research Laboratory, 1623-14, Shimotsuruma, Yamato-shi, Kanagawa 242-8502, Japan

*To whom correspondence should be addressed. Tel: +1 914 945 1384; Fax: +1 914 945 4104; Email: rigoutso@us.ibm.com

Received February 13, 2003; Revised and Accepted April 8, 2003


    ABSTRACT
 
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.


    INTRODUCTION
 
‘Pattern discovery’ or ‘data mining’ methods have emerged as an alternative and effective approach for tackling a variety of biological problems. Although these methods have been used in the context of traditional computer science activities for more than a decade, their use in analyzing biological data began only recently.

For the last several years, the focus of the research activity in the Bioinformatics and Pattern Discovery group at IBM's TJ Watson Research Center has been on theoretical and applied aspects of pattern discovery with an emphasis on tackling problems from molecular biology. During this time, we have designed and published a number of methods for generic pattern discovery and applications (1–6), multiple sequence alignment (7,8), gene discovery (9), protein annotation (5,10–13), comparative molecular moment analysis (14,15), etc. Implementations for many of these methods have been web-enabled and are accessible through the group's web server. Executable code bundles for several operating system and processor combinations can also be downloaded from the same server. Finally, we make available content in the form of annotations for >70 complete genomes; the graphical user interfaces and searching software that we provide facilitate the interactive study of these annotations.

In summary, the tools that are available on our server permit the discovery of patterns in the following types of datasets: amino acid sequences, nucleic acid sequences, integer streams, gene expression data (time series as well as association experiments), natural text, invariant representations of molecular shape and electrostatics, and others. The tools can also be used to generate multiple sequence alignments of amino acid, nucleic acid or generic textual input sets, to discover genes in nucleic acid sequences, and to interactively annotate amino acid sequences. In what follows, we briefly describe the various services and genomic content found on our web server; extensive information for each one of the tools, detailed help files and a full-length tutorial are available on-line.


    WHAT IS AVAILABLE ON THE SERVER
 
The desired tool can be selected by clicking on the tab with the relevant header (Fig. 1). As seen in Figure 1, the typical tool page will include the following logical sections: ‘parameters’, ‘options’, ‘input sequences’, ‘references’, and ‘other relevant links’, the latter appearing always on the left side of each page. The parameters are multi-valued entities and are used to control the actual computation. The options, on the other hand, are binary in nature and are used to enable/disable various filtering steps, to modify attributes of the computation or to affect the format of the output. The references section enumerates those of the group's publications which describe the corresponding tool. The ‘other relevant links’ section includes pointers to tool-specific pages (e.g. ‘About this Engine’ links to a detailed explanation of the tool, the underlying algorithm, the tool's capabilities, the input and output formats, recommended values for the various parameters, etc.), to server-specific pages (e.g. ‘News’, ‘License Terms’, ‘Brief/Full Tutorial’), and to pages with general information. Notably, on each tool page we provide email links that permit the users to send us questions, comments and suggestions pertaining to the tools and the server's operation.



 
Figure 1. Screen shot of a typical tool page on our web server. The graphical user interface permits the user to select among the available tools by using the tabs provided at the top of each tool's page. See text for more details.

 
Pull-down menus provide access to sample input datasets and common settings whereas the options and parameters are automatically initialized to default values. The user's interaction with the server is facilitated by real-time checking of the various settings; this in turn enables the identification of logical and other conflicts and the prompting of the user with the appropriate suggestion. Context-sensitive help is also available on each tool page.

Computational requests are given 10 min of wall-clock processing time, an interval that is more than sufficient for handling a very large spectrum of problems and problem sizes. Computations that exceed this time limit are terminated and the user is notified to this effect. Note that the server will not prevent a user from issuing several simultaneous requests from multiple browser windows: each request will be given its own 10-min processing time limit. For security reasons, all of the submitted input and generated output files are deleted automatically 30 min after their creation.

Users can download executable codes for a variety of operating systems and processor combinations from http://cbcsrv.watson.ibm.com/download.phtml.html; note that our code updates closely follow the web enablement and publication of each new method. The downloadable codes and provided web services can be used free of charge by users that carry out not-for-profit work (e.g. academic or personal research); a license is required for use in commercial, for-profit environments and additional details on the terms and conditions can be found on our web site.

We next describe briefly each of the available tools using the same headers as the tabs which appear in Figure 1; the corresponding URL is also listed.

Sequence pattern discovery (http://cbcsrv.watson.ibm.com/Tspd.html).
This is our implementation of Teiresias, a two-phase, combinatorial algorithm for general purpose pattern discovery (1–3). The user can run Teiresias in either ‘exact discovery’ or in ‘equivalence’ mode. In the latter mode, the user must provide groups of characters which can replace one another (=‘character equivalences’) and are potentially overlapping. Commonly used equivalences (e.g. ‘chemical’ and ‘structural’) are available through a pull-down menu; for non-amino acid inputs the user can define equivalences by simply typing the character groups in the provided window. The input set can consist of amino/nucleic acid sequences or of general character streams composed of printable ASCII characters.
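
To make the notions of a pattern instance and of a character equivalence concrete, the following minimal sketch counts where a given pattern occurs in a small input set. It is not the Teiresias algorithm, which discovers patterns rather than matching a known one. The regular-expression notation (a bracketed group standing for an equivalence class), the function name `pattern_support` and the toy sequences are all assumptions made for illustration.

```python
import re

def pattern_support(pattern, sequences):
    """Locate every instance of `pattern` in `sequences`.

    `pattern` uses regular-expression syntax (an assumption of this sketch):
    a bracketed group such as [LI] plays the role of a character equivalence.
    Returns (support, hits), where support is the number of sequences that
    contain at least one instance and hits lists (sequence index, offset).
    """
    # A lookahead makes the matches zero-width, so overlapping instances
    # are reported as well.
    regex = re.compile("(?=(" + pattern + "))")
    hits = [(i, m.start())
            for i, seq in enumerate(sequences)
            for m in regex.finditer(seq)]
    support = len({i for i, _ in hits})
    return support, hits

# Toy input: L and I are treated as equivalent at the third position.
toy_sequences = ["MKALVG", "QKAIVT", "AAAA"]
print(pattern_support("KA[LI]V", toy_sequences))
```

In this toy input the pattern has an instance in the first two sequences only when L and I are declared equivalent; in ‘exact discovery’ mode the two instances would count as different strings.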

The discovered patterns are reported in groups of 100 at a time and a mechanism is provided for the user to navigate through this output. Individual patterns can be selected and their highlighted instances displayed in the processed input. Filtering tools based on pattern properties such as minimum/maximum support, minimum/maximum density, log-probability estimates, etc., are also available on the output page together with a link providing access to the complete set of discovered patterns. For amino acid inputs, individual patterns can be searched within SWISS-PROT/TrEMBL: the output is a list of strings showing the phylogenetic domain, database id, accession number, function and organism name for all the database entries containing an instance of the pattern under consideration—each output string is a dynamic link to the SWISS-PROT/TrEMBL record of the respective sequence. A working session can be seen in Figure 2.



 
Figure 2. A working session using the output of the sequence pattern discovery tool. The user can subselect among the generated patterns using various properties such as pattern density, minimum and maximum number of appearances, etc. Instances of individual patterns can be shown highlighted in the original input. Patterns can also be located in SWISS-PROT/TrEMBL: the output page contains a list of links to the corresponding SWISS-PROT/TrEMBL records, whereas substantial information about the protein containing each instance is included in the text of the link itself.

 
Multiple sequence alignment (http://cbcsrv.watson.ibm.com/Tmsa.html).
This tool represents our implementation of the MUSCA algorithm which allows the user to align multiple streams of letters so as to reveal salient features that are potentially present in the considered input (7,8). The user-controlled options and parameters are the same as in the previous tool with the addition of optional hydropathy coloring of the generated output.

Gene expression analysis (time series) (http://cbcsrv.watson.ibm.com/Tgea.html).
With this tool one can analyze datasets that track the induction/repression of genes over time and in response to environmental changes. The input is given as an N×M matrix of space-separated real numbers, each representing a log-expression ratio. Teiresias is used to discover all relationships which simultaneously involve subsets of the rows and subsets of the columns in this matrix. This is a special case of the generic problem of association discovery and a discussion on the importance of discovering such patterns appears in the literature (5). This tool can carry out discovery on the raw expression ratios and on the signs of the derivatives of the expression ratios (see also 5). The degree of quantization that is applied to the input data can be changed interactively. Also, for each discovered pattern, the user can generate plots of the corresponding input rows or highlight a pattern's instances in the original input (Fig. 3). Currently, only patterns that are synchronized in time (lock-step) are reported; the tool neither reports nor seeks patterns corresponding to cascaded signals.
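
The two input representations mentioned here, quantized expression ratios and signs of derivatives, can be illustrated with a short sketch. The uniform binning, the `bin_width` parameter and both function names are assumptions; the quantization scheme actually used by the server is not described in this article.

```python
import math

def quantize(row, bin_width):
    """Map each log-expression ratio to the index of a uniform bin.

    A larger bin_width gives a coarser quantization, so that more values
    become identical and can take part in a shared pattern.
    """
    return [math.floor(x / bin_width) for x in row]

def derivative_signs(row):
    """Return +1, 0 or -1 for a rise, no change or a fall between
    consecutive time points."""
    return [(b > a) - (b < a) for a, b in zip(row, row[1:])]

# Two hypothetical genes with different magnitudes but the same trend.
gene_a = [0.1, 0.6, 0.6, -0.2]
gene_b = [1.0, 2.5, 2.5, 0.3]
print(quantize(gene_a, 0.5))
print(derivative_signs(gene_a), derivative_signs(gene_b))
```

Working on derivative signs lets rows that rise and fall together be grouped even when their absolute expression levels differ, which is the situation in the two hypothetical genes shown.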



 
Figure 3. Sample session using the gene expression (time series) tool. As with the sequence pattern discovery interface, the user can subselect among the generated patterns and show their highlighted instances in the original input. The user can also plot those of the original input streams which contain the pattern under consideration: the data points comprising the pattern are shown in green color.

 
Gene expression analysis (association data)/Association discovery (http://cbcsrv.watson.ibm.com/Tad.html).
Unlike the previous tool which is appropriate for time series inputs, this tool is used to process more general inputs representing association data; an N×M matrix of space-separated integers is now expected at the input. For example, the rows of this matrix could correspond to plants or tissues, the columns to the genes of interest, and the (i, j) cell to the expression level of gene j in tissue/plant i. The tool outputs the associations which are present in the input, i.e. the correlations of subsets of rows that are deduced from the presence of similar data values among subsets of columns. The reported associations are provably maximal in composition and length, i.e. all of the columns that are involved in defining a cluster will be identified and reported (see 1–3 for a discussion on the concept of maximality and 5 for a discussion of this problem's computational difficulty).

The general association discovery problem also has numerous instances outside the biological context. Indeed, the entries of the N×M input set can be a mix of categorical and numerical values corresponding to a multitude of units. Our desire to make this tool as flexible as possible for more users, and the impossibility of anticipating all instances of such a diverse spectrum of inputs, have led us to expect that the user remap the original data points to a set of integers in a manner which best reflects his/her understanding of the data source(s).
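
One simple way in which a user might carry out this remapping is sketched next. Each distinct value receives a positive integer code that is unique within its column; the function names and the toy table are hypothetical, and numerical columns would usually need a binning step chosen by the user before being coded in this way. The pairwise comparison at the end only hints at what an association is and is not the maximal association discovery performed by the tool.

```python
def remap_columns(rows):
    """Replace every value by a positive integer code, assigned per column
    in order of first appearance."""
    codes = [dict() for _ in rows[0]]          # one code table per column
    remapped = []
    for row in rows:
        remapped.append([codes[j].setdefault(value, len(codes[j]) + 1)
                         for j, value in enumerate(row)])
    return remapped

def shared_columns(matrix, row_a, row_b):
    """Columns in which two rows carry the same integer value."""
    return [j for j, (x, y) in enumerate(zip(matrix[row_a], matrix[row_b]))
            if x == y]

# Hypothetical table mixing categorical and numerical entries.
table = [["leaf", "high", 3],
         ["root", "high", 3],
         ["leaf", "low", 7]]
matrix = remap_columns(table)
print(matrix)
print(shared_columns(matrix, 0, 1))
```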

Comparative molecular moment analysis (CoMMA) (http://cbcsrv.watson.ibm.com/Tco.html).
CoMMA utilizes information from moments of molecular mass and moment expansions of electrostatic potentials up to and inclusive of second order, to generate molecular descriptions that remain invariant under Euclidean transformations (14,15). QSAR studies have shown that CoMMA descriptors do predict activity and that molecules with similar indices can be expected to behave similarly even if their topological structures are somewhat different. CoMMA descriptors provide a succinct representation of the three-dimensional (3D) distribution of mass, shape and charge without any need for 3D-alignment and molecular superposition; the descriptors can thus be used in molecular comparisons, an essential step in computer-aided drug discovery.
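
The invariance property can be checked numerically on a single quantity. The sketch below computes the second moment of mass about the center of mass and confirms that it is unchanged by a rotation followed by a translation. This one scalar is only an illustration and is not the CoMMA descriptor set, which also draws on the electrostatic moment expansions; the function names and the three-atom toy molecule are assumptions.

```python
import math

def second_mass_moment(atoms):
    """Sum of m * |r - r_cm|^2 over all atoms; atoms are (mass, x, y, z)."""
    total = sum(m for m, *_ in atoms)
    center = [sum(m * xyz[k] for m, *xyz in atoms) / total for k in range(3)]
    return sum(m * sum((xyz[k] - center[k]) ** 2 for k in range(3))
               for m, *xyz in atoms)

def rotate_z_and_shift(atoms, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(m, c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for m, x, y, z in atoms]

# Toy molecule: masses and coordinates are made up for the example.
molecule = [(12.0, 0.0, 0.0, 0.0), (16.0, 1.2, 0.0, 0.0), (1.0, -0.5, 0.9, 0.0)]
moved = rotate_z_and_shift(molecule, 0.7, (3.0, -2.0, 5.0))
print(math.isclose(second_mass_moment(molecule), second_mass_moment(moved)))
```

Because such quantities do not depend on where the molecule sits or how it is oriented, two molecules can be compared without first superimposing them.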

This tool can process inputs that contain one or more molecules in the MOL2 format (Mol2 File Format, TRIPOS Associates Inc., http://www.tripos.com/custResources/mol2Files/mol2.pdf) and generates a set of CoMMA descriptors for each molecule. Charges must have been computed for each molecule. When multiple input molecules are provided, the user can also run Teiresias on the CoMMA descriptors to determine associations among the various molecules as these are captured by the commonalities among subsets of the molecules' CoMMA representations.
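
Before submitting a file, a user may wish to confirm that every atom record carries a charge. The following minimal reader is a sketch under the assumption that atom records follow the usual MOL2 column order (atom id, atom name, x, y, z, atom type, substructure id, substructure name, charge); the function name `read_mol2` and the small water-like sample with made-up coordinates are illustrative only, and the reader ignores every section other than the atom records.

```python
def read_mol2(text):
    """Return one list of atoms per molecule found in a MOL2 document.

    Raises ValueError when an atom record lacks the charge column.
    """
    molecules, section = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                  # blank line or comment
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            if section == "MOLECULE":
                molecules.append([])                  # a new molecule begins
            continue
        if section != "ATOM":
            continue
        fields = line.split()
        if len(fields) < 9:
            raise ValueError("atom record has no charge column: " + line)
        if not molecules:
            molecules.append([])
        molecules[-1].append({
            "name": fields[1],
            "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
            "type": fields[5],
            "charge": float(fields[8]),
        })
    return molecules

SAMPLE = """\
@<TRIPOS>MOLECULE
water
3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
1 O1 0.0000 0.0000 0.0000 O.3 1 HOH -0.8340
2 H1 0.9572 0.0000 0.0000 H 1 HOH 0.4170
3 H2 -0.2400 0.9266 0.0000 H 1 HOH 0.4170
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""

for atom in read_mol2(SAMPLE)[0]:
    print(atom["name"], atom["charge"])
```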

Integer pattern discovery (http://cbcsrv.watson.ibm.com/Tipd.html).
This tool is our integer-based implementation of Teiresias and is meant for pattern discovery on event streams of positive integers. It is meant to tackle inputs where the underlying alphabet contains more distinct symbols than the printable ASCII set; in fact, up to 2^31 - 1 distinct integers can be used to form inputs. The number of problems that can be solved using this version of Teiresias is very large: virtually any data mining problem that can be cast as a pattern discovery task on streams of positive integers can be tackled by this tool. For the special case where the integers correspond to natural language words we have created the next tool.

Natural text mining—word units (http://cbcsrv.watson.ibm.com/Ttwpd.html).
This tool uses the integer version of Teiresias to process natural text (not necessarily English) and treats each input word as the unit of information. The provided input should be in ASCII, and in free format (e.g. cut-and-pasted documents from a news web site). Input pre-filtering discards punctuation marks and uses carriage returns as event stream separators. No linguistic processing of any kind takes place during the reading of the input and the determination of the vocabulary: e.g. ‘go’ and ‘goes’ are treated as two distinct words. Once the vocabulary is determined, the textual input is mapped to streams of integers that are then processed by the integer-based version of Teiresias. Upon completion of the discovery phase, the integers in the patterns are replaced by the corresponding vocabulary words and word-based patterns are reported to the user.
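
The mapping between words and integers described in this paragraph can be sketched as follows. The discovery step itself is omitted; the function names are hypothetical, and details such as keeping upper and lower case distinct and deleting punctuation characters outright are assumptions of the sketch rather than documented behavior of the server.

```python
import string

def text_to_streams(text):
    """Turn free text into integer streams, one stream per input line.

    Punctuation is discarded, words are split on white space, and each new
    word receives the next unused positive integer.
    """
    vocabulary = {}                                    # word -> integer
    streams = []
    drop_punctuation = str.maketrans("", "", string.punctuation)
    for line in text.splitlines():
        words = line.translate(drop_punctuation).split()
        if words:
            streams.append([vocabulary.setdefault(w, len(vocabulary) + 1)
                            for w in words])
    return streams, vocabulary

def stream_to_words(stream, vocabulary):
    """Replace the integers of a stream or pattern by the original words."""
    inverse = {code: word for word, code in vocabulary.items()}
    return [inverse[code] for code in stream]

streams, vocabulary = text_to_streams("He goes home.\nShe goes home, too.")
print(streams)
print(stream_to_words([2, 3], vocabulary))
```

In this example the shared integer pair (2, 3) corresponds to the phrase ‘goes home’, which appears in both streams.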

Natural text mining—symbol units (http://cbcsrv.watson.ibm.com/Ttspd.html).
The only difference between the previous and this tool is that here it is the individual character that is treated as a unit of information. Processing a large body of natural text in this manner is expected to uncover pieces of words, whole words, and small phrases all of which are valid for the natural language that the text represents (5).

Protein annotation (http://cbcsrv.watson.ibm.com/Tpa.html).
This tool allows the user to automatically annotate a given amino acid sequence using our Bio-Dictionary-based approach (11,13). The tool determines and reports local and global similarities between the query and the contents of the SWISS-PROT/TrEMBL database (16), phylogenetic domain membership as a function of position within the query, the location and nature of domains, active sites, post-translationally modified sites, PUBMED references which are relevant to the query, etc.

The output page currently contains three frames (Fig. 4). The top frame contains a link to the set of patterns that were found in the processed query sequence and were used to annotate it; also included are plots showing the likeness of the query to archaeal, bacterial, eukaryotic, viral and unclassified sequences as a function of amino acid position. The middle frame shows local and global similarities which are shared by the query and other entries in the database from which the Bio-Dictionary was built. Also reported in this frame are relevant PUBMED references; with the help of a text-clustering tool, a hierarchy of phrases contained in the title/abstract of these PUBMED references can be generated simplifying the analysis (17,18). Finally, the bottom frame contains plots that show the type and location of features which have been identified in the annotated query: these features include binding domains, transmembrane domains, active sites, signals, etc. The plots' y-axis indicates the degree of confidence in what the plot describes, and all plots are ordered in a manner that will list narrow (in terms of extent) and better-conserved regions before wide but not-so-well-conserved ones. The captions of all plots are ‘active’ and when selected a query is issued to the Expasy server's SRS interface (http://www.expasy.org/srs5bin/cgi-bin/wgetz?) in order to determine which entries from SWISS-PROT/TrEMBL contain the string corresponding to the selected caption.



 
Figure 4. Sample working session using the protein annotation tool. For the processed sequence, the tool will report similarities of the query to archaeal, bacterial, eukaryotic, viral and unclassified sequences, as a function of position. It will also report local and global similarities to other sequences in SWISS-PROT/TrEMBL, as well as various features such as domains, binding sites etc. that have been identified in the query. The captions of the various plots are active and allow the user to interact with Expasy's SRS server. Also, the tool will identify and report relevant PUBMED references that can be clustered interactively and organized in a dendrogram of phrases. See text for additional information.

 
Gene identification (http://cbcsrv.watson.ibm.com/Tgi.html).
This tool provides access to the implementation of our dictionary-based approach to finding genes in prokaryotic genomes (9). The input page provides the user with the ability to select ‘start’ codons while the output consists of a graphical interface that allows the user to zoom in/out and navigate over the processed nucleic acid sequence. Discovered genes are displayed using score-based colors and clicking on a gene will provide information about the gene's location within the processed input sequence, its reading frame, its amino acid translation, etc. The graphical interface also permits the user to annotate the amino acid translation of a selected gene using our annotation tool—a screen shot of a working session is included in Shibuya and Rigoutsos (9).

Genome annotations (http://cbcsrv.watson.ibm.com/Annotations/).
We have used our protein annotation method to process and annotate putative protein sequences from >70 complete genomes. These annotations are accessible through our web server. Within each genome, a gene's annotation can be retrieved either using the gene's accession number or with the help of regular expressions (13). On-line documentation includes several examples and information on how to construct these regular expressions. Searches across genomes are also available through http://cbcsrv.watson.ibm.com/TsearchAllGenomes.html.
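
As an indication of how a regular expression selects annotations, consider the following sketch. The regular-expression dialect accepted by the server is described in its on-line documentation and may differ from Python's; the identifiers, annotation strings and the function name `search_annotations` are invented for the example and do not come from the annotated genomes.

```python
import re

def search_annotations(annotations, expression):
    """Return the identifiers whose annotation text matches `expression`.

    `annotations` maps an identifier to its annotation text; matching is
    case-insensitive in this sketch.
    """
    regex = re.compile(expression, re.IGNORECASE)
    return sorted(identifier
                  for identifier, text in annotations.items()
                  if regex.search(text))

# Made-up records standing in for one genome's annotations.
toy_genome = {
    "GENE_0001": "ABC transporter, ATP-binding protein",
    "GENE_0002": "hypothetical protein",
    "GENE_0003": "DNA polymerase III alpha subunit",
}
print(search_annotations(toy_genome, "transporter|polymerase"))
```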

Human cytomegalovirus (HHV5) annotation (http://cbcsrv.watson.ibm.com/virus/).
In recent work, we used two distinct methods to annotate the human cytomegalovirus (HCMV) genome. The first method employed the ProCeryon program (ProCeryon, a program for fold recognition and protein structure analysis, ProCeryon Biosciences, http://www.proceryon.com, 1999–2001) and generated a structural and functional hypothesis for each putative gene using a threading-based scheme (19). The second, more recent effort re-annotated HCMV using our Bio-Dictionary-based approach that exploits only sequence-based information (20). Three viral strains were annotated in each study: (i) AD169 which was sequenced in 1990; (ii) Towne; and (iii) Toledo (21,22). Our results as well as natural language summaries for each putative protein have been included in a system that we designed and implemented on IBM's DB2 database system and which can be seen in Figure 5.



 
Figure 5. Snapshot of a working session with the HHV5 database. Using either a graphical interface or a table listing the various genes, the user can select a gene and view all the information which we have compiled using our threading- and Bio-Dictionary-based approaches. Access to the sequence of the selected gene as well as its SWISS-PROT record is readily available through the provided interface. Also available are the threading and Bio-Dictionary results which led us to draw the stated conclusions. For additional information see text as well as the discussion in Rigoutsos et al. (20).

 

    PLANNED EXTENSIONS AND UPCOMING TOOLS
 
We are currently working on and will soon be making available a parallel version of Teiresias for both shared-memory and message-passing architectures; a utility for creating JPEG files from such output pages as those of the protein annotation, multiple sequence alignment and gene discovery tools; and tools for determining irredundant motifs in an input (4), finding tandem repeats in nucleic acid sequences (23), discovering flexible patterns (6) and determining the fine structure details of transmembrane helices.


    DISCUSSION
 
We have described the capabilities of a Bioinformatics web server designed and maintained by our group. The server provides remote access to web-based implementations of methods that the group's members and collaborators have published over the years and is operational around the clock. Code bundles are also available for download.

The various tools revolve around the concept of pattern discovery and provide solutions to a large variety of problems from biology and elsewhere. For each tool, detailed descriptions that include information on setting options, parameters, input formats etc. can be found on-line. Real-time controls check for errors and inconsistencies at all stages of the user's interaction with the server while context-sensitive help is available on all pages.

Additionally, graphical user interfaces and software permit the interactive searching of our on-line annotations for >70 complete archaeal, bacterial and eukaryotic genomes. Finally, for the HCMV genome an IBM DB2-based system permits web access to our published analyses of three HCMV strains.


    ACKNOWLEDGEMENTS
 
The authors thank Stephen Chin-Bow, Mashama McFarlane and Kyle Jensen for their contributions to the development of various aspects of the tools and graphical user interfaces.


    REFERENCES
 

  1. Rigoutsos,I. and Floratos,A. (1998) Combinatorial pattern discovery in biological sequences: the Teiresias algorithm. Bioinformatics, 14, 55–67.

  2. Rigoutsos,I. and Floratos,A. (1998) Motif discovery without alignment or enumeration. In Proceedings of the Second Annual ACM International Conference on Computational Molecular Biology (RECOMB), New York, NY, pp. 221–227.

  3. Floratos,A. and Rigoutsos,I. (1998) On the Time Complexity of the TEIRESIAS Algorithm. IBM Technical Report, RC 21161 (94582). IBM TJ Watson Research Center.

  4. Parida,L., Rigoutsos,I., Floratos,A., Platt,D.E. and Gao,Y. (2000) Pattern discovery on character sets and real valued data: linear bound on irredundant motifs and an efficient polynomial time algorithm. In Proceedings of the Eleventh Annual ACM/SIAM Symposium on Discrete Algorithms (SODA '00). San Francisco, CA, pp. 297–308.

  5. Rigoutsos,I., Floratos,A., Parida,L., Gao,Y. and Platt,D. (2000) The emergence of pattern discovery techniques in computational biology. Metabol. Eng., 2, 159–177.

  6. Parida,L., Rigoutsos,I. and Platt,D.E. (2001) An output-sensitive flexible pattern discovery algorithm. In Proceedings of the Twelfth Annual Symposium on Combinatorial Pattern Matching. Jerusalem, Israel, pp. 131–142.

  7. Parida,L., Floratos,A. and Rigoutsos,I. (1998) MUSCA: an algorithm for constrained alignment of multiple data sequences. In Proceedings of the Ninth Workshop on Genome Informatics, Tokyo, Japan, pp. 112–119.

  8. Parida,L., Floratos,A. and Rigoutsos,I. (1999) An approximation algorithm for alignment of multiple sequences using motif discovery. J. Comb. Optim., 3, 247–275.

  9. Shibuya,T. and Rigoutsos,I. (2002) Dictionary-driven microbial gene finding. Nucleic Acids Res., 30, 2710–2725.

  10. Floratos,A., Rigoutsos,I., Parida,L., Stolovitzky,G. and Gao,Y. (1999) Sequence homology detection through large-scale pattern discovery. In Proceedings of the Third Annual ACM International Conference on Computational Molecular Biology (RECOMB '99), Lyon, France, pp. 164–173.

  11. Rigoutsos,I., Floratos,A., Ouzounis,C., Gao,Y. and Parida,L. (1999) Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins. Proteins, 37, 264–277.

  12. Rigoutsos,I., Gao,Y., Floratos,A. and Parida,L. (1999) Building dictionaries of 1D and 3D motifs by mining the unaligned 1D sequences of 17 archaeal and bacterial genomes. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB '99), Menlo Park, California. AAAI Press, pp. 223–233.

  13. Rigoutsos,I., Huynh,T., Floratos,A., Parida,L. and Platt,D. (2002) Dictionary-driven protein annotation. Nucleic Acids Res., 30, 3901–3916.

  14. Silverman,B.D. and Platt,D.E. (1996) Comparative molecular moment analysis (CoMMA): 3D QSAR without molecular superposition. J. Med. Chem., 39, 2129–2140.

  15. Platt,D.E. and Silverman,B.D. (1996) Registration, orientation, and similarity of molecular electrostatic potentials through multipole matching. J. Comp. Chem., 17, 358–366.

  16. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

  17. Mack,R., Ravin,Y. and Byrd,R. (2001) Knowledge portals and the emerging digital knowledge workplace. IBM Sys. J., 40, 925–955.

  18. Ando,R.Y., Boguraev,B., Byrd,R. and Neff,M. (2000) Multidocument summarization by visualizing topic content. In Proceedings of ANLP/NAACL Workshop on Automatic Summarization, pp. 79–88.

  19. Novotny,J., Rigoutsos,I., Coleman,D. and Shenk,T. (2001) In silico structural and functional analysis of the human cytomegalovirus (HHV5) genome. J. Mol. Biol., 310, 1151–1166.

  20. Rigoutsos,I., Novotny,J., Huynh,T., Chin-Bow,S., Parida,L., Platt,D., Coleman,D. and Shenk,T. (2003) In silico pattern-based analysis of the human cytomegalovirus (HHV5) genome. J. Virol., 77, 4326–4344.

  21. Chee,M., Bankier,A., Beck,S., Bohni,R., Brown,C., Cerny,R., Horsnell,T., Hutchinson,C.,III, Kouzarides,T., Martignetti,J. et al. (1990) Analysis of the protein-coding content of the sequence of human cytomegalovirus strain AD169. Curr. Top. Micro. Immunol., 154, 125–169.

  22. Cha,T.A., Tom,E., Kemble,G.W., Duke,G.M., Mocarski,E.S. and Spaete,R.R. (1996) Human cytomegalovirus clinical isolates carry at least 19 genes not found in laboratory strains. J. Virol., 70, 78–83.

  23. Stolovitzky,G., Gao,Y., Floratos,A. and Rigoutsos,I. (1999) Tandem repeat detection using pattern discovery with application to the identification of yeast satellites. IBM Technical Report, RC 21508 (96944). IBM TJ Watson Research Center.




This article has been cited by other articles:

T. Huynh and I. Rigoutsos
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update
Nucleic Acids Res., July 1, 2004; 32(suppl_2): W10–W15.

