Nucleic Acids Research, 1994, Vol. 22, No. 14 2769-2775
© 1994
COMPUTATIONAL BIOLOGY
Discovering active motifs in sets of related protein sequences and using them for classification
Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102
1 Cold Spring Harbor Laboratory, 100 Bungtown Road, Cold Spring Harbor, NY 11724
2 Courant Institute of Mathematical Sciences, New York University, Mercer Street, New York, NY 10012
3 Image Processing Section, Laboratory of Mathematical Biology, Division of Cancer Biology and Diagnosis, National Cancer Institute, National Institutes of Health, Frederick, MD 21701
4 Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
*To whom correspondence should be addressed
Received April 21, 1994. Revised May 12, 1994. Accepted May 12, 1994.
We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and on functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, by using our classifier in conjunction with theirs, one can obtain high-confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
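The two-step process described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it enumerates fixed-length substrings by brute force, takes the first few sequences as the "small sample", and omits the optimization heuristics entirely. The function names and the parameters `sample_size`, `length`, `max_mutations`, and `min_occurrence` are hypothetical, and "approximately present" is assumed here to mean an edit distance of at most `max_mutations` to some substring.

```python
from typing import List, Set


def min_substring_distance(motif: str, seq: str) -> int:
    """Smallest edit distance between motif and any substring of seq."""
    # prev[j]: best cost of aligning the motif prefix seen so far against
    # some substring of seq that ends at position j. The empty prefix
    # matches anywhere at cost 0, so a match may start at any position.
    prev = [0] * (len(seq) + 1)
    for i, m in enumerate(motif, start=1):
        curr = [i] + [0] * len(seq)
        for j, s in enumerate(seq, start=1):
            curr[j] = min(
                prev[j - 1] + (m != s),  # match or substitution
                prev[j] + 1,             # motif residue absent from seq
                curr[j - 1] + 1,         # extra residue in seq
            )
        prev = curr
    return min(prev)


def candidate_motifs(sample: List[str], length: int) -> Set[str]:
    """Step 1 (assumed form): every substring of the given length in the sample."""
    return {s[i:i + length] for s in sample for i in range(len(s) - length + 1)}


def discover_motifs(
    sequences: List[str],
    sample_size: int,
    length: int,
    max_mutations: int,
    min_occurrence: int,
) -> List[str]:
    """Step 2: keep candidates approximately present in enough sequences."""
    sample = sequences[:sample_size]  # assumption: the first sequences form the sample
    found = []
    for motif in sorted(candidate_motifs(sample, length)):
        hits = sum(
            min_substring_distance(motif, s) <= max_mutations for s in sequences
        )
        if hits >= min_occurrence:
            found.append(motif)
    return found
```

For example, `discover_motifs(["MKTAYIAKQR", "GGTAYLAKPP", "TTAYIAKWW", "QQQQQQQQ"], sample_size=1, length=6, max_mutations=1, min_occurrence=3)` draws its candidates from the first sequence only and then tests each one against all four. Testing every candidate against every sequence is the expensive part, which is the cost the heuristics mentioned in the abstract are meant to reduce.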