Nucleic Acids Research, 2003, Vol. 31, No. 13 3692-3697
© 2003 Oxford University Press
SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence
1 Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore 2 Department of Applied Physics, Chongqing University, Chongqing 400044, PR China
*To whom correspondence should be addressed. Tel: +65 68746877; Fax: +65 67746756; Email: yzchen{at}cz3.nus.edu.sg
Received February 14, 2003; Revised March 19, 2003; Accepted April 2, 2003
| ABSTRACT |
|---|
Prediction of protein function is important for studying biological processes. One approach to function prediction is to classify a protein into a functional family. The support vector machine (SVM) is a useful method for such classification, which may involve proteins with diverse sequence distribution. We have developed web-based software, SVMProt, for SVM classification of a protein into a functional family from its primary sequence. The SVMProt classification system is trained on representative proteins of a number of functional families and on seed proteins of Pfam curated protein families. It currently covers 54 functional families, and additional families will be added in the near future. The computed accuracy for protein family classification is in the range of 69.1–99.6%. SVMProt shows a certain degree of capability for the classification of distantly related proteins and of homologous proteins of different function, and thus may be used as a protein function prediction tool that complements sequence alignment methods. SVMProt can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
| INTRODUCTION |
|---|
Knowledge about protein function is essential in the understanding of biological processes (1,2). As the gap between the amount of sequence information and functional characterization widens, increasing efforts are being directed at the development of computational tools for protein function prediction (2–5). Various methods have been developed, which include sequence similarity (6–8), evolutionary analysis (9,10), structure-based approach (11), protein/gene fusion (12,13), protein interaction (14,15) and family classification by sequence clustering (16,17).
In the absence of clear sequence or structural similarities, the criteria for comparison of distantly-related proteins become increasingly difficult to formulate (17). Moreover, not all homologous proteins have analogous functions (9). The presence of a shared domain within a group of proteins does not necessarily imply that these proteins perform the same function (18). Many proteins sharing promiscuous domains (e.g. SH2, WD40, DnaJ) are known to have very different functions (12). These problems often hinder some of the clustering-based methods (16). In addition to the development of algorithms to overcome these problems (16), different approaches that combine or complement existing methods are being explored (3,9,17,19).
It is of interest to consider protein functional family classification as a method for facilitating protein function prediction. Such a method is expected to be particularly useful in the cases described above and may thus complement sequence alignment methods. Functional families of various proteins have been documented (20–23). A method for the classification of proteins with diverse sequence distribution is also available. A statistical learning method, the support vector machine (SVM) (24), has recently been used for classification of G-protein coupled receptors (25) and DNA-binding proteins (26). It has also been employed in a number of other protein studies including protein–protein interaction prediction (15), fold recognition (27), solvent accessibility (28) and structure prediction (29,30). The prediction accuracy ranges from 65 to 91.4% in these studies. Thus SVM classification of protein functional family may potentially be developed into a protein function prediction tool to complement methods based on sequence similarity and clustering.
Instead of direct comparison or clustering of sequences, SVM classification is based on the analysis of physicochemical properties of a protein generated from its sequence (25–30). Samples of proteins known to be in a functional class (positive samples) and those not in the class (negative samples) are used to train an SVM system to recognize specific features and classify proteins as either inside or outside the functional class. Such an approach may be applied to functional prediction for both distantly related and closely related proteins. Proteins of a specific functional class share common structural and chemical features essential for performing similar functions (20–22). Given sufficient samples of proteins of a specific function, SVM can be trained and used to recognize proteins with characteristics for that function (15,25,26).
We have developed web-based software, SVMProt, for the classification of a protein into a functional class from its primary sequence. The functionally distinguished classes of proteins are collected from several databases (20–23,31,32) and include all major classes of enzymes, receptors, transporters, channels, DNA-binding proteins and RNA-binding proteins. The core SVM program used in SVMProt is an SVM implementation that has recently been developed and tested for the classification of DNA-binding proteins (26). SVMProt is specifically trained and tested on each of the functional classes currently collected. Its usefulness for protein functional classification is evaluated. Its capability in the classification of distantly related proteins and homologous proteins of different function is also studied.
| SOFTWARE ACCESS |
|---|
The SVMProt web page is at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi and is shown in Figure 1. The sequence of a protein, in RAW format and containing no non-amino acid letters, can be entered in the window provided. A sequence of fewer than 50 amino acids is not accepted. The computed result is displayed in a separate window as shown in Figure 2. Depending on the computed result, one of the following four outcomes is displayed. If the input protein is predicted to belong to one or more functional families, then the name of each family is displayed. For some protein families, a cross-link to the respective protein family database is provided, and links for more families will be added. If the input protein is predicted not to belong to any of the functional classes currently included in SVMProt, then the message "Your input protein is not in any of the functional classes currently covered by SVMProt" is displayed. If the input sequence contains invalid characters or has an abnormal composition, such as a long stretch of consecutive single letters, then the message "invalid character ..." or "your input sequence is not a valid sequence" is displayed. If the input sequence is shorter than 50 amino acids, then the message "your input sequence is less than 50 amino acids" is displayed.
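The input checks described above can be illustrated with a short sketch. This is not the server's code: the order of the checks, the whitespace and case handling, the 20-letter alphabet and the `max_run` threshold for a "long stretch" of one letter are all assumptions made here for illustration. Only the 50-residue minimum and the message texts come from the description above.

```python
import re

# Standard 20-letter amino acid alphabet (assumed; the server's accepted
# alphabet is not specified in the text).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 50             # stated in the text
MAX_SINGLE_LETTER_RUN = 20  # hypothetical threshold, not from the text


def check_sequence(raw, max_run=MAX_SINGLE_LETTER_RUN):
    """Return an error message for a rejected sequence, or None if it passes."""
    # Assumption: whitespace is ignored and letters are case-insensitive.
    seq = "".join(raw.split()).upper()

    if any(ch not in VALID_RESIDUES for ch in seq):
        return "invalid character"
    if len(seq) < MIN_LENGTH:
        return "your input sequence is less than 50 amino acids"

    # Length of the longest run of one repeated letter.
    longest = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    if longest > max_run:
        return "your input sequence is not a valid sequence"
    return None
```

A sequence that passes all three checks returns `None` and would then go on to classification.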
| METHODS |
|---|
Table 1 lists the protein functional families currently covered by SVMProt. These include 46 families of enzymes from BRENDA (20), G-protein coupled receptors from GPCRDB (21), nuclear receptors from NucleaRDB (21), tyrosine receptor kinases derived from NCBI (31), five families of channels and one family of transporters from TCDB (22) and LGICdb (23), and DNA- and RNA-binding proteins derived from SWISS-PROT (32). Additional families of transporters will be added very soon. Other families of proteins are being searched for and collected. The updated list of functional classes is provided on the SVMProt web page.
SVMProt is trained for protein classification in the following manner. First, every protein sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties including amino acid composition, hydrophobicity, normalized Van der Waals volume, polarity, polarizability, charge, surface tension, secondary structure and solvent accessibility for each residue in the sequence (15,25–30). Three descriptors, composition (C), transition (T) and distribution (D), are used to describe the global composition of each of these properties (33). C is the number of amino acids of a particular property (such as hydrophobicity) divided by the total number of amino acids. T characterizes the percent frequency with which amino acids of a particular property are followed by amino acids of a different property. D measures the chain length within which the first, 25, 50, 75 and 100% of the amino acids of a particular property are located, respectively.
A hypothetical protein sequence AEAAAEAEEAAAAAEAEEEAAEEAEEEAAE, as shown in Figure 3, has 16 alanines (n1 = 16) and 14 glutamic acids (n2 = 14). The compositions for these two amino acids are n1 × 100.00/(n1 + n2) = 53.33 and n2 × 100.00/(n1 + n2) = 46.67, respectively. There are 15 transitions from A to E or from E to A in this sequence, and the percent frequency of these transitions is (15/29) × 100.00 = 51.72. The first, 25, 50, 75 and 100% of As are located within the first 1, 5, 12, 20 and 29 residues, respectively. The D descriptor for As is thus 1/30 × 100.00 = 3.33, 5/30 × 100.00 = 16.67, 12/30 × 100.00 = 40.0, 20/30 × 100.00 = 66.67 and 29/30 × 100.00 = 96.67. Likewise, the D descriptor for Es is 6.67, 26.67, 60.0, 76.67, 100.0. Overall, the amino acid composition descriptors for this sequence are C = (53.33, 46.67), T = (51.72) and D = (3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0), respectively.
Descriptors for other properties can be computed by a similar procedure, and all the descriptors are combined to form the feature vector. In most studies, amino acids are divided into three classes for each property, and thus the three descriptors for each property consist of 21 elements: three for C, three for T and 15 for D (15,25–30,33).
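The C, T and D computation can be sketched in a few lines of Python. The function below is an illustration written for this text, not the SVMProt code. Two details are assumptions: fractional ranks in D are rounded down (this reproduces the values quoted for the hypothetical A/E sequence), and the three-class hydrophobicity grouping used in `encode` is illustrative, since the article does not list the class assignments.

```python
from itertools import combinations


def ctd_descriptors(labels, classes):
    """Return (C, T, D) percentages for a sequence of class labels.

    labels  -- string or list with one class label per residue
    classes -- ordered list of the distinct class labels to report
    """
    n = len(labels)

    # C: percentage of residues that fall in each class.
    composition = [100.0 * sum(1 for x in labels if x == c) / n for c in classes]

    # T: percentage of the n - 1 adjacent pairs that switch between two
    # given classes, counted in either direction.
    transition = []
    for c1, c2 in combinations(classes, 2):
        switches = sum(1 for x, y in zip(labels, labels[1:]) if {x, y} == {c1, c2})
        transition.append(100.0 * switches / (n - 1))

    # D: chain position (as a percentage of sequence length) at which the
    # first, 25%, 50%, 75% and 100% of the residues of each class are reached.
    distribution = []
    for c in classes:
        positions = [i + 1 for i, x in enumerate(labels) if x == c]
        for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                distribution.append(0.0)
                continue
            # Assumption: fractional ranks are rounded down, with a floor of 1.
            rank = max(1, int(fraction * len(positions)))
            distribution.append(100.0 * positions[rank - 1] / n)
    return composition, transition, distribution


# Illustrative three-class hydrophobicity grouping (an assumption here).
HYDROPHOBICITY_CLASSES = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}


def encode(seq, grouping):
    """Translate a residue string into a string of class labels."""
    lookup = {aa: label for label, residues in grouping.items() for aa in residues}
    return "".join(lookup[aa] for aa in seq)


c, t, d = ctd_descriptors("AEAAAEAEEAAAAAEAEEEAAEEAEEEAAE", ["A", "E"])
print([round(v, 2) for v in c])  # [53.33, 46.67]
print([round(v, 2) for v in t])  # [51.72]
print([round(v, 2) for v in d])
# [3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0]
```

With three classes, `ctd_descriptors(encode(seq, HYDROPHOBICITY_CLASSES), ["1", "2", "3"])` yields 3 + 3 + 15 = 21 values for that property, which matches the element count given above.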
SVMProt is fed and trained with examples of proteins of a particular functional family (positive samples) and those that do not belong to this family (negative samples). The feature vectors of these positive and negative samples are input into the SVMProt system. The trained SVMProt system can then be used to classify a protein into either the positive group (protein is predicted to be in the family) or the negative group (protein is predicted to not belong to the family). Because protein feature vectors describe global composition of various physicochemical properties, SVMProt cannot address such questions as which part of a protein sequence is likely to match with a protein family.
All distinct protein members found for each family are used to construct positive samples for training SVMProt. More proteins are being collected and will be added to the training and testing sets. The negative samples for training are selected from seed proteins of the curated protein families in the Pfam database (34), excluding those that belong to the family under study. Training sets of both positive and negative samples are further screened so that only essential proteins that optimally represent each class are retained. The SVMProt training system for each family is optimized and tested by using separate testing sets of both positive and negative samples. Where possible, all the remaining distinct proteins in each functional family (not in the training set of that family) are used as positive samples, and all the remaining representative seed proteins in Pfam curated families are used to construct negative samples in a testing set. The performance of SVMProt classification is further evaluated by using independent sets of both positive and negative samples. There are no duplicate proteins within any training, testing or independent evaluation set. The numbers of positive and negative samples of proteins for the training, testing and independent evaluation sets of every functional class are given in Table 1.
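The theory of SVM has been described in the literature (15,24–30), so only a brief description is given here. SVM is based on the structural risk minimization (SRM) principle from statistical learning theory (24). In linearly separable cases, SVM constructs a hyperplane which separates two different groups of feature vectors with a maximum margin. A feature vector is represented by $x_i$, with the physicochemical descriptors of a protein as its components, and its class is denoted by $y_i$ ($+1$ for the positive class, $-1$ for the negative class). The hyperplane is constructed by finding another vector $w$ and a parameter $b$ that minimize $\|w\|^2$ and satisfy the following conditions:

$$w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1$$

$$w \cdot x_i + b \le -1 \quad \text{for } y_i = -1$$

Once $w$ and $b$ are determined, a given vector $x$ is classified by

$$\operatorname{sign}(w \cdot x + b)$$

In nonlinearly separable cases, SVM maps the input vectors into a high-dimensional feature space by means of a kernel function $K(x_i, x_j)$, for example the Gaussian kernel

$$K(x_i, x_j) = \exp\left(-\frac{\|x_j - x_i\|^2}{2\sigma^2}\right)$$

A linear SVM is then applied in this feature space, and the decision function becomes

$$f(x) = \operatorname{sign}\left(\sum_i \alpha_i^0 y_i K(x, x_i) + b\right)$$

where the coefficients $\alpha_i^0$ and $b$ are determined by maximizing the following Lagrangian expression:

$$\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

under the conditions

$$\alpha_i \ge 0 \quad \text{and} \quad \sum_i \alpha_i y_i = 0$$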
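To make the classification step concrete, the sketch below evaluates the standard kernel form of an SVM decision function, f(x) = sign(Σ α_i y_i K(x, x_i) + b), with a Gaussian kernel K(x, x_i) = exp(−‖x − x_i‖² / 2σ²). It is a minimal illustration and not the SVMProt implementation: the support vectors, coefficients, offset `b` and width `sigma` in the usage lines are hypothetical toy values, whereas in practice they result from training.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, b, sigma):
    """Weighted sum of kernel values over the support vectors, plus offset b."""
    total = sum(
        alpha * y * gaussian_kernel(x, sv, sigma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return total + b


def classify(x, support_vectors, labels, alphas, b, sigma):
    """Return +1 (in the family) or -1 (outside it) from the sign of the value."""
    value = decision_value(x, support_vectors, labels, alphas, b, sigma)
    return 1 if value >= 0 else -1


# Hypothetical toy model: one positive and one negative support vector.
toy_svs = [[0.0, 0.0], [2.0, 2.0]]
toy_labels = [+1, -1]
toy_alphas = [1.0, 1.0]  # satisfies sum(alpha_i * y_i) == 0

near_positive = classify([0.1, 0.2], toy_svs, toy_labels, toy_alphas, b=0.0, sigma=1.0)
near_negative = classify([1.9, 2.1], toy_svs, toy_labels, toy_alphas, b=0.0, sigma=1.0)
```

In this toy model a query vector takes the label of the support vector it lies closest to, because the Gaussian kernel decays with distance.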
Scoring of SVM classification of proteins has been estimated by a reliability index and its usefulness has been demonstrated by statistical analysis (29). A slightly modified reliability score, R-value, is used in SVMProt:
[Equation: definition of the R-value]
As in the case of all discriminative methods (24,35), the performance of SVMProt classification can be measured by the quantity of true positives (TP), true negatives (TN), false positives (FP), false negatives (FN) and the overall accuracy (Q) given below:
$$Q = \frac{TP + TN}{TP + TN + FP + FN}$$
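As a small illustration, the overall accuracy can be computed from the four counts as follows, assuming the usual definition Q = (TP + TN)/(TP + TN + FP + FN) expressed as a percentage. The counts in the usage line are hypothetical and are not taken from Table 1.

```python
def overall_accuracy(tp, tn, fp, fn):
    """Overall accuracy Q in percent from the four classification counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified sample is required")
    return 100.0 * (tp + tn) / total


# Hypothetical counts for illustration only.
q_example = overall_accuracy(tp=90, tn=880, fp=20, fn=10)
print(q_example)  # 97.0
```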
| RESULTS AND REMARKS |
|---|
The results for the classification of each of the functional classes are given in Table 1. All the computed TP, TN, FP, FN and Q are given in the table. The overall accuracy Q of protein classification ranges from 69.1 to 99.6%, which is on average slightly improved from that obtained in other SVM studies of proteins (15,24–30). One possible reason for this improvement is the use of representative proteins of Pfam curated families as negative samples for SVM classification, which provides a more comprehensive sampling of proteins not in a functional class.
Some proteins of low sequence similarity share similar function (36–38). Efforts have been directed at exploring various novel approaches to predicting the function of these distantly related proteins (16,37,39). SVMProt is tested on 24 randomly selected distantly related proteins in seven families. The BLAST sequence-similarity E-value of each of these proteins against most members of its family is significantly higher than 0.05, the commonly accepted threshold for similar proteins. Thus alignment methods may not work well for these proteins. Fourteen proteins are correctly classified by SVMProt, which accounts for 58.3% of all distantly related proteins studied. This suggests that, to a certain extent, SVMProt is useful for the classification of distantly related proteins.
Homologous proteins do not necessarily have analogous function (9), and they can be difficult to distinguish using sequence alignment methods. SVMProt is tested on four pairs of homologous proteins of different families, and the results are shown in Table 2. While all eight proteins are correctly classified into their respective family, only five of them are not also classified into the family of their respective homolog, representing 62.5% of all the homologous proteins examined. This limited study seems to indicate that SVMProt has a certain degree of capability for classification of homologous proteins of different functions. Further analysis is needed to provide a more objective assessment.
The ability of SVMProt in the classification of some distantly related proteins and homologous proteins of different functions probably results from the use of a combination of physicochemical properties to represent a protein. Protein function is determined by specific structural and chemical features at substrate binding sites (20). Some of these function-related features might be captured by the residue properties such as hydrophobicity, normalized Van der Waals volume, polarity, polarizability, charge, surface tension, secondary structure and solvent accessibility which are used in the construction of the SVMProt feature vectors for proteins.
As shown in Table 1, there are several families with a substantially high Q score (≥90%) but a relatively modest TP : FN ratio (<100 : 37). Generally, SVMProt gives an accurate prediction of TNs. The imbalance between the number of proteins in a family and those outside of the family may thus lead to cases of high Q score with modest TP : FN ratio. Examination of FN proteins of these families shows that many of these proteins either belong to more than one family or contain a domain shared by proteins in another family. These proteins are often classified into the related family. An analysis of a broad range of families indicates that a substantial portion (61.3%) of incorrectly classified proteins are of low sequence similarity to most of the other members of their family (i.e. the sequence similarity E-value of each of these proteins against most members of its family is significantly higher than 0.05). The percentage of low sequence similarity proteins in a family is not expected to be very high. Therefore, our study seems to suggest that sequence distance has a certain level of influence on the accuracy of SVM classification.
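The effect of class imbalance on Q can be seen in a small numerical sketch. The counts below are hypothetical and are not taken from Table 1, and "sensitivity" (TP divided by TP + FN) is a term introduced here for illustration. The point is that when negatives far outnumber positives, accurate TN prediction dominates Q even if a sizeable share of family members is missed.

```python
def accuracy_and_sensitivity(tp, fn, tn, fp):
    """Return (Q, sensitivity), both in percent, from classification counts."""
    q = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)
    return q, sensitivity


# Hypothetical family: 137 members tested against 3010 non-members.
q, sensitivity = accuracy_and_sensitivity(tp=100, fn=37, tn=3000, fp=10)
print(round(q, 1), round(sensitivity, 1))
```

With these counts Q is above 98% although only about 73% of the family members are recovered, which is the pattern described above.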
Several factors may affect the prediction accuracy. One is the diversity of protein samples. It is likely that not all possible types of proteins are adequately represented in some functional classes. This can be improved as more protein data become available. SVM prediction may be further improved by using a more comprehensive and refined set of protein descriptors. The SVM optimization procedure and feature vector selection algorithm may also be improved by adding constraints and by incorporating independent component analysis and kernel PCA in the preprocessing steps.
Our study suggests that SVM has potential in the classification of proteins into functional families. SVMProt appears to have a certain level of capability for classification of distantly related proteins and homologous proteins of different functions and, thus, potentially may be used as a protein function prediction tool that complements sequence alignment methods. Further improvements on protein functional family coverage, sample collection and SVM algorithm may enable the development of SVMProt into a useful protein function prediction tool.
| REFERENCES |
|---|
1. Eisenberg,D., Marcotte,C.A., Xenarios,I. and Yeates,T.O. (2000) Protein function in the post-genomic era. Nature, 405, 823–826.
2. Bork,P., Dandekar,T., Diaz-Lazcoz,Y., Eisenhaber,F., Huynen,M. and Yuan,Y. (1998) Predicting function: from genomes and back. J. Mol. Biol., 283, 707–725.
3. Pellegrini,M. (2001) Computational methods for protein function analysis. Curr. Opin. Chem. Biol., 5, 46–50.
4. Teichman,S.A. and Mitchison,G. (2000) Computing protein function. Nat. Biotechnol., 18, 27.
5. Huynen,M., Snel,B., Lathe,W. and Bork,P. (2000) Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res., 10, 1204–1210.
6. Bork,P. and Koonin,E.V. (1998) Predicting functions from protein sequences – where are the bottlenecks? Nature Genet., 18, 313–318.
7. Baxevanis,A.D. (1998) Practical aspects of multiple sequence alignment. Methods Biochem. Anal., 39, 172–188.
8. Schuler,G.D. (1998) Sequence alignment and database searching. Methods Biochem. Anal., 39, 145–171.
9. Benner,S.A., Chamberlin,S.G., Liberles,D.A., Govindarajan,S. and Knecht,L. (2000) Functional inferences from reconstructed evolutionary biology involving rectified databases – an evolutionarily grounded approach to functional genomics. Res. Microbiol., 151, 97–106.
10. Eisen,J.A. (1998) Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res., 8, 163–167.
11. Teichmann,S.A., Murzin,A.G. and Chothia,C. (2001) Determination of protein function, evolution and interactions by structural genomics. Curr. Opin. Struct. Biol., 11, 354–363.
12. Marcotte,E.M., Pellegrini,M., Ng,H.L., Rice,D.W., Yeates,T.O. and Eisenberg,D. (1999) Detecting protein function and protein–protein interactions from genome sequences. Science, 285, 751–753.
13. Enright,A.J., Iliopoulos,I., Kyrpides,N. and Ouzounis,C.A. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature, 402, 86–90.
14. Aravind,L. (2000) Guilt by association: contextual information in genome analysis. Genome Res., 10, 1074–1077.
15. Bock,J.R. and Gough,D.A. (2001) Predicting protein–protein interactions from primary structure. Bioinformatics, 17, 455–462.
16. Enright,A.J., Van Dongen,S.V. and Ouzounis,C.A. (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res., 30, 1575–1584.
17. Enright,A.J. and Ozounis,C.A. (2000) GeneRage: a robust algorithm for sequence clustering and domain detection. Bioinformatics, 16, 451–457.
18. Henikoff,S., Greene,E.A., Pietrokovski,S., Bork,P., Attwood,T.K. and Hood,L. (1997) Gene families: the taxonomy of protein paralogs and chimeras. Science, 278, 609–614.
19. Ponting,C.P. (2001) Issues in predicting protein function from sequence. Brief Bioinform., 2, 19–29.
20. Schomburg,I., Chang,A. and Schomburg,D. (2002) BRENDA, enzyme data and metabolic information. Nucleic Acids Res., 30, 47–49.
21. Horn,F., Vriend,G. and Cohen,F.E. (2001) Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res., 29, 346–349.
22. Saier,M.H.Jr (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.
23. Le Novere,N. and Changeux,J.-P. (2001) LGICdb: the ligand-gated ion channel database. Nucleic Acids Res., 29, 294–295.
24. Burges,C.J.C. (1998) A tutorial on Support Vector Machine for pattern recognition. Data Min. Knowl. Disc., 2, 121–167.
25. Karchin,R., Karplus,K. and Haussler,D. (2002) Classifying G-protein coupled receptors with support vector machines. Bioinformatics, 18, 147–159.
26. Cai,C.Z., Wang,W.L. and Chen,Y.Z. (2003) Support Vector Machine classification of physical and biological datasets. Inter. J. Mod. Phys. C., in press.
27. Ding,C.H.Q. and Dubchak,I. (2001) Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17, 349–358.
28. Yuan,Z., Burrage,K. and Mattick,J.S. (2002) Prediction of protein solvent accessibility using support vector machines. Proteins, 48, 566–570.
29. Hua,S.J. and Sun,Z.R. (2001) A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J. Mol. Biol., 308, 397–407.
30. Cai,Y.D., Liu,X.J., Xu,X.B. and Chou,K.C. (2002) Prediction of protein structural classes by support vector machines. Comput. Chem., 26, 293–296.
31. Wheeler,D.L., Church,D.M., Federhen,S., Lash,A.E., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E., Tatusova,T.A. and Wagner,L. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.
32. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.-C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O'Donovan,C., Phan,I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
33. Dubchak,I., Muchnik,I., Holbrook,S.R. and Kim,S.-H. (1995) Prediction of protein folding class using global description of amino acid sequence. Proc. Natl Acad. Sci. USA, 92, 8700–8704.
34. Bateman,A., Birney,E., Cerruti,L., Durbin,R., Etwiller,L., Eddy,S.R., Griffiths-Jones,S., Howe,K.L., Marshall,M. and Sonnhammer,E.L. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.
35. Baldi,P., Brunak,S., Chauvin,Y., Anderson,C.A.F. and Nielsen,H. (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16, 412–419.
36. Nagano,N., Porter,C.T. and Thornton,J.M. (2001) The (betaalpha)(8) glycosidases: sequence and structure analyses suggest distant evolutionary relationships. Protein Eng., 14, 845–855.
37. Frishman,D. and Argos,P. (1992) Recognition of distantly related protein sequences using conserved motifs and neural networks. J. Mol. Biol., 228, 951–962.
38. Miyata,Y. and Nishida,E. (1999) Distantly related cousins of MAP kinase: biochemical properties and possible physiological functions. Biochem. Biophys. Res. Commun., 266, 291–295.
39. Yang,A.S. (2002) Structure-dependent sequence alignment for remotely related proteins. Bioinformatics, 18, 1658–1665.