
Nucleic Acids Research, 1992, Vol. 20, No. 14 3631-3637
© 1992


ENZYMOLOGY

A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis

Claudine Landés, Alain Hénaut and Jean-Loup Risler

Centre de Génétique Moléculaire du CNRS, 91198 Gif sur Yvette Cedex, France

Received April 6, 1992. Accepted June 26, 1992.

The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and is too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.
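
As an illustration of criteria iv and v, the following Python sketch shows one common way to obtain a Z-score by comparison with shuffled sequences: the observed alignment score is compared with the mean and standard deviation of the scores obtained after shuffling one of the two sequences. This is only a sketch under stated assumptions and not the procedure of the paper. The scoring function is a plain Smith-Waterman local alignment with identity scoring and a linear gap penalty, used here as a stand-in for BESTFIT or FASTA; the function names, the scoring parameters and the number of shuffles are all hypothetical choices.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Assumption: simple identity scoring stands in for the substitution
    matrices used by real programs such as BESTFIT or FASTA.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the a/b alignment score against shuffled versions of b.

    Z = (observed - mean of shuffled scores) / std of shuffled scores.
    The number of shuffles and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        null_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        # Degenerate case: every shuffle gave the same score.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"
    print(shuffle_z_score(seq, seq))
```

Because shuffling preserves the amino acid composition of the sequence, a high Z-score indicates similarity that depends on residue order rather than on composition alone.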





