Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?

Comparative and Functional Genomics, Volume 2 (2001), Issue 1, Pages 4-9

Short Communication

Department of Biomolecular Sciences, UMIST, PO Box 88, Manchester M60 1QD, UK

School of Biological Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK

Copyright © 2001 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever increasing rate. Much of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest which have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University’s and the National Center for Biotechnology Information’s (NCBI) versions of the BLAST suites of tools, which are to be found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. NCBI’s version of gapped BLASTn gives much shorter, and sometimes different, top alignments to those found using Washington University’s version of BLASTn, which also allows for gaps, when both are used with their default parameters. Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. Washington University’s version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith–Waterman algorithm.
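To illustrate why the Smith–Waterman algorithm is described as accurate but computationally intensive, the following is a minimal sketch of its local alignment scoring for two nucleotide strings. It fills a full dynamic programming table, so the cost grows with the product of the two sequence lengths. The function name `smith_waterman_score` and the scoring values (match, mismatch and a linear gap penalty) are illustrative assumptions and are not the parameters used in the work reported here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Hypothetical scoring scheme: the match, mismatch and linear gap
    values are assumptions chosen for illustration only.
    """
    # Only the previous row of the dynamic programming table is needed
    # to compute the score, so memory stays linear in len(b).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A gap in b (move down) or a gap in a (move right).
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A single inserted base, as might occur in a single-pass read.
    print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

Because gaps are allowed, an inserted or deleted base in a single-pass read lowers the score of an alignment instead of ending it, which is the behaviour that matters when comparing error-prone reads to a known genome at the nucleotide level.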

Authors: K. Cara Woodwark, Simon J. Hubbard, and Stephen G. Oliver


