InDel marker detection by integration of multiple softwares using machine learning techniques

BMC Bioinformatics

pp 1–11

Comparative genomics


Background

In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the many kinds of molecular markers, insertions and deletions (InDels) are preferred because of their wide distribution and high density across the whole genome. Detecting InDels from next-generation sequencing data is therefore of great importance for the design of InDel markers. To tackle this problem, this paper integrates machine learning techniques with existing software and develops two algorithms for InDel detection: the best F-score method (BF-M) and the support vector machine (SVM) method (SVM-M), which is based on the classical SVM model.

Results

The experimental results show that BF-M performed promisingly, with high precision and recall scores, whereas SVM-M yielded the best performance in terms of recall and F-score. Moreover, from the InDel markers that SVM-M detected in soybeans collected from 56 different regions, highly polymorphic loci were selected to construct an InDel marker database for soybean.

Conclusions

Compared with existing software tools, the two algorithms proposed in this work produced substantially higher precision and recall scores and remained stable across various types of genomic regions. Moreover, based on SVM-M, we have constructed a database of soybean InDel markers and published it for academic research.
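The abstract evaluates detection tools by precision, recall, and F-score. As a rough illustration of those metrics only, the sketch below scores the InDel calls of several tools against a validated truth set and picks the tool with the highest F-score. It is not the authors' BF-M implementation, whose details are not given here. The call representation `(chromosome, position, type, size)`, the tool names, and the toy data are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one InDel call:
# (chromosome, position, variation type, variation size).
Call = Tuple[str, int, str, int]


def precision_recall_f1(called: Set[Call], truth: Set[Call]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of a call set against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_tool_by_f1(calls_by_tool: Dict[str, Set[Call]], truth: Set[Call]) -> str:
    """Return the name of the tool whose calls reach the highest F-score."""
    scores = {
        tool: precision_recall_f1(calls, truth)[2]
        for tool, calls in calls_by_tool.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
truth_set: Set[Call] = {
    ("Chr01", 100, "INS", 3),
    ("Chr01", 250, "DEL", 5),
    ("Chr02", 400, "DEL", 2),
    ("Chr03", 900, "INS", 8),
}
example_calls: Dict[str, Set[Call]] = {
    # 2 of 3 calls are correct.
    "tool_a": {
        ("Chr01", 100, "INS", 3),
        ("Chr01", 250, "DEL", 5),
        ("Chr05", 10, "INS", 1),
    },
    # 3 of 5 calls are correct.
    "tool_b": {
        ("Chr01", 100, "INS", 3),
        ("Chr01", 250, "DEL", 5),
        ("Chr02", 400, "DEL", 2),
        ("Chr07", 77, "DEL", 4),
        ("Chr09", 12, "INS", 6),
    },
}

best = best_tool_by_f1(example_calls, truth_set)
```

In this toy case `tool_b` wins: its lower precision (3/5 versus 2/3) is outweighed by its higher recall (3/4 versus 2/4). The same scoring could in principle be run separately for each category of variation, for example by repeat-region type or variation size, but that partitioning is an assumption and is not described in the abstract.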

Keywords

Insertions and deletions; InDel detection; Evaluation

Abbreviations

AFLP: Amplified fragment length polymorphism
BF-M: Best F-score method
CNV: Copy number variation
DP: Read depth
DS: Detection software
InDel: Insertions and deletions
LINE: Long interspersed nuclear element
LTR: Long terminal repeat
RAPD: Random amplified polymorphism detection
RBF: Radial basis function
RFLP: Restriction fragment length polymorphism
RT: Type of the repeat region where the variation is located
SNP: Single nucleotide polymorphisms
SS: Variation size
SSCP: Single-strand conformation polymorphism
SSR: Short simple tandem repeats
ST: Variation type
SV: Structural variation
SVM: Support vector machine
SVM-M: Support vector machine method
TIR: Terminal inverted repeats

Authors: Jianqiu Yang, Xinyi Shi, Lun Hu, Daipeng Luo, Jing Peng, Shengwu Xiong, Fanjing Kong, Baohui Liu, Xiaohui Yuan

