Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein–Friesian cattle


Genetics Selection Evolution, 48:95

First Online: 01 December 2016
Received: 14 July 2016
Accepted: 24 November 2016


Background

Whole-genome sequence data are expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction obtained with imputed whole-genome sequence data versus preselected SNPs from a genome-wide association study (GWAS) performed on imputed whole-genome sequence data.

Methods

Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 Bull Genomes Project. The program GCTA was used to perform a GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS results, subsets of variants were selected, and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of the selected variants when they compete with all other variants.
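The variant-preselection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract gives neither the selection threshold nor the GRM formula, so the sketch assumes (i) variants are ranked by their GWAS p-values (for example, as read from GCTA output) and the k most significant are kept, and (ii) each GRM is built with VanRaden's first method, G = ZZ'/(2Σp(1−p)), where Z is the dosage matrix centred on twice the allele frequency and frequencies are estimated from the animals in the matrix. The function names `select_top_variants` and `vanraden_grm` are hypothetical.

```python
import numpy as np


def select_top_variants(p_values, k):
    """Return column indices of the k variants with the smallest GWAS p-values."""
    p_values = np.asarray(p_values, dtype=float)
    # argsort is ascending, so the first k entries are the most significant variants;
    # kind="stable" keeps the original column order among tied p-values.
    return np.argsort(p_values, kind="stable")[:k]


def vanraden_grm(genotypes):
    """GRM by VanRaden's first method (an assumption; the paper's formula is not stated).

    genotypes: array of shape (n_animals, n_variants) holding allele dosages in [0, 2];
    non-integer expected dosages from imputation are accepted.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                # allele frequency per variant
    keep = (p > 0.0) & (p < 1.0)            # drop monomorphic variants
    if not keep.any():
        raise ValueError("no polymorphic variants to build a GRM from")
    X, p = X[:, keep], p[keep]
    Z = X - 2.0 * p                          # centre each variant on its mean dosage
    scale = 2.0 * np.sum(p * (1.0 - p))      # sum of expected variant variances under HWE
    return Z @ Z.T / scale                   # (n_animals, n_animals) relationship matrix


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    geno = rng.integers(0, 3, size=(6, 50))  # toy data: 6 animals, 50 variants
    pvals = rng.uniform(size=50)             # stand-in for GWAS p-values
    top = select_top_variants(pvals, 10)
    rest = np.setdiff1d(np.arange(50), top)
    G_sel, G_rest = vanraden_grm(geno[:, top]), vanraden_grm(geno[:, rest])
    print(G_sel.shape, G_rest.shape)
```

Building one GRM from the selected columns and a second from the remaining columns yields the kind of matrix pair that a two-GRM model, in which selected variants compete with all other variants, would fit jointly; the variance components themselves would then be estimated by REML, which this sketch does not attempt.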

Results

The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability increased from 0.81 to 0.83, from 0.83 to 0.87 and from 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, the accuracy of genomic prediction decreased and bias increased.
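Prediction of validation animals and the accuracy and bias measures can be sketched in the same spirit. The snippet assumes a plain GBLUP model with a known genomic heritability h2 (so λ = σ²e/σ²g = (1 − h2)/h2), estimates the mean as the raw training mean rather than by generalized least squares, and measures accuracy as the Pearson correlation between observed values and predictions and bias as the regression slope of observed on predicted values (a slope of 1 indicates no bias; below 1, predictions are over-dispersed). The abstract does not state the response variable used in validation (for example, deregressed proofs) or the exact accuracy and bias definitions, and the function names are hypothetical.

```python
import numpy as np


def gblup_predict(G, y_train, train_idx, valid_idx, h2):
    """Predict genomic values of validation animals by GBLUP (illustrative sketch).

    Model assumed: y = mu + g + e, g ~ N(0, G * var_g), with known h2.
    Simplification: mu is taken as the raw training mean, not a GLS estimate.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y_train, dtype=float)
    lam = (1.0 - h2) / h2                        # variance ratio var_e / var_g
    G_tt = G[np.ix_(train_idx, train_idx)]       # relationships among training animals
    G_vt = G[np.ix_(valid_idx, train_idx)]       # validation-by-training relationships
    # Solve (G_tt + lam * I) alpha = (y - mu); predictions are G_vt @ alpha.
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y - y.mean())
    return G_vt @ alpha


def accuracy_and_bias(observed, predicted):
    """Pearson correlation (accuracy) and slope of observed on predicted (bias)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    accuracy = np.corrcoef(obs, pred)[0, 1]
    # np.cov uses ddof=1 by default, so match it in np.var.
    slope = np.cov(obs, pred)[0, 1] / np.var(pred, ddof=1)
    return accuracy, slope
```

Running `gblup_predict` once with a GRM from all variants and once with a GRM from preselected variants, then comparing the outputs of `accuracy_and_bias`, mirrors the comparison described above.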

Conclusions

Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be because the training population was the same dataset from which the selected variants were identified.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-016-0274-1) contains supplementary material, which is available to authorized users.


Authors: Roel F. Veerkamp, Aniek C. Bouwman, Chris Schrooten, Mario P. L. Calus

