Statistical HLA type imputation from large and heterogeneous datasets


Reference: Alexander Dilthey (2012). Statistical HLA type imputation from large and heterogeneous datasets. DPhil. University of Oxford.



Abstract: An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter, influencing susceptibility to a variety of autoimmune and infectious diseases and to certain types of cancer, as well as the likelihood of adverse drug reactions.

I present and evaluate two models for the accurate statistical determination of HLA types from SNP genotypes, for single-population and multi-population studies. Importantly, SNP genotypes are already available for many studies, so applying the statistical methods presented here incurs no extra cost besides computing time.

HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), enabling the processing of large reference panels and improving call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates >=88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1) at 4-digit HLA type resolution.

HLA*IMP:02 is specifically designed to deal with multi-population heterogeneous reference panels. It is based on a new algorithm for constructing haplotype graph models that takes into account haplotype estimate uncertainty, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It works as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without setting a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (>=91% at every locus, with the minimum at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can (to an extent) detect errors in the reference panel and is highly tolerant of missing data.

I demonstrate that two factors are essential determinants of high imputation accuracy under HLA*IMP:02: a good match between imputation and reference panels in terms of principal components, and reference panel size.
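The abstract reports accuracy and call rate, with and without a call threshold. As a reading aid, the sketch below shows one common way these two quantities relate. It is an illustration under stated assumptions, not the thesis's evaluation code: the function name, the input layout (one best-guess allele and one posterior probability per imputed allele) and the example values are all hypothetical. A call is made only when its posterior meets the threshold, call rate is the fraction of alleles called, and accuracy is the fraction of calls that match the validated type.

```python
from typing import Sequence, Tuple


def call_rate_and_accuracy(
    posteriors: Sequence[float],
    predicted: Sequence[str],
    truth: Sequence[str],
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Return (call rate, accuracy among called alleles).

    Hypothetical helper: assumes each imputed allele comes with a single
    posterior probability for its best-guess type.
    """
    if not (len(posteriors) == len(predicted) == len(truth)):
        raise ValueError("inputs must have equal length")
    if len(truth) == 0:
        raise ValueError("inputs must be non-empty")

    # Keep only the alleles whose posterior meets the call threshold.
    called = [
        (pred, true)
        for post, pred, true in zip(posteriors, predicted, truth)
        if post >= threshold
    ]
    call_rate = len(called) / len(truth)

    # Accuracy is measured over the calls actually made.
    if called:
        accuracy = sum(pred == true for pred, true in called) / len(called)
    else:
        accuracy = float("nan")
    return call_rate, accuracy


# Made-up example values, for illustration only.
posteriors = [0.99, 0.60, 0.95, 0.40]
predicted = ["A*02:01", "A*01:01", "A*03:01", "A*24:02"]
truth = ["A*02:01", "A*11:01", "A*03:01", "A*02:01"]

no_threshold = call_rate_and_accuracy(posteriors, predicted, truth)
with_threshold = call_rate_and_accuracy(posteriors, predicted, truth, threshold=0.7)
```

With a threshold of 0.0 every allele is called, so the call rate is 1 and accuracy is measured over the whole sample. Raising the threshold trades call rate for accuracy among the calls that remain.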

Digital Origin: Born digital
Type of Award: DPhil
Level of Award: Doctoral
Awarding Institution: University of Oxford
Notes: Please note that 2 figures have been removed from this version of the thesis for copyright reasons.


Prof Gil McVean


Bibliographic Details

Issue Date: 2012
Copyright Date: 2012

Identifiers

Urn: uuid:1bca18bf-b9d5-4777-b58e-a0dca4c9dbea

Item Description

Type: thesis
Language: en
Keywords: HLA; Human Leukocyte Antigen; MHC; major histocompatibility complex; imputation; prediction; autoimmune; immunology; graph; population genetics
Subjects: Genetics (life sciences); Bioinformatics (life sciences); Immunodiagnostics; Immunology; Mathematical genetics and bioinformatics (statistics); Statistics (see also social sciences)
Tiny URL: ora:6485


Author: Alexander Dilthey
Institution: University of Oxford
Faculty: Mathematical, Physical and Life Sciences Division - Statistics research


