Inference under the coalescent with recombination
Affiliation: School of Mathematics and Statistics
Document Type: PhD thesis
Access Status: Open Access
© 2020 Ali Mahmoudi
Inferring the genealogical history of a set of DNA sequences, known as the Ancestral Recombination Graph (ARG), has been a central challenge in population genetics for decades. Reconstructing the actual ARG simplifies many inference problems in population genetics. Many methods have been proposed for inferring the ARG, most of which are limited in the size of the data they can handle and in their accuracy. The state-of-the-art probabilistic method, ARGweaver, provides substantial improvements over other methods but uses a discretized version of the Sequentially Markov Coalescent (SMC), which is an approximation of the Coalescent with Recombination (CwR) and ignores a significant amount of information in the ARG. In this thesis, I develop a novel Markov Chain Monte Carlo (MCMC) algorithm, implemented in the software ARGinfer, to perform probabilistic inference under the CwR. This method takes advantage of the Tree Sequence (TS), an efficient data structure that stores the genealogical trees in an ARG so that identical subtrees of neighboring trees are recorded only once. I first devise a data structure that represents the ARG and the mutation information by augmenting the TS. Then, I develop a heuristic algorithm to construct an ARG consistent with the data, which serves as the initial value for the MCMC algorithm. Computing both the prior (the CwR model) and the likelihood under an approximation to the infinite sites model is relatively straightforward and fast. The challenging part is exploring the ARG space, for which I introduce a proposal distribution in the form of six transition types that rearrange both the topology and the event times. I demonstrate the utility of ARGinfer by applying it to simulated data sets. ARGinfer can accurately estimate many ARG-derived parameters, such as the total branch length, number of recombination events, time to the most recent common ancestor, recombination rate, and allele ages. I also compare ARGinfer against ARGweaver.
Because ARGinfer assumes a more complex evolutionary model than ARGweaver, it can infer a larger class of parameters. ARGinfer outperforms ARGweaver in estimating the recombination rate and is at least as accurate for the other parameters that ARGweaver can infer. ARGinfer also accurately estimates parameters that ARGweaver cannot, such as the number of recombinations on trapped non-ancestral material.
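The tree sequence idea mentioned in the abstract, that identical parts of neighboring trees are recorded only once, can be illustrated with a minimal sketch. This is not ARGinfer's augmented tree sequence or any library's API; the `Edge` type, the `tree_at` helper, the node ids, and the toy genome with a single breakpoint are all hypothetical and chosen only for illustration. Each edge carries the genomic interval over which it holds, so an edge common to two adjacent local trees appears once with a wider interval.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    """One parent-child link that holds over the genomic interval [left, right)."""
    left: float
    right: float
    parent: int
    child: int


# Hypothetical toy data: samples 0, 1, 2 on a genome [0, 10) with one
# recombination breakpoint at position 5. Nodes 3, 4, 5 are internal nodes.
EDGES: List[Edge] = [
    Edge(0.0, 10.0, 3, 0),  # present in both local trees, stored once
    Edge(0.0, 10.0, 3, 1),  # present in both local trees, stored once
    Edge(0.0, 5.0, 4, 3),   # left tree only
    Edge(0.0, 5.0, 4, 2),   # left tree only
    Edge(5.0, 10.0, 5, 3),  # right tree only
    Edge(5.0, 10.0, 5, 2),  # right tree only
]


def tree_at(edges: List[Edge], position: float) -> Dict[int, int]:
    """Return the local tree at `position` as a child -> parent mapping."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


left_tree = tree_at(EDGES, 2.0)
right_tree = tree_at(EDGES, 7.0)

# Storing each local tree separately would need one edge per non-root node
# in every tree; the interval encoding needs fewer records.
edges_if_stored_per_tree = len(left_tree) + len(right_tree)
print(len(EDGES), "edges stored versus", edges_if_stored_per_tree, "if stored per tree")
```

In this toy case the subtree joining samples 0 and 1 under node 3 is unchanged across the breakpoint, so its two edges are written once rather than twice. The saving grows with the number of neighboring trees that share structure.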
Keywords: The Coalescent with Recombination; Ancestral Recombination Graph; Markov chain Monte Carlo; ARG Inference; Augmented Tree Sequence; Statistical Genetics; Population Genetics