Analysis of whole-genome sequencing data from cell-free DNA in maternal plasma to detect fetal aneuploidy
Document Type: PhD thesis
Access Status: This item is currently not available from this repository
© 2015 Dr. Dineika Chandrananda
Since its discovery in 1948, there has been immense interest in circulating cell-free DNA as a source of genetic material that can be obtained non-invasively through routine phlebotomy. With the rapid development of next-generation sequencing and the decreasing cost of associated reagents and platforms, the potential of cell-free DNA as a biomarker has been explored in diverse research areas, such as scanning the genetic landscape of tumors and monitoring organ transplant rejection. However, non-invasive prenatal testing is the field that has seen the greatest research reward, including subsequent translation to the clinic. A complex mixture of maternal and fetal DNA fragments circulates in the plasma of pregnant women, and next-generation sequencing allows high-resolution interrogation of this mixture to detect fetal chromosomal abnormalities. Prenatal screening based on cell-free DNA sequencing has been one of the most rapidly adopted genomic tests and is now widely available around the world. Current tests focus on detecting common autosomal trisomies such as Down Syndrome (trisomy 21) and aneuploidies of the sex chromosomes. Although these tests report high accuracy estimates, both false-positive and false-negative results occur. Biological factors such as a low fetal DNA concentration and technical factors such as bias in the next-generation sequencing data have been implicated as causes of some discordant results. Nevertheless, recent reports show that sequencing-based tests have significantly reduced the number of invasive diagnostic procedures performed.
This thesis describes various statistical and bioinformatics analyses carried out on whole-genome sequencing data from cell-free DNA in maternal plasma. Most published studies focus on the diagnostic applications of cell-free DNA, with only a few investigating its biological characteristics.
The thesis first aims to expand the understanding of cell-free DNA by examining characteristics related to the biological cleavage processes that produce the fragments circulating in blood. This work has uncovered novel sequence signatures associated with the origin of cell-free DNA along the human genome, the distribution of fragment sizes both between and within chromosomes, and the nucleotide motifs stemming from the enzymatic cleavage of DNA.
The secondary focus of this thesis is the implementation of bioinformatics algorithms that reduce the bias in next-generation sequencing data due to technical and biological factors such as GC content, genomic repeats and non-random fragmentation of DNA. Reducing the noise in maternal plasma sequencing data would facilitate the detection of any signal due to copy number changes in the fetal genome. The biological characteristics investigated under the first aim, such as fragment lengths and cleavage motifs, are used to tailor the bias correction methods to the intricacies of cell-free DNA sequencing data. The methods applied in the thesis show substantial improvement in the sensitivity of trisomy 21 detection compared to the conventional bias correction protocols used in the literature.
The proportion of fetal DNA in maternal plasma is one of the most important factors affecting the accuracy of cell-free DNA-based tests. Estimating this proportion prior to testing can identify samples with a fetal fraction too low to ensure a reliable result. The third and final aim of this thesis is to investigate an approach that can quantify fetal DNA directly from low-coverage or ‘low-pass’ whole-genome sequencing data. The developed methodology exploits the uneven sequence coverage caused by the aforementioned technical and biological biases, and genotypes known SNPs at regions with excess reads. A mixture model framework incorporates this allelic information, and a maximum-likelihood-based algorithm is implemented to accurately calculate the proportion of fetal DNA.
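The general idea of estimating a fetal fraction from allelic counts with a mixture model can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details are not given here. The sketch assumes a binomial mixture over seven maternal/fetal genotype classes, equal class weights, a fixed sequencing error rate and a simple grid search for the maximum-likelihood value. All function names and parameters are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k alternate-allele reads out of n."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def class_probs(f, err=0.001):
    """Expected alternate-allele fraction per (maternal, fetal) genotype class.

    Assumed classes, in order: AA/AA, AA/AB, AB/AA, AB/AB, AB/BB, BB/AB, BB/BB,
    where f is the fetal fraction and err is an assumed sequencing error rate.
    """
    return [err, f / 2, 0.5 - f / 2, 0.5, 0.5 + f / 2, 1 - f / 2, 1 - err]

def log_likelihood(f, snps, weights=None, err=0.001):
    """Mixture log-likelihood of (alt_reads, depth) pairs at fetal fraction f."""
    probs = class_probs(f, err)
    if weights is None:
        # Assumption: equal class weights; a real model would use
        # population allele frequencies instead.
        weights = [1.0 / len(probs)] * len(probs)
    total = 0.0
    for alt, depth in snps:
        terms = [math.log(w) + log_binom_pmf(alt, depth, p)
                 for w, p in zip(weights, probs)]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def estimate_fetal_fraction(snps, grid_step=0.001, max_f=0.5):
    """Grid-search maximum-likelihood estimate of the fetal fraction."""
    grid = [i * grid_step for i in range(1, int(max_f / grid_step) + 1)]
    return max(grid, key=lambda f: log_likelihood(f, snps))

# Synthetic, noise-free counts (not real data) consistent with f = 0.10.
example_snps = [(10, 200)] * 20 + [(90, 200)] * 20
print(estimate_fetal_fraction(example_snps))
```

The synthetic read depth of 200 is far higher than low-pass sequencing would normally provide, which is consistent with the abstract's point that SNPs are genotyped only at regions with excess reads.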
Keywords: bioinformatics; statistics; genomics; computational biology; next generation sequencing; cell-free DNA; prenatal testing; clinical research; extracellular DNA; Down Syndrome; trisomy; aneuploidy; fetal aneuploidy