School of BioSciences - Theses

  • Exploring the cancer transcriptome with novel bioinformatics approaches
    Schmidt, Breon Michael (2022)
    Currently, three out of every ten deaths within Australia are a direct consequence of cancer. Cancer is a complex and genetically heterogeneous disease that is, as a consequence, effectively unique to each individual. However, there are common driving events, phenotypes, and risks that can segregate cancers into tumour types and subtypes. These groupings are beneficial because they can both inform treatment regimes and yield new targets for pharmaceutical development. Next Generation Sequencing (NGS) of RNA has enabled measurement of the abundance and makeup of a sample’s transcriptome, which, through bioinformatics analysis, can reveal the rich interplay between genetic mutations and their functional and phenotypic consequences. This thesis focuses on three key transcriptome projects. The first project developed the ALLSorts software, the first publicly available, open-source classifier for determining subtypes of B-Cell Acute Lymphoblastic Leukemia (B-ALL). The purpose of this tool is to give researchers an accurate method for using transcriptome data to quickly label B-ALL samples as one of 18 subtypes. Subtyping is becoming part of the clinical standard of care, informing targeted pharmaceutical treatment and/or treatment intensity. The second project, Slinker, is a publicly available, open-source visualisation tool that can be applied to any gene to highlight splicing variation between a case and controls. Novel splicing is regularly observed across a variety of diseases, including cancer, and can significantly alter the final transcript, possibly transforming it into a pathogenic driver. Slinker is novel in that it uses the superTranscriptome method to create succinct visualisations by removing redundant features. The final project in this thesis is an analysis of the utility of long-read transcripts as a transcriptomic reference, specifically within a spatial context.
    Three references were compared: the hg38 reference transcriptome, the long reads themselves, and the two combined. For each reference, gene expression was quantified using highly accurate short-read sequencing. The combined reference yielded both a higher mapping rate and novel expressed sequences, one of which belongs to a gene that is a known prognostic marker for oropharyngeal head and neck cancer, the cancer type to which this method was applied.
  • Genetic Variation Within an Indigenous Australian Cohort and its Implications for Future Studies of Genomics, Health and Disease
    Silcocks, Matthew (2022)
    Gaining an understanding of the genetic characteristics of human populations is important for establishing approaches to, and expectations from, future studies of genomics, health and disease involving these groups. While genomic studies have recently expanded in scope to sample from a wide range of human ancestry groups, the Indigenous communities of Australia remain poorly characterised and under-represented in global reference panels. Our failure to understand patterns of genetic variation within these communities, and how they differ from other human groups, may widen the already considerable gap in health outcomes between Indigenous Australians and the general Australian population. To address this issue, the National Centre for Indigenous Genomics (NCIG) has collected genomic data from four Indigenous Australian communities spread across a wide expanse of the continent. This thesis will describe the analysis of patterns of genetic variation and diversity within this dataset, and will emphasise how they underpin future research into genomics, health and disease for Indigenous Australians. Firstly, this thesis will describe several analyses aimed at identifying the source and quantifying the degree of non-Indigenous admixture within the NCIG dataset, and will detail approaches to ‘mask’ the admixed regions so that only the Indigenous component of each genome is analysed. Using this masked dataset, subsequent analyses will explore aspects of Indigenous Australian population variation and diversity relevant to future studies of genomics, health and disease. In particular, they will explore patterns of ‘population structure’ within Indigenous Australian groups, and compare these to patterns observed within human cohorts separated by comparable distances in other regions of the world.
    The medical and genomic implications of the immense degree of population structure, haplotype sharing and rare allele sharing within the Indigenous Australian communities will then be discussed. Subsequently, this thesis will analyse Indigenous Australian genomic variation within the context of worldwide human populations. By analysing the NCIG dataset alongside a diverse global cohort, this investigation will show the high abundance of novel genetic variation within these communities, and will emphasise additional genetic characteristics relevant to the design of future studies involving Indigenous Australian genomes. More demographically oriented analyses, involving Indigenous Australian communities and groups from the surrounding Oceanic region, will provide context for the key findings presented within this thesis. These data will reveal a previously undocumented history of genetic interaction between the populations of Melanesia and northern Australia, and will show that Indigenous Australian communities have sustained small, yet stable, population sizes over recent millennia. This thesis will close with an analysis of patterns of uniparental genetic variation within the dataset, and will assess the ways in which these data support inferences based on autosomal data. In particular, this chapter will highlight a globally rare and presumably deleterious Y-chromosome variant that is present at near fixation within the Tiwi Island community.
  • Bioinformatics methods and approaches to discover disease variants from DNA sequencing data
    Dashnow, Harriet (2019)
    Next-generation sequencing is increasingly used to diagnose patients with suspected genetic disease. Yet, even after exome or whole genome sequencing, many patients remain undiagnosed. In many cases a genetic diagnosis is not made because we either failed to detect the causal variant, or detected it but failed to identify it as causative. There is a clear need for novel bioinformatics methods and sequencing strategies that address these shortcomings and increase diagnostic rates. In this thesis I develop several such strategies. I propose a pooled-parent exome sequencing approach to prioritise de novo variants for genetic disease diagnosis. In this strategy, each proband undergoes individual exome sequencing, while DNA from all of the probands' parents is pooled, exome-captured and sequenced together. The variants called in this pool are used to filter out inherited variants in the probands, so that the remaining list is enriched for de novo variants. Short Tandem Repeat (STR) expansions are a class of disease-causing variants that are frequently missed in short-read sequencing data. Here I develop and validate STRetch, a new bioinformatics method that detects STR expansions using STR decoy chromosomes. I show that STRetch can detect both known pathogenic STR expansions and novel expansions at other annotated STR loci across the genome. I further use STRetch to explore STR variation across hundreds of individuals, to help distinguish common variation from potentially pathogenic expansions and so aid in prioritising STR variants in a gene-discovery setting. Some of the methods that I have developed and describe within this thesis have already been used to help patients receive a genetic diagnosis.
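The filtering step of the pooled-parent strategy can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each call is an exact (chromosome, position, reference allele, alternate allele) key, and it treats any proband variant also called in the parental pool as inherited. This is an illustrative simplification, not the thesis's implementation. In practice each parental allele is diluted in a pool, so pool calling must be sensitive to low allele fractions. Any variant the pool misses will survive the filter, which is why the output is a prioritised candidate list rather than confirmed de novo variants.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: one alternate allele at one position."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_de_novo(proband: Iterable[Variant], parent_pool: set[Variant]) -> list[Variant]:
    """Keep proband variants that were NOT called in the pooled-parent sample."""
    return [v for v in proband if v not in parent_pool]


def prioritise_cohort(
    probands: dict[str, list[Variant]], parent_pool_calls: Iterable[Variant]
) -> dict[str, list[Variant]]:
    """Filter every proband against one shared pooled-parent call set."""
    # Any variant seen in the pool is assumed to be inherited from some parent.
    pool = set(parent_pool_calls)
    return {pid: candidate_de_novo(calls, pool) for pid, calls in probands.items()}


# Toy example with made-up calls (not data from the thesis).
pool_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
proband_calls = {
    "P1": [Variant("chr1", 1000, "A", "G"), Variant("chr3", 42, "G", "A")],
    "P2": [Variant("chr2", 500, "C", "T")],
}
candidates = prioritise_cohort(proband_calls, pool_calls)
print(candidates)
```

Sharing one pool across the cohort means the parental call set is built once and reused for every proband. This reflects the cost saving the pooled design is aiming for compared with sequencing each parent individually.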