Medical Biology - Theses


Search Results

Now showing 1 - 10 of 343
  • Item
    Understanding protein variants with high-throughput mutagenesis and machine learning
    Fu, Yunfan (2023-10)
    Genetic variations in protein-coding genes may cause amino acid substitutions in the mature proteins. These variants can change the properties and functions of a protein. To evaluate the effects of these protein variants, multiple experimental and computational approaches have been used. Among these approaches, deep mutational scanning (DMS), a recently developed high-throughput mutagenesis method, enables the measurement of thousands of protein variant effects in a single experiment. To fully exploit the rich information in DMS results and better understand protein variant effects, I leveraged machine learning algorithms to build advanced computational models for DMS data. First, I found that most DMS results contain missing variant effect data, and I developed imputation models to fill in the missing values. I started by investigating the correlations between the variant effects measured within a DMS experiment and used these correlations to build imputation models. To understand the strengths and weaknesses of these models, I benchmarked them against previously published DMS imputation methods. To conclude this part of the work, I built an ensemble imputation model that combines these novel and previously published methods to further improve imputation accuracy. Many state-of-the-art variant effect predictors are built with DMS data, and I then improved these predictors by further integrating variant effect data from alanine scanning (AS), a low-throughput mutagenesis approach. In this study, I established a rule-based classification tree to evaluate the compatibility between DMS and AS studies according to the similarity of their experimental assays. I showed that an improved variant effect predictor could be built only by modelling DMS and AS data of high compatibility. Finally, experimental measurements of protein variant effects may conflate protein stability and function.
    Here, I explored this relationship using DMS-measured variant effects and computed variant stability. I demonstrated that the correlation between variant effect and stability data differed across protein regions and the properties measured. By analysing these data with a dimensionality reduction algorithm, I was able to automatically distinguish protein residues with different degrees of fitness–stability association. Further investigation showed that this approach might be applied to discover protein functional sites and explain the mechanisms of loss-of-function variants.
  • Item
    Statistical models for pre-processing and simulating single cell RNA-seq data
    Wang, Jianan (2023-10)
    Since the first protocol was published in 2009, single-cell RNA sequencing (scRNA-seq) has become one of the most popular technologies in the omics world. scRNA-seq has been broadly applied in different areas, including understanding the tumor microenvironment, inferring embryonic development, and discovering regulatory pathways involved in plant seeding. These diverse applications have driven the development of scRNA-seq, and newer protocols can sequence large numbers of cells at low cost. With the rapid growth of scRNA-seq technology, many data-processing issues have arisen that remain to be resolved, such as removing batch effects, addressing dropout events, and annotating cell populations. These issues, in turn, create opportunities to apply statistical methods and devise new computational tools for analysing the data. In this thesis, we focus on pre-processing and simulation of single-cell data. In the first part of the thesis, we discuss different demultiplexing methods for the pre-processing step. The first method, CMDdemux, uses Mahalanobis distance to distinguish cells from different samples. This method performs well on low-quality data with hashtag contamination from the cell hashing experiment. The second method, LCADemux, uses latent class analysis (LCA) to combine cell hashing- and single nucleotide polymorphism (SNP)-based demultiplexing results. This hybrid framework is advantageous for analysing low-input cell hashing data. The third method, LCAdoublet, applies LCA to combine doublet information from cell hashing and VDJ data. LCAdoublet is better at identifying inter-sample doublets and T-cell doublets than methods that use transcriptomic data alone. In summary, our three pre-processing methods enable accurate identification of doublets and assignment of cells to their samples of origin, which yields cleaner data for downstream analysis.
    In the second part of the thesis, we present a novel single-cell simulation method and its applications for data analysis. Our method, GLMsim, applies a generalized linear model to simulate batch and biological effects simultaneously. Compared with other methods, GLMsim simulates single-cell data that more closely resemble the original data, especially for data collected under complex conditions. Our simulator has multiple applications, such as benchmarking single-cell integration methods, providing guidance for differential expression analysis, and checking model assumptions. In short, our simulation method should help researchers develop better tools for downstream analysis. Overall, we applied multiple statistical methods to fill existing gaps in the processing of single-cell RNA-seq data, and we demonstrated that our methods overcome challenges in analysing these data. We hope that they can be applied in the medical field to extend our knowledge of human health and disease.
  • Item
    Structural and Pharmacological Investigation of the JAK/SOCS1 Interaction
    Wu, Yuntong (2023-09)
    Cytokines are important signalling molecules that can cause inflammation, regulate haematopoiesis and modulate the immune response. Over 50 cytokines exert their biological functions on cells via the JAK/STAT pathway. Members of the SOCS (suppressors of cytokine signalling) family are induced by the cytokine-JAK-STAT cascade and act as negative feedback regulators of specific cytokine pathways. SOCS proteins prevent persistent activation of cytokine signalling in cells, but can also play a role in cytokine resistance and tumour progression. SOCS1, a member of the SOCS family, is specifically induced by IFN-gamma, and its SH2 and KIR domains are critical for its function. The SH2 domain mediates protein-protein interactions by specifically targeting pTyr-containing substrates. However, the structural details of how the SOCS1 SH2 domain interacts with its targets, and how this interaction occurs in vivo, have remained unclear. Our work determined crystal structures of the SOCS1 SH2 domain bound to various ligands, which uncovered the molecular interactions within this domain. Additionally, small molecules with potentially high affinity for the SOCS1 SH2 domain were identified through structure-guided drug design. The KIR domain of SOCS1 can insert into the substrate-binding pocket of JAK1 and directly inhibit its catalytic activity. Our research also attempted to develop a novel type of JAK inhibitor based on the binding mechanism of the KIR domain. These findings provide insights into the interactions between JAK and SOCS1 and highlight the potential of developing small molecules to disrupt these interactions.
  • Item
    Structural Studies of Type-I Haematopoietic cytokine receptors
    Sarson-Lawrence, Kaiseal Tane Garvey (2023-09)
    Haematopoiesis is the complex process by which the full complement of mature blood cells is produced from a small population of haematopoietic stem cells in the bone marrow. Cytokine signalling plays a crucial role in haematopoiesis, and at least 14 different cytokines are involved in determining the fate of haematopoietic stem cells. Cytokines act on cells by binding and oligomerising cytokine receptors on the cell surface. This oligomerisation activates intracellular JAK kinases, which initiate a signalling cascade that results in a cellular response. As cytokine binding is the critical first step in receptor activation, understanding how different cytokines bind to their receptors, and how this extracellular binding event translates into an intracellular signal, is fundamental to understanding haematopoietic diseases and designing therapeutic molecules. Thrombopoietin (TPO) and granulocyte colony-stimulating factor (GCSF) are haematopoietic cytokines that regulate the production of platelets and neutrophils, respectively, from their precursor cells. The thrombopoietin receptor (TPOR) belongs to the “short” family of cytokine receptors and contains a duplication of its ligand-binding domain not seen in most other cytokine receptors. While the structure of TPO has previously been determined by X-ray crystallography, the structures of the receptor and of the receptor-cytokine complex had not. The GCSF receptor (GCSFR) belongs to the “tall” family of cytokine receptors. This family is characterised by three additional fibronectin type-III (FnIII) domains in the extracellular domain, which extend the receptor out from the cell membrane. Although a partial structure of the GCSFR-GCSF complex has been determined previously, this structure lacked the three membrane-proximal FnIII domains. I expressed and purified recombinant extracellular domains of both GCSFR and TPOR along with their respective cytokines.
    Cryogenic electron microscopy (cryo-EM) was then used to solve the structures of both receptor-ligand complexes and to study the mechanisms of cytokine binding and activation of the two receptors. This research has produced the first experimentally determined structures of the entire extracellular domains of TPOR and GCSFR in complex with their cytokines. These structures and the corresponding biophysical data have improved our understanding of how these two receptors are activated by their cytokines.
  • Item
    Structural studies of the mitochondrial import pathway
    Webb, Chaille Teresa (University of Melbourne, 2008)
  • Item
    Understanding epilepsy genetic risk: integrating common and rare genetic variation
    Oliver, Karen Louise (2023-10)
    The epilepsies are a group of neurological disorders affecting up to 4% of people during their lifetime. There are many different epilepsy syndromes. At the broadest level, syndromes can be classified as focal, with focal-onset seizures, or generalised, with generalised-onset seizures; these represent the most common epilepsies. The developmental and/or epileptic encephalopathies (DEEs) are the most severe group of epilepsies, in which seizure activity is associated with developmental slowing or regression. The DEEs typically begin in infancy and childhood and have many rare single-gene causes. With both rare and common variants established as contributors to epilepsy genetic risk, the overarching aim of this thesis was to explore the potential interaction of these different variant types. First, I curated all reported monogenic causes. The clinically heterogeneous DEEs had >800 reported single-gene causes. In contrast, <50 single-gene causes are known for focal and generalised epilepsies. Clinical genetic studies suggest that these epilepsy types are polygenic, with contributions from both rare and common genetic variants. Common variants are detectable by genome-wide association studies (GWASs). I was Consortium Coordinator and one of the analysts for the third International League Against Epilepsy GWAS, involving >29,000 patients with epilepsy. Here, I led an analysis prioritising genes near GWAS signals, which demonstrated convergence between common and rare gene pathways, and a study of cross-trait genetic correlations, which highlighted the pleiotropic nature of many epilepsy-associated loci. Next, to explore the potential interplay between rare and common genetic variants, I focused on families with epilepsy.
    This familial approach was motivated by the observation that, even when relatives share the same familial rare pathogenic variant of major effect, their clinical presentation can be highly variable. This supports a role for as-yet unidentified genetic modifiers of epilepsy. In other complex traits, such as cancer and heart disease, studies have shown that common polygenic background, as captured by polygenic risk scores (PRSs) derived from large GWASs, can modify the penetrance of rare monogenic causes. To determine whether common polygenic background plays a modifying role in the epilepsies, I next demonstrated that epilepsy PRSs are enriched in patients with a positive family history of epilepsy compared with those without. Whilst families with epilepsy have previously been targeted for rare variant discovery, we provide the first support for common genetic variation playing a role in the familial aggregation of epilepsy. Furthermore, common risk variants for focal epilepsy were shown to be enriched in a specific familial focal epilepsy syndrome, despite no variants in the largest focal epilepsy GWAS reaching genome-wide significance. Finally, I explored whether common epilepsy risk variants act as disease modifiers. This was done by studying 58 families with the clinically heterogeneous syndrome of genetic epilepsy with febrile seizures plus (GEFS+), many with a known rare variant of major effect. In these families, I showed that higher epilepsy PRSs correlated with more severe epilepsy phenotypes. This provides the first support for common genetic background modifying the clinical expression of rare pathogenic variants in the epilepsies.
  • Item
    3D Imaging and Cellular Barcoding: Novel Tools for Exploring Cancer Heterogeneity
    Lewis, Sabrina Milly (2023-09)
    Breast cancer affects 1 in 7 Australian women, and the risk of death from metastatic (stage 4) disease remains high. Advanced disease is difficult to treat, especially when targeted treatments are limited for some cancer subtypes. Metastases form when cancer cells shed from the primary tumour, enter blood and lymphatic vessels, then exit and proliferate in distant organs. Understanding the interactions between these heterogeneous lesions and the vessels that facilitate their spread will give a better understanding of this process and could lead to improved cancer treatments. Not all tumour cells have the same ability to generate metastases. Specific clones (cancer cells derived from the same ancestral cell) differ in the organs they target, their behaviour in specific microenvironments, and how they cooperate with other clones. To date, the methods used to study the clonality and heterogeneity of cancer metastases have often involved tissue dissociation or 2D imaging. Consequently, spatial information about clones in their native microenvironment is lost. New methodologies and technologies are required to enable spatial discoveries and to advance our understanding of cancer heterogeneity, metastasis, and the tumour microenvironment. Here, I developed a novel pipeline for three-dimensional whole-organ imaging of human-in-mouse models of metastatic breast cancer. Light-sheet microscopy was used to capture large volumetric datasets, reducing the information loss seen with 2D tissue sections. I used lentiviral gene ontology (LeGO) vectors, an optical barcoding method, to identify seven individual clones. In combination with vessel casting (a perfusion-based method that enables vasculature imaging), tissue clearing, and an analysis pipeline, I revealed the relationship between aggressive breast cancer clones and the blood vasculature in murine lungs and brain.
    This method provides unprecedented detail and clonal resolution at large volume scales. My results indicate that large vessels may be correlated with enhanced metastatic growth. I also show that metastases that wrap around blood vessels are more likely to be polyclonal (containing multiple clonal populations), and that polyclonal metastases are more aggressive than monoclonal (single-clone) metastases, with potential implications for treatment targets. Differences in gene expression underlie the clones’ specific behaviours. Based on these results, I propose that transcriptional information, in combination with clonal identity and spatial tissue context, is required to reveal the molecular pathways responsible for these clonal behaviours, which may represent novel therapeutic targets. To simultaneously track a larger number of cancer clones (i.e. thousands) and their gene expression in situ, I developed a novel smFISH and RNA barcoding method (FISHcodes). I show that detecting hundreds of transcripts alongside dozens of clones (scalable to thousands) is feasible and will enable novel insights into clonal behaviour in cancer biology. Together, the results presented in this thesis demonstrate that novel methodologies enabling the study of cancer cell heterogeneity and its interplay with the microenvironment can address new questions about in situ cancer clone metastasis and growth.
  • Item
    Characterising the molecular regulation of erythroferrone
    Moir-Meyer, Gemma Louise (2023-10)
    ERFE encodes the hormone erythroferrone, which is secreted from erythroblasts in response to increased erythroid drive. ERFE protein suppresses hepcidin, the master iron regulator, which allows iron to be released from body iron stores and used for red blood cell production. In erythropoietic disorders such as thalassaemia, ineffective red blood cell production results in reduced tissue oxygen levels, increased erythroid drive, and chronic hepcidin suppression. However, despite representing a possible drug target, the regulation of ERFE has not been well studied. This work has identified a putative ERFE control locus comprising an enhancer and several key transcription factors using in vitro differentiated Human Umbilical Cord Blood-derived Erythroid Progenitor (HUDEP-2) cells. ERFE transcription and chromatin accessibility were tracked during four stages of terminal erythroblast maturation using quantitative PCR and Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). These data demonstrated a dynamic chromatin accessibility landscape across distinct erythroid maturation stages, and an expression profile that peaked in intermediate erythroblasts (p < 0.001). Capture-C then demonstrated contact between ERFE’s 5’ promoter and a putative enhancer, which also aligns with trimethylation of lysine 4 and acetylation of lysine 4 on histone 3 (promoter marks) and monomethylation of lysine 4 and acetylation of lysine 27 on histone 3 (enhancer marks), as profiled by Cleavage Under Targets & Release Using Nuclease (CUT&RUN). Moreover, when ERFE expression is at its highest, CUT&RUN showed that response elements in the enhancer are bound by the master erythroid regulators GATA1, KLF1 and TAL1, and by the stress erythroid response factor STAT5, suggesting a role for multiple signalling pathways in ERFE activation.
    These pathways, and ERFE’s place within them, were further explored using weighted gene correlation network analysis of RNA sequencing data from the four progenitor stages, where gene set enrichment analysis demonstrated that ERFE is co-expressed with genes that are strongly associated with haem metabolism (p = 3.65 × 10⁻³⁰). Overall, these data provide new insights into the regulation of erythroferrone and may contribute valuable details for identifying therapeutic targets in iron-loading anaemias.
  • Item
    Sequencing and validation of variants causing autoinflammatory diseases
    Reygaerts, Thomas Jean F. (2023-09)
    Aberrant activation of the innate immune system leads to systemic and organ-specific inflammation in the absence of pathogens or autoimmunity. This system is genetically encoded, and mutations that stimulate its pathways or impair cellular homeostasis cause autoinflammatory diseases (AIDs). Monogenic and complex AIDs (in which other genetic factors and the environment play a role) share phenotypic features and pathophysiological mechanisms. Only about 15-40% of patients suspected of having a monogenic AID will receive a specific molecular diagnosis. The Australian Autoinflammatory Diseases RegistrY (AADRY) gathers undiagnosed patients and provides whole exome sequencing to patients and families in search of new mechanisms explaining their diseases. In this thesis, we functionally validate the pyrin polymorphism E148Q, showing its effect in vitro. In patients with familial Mediterranean fever (FMF), it potentiates pathogenic mutations in cis and could therefore have a role in disease presentation and severity. I also assess a new mutation in CDC42 found in a family with members presenting an AID across three generations. A phenotypic and mechanistic hypothesis-based approach shows that this variant promotes the pyrin inflammasome. Finally, I investigated somatic mosaicism, which could explain around 15-20% of undiagnosed AID cases. I developed a cost-effective ultra-deep amplicon sequencing assay for the third exon of NLRP3, a part of the gene in which somatic mosaicism is particularly well described. These three projects aim to broaden our knowledge of the mechanisms of autoinflammation.
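The Wang abstract says only that CMDdemux uses Mahalanobis distance to assign cells to their samples. As a minimal sketch of that general idea, and not CMDdemux's actual algorithm, the Python below assigns a cell to the sample whose hashtag-count centroid is nearest in Mahalanobis distance. The reference cells, the log1p transform, the pooled within-sample covariance and the ridge term are all illustrative assumptions, as are the function names.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(groups, ridge=1e-3):
    """Within-sample covariance pooled over all samples, plus a small ridge
    on the diagonal so the matrix stays invertible (an assumption)."""
    d = len(next(iter(groups.values()))[0])
    cov = [[0.0] * d for _ in range(d)]
    dof = 0
    for rows in groups.values():
        mu = mean_vector(rows)
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    cov[a][b] += diff[a] * diff[b]
        dof += len(rows) - 1
    for a in range(d):
        for b in range(d):
            cov[a][b] /= dof
        cov[a][a] += ridge
    return cov


def invert(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(x, mu, inv_cov):
    """sqrt((x - mu)^T S^-1 (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d = len(diff)
    return math.sqrt(sum(diff[a] * inv_cov[a][b] * diff[b]
                         for a in range(d) for b in range(d)))


def assign(cell, groups):
    """Return the sample whose reference centroid is closest to `cell`."""
    inv_cov = invert(pooled_covariance(groups))
    dists = {label: mahalanobis(cell, mean_vector(rows), inv_cov)
             for label, rows in groups.items()}
    return min(dists, key=dists.get), dists


def log1p_counts(counts):
    return [math.log1p(c) for c in counts]


# Hypothetical reference cells: raw counts for two hashtags (HTO_A, HTO_B).
reference = {
    "A": [log1p_counts(c) for c in ([200, 10], [180, 15], [220, 8])],
    "B": [log1p_counts(c) for c in ([12, 190], [9, 210], [15, 170])],
}
label, dists = assign(log1p_counts([150, 20]), reference)
```

Using the inverse covariance rather than plain Euclidean distance down-weights directions in which the reference cells naturally vary a lot. A practical demultiplexer would also need rules for calling doublets and unassigned cells, which this sketch omits.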
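For GLMsim, the Wang abstract states only that a generalized linear model is used to simulate batch and biological effects together. The sketch below illustrates that idea under stated assumptions: counts are drawn from a Poisson distribution with a log link, and the log mean for gene g in cell c is a baseline plus an additive batch effect plus an additive biological-group effect, log μ_gc = β_g + γ_g,batch(c) + δ_g,group(c). The function names and all parameter values are hypothetical, and GLMsim itself may use a different count distribution or fit its parameters to real data.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method
    (fine for the small means used here, slow for very large ones)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_genes, cells, baseline, batch_effects, group_effects, seed=0):
    """Simulate a genes x cells count matrix from a log-link Poisson GLM.

    cells: list of (batch, group) labels, one per cell.
    baseline[g]: log-scale intercept for gene g.
    batch_effects[batch][g], group_effects[group][g]: additive log-scale effects.
    """
    rng = random.Random(seed)
    counts = []
    for g in range(n_genes):
        row = []
        for batch, group in cells:
            log_mu = baseline[g] + batch_effects[batch][g] + group_effects[group][g]
            row.append(sample_poisson(math.exp(log_mu), rng))
        counts.append(row)
    return counts


# Hypothetical design: 3 genes, 4 cells from two batches and two cell groups.
cells = [("b1", "T"), ("b1", "B"), ("b2", "T"), ("b2", "B")]
baseline = [math.log(5.0)] * 3
batch_effects = {"b1": [0.0, 0.0, 0.0], "b2": [0.5, 0.0, -0.5]}
group_effects = {"T": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
counts = simulate_counts(3, cells, baseline, batch_effects, group_effects, seed=1)
```

Because batch and group effects enter additively on the log scale, they act multiplicatively on the expected count: in this made-up design, gene 0 in batch b2 has its mean scaled by e^0.5 (about 1.65) relative to batch b1.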
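The polygenic risk scores in the Oliver abstract are, in their simplest common form, a weighted sum over variants, PRS_i = Σ_j β_j x_ij, where x_ij is person i's count of the effect allele at variant j and β_j is that allele's GWAS effect size. The sketch below computes such scores and compares standardized means between people with and without a family history of epilepsy. The variant IDs, weights and genotypes are invented for illustration, and the thesis's actual PRS construction (for example, how variants were selected or effect sizes adjusted) is not described in the abstract.

```python
import statistics


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one person.

    dosages: variant_id -> effect-allele count (0, 1 or 2).
    weights: variant_id -> per-allele effect size (e.g. a log odds ratio)
    taken from GWAS summary statistics. Variants absent from `dosages`
    are simply skipped here (a simplifying assumption).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def standardize(scores):
    """Convert raw scores to z-scores within the cohort."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical weights and genotypes; True = positive family history.
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
cohort = [
    ({"var1": 2, "var2": 0, "var3": 2}, True),
    ({"var1": 1, "var2": 0, "var3": 1}, True),
    ({"var1": 0, "var2": 1, "var3": 0}, False),
    ({"var1": 0, "var2": 2, "var3": 1}, False),
]
z = standardize([polygenic_risk_score(g, weights) for g, _ in cohort])
mean_pos = statistics.mean(zi for zi, (_, fh) in zip(z, cohort) if fh)
mean_neg = statistics.mean(zi for zi, (_, fh) in zip(z, cohort) if not fh)
```

A real enrichment analysis would test the difference statistically, typically with regression models that adjust for covariates such as ancestry principal components; the raw difference in means here is only illustrative.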