Chancellery Research - Research Publications

Search Results

Now showing 1 - 10 of 161
  • Item
    Instance Space Analysis of Search-Based Software Testing
    Neelofar, N ; Smith-Miles, K ; Munoz, MA ; Aleti, A (IEEE COMPUTER SOC, 2023-04-01)
  • Item
    Disease Delineation for Multiple Sclerosis, Friedreich Ataxia, and Healthy Controls Using Supervised Machine Learning on Speech Acoustics.
    Schultz, BG ; Joukhadar, Z ; Nattala, U ; Quiroga, MDM ; Noffs, G ; Rojas, S ; Reece, H ; Van Der Walt, A ; Vogel, AP (Institute of Electrical and Electronics Engineers (IEEE), 2023)
    Neurodegenerative disease often affects speech. Speech acoustics can be used as objective clinical markers of pathology. Previous investigations of pathological speech have primarily compared controls with one specific condition and excluded comorbidities. We broaden the utility of speech markers by examining how multiple acoustic features can delineate diseases. We used supervised machine learning with gradient boosting (CatBoost) to delineate healthy speech from speech of people with multiple sclerosis or Friedreich ataxia. Participants performed a diadochokinetic task where they repeated alternating syllables. We subjected 74 spectral and temporal prosodic features from the speech recordings to machine learning. Results showed that Friedreich ataxia, multiple sclerosis and healthy controls were all identified with high accuracy (over 82%). Twenty-one acoustic features were strong markers of neurodegenerative diseases, falling under the categories of spectral qualia, spectral power, and speech rate. We demonstrated that speech markers can delineate neurodegenerative diseases and distinguish healthy speech from pathological speech with high accuracy. Findings emphasize the importance of examining speech outcomes when assessing indicators of neurodegenerative disease. We propose large-scale initiatives to broaden the scope for differentiating other neurological diseases and affective disorders.
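The classification set-up this abstract describes (supervised gradient boosting over 74 acoustic features, three diagnostic groups) can be sketched as below. This is an illustrative sketch only: the study used CatBoost, for which scikit-learn's GradientBoostingClassifier stands in here, and the feature matrix is random placeholder data, not speech acoustics.

```python
# Sketch of the supervised classification described above.
# CatBoost is replaced by scikit-learn's GradientBoostingClassifier,
# and the data are random stand-ins for the 74 acoustic features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_speakers, n_features = 90, 74          # 74 spectral/temporal features per recording
X = rng.normal(size=(n_speakers, n_features))
y = rng.integers(0, 3, size=n_speakers)  # 0=control, 1=MS, 2=Friedreich ataxia

clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=3)          # per-fold accuracy
importances = clf.fit(X, y).feature_importances_   # rank features as disease markers
top_markers = np.argsort(importances)[::-1][:21]   # the paper reports 21 strong markers
```

On real data, the feature-importance ranking is what surfaces the acoustic markers; on this random placeholder the ranking is meaningless.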
  • Item
    Thermal and reionization history within a large-volume semi-analytic galaxy formation simulation
    Balu, S ; Greig, B ; Qiu, Y ; Power, C ; Qin, Y ; Mutch, S ; Wyithe, JSB (OXFORD UNIV PRESS, 2023-02-15)
    We predict the 21-cm global signal and power spectra during the Epoch of Reionization using the meraxes semi-analytic galaxy formation and reionization model, updated to include X-ray heating and thermal evolution of the intergalactic medium. Studying the formation and evolution of galaxies together with the reionization of cosmic hydrogen using semi-analytic models (such as meraxes) requires N-body simulations within large volumes and at high mass resolution. For this, we use a simulation of side-length 210 h⁻¹ Mpc with 4320³ particles resolving dark matter haloes to masses of 5 × 10⁸ h⁻¹ M☉. To reach the mass resolution of atomically cooled galaxies, thought to be the dominant population contributing to reionization, at z = 20 of ∼2 × 10⁷ h⁻¹ M☉, we augment this simulation using the darkforest Monte Carlo merger tree algorithm (achieving an effective particle count of ∼10¹²). Using this augmented simulation, we explore the impact of mass resolution on the predicted reionization history as well as the impact of X-ray heating on the 21-cm global signal and the 21-cm power spectra. We also explore the cosmic variance of 21-cm statistics within 70³ h⁻³ Mpc³ sub-volumes. We find that the midpoint of reionization varies by Δz ∼ 0.8 and that the cosmic variance on the power spectrum is underestimated by a factor of 2–4 at k ∼ 0.1–0.4 Mpc⁻¹ due to the non-Gaussian nature of the 21-cm signal. To our knowledge, this work represents the first model of both reionization and galaxy formation which resolves low-mass atomically cooled galaxies while simultaneously sampling the large scales necessary for exploring the effects of X-rays in the early Universe.
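The sub-volume cosmic-variance estimate mentioned above (3 × 3 × 3 = 27 sub-volumes of 70³ h⁻³ Mpc³ tiled inside the 210 h⁻¹ Mpc box) can be sketched with numpy. In this sketch the field is random noise standing in for a simulated 21-cm brightness box, and the per-block variance is a placeholder for the actual 21-cm statistic (e.g. the power spectrum).

```python
# Sketch of a sub-volume cosmic-variance estimate: tile the box into
# 3x3x3 = 27 sub-volumes and measure the scatter of a statistic across them.
import numpy as np

rng = np.random.default_rng(1)
n = 96                                # grid cells per side (placeholder resolution)
field = rng.normal(size=(n, n, n))    # stand-in for the 21-cm brightness field

sub = n // 3                          # 3 sub-volumes per axis -> 27 total
stats = []
for i in range(3):
    for j in range(3):
        for k in range(3):
            block = field[i*sub:(i+1)*sub, j*sub:(j+1)*sub, k*sub:(k+1)*sub]
            stats.append(block.var())  # placeholder per-sub-volume statistic
stats = np.array(stats)

cosmic_variance = stats.std(ddof=1)   # scatter across the 27 sub-volumes
```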
  • Item
    Metaphor-A workflow for streamlined assembly and binning of metagenomes.
    Salazar, VW ; Shaban, B ; Quiroga, MDM ; Turnbull, R ; Tescari, E ; Rossetto Marcelino, V ; Verbruggen, H ; Lê Cao, K-A (Oxford University Press (OUP), 2022-12-28)
    Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights as researchers can bypass cultivation and analyze genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with their own set of customizable parameters that affect the final output of the workflow. At the core of these workflows are the processes of assembly (combining the short input reads into longer, contiguous fragments, or contigs) and binning (clustering these contigs into individual genome bins). The limitations of assembly and binning algorithms also pose different challenges depending on the selected strategy to execute them. Both of these processes can be done for each sample separately or by pooling together multiple samples to leverage information from a combination of samples. Here we present Metaphor, a fully automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data and by combining multiple binning algorithms with a bin refinement step to achieve high-quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets and the impact of available assembly and binning strategies on the final results.
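A minimal sketch of the binning step described above, clustering contigs into genome bins. Real binners (such as those Metaphor combines) typically cluster on tetranucleotide frequencies plus per-sample coverage; both are random placeholders here, and KMeans stands in for the dedicated binning algorithms.

```python
# Sketch of metagenomic binning: cluster contigs into genome bins using
# composition (tetranucleotide frequencies) and coverage. All data here are
# random placeholders, and KMeans stands in for a real binning algorithm.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_contigs = 300
tetra_freq = rng.dirichlet(np.ones(136), size=n_contigs)  # 136 canonical tetramers
coverage = rng.gamma(2.0, 10.0, size=(n_contigs, 1))      # mean read depth per contig

features = StandardScaler().fit_transform(np.hstack([tetra_freq, coverage]))
bins = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
```

Each entry of `bins` assigns a contig to one of the five candidate genome bins; a refinement step would then score and merge bins across algorithms.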
  • Item
    Disease progression modelling of Alzheimer's disease using probabilistic principal components analysis
    Saint-Jalmes, M ; Fedyashov, V ; Beck, D ; Baldwin, T ; Faux, NG ; Bourgeat, P ; Fripp, J ; Masters, CL ; Goudey, B (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023-09)
    The recent biological redefinition of Alzheimer's Disease (AD) has spurred the development of statistical models that relate changes in biomarkers with neurodegeneration and worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the estimation of patient-realigning time-shifts. These time-shifts are indispensable for meaningful biomarker modelling, but may impact fitting time or vary with missing data in jointly estimated models. In this work, we estimate an individual's progression through Alzheimer's disease by combining multiple biomarkers into a single value using a probabilistic formulation of principal components analysis. Our results show that this variable, which summarises AD through observable biomarkers, is remarkably similar to jointly estimated time-shifts when we compute our scores for the baseline visit, on cross-sectional data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Reproducing the expected properties of clinical datasets, we confirm that estimated scores are robust to missing data or unavailable biomarkers. In addition to cross-sectional insights, we can model the latent variable as an individual progression score by repeating estimations at follow-up examinations and refining long-term estimates as more data is gathered, which would be ideal in a clinical setting. Finally, we verify that our score can be used as a pseudo-temporal scale instead of age to ignore some patient heterogeneity in cohort data and highlight the general trend in expected biomarker evolution in affected individuals.
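The core idea, collapsing several biomarkers into a single latent progression score with probabilistic PCA, can be sketched via the closed-form maximum-likelihood solution of Tipping and Bishop. The biomarker matrix below is synthetic placeholder data, not ADNI measurements.

```python
# Sketch of a one-dimensional probabilistic PCA "progression score":
# closed-form ML solution (Tipping & Bishop) on synthetic biomarker data.
import numpy as np

rng = np.random.default_rng(7)
n_patients, n_biomarkers = 200, 6
latent = rng.normal(size=(n_patients, 1))            # true (hidden) severity
loadings = rng.normal(size=(1, n_biomarkers))
X = latent @ loadings + 0.3 * rng.normal(size=(n_patients, n_biomarkers))

Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n_patients
vals, vecs = np.linalg.eigh(cov)                     # eigenvalues, ascending
q = 1                                                # one latent dim = the score
sigma2 = vals[:-q].mean()                            # ML estimate of noise variance
W = vecs[:, -q:] * np.sqrt(vals[-q:] - sigma2)       # ML loading matrix

# Posterior mean of the latent variable gives each patient's score.
M = W.T @ W + sigma2 * np.eye(q)
scores = Xc @ W @ np.linalg.inv(M).T
```

On this synthetic example the recovered scores track the hidden severity up to sign; the probabilistic formulation is what allows the same posterior to be computed when some biomarkers are missing.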
  • Item
    Multi-objective optimization in real-time operation of rainwater harvesting systems
    Zhen, Y ; Smith-Miles, K ; Fletcher, TD ; Burns, MJ ; Coleman, RA (ELSEVIER, 2023)
  • Item
    Bayesian coarsening: rapid tuning of polymer model parameters
    Weeratunge, H ; Robe, D ; Menzel, A ; Phillips, AW ; Kirley, M ; Smith-Miles, K ; Hajizadeh, E (SPRINGER, 2023-01-01)
    A protocol based on Bayesian optimization is demonstrated for determining model parameters in a coarse-grained polymer simulation. This process takes as input the microscopic distribution functions and temperature-dependent density for a targeted polymer system. The process then iteratively considers coarse-grained simulations to sample the space of model parameters, aiming to minimize the discrepancy between the new simulations and the target. Successive samples are chosen using Bayesian optimization. Such a protocol can be employed to systematically coarse-grain expensive high-resolution simulations to extend accessible length and time scales to make contact with rheological experiments. The Bayesian coarsening protocol is compared to a previous machine-learned parameterization technique which required a high volume of training data. The Bayesian coarsening process is found to precisely and efficiently discover appropriate model parameters, in spite of rough and noisy fitness landscapes, due to the natural balance of exploration and exploitation in Bayesian optimization.
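The loop described above, fitting a surrogate to past evaluations and choosing the next parameter sample so as to balance exploration and exploitation, is the standard Bayesian-optimization recipe. A minimal sketch follows, using a Gaussian-process surrogate with expected improvement; the toy one-dimensional function is a placeholder for the simulation-vs-target discrepancy, not the paper's actual objective.

```python
# Minimal Bayesian-optimization loop: GP surrogate + expected improvement.
# The "discrepancy" is a toy stand-in for a coarse-grained simulation run.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def discrepancy(theta):                       # placeholder objective to minimise
    return (theta - 0.3) ** 2 + 0.001 * np.sin(20 * theta)

grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
X = np.array([[0.0], [0.5], [1.0]])           # initial parameter samples
y = discrepancy(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(15):                           # iterative sampling loop
    gp.fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    imp = y.min() - mu                        # improvement over incumbent best
    z = imp / np.maximum(sd, 1e-12)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z) # expected improvement
    x_next = grid[np.argmax(ei)]              # explore/exploit trade-off
    X = np.vstack([X, x_next])
    y = np.append(y, discrepancy(x_next)[0])

theta_best = X[np.argmin(y), 0]               # best parameter found
```

Expected improvement is large both where the surrogate predicts low discrepancy (exploitation) and where its uncertainty is high (exploration), which is the balance the abstract credits for robustness to rough, noisy fitness landscapes.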
  • Item
    Optimal selection of benchmarking datasets for unbiased machine learning algorithm evaluation
    Pereira, JLJ ; Smith-Miles, K ; Muñoz, MA ; Lorena, AC (Springer Science and Business Media LLC, 2023-01-01)
  • Item
    Identification of herbarium specimen sheet components from high-resolution images using deep learning
    Thompson, KMM ; Turnbull, R ; Fitzgerald, E ; Birch, JLL (WILEY, 2023-08)
    Advanced computer vision techniques hold the potential to mobilise vast quantities of biodiversity data by facilitating the rapid extraction of text- and trait-based data from herbarium specimen digital images, and to increase the efficiency and accuracy of downstream data capture during digitisation. This investigation developed an object detection model using YOLOv5 and digitised collection images from the University of Melbourne Herbarium (MELU). The MELU-trained 'sheet-component' model (trained on 3371 annotated images, validated on 1000 annotated images, run using the 'large' model type, at 640 pixels, for 200 epochs) successfully identified most of the 11 component types of the digital specimen images, with an overall model precision of 0.983, recall of 0.969 and mean average precision (mAP0.5-0.95) of 0.847. Specifically, 'institutional' and 'annotation' labels were predicted with mAP0.5-0.95 of 0.970 and 0.878 respectively. It was found that annotating at least 2000 images was required to train an adequate model, likely due to the heterogeneity of specimen sheets. The full model was then applied to selected specimens from nine global herbaria (Biodiversity Data Journal, 7, 2019), quantifying its generalisability: for example, the 'institutional label' was identified with mAP0.5-0.95 of between 0.68 and 0.89 across the various herbaria. Further detailed study demonstrated that starting with the MELU-model weights and retraining for as few as 50 epochs on 30 additional annotated images was sufficient to enable the prediction of a previously unseen component. As many herbaria are resource-constrained, the MELU-trained 'sheet-component' model weights are made available and their application is encouraged.
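The mAP0.5-0.95 metric quoted above rests on intersection-over-union (IoU): a predicted box counts as correct when its IoU with a ground-truth box exceeds a threshold, and mAP0.5-0.95 averages the average precision over IoU thresholds 0.5, 0.55, ..., 0.95. The sketch below shows the IoU computation with made-up boxes, not MELU annotations.

```python
# IoU of axis-aligned boxes, the building block of the mAP0.5-0.95 metric.
# Boxes are (x1, y1, x2, y2) placeholders, not real specimen annotations.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred  = (10, 10, 110, 60)                 # hypothetical predicted label box
truth = (20, 15, 120, 65)                 # hypothetical ground-truth box
thresholds = np.linspace(0.5, 0.95, 10)   # the 0.5-0.95 IoU sweep
hits = [iou(pred, truth) >= t for t in thresholds]
```

Averaging precision over this threshold sweep (and over classes) is what makes mAP0.5-0.95 stricter than single-threshold mAP0.5: a loosely localised box passes only the lower thresholds.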
  • Item