## School of Mathematics and Statistics - Theses

#### Risk Analysis and Probabilistic Decision Making for Censored Failure Data

(2019)

Operation and maintenance of a fleet always require a high level of readiness, reduced cost, and improved safety. To achieve these goals, it is essential to develop and determine an appropriate maintenance programme for the components in use. Failure analysis, involving failure model selection, robust parameter estimation, probabilistic decision making, and assessment of the cost-effectiveness of the decisions, is the key to selecting a proper maintenance programme. Two significant challenges faced in failure analysis studies are minimizing the uncertainty associated with model selection and making strategic decisions based on few observed failures. In this thesis, we try to resolve some of these problems and evaluate the cost-effectiveness of the selections. We focus on choosing the best model from a model space and on robust estimation of quantiles, leading to the selection of optimal repair and replacement times of units. We first explore the repair and replacement cost of a unit in a system. We design a simulation study to assess the performance of two parameter estimation methods, maximum likelihood estimation (MLE) and median rank regression (MRR), in estimating quantiles of the Weibull distribution. Then, we compare the Weibull, gamma, log-normal, log-logistic, and inverse-Gaussian models in failure analysis. With an example, we show that the Weibull and the gamma distributions provide competing fits to the failure data. Next, we demonstrate the use of Bayesian model averaging in accounting for that model uncertainty. We derive an average model for the failure observations with respective posterior model probabilities. Then, we illustrate the cost-effectiveness of the selected model by comparing the distribution of the total replacement and repair cost. In the second part of the thesis, we discuss the prior information. Initially, we assume that the parameters of the Weibull distribution are related through a function of the form $\rho = \sigma/\mu$ and re-parameterize the Weibull distribution. Then we propose a new Jeffreys’ prior for the parameters $\mu$ and $\rho$. Finally, we design a simulation study to assess the performance of the new Jeffreys’ prior compared to the MLE.
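
As a rough illustration of the median rank regression step mentioned in this abstract, the sketch below fits a two-parameter Weibull distribution to complete (uncensored) failure times and evaluates a quantile. It is not the thesis's procedure: Bernard's approximation for the median ranks, the choice to regress the transformed ranks on log time, the function names, and the sample failure times are all assumptions made here for illustration, and censoring is ignored entirely.

```python
import math


def median_rank_regression(failure_times):
    """Fit a two-parameter Weibull by median rank regression.

    Assumes complete (uncensored) data. Median ranks use Bernard's
    approximation (i - 0.3) / (n + 0.4); the line y = ln(-ln(1 - F)) is
    fitted against x = ln(t) by ordinary least squares.
    Returns (shape, scale).
    """
    times = sorted(failure_times)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    shape = sxy / sxx  # slope of the Weibull probability plot
    # The intercept equals -shape * ln(scale), so solve for the scale.
    scale = math.exp(x_bar - y_bar / shape)
    return shape, scale


def weibull_quantile(p, shape, scale):
    """Time by which a fraction p of units is expected to have failed."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


if __name__ == "__main__":
    # Hypothetical failure times (hours), invented for this example.
    sample = [105.0, 212.0, 260.0, 341.0, 410.0, 498.0, 620.0]
    shape_hat, scale_hat = median_rank_regression(sample)
    print(shape_hat, scale_hat, weibull_quantile(0.10, shape_hat, scale_hat))
```

A simulation study of the kind described would repeat such a fit over many simulated samples and compare the estimated quantiles with those from maximum likelihood.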

#### Mathematical models of calcium signalling in the context of cardiac hypertrophy

(2020)

Throughout the average human lifespan, our hearts beat over 2 billion times. With each beat, calcium floods the cytoplasm of every heart cell, causing it to contract until calcium re-uptake allows the heart to relax, ready for the next beat. However, calcium is known to be critical in other cell functions, including growth. Calcium plays a central role in mediating hypertrophic signalling in ventricular cardiomyocytes on top of its contractile function. How intracellular calcium can encode several different, specific signals at once is not well understood.
In heart cells, calcium release from ryanodine receptors (RyRs) triggers contraction. Under hypertrophic stimulation, calcium release from inositol 1,4,5-trisphosphate receptor (IP3R) channels modifies the calcium contraction signal, triggering dephosphorylation and nuclear import of the transcription factor nuclear factor of activated T cells (NFAT), with resulting gene expression linked to cell growth.
Several hypotheses have been proposed as to how the modified cytosolic calcium contraction signal transmits the hypertrophic signal to downstream signalling proteins, including changes to amplitude, duration, duty cycle, and signal localisation. We investigate the form of these signals within the cardiac myocyte using mathematical modelling. Using a compartmental heart cell model, we show that the effect of calcium channel interaction on the global calcium signal supports the idea that increased calcium duty cycle is a plausible mechanism for IP3-dependent hypertrophic signalling in cardiomyocytes.
A corresponding calcium signal within the nucleus must be present to maintain NFAT in the nucleus and thus allow NFAT to alter gene expression, initiating hypertrophic remodelling. Yet the nuclear membrane is permeable to calcium and this must all occur on a background of rising and falling calcium with each heartbeat. The mechanisms shaping calcium dynamics within the nucleus remain unclear.
We use a spatial model of calcium diffusion into the nucleus to determine the effects of buffers and cytosolic transient shape on nuclear calcium dynamics. Using experimental data, we estimate the diffusion coefficient and the effects of buffers on nuclear [Ca2+]. Additionally, we explore the effects of altered cytosolic calcium transients and calcium release on nuclear calcium. To approximate experimental measurements of nuclear calcium, we find that there must be perinuclear Ca2+ release and nonlinear diffusion. Comparisons of 1D and 3D models of calcium in the nucleus suggest that spatial variation in calcium concentration within the nucleus will not have a large effect on calcium-mediated gene regulation.
This work brings us closer to understanding the signalling pathway that leads to pathological hypertrophic cardiac remodelling.

#### Understanding the regulation of epidermal tissue thickness by cellular and subcellular processes using multiscale modelling

(2020)

The epidermis is the outermost layer of the skin, providing a protective barrier for our bodies. Two important aspects of the barrier function of the epidermis are maintenance of its barrier layer and constant cell turnover. The main barrier layer in the epidermis is the outermost layer, called the stratum corneum. This layer blocks both the entry of antigens and the loss of internal water and solutes. If antigens do enter the system, cell turnover has been hypothesised to propel them out of the system by providing a constant upwards velocity of cells which carry the toxins with them.
The majority of severe diseases of the epidermis relate to a reduction in thickness of the stratum corneum. Decreased thickness reduces the barrier function of the layer, causing discomfort and inflammation. Due to its importance to barrier function, the maintenance of stratum corneum thickness, and consequently overall tissue thickness, is the focus of this thesis.
In order to maintain both stratum corneum thickness and overall tissue thickness, it is necessary for the system to balance cell proliferation and cell loss. Cell loss in the epidermis occurs when dead cells at the top of the tissue are lost to the environment through a process called desquamation. Cell proliferation occurs in the base, or basal, layer. As the basal cells proliferate, cells above them are pushed upwards through the tissue, causing constant upwards movement in the tissue. Not only does this contribute directly to the barrier function through the cell turnover discussed above, but the velocity of the cells is likely to be key in regulating the tissue thickness. Assuming that cell loss occurs at a fairly constant rate, the combination of the velocity and the loss rate determines tissue thickness.
In order to investigate these processes, we develop a three-dimensional discrete, multiscale, multicellular model, focussing on maintenance of cell proliferation and desquamation. Using this model, we are able to investigate how subcellular and cellular level processes interact to maintain a homeostatic tissue.
Our model is able to reproduce a system that self-regulates its thickness. The first aspect of this regulation is maintaining a constant rate of proliferation in the epidermis, and consequently a constant upwards velocity of cells. The second aspect is a maintained rate of desquamation. The model shows that hypothesised biological models for the degradation of cell-cell adhesion from the literature are able to provide a consistent rate of cell loss which balances proliferation. An investigation into a disorder which disrupts this desquamation model shows reduced tissue thickness, consequently diminishing the protective role of the tissue.
In developing the multiscale model we have begun to delve deeper into the relationship between subcellular and cellular processes and epidermal tissue structure. The model is developed with scope for the integration of further subcellular processes. This provides it with the potential for further experiments into the causes and effects of behaviours and diseases of the epidermis, with much higher time and cost efficiency than other experimental methods.

#### Biorthogonal Polynomial Sequences and the Asymmetric Simple Exclusion Process

(2019)

The diffusion algebra equations of the stationary state of the three parameter Asymmetric Simple Exclusion Process are represented as a linear functional, acting on a tensor algebra. From the linear functional, a pair of sequences (P and Q) of monic polynomials is constructed which are bi-orthogonal, that is, they are orthogonal with respect to each other and not necessarily themselves. The uniqueness and existence of the pair of sequences arise from the determinant of the bi-moment matrix, whose elements satisfy a pair of q-recurrence relations. The determinant is evaluated using an LDU-decomposition. If the action of the linear functional is represented as an inner product, then the action of the polynomials Q on a boundary vector V generates a basis whose orthogonal dual vectors are given by the action of P on the dual boundary vector W. This basis gives the representation of the algebra which is associated with the Al-Salam-Chihara polynomials obtained by Sasamoto.
Several theorems associated with the three parameter asymmetric simple exclusion process are proven combinatorially. The theorems involve the linear functional which, for the three parameter case, is a substitution morphism on a q-Weyl algebra. The two polynomial sequences, P and Q, are represented in terms of q-binomial lattice paths.
A combinatorial representation for the value of the linear functional defining the matrix elements of a bi-moment matrix is established in terms of the value of a q-rook polynomial and utilised to provide combinatorial proofs for results pertaining to the linear functional. Combinatorial proofs are provided for theorems in terms of the p,q-binomial coefficients, which are closely related to the combinatorics of the three parameter ASEP.
The results for the three parameter diffusion algebra of the Asymmetric Simple Exclusion Process are extended to five parameters. A pair of basis changes is derived from the LDU decomposition of the bi-moment matrix. In order to derive the LDU decomposition, a recurrence relation satisfied by the lower triangular matrix elements is conjectured. Associated with this pair of bases are three sequences of orthogonal polynomials. The first pair of orthogonal polynomials generate the new basis vectors (the boundary basis) by their action on the boundary vectors (written in the standard basis), whilst the third orthogonal polynomials are essentially the Askey-Wilson polynomials. All these results are ultimately related to the LDU decomposition of a matrix.
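
To make the role of the LDU decomposition concrete, here is a minimal sketch of how a determinant can be read off from the diagonal factor. It works on an arbitrary small numeric matrix with exact rational arithmetic; the function names and the example matrix are assumptions for illustration, and the thesis's bi-moment matrix, with its q-dependent entries, is not reproduced here.

```python
from fractions import Fraction


def ldu_decompose(matrix):
    """Return (L, D, U) with A = L * diag(D) * U, computed without pivoting.

    L is unit lower triangular, U is unit upper triangular, and D is the
    list of pivots. A zero pivot raises ZeroDivisionError.
    """
    n = len(matrix)
    a = [[Fraction(v) for v in row] for row in matrix]
    lower = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    diag = [a[k][k] for k in range(n)]
    # Divide each row of the reduced matrix by its pivot to get unit U.
    upper = [[a[i][j] / diag[i] if j >= i else Fraction(0)
              for j in range(n)] for i in range(n)]
    return lower, diag, upper


def determinant(matrix):
    """det(A) is the product of the pivots, since det(L) = det(U) = 1."""
    _, diag, _ = ldu_decompose(matrix)
    result = Fraction(1)
    for pivot in diag:
        result *= pivot
    return result


if __name__ == "__main__":
    example = [[2, 1], [4, 5]]  # arbitrary matrix chosen for the example
    print(ldu_decompose(example))
    print(determinant(example))
```

Because both triangular factors have unit diagonals, a nonzero product of pivots is equivalent to a nonzero determinant, which is the kind of condition that existence and uniqueness arguments of this type rely on.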

#### Exploring the statistical aspects of expert elicited experiments

(2020)

In this study, we explore the statistical aspects of some known methods of analysing experts’ elicited data to identify potential improvements in the accuracy of their outcomes. Potential correlation structures induced in the probability predictions by the characteristics of experimental designs are ignored in computing experts’ Brier scores. In the second chapter of this thesis, we show that the accuracy of the standard error estimates of experts’ Brier scores can be improved by incorporating the within-question correlations of probability predictions. Missing probability predictions of events can affect the assessment of experts’ prediction accuracy when experts are scored on different sets of events (Merkle et al., 2016; Hanea et al., 2018). The third chapter shows that a multiple imputation method using a mixed-effects model with question effects as random effects can effectively estimate missing predictions to enhance the comparability of experts’ Brier scores.
Testing experts’ calibration in eliciting credible intervals of unknown quantities using hit rates, that is, the observed proportions of elicited intervals that contain the realized values of given quantities (McBride, Fidler, and Burgman, 2012), tends to have low power to correctly identify well-calibrated experts and, more importantly, the power tends to decrease as the number of elicited intervals increases. As shown in the fourth chapter, the equivalence test of a single binomial proportion can be used to overcome these problems. The way experts’ calibration is assessed to derive experts’ weights in Cooke’s classical model (Cooke, 1991) can allocate higher weights to some experts who are not well calibrated. In the fifth chapter, we show that the multinomial equivalence test can be used to overcome this problem.
Experts’ weights that are derived from experiments and used to combine experts’ elicited subjective probability distributions into aggregated probability distributions of unknown quantities (O’Hagan, 2019) are random variables subject to uncertainty. In the sixth chapter, we derive shrinkage experts’ weights with reduced mean squared errors to enhance the precision of the resulting aggregated distributions of quantities.
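
For readers unfamiliar with the two accuracy measures named in this abstract, the sketch below computes them in their standard textbook form: the Brier score as the mean squared difference between predicted probabilities and binary outcomes, and the hit rate as the proportion of elicited intervals containing the realised value. The function names and the sample numbers are assumptions for illustration; the correlation adjustments, imputation, and equivalence tests developed in the thesis are not shown.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predictions, outcomes))
    return total / len(predictions)


def hit_rate(intervals, realised_values):
    """Proportion of (lower, upper) intervals that contain the realised value."""
    if not intervals or len(intervals) != len(realised_values):
        raise ValueError("intervals and realised_values must be non-empty and of equal length")
    hits = sum(1 for (low, high), x in zip(intervals, realised_values)
               if low <= x <= high)
    return hits / len(intervals)


if __name__ == "__main__":
    # Hypothetical expert answers, invented for this example.
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
    print(hit_rate([(0, 10), (5, 6), (1, 2)], [3, 7, 1.5]))
```

Lower Brier scores indicate more accurate probability predictions, while a well-calibrated expert's hit rate should be close to the nominal level of the elicited intervals.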

#### Nonparametric estimation for streaming data

(2020)

Streaming data are a type of high-frequency and nonstationary time series data. The collection of streaming data is sequential and potentially never-ending. Examples of streaming data, including data from sensor networks, mobile devices and the Internet, are prevalent in our daily lives. An estimator for streaming data needs to be computationally efficient so that it is relatively easy to update the estimator using newly arrived data. In addition, the estimator has to be adaptive to the nonstationarity of data. These constraints make streaming data analysis more challenging than analysing the conventional non-streaming data sets.
Although streaming data analysis has been discussed in the machine learning community for more than two decades, it has received limited attention from statistical researchers. Estimation methods that are both computationally efficient and theoretically justified are still lacking. In this thesis, we propose nonparametric density and regression estimation methods for streaming data, where the smoothing parameters are chosen in a computationally efficient and fully data-driven way. These methods extend some classical kernel smoothing techniques, such as the kernel density estimator and the Nadaraya-Watson regression estimator, to address the theoretical and computational challenges arising from streaming data analysis. Asymptotic analyses provide these methods with theoretical justification. Numerical studies have shown the superiority of our methods over conventional ones. Through some real-data examples, we show that these methods are potentially useful in modelling real-world problems. Finally, we discuss some directions for future research, including extending these methods to model higher-dimensional streaming data and to streaming data classification.
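
The computational constraint described above, updating an estimate from each new observation without revisiting old data, can be illustrated with a classical recursive kernel density estimator. This is only a sketch of the general idea and not the method proposed in the thesis: the Gaussian kernel, the fixed evaluation grid, the class name, and the deterministic bandwidth sequence h_n = c n^(-1/5) are all assumptions made here, and the sketch neither selects the bandwidth from the data nor adapts to nonstationarity.

```python
import math


class RecursiveKDE:
    """Recursive Gaussian kernel density estimate on a fixed grid.

    Update rule for the n-th observation X_n:
        f_n(x) = (n - 1)/n * f_{n-1}(x) + K((x - X_n)/h_n) / (n * h_n)
    The bandwidth h_n = c * n**(-1/5) is a simple deterministic default.
    """

    def __init__(self, grid, bandwidth_constant=1.0):
        self.grid = list(grid)
        self.c = bandwidth_constant
        self.n = 0
        self.density = [0.0] * len(self.grid)

    def update(self, x_new):
        """Fold one new observation into the estimate in O(len(grid)) time."""
        self.n += 1
        h = self.c * self.n ** (-0.2)
        weight = 1.0 / self.n
        norm = h * math.sqrt(2.0 * math.pi)
        for i, x in enumerate(self.grid):
            u = (x - x_new) / h
            kernel = math.exp(-0.5 * u * u) / norm
            self.density[i] = (1.0 - weight) * self.density[i] + weight * kernel


if __name__ == "__main__":
    grid = [i * 0.1 for i in range(-50, 51)]
    kde = RecursiveKDE(grid)
    for value in [0.3, -0.1, 0.8, 0.2]:  # hypothetical stream of observations
        kde.update(value)
    print(max(kde.density))
```

Each update touches only the stored grid values, so memory and per-observation cost stay constant as the stream grows.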

#### Stress testing mixed integer programming solvers through new test instance generation methods

(2019)

Optimisation algorithms require careful tuning and analysis to perform well in practice. Their performance is strongly affected by algorithm parameter choices, software, and hardware and must be analysed empirically. To conduct such analysis, researchers and developers require high-quality libraries of test instances. Improving the diversity of these test sets is essential to driving the development of well-tested algorithms.
This thesis is focused on producing synthetic test sets for Mixed Integer Programming (MIP) solvers. Synthetic data should be carefully designed to be unbiased, diverse with respect to measurable features of instances, have tunable properties to replicate real-world problems, and challenge the vast array of algorithms available. This thesis outlines a framework, methods and algorithms developed to ensure these requirements can be met with synthetically generated data for a given problem.
Over many years of development, MIP solvers have become increasingly complex. Their overall performance depends on the interactions of many different components. To cope with this complexity, we propose several extensions over existing approaches to generating optimisation test cases. First, we develop alternative encodings for problem instances which restrict consideration to relevant instances. This approach provides more control over instance features and reduces the computational effort required when we have to resort to search-based generation approaches. Second, we consider more detailed performance metrics for MIP solvers in order to produce test cases which are not only challenging but from which useful insights can be gained.
This work makes several key contributions:
1. Performance metrics are identified which are relevant to component algorithms in MIP solvers. This helps to define a more comprehensive performance metric space which looks beyond benchmarking statistics such as CPU time required to solve a problem. Using these more detailed performance metrics we aim to produce explainable and insightful predictions of algorithm performance in terms of instance features.
2. A framework is developed for encoding problem instances to support the design of new instance generators. The concepts of completeness and correctness defined in this framework guide the design process and ensure all problem instances of potential interest are captured in the scheme. Instance encodings can be generalised to develop search algorithms in problem space with the same guarantees as the generator.
3. Using this framework new generators are defined for LP and MIP instances which control feasibility and boundedness of the LP relaxation, and integer feasibility of the resulting MIP. Key features of the LP relaxation solution, which are directly controlled by the generator, are shown to affect problem difficulty in our analysis of the results. The encodings used to control these properties are extended into problem space search operators to generate further instances which discriminate between solver configurations.
This work represents the early stages of an iterative methodology required to generate diverse test sets which continue to challenge the state of the art. The framework, algorithms and codes developed in this thesis are intended to support continuing development in this area.

#### Intelligent Management of Elective Surgery Patient Flow

(2019)

Rapidly growing demand and soaring costs for healthcare services in Australia and across the world are jeopardising the sustainability of government-funded healthcare systems. We need to be innovative and more efficient in delivering healthcare services in order to keep the system sustainable. In this thesis, we utilise a number of scientific tools to improve the patient flow in a surgical suite of a hospital and subsequently develop a structured approach for intelligent patient flow management. First, we analyse and understand the patient flow process in a surgical suite. Then we obtain data from the partner hospital and extract valuable information from a large database. Next, we use machine learning techniques, such as classification and regression tree analysis, random forest, and k-nearest neighbour regression, to classify patients into lower variability resource user groups and fit discrete phase-type distributions to the clustered length of stay data.
We use length of stay scenarios sampled from the fitted distributions in our sequential stochastic mixed-integer programming model for tactical master surgery scheduling. Our mixed-integer programming model has the particularity that the scenarios are utilised in a chronologically sequential manner, not in parallel. Moreover, we exploit the randomness in the sample path to reduce the requirement of optimising the process for many scenarios which helps us obtain high-quality schedules while keeping the problem algorithmically tractable. Last, we model the patient flow process in a healthcare facility as a stochastic process and develop a model to predict the probability of the healthcare facility exceeding capacity the next day as a function of the number of inpatients and the next day scheduled patients, their resource user groups, and their elapsed length of stay. We evaluate the model's performance using the receiver operating characteristic curve and illustrate the computation of the optimal threshold probability by using cost-benefit analysis that helps the hospital management make decisions.

#### Copula-based spatio-temporal modelling for count data

(2019)

Modelling of spatio-temporal count data has received considerable attention in recent statistical research. However, the presence of massive correlation between locations, time points and variables imposes a great computational challenge. In the existing literature, latent models under the Bayesian framework are predominantly used. Despite numerous theoretical and practical advantages, likelihood analysis of spatio-temporal models for count data is less widespread, due to the difficulty in identifying a general class of multivariate distributions for discrete responses.
In this thesis, we propose a Gaussian copula regression model (copSTM) for the analysis of multivariate spatio-temporal data on a lattice. Temporal effects are modelled through the conditional marginal expectations of the response variables using an observation-driven time series model, while spatial and cross-variable correlations are captured in a block dependence structure, allowing for both positive and negative correlations. The proposed copSTM model is flexible and sufficiently generalizable to many situations. We provide pairwise composite likelihood inference tools. Numerical examples suggest that the proposed composite likelihood estimator produces satisfactory estimation performance.
While variable selection for generalized linear models is a well-developed topic, model subsetting in applications of Gaussian copula models remains a relatively open research area. The main reason is the computational burden, which is already quite heavy for simply fitting the model. It is therefore not computationally affordable to evaluate many candidate sub-models. This makes penalized likelihood approaches extremely inefficient because they need to search through different levels of penalty strength. Moreover, our numerical experience suggests that optimization of penalized composite likelihoods with many popular penalty terms (e.g. LASSO and SCAD) usually does not converge in copula models. Thus, we propose to use a criterion-based selection approach that borrows strength from the Gibbs sampling technique. The methodology is guaranteed to converge to the model with the lowest criterion value, yet without searching through all possible models exhaustively.
Finally, we present an R package implementing the estimation and selection of the copSTM model in C++. We show examples comparing our package to many available R packages (on some special cases of the copSTM), confirming the correctness and efficiency of the package functions. The package copSTM provides a competitive toolkit option for the analysis of spatio-temporal count data on a lattice in terms of both model flexibility and computational efficiency.

#### Singular vectors for the W_N algebras and the BRST cohomology for relaxed highest-weight L_k(sl(2)) modules

(2019)

This thesis presents the computation of singular vectors of the W_n algebras and the BRST cohomology of modules of the simple vertex operator algebra L_k(sl2) associated to the affine Lie algebra of sl2 in the relaxed category.
We will first recall some general theory on vertex operator algebras. We will then introduce the module categories that are relevant for conformal field theory. They are the category O of highest-weight modules and the relaxed category which contains O as well as the relaxed highest-weight modules with spectral flow and non-split extensions. We will then introduce the W_n algebras and the simple vertex operator algebra L_k(sl2). Properties of the Heisenberg algebra, the bosonic and the fermionic ghosts will be discussed as they are required in the free field realisations of W_n and L_k(sl2) as well as the construction of the BRST complex.
We will then compute explicitly the singular vectors of W_n algebras in their Fock representations. In particular, singular vectors can be realised as the image of screening operators of the W_n algebras. One can then realise screening operators in terms of Jack functions when acting on a highest-weight state, thereby obtaining explicit formulae of the singular vectors in terms of symmetric functions.
We will then discuss the BRST construction and the BRST cohomology for modules in category O. Lastly we compute the BRST cohomology for L_k(sl2) modules in the relaxed category. In particular, we compute the BRST cohomology for the highest-weight modules with positive spectral flow for all degrees and the BRST cohomology for the highest-weight modules with negative spectral flow for one degree.

#### Missing data analysis, combinatorial model selection and structure learning

(2019)

This thesis examines three problems in statistics: the missing data problem in the context of extracting trends from time series data, the combinatorial model selection problem in regression analysis, and the structure learning problem in graphical modelling / system identification.
The goal of the first problem is to study how uncertainty in the missing data affects trend extraction. This work derives an analytical bound to characterise the error of the estimated trend in terms of the error of the imputation. It works for any imputation method and various trend-extraction methods, including a large subclass of linear filters and the Seasonal-Trend decomposition based on Loess (STL).
The second problem is to tackle the combinatorial complexity which arises from best-subset selection in regression analysis. Given p variables, a model can be formed by taking a subset of the variables, and the total number of models is $2^p$. This work shows that if a hierarchical structure can be established on the model space, then the proposed algorithm, Gibbs Stochastic Search (GSS), can recover the true model with probability one in the limit and with high probability for finite samples. The core idea is that when a hierarchical structure exists, every evaluation of a wrong model gives information about the correct model. By aggregating this information, one may recover the correct model without exhausting the model space. As an extension, parallelisation of the algorithm is also considered.
The third problem is about inferring from data the systemic relationship between a set of variables. This work proposes a flexible class of multivariate distributions in the form of a directed acyclic graphical model, which uses a graph and models each node conditioning on the rest using a Generalised Linear Model (GLM), and it shows that while the number of possible graphs is $\Omega(2^{p \choose 2})$, a hierarchical structure exists and the GSS algorithm applies. Hence, a systemic relationship may be recovered from the data. Other applications like imputing missing data and simulating data with complex covariance structure are also investigated.
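
To give a sense of the combinatorial growth quoted in this abstract, the sketch below enumerates every subset of a small set of variables and evaluates the two counts. It only illustrates why exhaustive search becomes infeasible and does not implement GSS. The function names are assumptions, and the graph count is the lower bound obtained by fixing one ordering of the nodes and letting each of the p-choose-2 forward edges be present or absent.

```python
from itertools import combinations
from math import comb


def all_subsets(variables):
    """Yield every subset of the variables, i.e. every candidate regression model."""
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            yield subset


def count_models(p):
    """Number of subsets of p variables."""
    return 2 ** p


def dag_count_lower_bound(p):
    """Lower bound on the number of labelled DAGs on p nodes.

    Fixing one node ordering, each of the C(p, 2) forward edges can be
    included or excluded independently, and every choice is acyclic.
    """
    return 2 ** comb(p, 2)


if __name__ == "__main__":
    print(list(all_subsets(["x1", "x2", "x3"])))
    for p in (5, 10, 20, 30):
        print(p, count_models(p), dag_count_lower_bound(p))
```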

#### Exact solutions in multi-species exclusion processes

(2019)

The exclusion process has been the default model for transport phenomena. One fundamental issue is to derive exact formulae analytically. Such formulae enable us to obtain the limiting distribution through asymptotic analysis, and they also allow us to uncover relationships between different processes, and even between very different systems. Extensive results have been reported for single-species systems, but few for multi-component systems and mixtures. In this thesis, we focus on multi-species exclusion processes, and propose two approaches for exact solutions.
The first is based on duality, which is defined by a function that co-varies in time with respect to the evolution of two processes. It relates physical quantities, such as the particle flow, in a system with many particles to one with few particles, so that the quantity of interest in the first process can be calculated explicitly via the second one. Historically, published dualities have mostly been found by trial and error. Only very recently have attempts been made to derive these functions algebraically. We propose a new method to derive dualities systematically, by exploiting the mathematical structure provided by the deformed quantum Knizhnik-Zamolodchikov equation. With this method, we not only recover the well-known self-duality in single-species asymmetric simple exclusion processes (ASEPs), but also obtain the duality for two-species ASEPs.
Solving the master equation is an alternative method. We consider an exclusion process with two species of particles, the AHR (Arndt-Heinzl-Rittenberg) model, and give a full derivation of its Green's function via the coordinate Bethe ansatz. Using the Green's function, we obtain an integral formula for its joint current distributions, and then study its limiting distribution with step-type initial conditions. We show that the long-time behaviour is governed by a product of the Gaussian and the Gaussian unitary ensemble (GUE) Tracy-Widom distributions, which is related to random matrix theory. This result agrees with the prediction made by the nonlinear fluctuating hydrodynamic theory (NLFHD). This is the first analytic verification of the prediction of NLFHD in a multi-species system.