Computing and Information Systems - Theses

Now showing 1 - 2 of 2
  • Item
    Automated analysis of time lapse microscopy images
    Kan, Andrey (2012)
    Cells are the building blocks of life, and time-lapse microscopy is a powerful way to study cells. Automated video acquisition and analysis of cells opens unprecedented opportunities, ranging from building novel mathematical models supported by rich data to automated drug screening. Unfortunately, accurate and completely automated analysis of cell images is a difficult task, so human intervention is often required, for example to tune segmentation and tracking algorithms or to correct the results of automated analysis. In this thesis, we aim to reduce the amount of manual work required while preserving the accuracy of analysis.

    Two key tasks in automated analysis are cell segmentation and tracking. Segmentation is the process of locating cell outlines in cell images, while tracking refers to establishing cell identities across subsequent video frames. One of the main challenges of automated analysis is the substantial variability in cell appearance and dynamics across different videos, and even within a single video. For example, there can be a few rapidly moving cells at the beginning of a video and a large number of cells stuck in a clump by the end. Such variation has resulted in a large variety of cell segmentation and tracking algorithms and a large body of work on automated analysis. However, many methods make specific assumptions about cell morphology or dynamics, or involve a number of parameters that a user needs to set manually, which hampers their applicability across different videos. We first develop portable cell semi-segmentation and segmentation algorithms, where portability is achieved by using a flexible cell descriptor function. We then develop a novel cell tracking algorithm that has only one parameter and hence can be easily adapted to different videos. Furthermore, we present a parameter-free variation of the algorithm. Our evaluation on real cell videos demonstrates that our algorithms achieve accurate results and outperform existing methods.

    Even the most sophisticated cell tracking algorithms make errors, so a user may be required to manually review the tracking results and correct them. To this end, we propose a semi-automated tracking framework that identifies video frames that are likely to contain errors; the user then needs to inspect only these frames rather than every frame of the video. We find that our framework can significantly reduce the amount of manual work required to review and correct tracking results.

    Furthermore, in different videos, the most accurate results can be obtained by different methods and different parameter settings, and it is often not clear which method should be chosen for a particular video. We address this problem with a novel method for ranking cell tracking systems without manual validation. Our method ranks cell trackers according to their fitness for a particular video, without the need for manual collection of ground truth tracks. We simulate practical tracking scenarios and confirm the feasibility of our method.

    Finally, as an example of a biological assay, we consider evaluating the locomotion of Plasmodium parasites (which cause malaria), with application to automated anti-malaria drug screening. We track live parasites in a Matrigel medium and develop a numerical description of parasite tracks. Our experiments show that this description captures changes in locomotion in response to treatment with the toxin cytochalasin D. Therefore, our description can form a basis for automated drug screening, where various treatments are applied to different cell populations by a robot and the resulting tracks are evaluated quantitatively.

    In summary, this thesis makes the six major contributions highlighted above. These contributions can reduce the amount of manual work in cell image analysis while achieving highly accurate results.
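A minimal sketch may help make the idea of a one-parameter tracker concrete. The code below is a generic greedy frame-to-frame linker, not the algorithm developed in this thesis: cell detections are assumed to be (x, y) centroids, and the single parameter is a maximum displacement threshold. All names and values are illustrative.

```python
# Generic sketch of a one-parameter frame-to-frame cell linker.
# NOT the thesis algorithm; it only illustrates a tracker whose
# behaviour is controlled by a single tunable value.
import math

def link_frames(prev_cells, curr_cells, max_displacement):
    """Greedily match current-frame detections to previous-frame detections.

    prev_cells, curr_cells: lists of (x, y) centroids.
    max_displacement: the single parameter -- the largest distance (pixels)
        a cell is allowed to move between consecutive frames.
    Returns a dict mapping current-frame index -> previous-frame index,
    or None for detections treated as newly appeared cells.
    """
    # Collect all candidate pairs within the displacement threshold.
    candidates = []
    for i, (px, py) in enumerate(prev_cells):
        for j, (cx, cy) in enumerate(curr_cells):
            d = math.hypot(cx - px, cy - py)
            if d <= max_displacement:
                candidates.append((d, i, j))

    # Accept the closest pairs first; each detection is used at most once.
    candidates.sort()
    used_prev, used_curr = set(), set()
    links = {j: None for j in range(len(curr_cells))}
    for d, i, j in candidates:
        if i not in used_prev and j not in used_curr:
            links[j] = i
            used_prev.add(i)
            used_curr.add(j)
    return links

# Toy example: two frames with three detections each.
frame0 = [(10.0, 12.0), (40.0, 41.0), (80.0, 15.0)]
frame1 = [(12.0, 13.0), (43.0, 40.0), (120.0, 90.0)]  # third detection is new
print(link_frames(frame0, frame1, max_displacement=10.0))
# -> {0: 0, 1: 1, 2: None}
```

In this generic setting the threshold is the only value a user would tune per video, which is the property the abstract highlights as making a tracker easy to adapt; the parameter-free variant described above is not reproduced here.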
  • Item
    Scalable approaches for analysing high density single nucleotide polymorphism array data
    Wong, Gerard Kum Peng (2012)
    Prior to making inferences from the raw data produced by high-density SNP microarrays, several challenges need to be addressed. First, it is important to limit the impact of noise on microarray measurements while maintaining data integrity; an unexplored aspect of noise is the extent of probeset sequence identity in SNP microarrays. Second, microarray-based datasets often have at least two orders of magnitude more probesets than the number of samples they describe, which poses a challenge for traditional statistical tests used in this context. Third, the number of features in each dataset is large even when sample sizes are small, so computationally efficient approaches are required to analyse these datasets. Finally, with the improving resolution of SNP arrays, there is a need to exploit this improvement to identify finer-scaled mutations in the human genome. Most existing approaches deal with these challenges at the individual sample level and do not look for consensus change across the population to identify sites of DNA mutation. Other approaches artificially smooth or segment the data to obtain uniform segments of copy number change, losing possible fine-scaled copy number changes in the process. Others are based on computationally expensive approaches that do not scale well with array resolution and sample size.

    Our first contribution is a comprehensive survey of the sequence identity of all probesets for all variants of the Affymetrix Genome-Wide Human SNP array. This survey assesses the target uniqueness of every probeset and provides a basis for the development of a set of gold standard labels of copy number change between genders. The derived sets of gold standard labels are a benchmark for assessing the performance of various algorithms in detecting recurrent copy number change. This benchmark is utilised in the evaluation of our second and third contributions.

    Our second contribution is a statistical approach called Detecting Recurrent Copy Number Changes Using Rank Order Statistics (DRECS), which is designed to identify regions of consensus copy number change across multiple samples in SNP array datasets. Through the use of rank-based statistics, DRECS draws on the statistical power of multiple samples to identify fine-scaled copy number changes, down to the width of a single probe, in a computationally efficient way.

    Our third contribution is the Sum of Ranks Exact Test (SoRET), a non-parametric extension of DRECS. SoRET addresses SNP datasets with small sample sizes and makes no assumptions about the distribution from which the data were sampled. Its performance in terms of Type I and Type II errors is superior to competing parametric and non-parametric statistical tests for small sample sizes.

    Our fourth contribution is a feature set reduction approach called FSR. FSR enables existing filter-based feature selection approaches to handle high-dimensional microarray-type datasets by pruning irrelevant and redundant features. A novel scoring measure is developed to assess the strength of each feature in terms of sample class discrimination, and FSR uses measures of entropy to efficiently gauge the contribution of higher-order feature patterns while avoiding a combinatorial explosion in assessing the utility of features. On the datasets we tested, classifiers trained on features selected from FSR-reduced feature sets showed notably better predictive accuracy than classifiers trained on features selected from complete feature sets.
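To make the rank-based intuition behind DRECS more concrete, here is a small, hypothetical sketch (not the published DRECS or SoRET procedures): probes are ranked within each sample, per-probe rank sums are accumulated across samples, and probes with extreme rank sums are flagged as candidate recurrent changes using a normal approximation. The data layout, the z-score threshold and the function name are assumptions for illustration only.

```python
# Hedged sketch of a rank-sum screen for recurrent copy number change
# across samples. Illustrative only; not the thesis algorithms.
import numpy as np
from scipy import stats

def recurrent_change_zscores(signal):
    """signal: 2-D array, shape (n_samples, n_probes), of copy number signal.

    Returns a z-score per probe; a large |z| indicates a probe whose signal
    is consistently high or low across samples (candidate recurrent change).
    """
    n_samples, n_probes = signal.shape
    # Rank probes within each sample (1 = lowest signal); ties get mean ranks.
    ranks = np.apply_along_axis(stats.rankdata, 1, signal)
    rank_sums = ranks.sum(axis=0)

    # Under the null of no consensus change, each probe's rank is uniform on
    # 1..n_probes, so the rank sum over independent samples has
    # mean n_samples*(n_probes+1)/2 and variance n_samples*(n_probes**2-1)/12.
    mean = n_samples * (n_probes + 1) / 2.0
    var = n_samples * (n_probes ** 2 - 1) / 12.0
    return (rank_sums - mean) / np.sqrt(var)

# Toy example: 6 samples, 50 probes, with a shared loss at probes 20-24.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.2, size=(6, 50))   # diploid-like baseline signal
x[:, 20:25] -= 0.8                        # recurrent copy number loss
z = recurrent_change_zscores(x)
print(np.where(z < -3)[0])                # expected to recover probes 20..24
```

Because only within-sample ranks are used, the screen makes no assumption about the distribution of the raw signal, mirroring the non-parametric motivation described above; the normal approximation on the rank sums is, however, an asymptotic shortcut that an exact test such as SoRET avoids.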