Minerva Elements Records

Search Results

Now showing 1 - 10 of 87
  • Item
    ROAST: rotation gene set tests for complex microarray experiments
    Wu, D ; Lim, E ; Vaillant, F ; Asselin-Labat, M-L ; Visvader, JE ; Smyth, GK (OXFORD UNIV PRESS, 2010-09-01)
    MOTIVATION: A gene set test is a differential expression analysis in which a P-value is assigned to a set of genes as a unit. Gene set tests are valuable for increasing statistical power, organizing and interpreting results and for relating expression patterns across different experiments. Existing methods are based on permutation. Methods that rely on permutation of probes unrealistically assume independence of genes, while those that rely on permutation of samples are suitable only for two-group comparisons with a good number of replicates in each group. RESULTS: We present ROAST, a statistically rigorous gene set test that allows for gene-wise correlation while being applicable to almost any experimental design. Instead of permutation, ROAST uses rotation, a Monte Carlo technique for multivariate regression. Since the number of rotations does not depend on sample size, ROAST gives useful results even for experiments with minimal replication. ROAST allows for any experimental design that can be expressed as a linear model, and can also incorporate array weights and correlated samples. ROAST can be tuned for situations in which only a subset of the genes in the set are actively involved in the molecular pathway. ROAST can test for uni- or bi-directional regulation. Probes can also be weighted to allow for prior importance. The power and size of the ROAST procedure are demonstrated in a simulation study, and compared to those of a representative permutation method. Finally, ROAST is used to test the degree of transcriptional conservation between human and mouse mammary stems. AVAILABILITY: ROAST is implemented as a function in the Bioconductor package limma available from www.bioconductor.org.
  • Item
    A fast hybrid short read fragment assembly algorithm
    Schmidt, B ; Sinha, R ; Beresford-Smith, B ; Puglisi, SJ (OXFORD UNIV PRESS, 2009-09-01)
    The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in reasonable time. Existing short-read assembly tools can be classified into two categories: greedy extension-based and graph-based. While the graph-based approaches are generally superior in terms of assembly quality, the computer resources required for building and storing a huge graph are very high. In this article, we present Taipan, an assembly algorithm which can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction but at each step realizes enough of the corresponding read graph to make better decisions as to how assembly should continue. We show that this approach can achieve an assembly quality at least as good as the graph-based approaches used in the popular Edena and Velvet assembly tools using a moderate amount of computing resources.
  • Item
    is-rSNP: a novel technique for in silico regulatory SNP detection
    Macintyre, G ; Bailey, J ; Haviv, I ; Kowalczyk, A (OXFORD UNIV PRESS, 2010-09)
    MOTIVATION: Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions in this manner are noisy and no method exists that determines the statistical significance of a nucleotide variation on a PWM score. RESULTS: We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS. AVAILABILITY: is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP.
  • Item
    MIRAGAA-a methodology for finding coordinated effects of microRNA expression changes and genome aberrations in cancer
    Gaire, RK ; Bailey, J ; Bearfoot, J ; Campbell, IG ; Stuckey, PJ ; Haviv, I (OXFORD UNIV PRESS, 2010-01-15)
    MOTIVATION: Cancer evolves through microevolution where random lesions that provide the biggest advantage to cancer stand out in their frequent occurrence in multiple samples. At the same time, a gene function can be changed by aberration of the corresponding gene or modification of microRNA (miRNA) expression, which attenuates the gene. In a large number of cancer samples, these two mechanisms might be distributed in a coordinated and almost mutually exclusive manner. Understanding this coordination may assist in identifying changes which significantly produce the same functional impact on cancer phenotype, and further identify genes that are universally required for cancer. Present methodologies for finding aberrations usually analyze single datasets, which cannot identify such pairs of coordinating genes and miRNAs. RESULTS: We have developed MIRAGAA, a statistical approach, to assess the coordinated changes of genome copy numbers and miRNA expression. We have evaluated MIRAGAA on The Cancer Genome Atlas (TCGA) Glioblastoma Multiforme datasets. In these datasets, a number of genome regions coordinating with different miRNAs are identified. Although well known for their biological significance, these genes and miRNAs would go undetected as less significant if the two datasets were analyzed individually. AVAILABILITY AND IMPLEMENTATION: The source code, implemented in R and Java, is available from our project web site at http://www.csse.unimelb.edu.au/~rgaire/MIRAGAA/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • Item
    Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages
    Li, J ; Halgamuge, SK ; Kells, CI ; Tang, SL (BMC, 2007)
    BACKGROUND: Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function. RESULTS: We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/. CONCLUSION: The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
  • Item
    Comparative analysis of long DNA sequences by per element information content using different contexts
    Dix, TI ; Powell, DR ; Allison, L ; Bernal, J ; Jaeger, S ; Stern, L (BMC, 2007)
    BACKGROUND: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models. RESULTS: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2. CONCLUSION: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.
  • Item
    Allocation strategies for utilization of space-shared resources in Bag of Tasks grids
    De Rose, CAF ; Ferreto, T ; Calheiros, RN ; Cirne, W ; Costa, LB ; Fireman, D (ELSEVIER, 2008-05)
  • Item
    Shuffle-Sum: Coercion-Resistant Verifiable Tallying for STV Voting
    Benaloh, J ; Moran, T ; Naish, L ; Ramchen, K ; Teague, V (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2009-12)
  • Item
    Speculation and e-commerce: The long and the short of IT
    Ferguson, C ; Finn, F ; Hall, J ; Pinnuck, M (Elsevier BV, 2010-06-01)