Medicine, Dentistry & Health Sciences Collected Works - Research Publications

  • Item
    Genomics - from Neanderthals to high-throughput sequencing
    Wakefield, MJ (BIOMED CENTRAL LTD, 2006)
    A report on 'The Biology of Genomes' meeting, Cold Spring Harbor, USA, 10-14 May 2006.
  • Item
    Vestige: Maximum likelihood phylogenetic footprinting
    Wakefield, MJ ; Maxwell, P ; Huttley, GA (BIOMED CENTRAL LTD, 2005-05-29)
    BACKGROUND: Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. This is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit that uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring the divergence, Vestige allows the expansion of the definition of a phylogenetic footprint to include variation in the distribution of any molecular evolutionary processes. This is achieved by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examination of the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process. RESULTS: Vestige was applied to a reference dataset of the SCL locus from four species and provided clear identification of the known conserved regions in this dataset. To demonstrate the flexibility to use diverse models of molecular evolution and dissect the nature of the evolutionary process, Vestige was used to footprint the Ka/Ks ratio in primate BRCA1 with a codon model of evolution. Two regions of putative adaptive evolution were identified, illustrating the ability of Vestige to represent the spatial distribution of distinct molecular evolutionary processes. CONCLUSION: Vestige provides a flexible, open platform for phylogenetic footprinting. Underpinned by the PyEvolve toolkit, Vestige provides a framework for visualising the signatures of evolutionary processes across the genome of numerous organisms simultaneously. By exploiting the maximum-likelihood statistical framework, the complex interplay between mutational processes, DNA repair and selection can be evaluated both spatially (along a sequence alignment) and temporally (for each branch of the tree), providing visual indicators of the attributes and functions of DNA sequences.
  • Item
    PyCogent: a toolkit for making sense from sequence
    Knight, R ; Maxwell, P ; Birmingham, A ; Carnes, J ; Caporaso, JG ; Easton, BC ; Eaton, M ; Hamady, M ; Lindsay, H ; Liu, Z ; Lozupone, C ; McDonald, D ; Robeson, M ; Sammut, R ; Smit, S ; Wakefield, MJ ; Widmann, J ; Wikman, S ; Wilson, S ; Ying, H ; Huttley, GA (BIOMED CENTRAL LTD, 2007)
    We have implemented in Python the COmparative GENomic Toolkit, a fully integrated and thoroughly tested framework for novel probabilistic analyses of biological sequences, devising workflows, and generating publication-quality graphics. PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for third-party applications. The toolkit takes advantage of parallel architectures, runs on a range of hardware and operating systems, and is available under the General Public License from http://sourceforge.net/projects/pycogent.
  • Item
    Reconstructing an ancestral mammalian immune supercomplex from a marsupial major histocompatibility complex
    Belov, K ; Deakin, JE ; Papenfuss, AT ; Baker, ML ; Melman, SD ; Siddle, HV ; Gouin, N ; Goode, DL ; Sargeant, TJ ; Robinson, MD ; Wakefield, MJ ; Mahony, S ; Cross, JGR ; Benos, PV ; Samollow, PB ; Speed, TP ; Graves, JAM ; Miller, RD ; Ploegh, HL (PUBLIC LIBRARY SCIENCE, 2006-03)
    The first sequenced marsupial genome promises to reveal unparalleled insights into mammalian evolution. We have used the Monodelphis domestica (gray short-tailed opossum) sequence to construct the first map of a marsupial major histocompatibility complex (MHC). The MHC is the most gene-dense region of the mammalian genome and is critical to immunity and reproductive success. The marsupial MHC bridges the phylogenetic gap between the complex MHC of eutherian mammals and the minimal essential MHC of birds. Here we show that the opossum MHC is gene dense and complex, as in humans, but shares more organizational features with non-mammals. The Class I genes have amplified within the Class II region, resulting in a unique Class I/II region. We present a model of the organization of the MHC in ancestral mammals and its elaboration during mammalian evolution. The opossum genome, together with other extant genomes, reveals the existence of an ancestral "immune supercomplex" that contained genes of both types of natural killer receptors together with antigen processing genes and MHC genes.
  • Item
    Marsupials and monotremes sort genome treasures from junk
    Wakefield, MJ ; Graves, JAM (BMC, 2005)
    A recent landmark paper demonstrates the unique contribution of marsupials and monotremes to comparative genome analysis, filling an evolutionary gap between the eutherian mammals (including humans) and more distant vertebrate species.
  • Item
    Transcript length bias in RNA-seq data confounds systems biology
    Oshlack, A ; Wakefield, MJ (BMC, 2009-04-16)
    BACKGROUND: Several recent studies have demonstrated the effectiveness of deep sequencing for transcriptome analysis (RNA-seq) in mammals. As RNA-seq becomes more affordable, whole genome transcriptional profiling is likely to become the platform of choice for species with good genomic sequences. As yet, a rigorous analysis methodology has not been developed and we are still in the stages of exploring the features of the data. RESULTS: We investigated the effect of transcript length bias in RNA-seq data using three different published data sets. For standard analyses using aggregated tag counts for each gene, the ability to call differentially expressed genes between samples is strongly associated with the length of the transcript. CONCLUSION: Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology. This has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses. REVIEWERS: This article was reviewed by Rohan Williams (nominated by Gavin Huttley), Nicole Cloonan (nominated by Mark Ragan) and James Bullard (nominated by Sandrine Dudoit).
  • Item
    PyEvolve: a toolkit for statistical modelling of molecular evolution
    Butterfield, A ; Vedagiri, V ; Lang, E ; Lawrence, C ; Wakefield, MJ ; Isaev, A ; Huttley, GA (BIOMED CENTRAL LTD, 2004-01-05)
    BACKGROUND: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. RESULTS: Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10-species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. CONCLUSION: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter-rich likelihood functions solvable within hours on multi-CPU hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.