Mechanical Engineering - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/359

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

Saeed, I ; Tang, S-L ; Halgamuge, SK (OXFORD UNIV PRESS, 2012-03)

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis.
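As an illustration of the base-composition features named in this abstract, the following minimal sketch computes GC content and a normalized tetranucleotide frequency vector for a single sequence. It is not the authors' binning method: the function names `gc_content` and `tetranucleotide_frequencies` are hypothetical, and the handling of ambiguous bases (windows containing anything other than A, C, G, or T are skipped) is an assumption made for this example.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers in seq.

    Overlapping windows are counted; windows containing characters
    other than A, C, G, T are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}
```

Feature vectors of this kind could then be passed to any clustering routine; the two-tier scheme and the error-gradient transformation described in the abstract are not reproduced here.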
Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

Li, J ; Doyle, MA ; Saeed, I ; Wong, SQ ; Mar, V ; Goode, DL ; Caramia, F ; Doig, K ; Ryland, GL ; Thompson, ER ; Hunter, SM ; Halgamuge, SK ; Ellul, J ; Dobrovic, A ; Campbell, IG ; Papenfuss, AT ; McArthur, GA ; Tothill, RW ; Calogero, RA (PUBLIC LIBRARY SCIENCE, 2014-04-21)

Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/.
Prokaryotic assemblages and metagenomes in pelagic zones of the South China Sea

Tseng, C-H ; Chiang, P-W ; Lai, H-C ; Shiah, F-K ; Hsu, T-C ; Chen, Y-L ; Wen, L-S ; Tseng, C-M ; Shieh, W-Y ; Saeed, I ; Halgamuge, S ; Tang, S-L (BIOMED CENTRAL LTD, 2015-03-20)

BACKGROUND: Prokaryotic microbes, the most abundant organisms in the ocean, are remarkably diverse. Despite numerous studies of marine prokaryotes, the zonation of their communities in pelagic zones has been poorly delineated. By exploiting the persistent stratification of the South China Sea (SCS), we performed a 2-year, large spatial scale (10, 100, 1000, and 3000 m) survey, which included a pilot study in 2006 and comprehensive sampling in 2007, to investigate the biological zonation of bacteria and archaea using 16S rRNA tag and shotgun metagenome sequencing. RESULTS: Alphaproteobacteria dominated the bacterial community in the surface SCS, where the abundance of Betaproteobacteria was seemingly associated with climatic activity. Gammaproteobacteria thrived in the deep SCS, where a noticeable amount of Cyanobacteria were also detected. Marine Groups II and III Euryarchaeota were predominant in the archaeal communities in the surface and deep SCS, respectively. Bacterial diversity was higher than archaeal diversity at all sampling depths in the SCS, and peaked at mid-depths, agreeing with the diversity pattern found in global water columns. Metagenomic analysis not only showed differential %GC values and genome sizes between the surface and deep SCS, but also demonstrated depth-dependent metabolic potentials, such as cobalamin biosynthesis at 10 m, osmoregulation at 100 m, signal transduction at 1000 m, and plasmid and phage replication at 3000 m. When compared with other oceans, urease at 10 m and both exonuclease and permease at 3000 m were more abundant in the SCS. Finally, enriched genes associated with nutrient assimilation in the sea surface and transposase in the deep-sea metagenomes exemplified the functional zonation in global oceans. CONCLUSIONS: Prokaryotic communities in the SCS stratified with depth, with maximal bacterial diversity at mid-depth, in accordance with global water columns. 
The SCS had functional zonation among depths and endemically enriched metabolic potentials at the study site, in contrast to other oceans.
Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

Jayasundara, D ; Saeed, I ; Chang, BC ; Tang, S-L ; Halgamuge, SK (BMC, 2015-12-09)

BACKGROUND: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. RESULTS: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. CONCLUSIONS: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. AVAILABILITY: http://sourceforge.net/projects/viquas/.
Assessing Species Diversity Using Metavirome Data: Methods and Challenges

Herath, D ; Jayasundara, D ; Ackland, D ; Saeed, I ; Tang, S-L ; Halgamuge, S (ELSEVIER SCIENCE BV, 2017)

Assessing biodiversity is an important step in the study of microbial ecology associated with a given environment. Multiple indices have been used to quantify species diversity, which is a key biodiversity measure. Measuring species diversity of viruses in different environments remains a challenge relative to measuring the diversity of other microbial communities. Metagenomics has played an important role in elucidating viral diversity by conducting metavirome studies; however, metavirome data are of high complexity requiring robust data preprocessing and analysis methods. In this review, existing bioinformatics methods for measuring species diversity using metavirome data are categorised broadly as either sequence similarity-dependent methods or sequence similarity-independent methods. The former includes a comparison of DNA fragments or assemblies generated in the experiment against reference databases for quantifying species diversity, whereas estimates from the latter are independent of the knowledge of existing sequence data. Current methods and tools are discussed in detail, including their applications and limitations. Drawbacks of the state-of-the-art method are demonstrated through results from a simulation. In addition, alternative approaches are proposed to overcome the challenges in estimating species diversity measures using metavirome data.
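The abstract above refers to indices used to quantify species diversity without naming them. As a hedged illustration, the sketch below computes two widely used quantities, species richness and the Shannon index, from a list of per-species abundance counts. The choice of these two indices and the function names are assumptions of this example; the review itself may cover other measures.

```python
import math


def richness(abundances: list[float]) -> int:
    """Number of species with a non-zero abundance."""
    return sum(1 for n in abundances if n > 0)


def shannon_index(abundances: list[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)
```

For a perfectly even community of S species the Shannon index equals ln(S), so four equally abundant species give ln(4).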
ENVirT: inference of ecological characteristics of viruses from metagenomic data

Jayasundara, D ; Herath, D ; Senanayake, D ; Saeed, I ; Yang, C-Y ; Sun, Y ; Chang, BC ; Tang, S-L ; Halgamuge, SK (BioMed Central, 2019-02-04)

Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by existing methods named PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. Conclusions: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations.
Exploratory analysis of high-throughput metabolomic data

Wijetunge, CD ; Li, Z ; Saeed, I ; Bowne, J ; Hsu, AL ; Roessner, U ; Bacic, A ; Halgamuge, SK (SPRINGER, 2013-12)
Comprehensive Insights Into Composition, Metabolic Potentials, and Interactions Among Archaeal, Bacterial, and Viral Assemblages in Meromictic Lake Shunet in Siberia

Wu, Y-T ; Yang, C-Y ; Chiang, P-W ; Tseng, C-H ; Chiu, H-H ; Saeed, I ; Baatar, B ; Rogozin, D ; Halgamuge, S ; Degermendzhi, A ; Tang, S-L (FRONTIERS MEDIA SA, 2018-08-20)

Microorganisms are critical to maintaining stratified biogeochemical characteristics in meromictic lakes; however, their community composition and potential roles in nutrient cycling are not thoroughly described. Both metagenomics and metaviromics were used to determine the composition and capacity of archaea, bacteria, and viruses along the water column in the landlocked meromictic Lake Shunet in Siberia. Deep sequencing of 265 Gb and high-quality assembly revealed a near-complete genome corresponding to Nonlabens sp. sh3vir. in a viral sample and 38 bacterial bins (0.2-5.3 Mb each). The mixolimnion (3.0 m) had the most diverse archaeal, bacterial, and viral communities, followed by the monimolimnion (5.5 m) and chemocline (5.0 m). The bacterial and archaeal communities were dominated by Thiocapsa and Methanococcoides, respectively, whereas the viral community was dominated by Siphoviridae. The archaeal and bacterial assemblages and the associated energy metabolism were significantly related to the various depths, in accordance with the stratification of physicochemical parameters. Reconstructed elemental nutrient cycles of the three layers were interconnected, including co-occurrence of denitrification and nitrogen fixation in each layer and involved unique processes due to specific biogeochemical properties at the respective depths. According to the gene annotation, several pre-dominant yet unknown and uncultured bacteria also play potentially important roles in nutrient cycling. Reciprocal BLAST analysis revealed that the viruses were specific to the host archaea and bacteria in the mixolimnion. This study provides insights into the bacterial, archaeal, and viral assemblages and the corresponding capacity potentials in Lake Shunet, one of the three meromictic lakes in central Asia. 
Lake Shunet was determined to harbor specific and diverse viral, bacterial, and archaeal communities that intimately interacted, revealing patterns shaped by indigenous physicochemical parameters.
A new peak detection algorithm for MALDI mass spectrometry data based on a modified Asymmetric Pseudo-Voigt model

Wijetunge, C ; Saeed, I ; Boughton, BA ; Roessner, U ; Halgamuge, SK (BioMed Central, 2015-12-09)

Background: Mass Spectrometry (MS) is a ubiquitous analytical tool in biological research and is used to measure the mass-to-charge ratio of bio-molecules. Peak detection is the essential first step in MS data analysis. Precise estimation of peak parameters such as peak summit location and peak area is critical to identify underlying bio-molecules and to estimate their abundances accurately. We propose a new method to detect and quantify peaks in mass spectra. It uses dual-tree complex wavelet transformation along with Stein's unbiased risk estimator for spectra smoothing. Then, a new method, based on the modified Asymmetric Pseudo-Voigt (mAPV) model and hierarchical particle swarm optimization, is used for peak parameter estimation. Results: Using simulated data, we demonstrated the benefit of using the mAPV model over Gaussian, Lorentz and Bi-Gaussian functions for MS peak modelling. The proposed mAPV model achieved the best fitting accuracy for asymmetric peaks, with lower percentage errors in peak summit location estimation, which were 0.17% to 4.46% less than that of the other models. It also outperformed the other models in peak area estimation, delivering lower percentage errors, which were about 0.7% less than its closest competitor, the Bi-Gaussian model. In addition, using data generated from a MALDI-TOF computer model, we showed that the proposed overall algorithm outperformed the existing methods mainly in terms of sensitivity. It achieved a sensitivity of 85%, compared to 77% and 71% for the two benchmark algorithms, a continuous wavelet transformation based method and Cromwell, respectively. Conclusions: The proposed algorithm is particularly useful for peak detection and parameter estimation in MS data with overlapping peak distributions and asymmetric peaks. The algorithm is implemented using MATLAB and the source code is freely available at http://mapv.sourceforge.net
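To make the idea of a pseudo-Voigt peak shape concrete, here is a minimal sketch of a height-normalized pseudo-Voigt profile (a weighted sum of a Lorentzian and a Gaussian sharing one full width at half maximum), with asymmetry introduced by using different widths on each side of the summit. This split-width form is only one simple way to obtain an asymmetric peak and is an assumption of this example; it is not the mAPV model from the paper, and the function name and parameters are hypothetical.

```python
import math


def split_pseudo_voigt(x: float, center: float, height: float,
                       fwhm_left: float, fwhm_right: float,
                       eta: float) -> float:
    """Asymmetric pseudo-Voigt value at x.

    eta in [0, 1] is the Lorentzian fraction; fwhm_left applies for
    x < center and fwhm_right otherwise (illustrative asymmetry only).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    fwhm = fwhm_left if x < center else fwhm_right
    if fwhm <= 0:
        raise ValueError("widths must be positive")
    u = (x - center) / fwhm
    gaussian = math.exp(-4.0 * math.log(2.0) * u * u)
    lorentzian = 1.0 / (1.0 + 4.0 * u * u)
    return height * (eta * lorentzian + (1.0 - eta) * gaussian)
```

Both components equal 1 at the summit and 0.5 at half a width away from it, so the profile reaches `height` at `center` and `height / 2` at `center - fwhm_left / 2` and `center + fwhm_right / 2` for any `eta`.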
EXIMS: an improved data analysis pipeline based on a new peak picking method for EXploring Imaging Mass Spectrometry data

Wijetunge, CD ; Saeed, I ; Boughton, BA ; Spraggins, JM ; Caprioli, RM ; Bacic, A ; Roessner, U ; Halgamuge, SK (OXFORD UNIV PRESS, 2015-10-01)

MOTIVATION: Matrix Assisted Laser Desorption Ionization-Imaging Mass Spectrometry (MALDI-IMS) in 'omics' data acquisition generates detailed information about the spatial distribution of molecules in a given biological sample. Various data processing methods have been developed for exploring the resultant high volume data. However, most of these methods process data in the spectral domain and do not make the most of the important spatial information available through this technology. Therefore, we propose a novel streamlined data analysis pipeline specifically developed for MALDI-IMS data utilizing significant spatial information for identifying hidden significant molecular distribution patterns in these complex datasets. METHODS: The proposed unsupervised algorithm uses Sliding Window Normalization (SWN) and a new spatial distribution based peak picking method developed based on Gray level Co-Occurrence (GCO) matrices followed by clustering of biomolecules. We also use gist descriptors and an improved version of GCO matrices to extract features from molecular images and minimum medoid distance to automatically estimate the number of possible groups. RESULTS: We evaluated our algorithm using a new MALDI-IMS metabolomics dataset of a plant (Eucalypt) leaf. The algorithm revealed hidden significant molecular distribution patterns in the dataset, which the current Component Analysis and Segmentation Map based approaches failed to extract. We further demonstrate the performance of our peak picking method over other traditional approaches by using a publicly available MALDI-IMS proteomics dataset of a rat brain. Although SWN did not show any significant improvement as compared with using no normalization, the visual assessment showed an improvement as compared to using the median normalization. AVAILABILITY AND IMPLEMENTATION: The source code and sample data are freely available at http://exims.sourceforge.net/. 
CONTACT: awgcdw@student.unimelb.edu.au or chalini_w@live.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
