University Library
A gateway to Melbourne's research publications
Minerva Access is the University's Institutional Repository. It aims to collect, preserve, and showcase the intellectual output of staff and students of the University of Melbourne for a global audience.

    Investigating reproducibility and tracking provenance - A genomic workflow case study

    Citations
    Scopus
    Web of Science
    Altmetric
    25
    19
    Author
    Kanwal, S; Khan, FZ; Lonie, A; Sinnott, RO
    Date
    2017-07-12
    Source Title
    BMC Bioinformatics
    Publisher
    BMC
    University of Melbourne Author/s
    Kanwal, Sehrish; Lonie, Andrew; Sinnott, Richard; Khan, Farah
    Affiliation
    Computing and Information Systems
    Clinical Pathology
    Document Type
    Journal Article
    Citations
    Kanwal, S., Khan, F. Z., Lonie, A. & Sinnott, R. O. (2017). Investigating reproducibility and tracking provenance - A genomic workflow case study. BMC Bioinformatics, 18 (1). https://doi.org/10.1186/s12859-017-1747-0
    Access Status
    Open Access
    URI
    http://hdl.handle.net/11343/256462
    DOI
    10.1186/s12859-017-1747-0
    Abstract
    BACKGROUND: Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge that is not fully addressed. This is due to an incomplete understanding of reproducibility requirements and of the assumptions of workflow definition approaches. Provenance information should be tracked and used to capture all these requirements, supporting reusability of existing workflows.
    RESULTS: We have implemented a complex but widely deployed bioinformatics workflow using three representative approaches to workflow definition and execution. Through implementation, we identified assumptions implicit in these approaches that ultimately produce insufficient documentation of workflow requirements, resulting in failed execution of the workflow. This study proposes a set of recommendations that aim to mitigate these assumptions and guide the scientific community towards reproducible science, hence addressing the reproducibility crisis.
    CONCLUSIONS: Reproducing, adapting or even repeating a bioinformatics workflow in any environment requires substantial technical knowledge of the workflow execution environment, resolution of analysis assumptions and rigorous compliance with reproducibility requirements. Towards these goals, we propose conclusive recommendations that, along with an explicit declaration of workflow specification, would result in enhanced reproducibility of computational genomic analyses.

    Collections
    • Minerva Elements Records [53039]
    • Clinical Pathology - Research Publications [620]
    • Computing and Information Systems - Research Publications [1580]