University Library
A gateway to Melbourne's research publications
Minerva Access is the University's Institutional Repository. It aims to collect, preserve, and showcase the intellectual output of staff and students of the University of Melbourne for a global audience.

    An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome

    Download: published version (2.712 MB)

    Citations: Scopus 21; Web of Science 20
    Author: Ribeiro, A; Golicz, A; Hackett, CA; Milne, I; Stephen, G; Marshall, D; Flavell, AJ; Bayer, M
    Date: 2015-11-11
    Source Title: BMC Bioinformatics
    Publisher: BMC
    University of Melbourne Author/s: Golicz, Agnieszka
    Affiliation: Agriculture and Food Systems
    Document Type: Journal Article
    Citation: Ribeiro, A., Golicz, A., Hackett, C. A., Milne, I., Stephen, G., Marshall, D., Flavell, A. J. & Bayer, M. (2015). An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome. BMC Bioinformatics, 16 (1), https://doi.org/10.1186/s12859-015-0801-z.
    Access Status: Open Access
    URI: http://hdl.handle.net/11343/256688
    DOI: 10.1186/s12859-015-0801-z
    Abstract
    BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling: quality of the reference sequence, read length, choice of mapper, choice of variant caller, mapping stringency, and filtering of SNPs by read mapping quality and by read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive.

    RESULTS: The number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pair (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated, and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases.

    CONCLUSIONS: The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations for a SNP discovery pipeline difficult, but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are mapped to a relatively unfinished reference sequence, e.g. from a non-model organism in the early stages of genomic exploration.
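    Because the simulated reads contain neither sequencing errors nor variants, any SNP a pipeline calls is a false positive, so counting FP SNPs reduces to counting SNP records in the variant caller's output, optionally after filtering by read depth and mapping quality. The sketch below illustrates that counting step only. It assumes the caller writes standard VCF with `DP` and `MQ` keys in the INFO column (not every caller emits `MQ`); the function names, thresholds and example records are hypothetical and are not the authors' pipeline.

```python
"""Count false positive SNPs in a VCF called from variant-free simulated reads.

Assumption: reads were simulated without errors or variants from the genome
they are mapped back to, so every SNP called is a false positive.
The INFO keys (DP, MQ) and thresholds are illustrative assumptions.
"""
from typing import Dict, Iterable, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as 'DP=12;MQ=40;INDEL' into a dict."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag keys such as INDEL map to ""
    return fields


def is_snp(ref: str, alt: str) -> bool:
    """True if REF and every ALT allele are single bases."""
    return len(ref) == 1 and all(
        len(a) == 1 and a != "." for a in alt.split(",")
    )


def count_fp_snps(lines: Iterable[str],
                  min_depth: Optional[int] = None,
                  min_mapq: Optional[float] = None) -> int:
    """Count SNP records, optionally requiring DP >= min_depth and MQ >= min_mapq."""
    count = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        cols = line.rstrip("\n").split("\t")
        ref, alt, info = cols[3], cols[4], cols[7]  # REF, ALT, INFO columns
        if not is_snp(ref, alt):
            continue  # ignore indels and other non-SNP records
        fields = parse_info(info)
        if min_depth is not None and int(fields.get("DP", 0)) < min_depth:
            continue
        if min_mapq is not None and float(fields.get("MQ", 0)) < min_mapq:
            continue
        count += 1
    return count


# Hypothetical records: two SNPs and one indel.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=25;MQ=60",
    "chr1\t250\t.\tC\tT\t20\t.\tDP=3;MQ=12",
    "chr1\t300\t.\tAT\tA\t40\t.\tDP=30;MQ=60;INDEL",
]

print(count_fp_snps(vcf))                             # → 2
print(count_fp_snps(vcf, min_depth=10, min_mapq=30))  # → 1
```

    The second call shows, on toy data, how depth and mapping-quality filters can remove a low-support call; the study's actual filter settings are not reproduced here.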



    Collections
    • Minerva Elements Records [53039]
    • Agriculture and Food Systems - Research Publications [659]