    Summarizing and correcting the GC content bias in high-throughput sequencing

    Author
    Benjamini, Y; Speed, TP
    Date
    2012-05-01
    Source Title
    Nucleic Acids Research
    Publisher
    Oxford University Press
    University of Melbourne Author/s
    Speed, Terence
    Affiliation
    School of Mathematics and Statistics
    Document Type
    Journal Article
    Citations
    Benjamini, Y. & Speed, T. P. (2012). Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Research, 40 (10). https://doi.org/10.1093/nar/gks001
    Access Status
    Open Access
    URI
    http://hdl.handle.net/11343/259398
    DOI
    10.1093/nar/gks001
    Abstract
    GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples, and there is no consensus on the best method for removing it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal curve family. It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias. We propose a model that produces predictions at the base pair level, allowing strand-specific GC-effect correction regardless of the downstream smoothing or binning. These GC modeling considerations can inform other high-throughput sequencing analyses such as ChIP-seq and RNA-seq.
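
The idea of predicting fragment counts at the base pair level from fragment GC content can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single fixed fragment length, one strand, and fragments identified by their start position. The function names (`gc_count`, `gc_rates`, `predicted_counts`) are hypothetical. For each possible GC count, the sketch divides the number of observed fragments by the number of reference positions whose fragment window has that GC count, then uses that rate as the predicted count at each position.

```python
from collections import Counter


def gc_count(seq):
    """Return the number of G or C bases in seq (case-insensitive)."""
    return sum(base in "GCgc" for base in seq)


def gc_rates(reference, fragment_starts, fragment_length):
    """Estimate the mean number of fragments per position for each GC count.

    Assumption: every fragment has the same length and is identified by
    its 0-based start position on the reference.
    """
    starts = Counter(fragment_starts)
    positions_by_gc = Counter()  # reference positions per GC count
    fragments_by_gc = Counter()  # observed fragments per GC count
    for pos in range(len(reference) - fragment_length + 1):
        gc = gc_count(reference[pos:pos + fragment_length])
        positions_by_gc[gc] += 1
        fragments_by_gc[gc] += starts[pos]
    return {gc: fragments_by_gc[gc] / n for gc, n in positions_by_gc.items()}


def predicted_counts(reference, rates, fragment_length):
    """Predict the fragment count at every start position from its GC count."""
    return [
        rates[gc_count(reference[pos:pos + fragment_length])]
        for pos in range(len(reference) - fragment_length + 1)
    ]
```

Because the predictions are per position, they can be summed over any bin or smoothing window afterwards, and dividing observed by predicted counts in that window gives a GC-corrected value. The sketch does not enforce a unimodal curve; it only tabulates the empirical rate for each GC count.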
