
dc.contributor.author: Dix, TI
dc.contributor.author: Powell, DR
dc.contributor.author: Allison, L
dc.contributor.author: Bernal, J
dc.contributor.author: Jaeger, S
dc.contributor.author: Stern, L
dc.date.available: 2014-05-22T04:11:14Z
dc.date.issued: 2007-01-01
dc.identifier: pii: 1471-2105-8-S2-S10
dc.identifier.citation: Dix, T. I., Powell, D. R., Allison, L., Bernal, J., Jaeger, S. & Stern, L. (2007). Comparative analysis of long DNA sequences by per element information content using different contexts. BMC BIOINFORMATICS, 8 (SUPPL. 2), https://doi.org/10.1186/1471-2105-8-S2-S10.
dc.identifier.issn: 1471-2105
dc.identifier.uri: http://hdl.handle.net/11343/31769
dc.description.abstract: BACKGROUND: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models. RESULTS: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2. CONCLUSION: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.
dc.language: English
dc.publisher: BMC
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.subject: Information Systems
dc.title: Comparative analysis of long DNA sequences by per element information content using different contexts
dc.type: Journal Article
dc.identifier.doi: 10.1186/1471-2105-8-S2-S10
melbourne.peerreview: Peer Reviewed
melbourne.affiliation: The University of Melbourne
melbourne.affiliation.department: Computer Science and Software Engineering
melbourne.source.title: BMC Bioinformatics
melbourne.source.volume: 8
melbourne.source.issue: SUPPL. 2
dc.rights.license: CC BY
dc.description.pagestart: 1
melbourne.publicationid: 89559
melbourne.elementsid: 296100
melbourne.openaccess.pmc: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892068
melbourne.contributor.author: Stern, Linda
dc.identifier.eissn: 1471-2105
melbourne.accessrights: Access this item via the Open Access location


