dc.contributor.author: Chen, Q
dc.contributor.author: Britto, R
dc.contributor.author: Erill, I
dc.contributor.author: Jeffery, CJ
dc.contributor.author: Liberzon, A
dc.contributor.author: Magrane, M
dc.contributor.author: Onami, J-I
dc.contributor.author: Robinson-Rechavi, M
dc.contributor.author: Sponarova, J
dc.contributor.author: Zobel, J
dc.contributor.author: Verspoor, K
dc.date.accessioned: 2020-12-09T22:14:57Z
dc.date.available: 2020-12-09T22:14:57Z
dc.date.issued: 2020-04
dc.identifier: pii: S1672-0229(20)30063-2
dc.identifier.citation: Chen, Q., Britto, R., Erill, I., Jeffery, C. J., Liberzon, A., Magrane, M., Onami, J. -I., Robinson-Rechavi, M., Sponarova, J., Zobel, J. & Verspoor, K. (2020). Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases. Genomics Proteomics and Bioinformatics, 18 (2), pp.91-103. https://doi.org/10.1016/j.gpb.2018.11.006.
dc.identifier.issn: 1672-0229
dc.identifier.uri: http://hdl.handle.net/11343/252901
dc.description.abstract: Biological databases represent an extraordinary collective volume of work. Diligently built up over decades and comprising many millions of contributions from the biomedical research community, biological databases provide worldwide access to a massive number of records (also known as entries) [1]. Starting from individual laboratories, genomes are sequenced, assembled, annotated, and ultimately submitted to primary nucleotide databases such as GenBank [2], European Nucleotide Archive (ENA) [3], and DNA Data Bank of Japan (DDBJ) [4] (collectively known as the International Nucleotide Sequence Database Collaboration, INSDC). Protein records, which are the translations of these nucleotide records, are deposited into central protein databases such as the UniProt KnowledgeBase (UniProtKB) [5] and the Protein Data Bank (PDB) [6]. Sequence records are further accumulated into different databases for more specialized purposes: RFam [7] and PFam [8] for RNA and protein families, respectively; DictyBase [9] and PomBase [10] for model organisms; as well as ArrayExpress [11] and Gene Expression Omnibus (GEO) [12] for gene expression profiles. These databases are selected as examples; the list is not intended to be exhaustive. However, they are representative of biological databases that have been named in the “golden set” of the 24th Nucleic Acids Research database issue (in 2016). The introduction of that issue highlights the databases that “consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database” [13]. In addition, the associated information about sequences is also propagated into non-sequence databases, such as PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) for scientific literature or Gene Ontology (GO) [14] for function annotations.
These databases in turn benefit individual studies, many of which use these publicly available records as the basis for their own research.
dc.language: eng
dc.publisher: Elsevier
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0
dc.title: Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases.
dc.type: Journal Article
dc.identifier.doi: 10.1016/j.gpb.2018.11.006
melbourne.affiliation.department: Chancellery Research
melbourne.affiliation.department: Computing and Information Systems
melbourne.source.title: Genomics Proteomics and Bioinformatics
melbourne.source.volume: 18
melbourne.source.issue: 2
melbourne.source.pages: 91-103
melbourne.identifier.arc: DP150101550
dc.rights.license: CC BY-NC-ND
melbourne.elementsid: 1457160
melbourne.contributor.author: Verspoor, Cornelia
melbourne.contributor.author: Zobel, Justin
melbourne.contributor.author: Chen, Qingyu
dc.identifier.eissn: 2210-3244
melbourne.identifier.fundernameid: Australian Research Council, DP150101550
melbourne.accessrights: Open Access

