Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases.
Author: Chen, Q; Britto, R; Erill, I; Jeffery, CJ; Liberzon, A; Magrane, M; Onami, J-I; Robinson-Rechavi, M; Sponarova, J; Zobel, J; ...
Source Title: Genomics Proteomics and Bioinformatics
Computing and Information Systems
Document Type: Journal Article
Citations: Chen, Q., Britto, R., Erill, I., Jeffery, C. J., Liberzon, A., Magrane, M., Onami, J.-I., Robinson-Rechavi, M., Sponarova, J., Zobel, J. & Verspoor, K. (2020). Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases. Genomics Proteomics and Bioinformatics, 18 (2), pp. 91-103. https://doi.org/10.1016/j.gpb.2018.11.006.
Access Status: Open Access
ARC Grant code: ARC/DP150101550
Biological databases represent an extraordinary collective volume of work. Diligently built up over decades and comprising many millions of contributions from the biomedical research community, biological databases provide worldwide access to a massive number of records (also known as entries). Starting from individual laboratories, genomes are sequenced, assembled, annotated, and ultimately submitted to primary nucleotide databases such as GenBank, European Nucleotide Archive (ENA), and DNA Data Bank of Japan (DDBJ) (collectively known as the International Nucleotide Sequence Database Collaboration, INSDC). Protein records, which are the translations of these nucleotide records, are deposited into central protein databases such as the UniProt KnowledgeBase (UniProtKB) and the Protein Data Bank (PDB). Sequence records are further accumulated into different databases for more specialized purposes: RFam and PFam for RNA and protein families, respectively; DictyBase and PomBase for model organisms; as well as ArrayExpress and Gene Expression Omnibus (GEO) for gene expression profiles. These databases are selected as examples; the list is not intended to be exhaustive. However, they are representative of biological databases that have been named in the “golden set” of the 24th Nucleic Acids Research database issue (in 2016). The introduction of that issue highlights the databases that “consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database”. In addition, the associated information about sequences is also propagated into non-sequence databases, such as PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) for scientific literature or Gene Ontology (GO) for function annotations. These databases in turn benefit individual studies, many of which use these publicly available records as the basis for their own research.