Show simple item record

dc.contributor.authorBouadjenek, MR
dc.contributor.authorVerspoor, K
dc.date.accessioned2020-12-21T03:53:53Z
dc.date.available2020-12-21T03:53:53Z
dc.date.issued2017-09-07
dc.identifierpii: 4107606
dc.identifier.citationBouadjenek, M. R. & Verspoor, K. (2017). Multi-field query expansion is effective for biomedical dataset retrieval. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2017, https://doi.org/10.1093/database/bax062.
dc.identifier.issn1758-0463
dc.identifier.urihttp://hdl.handle.net/11343/257401
dc.description.abstractIn the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical datasets with heterogeneous schemas through query reformulation. In particular, the proposed method transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one.
dc.languageEnglish
dc.publisherOXFORD UNIV PRESS
dc.titleMulti-field query expansion is effective for biomedical dataset retrieval
dc.typeJournal Article
dc.identifier.doi10.1093/database/bax062
melbourne.affiliation.departmentComputing and Information Systems
melbourne.source.titleDatabase: the journal of biological databases and curation
melbourne.source.volume2017
melbourne.identifier.arcDP150101550
dc.rights.licenseCC BY
melbourne.elementsid1269836
melbourne.contributor.authorVerspoor, Cornelia
melbourne.contributor.authorBouadjenek, Mohamed Reda
dc.identifier.eissn1758-0463
melbourne.identifier.fundernameidAUST RESEARCH COUNCIL, DP150101550
melbourne.accessrightsOpen Access

