Computing and Information Systems - Research Publications

  • Item
    The randomized information coefficient: assessing dependencies in noisy data
    Romano, S ; Vinh, NX ; Verspoor, K ; Bailey, J (SPRINGER, 2018-03)
  • Item
    Better Health Explorer: Designing for Health Information Seekers
    Pang, C-I ; Verspoor, C ; Pearce, J ; Chang, S ; Ploderer, B ; Carter, M ; Gibbs, M ; Smith, MW ; Vetere, F (Association for Computing Machinery, 2015)
    A vast amount of health information has been published online, yet users often report difficulties in locating information in this particular domain. Based on our prior research, we consider four categories of online health information seekers who demonstrate mixed information needs. Although their searching needs are often well satisfied by entering keywords into search engines, their need to explore information is not so well supported, thus affecting their user experience and satisfaction. In this paper, we propose design principles for supporting the exploration of online health information. We present the rationale and the design process of a web app - Better Health Explorer - which is a proof-of-concept app tailored to health information exploration. This work contributes to the design of online health information systems as well as exploratory systems in general.
  • Item
    Better health information exploration
    Pang, PCI ; Verspoor, K ; Chang, S ; Pearce, J ; Sari, E ; Duh, H ; Brereton, M ; T, JL ; Awori, K ; Wan Bt Ahmad, FW (ACM, 2015-12-07)
    The provision of health information has to be clear and appealing to users. Research has shown that health information seekers do not all have the same attributes, skills or needs. In any given health-related app or website, there is a need to provide tools for accessing information in ways that appeal to users. This is not always supported by current web technologies. As such, based on prior research on health information seeking behaviour and needs, we designed and created a proof-of-concept website named Better Health Explorer to experiment on health information seekers. The pilot results show a positive effect on supporting and improving the experience of seekers with exploratory search behaviour.
  • Item
    BioC interoperability track overview
    Comeau, DC ; Batista-Navarro, RT ; Dai, H-J ; Dogan, RI ; Yepes, AJ ; Khare, R ; Lu, Z ; Marques, H ; Mattingly, CJ ; Neves, M ; Peng, Y ; Rak, R ; Rinaldi, F ; Tsai, RT-H ; Verspoor, K ; Wiegers, TC ; Wu, CH ; Wilbur, WJ (OXFORD UNIV PRESS, 2014-06-30)
    BioC is a new simple XML format for sharing biomedical text and annotations and libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format. These contributions included additional implementations of BioC, many new corpora in the format, biomedical NLP tools consuming and producing the format and online services using the format. The ease of use, broad support and rapidly growing number of tools demonstrate the need for and value of the BioC format. Database URL: http://bioc.sourceforge.net/.
  • Item
    Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations
    Liu, H ; Hunter, L ; Keselj, V ; Verspoor, K ; Smalheiser, NR (PUBLIC LIBRARY SCIENCE, 2013-04-17)
    The biomedical text mining community has focused on developing techniques to automatically extract important relations between biological components and semantic events involving genes or proteins from literature. In this paper, we propose a novel approach for mining relations and events in the biomedical literature using approximate subgraph matching. Extraction of such knowledge is performed by searching for an approximate subgraph isomorphism between key contextual dependencies and input sentence graphs. Our approach significantly increases the chance of retrieving relations or events encoded within complex dependency contexts by introducing error tolerance into the graph matching process, while maintaining the extraction precision at a high level. When evaluated on practical tasks, it achieves a 51.12% F-score in extracting nine types of biological events on the GE task of the BioNLP-ST 2011 and an 84.22% F-score in detecting protein-residue associations. The performance is comparable to the reported systems across these tasks, and thus demonstrates the generalizability of our proposed approach.
  • Item
    BioC: a minimalist approach to interoperability for biomedical text processing
    Comeau, DC ; Dogan, RI ; Ciccarese, P ; Cohen, KB ; Krallinger, M ; Leitner, F ; Lu, Z ; Peng, Y ; Rinaldi, F ; Torii, M ; Valencia, A ; Verspoor, K ; Wiegers, TC ; Wu, CH ; Wilbur, WJ (OXFORD UNIV PRESS, 2013-09-18)
    A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/
  • Item
    Annotating the biomedical literature for the human variome
    Verspoor, K ; Yepes, AJ ; Cavedon, L ; McIntosh, T ; Herten-Crabb, A ; Thomas, Z ; Plazzer, J-P (OXFORD UNIV PRESS, 2013-04-12)
    This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
  • Item
    BioLemmatizer: a lemmatization tool for morphological processing of biomedical text
    Liu, H ; Christiansen, T ; Baumgartner, WA ; Verspoor, K (BMC, 2012-04)
    BACKGROUND: The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research. RESULTS: In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through incorporation of several published lexical resources. It retrieves lemmas based on the use of a word lexicon, and defines a set of rules that transform a word to a lemma if it is not encountered in the lexicon. An innovative aspect of the BioLemmatizer is the use of a hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. The contribution of the BioLemmatizer to accuracy improvement of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system. CONCLUSIONS: The BioLemmatizer outperforms other tools when compared with eight existing lemmatizers. The BioLemmatizer is released as open source software and can be downloaded from http://biolemmatizer.sourceforge.net.
  • Item
    A UIMA wrapper for the NCBO annotator
    Roeder, C ; Jonquet, C ; Shah, NH ; Baumgartner, WA ; Verspoor, K ; Hunter, L (OXFORD UNIV PRESS, 2010-07-15)
    SUMMARY: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows. AVAILABILITY: This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.
  • Item
    U-Compare bio-event meta-service: compatible BioNLP event extraction services
    Kano, Y ; Bjoerne, J ; Ginter, F ; Salakoski, T ; Buyko, E ; Hahn, U ; Cohen, KB ; Verspoor, K ; Roeder, C ; Hunter, LE ; Kilicoglu, H ; Bergler, S ; Van Landeghem, S ; Van Parys, T ; Van de Peer, Y ; Miwa, M ; Ananiadou, S ; Neves, M ; Pascual-Montano, A ; Ozgur, A ; Radev, DR ; Riedel, S ; Saetre, R ; Chun, H-W ; Kim, J-D ; Pyysalo, S ; Ohta, T ; Tsujii, J (BMC, 2011-12-18)
    BACKGROUND: Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes. RESULTS: We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. CONCLUSIONS: While individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility to the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.
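
Several of the abstracts listed above report F-scores, and the Variome Annotation Schema entry distinguishes exact matching from relaxed boundary matching. The sketch below illustrates one way such scores might be computed from two sets of annotated spans. It is not taken from any of the listed papers: the span representation, the greedy one-to-one matching, and the reading of "relaxed" as "same type with any character overlap" are all assumptions made for illustration, as are the example annotations.

```python
from typing import List, Tuple

# Hypothetical span representation: (start offset, end offset, entity type),
# with the end offset exclusive.
Span = Tuple[int, int, str]


def spans_match(a: Span, b: Span, relaxed: bool) -> bool:
    """Exact: same type and identical boundaries.
    Relaxed (assumed definition): same type and any character overlap."""
    if a[2] != b[2]:
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]
    return a[0] == b[0] and a[1] == b[1]


def f_score(gold: List[Span], predicted: List[Span], relaxed: bool = False) -> float:
    """Balanced F-score, using greedy one-to-one matching of spans."""
    matched_gold = set()
    true_positives = 0
    for p in predicted:
        for i, g in enumerate(gold):
            if i not in matched_gold and spans_match(p, g, relaxed):
                matched_gold.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations from two annotators over the same text.
annotator_a = [(0, 5, "gene"), (10, 18, "mutation"), (25, 31, "disease")]
annotator_b = [(0, 5, "gene"), (11, 18, "mutation"), (40, 45, "disease")]

exact = f_score(annotator_a, annotator_b)
relaxed = f_score(annotator_a, annotator_b, relaxed=True)
print(f"exact F-score:   {exact:.2f}")
print(f"relaxed F-score: {relaxed:.2f}")
```

In this toy data the second pair of spans differs by one character at the left boundary, so it counts as an agreement only under the relaxed criterion, which is why relaxing boundaries can raise the agreement score.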