Computing and Information Systems - Research Publications

  • Item
    Better Health Explorer: Designing for Health Information Seekers
    Pang, C-I ; Verspoor, C ; Pearce, J ; Chang, S ; Ploderer, B ; Carter, M ; Gibbs, M ; Smith, MW ; Vetere, F (Association for Computing Machinery, 2015)
    A vast amount of health information has been published online, yet users often report difficulties in locating information in this particular domain. Based on our prior research, we consider four categories of online health information seekers who demonstrate mixed information needs. Although their searching needs are often well satisfied by entering keywords into search engines, their need to explore information is not so well supported, thus affecting their user experience and satisfaction. In this paper, we propose design principles for supporting the exploration of online health information. We present the rationale and the design process of a web app - Better Health Explorer - which is a proof-of-concept app tailored to health information exploration. This work contributes to the design of online health information systems as well as exploratory systems in general.
  • Item
    Better health information exploration
    Pang, PCI ; Verspoor, K ; Chang, S ; Pearce, J ; Sari, E ; Duh, H ; Brereton, M ; T, JL ; Awori, K ; Wan Bt Ahmad, FW (ACM, 2015-12-07)
    The provision of health information has to be clear and appealing to users. Research has shown that health information seekers do not all have the same attributes, skills or needs. In any given health-related app or website, there is a need to provide tools for accessing information in ways that appeal to users. This is not always supported by current web technologies. As such, based on prior research on health information seeking behaviour and needs, we designed and created a proof-of-concept website named Better Health Explorer to experiment on health information seekers. The pilot results show a positive effect on supporting and improving the experience of seekers with exploratory search behaviour.
  • Item
    BioC interoperability track overview
    Comeau, DC ; Batista-Navarro, RT ; Dai, H-J ; Dogan, RI ; Yepes, AJ ; Khare, R ; Lu, Z ; Marques, H ; Mattingly, CJ ; Neves, M ; Peng, Y ; Rak, R ; Rinaldi, F ; Tsai, RT-H ; Verspoor, K ; Wiegers, TC ; Wu, CH ; Wilbur, WJ (OXFORD UNIV PRESS, 2014-06-30)
    BioC is a new simple XML format for sharing biomedical text and annotations, together with libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format. These contributions included additional implementations of BioC, many new corpora in the format, biomedical NLP tools consuming and producing the format and online services using the format. The ease of use, broad support and rapidly growing number of tools demonstrate the need for and value of the BioC format. Database URL: http://bioc.sourceforge.net/.
  • Item
    Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations
    Liu, H ; Hunter, L ; Keselj, V ; Verspoor, K ; Smalheiser, NR (PUBLIC LIBRARY SCIENCE, 2013-04-17)
    The biomedical text mining community has focused on developing techniques to automatically extract important relations between biological components and semantic events involving genes or proteins from literature. In this paper, we propose a novel approach for mining relations and events in the biomedical literature using approximate subgraph matching. Extraction of such knowledge is performed by searching for an approximate subgraph isomorphism between key contextual dependencies and input sentence graphs. Our approach significantly increases the chance of retrieving relations or events encoded within complex dependency contexts by introducing error tolerance into the graph matching process, while maintaining the extraction precision at a high level. When evaluated on practical tasks, it achieves a 51.12% F-score in extracting nine types of biological events on the GE task of the BioNLP-ST 2011 and an 84.22% F-score in detecting protein-residue associations. The performance is comparable to the reported systems across these tasks, and thus demonstrates the generalizability of our proposed approach.
  • Item
    BioC: a minimalist approach to interoperability for biomedical text processing
    Comeau, DC ; Dogan, RI ; Ciccarese, P ; Cohen, KB ; Krallinger, M ; Leitner, F ; Lu, Z ; Peng, Y ; Rinaldi, F ; Torii, M ; Valencia, A ; Verspoor, K ; Wiegers, TC ; Wu, CH ; Wilbur, WJ (OXFORD UNIV PRESS, 2013-09-18)
    A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/
  • Item
    Annotating the biomedical literature for the human variome
    Verspoor, K ; Yepes, AJ ; Cavedon, L ; McIntosh, T ; Herten-Crabb, A ; Thomas, Z ; Plazzer, J-P (OXFORD UNIV PRESS, 2013-04-12)
    This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
  • Item
    BioLemmatizer: a lemmatization tool for morphological processing of biomedical text
    Liu, H ; Christiansen, T ; Baumgartner, WA ; Verspoor, K (BMC, 2012-04)
    BACKGROUND: The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research. RESULTS: In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through incorporation of several published lexical resources. It retrieves lemmas based on the use of a word lexicon, and defines a set of rules that transform a word to a lemma if it is not encountered in the lexicon. An innovative aspect of the BioLemmatizer is the use of a hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. The contribution of the BioLemmatizer to accuracy improvement of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system. CONCLUSIONS: The BioLemmatizer outperforms other tools when compared with eight existing lemmatizers. The BioLemmatizer is released as open source software and can be downloaded from http://biolemmatizer.sourceforge.net.
  • Item
    A UIMA wrapper for the NCBO annotator
    Roeder, C ; Jonquet, C ; Shah, NH ; Baumgartner, WA ; Verspoor, K ; Hunter, L (OXFORD UNIV PRESS, 2010-07-15)
    SUMMARY: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows. AVAILABILITY: This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.
  • Item
    Towards a semantic lexicon for biological language processing
    Verspoor, K (HINDAWI LTD, 2005)
    This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
  • Item
    Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
    Abi-Haidar, A ; Kaur, J ; Maguitman, A ; Radivojac, P ; Rechtsteiner, A ; Verspoor, K ; Wang, Z ; Rocha, LM (BMC, 2008)
    BACKGROUND: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. RESULTS: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. CONCLUSION: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.