Computing and Information Systems - Research Publications

Search Results

Now showing 1 - 10 of 147
  • Item
    Ontology quality assurance through analysis of term transformations
    Verspoor, K ; Dvorkin, D ; Cohen, KB ; Hunter, L (OXFORD UNIV PRESS, 2009-06-15)
    MOTIVATION: It is important for the quality of biological ontologies that similar concepts be expressed consistently, or univocally. Univocality is relevant for the usability of the ontology for humans, as well as for computational tools that rely on regularity in the structure of terms. However, in practice terms are not always expressed consistently, and we must develop methods for identifying terms that are not univocal so that they can be corrected. RESULTS: We developed an automated transformation-based clustering methodology for detecting terms that use different linguistic conventions for expressing similar semantics. These term sets represent occurrences of univocality violations. Our method was able to identify 67 examples of univocality violations in the Gene Ontology. AVAILABILITY: The identified univocality violations are available upon request. We are preparing a release of an open source version of the software to be available at http://bionlp.sourceforge.net.
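    A minimal sketch of the transformation-based clustering described above, in Python: terms are normalized by a couple of rewrite rules, and terms whose normal forms collide while their surface wording differs are flagged as candidate univocality violations. The rules and example terms are invented for illustration and are not the paper's actual transformation set.

        from collections import defaultdict
        import re

        # Illustrative rewrite rules mapping surface conventions to one canonical form.
        RULES = [
            (re.compile(r"^(.*\S)\s+regulation$"), r"regulation of \1"),
            (re.compile(r"^(.*\S)\s+biosynthesis$"), r"biosynthetic process of \1"),
        ]

        def normalize(term):
            """Apply each rewrite rule once to obtain a canonical form."""
            form = term.lower().strip()
            for pattern, repl in RULES:
                form = pattern.sub(repl, form)
            return form

        def find_violations(terms):
            clusters = defaultdict(list)
            for t in terms:
                clusters[normalize(t)].append(t)
            # More than one distinct surface form per cluster suggests the same
            # semantics is expressed with different conventions.
            return [ts for ts in clusters.values() if len(set(ts)) > 1]

        terms = ["heart rate regulation", "regulation of heart rate",
                 "lipid biosynthesis", "biosynthetic process of lipid"]
        for cluster in find_violations(terms):
            print(cluster)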
  • Item
    The textual characteristics of traditional and Open Access scientific journals are similar
    Verspoor, K ; Cohen, KB ; Hunter, L (BMC, 2009-06-15)
    BACKGROUND: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. RESULTS: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. CONCLUSION: We did not find structural or semantic differences between the Open Access and traditional journal collections.
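    The lexical comparison above can be illustrated with a short Python sketch: unigram distributions are estimated for two document collections and their Kullback-Leibler divergence is computed, here with add-one smoothing so the divergence stays finite. The toy documents merely stand in for the Open Access and traditional collections.

        from collections import Counter
        import math

        def unigram_dist(docs, vocab):
            counts = Counter(w for doc in docs for w in doc.lower().split())
            total = sum(counts.values()) + len(vocab)   # add-one smoothing
            return {w: (counts[w] + 1) / total for w in vocab}

        def kl_divergence(p, q):
            return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

        open_access = ["the protein binds the receptor",
                       "gene expression was measured"]
        traditional = ["the receptor binds the ligand",
                       "expression of the gene was measured"]

        vocab = {w for doc in open_access + traditional for w in doc.lower().split()}
        p = unigram_dist(open_access, vocab)
        q = unigram_dist(traditional, vocab)
        print(f"KL(open_access || traditional) = {kl_divergence(p, q):.4f} bits")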
  • Item
    Reuse of terminological resources for efficient ontological engineering in Life Sciences
    Jimeno-Yepes, A ; Jimenez-Ruiz, E ; Berlanga-Llavori, R ; Rebholz-Schuhmann, D (BMC, 2009)
    This paper explores how terminological resources can be used for ontology engineering. Several biomedical ontologies currently describe overlapping domains, but there is no clear correspondence between the concepts that are supposed to be equivalent or merely similar. These resources are valuable, but their integration and further development are expensive. Terminologies may support ontology development at several stages of the ontology lifecycle, e.g. ontology integration. In this paper we investigate the use of terminological resources during the ontology lifecycle. We claim that the proper creation and use of a shared thesaurus is a cornerstone for the successful application of Semantic Web technology within the life sciences. Moreover, we have applied our approach to a real scenario, the Health-e-Child (HeC) project, and have evaluated the impact of filtering and re-organizing several resources. As a result, we have created a reference thesaurus for this project, named HeCTh.
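    A minimal sketch of the kind of thesaurus construction the abstract discusses, under strong simplifying assumptions: labels from several terminological resources are normalized, labels sharing a normal form are merged into one entry, and entries supported by too few resources are filtered out. The resource names and terms below are invented; the actual HeCTh thesaurus was built with far richer criteria.

        from collections import defaultdict

        def normalize(label):
            return " ".join(label.lower().replace("-", " ").split())

        def build_thesaurus(resources, min_sources=2):
            entries = defaultdict(set)   # normal form -> {(resource, label)}
            for name, labels in resources.items():
                for label in labels:
                    entries[normalize(label)].add((name, label))
            # Keep entries attested in at least `min_sources` resources (filtering).
            return {form: sorted(label for _, label in pairs)
                    for form, pairs in entries.items()
                    if len({r for r, _ in pairs}) >= min_sources}

        resources = {
            "ResourceA": ["Heart Diseases", "Juvenile Arthritis"],
            "ResourceB": ["heart diseases", "juvenile arthritis", "cardiomegaly"],
        }
        print(build_thesaurus(resources))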
  • Item
    Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb
    Nagel, K ; Jimeno-Yepes, A ; Rebholz-Schuhmann, D (BMC, 2009)
    BACKGROUND: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focused on point mutation extraction methods from the biomedical literature, which can support the time-consuming work of manual database curation. However, these methods are limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. RESULTS: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstract texts were processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets were validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Contextual features were then extracted through shallow and deep parsing and classified into predefined categories (F1-measure ranging from 0.15 to 0.67). Furthermore, the feature sets were aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations were assessed both automatically and manually against reference data resources. CONCLUSION: This work proposes a solution for the automatic extraction of functional annotations for protein residues from biomedical articles. The presented approach extends existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.
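    A minimal sketch of residue mention extraction and its evaluation, assuming a simple regular expression and toy gold annotations (the paper's pipeline, which also resolves proteins and species, is far more involved): mentions such as "Ser65" are extracted from text and the predictions are scored against a reference set with F1, in the spirit of the UniProtKb cross-validation.

        import re

        # Three-letter amino acid code followed by a position, e.g. "Ser65" or "Lys-48".
        RESIDUE = re.compile(
            r"\b(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|"
            r"Ser|Thr|Trp|Tyr|Val)[- ]?(\d+)\b")

        def extract_residues(text):
            return {(m.group(1), int(m.group(2))) for m in RESIDUE.finditer(text)}

        def f1_score(predicted, gold):
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            return 2 * precision * recall / (precision + recall) if tp else 0.0

        text = "Mutation of Ser65 abolished binding, while Lys-48 was unaffected."
        predicted = extract_residues(text)
        gold = {("Ser", 65), ("Gly", 67)}    # hypothetical curated annotations
        print(predicted, "F1 =", round(f1_score(predicted, gold), 2))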
  • Item
    Connectivity, Coverage and Placement in Wireless Sensor Networks
    Li, J ; Andrew, LLH ; Foh, CH ; Zukerman, M ; Chen, H-H (MDPI, 2009-10)
    Wireless communication between sensors allows the formation of flexible sensor networks, which can be deployed rapidly over wide or inaccessible areas. However, the need to gather data from all sensors in the network imposes constraints on the distances between sensors. This survey describes the state of the art in techniques for determining the minimum density and optimal locations of relay nodes and ordinary sensors to ensure connectivity, subject to various degrees of uncertainty in the locations of the nodes.
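    One of the basic questions behind this survey, whether a given deployment is connected, can be sketched in a few lines of Python: sensors are dropped uniformly at random in a unit square, two sensors communicate when within radius r, and union-find checks whether the network forms a single component. The density and radius are arbitrary example values, not results from the survey.

        import random

        def find(parent, i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]   # path halving
                i = parent[i]
            return i

        def is_connected(points, r):
            parent = list(range(len(points)))
            r2 = r * r
            for i, (xi, yi) in enumerate(points):
                for j in range(i + 1, len(points)):
                    xj, yj = points[j]
                    if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                        parent[find(parent, i)] = find(parent, j)   # union
            return len({find(parent, i) for i in range(len(points))}) == 1

        random.seed(0)
        sensors = [(random.random(), random.random()) for _ in range(50)]
        print("connected at r = 0.25:", is_connected(sensors, 0.25))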
  • Item
    A voting approach to identify a small number of highly predictive genes using multiple classifiers
    Hassan, MR ; Hossain, MM ; Bailey, J ; Macintyre, G ; Ho, JWK ; Ramamohanarao, K (BMC, 2009-01-30)
    BACKGROUND: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. RESULTS: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. From a biological viewpoint, most of the genes in our sets are known from the literature to be strongly related to cancer. CONCLUSION: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
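    A minimal sketch of the voting idea, assuming scikit-learn and NumPy are available and using synthetic data in place of real expression profiles: several different classifiers are trained on single-gene features, a gene receives one vote per classifier whose cross-validated accuracy on that gene exceeds a threshold, and the top-voted genes form the compact set. The classifier choice and threshold are illustrative, not the paper's exact configuration.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        n_samples, n_genes = 60, 20
        X = rng.normal(size=(n_samples, n_genes))
        y = rng.integers(0, 2, size=n_samples)
        X[:, 3] += y * 1.5    # make genes 3 and 7 informative
        X[:, 7] += y * 1.2

        classifiers = [GaussianNB(), KNeighborsClassifier(3),
                       DecisionTreeClassifier(random_state=0)]
        votes = np.zeros(n_genes, dtype=int)
        for clf in classifiers:
            for g in range(n_genes):
                acc = cross_val_score(clf, X[:, [g]], y, cv=5).mean()
                if acc > 0.6:          # vote threshold (assumed)
                    votes[g] += 1

        top = np.argsort(votes)[::-1][:5]
        print("top-voted genes:", top, "with votes:", votes[top])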
  • Item
    Unveiling Hidden Unstructured Regions in Process Models
    Polyvyanyy, A ; Garcia-Banuelos, L ; Weske, M ; Meersman, R ; Dillon, T ; Herrero, P (SPRINGER-VERLAG BERLIN, 2009-01-01)
    Process models define allowed process execution scenarios. The models are usually depicted as directed graphs, with gateway nodes regulating the control flow routing logic and edges specifying the execution order constraints between tasks. While arbitrarily structured control flow patterns in process models complicate model analysis, they also permit creativity and full expressiveness when capturing non-trivial process scenarios. This paper gives a classification of arbitrarily structured process models based on a hierarchical process model decomposition technique. We identify a structural class of models consisting of block-structured patterns which, when combined, define complex execution scenarios spanning the individual patterns. We show that complex behavior can be localized by examining the structural relations of loops in hidden unstructured regions of control flow. The correctness of the behavior of process models within these regions can be validated in linear time. These observations allow us to suggest techniques for transforming hidden unstructured regions into block-structured ones.
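    One building block of the loop analysis above can be sketched directly: any non-trivial strongly connected component of the control-flow graph corresponds to a loop, so computing SCCs localizes the loop regions whose structural relations the paper examines. The tiny example graph and node names are made up; this is not the paper's algorithm, only the loop-detection step it presupposes.

        from collections import defaultdict

        def sccs(graph):
            """Kosaraju's algorithm: two depth-first passes."""
            visited, order = set(), []
            def dfs(u):
                visited.add(u)
                for v in graph.get(u, []):
                    if v not in visited:
                        dfs(v)
                order.append(u)
            for u in list(graph):
                if u not in visited:
                    dfs(u)
            rev = defaultdict(list)
            for u in graph:
                for v in graph[u]:
                    rev[v].append(u)
            visited, components = set(), []
            def dfs_rev(u, comp):
                visited.add(u)
                comp.append(u)
                for v in rev[u]:
                    if v not in visited:
                        dfs_rev(v, comp)
            for u in reversed(order):
                if u not in visited:
                    comp = []
                    dfs_rev(u, comp)
                    components.append(comp)
            return components

        graph = {
            "start": ["task_a"],
            "task_a": ["xor_split"],
            "xor_split": ["task_b", "end"],
            "task_b": ["task_c"],
            "task_c": ["xor_split"],   # back edge closing a loop
            "end": [],
        }
        print("loop regions:", [c for c in sccs(graph) if len(c) > 1])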
  • Item
    On application of structural decomposition for process model abstraction
    Polyvyanyy, A ; Smirnov, S ; Weske, M (Springer Verlag, 2009-12-01)
    Real-world business process models may consist of hundreds of elements and have a sophisticated structure. Although there are tasks for which such models are valuable and appreciated, in general complexity has a negative influence on model comprehension and analysis. Thus, means for managing the complexity of process models are needed. One approach is the abstraction of business process models: the creation of a process model which preserves the main features of the initial, elaborate process model but leaves out insignificant details. In this paper we study the structural aspects of process model abstraction and introduce an abstraction approach based on process structure trees (PST). The developed approach ensures that the abstracted process model preserves the ordering constraints of the initial model. It surpasses pattern-based process model abstraction approaches in that it can handle graph-structured process models of arbitrary structure. We also provide an evaluation of the proposed approach.
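    A minimal sketch of the abstraction step in Python, assuming the process structure tree is already available as nested nodes: any fragment whose size falls below a threshold is collapsed into a single aggregated task at the same position in the tree, which is what preserves the ordering constraints of the surrounding model. The tree shape and threshold are invented for illustration.

        from dataclasses import dataclass, field

        @dataclass
        class Node:
            label: str                     # "seq", "choice", or a task name
            children: list = field(default_factory=list)

        def size(node):
            return 1 + sum(size(c) for c in node.children)

        def abstract(node, threshold):
            """Collapse every fragment smaller than `threshold` into one task."""
            if node.children and size(node) <= threshold:
                return Node(f"aggregated({node.label})")
            return Node(node.label, [abstract(c, threshold) for c in node.children])

        pst = Node("seq", [
            Node("receive order"),
            Node("choice", [Node("check credit"), Node("check stock")]),
            Node("ship goods"),
        ])
        print(abstract(pst, threshold=3))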
  • Item
    Hypergraph-Based Modeling of Ad-Hoc Business Processes
    Polyvyanyy, A ; Weske, M ; Ardagna, D ; Mecella, M ; Yang, J (SPRINGER-VERLAG BERLIN, 2009-01-01)
    Process models are usually depicted as directed graphs, with nodes representing activities and directed edges representing control flow. While structured processes with pre-defined control flow have been studied in detail, flexible processes including ad-hoc activities need further investigation. This paper presents the flexible process graph, a novel approach to modeling processes in dynamic environments with adaptive process participant behavior. The approach allows execution constraints to be defined that are more restrictive than traditional ad-hoc processes and less restrictive than traditional control flow, thereby balancing structured control flow with unstructured ad-hoc activities. The flexible process graph focuses on what can be done to perform a process; participants' routing decisions are based on the current process state. As a formal grounding, the approach uses hypergraphs, in which each edge can associate any number of nodes, and these hypergraphs are used to define the execution semantics of processes formally. We provide a process scenario to motivate and illustrate the approach.
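    The hypergraph idea can be sketched briefly: unlike an ordinary directed edge, a hyperedge can associate any number of nodes, so one edge can express a constraint over a whole group of activities. The enabling rule below, that an activity may start once every other activity in one of its hyperedges has completed, is a deliberately simplified stand-in for the paper's formal execution semantics, and the scenario is invented.

        class FlexibleProcessGraph:
            def __init__(self, activities, hyperedges):
                self.activities = set(activities)
                self.hyperedges = [frozenset(e) for e in hyperedges]

            def enabled(self, completed):
                """Activities a participant may choose in the current state."""
                done = set(completed)
                return {a for a in self.activities - done
                        if any(a in e and e - {a} <= done
                               for e in self.hyperedges)}

        fpg = FlexibleProcessGraph(
            activities=["collect data", "review", "approve", "publish"],
            hyperedges=[{"collect data"},                 # may start immediately
                        {"collect data", "review"},       # review after collection
                        {"collect data", "review", "approve"},
                        {"approve", "publish"}])
        print(fpg.enabled(completed=[]))                  # {'collect data'}
        print(fpg.enabled(completed=["collect data"]))    # {'review'}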
  • Item
    The Triconnected Abstraction of Process Models
    Polyvyanyy, A ; Smirnov, S ; Weske, M ; Dayal, U ; Eder, J ; Koehler, J ; Reijers, HA (SPRINGER-VERLAG BERLIN, 2009-01-01)
    Companies use business process models to represent their working procedures in order to deploy services to markets, to analyze them, and to improve them. Competitive markets necessitate complex procedures, which lead to large process specifications with sophisticated structures. Real-world process models often incorporate hundreds of modeling constructs. While a large degree of detail complicates the comprehension of the processes, it is essential to many analysis tasks. This paper presents a technique to abstract, i.e., to simplify process models. Given a detailed model, we introduce abstraction rules which generalize process fragments in order to bring the model to a higher abstraction level. The approach is suited for the abstraction of large process specifications in order to aid model comprehension, as well as to decompose problems of process model analysis. The work is based on process structure trees, which have recently been introduced to the field of business process management.
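    One abstraction rule of the kind the paper introduces can be sketched as follows: a maximal chain of tasks, each with exactly one predecessor and one successor, is generalized into a single aggregated task, raising the model to a higher abstraction level. Real triconnected decomposition is considerably more involved; this only illustrates the effect of such a rule on an acyclic toy graph.

        from collections import defaultdict

        def aggregate_sequences(edges):
            """Collapse maximal linear task chains into single aggregated nodes."""
            succs, preds, nodes = defaultdict(list), defaultdict(list), set()
            for u, v in edges:
                succs[u].append(v)
                preds[v].append(u)
                nodes |= {u, v}
            chainable = {n for n in nodes
                         if len(succs[n]) == 1 and len(preds[n]) == 1}
            mapping = {}
            for n in sorted(nodes):
                if n in chainable and n not in mapping:
                    start = n
                    while preds[start][0] in chainable:   # walk to chain start
                        start = preds[start][0]
                    chain, cur = [start], start
                    while succs[cur][0] in chainable:     # collect the chain
                        cur = succs[cur][0]
                        chain.append(cur)
                    if len(chain) > 1:
                        name = "+".join(chain)
                        for m in chain:
                            mapping[m] = name
            renamed = {(mapping.get(u, u), mapping.get(v, v)) for u, v in edges}
            return {(u, v) for u, v in renamed if u != v}

        model = [("start", "a"), ("a", "b"), ("b", "c"), ("c", "end")]
        print(sorted(aggregate_sequences(model)))   # [('a+b+c', 'end'), ('start', 'a+b+c')]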