Computing and Information Systems - Research Publications

Search Results

Now showing 1 - 10 of 393
  • Item
    Towards a semantic lexicon for biological language processing
    Verspoor, K (HINDAWI LTD, 2005)
    This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
  • Item
    Assessment of disease named entity recognition on a corpus of annotated sentences
    Jimeno, A ; Jimenez-Ruiz, E ; Lee, V ; Gaudan, S ; Berlanga, R ; Rebholz-Schuhmann, D (BMC, 2008)
    BACKGROUND: In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, provided by the National Library of Medicine (NLM), is the state-of-the-art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. RESULTS: As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance.
CONCLUSIONS: The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found that dictionary look-up already provides competitive results indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall while our statistical method obtains better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two of the three methods, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary-based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
  • Item
    A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer
    Shi, F ; Leckie, C ; MacIntyre, G ; Haviv, I ; Boussioutas, A ; Kowalczyk, A (BMC, 2010-09-23)
    BACKGROUND: In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries. RESULTS: In this paper, we develop a robust and efficient method for exploratory analysis of microarray data, which produces a number of different orderings (rankings) of both genes and samples (reflecting correlation among those genes and samples). The core algorithm is closely related to biclustering, and so we first compare its performance with several existing biclustering algorithms on two real datasets - gastric cancer and lymphoma datasets. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples by using the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes (from the Gene Ontology). In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis reported in previous literature, while others are potentially novel discoveries. CONCLUSION: In conclusion, we have developed an effective and efficient method, Bi-Ordering Analysis, to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.
  • Item
    On separation of concurrency and conflicts in acyclic process models
    Elliger, F ; Polyvyanyy, A ; Weske, M (Köllen Druck, 2010-12-01)
    Recently, a new approach for structuring acyclic process models has been introduced. The algorithm is based on a transformation between the Refined Process Structure Tree (RPST) of a control flow graph and the Modular Decomposition Tree (MDT) of ordering relations. In this paper, an extension of the algorithm is presented that allows process models to be partially structured when they cannot be structured completely. We distinguish four different types of unstructuredness of process models and show that only two are possible in practice. For one of these two types of unstructuredness an algorithm is proposed that returns the maximally structured representation of a process model.
  • Item
    Unraveling Unstructured Process Models
    Dumas, M ; Garcia-Banuelos, L ; Polyvyanyy, A ; Mendling, J ; Weidlich, M ; Weske, M (SPRINGER-VERLAG BERLIN, 2010-01-01)
    A BPMN model is well-structured if splits and joins are always paired into single-entry-single-exit blocks. Well-structuredness is often a desirable property as it promotes readability and makes models easier to analyze. However, many process models found in practice are not well-structured, and it is not always feasible or even desirable to restrict process modelers to produce only well-structured models. Also, not all processes can be captured as well-structured process models. An alternative to forcing modelers to produce well-structured models is to automatically transform unstructured models into well-structured ones when needed and possible. This talk reviews existing results on automatic transformation of unstructured process models into structured ones.
  • Item
    Aggregate Quality of Service Computation for Composite Services
    Dumas, M ; Garcia-Banuelos, L ; Polyvyanyy, A ; Yang, Y ; Zhang, L ; Maglio, PP ; Weske, M ; Yang, J ; Fantinato, M (SPRINGER-VERLAG BERLIN, 2010-01-01)
    This paper addresses the problem of computing the aggregate QoS of a composite service given the QoS of the services participating in the composition. Previous solutions to this problem are restricted to composite services with well-structured orchestration models. Yet, in existing languages such as WS-BPEL and BPMN, orchestration models may be unstructured. This paper lifts this limitation by providing equations to compute the aggregate QoS for general types of irreducible unstructured regions in orchestration models. In conjunction with existing algorithms for decomposing business process models into single-entry-single-exit regions, these equations allow us to cover a larger set of orchestration models than existing QoS aggregation techniques.
  • Item
    The Biconnected Verification of Workflow Nets
    Polyvyanyy, A ; Weidlich, M ; Weske, M ; Meersman, R ; Dillon, T ; Herrero, P (SPRINGER-VERLAG BERLIN, 2010-01-01)
    Formal representations of business processes are used for analysis of the process behavior. Workflow nets are a widely used formalism for describing the behavior of business processes. Structure theory of processes investigates the relation between the structure of a model and its behavior. In this paper, we propose to employ the connectivity property of workflow nets as an angle to their structural analysis. In particular, we show how soundness verification can be organized using biconnected components of a workflow net. This allows for efficient identification and localization of flaws in the behavior of workflow nets and for supporting process analysts with diagnostic information.
  • Item
    Structuring Acyclic Process Models
    Polyvyanyy, A ; Garcia-Banuelos, L ; Dumas, M ; Hull, R ; Mendling, J ; Tai, S (SPRINGER-VERLAG BERLIN, 2010-01-01)
    This paper addresses the problem of transforming a process model with an arbitrary topology into an equivalent well-structured process model. While this problem has received significant attention, there is still no full characterization of the class of unstructured process models that can be transformed into well-structured ones, nor an automated method to structure any process model that belongs to this class. This paper fills this gap in the context of acyclic process models. The paper defines a necessary and sufficient condition for an unstructured process model to have an equivalent structured model under fully concurrent bisimulation, as well as a complete structuring method.
  • Item
    Process Compliance Measurement Based on Behavioural Profiles
    Weidlich, M ; Polyvyanyy, A ; Desai, N ; Mendling, J ; Pernici, B (SPRINGER-VERLAG BERLIN, 2010-01-01)
    Process compliance measurement is receiving increasing attention in companies due to stricter legal requirements and market pressure for operational excellence. However, metrics to quantify process compliance have only recently been defined. A major criticism points to the fact that existing measures appear to be unintuitive. In this paper, we trace back this problem to a more foundational question: which notion of behavioural equivalence is appropriate for discussing compliance? We present a quantification approach based on behavioural profiles, which is a process abstraction mechanism. Behavioural profiles can be regarded as weaker than existing equivalence notions like trace equivalence, and they can be calculated efficiently. As a validation, we present an implementation that measures compliance of logs against a normative process model. This implementation is being evaluated in a case study with an international service provider.
  • Item
    Structural abstraction of process specifications
    Polyvyanyy, A (CEUR-WS.org, 2010-12-01)
    Software engineers constantly deal with problems of designing, analyzing, and improving process specifications, e.g., source code, service compositions, or process models. Process specifications, which result from creative engineering practice, are abstractions of behavior observed in, or intended to be implemented in, reality. Usually, process specifications are formalized as directed graphs in which edges capture temporal relations between decisions, synchronization points, and work activities. Every process specification is a compromise between two points: on the one hand, engineers strive to operate with fewer modeling constructs, which conceal irrelevant details, while on the other hand, the details are required to achieve the desired level of customization for envisioned process scenarios. In our research, we approach the problem of varying abstraction levels of process specifications. Formally, the developed abstraction mechanisms exploit the structure of a process specification and allow the generalization of low-level details into concepts of a higher abstraction level. The reverse procedure can be addressed as process specialization.
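The Verspoor (2005) abstract reports coverage as the share of corpus tokens found in a lexicon built from terms shared by the UMLS SPECIALIST lexicon and the Metathesaurus. The following is a minimal sketch of that measurement. It assumes single-word terms supplied as plain strings and a naive regex tokenizer. The functions `build_lexicon` and `token_coverage` are hypothetical and not part of any UMLS tooling. The sketch ignores multi-word terms and the morphosyntactic and semantic information the paper attaches to each entry.

```python
import re
from typing import Iterable, Set


def build_lexicon(specialist_terms: Iterable[str], metathesaurus_terms: Iterable[str]) -> Set[str]:
    """Hypothetical helper: keep only terms present in both resources (case-insensitive)."""
    return {t.lower() for t in specialist_terms} & {t.lower() for t in metathesaurus_terms}


def token_coverage(text: str, lexicon: Set[str]) -> float:
    """Fraction of tokens in `text` that are found in `lexicon`."""
    # Naive tokenizer (an assumption): lowercase alphanumeric runs, allowing internal hyphens.
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)
```

For example, with toy term lists `{"gene", "protein", "binds", "kinase"}` and `{"Gene", "Protein", "Kinase", "receptor"}`, the intersection is `{"gene", "protein", "kinase"}`. The sentence "The kinase binds the protein" then has a coverage of 2 out of 5 tokens. These are illustrative strings, not UMLS data.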
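The disease NER abstract (Jimeno et al., 2008) quantifies inter-curator disagreement as 0.51 "in the kappa-statistic". The following is a minimal sketch of Cohen's kappa for two annotators who label the same items. It assumes that this is the kappa variant meant and that each item receives a single categorical label. The abstract does not state the unit of annotation, and the labels below are toy values, not corpus data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators always use the same single label
    return (p_o - p_e) / (1 - p_e)
```

With toy labels `DDOO` and `DOOO` (D = disease, O = other), observed agreement is 0.75 and chance agreement is 0.5, which gives a kappa of 0.5.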
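The Bi-Ordering Analysis abstract (Shi et al., 2010) tests the sample orderings against the histological classification with the Jonckheere trend test. The following is a minimal sketch of the Jonckheere-Terpstra statistic and its normal approximation without tie correction. It assumes the input is each sample's position in an ordering, grouped by class and listed in the hypothesised class order. How the paper maps its orderings onto the test is not stated in the abstract, so this mapping is an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def jonckheere(groups: List[Sequence[float]]) -> Tuple[float, float]:
    """Jonckheere-Terpstra J statistic and normal-approximation z-score.

    `groups` must be listed in the hypothesised increasing order.
    """
    j = 0.0
    # Every ordered pair of groups (earlier, later): count observations that increase.
    for earlier, later in combinations(groups, 2):
        for x in earlier:
            for y in later:
                if x < y:
                    j += 1.0
                elif x == y:
                    j += 0.5  # ties count half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    # Null variance without tie correction.
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    return j, (j - mean) / math.sqrt(var)
```

For three perfectly ordered toy groups `[[1, 2], [3, 4], [5, 6]]`, J equals its maximum of 12, and z is about 2.38. A large positive z supports an increasing trend across the ordered classes.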
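The QoS aggregation abstract (Dumas et al., 2010) notes that earlier solutions handle only well-structured orchestration models. The following sketch shows that structured baseline, not the paper's equations for unstructured regions. It aggregates response time, cost, and reliability recursively over a tree of single-entry-single-exit blocks. It uses the common rules: sequences add times and costs, parallel (AND) blocks take the slowest branch, and exclusive (XOR) blocks take the probability-weighted expectation. The `Task`/`Block` types and the choice of three QoS dimensions are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Task:
    time: float
    cost: float
    reliability: float  # probability of successful execution


@dataclass
class Block:
    kind: str  # "seq", "and", or "xor"
    children: List["Node"]
    probs: List[float] = field(default_factory=list)  # branch probabilities, used only by "xor"


Node = Union[Task, Block]


def aggregate(node: Node) -> Tuple[float, float, float]:
    """Return (time, cost, reliability) for a well-structured block tree."""
    if isinstance(node, Task):
        return node.time, node.cost, node.reliability
    times, costs, rels = zip(*(aggregate(c) for c in node.children))
    if node.kind == "seq":
        return sum(times), sum(costs), math.prod(rels)
    if node.kind == "and":
        # Parallel branches: wait for the slowest; pay for and depend on all of them.
        return max(times), sum(costs), math.prod(rels)
    if node.kind == "xor":
        # Exclusive choice: expected value over the branch probabilities.
        if len(node.probs) != len(times) or abs(sum(node.probs) - 1.0) > 1e-9:
            raise ValueError("xor needs one probability per branch, summing to 1")
        return (
            sum(p * t for p, t in zip(node.probs, times)),
            sum(p * c for p, c in zip(node.probs, costs)),
            sum(p * r for p, r in zip(node.probs, rels)),
        )
    raise ValueError(f"unknown block kind: {node.kind}")
```

Consider a toy sequence of a task (time 2), an AND block (times 3 and 5), and an XOR block (times 1 and 4 with probabilities 0.75 and 0.25). It yields an aggregate time of 2 + 5 + 1.75 = 8.75. The sketch has no way to represent the irreducible unstructured regions that the paper addresses.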
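The workflow-net abstract (Polyvyanyy et al., 2010) organizes soundness verification around the biconnected components of a workflow net. The following sketch covers only the decomposition step. It applies the standard Hopcroft-Tarjan edge-stack algorithm to the net viewed as a simple undirected graph of places and transitions. The soundness checks that the paper then performs per component are not reproduced, and the node names are hypothetical. The recursive DFS is meant for small illustrative nets.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def biconnected_components(edges: List[Edge]) -> Tuple[List[Set[Edge]], Set[str]]:
    """Hopcroft-Tarjan: split a simple undirected graph into biconnected components.

    Returns (components, articulation_points); each component is a set of edges.
    """
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    disc: Dict[str, int] = {}  # DFS discovery time of each node
    low: Dict[str, int] = {}   # earliest discovery time reachable from the node's subtree
    stack: List[Edge] = []     # edges of components still being assembled
    components: List[Set[Edge]] = []
    articulation: Set[str] = set()
    timer = 0

    def dfs(u: str, parent: Optional[str]) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree from the rest: pop one component.
                    if parent is not None or children > 1:
                        articulation.add(u)
                    comp: Set[Edge] = set()
                    while True:
                        e = stack.pop()
                        comp.add(e)
                        if e == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:
                # Back edge to an ancestor.
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return components, articulation
```

For a toy net `i - t1 - p1 - t2 - o` with a parallel branch `t1 - p2 - t2`, the sketch yields three components: the edge `i - t1`, the cycle through `t1, p1, t2, p2`, and the edge `t2 - o`. The articulation points are `t1` and `t2`.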
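The compliance abstract (Weidlich et al., 2010) builds on behavioural profiles, which classify each pair of activities as strictly ordered, exclusive, or interleaving. The following is a simplified sketch under loud assumptions. It derives the profile from an enumerated set of model traces rather than from the model itself, which the paper says can be done efficiently. It also scores a log trace with a hypothetical measure: the share of co-occurring activity pairs whose observed order the profile permits. This measure is not one of the paper's compliance metrics.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def weak_order(traces: List[Sequence[str]]) -> Set[Pair]:
    """Pairs (a, b) such that a occurs before b (not necessarily directly) in some trace."""
    rel: Set[Pair] = set()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                rel.add((a, b))
    return rel


def profile(traces: List[Sequence[str]]) -> Dict[Pair, str]:
    """Classify every activity pair as order, reverse, exclusive, or interleaving."""
    wo = weak_order(traces)
    activities = sorted({a for t in traces for a in t})
    prof: Dict[Pair, str] = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in wo, (b, a) in wo
            if ab and not ba:
                prof[(a, b)] = "order"
            elif ba and not ab:
                prof[(a, b)] = "reverse"
            elif not ab and not ba:
                prof[(a, b)] = "exclusive"
            else:
                prof[(a, b)] = "interleaving"
    return prof


def trace_compliance(trace: Sequence[str], model_profile: Dict[Pair, str]) -> float:
    """Hypothetical measure: share of activity pairs in `trace` whose order the model allows."""
    wo = weak_order([trace])
    pairs = list(combinations(sorted(set(trace)), 2))
    if not pairs:
        return 1.0
    ok = 0
    for a, b in pairs:
        allowed = model_profile.get((a, b))
        ab, ba = (a, b) in wo, (b, a) in wo
        if allowed == "interleaving":
            ok += 1
        elif allowed == "order":
            ok += int(ab and not ba)
        elif allowed == "reverse":
            ok += int(ba and not ab)
        # "exclusive" or unknown pair: co-occurrence itself counts as a violation.
    return ok / len(pairs)
```

With toy model traces `a b d` and `a c d`, activities `b` and `c` are exclusive. The log trace `a b c d` therefore complies on 5 of its 6 pairs. The trace `b a d` complies on 2 of 3, because it reverses the strict order of `a` and `b`.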