Computing and Information Systems - Research Publications

  • Item
    Distributed BLAST in a grid computing context
    Bayer, Micha ; Sinnott, Richard (Springer, 2005)
    The Basic Local Alignment Search Tool (BLAST) is one of the best known sequence comparison programs available in bioinformatics. It is used to compare query sequences to a set of target sequences, with the intention of finding similar sequences in the target set. Here, we present a distributed BLAST service which operates over a set of heterogeneous Grid resources and is made available through a Globus toolkit v.3 Grid service. This work has been carried out in the context of the BRIDGES project, a UK e-Science project aimed at providing a Grid based environment for biomedical research. Input consisting of multiple query sequences is partitioned into sub-jobs on the basis of the number of idle compute nodes available and then processed on these in batches. To achieve this, we have implemented our own Java-based scheduler which distributes sub-jobs across an array of resources utilizing a variety of local job scheduling systems.
  • Item
    Development of grid frameworks for clinical trials and epidemiological studies
    Sinnott, Richard ; Stell, Anthony ; Ajayi, Oluwafemi (IOS Press, 2006)
    E-Health initiatives such as electronic clinical trials and epidemiological studies require access to and usage of a range of both clinical and other data sets. Such data sets are typically only available over many heterogeneous domains where a plethora of often legacy-based or in-house/bespoke IT solutions exist. Considerable efforts and investments are being made across the UK to upgrade the IT infrastructures across the National Health Service (NHS), through initiatives such as the National Programme for IT in the NHS (NPfIT) [1]. However, independent and largely non-interoperable IT solutions currently exist across hospitals, trusts, disease registries and GP practices; this includes security as well as more general compute and data infrastructures. Grid technology allows issues of distribution and heterogeneity to be overcome; however, the clinical trials domain places special demands on security and data which the Grid community has not yet satisfactorily addressed. These challenges are often common across many studies and trials; hence the development of a re-usable framework for the creation and subsequent management of such infrastructures is highly desirable. In this paper we present the challenges in developing such a framework and outline initial scenarios and prototypes developed within the MRC funded Virtual Organisations for Trials and Epidemiological Studies (VOTES) project [2].
  • Item
    Single sign-on and authorization for dynamic virtual organizations
    Sinnott, R. O. ; Ajayi, O. ; Stell, A. J. ; Watt, J. ; Jiang, J. (Springer, 2006)
    The vision of the Grid is to support the dynamic establishment and subsequent management of virtual organizations (VO). Achieving this presents many challenges for the Grid community, with perhaps the greatest being security. Whilst Public Key Infrastructures (PKI) provide a form of single sign-on through recognition of trusted certification authorities, they have numerous limitations. The Internet2 Shibboleth architecture and protocols provide an enabling technology that overcomes some of the issues with PKIs; however, Shibboleth too suffers from various limitations that make its application for dynamic VO establishment and management difficult. In this paper we explore the limitations of PKIs and Shibboleth and present an infrastructure that incorporates single sign-on with advanced authorization of federated security infrastructures and yet is seamless and targeted to the needs of end users. We explore this infrastructure through an educational case study at the National e-Science Centre (NeSC) at the Universities of Glasgow and Edinburgh.
  • Item
    From access and integration to mining of secure genomic data sets across the grid
    Sinnott, Richard O. (Elsevier, 2007)
    The UK Department of Trade and Industry (DTI) funded BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services) has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity in access to and integration of a myriad of genomic data sets through simple Grid based tools. We outline these tools and how they are delivered to end-user scientists. We also describe how these tools are to be extended in the BBSRC funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to provide a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.
  • Item
    Security-oriented data grids for microarray expression profiles
    Sinnott, Richard ; Bayliss, Christopher ; Jiang, Jipu (IOS Press, 2007)
    Microarray experiments are one of the key ways in which gene activity can be identified and measured, thereby shedding light on, for example, biological processes. The BBSRC funded Grid enabled Microarray Expression Profile Search (GEMEPS) project has developed an infrastructure which allows post-genomic life science researchers to ask and answer the following questions: who has undertaken microarray experiments that are in some way similar or relevant to mine; and how similar were these relevant experiments? Given that microarray experiments are expensive to undertake and may possess crucial information for future exploitation (both academically and commercially), scientists are wary of allowing unrestricted access to their data by the wider community until it has been fully exploited locally. A key requirement is thus to have fine-grained security that is easy to establish and simple (or ideally transparent) to use across inter-institutional virtual organisations. In this paper we present an enhanced security-oriented data Grid infrastructure that supports the definition of these kinds of queries and the analysis and comparison of microarray experiment results.
  • Item
    Data curation standards and social science occupational information resources
    Lambert, Paul ; Gayle, Vernon ; Tan, Larry ; Turner, Ken ; Sinnott, Richard ; Prandy, Ken (UKOLN, University of Bath, 2007)
    Occupational information resources - data about the characteristics of different occupational positions - are widely used in the social sciences, across a range of disciplines and international contexts. They are available in many formats, most often constituting small electronic files that are made freely downloadable from academic web-pages. However, there are several challenges associated with how occupational information resources are distributed to, and exploited by, social researchers. In this paper we describe features of occupational information resources, and indicate the role digital curation can play in exploiting them. We report upon the strategies used in the GEODE research project (Grid Enabled Occupational Data Environment, http://www.geode.stir.ac.uk). This project attempts to develop long-term standards for the distribution of occupational information resources, by providing a standardized framework-based electronic depository for occupational information resources, and by providing a data indexing service, based on e-Science middleware, which collates occupational information resources and makes them readily accessible to non-specialist social scientists.
  • Item
    User-oriented security supporting inter-disciplinary life science research across the grid
    Sinnott, R ; Ajayi, O ; Jiang, J ; Stell, A ; Watt, J (Springer, 2007)
  • Item
    BioNessie: a grid enabled biochemical networks simulation environment
    Liu, Xuan ; Jiang, Jipu ; Ajayi, Oluwafemi ; Gu, Xu ; Gilbert, David ; Sinnott, Richard (IOS Press, 2008)
    The simulation of biochemical networks provides insight and understanding about the underlying biochemical processes and pathways used by cells and organisms. BioNessie is a biochemical network simulator which has been developed at the University of Glasgow. This paper describes the simulator and focuses in particular on how it has been extended to benefit from a wide variety of high performance compute resources across the UK through Grid technologies to support larger scale simulations.
  • Item
    The brain monitoring with information technology (BrainIT) collaborative network: EC feasibility study results
    Piper, Ian ; Chambers, Iain ; Citerio, Giuseppe ; Enblad, Per ; Gregson, Barbara ; Howells, Tim ; Kiening, Karl ; Mattern, Julia ; Nilsson, Pelle ; Ragauskas, Arminas ; Sahuquillo, Juan ; Donald, R. ; Sinnott, R. ; Stell, A. (Springer, 2009)
    BACKGROUND: The BrainIT group works collaboratively on developing standards for collection and analyses of data from brain injured patients towards providing a more efficient infrastructure for assessing new health care technology. EC funding supported meetings over a year to discuss and define a core dataset to be collected with IT based methods from patients with traumatic brain injury. We now report on the results of a follow-up period of funding to test the feasibility for collection of the core dataset with IT based methods. METHODS: Over a three year period, data collection client and web-server based tools were developed and core data (grouped into 9 categories) were collected from 200 head-injured patients by local nursing staff. Data were uploaded by the BrainIT web and random samples of received data were selected automatically by computer for validation by data validation (DV) research nurse staff against gold standard sources held in the local centre. Validated data were compared with original data sent and percentage error rates calculated by data category. Feasibility was assessed in terms of the amount of missing data, accuracy of data collected and limitations reported by users of the IT methods. FINDINGS: Thirteen percent of data files required cleaning. Thirty “one-off” demographic and clinical data elements had significant amounts of missing data (> 15%). Validation nurses conducted 19,461 comparisons between uploaded database data with local data sources and error rates were generally less than or equal to 6%, the exception being the surgery data class where an unacceptably high error rate was found. Nearly 10,000 therapies were successfully recorded with start-times but approximately a third had inaccurate or missing end times which limits analyses assessing duration of therapy. Over 40,000 events and procedures were recorded but events with long durations (such as transfers) were more likely to have “end-times” missed. 
    CONCLUSIONS: The BrainIT core dataset is a rich dataset for hypothesis generation and post-hoc analyses provided studies avoid known limitations in the dataset. Limitations in the current IT based data collection tools have been identified and have been addressed. Future academic led multi-centre data collection projects must decrease validation costs and likely will require more direct electronic access to hospital based clinical data sources for both validation purposes and for reducing the research nurse time needed for double data entry. This type of infrastructure will foster remote monitoring of patient management and protocol adherence in future trials of patient management and monitoring.
  • Item
    Enabling quantitative data analysis through e-infrastructures
    Tan, Koon Leai Larry ; Lambert, Paul S. ; Turner, Ken J. ; Blum, Jesse ; Gayle, Vernon ; Jones, Simon B. ; Sinnott, Richard O. ; Warner, Guy (Sage Publications, 2009)
    This article discusses how quantitative data analysis in the social sciences can engage with and exploit an e-Infrastructure. We highlight how a number of activities that are central to quantitative data analysis, referred to as “data management,” can benefit from e-Infrastructural support. We conclude by discussing how these issues are relevant to the Data Management through e-Social Science (DAMES) research Node, an ongoing project that aims to develop e-Infrastructural resources for quantitative data analysis in the social sciences.