Zoology - Research Publications

    A pipeline for analysis of published abstracts for information on protein-protein interactions
    Ling, Maurice H. T.; Lefevre, Christophe; Nicholas, Kevin R. (2006)
    A large volume of literature has been published on mouse intracellular protein-protein interactions. However, there have been few attempts to aggregate this information into a usable model or concept map of protein-protein interactions, such as protein-protein bindings and activations. We have established a process for the handling and analysis of published abstracts from PubMed to extract information on protein-protein interactions, using only open source software and tools. A Firebird database, Muscopedia, which forms the central point of this pipeline, is used to store the abstracts and their processed forms. Muscopedia is interfaced to the Python programming language through a Python DB-API compliant library, kinterbasdb. Abstracts were downloaded from PubMed using NCBI’s Simple Object Access Protocol (SOAP) server and scanned for abbreviations using the BioNLP server at Stanford University. Terms in the abstracts were substituted for their abbreviations, for example, the term “SOCS” is substituted for “suppressor of cytokine signaling”, before text processing using MontyLingua. MontyLingua is a natural language processing kit, which has ‘commonsense’ built in and is used to process each abstract into a list of subject-verb-object (SVO) structures. On average, each abstract is processed into 30 to 40 SVOs. Information on protein-protein interactions can then be extracted from this set of SVOs by using suitable verbs. This information will then be used to construct a concept map of protein-protein interactions.
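The final step described in the abstract, selecting SVOs by verb and assembling them into a concept map, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the verb list, and the sample triples are all hypothetical, and the SVOs are assumed to be already available as plain (subject, verb, object) tuples rather than produced by MontyLingua here.

```python
from collections import defaultdict

# Hypothetical verb list: the abstract names bindings and activations as
# examples but does not give the verbs actually used.
INTERACTION_VERBS = {"bind", "activate"}


def extract_interactions(svos, verbs=INTERACTION_VERBS):
    """Keep only the (subject, verb, object) triples whose verb is of interest."""
    return [(s, v, o) for (s, v, o) in svos if v in verbs]


def build_concept_map(interactions):
    """Group interactions as subject -> list of (verb, object) edges."""
    concept_map = defaultdict(list)
    for subject, verb, obj in interactions:
        concept_map[subject].append((verb, obj))
    return dict(concept_map)


# Made-up triples for illustration only; not taken from any abstract.
svos = [
    ("protein_a", "bind", "protein_b"),
    ("protein_a", "express", "tissue_x"),
    ("protein_b", "activate", "protein_c"),
]

concept_map = build_concept_map(extract_interactions(svos))
print(concept_map)
```

A dictionary of edge lists is used here only because it is the simplest structure that can be queried by protein name; the abstract does not say how the concept map is represented.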