A pipeline for analysis of published abstracts for information on protein-protein interactions
Author: Ling, Maurice H. T.; Lefevre, Christophe; Nicholas, Kevin R.
Source Title: Proceedings, 4th Asia-Pacific Bioinformatics Conference
Affiliation: Engineering: Department of Computer Science and Software Engineering
Science: Department of Zoology
Document Type: Conference Poster
Citations: Ling, M. H. T., Lefevre, C., & Nicholas, K. R. (2006). A pipeline for analysis of published abstracts for information on protein-protein interactions. In Proceedings, 4th Asia-Pacific Bioinformatics Conference, Taipei, Taiwan.
Access Status: Open Access
This is a poster of a paper from the 4th Asia-Pacific Bioinformatics Conference 2006, published by the Association of Asian Society for Bioinformatics. http://binfo.ym.edu.tw/apbc2006/.
A large volume of literature has been published on mouse intracellular protein-protein interactions. However, there have been few attempts to aggregate this information into a usable model or concept map of protein-protein interactions, such as protein-protein bindings and activations. We have established a process for the handling and analysis of published abstracts from PubMed to extract information on protein-protein interactions, using only open source software and tools. A Firebird database, Muscopedia, which forms the central point of this pipeline, is used to store the abstracts and their processed forms. Muscopedia is interfaced to the Python programming language through a Python DB-API compliant library, kinterbasdb. Abstracts were downloaded from PubMed using NCBI’s Simple Object Access Protocol (SOAP) server and scanned for abbreviations using the BioNLP server at Stanford University. Terms in the abstracts were substituted for their abbreviations (for example, the term “SOCS” is substituted for “suppressor of cytokine signaling”) before text processing using MontyLingua. MontyLingua is a natural language processing kit with built-in ‘commonsense’; it is used to process each abstract into a list of subject-verb-object (SVO) structures. On average, each abstract is processed into 30 to 40 SVOs. Information on protein-protein interactions can then be extracted from this set of SVOs by using suitable verbs. This information will then be used to construct a concept map of protein-protein interactions.
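The final extraction step described in the abstract, selecting SVOs by verb and assembling them into a concept map, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the verb list, the function names `extract_interactions` and `build_concept_map`, and the example triples are all hypothetical, and the triples are hand-written placeholders rather than MontyLingua output.

```python
from collections import defaultdict

# Hypothetical verb list; the abstract only says that "suitable verbs" are used.
INTERACTION_VERBS = {"bind", "activate"}


def extract_interactions(svos, verbs=INTERACTION_VERBS):
    """Keep only the (subject, verb, object) triples whose verb is in the verb set."""
    return [(s, v, o) for (s, v, o) in svos if v in verbs]


def build_concept_map(interactions):
    """Group interactions as subject -> list of (verb, object) edges."""
    concept_map = defaultdict(list)
    for subject, verb, obj in interactions:
        concept_map[subject].append((verb, obj))
    return dict(concept_map)


# Hand-written placeholder triples for illustration only (not real results).
example_svos = [
    ("SOCS", "bind", "JAK"),
    ("we", "examine", "expression"),
    ("STAT5", "activate", "SOCS"),
]

interactions = extract_interactions(example_svos)
concept_map = build_concept_map(interactions)
```

In this sketch the triple with the verb "examine" is dropped because it is not in the assumed verb set, and the remaining triples become edges keyed by their subject.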
Keywords: montylingua; text processing; natural language; parsing