Computing and Information Systems - Research Publications
Plasma Lipid Profiling Shows Similar Associations with Prediabetes and Type 2 Diabetes
(PUBLIC LIBRARY SCIENCE, 2013-09-27)
The relationship of lipid metabolism with prediabetes (impaired fasting glucose and impaired glucose tolerance) and type 2 diabetes mellitus is poorly defined. We hypothesized that a lipidomic analysis of plasma lipids might improve the understanding of this relationship. We performed lipidomic analysis measuring 259 individual lipid species, including sphingolipids, phospholipids, glycerolipids and cholesterol esters, on fasting plasma from 117 type 2 diabetes, 64 prediabetes and 170 normal glucose tolerant participants in the Australian Diabetes, Obesity and Lifestyle Study (AusDiab), then validated our findings on 1076 individuals from the San Antonio Family Heart Study (SAFHS). Logistic regression analysis identified associations with type 2 diabetes (135 lipids) and prediabetes (134 lipids), after adjusting for multiple covariates. In addition to the expected associations with diacylglycerol, triacylglycerol and cholesterol esters, type 2 diabetes and prediabetes were positively associated with ceramide, and its precursor dihydroceramide, along with phosphatidylethanolamine, phosphatidylglycerol and phosphatidylinositol. Significant negative associations were observed with the ether-linked phospholipids alkylphosphatidylcholine and alkenylphosphatidylcholine. Most of the significant associations in the AusDiab cohort (90%) were subsequently validated in the SAFHS cohort. The aberration of the plasma lipidome associated with type 2 diabetes is clearly present in prediabetes, prior to the onset of type 2 diabetes. Lipid classes and species associated with type 2 diabetes provide support for a number of existing paradigms of dyslipidemia and suggest new avenues of investigation.
Breaking the Ice and Forging Links: The Importance of Socializing in Research
(PUBLIC LIBRARY SCIENCE, 2013-11-01)
When meeting someone for the first time, whether another PhD student or the Founding Editor-in-Chief of PLOS Computational Biology, nothing breaks the ice like eating pancakes or having drinks together. A social atmosphere provides a relaxed, informal environment where people can connect, share ideas, and form collaborations. Being able to build a network and thrive in a social environment is crucial to a successful scientific career. This article highlights the importance of bringing together, in an informal setting, people who speak the same scientific language. Using examples of events held by Regional Student Groups of the ISCB's Student Council, this article shows that socializing is much more than simply sharing a drink.
Using simple agent-based modeling to inform and enhance neighborhood walkability
BACKGROUND: Pedestrian-friendly neighborhoods with proximal destinations and services encourage walking and decrease car dependence, thereby contributing to more active and healthier communities. Proximity to key destinations and services is an important aspect of the urban design decision making process, particularly in areas adopting a transit-oriented development (TOD) approach to urban planning, whereby densification occurs within walking distance of transit nodes. Modeling destination access within neighborhoods has been limited to circular catchment buffers or more sophisticated network-buffers generated using geoprocessing routines within geographical information systems (GIS). Both circular and network-buffer catchment methods are problematic. Circular catchment models do not account for street networks, thus do not allow exploratory 'what-if' scenario modeling; and network-buffering functionality typically exists within proprietary GIS software, which can be costly and requires a high level of expertise to operate. METHODS: This study sought to overcome these limitations by developing an open-source simple agent-based walkable catchment tool that can be used by researchers, urban designers, planners, and policy makers to test scenarios for improving neighborhood walkable catchments. A simplified version of an agent-based model was ported to a vector-based open source GIS web tool using data derived from the Australian Urban Research Infrastructure Network (AURIN). The tool was developed and tested with end-user stakeholder working group input. RESULTS: The resulting model has proven to be effective and flexible, allowing stakeholders to assess and optimize the walkability of neighborhood catchments around actual or potential nodes of interest (e.g., schools, public transport stops). Users can derive a range of metrics to compare different scenarios modeled. These include: catchment area versus circular buffer ratios; mean number of streets crossed; and modeling of different walking speeds and wait time at intersections. CONCLUSIONS: The tool has the capacity to influence planning and public health advocacy and practice, and by using open-access source software, it is available for use locally and internationally. There is also scope to extend this version of the tool from a simple to a complex model, which includes agents (i.e., simulated pedestrians) 'learning' and incorporating other environmental attributes that enhance walkability (e.g., residential density, mixed land use, traffic volume).
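As a rough illustration of the first metric mentioned in this abstract, a catchment-to-buffer ratio can be read as the walkable catchment area divided by the area of a circle of the same nominal radius. The sketch below is not taken from the tool itself: the function name `catchment_ratio`, the 800 m radius, and the example catchment area are all hypothetical.

```python
import math


def catchment_ratio(catchment_area_m2: float, radius_m: float) -> float:
    """Return walkable catchment area divided by the circular buffer area.

    Both the name and the inputs are illustrative assumptions; the abstract
    only states that such a ratio is one of the available metrics.
    """
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    circle_area_m2 = math.pi * radius_m ** 2
    return catchment_area_m2 / circle_area_m2


# Hypothetical scenario: a 1.0 km^2 walkable catchment inside an 800 m buffer.
ratio = catchment_ratio(1_000_000.0, 800.0)
print(f"catchment / buffer ratio: {ratio:.2f}")
```

A ratio close to 1 would mean the street network reaches almost everything inside the circular buffer; lower values indicate poorer connectivity.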
Violent extremist group ecologies under stress
(NATURE PUBLISHING GROUP, 2013-03-27)
Violent extremist groups are currently making intensive use of Internet fora for recruitment to terrorism. These fora are under constant scrutiny by security agencies, private vigilante groups, and hackers, who sometimes shut them down with cybernetic attacks. However, there is a lack of experimental and formal understanding of the recruitment dynamics of online extremist fora and the effect of strategies to control them. Here, we utilize data on ten extremist fora that we collected for four years to develop a data-driven mathematical model that is the first attempt to measure whether (and how) these external attacks induce extremist fora to self-regulate. The results suggest that an increase in the number of groups targeted for attack causes an exponential increase in the cost of enforcement and an exponential decrease in its effectiveness. Thus, a policy to occasionally attack large groups can be very efficient for limiting violent output from these fora.
Discovery and analysis of consistent active subnetworks in cancers
Gene expression profiles can show significant changes when genetically diseased cells are compared with non-diseased cells. Biological networks are often used to identify active subnetworks (ASNs) of the diseases from the expression profiles to understand the reason behind the observed changes. Current methodologies for discovering ASNs mostly use undirected PPI networks and node-centric approaches. This can limit their ability to find meaningful ASNs when using integrated networks that carry more comprehensive information than traditional protein-protein interaction networks. Using appropriate scoring functions to assess both genes and their interactions may allow the discovery of better ASNs. In this paper, we present CASNet, which aims to identify better ASNs using (i) integrated interaction networks (mixed graphs), (ii) directions of regulations of genes, and (iii) combined node and edge scores. We simplify and extend previous methodologies to incorporate edge evaluations and lessen their sensitivity to significance thresholds. We formulate our objective functions using mixed integer programming (MIP) and show that optimal solutions may be obtained. We compare the ASNs obtained by CASNet and other similar approaches to show that CASNet can often discover more meaningful and stable regulatory ASNs. Our analysis of a breast cancer dataset finds that the positive feedback loops across 7 genes, AR, ESR1, MYC, E2F2, PGR, BCL2 and CCND1, are conserved across the basal/triple negative subtypes in multiple datasets, which could potentially explain the aggressive nature of this cancer subtype. Furthermore, comparison of the basal subtype of breast cancer and the mesenchymal subtype of glioblastoma ASNs shows that an ASN in the vicinity of IL6 is conserved across the two subtypes. This result suggests that subtypes of different cancers can show molecular similarities, indicating that therapeutic approaches in different types of cancers may be shared.
BetaSearch: a new method for querying β-residue motifs.
(Springer Science and Business Media LLC, 2012-07-30)
BACKGROUND: Searching for structural motifs across known protein structures can be useful for identifying unrelated proteins with similar function and characterising secondary structures such as β-sheets. This is infeasible using conventional sequence alignment because linear protein sequences do not contain spatial information. β-residue motifs are β-sheet substructures that can be represented as graphs and queried using existing graph indexing methods; however, these approaches are designed for general graphs that do not incorporate the inherent structural constraints of β-sheets and require computationally expensive filtering and verification procedures. 3D substructure search methods, on the other hand, allow β-residue motifs to be queried in a three-dimensional context but at significant computational cost. FINDINGS: We developed a new method for querying β-residue motifs, called BetaSearch, which leverages the natural planar constraints of β-sheets by indexing them as 2D matrices, thus avoiding much of the computational complexity involved with structural and graph querying. BetaSearch exhibits faster filtering, verification, and overall query time than existing graph indexing approaches whilst producing comparable index sizes. Compared to 3D substructure search methods, BetaSearch achieves 33 and 240 times speedups over index-based and pairwise alignment-based approaches, respectively. Furthermore, we have presented case-studies to demonstrate its capability of motif matching in sequentially dissimilar proteins and described a method for using BetaSearch to predict β-strand pairing. CONCLUSIONS: We have demonstrated that BetaSearch is a fast method for querying substructure motifs. The improvements in speed over existing approaches make it useful for efficiently performing high-volume exploratory querying of possible protein substructural motifs or conformations.
BetaSearch was used to identify a nearly identical β-residue motif between an entirely synthetic (Top7) and a naturally-occurring protein (Charcot-Leyden crystal protein), as well as identifying structural similarities between biotin-binding domains of avidin, streptavidin and the lipocalin gamma subunit of human C8.
Annotating the biomedical literature for the human variome
(OXFORD UNIV PRESS, 2013-04-12)
This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
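To make the distinction between exact and relaxed boundary matching concrete, the following is a minimal sketch of a pairwise F-score between two annotators. It is not the evaluation code used for the corpus: the span representation, the overlap criterion used for relaxed matching, the entity type labels, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

# (start offset, end offset, entity type); this representation is assumed.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Relaxed match: same entity type and any character overlap."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_score(ann_a: List[Span], ann_b: List[Span], relaxed: bool = False) -> float:
    """F-score of annotator B measured against annotator A."""
    def matched(span: Span, others: List[Span]) -> bool:
        if relaxed:
            return any(overlaps(span, other) for other in others)
        return span in others  # exact match: identical offsets and type

    if not ann_a or not ann_b:
        return 0.0
    precision = sum(matched(b, ann_a) for b in ann_b) / len(ann_b)
    recall = sum(matched(a, ann_b) for a in ann_a) / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of the same sentence by two annotators.
annotator_a = [(0, 5, "gene"), (10, 18, "mutation"), (25, 30, "disease")]
annotator_b = [(0, 5, "gene"), (11, 18, "mutation"), (40, 45, "disease")]

exact = f_score(annotator_a, annotator_b)
relaxed = f_score(annotator_a, annotator_b, relaxed=True)
print(f"exact F = {exact:.2f}, relaxed F = {relaxed:.2f}")
```

In this toy case the second span differs by one character at its left boundary, so it counts as a disagreement under exact matching but as agreement once boundaries are relaxed, which is why the relaxed score is higher.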
EURO-WABB: an EU rare diseases registry for Wolfram syndrome, Alstrom syndrome and Bardet-Biedl syndrome
BACKGROUND: Wolfram, Alström and Bardet-Biedl (WABB) syndromes are rare diseases with overlapping features of multiple sensory and metabolic impairments, including diabetes mellitus, which have caused diagnostic confusion. There are as yet no specific treatments available, little or no access to well characterized cohorts of patients, and limited information on the natural history of the diseases. We aim to establish a Europe-wide registry for these diseases to inform patient care and research. METHODS: EURO-WABB is an international multicenter large-scale observational study capturing longitudinal clinical and outcome data for patients with WABB diagnoses. Three hundred participants will be recruited over 3 years from different sites throughout Europe. Comprehensive clinical, genetic and patient experience data will be collated into an anonymized disease registry. Data collection will be web-based, and forms part of the project's Virtual Research and Information Environment (VRIE). Participants who haven't undergone genetic diagnostic testing for their condition will be able to do so via the project. CONCLUSIONS: The registry data will be used to increase the understanding of the natural history of WABB diseases, to serve as an evidence base for clinical management, and to aid the identification of opportunities for intervention to stop or delay the progress of the disease. The detailed clinical characterisation will allow inclusion of patients into studies of novel treatment interventions, including targeted interventions in small scale open label studies; and enrolment into multi-national clinical trials. The registry will also support wider access to genetic testing, and encourage international collaborations for patient benefit.
BioLemmatizer: a lemmatization tool for morphological processing of biomedical text
BACKGROUND: The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research. RESULTS: In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through incorporation of several published lexical resources. It retrieves lemmas based on the use of a word lexicon, and defines a set of rules that transform a word to a lemma if it is not encountered in the lexicon. An innovative aspect of the BioLemmatizer is the use of a hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. The contribution of the BioLemmatizer to accuracy improvement of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system. CONCLUSIONS: The BioLemmatizer outperforms other tools when compared with eight existing lemmatizers. The BioLemmatizer is released as open source software and can be downloaded from http://biolemmatizer.sourceforge.net.
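The lexicon-then-rules idea with a hierarchical lookup can be sketched in a few lines. This is not the BioLemmatizer implementation or its API: the toy lexicon, the three-level fallback (exact tag, coarse tag category, any tag), and the suffix rules are hypothetical stand-ins chosen only to show how a lemma can still be recovered when the supplied Part-of-Speech tag is wrong.

```python
from typing import Dict, List, Tuple

# Toy lexicon keyed by (word, POS tag); the entries are illustrative only.
LEXICON: Dict[Tuple[str, str], str] = {
    ("phosphorylated", "VBN"): "phosphorylate",
    ("mice", "NNS"): "mouse",
    ("binding", "NN"): "binding",
    ("binding", "VBG"): "bind",
}

# Toy (suffix, replacement) rules, tried in order when the lexicon fails.
RULES: List[Tuple[str, str]] = [("ies", "y"), ("sses", "ss"), ("s", "")]


def lemmatize(word: str, pos: str) -> str:
    """Look up a lemma hierarchically, then fall back to suffix rules."""
    w = word.lower()
    # Level 1: exact (word, tag) entry.
    if (w, pos) in LEXICON:
        return LEXICON[(w, pos)]
    # Level 2: same coarse category, e.g. VBD falls back to any VB* entry.
    for (lex_word, lex_pos), lemma in LEXICON.items():
        if lex_word == w and lex_pos[:2] == pos[:2]:
            return lemma
    # Level 3: ignore the tag entirely (it may be inaccurate).
    for (lex_word, _), lemma in LEXICON.items():
        if lex_word == w:
            return lemma
    # Rules: only reached for words missing from the lexicon.
    for suffix, replacement in RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return w[: len(w) - len(suffix)] + replacement
    return w


print(lemmatize("mice", "VB"))         # wrong tag, recovered at level 3
print(lemmatize("antibodies", "NNS"))  # not in the lexicon, handled by rules
```

The ordering matters: a correct tag still selects the tag-specific lemma (for example "binding" as a verb form versus a noun), while an incorrect tag degrades gracefully instead of failing.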
Detecting modification of biomedical events using a deep parsing approach
BACKGROUND: This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. METHOD: To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. RESULTS: Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. CONCLUSIONS: Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.
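The shallow features described in this abstract can be illustrated with a small sketch that collects bag-of-words counts from a fixed window around an event trigger. This is an assumption-laden illustration rather than the authors' feature extractor: the function name, the `bow=` feature prefix, lowercasing, and the example sentence (extended from the phrase quoted in the abstract) are all hypothetical.

```python
from typing import Dict, List


def window_features(tokens: List[str], trigger_index: int, window: int = 3) -> Dict[str, int]:
    """Count the words within `window` tokens on either side of the trigger.

    The trigger token itself is excluded; word order inside the window is
    discarded, which is what makes this a bag-of-words representation.
    """
    if not 0 <= trigger_index < len(tokens):
        raise IndexError("trigger_index is outside the token list")
    start = max(0, trigger_index - window)
    end = min(len(tokens), trigger_index + window + 1)
    features: Dict[str, int] = {}
    for i in range(start, end):
        if i == trigger_index:
            continue
        key = "bow=" + tokens[i].lower()
        features[key] = features.get(key, 0) + 1
    return features


# Hypothetical sentence built around the phrase quoted in the abstract.
tokens = "analysis of IkappaBalpha phosphorylation was performed in T cells".split()
features = window_features(tokens, tokens.index("phosphorylation"), window=3)
print(sorted(features))
```

A dictionary like this could then be passed to any classifier that accepts sparse feature counts; the deep-parser features mentioned in the abstract would be added as further keys alongside these.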
Retinal Image Matching Using Hierarchical Vascular Features
(HINDAWI LTD, 2011-01-01)
We propose a method for retinal image matching that can be used for person identification or longitudinal patient studies. Vascular invariant features are extracted from the retinal image, and a feature vector is constructed for each of the vessel segments in the retinal blood vessels. The feature vectors are represented in a tree structure that maintains the vessel segments' actual hierarchical positions. Using these feature vectors, corresponding images are matched. The method identifies the same vessel in the corresponding images for comparing the desired feature(s). Initial results are encouraging and demonstrate that the proposed method is suitable for image matching and longitudinal patient studies.
Epigenetic Regulation of Cell Type-Specific Expression Patterns in the Human Mammary Epithelium
(PUBLIC LIBRARY SCIENCE, 2011-04-01)
Differentiation is an epigenetic program that involves the gradual loss of pluripotency and acquisition of cell type-specific features. Understanding these processes requires genome-wide analysis of epigenetic and gene expression profiles, which has been challenging in primary tissue samples due to the limited numbers of cells available. Here we describe the application of high-throughput sequencing technology for profiling histone and DNA methylation, as well as gene expression patterns of normal human mammary progenitor-enriched and luminal lineage-committed cells. We observed significant differences in histone H3 lysine 27 tri-methylation (H3K27me3) enrichment and DNA methylation of genes expressed in a cell type-specific manner, suggesting their regulation by epigenetic mechanisms and a dynamic interplay between the two processes that together define developmental potential. The technologies we developed and the epigenetically regulated genes we identified will accelerate the characterization of primary cell epigenomes and the dissection of human mammary epithelial lineage-commitment and luminal differentiation.