Computing and Information Systems - Research Publications

  • Item
    Tree-Based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair
    Beck, D ; Caseli, H (SBIC, 2013)
    Machine Learning paradigms have dominated recent research in Machine Translation. Current state-of-the-art approaches rely only on statistical methods that gather all necessary knowledge from parallel corpora. However, this lack of explicit linguistic knowledge makes them unable to model some linguistic phenomena. In this work, we focus on models that take into account syntactic information from the languages involved in the translation process. We follow a novel approach that preprocesses parallel corpora using syntactic parsers and uses translation models composed of Tree Transducers. We perform experiments with English and Brazilian Portuguese, providing the first known results in syntax-based Statistical Machine Translation for this language pair. These results show that this approach is better able to model phenomena like long-distance reordering, and they give directions for future improvements in building syntax-based translation models for this pair.
  • Item
    Plasma lipid profiling in a large population-based cohort
    Weir, JM ; Wong, G ; Barlow, CK ; Greeve, MA ; Kowalczyk, A ; Almasy, L ; Comuzzie, AG ; Mahaney, MC ; Jowett, JBM ; Shaw, J ; Curran, JE ; Blangero, J ; Meikle, PJ (ELSEVIER, 2013-10)
    We have performed plasma lipid profiling using liquid chromatography electrospray ionization tandem mass spectrometry on a population cohort of more than 1,000 individuals. From 10 μl of plasma we were able to acquire comparative measures of 312 lipids across 23 lipid classes and subclasses including sphingolipids, phospholipids, glycerolipids, and cholesterol esters (CEs) in 20 min. Using linear and logistic regression, we identified statistically significant associations of lipid classes, subclasses, and individual lipid species with anthropometric and physiological measures. In addition to the expected associations of CEs and triacylglycerol with age, sex, and body mass index (BMI), ceramide was significantly higher in males and was independently associated with age and BMI. Associations were also observed for sphingomyelin with age but this lipid subclass was lower in males. Lysophospholipids were associated with age and higher in males, but showed a strong negative association with BMI. Many of these lipids have previously been associated with chronic diseases including cardiovascular disease and may mediate the interactions of age, sex, and obesity with disease risk.
  • Item
    Abstract Interpretation over Non-Lattice Abstract Domains
    Gange, G ; Navas, JA ; Schachte, P ; Søndergaard, H ; Stuckey, PJ ; Logozzo, F ; Fahndrich, M (Springer, 2013)
    The classical theoretical framework for static analysis of programs is abstract interpretation. Much of the power and elegance of that framework rests on the assumption that an abstract domain is a lattice. Nonetheless, and for good reason, the literature on program analysis provides many examples of non-lattice domains, including non-convex numeric domains. The lack of domain structure, however, has negative consequences, both for the precision of program analysis and for the termination of standard Kleene iteration. In this paper we explore these consequences and present general remedies.
  • Item
    PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.
    Doan, S ; Lin, K-W ; Conway, M ; Ohno-Machado, L ; Hsieh, A ; Feupe, SF ; Garland, A ; Ross, MK ; Jiang, X ; Farzaneh, S ; Walker, R ; Alipanah, N ; Zhang, J ; Xu, H ; Kim, H-E (Oxford University Press (OUP), 2014)
    The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net.
  • Item
    Feasibility of using Clinical Element Models (CEM) to standardize phenotype variables in the database of genotypes and phenotypes (dbGaP).
    Lin, K-W ; Tharp, M ; Conway, M ; Hsieh, A ; Ross, M ; Kim, J ; Kim, H-E ; Raghava, GPS (Public Library of Science (PLoS), 2013)
    The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated from genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discoveries and to reduce cost and time for exploratory research. However, idiosyncrasies and inconsistencies in phenotype variable names are a major barrier to reusing these data. We addressed these challenges in standardizing phenotype variables by formalizing their descriptions using Clinical Element Models (CEM). Designed to represent clinical data, CEMs were highly expressive and thus were able to represent a majority (77.5%) of the 215 phenotype variable descriptions. However, their high expressivity also made it difficult to directly apply them to research data such as phenotype variables in dbGaP. Our study suggested that simplification of the template models makes it more straightforward to formally represent the key semantics of phenotype variables.
  • Item
    A proximity-aware load balancing in peer-to-peer-based volunteer computing systems
    Ghafarian, T ; Deldari, H ; Javadi, B ; Buyya, R (SPRINGER, 2013-08)
  • Item
    A time decoupling approach for studying forum dynamics
    Kan, A ; Chan, J ; Hayes, C ; Hogan, B ; Bailey, J ; Leckie, C (SPRINGER, 2013-11)
  • Item
    Agent-Based Simulation of Holocene Monsoon Precipitation Patterns and Hunter-Gatherer Population Dynamics in Semi-arid Environments
    Balbo, AL ; Rubio-Campillo, X ; Rondelli, B ; Ramirez, M ; Lancelotti, C ; Torrano, A ; Salpeteur, M ; Lipovetzky, N ; Reyes-Garcia, V ; Montanola, C ; Madella, M (SPRINGER, 2014-06)
  • Item
    An enhanced XCS rule discovery module using feature ranking
    Abedini, M ; Kirley, M (SPRINGER HEIDELBERG, 2013-06)