Computing and Information Systems - Research Publications

  • Item
    Tree-based statistical machine translation: experiments with the English and Brazilian Portuguese pair
    Beck, D ; Caseli, H (SBIC, 2013)
    Machine Learning paradigms have dominated recent research in Machine Translation. Current state-of-the-art approaches rely only on statistical methods that gather all necessary knowledge from parallel corpora. However, this lack of explicit linguistic knowledge makes them unable to model some linguistic phenomena. In this work, we focus on models that take into account the syntactic information from the languages involved in the translation process. We follow a novel approach that preprocesses parallel corpora using syntactic parsers and uses translation models composed of Tree Transducers. We perform experiments with English and Brazilian Portuguese, providing the first known results in syntax-based Statistical Machine Translation for this language pair. These results show that this approach is able to better model phenomena like long-distance reordering and give directions for future improvements in building syntax-based translation models for this pair.
  • Item
    Plasma lipid profiling in a large population-based cohort
    Weir, JM ; Wong, G ; Barlow, CK ; Greeve, MA ; Kowalczyk, A ; Almasy, L ; Comuzzie, AG ; Mahaney, MC ; Jowett, JBM ; Shaw, J ; Curran, JE ; Blangero, J ; Meikle, PJ (ELSEVIER, 2013-10)
    We have performed plasma lipid profiling using liquid chromatography electrospray ionization tandem mass spectrometry on a population cohort of more than 1,000 individuals. From 10 μl of plasma we were able to acquire comparative measures of 312 lipids across 23 lipid classes and subclasses including sphingolipids, phospholipids, glycerolipids, and cholesterol esters (CEs) in 20 min. Using linear and logistic regression, we identified statistically significant associations of lipid classes, subclasses, and individual lipid species with anthropometric and physiological measures. In addition to the expected associations of CEs and triacylglycerol with age, sex, and body mass index (BMI), ceramide was significantly higher in males and was independently associated with age and BMI. Associations were also observed for sphingomyelin with age but this lipid subclass was lower in males. Lysophospholipids were associated with age and higher in males, but showed a strong negative association with BMI. Many of these lipids have previously been associated with chronic diseases including cardiovascular disease and may mediate the interactions of age, sex, and obesity with disease risk.
  • Item
    Abstract Interpretation over Non-Lattice Abstract Domains
    Gange, G ; Navas, JA ; Schachte, P ; Søndergaard, H ; Stuckey, PJ ; Logozzo, F ; Fahndrich, M (Springer, 2013)
    The classical theoretical framework for static analysis of programs is abstract interpretation. Much of the power and elegance of that framework rests on the assumption that an abstract domain is a lattice. Nonetheless, and for good reason, the literature on program analysis provides many examples of non-lattice domains, including non-convex numeric domains. The lack of domain structure, however, has negative consequences, both for the precision of program analysis and for the termination of standard Kleene iteration. In this paper we explore these consequences and present general remedies.
  • Item
    Feasibility of using Clinical Element Models (CEM) to standardize phenotype variables in the database of genotypes and phenotypes (dbGaP).
    Lin, K-W ; Tharp, M ; Conway, M ; Hsieh, A ; Ross, M ; Kim, J ; Kim, H-E ; Raghava, GPS (Public Library of Science (PLoS), 2013)
    The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated from genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discoveries and to reduce cost and time for exploratory research. However, idiosyncrasies and inconsistencies in phenotype variable names are a major barrier to reusing these data. We addressed these challenges in standardizing phenotype variables by formalizing their descriptions using Clinical Element Models (CEM). Designed to represent clinical data, CEMs were highly expressive and thus were able to represent a majority (77.5%) of the 215 phenotype variable descriptions. However, their high expressivity also made it difficult to directly apply them to research data such as phenotype variables in dbGaP. Our study suggested that simplification of the template models makes it more straightforward to formally represent the key semantics of phenotype variables.
  • Item
    A highly optimized algorithm for continuous intersection join queries over moving objects
    Zhang, R ; Qi, J ; Lin, D ; Wang, W ; Wong, RC-W (SPRINGER, 2012-08)
  • Item
    A proximity-aware load balancing in peer-to-peer-based volunteer computing systems
    Ghafarian, T ; Deldari, H ; Javadi, B ; Buyya, R (SPRINGER, 2013-08)
  • Item
    A time decoupling approach for studying forum dynamics
    Kan, A ; Chan, J ; Hayes, C ; Hogan, B ; Bailey, J ; Leckie, C (SPRINGER, 2013-11)
  • Item
    An enhanced XCS rule discovery module using feature ranking
    Abedini, M ; Kirley, M (SPRINGER HEIDELBERG, 2013-06)
  • Item
    Automatic keyphrase extraction from scientific articles
    Kim, SN ; Medelyan, O ; Kan, M-Y ; Baldwin, T (SPRINGER, 2013-09)
  • Item
    Conservative scales in packing problems
    Belov, G ; Kartak, VM ; Rohling, H ; Scheithauer, G (SPRINGER, 2013-03)