Computing and Information Systems - Theses

  • Item
    Machine Learning Models for Vaccine Development and Immunotherapy
    Moreira da Silva, Bruna ( 2023-11)
    Therapeutic antibodies offer exceptional specificity and affinity to detect and eliminate antigens, making them valuable as therapeutics and in diagnostics. Antigen recognition and neutralisation are based on efficient binding to epitopes, the antigen regions recognised by antibodies that elicit an immune response. The identification and mapping of epitopes, however, still depend on resource-intensive experimental techniques that do not scale adequately given the vast search space and diversity of antigens. Epitope identification and prioritisation is a cornerstone of immunotherapies, antibody design, and vaccine development. Computational approaches have made consistent progress in improving in silico epitope prediction at scale, driven over the past decade largely by machine learning algorithms. Yet predictive power remains low, and data sets remain skewed towards specific pathogens. This thesis focused on better exploiting publicly available experimental antibody-antigen data, improving the modelling and identification of distinguishing epitope features that derive meaningful biological insights. On this basis, I have curated high-quality data from multiple resources, resulting in large-scale and non-redundant epitope data sets. I also proposed novel featurisation techniques grounded in graph-based approaches to model and discriminate epitopes from the remainder of the antigen surface, which were demonstrated to differentiate the two classes. In addition, I have leveraged machine learning algorithms and data analysis for better predictive and explainable models, which have been translated and made available as easy-to-use web servers with Application Programming Interfaces for programmatic access and integration into Bioinformatics pipelines. 
By exploring these advanced computational methods, this thesis significantly contributes to improving the prediction of B-cell epitopes, leading to a better understanding of antibody targets, which I believe will facilitate the ongoing development of therapeutics and diagnostics.
  • Item
    Detection and Analysis of Climate Change Scepticism
    Bhatia, Shraey ( 2024-01)
    Climate change, predominantly driven by human activities, poses a threat through effects like rising sea levels, melting ice caps, extreme droughts, and species extinction. The IPCC’s 5th and 6th reports highlight the urgency of limiting global warming, with the latter projecting a concerning 1.5 °C rise by 2040. Despite scientific consensus, the digital sphere is inundated with content that fuels scepticism, often sponsored by specific lobby groups. These articles, under the umbrella term of climate change scepticism (CCS), weave a blend of misinformation, propaganda, hoaxes and sensationalism, undermining collective climate action. This thesis aims to offer strategies to address this misleading narrative. In this thesis we probe CCS through four dimensions: (1) understanding the underlying themes in the data, (2) detecting CCS articles, (3) understanding and detecting the framing and neutralization tactics used to construct CCS narratives, and (4) fact-checking the veracity of claims, elucidating reasons for potential inaccuracies. A notable challenge in addressing these tasks is the limited availability of data. Throughout this thesis, we leverage advancements in natural language processing (NLP) to mitigate this. Pre-trained language models (PLMs) and their scaled counterparts, large language models (LLMs), have revolutionized our capacity to comprehend and generate text that mirrors human language. These models, adept at learning real-world knowledge and semantics from extensive datasets, prove extraordinarily effective over a diverse range of language tasks. Topic models distil document collections into key themes, represented by groups of words or “topics”, without the need for human labelling or any a priori notion of the collection’s content. 
In essence, they offer a means of exposing underlying themes in the documents. Each document typically aligns with one or several themes, but capturing the essence of the collection’s context remains a challenge. In this thesis, we introduce methods that enhance the quality of topic outputs to better mirror the context of document collections. For the detection of CCS articles, no dataset was available in this domain. We bridge this gap by scraping and compiling a dataset of articles known to exhibit climate change scepticism. By extending the training of PLMs on this dataset, we enhance their ability to discern stylistic and linguistic elements of CCS, which allows the models not only to distinguish between CCS and non-CCS articles but also to highlight misleading spans indicative of scepticism. To delve deeper into the intricacies of CCS narratives, we must analyze their argumentative framing. We do this by identifying framing and neutralization techniques, which we cast as a multi-task classification problem. We propose an annotation task and collect human judgements. Given that data collection can be resource-intensive, we leverage unlabelled data in a semi-supervised setting, achieving substantial performance gains. Finally, we dive into the task of explanation generation to detail the reasons behind a claim’s inaccuracies. Using LLMs in a retrieval-augmented approach, we connect the LLM to an external knowledge source, such as peer-reviewed papers, via a retriever. This retriever fetches pertinent “facts” related to the claim, enabling the LLM to both verify and explain the claim grounded in these facts. LLMs are prone to generating ungrounded information, commonly referred to as “hallucinations”. We investigate approaches to detect such inaccuracies, then introduce methods to reduce these hallucinations, and finally employ LLM-based evaluations to assess the quality of the produced content.
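The retrieval step of such a retrieval-augmented approach can be sketched in a few lines. This is an illustrative toy of my own, not the thesis system: the corpus, claim, and lexical-overlap scorer are all invented here, whereas the thesis would use a trained retriever over peer-reviewed sources.

```python
# Toy retrieval step for retrieval-augmented fact-checking (illustrative
# only; the corpus, claim, and overlap scorer are invented for this sketch).

def overlap_score(claim, fact):
    """Jaccard word overlap between a claim and a candidate fact."""
    c, f = set(claim.lower().split()), set(fact.lower().split())
    return len(c & f) / len(c | f)

def retrieve(claim, corpus, k=2):
    """Return the k facts most lexically similar to the claim."""
    return sorted(corpus, key=lambda f: overlap_score(claim, f), reverse=True)[:k]

corpus = [
    "global mean temperature has risen since pre-industrial times",
    "arctic sea ice extent has declined in recent decades",
    "solar output variations cannot explain recent warming",
]
# The retrieved facts would then be passed to the LLM as grounding context.
facts = retrieve("recent warming is explained by solar output", corpus)
```

A real retriever would use dense embeddings rather than word overlap, but the contract is the same: claim in, ranked evidence out.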
  • Item
    Developing Delirium Prediction Models in Clinical Settings: Integrating Machine Learning and Natural Language Processing with Patient Data, Clinical Notes, and Antipsychotics Data
    Amjad, Sobia ( 2023-12)
    This thesis details the effort to improve delirium identification. Delirium is an acute confusional state affecting 11% to 42% of hospitalised patients and 60% to 80% of elderly patients. It is a serious, frequently underdiagnosed neuropsychiatric syndrome, resulting in extended hospital stays, higher mortality rates and increased health care costs. Furthermore, because it is a cognitive condition, it lacks a clear definition, leading to a high degree of overlap with other neuropsychiatric disorders. Electronic health record (EHR) data has significantly advanced health research, offering automated tools for predictive analysis. The objective is to leverage EHRs to develop machine learning (ML) models, including those employing natural language processing (NLP) techniques, for delirium prediction. The ML model assesses whether a patient is likely to receive an ICD-10 code for delirium within 24 to 48 hours of hospital presentation, encouraging timely interventions for patients at risk in the wards. The data collected from tertiary-level hospitals is used to explore the key features that contribute to the performance of predictive models. The study focuses on optimising early delirium prediction, thereby refining the diagnostic process for delirium and enhancing both the timeliness and accuracy of results. The first study, "Machine Learning-Based Delirium Prediction in Hospitalized Patients Using Routinely Available Administrative and Clinical Data," developed models using data from two hospitals, demonstrating the generalizability of our predictive models. This study worked with data from around 200,000 patients. The challenge was that delirium-positive cases were rare, at less than 5%. Thus, we directed our research towards using the language in clinical notes to identify delirium. The second study, "Advancing Delirium Classification: A Clinical Notes-based Natural Language Processing-Supported Machine Learning Model," applied NLP to classify delirium at the clinical-notes level. 
This method allowed us to explore delirium-suggestive words within the notes, thus enhancing classification accuracy beyond traditional dictionary-based approaches. After applying these methods to structured clinical data and notes, we applied them to patient-level delirium prediction. We also incorporated antipsychotics data into our prediction model in the third study, entitled "EHR-Based Delirium Prediction: A Unified Data-driven Model with Clinical Data, Notes, and Antipsychotics". However, just as we faced the challenge of rare delirium cases in our first study, here we dealt with limited data on antipsychotics and notes. Despite these challenges, we managed to navigate the complex nature of healthcare data. We developed a model that leverages all available data from the initial 24 to 48 hours of hospital presentation, which delivered promising results. Our study also looked carefully at how to use drug-related information without letting it bias our delirium predictions. Classification techniques included Logistic Regression, Extreme Gradient Boosting, Support Vector Machines, and Random Forests. The logistic regression model is discussed in the main studies due to its common use in medical research and its acceptance within the broader artificial intelligence community. We employed visualizations to explain predictions and highlight the transparency of our predictive models. Moreover, we presented visual graphs for each study to explain the models and demonstrate their reliability for delirium prediction, to gain the trust of healthcare professionals.
  • Item
    Planning and Goal Recognition in Humans and Machines
    Zhang, Chenyuan ( 2023-12)
    The rapid advancement of artificial intelligence, exemplified by systems such as AlphaGo and large language models, has great potential to contribute to the development of human-like intelligence. However, fundamental differences exist between the underlying mechanisms of these systems and those of biological organisms. For instance, humans can achieve impressive performance with limited data and computing resources, while existing algorithms often require significant amounts of data and computing power for real-time operation. One of the reasons for this disparity is the human ability to plan in a model-based sense, making computational models that capture human planning behavior valuable for bridging the gap between existing AI systems and human-like intelligence. This thesis explores the effectiveness of planning algorithms in modeling human behavior. Existing literature often overlooks timing information, so I develop a novel tree-based model that aims to capture both human action selection and human reaction times. The thesis also introduces a timing-sensitive goal recognition framework that incorporates timing information, and uses this framework to model human goal inference. My findings indicate that a Bayesian framework that combines a prior based on goal difficulty with a likelihood derived from an online planner accurately predicts human goal inference. This thesis underscores the promise of planning algorithms in mimicking human behavior and their utility in human-robot collaboration. More generally, it suggests that planning algorithms have an important role to play in advancing human-like intelligence.
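The Bayesian goal recognition framework described above can be sketched as follows. This is an illustration only: the goal names, prior values, and likelihood values are hypothetical, chosen to show how a difficulty-based prior combines with a planner-derived likelihood, not taken from the thesis experiments.

```python
# Sketch of Bayesian goal inference (illustrative; the priors and
# likelihoods below are invented, not the thesis's fitted values).

def posterior_over_goals(priors, likelihoods):
    """Normalise prior * likelihood into a posterior over candidate goals."""
    unnorm = {g: priors[g] * likelihoods[g] for g in priors}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Goal A is "easier", so it gets a higher prior, but the observed actions
# are more probable under a plan towards goal B (per an online planner).
priors = {"A": 0.6, "B": 0.4}
likelihoods = {"A": 0.1, "B": 0.5}
posterior = posterior_over_goals(priors, likelihoods)
```

In this made-up example the planner-derived likelihood outweighs the prior, so the posterior favours goal B.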
  • Item
    Lexical Semantics of the Long Tail
    Wada, Takashi ( 2023-12)
    Natural language data is characterised by a variety of long-tail instances. For instance, whilst there exists an abundance of text data on the web for major languages such as English, there is a dearth of data for a great number of minor languages. Furthermore, the corpus data in each language usually consists of a small number of high-frequency words and a plethora of long-tail expressions that are not commonly used in text, such as scientific jargon and multiword expressions. Generally, these long-tail instances draw little attention from the research community, which largely concentrates on a handful of resource-rich languages and on models' overall performance on specific tasks, performance that is, in many cases, not heavily influenced by the long-tail instances in text. In this thesis, we aim to shed light on the long-tail instances in language and explore NLP models that represent their lexical semantics effectively. In particular, we focus on three types of long-tail instances, namely, extremely low-resource languages, rare words, and multiword expressions. Firstly, for extremely low-resource languages, we propose a new cross-lingual word embedding model that works well with very limited data, and show its effectiveness on the task of aligning semantically equivalent words between high- and low-resource languages. For evaluation, we conduct experiments that involve three endangered languages, namely Yongning Na, Shipibo-Konibo and Griko, and demonstrate that our model performs well on real-world language data. Secondly, with regard to rare words, we first investigate how well recent embedding models capture lexical semantics in general on lexical substitution, where, given a target word in context, a model is tasked with retrieving its synonymous words. 
To this end, we propose a new lexical substitution method that effectively makes use of existing embedding models, and show that it performs very well on English and Italian, especially for retrieving low-frequency substitutes. We also reveal a couple of limitations of current embedding models: (1) they are highly affected by morphophonetic and morphosyntactic biases, such as article–noun agreement in English and Italian; and (2) they often represent rare words poorly when those words are segmented into multiple subwords. To address the second limitation, we propose a new method that performs very well in predicting synonyms of rare words, and demonstrate its effectiveness on lexical substitution and simplification. Lastly, to represent multiword expressions (MWEs) effectively, we propose a new method that paraphrases MWEs with more literal expressions that are easier to understand, e.g. "swan song" with "final performance". Compared to previous approaches that resort to human-crafted resources such as dictionaries, our model is fully unsupervised and relies on monolingual data only, making it applicable to resource-poor languages. For evaluation, we perform experiments in two high-resource languages (English and Portuguese) and one low-resource language (Galician), and demonstrate that our model generates high-quality paraphrases of MWEs in all languages, and aids pre-trained sentence embedding models in encoding sentences that contain MWEs by paraphrasing them with literal expressions.
  • Item
    Computational modeling of the epidemiological dynamics of the skin pathogens Group A Streptococcus and Sarcoptes scabiei
    Tellioglu, Nefel ( 2023-11)
    Sarcoptes scabiei is a skin pathogen that causes substantial health burdens in humans. An estimated 455 million people are affected by scabies each year, resulting in approximately 3.8 million disability-adjusted life years annually. Scratching from scabies can result in further bacterial skin infections, including Group A Streptococcus (GAS) infections, which increase the burden of scabies. GAS infections can lead to severe health conditions such as acute rheumatic fever and rheumatic heart disease. Each year, around 18 million people worldwide suffer from severe GAS-related diseases, resulting in 500,000 deaths. Sarcoptes scabiei and Group A Streptococcus are endemic in many underprivileged populations, such as Indigenous communities in Australia. A number of factors are likely to play a role in the high burden of skin pathogens in these settings, including heterogeneities in the pathogen population (pathogens having multiple strains with varying characteristics) and in the host population (populations with varying disease prevalence and transmission rates). While these factors make it difficult to manage disease burden, computational models can help us to understand transmission mechanisms as well as control the health burden. In this thesis, I focus on Sarcoptes scabiei and GAS, and aim to understand the underlying transmission mechanisms of these skin pathogens and to provide insights into the efficacy of community-specific control strategies to reduce the disease burden using computational modelling. I focus on three key research questions in which I investigate the impact of pathogen and host heterogeneities on disease transmission and identify effective control strategies using computational models. Controlling the spread of pathogens with multiple strains can be challenging due to strain interactions. It is uncertain what role strain interactions play in the persistence of high strain diversity in endemic settings and what this implies for future interventions. 
As my first research question, I focused on “What role do within-host dynamics play in maintaining high diversity of pathogen strains?”. I developed an individual-based model with a synthetic population representing the characteristics of Indigenous communities in Australia. I discovered that within-host competition among strains can impact epidemiological dynamics. My findings revealed that within-host and between-host competition each reduce strain diversity when operating independently; however, when they function together, they can significantly increase strain diversity. My model suggested that an intervention that reduces the transmission of all strains has the potential to later increase the level of pathogen diversity, complicating the efficacy of further interventions. In addition, I discussed how this modelling framework can be adapted to investigate the impact of GAS strain interactions on population-level dynamics. To apply mass drug administrations in the areas that need them the most, it is essential to estimate the prevalence of scabies at the community level. Currently, there is no standardisation of approaches to estimate scabies prevalence. Given that prevalence and transmission mechanisms differ among communities, there is a need to thoroughly understand how sampling procedures aiming to assess scabies prevalence interact with the underlying epidemiology. As my second research question, I focused on “Which sampling strategies - individual, household, or school-based - are most effective for estimating the prevalence of scabies in a population?”. I developed another computational model and explored the effectiveness of sampling methods for estimating the prevalence of scabies in remote Indigenous communities in Australia. I found that when there is an underlying household-specific heterogeneity in scabies prevalence, the household sampling approach introduces more variance than simple random sampling. 
I concluded that while the simple random sampling approach appears more effective than the other sampling methods for estimating scabies prevalence, the efficacy of surveillance strategies depends on how prevalence is distributed within the community. In addition, I built a table for use in future surveillance studies to estimate the sampling percentage based on population size, an accuracy threshold and a priori knowledge of scabies prevalence. To reduce the scabies burden in communities with high endemic levels of scabies, three to five annual mass drug administration (MDA) rounds are recommended by the experts convened by the World Health Organization (WHO). Because current guidelines are based only on expert opinion, the WHO recommends quantitative evaluations to assess the likely efficacy of MDA recommendations. As my third research question, I focused on “Which mass drug administration strategy is most effective for controlling scabies?”. I developed an individual-based model to evaluate the efficacy of mass drug administration strategies in decreasing the burden of scabies in Liberia. I found that while MDAs can be effective in the short and medium term, prevalence rises over longer time periods until it reaches pre-MDA levels. The modelling results also indicated that a low level of scabies prevalence can be sustained long-term when MDAs are combined with behavioural and systemic changes, such as improvements in education and access to the health care system, that shorten the time to effective scabies treatment. In this thesis, I conclude that understanding the complex dynamics of skin pathogens remains a challenging problem because of the heterogeneities in host and pathogen populations. While this thesis provides practical results for controlling skin pathogens, it also highlights the need to develop pathogen-specific and community-specific models to reduce the burden of skin pathogens.
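The household-versus-random sampling finding can be illustrated with a toy simulation. This is my own construction, not the thesis model: the prevalence rates, household sizes, and sample sizes are invented, and it only shows the qualitative effect that when infection clusters within households, sampling whole households (for the same number of people) yields a noisier prevalence estimate than simple random sampling.

```python
# Toy simulation (assumptions mine, not the thesis model): compare the
# variance of prevalence estimates under simple random sampling versus
# household-based sampling when infection clusters within households.
import random

random.seed(0)

# Synthetic population: 500 households of size 4; a household is either
# high-prevalence (80% infected) or low-prevalence (5% infected).
households = []
for _ in range(500):
    rate = 0.8 if random.random() < 0.2 else 0.05
    households.append([random.random() < rate for _ in range(4)])
people = [p for hh in households for p in hh]

def srs_estimate(n=200):
    """Prevalence estimate from a simple random sample of n people."""
    sample = random.sample(people, n)
    return sum(sample) / n

def household_estimate(n_households=50):
    """Prevalence estimate from sampling whole households (50 x 4 = 200 people)."""
    sample = random.sample(households, n_households)
    members = [p for hh in sample for p in hh]
    return sum(members) / len(members)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Repeat each survey design many times and compare estimator variance.
srs_var = variance([srs_estimate() for _ in range(2000)])
hh_var = variance([household_estimate() for _ in range(2000)])
```

With this degree of household clustering, the household design's variance exceeds that of simple random sampling, the same qualitative pattern the study reports.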
  • Item
    Generalization Lessons from Biomedical Relation Extraction using Pretrained Transformer Models
    Elangovan, Aparna ( 2023-12)
    Curating structured knowledge for storage in biomedical knowledge databases requires human experts to annotate relationships, making the maintenance of these databases expensive and difficult to scale to the large quantities of information presented in scientific publications. It is challenging to ensure that the information is comprehensive and up-to-date. Hence, we investigate the generalization capabilities of state-of-the-art natural language processing (NLP) techniques to automate relation extraction and aid human curation. In NLP, deep learning-based architectures, in particular pretrained transformer models whose millions of parameters enable them to achieve state-of-the-art (SOTA) performance, have been dominating leaderboards on public benchmark datasets, usually by fine-tuning a pretrained transformer model on the target dataset's task. In our research, we investigate the generalizability of such SOTA models – fine-tuned pretrained transformer models – in biomedical relation extraction for real-world applications, where the performance expectations of these models need to hold beyond the official test sets. While our experiments focus on the current SOTA models, our findings have broader implications for the generalization of NLP models and their performance evaluations. We ask the following research questions: 1. How generalizable are fine-tuned pretrained transformer models in biomedical relation extraction? 2. What factors lead to poor generalizability despite high test set performance of fine-tuned pretrained transformer models? 3. How can we improve qualitative aspects of the training data to improve the real-world generalization performance of fine-tuned pretrained transformer models? The contributions are: 1) We identify a large performance gap compared to the test set when a SOTA fine-tuned pretrained transformer model is applied at large scale. 
This substantial generalization gap has neither been verified nor reported in prior large-scale biomedical relation extraction studies. 2) We identify that high similarity between training and test sets, even with random splits, can result in inflated performance measurements. We suggest stratifying the test set based on its similarity to the training set to provide a more effective interpretation of the results and to understand the memorization versus generalization capabilities of a model. Furthermore, we find that fine-tuned pretrained transformer models appear to rely on spurious correlations that are present in both training and test sets, obtaining inflated test set performance. 3) We also find that, for a given quantity of training data, qualitative aspects can boost performance when fine-tuning pretrained transformers. More specifically, we find that incorporating training samples that are quite similar to one another but have different ground truth labels – we call them human-adversarials – in low to moderate proportions can boost generalization performance by up to 20 points on fine-tuned pretrained transformer models such as BERT, BioBERT and RoBERTa. On the other hand, training samples that are quite similar to one another and have the same ground truth labels – we call them human-affables – can potentially degrade generalization performance. We thus demonstrate that merely aiming for higher quantities of training data is not sufficient to improve generalization. 4) As a result of findings 1 and 2, we propose to the NLP community that confirming linguistic capabilities as the cause of performance gains, even within the context of the test set, is crucial to generalization, adapting generalization principles from clinical studies. 
We thus advocate for effective test sets and evaluation strategies, including adapting concepts such as randomized controlled trials from clinical studies to NLP to establish causation, as our experiments demonstrate that a test set constructed using the standard practice of random splits may not be sufficient to measure the generalization capabilities of a model. Overall, in this thesis, we closely examine model generalization and aim to strengthen how machine learning models are evaluated. While we do so in the context of biomedical relation extraction, where generalizability is critical, our findings are applicable to the evaluation of machine learning models across NLP.
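The similarity-based test-set stratification idea can be sketched as follows. This is an illustrative toy of mine, not the thesis's actual similarity measure: Jaccard word overlap, the threshold, and the example sentences are all invented, merely to show how test items split into "seen-like" and novel strata before reporting per-stratum accuracy.

```python
# Illustrative sketch (my construction, not the thesis code): stratify a
# test set by its maximum lexical similarity to the training set, so that
# accuracy can be reported separately for seen-like and novel examples.

def jaccard(a, b):
    """Jaccard similarity over word sets of two sentences."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def stratify(test_texts, train_texts, threshold=0.5):
    """Split test indices into high- and low-similarity strata."""
    high, low = [], []
    for i, t in enumerate(test_texts):
        sim = max(jaccard(t, tr) for tr in train_texts)
        (high if sim >= threshold else low).append(i)
    return high, low

train = ["protein x binds protein y", "gene a regulates gene b"]
test = ["protein x binds protein z", "drug c inhibits enzyme d"]
high, low = stratify(test, train)
```

Accuracy on the low-similarity stratum is then the more honest signal of generalization, since high-similarity items can be answered by memorization.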
  • Item
    Autonomous Resource Management for Serverless Computing
    Mampage, Anupama Ruwanthika ( 2023-11)
    Serverless computing is gaining momentum as the latest cloud deployment model for applications, with many major global companies shifting towards a complete adoption of this new computing paradigm. As the name implies, the burden of server management is non-existent for the end user under this model, with the cloud vendor taking full responsibility for infrastructure management. Users simply deploy the logic of their application in the form of code segments called 'functions', together with a rough estimate of the resource requirements, and the rest is taken care of by the serverless platform. This greatly reduces the time-to-market and upfront costs for client products, with no expertise required in initial server configuration and subsequent operations. The rapid auto-scalability feature allows customers to scale their businesses fast, without the need for any prior infrastructure planning. The pay-per-use billing model provides a fair ground for applications with spontaneous fluctuations in traffic load. However, what is 'serverless' to the end user unequivocally leaves the entire set of end-to-end server management responsibilities with the cloud vendor. In contrast to the conventional Infrastructure-as-a-Service (IaaS) cloud model, where the provider only handles physical infrastructure maintenance, complete virtual machine maintenance is now also part of the provider's service offering. Moreover, unlike users managing their allocated resources for the execution of their own applications, the cloud provider has to undertake the same set of tasks with far less knowledge available to them. Not only do they have to successfully manage the infrastructure for an application belonging to a separate party, but they need to accommodate the needs of thousands of user applications on the same shared platform, considering the impact of their co-resident behaviors as well. 
Accomplishing this to the satisfaction of all parties involved is thus a tremendous feat for cloud vendors. Further, while auto-scaling and granular billing are vastly favorable to end users, cloud vendors are at a grave disadvantage if measures are not taken to maintain high efficiency in their underlying resources. Hence, having proper resource management techniques in place that are capable of meeting all of these challenges is of utmost importance. Existing commercial and open-source serverless platforms mostly follow primitive resource management policies at the moment, which leave vast potential for optimization. Further, although there is huge interest in the research community in this area of study, most existing works are limited in their generalizability to practical serverless computing environments, considering the multi-tenant and rapidly changing nature of these systems. Moreover, a vast majority of these works neglect the service provider's perspective on their offered solutions, which is a key factor determining the probable adoption of such solutions in vendor platforms. This thesis investigates novel techniques for the smooth running of all resource handling operations, including resource provisioning, resource scheduling and scaling, that are dynamic and intelligent enough to handle the complexities of this computing environment. The proposed approaches strive to gain a thorough understanding of the behavior of the serverless computing infrastructure, along with the different application workloads, in developing strategies beneficial for both end users and cloud vendors. In essence, this thesis advances the state-of-the-art by making the following contributions: - A comprehensive taxonomy and literature review on resource management in serverless computing environments, along with a discussion of identified research gaps, laying the groundwork for future research. 
- A dynamic resource management and function request placement technique for meeting user-specific application requirements and maintaining high resource efficiency. - A Deep Reinforcement Learning (DRL) based workload- and system-aware technique for scheduling applications in resource-constrained, multi-tenant serverless computing environments, with flexibility in achieving a desirable level of application performance and resource cost optimization. - A framework for horizontally and vertically scaling the resources allocated to function instances based on a multi-agent DRL model, elevating application performance while maintaining provider resource cost efficiency. - A deployment and scheduling approach for applications in a hybrid serverless and serverful environment based on DRL, with the aim of reducing application latency and incurred user cost. - A detailed discussion outlining challenges and potential future work for the efficient use of serverless computing environments.
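For contrast with the learning-based policies contributed above, the kind of primitive threshold-scaling policy that the abstract notes existing platforms rely on can be written in a few lines. This is a sketch under my own assumptions; the function name, capacity figure, and bounds are invented for illustration.

```python
import math

# A primitive horizontal-scaling policy (illustrative baseline, not the
# thesis's DRL approach): size the function-instance pool so the observed
# request rate fits within per-instance capacity, clamped to pool bounds.
def desired_instances(request_rate, per_instance_capacity, lo=1, hi=100):
    n = math.ceil(request_rate / per_instance_capacity)
    return max(lo, min(hi, n))

# Hypothetical load: 950 requests/s with 100 requests/s per instance.
replicas = desired_instances(950, 100)  # -> 10
```

A policy like this reacts only to a single metric and ignores co-residency, cold starts, and provider cost, which is precisely the gap the DRL-based contributions target.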
  • Item
    Lazy Constraint Generation and Tractable Approximations for Large Scale Planning Problems
    Singh, Anubhav ( 2023-12)
    In our research, we explore two orthogonal but related methodologies for solving planning instances: planning algorithms based on direct but lazy, incremental heuristic search over transition systems, and planning as satisfiability. We address numerous challenges associated with solving large planning instances within practical time and memory constraints. This is particularly relevant when solving real-world problems, which often have numeric domains and resources and, therefore, a large ground representation of the planning instance. Our first contribution is approximate novelty search, which introduces two novel methods: the first approximates novelty via sampling and Bloom filters, and the other approximates best-first search using an adaptive policy that decides whether to forgo the expansion of nodes in the open list. In our second work, we present an encoding of the partial order causal link (POCL) formulation of temporal planning problems into a constraint programming (CP) model that handles instances with required concurrency, which cannot be solved using sequential planners. Our third significant contribution is lifted sequential planning with lazy constraint generation, which scales very well on large instances with numeric domains and resources. Lastly, we propose a novel way of using novelty approximation as a polynomial reachability propagator, which we use to train the activity heuristics used by CP solvers.
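The Bloom-filter novelty approximation can be illustrated with a minimal sketch (my own construction, not the thesis implementation): width-1 novelty asks whether a state contains an atom never seen in any previously generated state, and a Bloom filter stands in for the exact set of seen atoms, trading a small false-positive rate (a novel atom occasionally judged as seen) for constant memory.

```python
# Minimal sketch (assumptions mine): approximating width-1 novelty with a
# Bloom filter. A state is "novel" if it contains at least one atom not
# seen before; the Bloom filter replaces the exact seen-set.
import hashlib

class BloomFilter:
    def __init__(self, size=8192, hashes=3):
        self.bits = [False] * size
        self.size, self.hashes = size, hashes

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def maybe_contains(self, item):
        return all(self.bits[p] for p in self._positions(item))

def is_novel(state_atoms, bloom):
    """True if some atom has (probably) never been seen before."""
    novel = any(not bloom.maybe_contains(a) for a in state_atoms)
    for a in state_atoms:
        bloom.add(a)
    return novel

bloom = BloomFilter()
first = is_novel({"at(robot, a)", "holding(key)"}, bloom)  # True: nothing seen yet
repeat = is_novel({"at(robot, a)"}, bloom)                 # False: atom already seen
```

Because membership answers are only "probably seen", a pruning search built on this check may occasionally discard a genuinely novel state, which is the accuracy-for-memory trade-off the approximation accepts.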