School of BioSciences - Research Publications

Now showing 1 - 6 of 6
  • Item
    Forecasting species range dynamics with process-explicit models: matching methods to applications
    Briscoe, NJ ; Elith, J ; Salguero-Gomez, R ; Lahoz-Monfort, JJ ; Camac, JS ; Giljohann, KM ; Holden, MH ; Hradsky, BA ; Kearney, MR ; McMahon, SM ; Phillips, BL ; Regan, TJ ; Rhodes, JR ; Vesk, PA ; Wintle, BA ; Yen, JDL ; Guillera-Arroita, G ; Early, R (WILEY, 2019-11)
    Knowing where species occur is fundamental to many ecological and environmental applications. Species distribution models (SDMs) are typically based on correlations between species occurrence data and environmental predictors, with ecological processes captured only implicitly. However, there is a growing interest in approaches that explicitly model processes such as physiology, dispersal, demography and biotic interactions. These models are believed to offer more robust predictions, particularly when extrapolating to novel conditions. Many process-explicit approaches are now available, but it is not clear how we can best draw on this expanded modelling toolbox to address ecological problems and inform management decisions. Here, we review a range of process-explicit models to determine their strengths and limitations, as well as their current use. Focusing on four common applications of SDMs - regulatory planning, extinction risk, climate refugia and invasive species - we then explore which models best meet management needs. We identify barriers to more widespread and effective use of process-explicit models and outline how these might be overcome. As well as technical and data challenges, there is a pressing need for more thorough evaluation of model predictions to guide investment in method development and ensure the promise of these new approaches is fully realised.
  • Item
    Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference
    Dormann, CF ; Calabrese, JM ; Guillera-Arroita, G ; Matechou, E ; Bahn, V ; Barton, K ; Beale, CM ; Ciuti, S ; Elith, J ; Gerstner, K ; Guelat, J ; Keil, P ; Lahoz-Monfort, JJ ; Pollock, LJ ; Reineking, B ; Roberts, DR ; Schroeder, B ; Thuiller, W ; Warton, DI ; Wintle, BA ; Wood, SN ; Wuest, RO ; Hartig, F (WILEY, 2018-11)
    In ecology, the true causal structure for a given problem is often not known, and several plausible models and thus model predictions exist. It has been claimed that using weighted averages of these models can reduce prediction error, as well as better reflect model selection uncertainty. These claims, however, are often demonstrated by isolated examples. Analysts must better understand under which conditions model averaging can improve predictions and their uncertainty estimates. Moreover, a large range of different model averaging methods exists, raising the question of how they differ in their behaviour and performance. Here, we review the mathematical foundations of model averaging along with the diversity of approaches available. We explain that the error in model‐averaged predictions depends on each model's predictive bias and variance, as well as the covariance in predictions between models, and uncertainty about model weights. We show that model averaging is particularly useful if the predictive error of contributing model predictions is dominated by variance, and if the covariance between models is low. For noisy data, which predominate in ecology, these conditions will often be met. Many different methods to derive averaging weights exist, ranging from Bayesian and information‐theoretic approaches to cross‐validation‐optimized and resampling approaches. A general recommendation is difficult, because the performance of methods is often context dependent. Importantly, estimating weights creates some additional uncertainty. As a result, estimated model weights may not always outperform arbitrary fixed weights, such as equal weights for all models. When the model set contains many inadequate models, however, estimating model weights will typically be superior to equal weights.
We also investigate the quality of the confidence intervals calculated for model‐averaged predictions, showing that they differ greatly in behaviour and seldom manage to achieve nominal coverage. Our overall recommendations stress the importance of non‐parametric methods such as cross‐validation for a reliable uncertainty quantification of model‐averaged predictions.
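The abstract's central claim, that averaging helps most when model errors are variance-dominated and weakly correlated, can be checked with a short simulation. This is an illustrative sketch only (the unit-variance models and correlation values are hypothetical, not taken from the paper): two unbiased predictions are averaged with equal weights, and the averaged error shrinks most when the models' errors are nearly independent.

```python
import random
import statistics

random.seed(0)
TRUTH = 1.0
N = 100_000

def averaged_mse(rho):
    """MSE of an equal-weight average of two unbiased, unit-variance
    model predictions whose errors have correlation rho."""
    sq_errs = []
    for _ in range(N):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        m1 = TRUTH + z1                                    # model 1 prediction
        m2 = TRUTH + rho * z1 + (1 - rho**2) ** 0.5 * z2   # model 2, correlated error
        avg = (m1 + m2) / 2
        sq_errs.append((avg - TRUTH) ** 2)
    return statistics.fmean(sq_errs)

# Theory: Var(average) = (1 + rho) / 2 for unit-variance components.
mse_low = averaged_mse(0.0)    # independent errors  -> about 0.5
mse_high = averaged_mse(0.9)   # correlated errors   -> about 0.95
```

Either average beats a single model's MSE of 1.0, but the gain from averaging is roughly twice as large when the covariance between models is low, matching the condition the review identifies.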
  • Item
    A standard protocol for reporting species distribution models
    Zurell, D ; Franklin, J ; Koenig, C ; Bouchet, PJ ; Dormann, CF ; Elith, J ; Fandos, G ; Feng, X ; Guillera-Arroita, G ; Guisan, A ; Lahoz-Monfort, JJ ; Leitao, PJ ; Park, DS ; Peterson, AT ; Rapacciuolo, G ; Schmatz, DR ; Schroeder, B ; Serra-Diaz, JM ; Thuiller, W ; Yates, KL ; Zimmermann, NE ; Merow, C (WILEY, 2020-09)
    Species distribution models (SDMs) constitute the most common class of models across ecology, evolution and conservation. The advent of ready‐to‐use software packages and increasing availability of digital geoinformation have considerably assisted the application of SDMs in the past decade, greatly enabling their broader use for informing conservation and management, and for quantifying impacts from global change. However, models must be fit for purpose, with all important aspects of their development and applications properly considered. Despite the widespread use of SDMs, standardisation and documentation of modelling protocols remain limited, which makes it hard to assess whether development steps are appropriate for end use. To address these issues, we propose a standard protocol for reporting SDMs, with an emphasis on describing how a study's objective is achieved through a series of modelling decisions. We call this the ODMAP (Overview, Data, Model, Assessment and Prediction) protocol, as its components reflect the main steps involved in building SDMs and other empirically‐based biodiversity models. The ODMAP protocol serves two main purposes. First, it provides a checklist for authors, detailing key steps for model building and analyses, and thus represents a quick guide and generic workflow for modern SDMs. Second, it introduces a structured format for documenting and communicating the models, ensuring transparency and reproducibility, facilitating peer review and expert evaluation of model quality, as well as meta‐analyses. We detail all elements of ODMAP, and explain how it can be used for different model objectives and applications, and how it complements efforts to store associated metadata and define modelling standards. We illustrate its utility by revisiting nine previously published case studies, and provide an interactive web‐based application to facilitate its use.
We plan to advance ODMAP by encouraging its further refinement and adoption by the scientific community.
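The abstract names ODMAP's five components (Overview, Data, Model, Assessment, Prediction). A minimal machine-readable record following that structure might look like the sketch below; note that the fields inside each section are hypothetical placeholders for illustration, not the protocol's official element names.

```python
# ODMAP-style report skeleton. Section names come from the acronym in the
# abstract; the individual fields are invented placeholders.
odmap_report = {
    "Overview":   {"objective": "predict habitat suitability for a focal species"},
    "Data":       {"occurrence_type": "presence-absence",
                   "predictors": ["temperature", "precipitation"]},
    "Model":      {"algorithm": "boosted regression trees",
                   "settings": "tuned via cross-validation"},
    "Assessment": {"metric": "AUC",
                   "validation": "spatial block cross-validation"},
    "Prediction": {"extent": "study region under current climate"},
}

# The fixed section order doubles as the reporting checklist.
sections = list(odmap_report)
```

Structuring a report this way is what enables the second purpose the authors describe: the same record can be rendered for peer review or harvested for meta-analysis.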
  • Item
    Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models
    Hao, T ; Elith, J ; Lahoz-Monfort, JJ ; Guillera-Arroita, G (WILEY, 2020-04)
    Predictive performance is important to many applications of species distribution models (SDMs). The SDM ‘ensemble’ approach, which combines predictions across different modelling methods, is believed to improve predictive performance, and is used in many recent SDM studies. Here, we aim to compare the predictive performance of ensemble species distribution models to that of individual models, using a large presence–absence dataset of eucalypt tree species. To test model performance, we divided our dataset into calibration and evaluation folds using two spatial blocking strategies (checkerboard‐pattern and latitudinal slicing). We calibrated and cross‐validated all models within the calibration folds, using both repeated random division of data (a common approach) and spatial blocking. Ensembles were built using the software package ‘biomod2’, with standard (‘untuned’) settings. Boosted regression tree (BRT) models were also fitted to the same data, tuned according to published procedures. We then used evaluation folds to compare ensembles against both their component untuned individual models, and against the BRTs. We used area under the receiver‐operating characteristic curve (AUC) and log‐likelihood to assess model performance. Ensemble models performed well, but not consistently better than their component untuned individual models or tuned BRTs across our tests. Moreover, choosing untuned individual models with the best cross‐validation performance also yielded good external performance, with blocked cross‐validation proving better suited to this choice in this study than repeated random cross‐validation. The latitudinal slice test was only possible for four species; here some individual models, particularly the tuned BRTs, performed better than ensembles. This study shows no particular benefit to using ensembles over individual tuned models.
It also suggests that further robust testing of performance is required for situations where models are used to predict to distant places or environments.
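The evaluation design described above, checkerboard spatial blocking plus AUC scoring, can be sketched in a few lines. Everything below is a toy stand-in (the grid, the suitability gradient, and the "model" score are invented; the study itself used biomod2 ensembles and tuned BRTs), but the blocking and the rank-based AUC computation are the genuine mechanics.

```python
import math
import random

random.seed(1)

# Hypothetical presence-absence sites scattered over a 10 x 10 degree grid.
sites = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(500)]

def suitability(x, y):
    # Toy environmental gradient: occurrence probability rises with x.
    return 1 / (1 + math.exp(-(x - 5)))

obs = [1 if random.random() < suitability(x, y) else 0 for x, y in sites]

# Checkerboard-pattern blocking: fold = parity of the 1-degree cell a site
# falls in, so calibration and evaluation folds interleave spatially.
fold = [(int(x) + int(y)) % 2 for x, y in sites]

# Fold 0 would calibrate the SDM; fold 1 evaluates it. A fixed toy score
# (the x coordinate) stands in here for a fitted model's prediction.
eval_scores = [x for (x, y), f in zip(sites, fold) if f == 1]
eval_labels = [o for o, f in zip(obs, fold) if f == 1]

def auc(labels, scores):
    """Rank-based AUC: probability that a random presence site receives a
    higher score than a random absence site (Mann-Whitney formulation)."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, (_, lab) in enumerate(pairs, start=1) if lab == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

eval_auc = auc(eval_labels, eval_scores)
```

In the actual study this AUC comparison is run for each candidate (ensemble, untuned component models, tuned BRT) on the held-out fold; the checkerboard pattern keeps evaluation sites spatially separated from calibration sites at the cell scale.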
  • Item
    Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
    Roberts, DR ; Bahn, V ; Ciuti, S ; Boyce, MS ; Elith, J ; Guillera-Arroita, G ; Hauenstein, S ; Lahoz-Monfort, JJ ; Schroeder, B ; Thuiller, W ; Warton, DI ; Wintle, BA ; Hartig, F ; Dormann, CF (WILEY, 2017-08)
    Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches increasingly account for such dependencies. However, these structures are regularly ignored during cross‐validation, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross‐validation, often noted by modellers, is dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provide ample opportunity for overfitting with non‐causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross‐validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolation by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non‐random and blocked cross‐validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross‐validation is nearly universally more appropriate than random cross‐validation if the goal is predicting to new data or predictor space, or selecting causal predictors.
We recommend that block cross‐validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
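The core pitfall, random hold-outs sitting next to training points and so flattering a model that merely exploits spatial dependence, can be reproduced in a toy simulation. This sketch is illustrative and not from the paper: a 1-D "transect" with a smooth autocorrelated signal, a deliberately structure-exploiting 1-nearest-neighbour predictor, and the two hold-out schemes side by side.

```python
import math
import random

random.seed(2)

# A 1-D spatial transect: smooth autocorrelated signal plus observation noise.
s = sorted(random.uniform(0, 100) for _ in range(400))
z = [math.sin(si / 5) + random.gauss(0, 0.3) for si in s]

def one_nn_error(test_idx):
    """MSE when each held-out point is predicted by its nearest *training*
    neighbour -- a model exploiting spatial dependence, not a causal predictor."""
    held_out = set(test_idx)
    train = [(s[i], z[i]) for i in range(len(s)) if i not in held_out]
    sq = []
    for i in test_idx:
        _, z_hat = min(train, key=lambda pt: abs(pt[0] - s[i]))
        sq.append((z_hat - z[i]) ** 2)
    return sum(sq) / len(sq)

# Scheme 1: random 25% hold-out (uncorrected cross-validation).
random_idx = [i for i in range(len(s)) if random.random() < 0.25]
# Scheme 2: one contiguous spatial block held out.
block_idx = [i for i in range(len(s)) if 25 <= s[i] < 50]

err_random = one_nn_error(random_idx)
err_block = one_nn_error(block_idx)
# err_random is optimistic: each held-out point has a training point beside it.
# err_block is the more honest estimate for predicting to new areas.
```

The gap between the two error estimates is exactly the underestimation of predictive error the review warns about; blocking closes it by forcing the hold-out to be as spatially independent of the training data as the target of prediction would be.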
  • Item
    Is my species distribution model fit for purpose? Matching data and models to applications
    Guillera-Arroita, G ; Lahoz-Monfort, JJ ; Elith, J ; Gordon, A ; Kujala, H ; Lentini, PE ; McCarthy, MA ; Tingley, R ; Wintle, BA (WILEY, 2015-03)
    Species distribution models (SDMs) are used to inform a range of ecological, biogeographical and conservation applications. However, users often underestimate the strong links between data type, model output and suitability for end‐use. We synthesize current knowledge and provide a simple framework that summarizes how interactions between data type and the sampling process (i.e. imperfect detection and sampling bias) determine the quantity estimated by an SDM. We then draw upon the published literature and simulations to illustrate and evaluate the information needs of the most common ecological, biogeographical and conservation applications of SDM outputs. We find that, while predictions of models fitted to the most commonly available observational data (presence records) suffice for some applications, others require estimates of occurrence probabilities, which are unattainable without reliable absence records. Our literature review and simulations reveal that, while converting continuous SDM outputs into categories of assumed presence or absence is common practice, it is seldom clearly justified by the application's objective and usually degrades inference. Matching SDMs to the needs of particular applications is critical to avoiding poor scientific inference and management outcomes. This paper aims to help modellers and users assess whether their intended SDM outputs are indeed fit for purpose.
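The effect of imperfect detection on what an SDM actually estimates can be shown with a minimal simulation (the occupancy and detection values below are hypothetical, chosen only for illustration). If a species occupies a site with probability psi but is detected at occupied sites with probability p, a model fitted naively to detection records converges on psi * p rather than the occurrence probability psi itself, which is why some applications need data that let the two be separated.

```python
import random

random.seed(3)

PSI = 0.6     # true occupancy probability (hypothetical value)
P_DET = 0.4   # per-visit detection probability (hypothetical value)
N = 200_000   # simulated survey sites

detections = 0
for _ in range(N):
    occupied = random.random() < PSI
    # A species can only be detected where it is actually present.
    if occupied and random.random() < P_DET:
        detections += 1

# A model fitted to raw detection records estimates the product psi * p,
# not the occupancy probability psi the application may require.
naive_estimate = detections / N   # converges to PSI * P_DET = 0.24
```

The 0.24 the naive model recovers is a relative index of occurrence, which suffices for some uses (e.g. ranking sites), but any application needing the true occurrence probability of 0.6 requires survey designs, such as repeat visits, that allow detection to be estimated separately.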