diff --git a/CITATION.cff b/CITATION.cff index 0f858091..bccec7c8 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -8,7 +8,7 @@ message: 'To cite package "ibis.iSDM" in publications use:' type: software license: CC-BY-4.0 title: 'ibis.iSDM: Modelling framework for integrated biodiversity distribution scenarios' -version: 0.0.5 +version: 0.0.6 abstract: Integrated framework of modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only @@ -64,7 +64,7 @@ references: - family-names: Hesselbarth given-names: Maximilian H.K. year: '2023' - version: 0.0.4 + version: 0.0.5 - type: software title: assertthat abstract: 'assertthat: Easy Pre and Post Assertions' @@ -217,17 +217,6 @@ references: given-names: David email: dpierce@ucsd.edu year: '2023' -- type: software - title: ncmeta - abstract: 'ncmeta: Straightforward ''NetCDF'' Metadata' - notes: Imports - url: https://github.com/hypertidy/ncmeta - repository: https://CRAN.R-project.org/package=ncmeta - authors: - - family-names: Sumner - given-names: Michael - email: mdsumner@gmail.com - year: '2023' - type: software title: parallel abstract: 'R: A Language and Environment for Statistical Computing' @@ -239,6 +228,35 @@ references: year: '2023' institution: name: R Foundation for Statistical Computing +- type: software + title: Matrix + abstract: 'Matrix: Sparse and Dense Matrix Classes and Methods' + notes: Imports + url: https://R-forge.R-project.org/tracker/?atid=294&group_id=61 + repository: https://CRAN.R-project.org/package=Matrix + authors: + - family-names: Bates + given-names: Douglas + orcid: https://orcid.org/0000-0001-8316-9503 + - family-names: Maechler + given-names: Martin + email: mmaechler+Matrix@gmail.com + orcid: https://orcid.org/0000-0002-8685-9910 + - family-names: Jagan + given-names: Mikael + orcid: https://orcid.org/0000-0002-3542-2938 + year: '2023' +- type: software + title: ncmeta + abstract: 'ncmeta: Straightforward ''NetCDF'' Metadata' + notes: Imports + url: https://github.com/hypertidy/ncmeta + repository: https://CRAN.R-project.org/package=ncmeta + authors: + - family-names: Sumner + given-names: Michael + email: mdsumner@gmail.com + year: '2023' - type: software title: posterior abstract: 'posterior: Tools for Working with Posterior Distributions' @@ -427,6 +445,7 @@ references: given-names: Fabian E. 
email: bachlfab@gmail.com
year: '2023'
+ version: '>= 2.6.0'
- type: software
title: glmnet
abstract: 'glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models'
@@ -450,6 +469,7 @@
- family-names: Yang
given-names: James
year: '2023'
+ version: '>= 4.1'
- type: software
title: glmnetUtils
abstract: 'glmnetUtils: Utilities for ''Glmnet'''
@@ -693,6 +713,42 @@ references:
given-names: Andrew
year: '2023'
version: '>= 2.1.1'
+- type: software
+ title: gnlm
+ abstract: 'gnlm: Generalized Nonlinear Regression Models'
+ notes: Suggests
+ url: http://www.commanster.eu/rcode.html
+ repository: https://CRAN.R-project.org/package=gnlm
+ authors:
+ - family-names: Swihart
+ given-names: Bruce
+ email: bruce.swihart@gmail.com
+ - family-names: Lindsey
+ given-names: Jim
+ email: jlindsey@gen.unimaas.nl
+ year: '2023'
+- type: software
+ title: geosphere
+ abstract: 'geosphere: Spherical Trigonometry'
+ notes: Suggests
+ url: https://github.com/rspatial/geosphere/issues/
+ repository: https://CRAN.R-project.org/package=geosphere
+ authors:
+ - family-names: Hijmans
+ given-names: Robert J.
+ email: r.hijmans@gmail.com
+ year: '2023'
+- type: software
+ title: cubelyr
+ abstract: 'cubelyr: A Data Cube ''dplyr'' Backend'
+ notes: Suggests
+ url: https://github.com/hadley/cubelyr
+ authors:
+ - family-names: Wickham
+ given-names: Hadley
+ email: hadley@rstudio.com
+ year: '2023'
- type: software
title: testthat
abstract: 'testthat: Unit Testing for R'
@@ -704,6 +760,7 @@ references:
given-names: Hadley
email: hadley@posit.co
year: '2023'
+ version: '>= 3.0.0'
- type: software
title: xgboost
abstract: 'xgboost: Extreme Gradient Boosting'
diff --git a/DESCRIPTION b/DESCRIPTION
index a190485e..8f332cc5 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
Package: ibis.iSDM
Type: Package
Title: Modelling framework for integrated biodiversity distribution scenarios
-Version: 0.0.5
+Version: 0.0.6
Authors@R:
c(person(given = "Martin",
family = "Jung",
@@ -30,6 +30,7 @@ Imports:
graphics,
methods,
ncdf4,
+ Matrix,
ncmeta,
parallel,
posterior,
@@ -63,7 +65,9 @@ Suggests:
rstan (>= 2.21.0),
rstantools (>= 2.1.1),
gnlm,
- testthat,
+ geosphere,
+ cubelyr,
+ testthat (>= 3.0.0),
xgboost
URL: https://iiasa.github.io/ibis.iSDM/
BugReports: https://github.com/iiasa/ibis.iSDM/issues
@@ -98,6 +102,7 @@ Collate:
'add_priors.R'
'bdproto-engine.R'
'bdproto-settings.R'
+ 'check.R'
'utils-spatial.R'
'data.R'
'distribution.R'
@@ -113,6 +118,7 @@ Collate:
'engine_xgboost.R'
'ensemble.R'
'ibis.iSDM-package.R'
+ 'limiting.R'
'misc.R'
'partial.R'
'plot.R'
diff --git a/NAMESPACE b/NAMESPACE
index 11297b02..3ef04b5b 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -83,6 +83,7 @@ export(add_pseudoabsence)
export(alignRasters)
export(as.Id)
export(bivplot)
+export(check)
export(distribution)
export(emptyraster)
export(engine_bart)
@@ -106,11 +107,13 @@ export(is.Raster)
export(is.Waiver)
export(is.formula)
export(is.stars)
+export(limiting)
export(load_model)
export(myLog)
export(new_id)
export(new_waiver)
export(partial)
+export(partial_density)
export(posterior_predict_stanfit)
export(predictor_derivate)
export(predictor_filter)
@@ -172,10 +175,12 @@ exportMethods(add_predictor_range)
exportMethods(add_predictors)
exportMethods(add_predictors_globiom)
exportMethods(add_priors)
+exportMethods(check)
exportMethods(distribution)
exportMethods(ensemble)
exportMethods(ensemble_partial)
exportMethods(get_priors)
+exportMethods(limiting)
exportMethods(load_model)
exportMethods(priors)
exportMethods(project)
@@ -193,7 +198,6 @@ exportMethods(validate)
exportMethods(write_model)
exportMethods(write_output)
exportMethods(write_summary)
-import(posterior)
import(sf)
import(terra)
importFrom(foreach,"%do%")
diff --git a/NEWS.md b/NEWS.md
index 0171207e..b95f493a 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,12 +1,30 @@
+# ibis.iSDM 0.0.6 (current dev branch)
+
+### New features
+* `partial_density` function implemented #57
+* Re-specified the limits option of `distribution`, adding support for minimum convex polygon limits.
+* Added `check` function for assessing assumptions and fits for various objects #45
+* Added the minor internal helper function `st_rep` to duplicate `stars` objects.
+* Implemented local limiting factor function (`limiting`) #37
+
+### Minor improvements and bug fixes
+* Further smaller documentation fixes towards a CRAN submission #38
+* Bug fix to method `buffer` in pseudo-absence settings.
+* Minor bug fixes to `ensemble` uncertainty calculations.
+
# ibis.iSDM 0.0.5
+### New features
* Addition of 5 parameter logistic curve offsets with parameter search to `add_offset`.
+
+### Minor improvements and bug fixes
* Further smaller documentation fixes towards a CRAN submission #38
* Bug with with `write_model`, now converting `terra` objects to `data.frames` between import/export.
* Smaller bug fixes, for example in `similarity`, addition of variable name sanitization to predictors by default.

# ibis.iSDM 0.0.4
+### Minor improvements and bug fixes
* Smaller bug fixes with regards to writing outputs and adding pseudo-absences.
* Added short convenience function to convert prediction outputs #48
* Converted from `raster` to `terra` #17

# ibis.iSDM 0.0.3
+### New features
+* Added Boruta for iterative feature selection of predictor variables.
+
+### Minor improvements and bug fixes
* Removed Magittr dependency #41
* Smaller improvements to documentation and removing of CRAN preventing function calls.
* Made the separation from hyperparameter search functions clearer and added new option to filter highly correlated covariates via `train`.
-* Added Boruta for iterative feature selection of predictor variables.

# ibis.iSDM 0.0.2
+### Minor improvements and bug fixes
* Smaller documentation fixes, including to make sure examples and returns are in all exported function documentations.
* Preparation for cran release #38, including fixing some common issues and checks.
* Some smaller bug fixes to `validate` to make Boyce more robust.
diff --git a/R/add_biodiversity.R b/R/add_biodiversity.R
index ce8c0c6d..d93f9764 100644
--- a/R/add_biodiversity.R
+++ b/R/add_biodiversity.R
@@ -6,7 +6,7 @@ NULL
#' @description This function adds a presence-only biodiversity dataset to a
#' distribution object.
#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
-#' @param poipo A [`data.frame`], [`sf`] or [`Spatial`]) object of presence-only point occurrences.
+#' @param poipo A [`data.frame`] or [`sf`] object of presence-only point occurrences.
#' @param name The name of the biodiversity dataset used as internal identifier.
#' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity point records.
#' @param formula A [`character`] or [`formula`] object to be passed. Default is to use all covariates (if specified).
#' @param link A [`character`] to overwrite the default link function (Default: \code{NULL}).
#' @param weight A [`numeric`] value acting as a multiplier with regards to any weights used in the modelling. #' Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only -#' one dataset is added. A [`vector`] is also supported but must be of the same length as [`poipo`]. +#' one dataset is added. A [`vector`] is also supported but must be of the same length as \code{"poipo"}. #' **Note: Weights are reformated to the inverse for models with area offsets (e.g. 5 is converted to 1/5).** -#' @param separate_intercept A [`boolean`] value stating whether a separate intercept is to be added in +#' @param separate_intercept A [`logical`] value stating whether a separate intercept is to be added in #' shared likelihood models for engines [engine_inla], [engine_inlabru] and [engine_stan]. Otherwise ignored. #' @param docheck [`logical`] on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}). #' @param pseudoabsence_settings Either \code{NULL} or a [`pseudoabs_settings()`] created settings object. @@ -33,10 +33,11 @@ NULL #' * Guisan A. and Zimmerman N. 2000. Predictive habitat distribution models in ecology. Ecol. Model. 135: 147–186. #' * Renner, I. W., J. Elith, A. Baddeley, W. Fithian, T. Hastie, S. J. Phillips, G. Popovic, and D. I. Warton. 2015. Point process models for presence-only analysis. Methods in Ecology and Evolution 6:366–379. #' @seealso -#' See other functions for adding biodiversity data, e.g. [add_biodiversity] +#' See other functions for adding biodiversity data, i.e. [`add_biodiversity_poipa`] #' @family add_biodiversity #' @returns Adds biodiversity data to [distribution] object. #' @keywords biodiversity +#' @aliases add_biodiversity_poipo #' @examples #' \dontrun{ #' background <- terra::rast("inst/extdata/europegrid_50km.tif") @@ -62,7 +63,7 @@ methods::setGeneric( #' @name add_biodiversity_poipo #' @rdname add_biodiversity_poipo -#' @usage \S4method{add_biodiversity_poipo}{BiodiversityDistribution,sf}(x, poipo) +#' @usage \S4method{add_biodiversity_poipo}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,logical,ANY}(x,poipo,name,field_occurrence,formula,family,link,weight,separate_intercept,docheck,pseudoabsence_settings,...) methods::setMethod( "add_biodiversity_poipo", methods::signature(x = "BiodiversityDistribution", poipo = "sf"), @@ -144,10 +145,10 @@ methods::setMethod( #' #' @details #' By default, the logit link function is used in a logistic regression setting -#' unless the specific [engine] does not support generalised linear regressions (e.g. [engine_bart]). +#' unless the specific engine does not support generalised linear regressions (e.g. [engine_bart]). #' #' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object. -#' @param poipa A [`data.frame`], [`sf`] or [`Spatial`]) object of presence-absence point occurrences. +#' @param poipa A [`data.frame`] or [`sf`] object of presence-absence point occurrences. #' @param name The name of the biodiversity dataset used as internal identifier. #' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity point records indicating presence/absence. #' By default set to \code{"Observed"} and an error will be thrown if a [`numeric`] column with that name does not exist. @@ -156,14 +157,16 @@ methods::setMethod( #' @param link A [`character`] to overwrite the default link function (Default: \code{NULL}). 
#' @param weight A [`numeric`] value acting as a multiplier with regards to any weights used in the modelling.
#' Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only
-#' one dataset is added. A [`vector`] is also supported but must be of the same length as [`poipa`].
-#' @param separate_intercept A [`boolean`] value stating whether a separate intercept is to be added in.
+#' one dataset is added. A [`vector`] is also supported but must be of the same length as
+#' parameter \code{"poipa"}.
+#' @param separate_intercept A [`logical`] value stating whether a separate intercept is to be added in
#' shared likelihood models for engines [engine_inla], [engine_inlabru] and [engine_stan].
#' @param docheck [`logical`] on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).
#' @param ... Other parameters passed down.
#'
#' @family add_biodiversity
#' @keywords biodiversity
+#' @aliases add_biodiversity_poipa
#' @references
#' * Renner, I. W., J. Elith, A. Baddeley, W. Fithian, T. Hastie, S. J. Phillips, G. Popovic, and D. I. Warton. 2015. Point process models for presence-only analysis. Methods in Ecology and Evolution 6:366–379.
#' * Guisan A. and Zimmerman N. 2000. Predictive habitat distribution models in ecology. Ecol. Model. 135: 147–186.
@@ -189,7 +192,7 @@ methods::setGeneric(
#' @name add_biodiversity_poipa
#' @rdname add_biodiversity_poipa
-#' @usage \S4method{add_biodiversity_poipa}{BiodiversityDistribution,sf}(x, poipa)
+#' @usage \S4method{add_biodiversity_poipa}{BiodiversityDistribution,sf,character,character,ANY,character,character,numeric,logical,logical}(x,poipa,name,field_occurrence,formula,family,link,weight,separate_intercept,docheck)
methods::setMethod(
"add_biodiversity_poipa",
methods::signature(x = "BiodiversityDistribution", poipa = "sf"),
@@ -264,7 +267,7 @@ methods::setMethod(
#' some engines particular through the way that points are generated.
#'
#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
-#' @param polpo A [`sf`] or [`Spatial`]) polygon object of presence-only occurrences.
+#' @param polpo A [`sf`] polygon object of presence-only occurrences.
#' @param name The name of the biodiversity dataset used as internal identifier.
#' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity point records.
#' @param formula A [`character`] or [`formula`] object to be passed. Default is to use all covariates (if specified).
@@ -272,14 +275,14 @@
#' @param link A [`character`] to overwrite the default link function (Default: \code{NULL}).
#' @param weight A [`numeric`] value acting as a multiplier with regards to any weights used in the modelling.
#' Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only
-#' one dataset is added. A [`vector`] is also supported but must be of the same length as [`polpo`].
+#' one dataset is added. A [`vector`] is also supported but must be of the same length as \code{"polpo"}.
#' @param simulate Simulate poipo points within its boundaries. Result are passed to [`add_biodiversity_poipo`] (Default: \code{FALSE}).
#' @param simulate_points A [`numeric`] number of points to be created by simulation (Default: \code{100}).
#' @param simulate_bias A [`SpatRaster`] layer describing an eventual preference for simulation (Default: \code{NULL}).
#' @param simulate_strategy A [`character`] stating the strategy for sampling.
Can be set to either.
-#' \code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the [simulate_weights]
+#' \code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \code{'simulate_bias'}
+#' parameter.
-#' @param separate_intercept A [`boolean`] value stating whether a separate intercept is to be added in
+#' @param separate_intercept A [`logical`] value stating whether a separate intercept is to be added in
#' shared likelihood models for engines [engine_inla], [engine_inlabru] and [engine_stan].
#' @param docheck [`logical`] on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).
#' @param pseudoabsence_settings Either \code{NULL} or a [`pseudoabs_settings()`] created settings object.
@@ -295,6 +298,7 @@ methods::setMethod(
#' For an integration of range data as predictor or offset, see [`add_predictor_range()`] and [`add_offset_range()`] instead.
#' @family add_biodiversity
#' @keywords biodiversity
+#' @aliases add_biodiversity_polpo
#' @returns Adds biodiversity data to [distribution] object.
#'
#' @examples
@@ -318,7 +322,7 @@ methods::setGeneric(
#' @name add_biodiversity_polpo
#' @rdname add_biodiversity_polpo
-#' @usage \S4method{add_biodiversity_polpo}{BiodiversityDistribution,sf}(x, polpo)
+#' @usage \S4method{add_biodiversity_polpo}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,numeric,ANY,character,logical,logical,ANY}(x,polpo,name,field_occurrence,formula,family,link,weight,simulate,simulate_points,simulate_bias,simulate_strategy,separate_intercept,docheck,pseudoabsence_settings,...)
methods::setMethod(
"add_biodiversity_polpo",
methods::signature(x = "BiodiversityDistribution", polpo = "sf"),
@@ -440,7 +444,7 @@ methods::setMethod(
#' species is absent.
#'
#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
-#' @param polpa A [`sf`] or [`Spatial`]) polygon object of presence-absence occurrences.
+#' @param polpa A [`sf`] polygon object of presence-absence occurrences.
#' @param name The name of the biodiversity dataset used as internal identifier.
#' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity point records.
#' @param formula A [`character`] or [`formula`] object to be passed. Default is to use all covariates (if specified).
#' @param link A [`character`] to overwrite the default link function (Default: \code{NULL}).
#' @param weight A [`numeric`] value acting as a multiplier with regards to any weights used in the modelling.
#' Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only
-#' one dataset is added. A [`vector`] is also supported but must be of the same length as [`polpa`].
+#' one dataset is added. A [`vector`] is also supported but must be of the same length as \code{"polpa"}.
#' @param simulate Simulate poipa points within its boundaries. Result are passed to [`add_biodiversity_poipa`] (Default: \code{FALSE}).
#' @param simulate_points A [`numeric`] number of points to be created by simulation.
#' @param simulate_bias A [`SpatRaster`] layer describing an eventual preference for simulation (Default: \code{NULL}).
#' @param simulate_strategy A [`character`] stating the strategy for sampling. Can be set to either.
-#' \code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the [simulate_weights]
+#' \code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \code{'simulate_bias'}
+#' parameter.
-#' @param separate_intercept A [`boolean`] value stating whether a separate intercept is to be added in
+#' @param separate_intercept A [`logical`] value stating whether a separate intercept is to be added in
#' shared likelihood models for engines [engine_inla], [engine_inlabru] and [engine_stan].
#' @param docheck [`logical`] on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).
#' @param pseudoabsence_settings Either \code{NULL} or a [`pseudoabs_settings()`] created settings object.
@@ -470,6 +474,7 @@ methods::setMethod(
#' For an integration of range data as predictor or offset, see [`add_predictor_range()`] and [`add_offset_range()`] instead.
#' @family add_biodiversity
#' @keywords biodiversity
+#' @aliases add_biodiversity_polpa
#' @returns Adds biodiversity data to [distribution] object.
#'
#' @examples
@@ -493,7 +498,7 @@ methods::setGeneric(
#' @name add_biodiversity_polpa
#' @rdname add_biodiversity_polpa
-#' @usage \S4method{add_biodiversity_polpa}{BiodiversityDistribution, sf}(x, polpa)
+#' @usage \S4method{add_biodiversity_polpa}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,numeric,ANY,character,logical,logical,ANY}(x,polpa,name,field_occurrence,formula,family,link,weight,simulate,simulate_points,simulate_bias,simulate_strategy,separate_intercept,docheck,pseudoabsence_settings,...)
methods::setMethod(
"add_biodiversity_polpa",
methods::signature(x = "BiodiversityDistribution", polpa = "sf"),
@@ -638,13 +643,14 @@ methods::setMethod(
#' Format biodiversity dataset to standardized format
#'
-#' @param x A [`data.frame`], [`sf`] or [`Spatial`]) object of biodiversity information.
+#' @param x A [`data.frame`] or [`sf`] object of biodiversity information.
#' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity records.
#' @param field_space A [`vector`] on the column names (Default: \code{'x'}, \code{'y'}).
#' @param ... Other parameters passed down.
#'
#' @import sf
#' @name format_biodiversity_data
+#' @aliases format_biodiversity_data
#' @rdname format_biodiversity_data
#' @keywords internal
#' @noRd
@@ -680,7 +686,6 @@ format_biodiversity_data <- function(x, field_occurrence, field_space = c("x","y
tibble::as_tibble()
} else {
if(inherits(x, "Spatial")) x <- sf::st_as_sf(x) # First convert to sf
- #if(inherits(x,'sf')) coords <- sf::st_coordinates(x) |> tibble::as_tibble()
if(unique(sf::st_geometry_type(x)) %in% c("POINT","MULTIPOINT")){
# Take target column and append coordinates to it
diff --git a/R/add_constraint.R b/R/add_constraint.R
index 2dd8be88..c36c9a06 100644
--- a/R/add_constraint.R
+++ b/R/add_constraint.R
@@ -1,673 +1,685 @@
-#' @include utils.R bdproto-biodiversityscenario.R
-NULL
-
-#' Add a constraint to an existing \code{scenario}
-#'
-#' @description This function adds a constrain to a [`BiodiversityScenario-class`] object to
-#' constrain (future) projections. These constrains can for instance be constraints on a possible
-#' dispersal distance, connectivity between identified patches or limitations on species adaptability.
-#' **Most constrains require pre-calculated thresholds to present in the [`BiodiversityScenario-class`] object!**
-#' @param mod A [`BiodiversityScenario`] object with specified predictors.
-#' @param method A [`character`] indicating the type of constraints to be added to the scenario. See details for more -#' information. -#' @param value For many dispersal [`constrain`] this is set as [`numeric`] value specifying a -#' fixed constrain or constant in units \code{"m"} (Default: \code{NULL}). For kissmig the value needs to -#' give the number of iteration steps (or within year migration steps). -#' For adaptability constraints this parameter specifies the extent (in units of standard deviation) to which extrapolations -#' should be performed. -#' @param type A [`character`] indicating the type used in the method. See for instance \code{`kissmig`}. -#' @param layer A [`SpatRaster`] object that can be used for boundary constraints (Default: \code{NULL}). -#' @param pext [`numeric`] indicator for \code{`kissmig`} of the probability a colonized cell becomes uncolonised, -#' i.e., the species gets locally extinct (Default: \code{0.1}). -#' @param pcor [`numeric`] probability that corner cells are considered in the 3x3 neighbourhood (Default: \code{0.2}). -#' @param ... passed on parameters. See also the specific methods for adding constraints. -#' -#' @seealso [`add_constraint_dispersal`], [`add_constraint_connectivity`], [`add_constraint_adaptability`], [`add_constraint_boundary`] -#' @details -#' Constraints can be added to scenario objects to increase or decrease the suitability of a given area for the -#' target feature. This function acts as a wrapper to add these constraints. -#' Currently supported are the following options: -#' **Dispersal**: -#' * \code{sdd_fixed} - Applies a fixed uniform dispersal distance per modelling timestep. -#' * \code{sdd_nexpkernel} - Applies a dispersal distance using a negative exponential kernel from its origin. -#' * \code{kissmig} - Applies the kissmig stochastic dispersal model. Requires \code{`kissmig`} package. Applied at each modelling time step. -#' * \code{migclim} - Applies the dispersal algorithm MigClim to the modelled objects. Requires [`MigClim`] package. -#' -#' A comprehensive overview of the benefits of including dispersal constrains in species distribution models -#' can be found in Bateman et al. (2013). -#' -#' **Connectivity**: -#' * \code{hardbarrier} - Defines a hard barrier to any dispersal events. By definition this sets all values larger -#' than \code{0} in the barrier layer to \code{0} in the projection. Barrier has to be provided through the \code{"resistance"} -#' parameter. -#' * \code{resistance} - Allows the provision of a static or dynamic layer that is multiplied with the projection at each -#' time step. Can for example be used to reduce the suitability of any given area (using pressures not included in the model). -#' The respective layer(s) have to be provided through the \code{"resistance"} parameter. Provided layers are incorporated as -#' \code{abs(resistance - 1)} and multiplied with the prediction. -#' -#' **Adaptability**: -#' * \code{nichelimit} - Specifies a limit on the environmental niche to only allow a modest amount of extrapolation beyond the known occurrences. This -#' can be particular useful to limit the influence of increasing marginal responses and avoid biologically unrealistic projections. -#' -#' **Boundary**: -#' * \code{boundary} - Applies a hard boundary constraint on the projection, thus disallowing an expansion of a range outside -#' the provide layer. 
Similar as specifying projection limits (see [`distribution`]), but can be used to specifically -#' constrain a projection within a certain area (e.g. a species range or an island). -#' -#' @returns Adds constraints data to a [`BiodiversityScenario`] object. -#' @references -#' * Bateman, B. L., Murphy, H. T., Reside, A. E., Mokany, K., & VanDerWal, J. (2013). Appropriateness of full‐, partial‐and no‐dispersal scenarios in climate change impact modelling. Diversity and Distributions, 19(10), 1224-1234. -#' * Nobis MP and Normand S (2014) KISSMig - a simple model for R to account for limited migration in analyses of species distributions. Ecography 37: 1282-1287. -#' * Mendes, P., Velazco, S. J. E., de Andrade, A. F. A., & Júnior, P. D. M. (2020). Dealing with overprediction in species distribution models: How adding distance constraints can improve model accuracy. Ecological Modelling, 431, 109180. -#' @examples -#' \dontrun{ -#' # Assumes that a trained 'model' object exists -#' mod <- scenario(model) |> -#' add_predictors(env = predictors, transform = 'scale', derivates = "none") |> -#' add_constrain_dispersal(method = "kissmig", value = 2, pext = 0.1) |> -#' project() -#' } -#' @name add_constraint -#' @family constraint -#' @aliases add_constraint -#' @keywords scenario -#' @exportMethod add_constraint -#' @export -NULL -methods::setGeneric("add_constraint", - signature = methods::signature("mod"), - function(mod, method, ...) standardGeneric("add_constraint")) - -#' @name add_constraint -#' @rdname add_constraint -#' @usage \S4method{add_constraint}{BiodiversityScenario, character}(mod, method) -methods::setMethod( - "add_constraint", - methods::signature(mod = "BiodiversityScenario"), - function(mod, method, ...) { - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - !is.Waiver(mod$get_predictors()), - is.character(method) - ) - # Match method - method <- match.arg(arg = method, - choices = c("sdd_fixed", "sdd_nexpkernel", "kissmig", "migclim", - "hardbarrier","resistance", "boundary", - "nichelimit"), several.ok = FALSE) - - # Now call the respective functions individually - o <- switch(method, - # Fixed dispersal - "sdd_fixed" = add_constraint_dispersal(mod, method = "sdd_fixed", ...), - # Short-distance dispersal - "sdd_nexpkernel" = add_constraint_dispersal(mod, method = "sdd_nexpkernel", ...), - # Add kissmig dispersal - "kissmig" = add_constraint_dispersal(mod, method = "kissmig", ...), - # Using the migclim package - "migclim" = add_constraint_dispersal(mod, method = "migclim", ...), - # --- # - "hardbarrier" = add_constraint_connectivity(mod, method = "hardbarrier", ...), - # --- # - "resistance" = add_constraint_connectivity(mod, method = "resistance", ...), - # --- # - "nichelimit" = add_constraint_adaptability(mod, method = "nichelimit", ...), - # --- # - "boundary" = add_constraint_boundary(mod, ...) - ) - return(o) - } -) - -# ------------------------ # -#### Dispersal constraints #### - -#' @title Adds a dispersal constrain to a scenario object. -#' @name add_constraint_dispersal -#' @aliases add_constraint_dispersal -#' @inheritParams add_constraint -#' @family constraint -#' @keywords scenario -#' @exportMethod add_constraint_dispersal -#' @export -NULL -methods::setGeneric("add_constraint_dispersal", - signature = methods::signature("mod"), - function(mod, method, value = NULL, type = NULL, ...) 
standardGeneric("add_constraint_dispersal")) - -#' @name add_constraint_dispersal -#' @rdname add_constraint_dispersal -#' @usage \S4method{add_constraint_dispersal}{BiodiversityScenario, character, numeric}(mod, method, value) -methods::setMethod( - "add_constraint_dispersal", - methods::signature(mod = "BiodiversityScenario"), - function(mod, method, value = NULL, type = NULL, ...){ - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - !is.Waiver(mod$get_predictors()), - is.character(method), - is.null(value) || is.numeric(value), - is.null(type) || is.character(type) - ) - # Match method - method <- match.arg(arg = method, - choices = c("sdd_fixed", "sdd_nexpkernel", "kissmig", "migclim"), several.ok = FALSE) - - # Other arguments supplied - dots <- list(...) - argnames <- names(dots) - - # Check if there is already a dispersal constrain, if yes raise warning - if(!is.Waiver(mod$get_constraints())){ - # If there are any dispersal constrains in there, raise warning - if(any("dispersal" %in% names(mod$get_constraints()))){ - if(getOption('ibis.setupmessages')) myLog('[Estimation]', 'yellow', 'Overwriting existing dispersal constraint.') - } - } - - # Add processing method # - # --- # - cr <- list() - if(method == "sdd_fixed"){ - # Short-distance dispersal (Fixed) - assertthat::assert_that( - is.numeric(value), msg = "Fixed short distance dispersal needs an annual mean disperal distance value." - ) - cr[['dispersal']] <- list(method = method, - params = c("mean_dispersal_distance" = value)) - } else if(method == "sdd_nexpkernel") { - # Negative exponential kernel - assertthat::assert_that( - is.numeric(value), msg = "Short distance negative exponential kernal dispersal needs an annual mean disperal distance value." - ) - cr[['dispersal']] <- list(method = method, - params = c("mean_dispersal_distance" = value)) - } else if(method == "kissmig"){ - # Check parameters to be correct - check_package("kissmig") - # Gather some default parameters - if(is.null(type)) type <- "DIS" else match.arg(type, c("DIS", "FOC", "LOC", "NOC"), several.ok = FALSE) - assertthat::assert_that( - is.numeric(value), - value > 0, msg = "For kissmig the value needs to give the number of iteration steps (or within time migration steps)." - ) - # probability [0,1] a colonized cell becomes uncolonized between iteration steps, i.e., the species gets locally extinct - if("pext" %in% argnames) pext <- dots[["pext"]] else pext <- 0.1 - # probability [0,1] corner cells are considered in the 3x3 cell neighborhood. Following Nobis & Nomand 2014, 0.2 is recommended for circular spread - if("pcor" %in% argnames) pcor <- dots[["pcor"]] else pcor <- 0.2 - - if(getOption('ibis.setupmessages')) myLog('[Estimation]', 'green', 'KISSMIG options: iterations=',value,'| pext=', pext,'| pcor=', pcor) - - cr[['dispersal']] <- list(method = method, - params = c("iteration" = value, - "type" = type, - "signed" = FALSE, - "pext" = pext, - "pcor" = pcor - )) - - } - if(method == "migclim"){ - # Using the MigClim package for calculating any transitions and - # This requires prior calculated Thresholds! - out <- add_constraint_MigClim(mod = mod, ...) - return(out) - } else { - # --- # - new <- mod$set_constraints(cr) - return( - bdproto(NULL, new) - ) - } - - } -) - -#' Short-distance fixed dispersal function -#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year. -#' @param new_suit A new [`SpatRaster`] object. -#' @param value A [`numeric`] value of the fixed dispersal threshold. 
In unit \code{'meters'}. -#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}). -#' @noRd -#' @keywords internal -.sdd_fixed <- function(baseline_threshold, new_suit, value, resistance = NULL){ - assertthat::assert_that( - is.Raster(baseline_threshold), is.Raster(new_suit), - is_comparable_raster(baseline_threshold, new_suit), - is.numeric(value), - is.null(resistance) || is.Raster(resistance), - # Check that baseline threshold raster is binomial - length(unique(baseline_threshold))==2 - ) - - # Set resistance layer to 0 if set to zero. - if(is.Raster(resistance)){ - baseline_threshold[resistance == 1] <- 2 - # Set resistance to the value omitted - resistance <- 2 - baseline_threshold <- terra::mask(baseline_threshold, resistance) - } - # Grow baseline raster by the amount of value at max - # Furthermore divide by value to get a normalized distance - dis <- terra::gridDist(baseline_threshold, target = 1) - ras_dis <- terra::clamp(dis, lower = 0, upper = value) / value - # Invert - ras_dis <- abs(ras_dis - 1) - - # Now multiply the net suitability projection with this mask - # Thus removing any grid cells outside - out <- new_suit * ras_dis - return(out) -} - -#' Short-distance negative exponential kernel dispersal function -#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year. -#' @param new_suit A new [`SpatRaster`] object. -#' @param value A [`numeric`] value of the fixed dispersal threshold. In unit \code{'meters'}. -#' @param normalize Should a normalising constant be used for the exponential dispersal parameter (Default: \code{FALSE}). -#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}). -#' @noRd -#' @keywords internal -.sdd_nexpkernel <- function(baseline_threshold, new_suit, value, normalize = FALSE, resistance = NULL){ - assertthat::assert_that( - is.Raster(baseline_threshold), is.Raster(new_suit), - is_comparable_raster(baseline_threshold, new_suit), - is.numeric(value), - is.null(resistance) || is.Raster(resistance), - # Check that baseline threshold raster is binomial - length(unique(baseline_threshold)[,1])==2 - ) - - # Set resistance layer to 0 if set to zero. - if(is.Raster(resistance)){ - baseline_threshold[resistance == 1] <- 2 - # Set resistance to the value omitted - resistance <- 2 - baseline_threshold <- terra::mask(baseline_threshold, resistance) - } - - # Inverse of mean dispersal distance - alpha <- 1/value - - # Grow baseline raster by using an exponentially weighted kernel - ras_dis <- terra::gridDist(baseline_threshold, target = 1) - if(normalize){ - # Normalized (with a constant) negative exponential kernel - ras_dis <- terra::app(ras_dis, fun = function(x) (1 / (2 * pi * value ^ 2)) * exp(-x / value) ) - } else { - ras_dis <- terra::app(ras_dis, fun = function(x) exp(-alpha * x)) - } - - # Now multiply the net suitability projection with this mask - # Thus removing any non-suitable grid cells (0) and changing the value of those within reach - out <- new_suit * ras_dis - return(out) -} - -#' Keep it simple migration calculation. -#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year. -#' @param new_suit A new [`SpatRaster`] object. -#' @param params A [vector] or [list] with passed on parameter values. 
-#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}). -#' @noRd -#' @keywords internal -.kissmig_dispersal <- function(baseline_threshold, new_suit, params, resistance = NULL){ - assertthat::assert_that( - is.Raster(baseline_threshold), is.Raster(new_suit), - is_comparable_raster(baseline_threshold, new_suit), - is.vector(params) || is.list(params), - is.null(resistance) || is.logical(resistance) || is.Raster(resistance), - # Check that baseline threshold raster is binomial - length(unique(baseline_threshold))==2 - ) - - check_package('kissmig') - if(!isNamespaceLoaded("kissmig")) { attachNamespace("kissmig");requireNamespace("kissmig") } - - # Set suitability layer to 0 if resistance layer is set - if(is.Raster(resistance)){ - new_suit[resistance>0] <- 0 - } - - # Simulate kissmig for a given threshold and suitability raster - km <- kissmig::kissmig(O = terra_to_raster( baseline_threshold ), - # Rescale newsuit to 0-1 - S = predictor_transform(new_suit, 'norm'), - it = as.numeric( params['iteration'] ), - type = params['type'], - pext = as.numeric(params['pext']), - pcor = as.numeric(params['pcor']) - ) - if(is.factor(km)) km <- terra::as.int(km) - - # Now multiply the net suitability projection with this mask - # Thus removing any non-suitable grid cells (0) and changing the value of those within reach - ns <- new_suit * km - - return( - c(km, ns) - ) -} - -# ------------------------ # -#### Connectivity constraints #### - -#' @title Adds a connectivity constraint to a scenario object. -#' @name add_constraint_connectivity -#' @aliases add_constraint_connectivity -#' @inheritParams add_constraint -#' @param resistance A [`RasterLayer`] object describing a resistance surface or barrier for use in -#' connectivity constrains (Default: \code{NULL}). -#' @family constraint -#' @keywords scenario -#' @exportMethod add_constraint_connectivity -#' @export -NULL -methods::setGeneric("add_constraint_connectivity", - signature = methods::signature("mod"), - function(mod, method, value = NULL, resistance = NULL, ...) standardGeneric("add_constraint_connectivity")) - -#' @name add_constraint_connectivity -#' @rdname add_constraint_connectivity -#' @usage \S4method{add_constraint_connectivity}{BiodiversityScenario, character, numeric, ANY}(mod, method, value, resistance) -methods::setMethod( - "add_constraint_connectivity", - methods::signature(mod = "BiodiversityScenario"), - function(mod, method, value = NULL, resistance = NULL, ...){ - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - !is.Waiver(mod$get_predictors()), - is.character(method), - is.null(value) || is.numeric(value), - is.Raster(resistance) || is.null(resistance) - ) - # Match method - method <- match.arg(arg = method, - choices = c("hardbarrier", "resistance"), several.ok = FALSE) - - # Check if there is already a dispersal constrain, if yes raise warning - if(!is.Waiver(mod$get_constraints())){ - # If there are any dispersal constrains in there, raise warning - if(any( "connectivity" %in% names(mod$get_constraints()) )){ - if(getOption('ibis.setupmessages')) myLog('[Setup]','yellow','Overwriting existing connectivity constraint') - } - } - - # Add processing method # - # --- # - co <- list() - if(method == "hardbarrier"){ - # Assert hard barrier - assertthat::assert_that( - is.Raster(resistance), - !is.null(resistance), msg = "Set a hard barrier via the resistance parameter." 
- ) - # Check that resistance layer is a binary mask - assertthat::assert_that(length(unique(resistance))<=2, - terra::global(resistance,'max', na.rm = TRUE)>0, - msg = "Resistance layer should be a binary mark with values 0/1.") - co[['connectivity']] <- list(method = method, - params = c("resistance" = resistance)) - } else if(method == "resistance"){ - # Flexible resistance layer - assertthat::assert_that( - is.Raster(resistance), - !is.null(resistance), msg = "The method resistance requires a specified resistance raster." - ) - # If raster is stack with multiple layers, ensure that time - if(terra::nlyr(resistance)>1){ - # Check that layers have a z dimension and fall within the timeperiod - startend <- mod$get_timeperiod() - assertthat::assert_that( !is.null( terra::time(resistance) ), - all( range(terra::time(resistance))==startend ), - msg = "If a stack of layers is supplied as resistance, it needs a Z value of equal length to the predictors!") - } - times <- terra::time(resistance) - # If resistance layer is bigger than 1, normalize - if(any(terra::global(resistance, 'max', na.rm = TRUE)>1)){ - if(getOption('ibis.setupmessages')) myLog('[Setup]','yellow','Resistance values larger than 1. Normalizing...') - resistance <- predictor_transform(resistance, option = "norm") - } - resistance <- abs( resistance - 1 ) # Invert - if(!is.null(times)) terra::time(resistance) <- times # Reset times again if found - - co[['connectivity']] <- list(method = method, - params = c("resistance" = resistance)) - } - # --- # - new <- mod$set_constraints(co) - return( - bdproto(NULL, new) - ) - } -) - -# ------------------------ # -#### Adaptability constraints #### - -#' @title Adds an adaptability constraint to a scenario object -#' @description -#' Adaptability constraints assume that suitable habitat for species in (future) projections might be unsuitable if -#' it is outside the range of conditions currently observed for the species. -#' -#' Currently only `nichelimit` is implemented, which adds a simple constrain on the predictor parameter space, which -#' can be defined through the \code{"value"} parameter. For example by setting it to \code{1} (Default), any projections -#' are constrained to be within the range of at maximum 1 standard deviation from the range of covariates used for model -#' training. -#' @name add_constraint_adaptability -#' @aliases add_constraint_adaptability -#' @inheritParams add_constraint -#' @param names A [`character`] vector with names of the predictors for which an adaptability threshold should be set (Default: \code{NULL} for all). -#' @param value A [`numeric`] value in units of standard deviation (Default: \code{1}). -#' @param increment A [`numeric`] constant that is added to value at every time step (Default: \code{0}). -#' Allows incremental widening of the niche space, thus opening constraints. -#' @family constraint -#' @keywords scenario -#' @exportMethod add_constraint_adaptability -#' @export -NULL -methods::setGeneric("add_constraint_adaptability", - signature = methods::signature("mod"), - function(mod, method = "nichelimit", names = NULL, value = 1, increment = 0, ...) 
standardGeneric("add_constraint_adaptability")) - -#' @name add_constraint_adaptability -#' @rdname add_constraint_adaptability -#' @usage \S4method{add_constraint_adaptability}{BiodiversityScenario, character, character, numeric, numeric}(mod, method, names, value, increment) -methods::setMethod( - "add_constraint_adaptability", - methods::signature(mod = "BiodiversityScenario"), - function(mod, method = "nichelimit", names = NULL, value = 1, increment = 0, ...){ - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - !is.Waiver(mod$get_predictors()), - is.character(method), - is.null(names) || is.character(names), - is.null(value) || is.numeric(value), - is.numeric(increment) - ) - # Match method - method <- match.arg(arg = method, - choices = c("nichelimit"), several.ok = FALSE) - - # Add processing method # - # --- # - co <- list() - if(method == "nichelimit"){ - # Add a constrain on parameter space, e.g. max 1 SD from training data covariates - assertthat::assert_that( - is.numeric(value), - is.null(names) || is.character(names), - value > 0, msg = "Specify a value threshold (SD) and names of predictors, for which - we do not expect the species to persist." - ) - if(is.null(names)) names <- character() - co[['adaptability']] <- list(method = method, - params = c("names" = names, "value" = value, - "increment" = increment)) - } - # --- # - new <- mod$set_constraints(co) - return( - bdproto(NULL, new) - ) - } -) - -#' Adaptability constrain by applying a limit on extrapolation beyond the niche -#' -#' @param newdata A [`data.frame`] with the information about new data layers. -#' @param model A [`list`] created by the modelling object containing the full predictors and biodiversity predictors. -#' @param names A [`character`] or \code{NULL} of the names of predictors. -#' @param value A [`numeric`] value in units of standard deviation (Default: \code{1}). -#' @param increment A [`numeric`] constant that is added to value at every time step (Default: \code{0}). -#' Allows incremental widening of the niche space, thus opening constraints. -#' @param increment_step A [`numeric`] indicating the number of time increment should be applied. 
-#' @keywords internal -#' @noRd -.nichelimit <- function(newdata, model, names = NULL, value = 1, increment = 0, increment_step = 1){ - assertthat::assert_that( - is.data.frame(newdata), - is.list(model), - is.numeric(as.numeric(value)), - is.null(names) || is.na(names) || is.character(names), - is.numeric(as.numeric(increment)), - is.numeric(as.numeric(increment_step)) - ) - # Check that names are present if set - if(is.null(names) || is.na(names)) names <- model$predictors_names - if(is.character(names) ) assertthat::assert_that(all(names %in% model$predictors_names)) - # Convert numeric parameters to numeric to be sure - value <- as.numeric(value) - increment <- as.numeric(increment) - increment_step <- as.numeric(increment_step) - # --- # - # Now calculate the range across each target predictor and occurrence dataset - df <- data.frame() - for(id in names(model$biodiversity)){ - sub <- model$biodiversity[[id]] - # Which are presence data - is_presence <- which(sub$observations[['observed']] > 0) - df <- rbind(df, - sub$predictors[is_presence, names]) - } - rr <- sapply(df, function(x) range(x, na.rm = TRUE)) # Calculate ranges - rsd <- sapply(df, function(x) sd(x, na.rm = TRUE)) # Calculate standard deviation - - # Apply value and increment if set - rsd <- rsd * (value + (increment*increment_step)) - rr[1,] <- rr[1,] - rsd; rr[2,] <- rr[2,] + rsd - - # Now 'clamp' all predictor values beyond these names to 0, e.g. partial out - nd <- newdata - for(n in names){ - # Calc min - min_ex <- which(nd[,n] < rr[1,n]) - max_ex <- which(nd[,n] > rr[2,n]) - if(length(min_ex)>0) nd[min_ex,n] <- NA - if(length(max_ex)>0) nd[max_ex,n] <- NA - # FIXME Or rather do a smooth logistic decay for less extreme? - } - return(nd) -} - -# ------------------------ # -#### Boundary constraints #### - -#' @title Adds a boundary constraint to a scenario object -#' @description -#' The purpose of boundary constraints is to limit a future projection within a specified area -#' (such as for example a range or ecoregion). This can help to limit unreasonable projections into geographic space. -#' -#' Similar to boundary constraints it is also possible to define a \code{"zone"} for the scenario projections, similar -#' as was done for model training. The difference to a boundary constraint is that the boundary constraint is applied posthoc -#' as a hard cut on any projection, while the zones would allow any projection (and other constraints) to be applied within -#' the zone. -#' **Note: Setting a boundary constraint for future projections effectively potentially suitable areas!** -#' @name add_constraint_boundary -#' @aliases add_constraint_boundary -#' @inheritParams add_constraint -#' @param layer A [`SpatRaster`] or [`sf`] object with the same extent as the model background. Has to be binary and -#' is used for a posthoc masking of projected grid cells. -#' @family constraint -#' @keywords scenario -#' @exportMethod add_constraint_boundary -#' @export -NULL -methods::setGeneric("add_constraint_boundary", - signature = methods::signature("mod", "layer"), - function(mod, layer, ...) 
standardGeneric("add_constraint_boundary")) - -#' @name add_constraint_boundary -#' @rdname add_constraint_boundary -#' @usage \S4method{add_constraint_boundary}{BiodiversityScenario, sf, character}(mod, layer, method) -methods::setMethod( - "add_constraint_boundary", - methods::signature(mod = "BiodiversityScenario", layer = "sf"), - function(mod, layer, method = "boundary", ...){ - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - inherits(layer, "sf"), - is.character(method) - ) - - # Rasterize the layer - # First try and dig out a layer from a predictor dataset if found - if(inherits( mod$get_predictors(), "PredictorDataSet")){ - ras <- mod$get_predictors()$get_data() |> stars_to_raster() - ras <- ras[[1]] - } else { - # Try and get the underlying model and its predictors - ras <- mod$get_model()$get_data() - } - assertthat::assert_that(is.Raster(ras)) - bb <- try({ terra::rasterize(layer, ras, 1)}, silent = TRUE) - if(inherits(bb, "try-error")) stop("Provide a rasterized layer of the boundary constraint!") - - # Call again - o <- add_constraint_boundary(mod, layer = bb, method = method, ...) - - return( o ) - } -) - -#' @name add_constraint_boundary -#' @rdname add_constraint_boundary -#' @usage \S4method{add_constraint_boundary}{BiodiversityScenario, ANY, character}(mod, layer, method) -methods::setMethod( - "add_constraint_boundary", - methods::signature(mod = "BiodiversityScenario", layer = "ANY"), - function(mod, layer, method = "boundary", ...){ - assertthat::assert_that( - inherits(mod, "BiodiversityScenario"), - is.Raster(layer), - is.character(method) - ) - - # Check that layer is a single SpatRaster - if(!inherits(layer, "SpatRaster")){ - assertthat::assert_that(terra::nlyr(layer) == 1) - layer <- layer[[1]] - } - - # Add processing method # - # --- # - co <- list() - if(method == "boundary"){ - # Add a constrain on parameter space, e.g. max 1 SD from training data covariates - assertthat::assert_that( - length( unique( layer )) <=2 - ) - # If length of values is greater than 1, remove everything else by setting it to NA - if( length( unique( layer )) >1 ){ - layer[layer<1] <- NA - } - co[['boundary']] <- list(method = method, - params = c("layer" = layer)) - } - # --- # - new <- mod$set_constraints(co) - return( - bdproto(NULL, new) - ) - } -) +#' @include utils.R bdproto-biodiversityscenario.R +NULL + +#' Add a constraint to an existing \code{scenario} +#' +#' @description This function adds a constrain to a [`BiodiversityScenario-class`] object to +#' constrain (future) projections. These constrains can for instance be constraints on a possible +#' dispersal distance, connectivity between identified patches or limitations on species adaptability. +#' **Most constrains require pre-calculated thresholds to present in the [`BiodiversityScenario-class`] object!** +#' @param mod A [`BiodiversityScenario`] object with specified predictors. +#' @param method A [`character`] indicating the type of constraints to be added to the scenario. See details for more +#' information. +#' @param value For many dispersal \code{"constrain"} this is set as [`numeric`] value specifying a +#' fixed constrain or constant in units \code{"m"} (Default: \code{NULL}). For kissmig the value needs to +#' give the number of iteration steps (or within year migration steps). +#' For adaptability constraints this parameter specifies the extent (in units of standard deviation) to which extrapolations +#' should be performed. 
+#' @param type A [`character`] indicating the type used in the method. See for instance \code{`kissmig`}.
+#' @param layer A [`SpatRaster`] object that can be used for boundary constraints (Default: \code{NULL}).
+#' @param pext [`numeric`] indicator for \code{`kissmig`} of the probability a colonized cell becomes uncolonised,
+#' i.e., the species gets locally extinct (Default: \code{0.1}).
+#' @param pcor [`numeric`] probability that corner cells are considered in the 3x3 neighbourhood (Default: \code{0.2}).
+#' @param ... passed on parameters. See also the specific methods for adding constraints.
+#'
+#' @seealso [`add_constraint_dispersal`], [`add_constraint_connectivity`], [`add_constraint_adaptability`], [`add_constraint_boundary`]
+#' @details
+#' Constraints can be added to scenario objects to increase or decrease the suitability of a given area for the
+#' target feature. This function acts as a wrapper to add these constraints.
+#' Currently supported are the following options:
+#' **Dispersal**:
+#' * \code{sdd_fixed} - Applies a fixed uniform dispersal distance per modelling timestep.
+#' * \code{sdd_nexpkernel} - Applies a dispersal distance using a negative exponential kernel from its origin.
+#' * \code{kissmig} - Applies the kissmig stochastic dispersal model. Requires the \code{`kissmig`} package. Applied at each modelling time step.
+#' * \code{migclim} - Applies the dispersal algorithm MigClim to the modelled objects. Requires the \code{"MigClim"} package.
+#'
+#' A comprehensive overview of the benefits of including dispersal constraints in species distribution models
+#' can be found in Bateman et al. (2013).
+#'
+#' **Connectivity**:
+#' * \code{hardbarrier} - Defines a hard barrier to any dispersal events. By definition this sets all values larger
+#' than \code{0} in the barrier layer to \code{0} in the projection. Barrier has to be provided through the \code{"resistance"}
+#' parameter.
+#' * \code{resistance} - Allows the provision of a static or dynamic layer that is multiplied with the projection at each
+#' time step. Can for example be used to reduce the suitability of any given area (using pressures not included in the model).
+#' The respective layer(s) have to be provided through the \code{"resistance"} parameter. Provided layers are incorporated as
+#' \code{abs(resistance - 1)} and multiplied with the prediction.
+#'
+#' **Adaptability**:
+#' * \code{nichelimit} - Specifies a limit on the environmental niche to only allow a modest amount of extrapolation beyond the known occurrences. This
+#' can be particularly useful to limit the influence of increasing marginal responses and avoid biologically unrealistic projections.
+#'
+#' **Boundary**:
+#' * \code{boundary} - Applies a hard boundary constraint on the projection, thus disallowing an expansion of a range outside
+#' the provided layer. Similar to specifying projection limits (see [`distribution`]), but can be used to specifically
+#' constrain a projection within a certain area (e.g. a species range or an island).
+#'
+#' @returns Adds constraints data to a [`BiodiversityScenario`] object.
+#' @references
+#' * Bateman, B. L., Murphy, H. T., Reside, A. E., Mokany, K., & VanDerWal, J. (2013). Appropriateness of full‐, partial‐and no‐dispersal scenarios in climate change impact modelling. Diversity and Distributions, 19(10), 1224-1234.
+#' * Nobis MP and Normand S (2014) KISSMig - a simple model for R to account for limited migration in analyses of species distributions. Ecography 37: 1282-1287.
+#' * Mendes, P., Velazco, S. J. E., de Andrade, A. F. A., & Júnior, P. D. M. (2020). Dealing with overprediction in species distribution models: How adding distance constraints can improve model accuracy. Ecological Modelling, 431, 109180.
+#' @examples
+#' \dontrun{
+#' # Assumes that a trained 'model' object exists
+#' mod <- scenario(model) |>
+#' add_predictors(env = predictors, transform = 'scale', derivates = "none") |>
+#' add_constraint_dispersal(method = "kissmig", value = 2, pext = 0.1) |>
+#' project()
+#' }
+#' @name add_constraint
+#' @family constraint
+#' @aliases add_constraint
+#' @keywords scenario
+#' @exportMethod add_constraint
+#' @export
+NULL
+methods::setGeneric("add_constraint",
+ signature = methods::signature("mod"),
+ function(mod, method, ...) standardGeneric("add_constraint"))
+
+#' @name add_constraint
+#' @rdname add_constraint
+#' @usage \S4method{add_constraint}{BiodiversityScenario,character}(mod,method)
+methods::setMethod(
+ "add_constraint",
+ methods::signature(mod = "BiodiversityScenario"),
+ function(mod, method, ...) {
+ assertthat::assert_that(
+ inherits(mod, "BiodiversityScenario"),
+ !is.Waiver(mod$get_predictors()),
+ is.character(method)
+ )
+ # Match method
+ method <- match.arg(arg = method,
+ choices = c("sdd_fixed", "sdd_nexpkernel", "kissmig", "migclim",
+ "hardbarrier","resistance", "boundary",
+ "nichelimit"), several.ok = FALSE)
+
+ # Now call the respective functions individually
+ o <- switch(method,
+ # Fixed dispersal
+ "sdd_fixed" = add_constraint_dispersal(mod, method = "sdd_fixed", ...),
+ # Short-distance dispersal
+ "sdd_nexpkernel" = add_constraint_dispersal(mod, method = "sdd_nexpkernel", ...),
+ # Add kissmig dispersal
+ "kissmig" = add_constraint_dispersal(mod, method = "kissmig", ...),
+ # Using the migclim package
+ "migclim" = add_constraint_dispersal(mod, method = "migclim", ...),
+ # --- #
+ "hardbarrier" = add_constraint_connectivity(mod, method = "hardbarrier", ...),
+ # --- #
+ "resistance" = add_constraint_connectivity(mod, method = "resistance", ...),
+ # --- #
+ "nichelimit" = add_constraint_adaptability(mod, method = "nichelimit", ...),
+ # --- #
+ "boundary" = add_constraint_boundary(mod, ...)
+ )
+ return(o)
+ }
+)
+
+# ------------------------ #
+#### Dispersal constraints ####
+
+#' @title Adds a dispersal constraint to a scenario object.
+#' @name add_constraint_dispersal
+#' @aliases add_constraint_dispersal
+#' @inheritParams add_constraint
+#' @family constraint
+#' @keywords scenario
+#' @exportMethod add_constraint_dispersal
+#' @export
+NULL
+methods::setGeneric("add_constraint_dispersal",
+ signature = methods::signature("mod"),
+ function(mod, method, value = NULL, type = NULL, ...) standardGeneric("add_constraint_dispersal"))
+
+#' @name add_constraint_dispersal
+#' @rdname add_constraint_dispersal
+#' @usage \S4method{add_constraint_dispersal}{BiodiversityScenario,character,numeric}(mod,method,value)
+methods::setMethod(
+ "add_constraint_dispersal",
+ methods::signature(mod = "BiodiversityScenario"),
+ function(mod, method, value = NULL, type = NULL, ...){
+ assertthat::assert_that(
+ inherits(mod, "BiodiversityScenario"),
+ !is.Waiver(mod$get_predictors()),
+ is.character(method),
+ is.null(value) || is.numeric(value),
+ is.null(type) || is.character(type)
+ )
+ # Match method
+ method <- match.arg(arg = method,
+ choices = c("sdd_fixed", "sdd_nexpkernel", "kissmig", "migclim"), several.ok = FALSE)
+
+ # Other arguments supplied
+ dots <- list(...)
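+ # Optional engine-specific settings (e.g. 'pext' and 'pcor' for kissmig)
+ # travel through '...' and are matched by name below, with package defaults
+ # used when they are absent.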
+    argnames <- names(dots)
+
+    # Check if there is already a dispersal constraint; if so, raise a warning
+    if(!is.Waiver(mod$get_constraints())){
+      # If there are any dispersal constraints in there, raise warning
+      if(any("dispersal" %in% names(mod$get_constraints()))){
+        if(getOption('ibis.setupmessages')) myLog('[Estimation]', 'yellow', 'Overwriting existing dispersal constraint.')
+      }
+    }
+
+    # Add processing method #
+    # --- #
+    cr <- list()
+    if(method == "sdd_fixed"){
+      # Short-distance dispersal (Fixed)
+      assertthat::assert_that(
+        is.numeric(value), msg = "Fixed short distance dispersal needs an annual mean dispersal distance value."
+      )
+      cr[['dispersal']] <- list(method = method,
+                                params = c("mean_dispersal_distance" = value))
+    } else if(method == "sdd_nexpkernel") {
+      # Negative exponential kernel
+      assertthat::assert_that(
+        is.numeric(value), msg = "Short distance negative exponential kernel dispersal needs an annual mean dispersal distance value."
+      )
+      cr[['dispersal']] <- list(method = method,
+                                params = c("mean_dispersal_distance" = value))
+    } else if(method == "kissmig"){
+      # Check parameters to be correct
+      check_package("kissmig")
+      # Gather some default parameters
+      if(is.null(type)) type <- "DIS" else match.arg(type, c("DIS", "FOC", "LOC", "NOC"), several.ok = FALSE)
+      assertthat::assert_that(
+        is.numeric(value),
+        value > 0, msg = "For kissmig the value needs to give the number of iteration steps (or within time migration steps)."
+      )
+      # probability [0,1] a colonized cell becomes uncolonized between iteration steps, i.e., the species gets locally extinct
+      if("pext" %in% argnames) pext <- dots[["pext"]] else pext <- 0.1
+      # probability [0,1] corner cells are considered in the 3x3 cell neighborhood. Following Nobis & Normand 2014, 0.2 is recommended for circular spread
+      if("pcor" %in% argnames) pcor <- dots[["pcor"]] else pcor <- 0.2
+
+      if(getOption('ibis.setupmessages')) myLog('[Estimation]', 'green', 'KISSMIG options: iterations=',value,'| pext=', pext,'| pcor=', pcor)
+
+      cr[['dispersal']] <- list(method = method,
+                                params = c("iteration" = value,
+                                           "type" = type,
+                                           "signed" = FALSE,
+                                           "pext" = pext,
+                                           "pcor" = pcor
+                                ))
+
+    }
+    if(method == "migclim"){
+      # Using the MigClim package for calculating any transitions.
+      # This requires previously calculated thresholds!
+      out <- add_constraint_MigClim(mod = mod, ...)
+      return(out)
+    } else {
+      # --- #
+      new <- mod$set_constraints(cr)
+      return(
+        bdproto(NULL, new)
+      )
+    }
+
+  }
+)
+
+#' Short-distance fixed dispersal function
+#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year.
+#' @param new_suit A new [`SpatRaster`] object.
+#' @param value A [`numeric`] value of the fixed dispersal threshold. In unit \code{'meters'}.
+#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}).
+#' @noRd
+#' @keywords internal
+.sdd_fixed <- function(baseline_threshold, new_suit, value, resistance = NULL){
+  assertthat::assert_that(
+    is.Raster(baseline_threshold), is.Raster(new_suit),
+    is_comparable_raster(baseline_threshold, new_suit),
+    is.numeric(value),
+    is.null(resistance) || is.Raster(resistance),
+    # Check that baseline threshold raster is binomial
+    length(unique(baseline_threshold)[,1])==2
+  )
+
+  # If a resistance layer is supplied, mask out the blocked cells from the distance calculation.
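+  # NOTE: terra::gridDist() cannot pass through NA cells, so cells masked out
+  # here act as hard barriers in the distance calculation below.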
+  if(is.Raster(resistance)){
+    # Recode blocked cells and set them to NA so that they cannot be traversed
+    baseline_threshold[resistance == 1] <- 2
+    baseline_threshold <- terra::mask(baseline_threshold, baseline_threshold,
+                                      maskvalues = 2)
+  }
+  # Grow baseline raster by at most the amount of value
+  # Furthermore divide by value to get a normalized distance
+  dis <- terra::gridDist(baseline_threshold, target = 1)
+  ras_dis <- terra::clamp(dis, lower = 0, upper = value) / value
+  # Invert
+  ras_dis <- abs(ras_dis - 1)
+
+  # Now multiply the net suitability projection with this mask
+  # Thus removing any grid cells outside the reachable area
+  out <- new_suit * ras_dis
+  return(out)
+}
+
+#' Short-distance negative exponential kernel dispersal function
+#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year.
+#' @param new_suit A new [`SpatRaster`] object.
+#' @param value A [`numeric`] value of the mean dispersal distance. In unit \code{'meters'}.
+#' @param normalize Should a normalising constant be used for the exponential dispersal parameter (Default: \code{FALSE}).
+#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}).
+#' @noRd
+#' @keywords internal
+.sdd_nexpkernel <- function(baseline_threshold, new_suit, value, normalize = FALSE, resistance = NULL){
+  assertthat::assert_that(
+    is.Raster(baseline_threshold), is.Raster(new_suit),
+    is_comparable_raster(baseline_threshold, new_suit),
+    is.numeric(value),
+    is.null(resistance) || is.Raster(resistance),
+    # Check that baseline threshold raster is binomial
+    length(unique(baseline_threshold)[,1])==2
+  )
+
+  # If a resistance layer is supplied, mask out the blocked cells from the distance calculation.
+  if(is.Raster(resistance)){
+    # Recode blocked cells and set them to NA so that they cannot be traversed
+    baseline_threshold[resistance == 1] <- 2
+    baseline_threshold <- terra::mask(baseline_threshold, baseline_threshold,
+                                      maskvalues = 2)
+  }
+
+  # Inverse of mean dispersal distance
+  alpha <- 1/value
+
+  # Grow baseline raster by using an exponentially weighted kernel
+  ras_dis <- terra::gridDist(baseline_threshold, target = 1)
+  if(normalize){
+    # Normalized (with a constant) negative exponential kernel
+    ras_dis <- terra::app(ras_dis, fun = function(x) (1 / (2 * pi * value ^ 2)) * exp(-x / value) )
+  } else {
+    ras_dis <- terra::app(ras_dis, fun = function(x) exp(-alpha * x))
+  }
+
+  # Now multiply the net suitability projection with this mask
+  # Thus removing any non-suitable grid cells (0) and changing the value of those within reach
+  out <- new_suit * ras_dis
+  return(out)
+}
+
+#' Keep-it-simple migration (kissmig) calculation.
+#' @param baseline_threshold The [`SpatRaster`] with presence/absence information from a previous year.
+#' @param new_suit A new [`SpatRaster`] object.
+#' @param params A [vector] or [list] with passed on parameter values.
+#' @param resistance A resistance [`SpatRaster`] object with values to be omitted during distance calculation (Default: \code{NULL}).
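+#' @returns A [`SpatRaster`] with two layers: the simulated kissmig colonization
+#' state and the new suitability projection constrained by it.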
+#' @noRd
+#' @keywords internal
+.kissmig_dispersal <- function(baseline_threshold, new_suit, params, resistance = NULL){
+  assertthat::assert_that(
+    is.Raster(baseline_threshold), is.Raster(new_suit),
+    is_comparable_raster(baseline_threshold, new_suit),
+    is.vector(params) || is.list(params),
+    is.null(resistance) || is.logical(resistance) || is.Raster(resistance),
+    # Check that baseline threshold raster is binomial
+    length(unique(baseline_threshold)[,1])==2
+  )
+
+  check_package('kissmig')
+  if(!isNamespaceLoaded("kissmig")) { attachNamespace("kissmig"); requireNamespace("kissmig") }
+
+  # Set suitability layer to 0 if resistance layer is set
+  if(is.Raster(resistance)){
+    new_suit[resistance>0] <- 0
+  }
+
+  # Simulate kissmig for a given threshold and suitability raster
+  km <- kissmig::kissmig(O = terra_to_raster( baseline_threshold ),
+                         # Rescale newsuit to 0-1
+                         S = predictor_transform(new_suit, 'norm'),
+                         it = as.numeric( params['iteration'] ),
+                         type = params['type'],
+                         pext = as.numeric(params['pext']),
+                         pcor = as.numeric(params['pcor'])
+  )
+  if(is.factor(km)) km <- terra::as.int(km)
+
+  # Now multiply the net suitability projection with this mask
+  # Thus removing any non-suitable grid cells (0) and changing the value of those within reach
+  ns <- new_suit * km
+
+  return(
+    c(km, ns)
+  )
+}
+
+# ------------------------ #
+#### Connectivity constraints ####
+
+#' @title Adds a connectivity constraint to a scenario object.
+#' @name add_constraint_connectivity
+#' @aliases add_constraint_connectivity
+#' @inheritParams add_constraint
+#' @param resistance A [`SpatRaster`] object describing a resistance surface or barrier for use in
+#' connectivity constraints (Default: \code{NULL}).
+#' @family constraint
+#' @keywords scenario
+#' @exportMethod add_constraint_connectivity
+#' @export
+NULL
+methods::setGeneric("add_constraint_connectivity",
+                    signature = methods::signature("mod"),
+                    function(mod, method, value = NULL, resistance = NULL, ...) standardGeneric("add_constraint_connectivity"))
+
+#' @name add_constraint_connectivity
+#' @rdname add_constraint_connectivity
+#' @usage \S4method{add_constraint_connectivity}{BiodiversityScenario,character,numeric,ANY}(mod,method,value,resistance)
+methods::setMethod(
+  "add_constraint_connectivity",
+  methods::signature(mod = "BiodiversityScenario"),
+  function(mod, method, value = NULL, resistance = NULL, ...){
+    assertthat::assert_that(
+      inherits(mod, "BiodiversityScenario"),
+      !is.Waiver(mod$get_predictors()),
+      is.character(method),
+      is.null(value) || is.numeric(value),
+      is.Raster(resistance) || is.null(resistance)
+    )
+    # Match method
+    method <- match.arg(arg = method,
+                        choices = c("hardbarrier", "resistance"), several.ok = FALSE)
+
+    # Check if there is already a connectivity constraint; if so, raise a warning
+    if(!is.Waiver(mod$get_constraints())){
+      # If there are any connectivity constraints in there, raise warning
+      if(any( "connectivity" %in% names(mod$get_constraints()) )){
+        if(getOption('ibis.setupmessages')) myLog('[Setup]','yellow','Overwriting existing connectivity constraint')
+      }
+    }
+
+    # Add processing method #
+    # --- #
+    co <- list()
+    if(method == "hardbarrier"){
+      # Assert hard barrier
+      assertthat::assert_that(
+        is.Raster(resistance),
+        !is.null(resistance), msg = "Set a hard barrier via the resistance parameter."
+      )
+      # Check that resistance layer is a binary mask
+      assertthat::assert_that(length(unique(resistance)[,1])<=2,
+                              terra::global(resistance,'max', na.rm = TRUE)>0,
+                              msg = "Resistance layer should be a binary mask with values 0/1.")
+      co[['connectivity']] <- list(method = method,
+                                   params = c("resistance" = resistance))
+    } else if(method == "resistance"){
+      # Flexible resistance layer
+      assertthat::assert_that(
+        is.Raster(resistance),
+        !is.null(resistance), msg = "The method resistance requires a specified resistance raster."
+      )
+      # If the raster is a stack with multiple layers, ensure that the time dimension matches the scenario period
+      if(terra::nlyr(resistance)>1){
+        # Check that layers have a z dimension and fall within the timeperiod
+        startend <- mod$get_timeperiod()
+        assertthat::assert_that( !is.null( terra::time(resistance) ),
+                                 all( range(terra::time(resistance))==startend ),
+                                 msg = "If a stack of layers is supplied as resistance, it needs a Z value of equal length to the predictors!")
+      }
+      times <- terra::time(resistance)
+      # If resistance values are larger than 1, normalize
+      if(any(terra::global(resistance, 'max', na.rm = TRUE)>1)){
+        if(getOption('ibis.setupmessages')) myLog('[Setup]','yellow','Resistance values larger than 1. Normalizing...')
+        resistance <- predictor_transform(resistance, option = "norm")
+      }
+      resistance <- abs( resistance - 1 ) # Invert
+      if(!is.null(times)) terra::time(resistance) <- times # Reset times again if found
+
+      co[['connectivity']] <- list(method = method,
+                                   params = c("resistance" = resistance))
+    }
+    # --- #
+    new <- mod$set_constraints(co)
+    return(
+      bdproto(NULL, new)
+    )
+  }
+)
+
+# ------------------------ #
+#### Adaptability constraints ####
+
+#' @title Adds an adaptability constraint to a scenario object
+#' @description
+#' Adaptability constraints assume that suitable habitat for species in (future) projections might be unsuitable if
+#' it is outside the range of conditions currently observed for the species.
+#'
+#' Currently only `nichelimit` is implemented, which adds a simple constraint on the predictor parameter space, which
+#' can be defined through the \code{"value"} parameter. For example by setting it to \code{1} (Default), any projections
+#' are constrained to be within at most 1 standard deviation of the range of covariates used for model
+#' training.
+#' @name add_constraint_adaptability
+#' @aliases add_constraint_adaptability
+#' @inheritParams add_constraint
+#' @param names A [`character`] vector with names of the predictors for which an adaptability threshold should be set (Default: \code{NULL} for all).
+#' @param value A [`numeric`] value in units of standard deviation (Default: \code{1}).
+#' @param increment A [`numeric`] constant that is added to value at every time step (Default: \code{0}).
+#' Allows incremental widening of the niche space, thus opening constraints.
+#' @family constraint
+#' @examples
+#' \dontrun{
+#' scenario(fit) |>
+#'   add_constraint_adaptability(value = 1)
+#' }
+#'
+#' @keywords scenario
+#' @exportMethod add_constraint_adaptability
+#' @export
+NULL
+methods::setGeneric("add_constraint_adaptability",
+                    signature = methods::signature("mod"),
+                    function(mod, method = "nichelimit", names = NULL, value = 1, increment = 0, ...) standardGeneric("add_constraint_adaptability"))
+
+#' @name add_constraint_adaptability
+#' @rdname add_constraint_adaptability
+#' @usage \S4method{add_constraint_adaptability}{BiodiversityScenario,character,character,numeric,numeric}(mod,method,names,value,increment)
+methods::setMethod(
+  "add_constraint_adaptability",
+  methods::signature(mod = "BiodiversityScenario"),
+  function(mod, method = "nichelimit", names = NULL, value = 1, increment = 0, ...){
+    assertthat::assert_that(
+      inherits(mod, "BiodiversityScenario"),
+      !is.Waiver(mod$get_predictors()),
+      is.character(method),
+      is.null(names) || is.character(names),
+      is.null(value) || is.numeric(value),
+      is.numeric(increment)
+    )
+    # Match method
+    method <- match.arg(arg = method,
+                        choices = c("nichelimit"), several.ok = FALSE)
+
+    # Add processing method #
+    # --- #
+    co <- list()
+    if(method == "nichelimit"){
+      # Add a constraint on parameter space, e.g. max 1 SD from training data covariates
+      assertthat::assert_that(
+        is.numeric(value),
+        is.null(names) || is.character(names),
+        value > 0, msg = "Specify a positive value threshold (in SD units) and optionally the
+        names of predictors beyond whose range we do not expect the species to persist."
+      )
+      if(is.null(names)) names <- character()
+      co[['adaptability']] <- list(method = method,
+                                   params = c("names" = names, "value" = value,
+                                              "increment" = increment))
+    }
+    # --- #
+    new <- mod$set_constraints(co)
+    return(
+      bdproto(NULL, new)
+    )
+  }
+)
+
+#' Adaptability constraint by applying a limit on extrapolation beyond the niche
+#'
+#' @param newdata A [`data.frame`] with the information about new data layers.
+#' @param model A [`list`] created by the modelling object containing the full predictors and biodiversity predictors.
+#' @param names A [`character`] or \code{NULL} of the names of predictors.
+#' @param value A [`numeric`] value in units of standard deviation (Default: \code{1}).
+#' @param increment A [`numeric`] constant that is added to value at every time step (Default: \code{0}).
+#' Allows incremental widening of the niche space, thus opening constraints.
+#' @param increment_step A [`numeric`] indicating the number of times the increment should be applied.
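+#' @examples
+#' \dontrun{
+#' # Illustrative sketch only: 'newdata' and 'model' are assumed to be supplied
+#' # by the scenario projection routine. With value = 1, covariate values more
+#' # than 1 SD outside the training range are set to NA.
+#' nd <- .nichelimit(newdata, model, names = NULL, value = 1)
+#' }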
+#' @keywords internal
+#' @noRd
+.nichelimit <- function(newdata, model, names = NULL, value = 1, increment = 0, increment_step = 1){
+  assertthat::assert_that(
+    is.data.frame(newdata),
+    is.list(model),
+    is.numeric(as.numeric(value)),
+    is.null(names) || all(is.na(names)) || is.character(names),
+    is.numeric(as.numeric(increment)),
+    is.numeric(as.numeric(increment_step))
+  )
+  # Check that names are present if set
+  if(is.null(names) || all(is.na(names))) names <- model$predictors_names
+  if(is.character(names)) assertthat::assert_that(all(names %in% model$predictors_names))
+  # Convert numeric parameters to numeric to be sure
+  value <- as.numeric(value)
+  increment <- as.numeric(increment)
+  increment_step <- as.numeric(increment_step)
+  # --- #
+  # Now calculate the range across each target predictor and occurrence dataset
+  df <- data.frame()
+  for(id in names(model$biodiversity)){
+    sub <- model$biodiversity[[id]]
+    # Which are presence data
+    is_presence <- which(sub$observations[['observed']] > 0)
+    df <- rbind(df,
+                sub$predictors[is_presence, names])
+  }
+  rr <- sapply(df, function(x) range(x, na.rm = TRUE)) # Calculate ranges
+  rsd <- sapply(df, function(x) stats::sd(x, na.rm = TRUE)) # Calculate standard deviation
+
+  # Apply value and increment if set
+  rsd <- rsd * (value + (increment*increment_step))
+  rr[1,] <- rr[1,] - rsd; rr[2,] <- rr[2,] + rsd
+
+  # Now 'clamp' all predictor values outside these ranges by setting them to NA, e.g. partial out
+  nd <- newdata
+  for(n in names){
+    # Identify entries below the lower or above the upper limit
+    min_ex <- which(nd[,n] < rr[1,n])
+    max_ex <- which(nd[,n] > rr[2,n])
+    if(length(min_ex)>0) nd[min_ex,n] <- NA
+    if(length(max_ex)>0) nd[max_ex,n] <- NA
+    # FIXME Or rather do a smooth logistic decay for less extreme values?
+  }
+  return(nd)
+}
+
+# ------------------------ #
+#### Boundary constraints ####
+
+#' @title Adds a boundary constraint to a scenario object
+#' @description
+#' The purpose of boundary constraints is to limit a future projection within a specified area
+#' (such as a range or ecoregion). This can help to limit unreasonable projections into geographic space.
+#'
+#' Similar to boundary constraints it is also possible to define a \code{"zone"} for the scenario projections, similar
+#' to what was done for model training. The difference to a boundary constraint is that the boundary constraint is applied posthoc
+#' as a hard cut on any projection, while the zones would allow any projection (and other constraints) to be applied within
+#' the zone.
+#' **Note: Setting a boundary constraint for future projections effectively removes potentially suitable areas!**
+#' @name add_constraint_boundary
+#' @aliases add_constraint_boundary
+#' @inheritParams add_constraint
+#' @param layer A [`SpatRaster`] or [`sf`] object with the same extent as the model background. Has to be binary and
+#' is used for a posthoc masking of projected grid cells.
+#' @family constraint
+#' @examples
+#' \dontrun{
+#' # Add scenario constraint
+#' scenario(fit) |> add_constraint_boundary(range)
+#' }
+#'
+#' @keywords scenario
+#' @exportMethod add_constraint_boundary
+#' @export
+NULL
+methods::setGeneric("add_constraint_boundary",
+                    signature = methods::signature("mod", "layer"),
+                    function(mod, layer, ...) standardGeneric("add_constraint_boundary"))
+
+#' @name add_constraint_boundary
+#' @rdname add_constraint_boundary
+#' @usage \S4method{add_constraint_boundary}{BiodiversityScenario,sf,character}(mod,layer,method)
+methods::setMethod(
+  "add_constraint_boundary",
+  methods::signature(mod = "BiodiversityScenario", layer = "sf"),
+  function(mod, layer, method = "boundary", ...){
+    assertthat::assert_that(
+      inherits(mod, "BiodiversityScenario"),
+      inherits(layer, "sf"),
+      is.character(method)
+    )
+
+    # Rasterize the layer
+    # First try and dig out a layer from a predictor dataset if found
+    if(inherits( mod$get_predictors(), "PredictorDataset")){
+      ras <- mod$get_predictors()$get_data() |> stars_to_raster()
+      ras <- ras[[1]]
+    } else {
+      # Try and get the underlying model and its predictors
+      ras <- mod$get_model()$get_data()
+    }
+    assertthat::assert_that(is.Raster(ras))
+    bb <- try({ terra::rasterize(layer, ras, 1)}, silent = TRUE)
+    if(inherits(bb, "try-error")) stop("Provide a rasterized layer of the boundary constraint!")
+
+    # Call again
+    o <- add_constraint_boundary(mod, layer = bb, method = method, ...)
+
+    return( o )
+  }
+)
+
+#' @name add_constraint_boundary
+#' @rdname add_constraint_boundary
+#' @usage \S4method{add_constraint_boundary}{BiodiversityScenario,ANY,character}(mod,layer,method)
+methods::setMethod(
+  "add_constraint_boundary",
+  methods::signature(mod = "BiodiversityScenario", layer = "ANY"),
+  function(mod, layer, method = "boundary", ...){
+    assertthat::assert_that(
+      inherits(mod, "BiodiversityScenario"),
+      is.Raster(layer),
+      is.character(method)
+    )
+
+    # Ensure that the layer is a single-layer SpatRaster
+    if(terra::nlyr(layer) > 1){
+      layer <- layer[[1]]
+    }
+
+    # Add processing method #
+    # --- #
+    co <- list()
+    if(method == "boundary"){
+      # Check that the boundary layer is binary
+      assertthat::assert_that(
+        length( unique( layer )[,1]) <=2
+      )
+      # If more than one unique value is present, set all cells below 1 to NA
+      if( length( unique( layer )[,1]) >1 ){
+        layer[layer<1] <- NA
+      }
+      co[['boundary']] <- list(method = method,
+                               params = c("layer" = layer))
+    }
+    # --- #
+    new <- mod$set_constraints(co)
+    return(
+      bdproto(NULL, new)
+    )
+  }
+)
diff --git a/R/add_constraint_MigClim.R b/R/add_constraint_MigClim.R
index fe1892d4..879f91f4 100644
--- a/R/add_constraint_MigClim.R
+++ b/R/add_constraint_MigClim.R
@@ -15,7 +15,7 @@ NULL
 #' Can be set either to a uniform numeric [vector], e.g. \code{c(1,1,1,1)} or to a proportional decline \code{(1,0.4,0.16,0.06,0.03)} (Default).
 #' **Depending on the resolution of the raster, this parameter needs to be adapted**
 #' @param barrierType A [character] indicating whether any set barrier should be set as \code{'strong'} or \code{'weak'} barriers.
-#' Strong barriers prevent any dispersal across the barrier and weak barriers only do so if the whole [dispKernel] length
+#' Strong barriers prevent any dispersal across the barrier and weak barriers only do so if the whole \code{"dispKernel"} length
 #' is covered by the barrier (Default: \code{'strong'}).
 #' @param lddFreq [`numeric`] parameter indicating the frequency of long-distance dispersal (LDD) events. Default is \code{0}, so no long-distance dispersal.
 #' @param lddRange A [`numeric`] value highlighting the minimum and maximum distance of LDD events.
@@ -26,8 +26,8 @@ NULL
 #' Set as probability vector that defines the probability of a cell producing propagules.
 #' @param replicateNb Number of replicates to be used for the analysis (Default: \code{10}).
 #' @param dtmp A [`character`] to a folder where temporary files are to be created.
-#' @details The barrier parameter is defined through [add_barrier].
-#' @seealso [`MigClim.userGuide()`]
+#' @details The barrier parameter is defined through \code{"add_barrier"}.
+#' @seealso \code{"MigClim::MigClim.userGuide()"}
 #' @references
 #' * Engler R., Hordijk W. and Guisan A. The MIGCLIM R package – seamless integration of
 #' dispersal constraints into projections of species distribution models. Ecography,
@@ -61,7 +61,7 @@ methods::setGeneric("add_constraint_MigClim",
 
 #' @name add_constraint_MigClim
 #' @rdname add_constraint_MigClim
-#' @usage \S4method{add_constraint_MigClim}{BiodiversityScenario, character, numeric, numeric, character, numeric, numeric, numeric, numeric, numeric, character}(mod, rcThresholdMode, dispSteps, dispKernel, barrierType, lddFreq, lddRange, iniMatAge, propaguleProdProb, replicateNb, dtmp)
+#' @usage \S4method{add_constraint_MigClim}{BiodiversityScenario,character,numeric,numeric,character,numeric,numeric,numeric,numeric,numeric,character}(mod,rcThresholdMode,dispSteps,dispKernel,barrierType,lddFreq,lddRange,iniMatAge,propaguleProdProb,replicateNb,dtmp)
 methods::setMethod(
   "add_constraint_MigClim",
   methods::signature(mod = "BiodiversityScenario"),
diff --git a/R/add_control_bias.R b/R/add_control_bias.R
index c4c88c3d..bab8253c 100644
--- a/R/add_control_bias.R
+++ b/R/add_control_bias.R
@@ -14,7 +14,7 @@
 #' offsets to factor out a specified bias variable.
 #'
 #' @note
-#' **Covariate transformations applied to other predictors need to **
+#' **Covariate transformations applied to other predictors need to be applied to bias too.**
 #' Another option to consider biases particularly in Poisson-point process models is to remove them
 #' through an offset. Functionality to do so is available through the [`add_offset_bias()`] method. Setting the
 #' method to \code{"offset"} will automatically point to this option.
@@ -34,6 +34,7 @@
 #' * Botella, C., Joly, A., Bonnet, P., Munoz, F., & Monestiez, P. (2021). Jointly estimating spatial sampling effort and habitat suitability for multiple species from opportunistic presence‐only data. Methods in Ecology and Evolution, 12(5), 933-945.
 #' @returns Adds bias control option to a [`distribution`] object.
 #' @keywords bias, offset
+#' @aliases add_control_bias
 #' @examples
 #' \dontrun{
 #' x <- distribution(background) |>
@@ -54,7 +55,7 @@ methods::setGeneric(
 
 #' @name add_control_bias
 #' @rdname add_control_bias
-#' @usage \S4method{add_control_bias}{BiodiversityDistribution, SpatRaster}(x, layer)
+#' @usage \S4method{add_control_bias}{BiodiversityDistribution,SpatRaster,character,ANY,logical}(x,layer,method,bias_value,add)
 methods::setMethod(
   "add_control_bias",
   methods::signature(x = "BiodiversityDistribution", layer = "SpatRaster"),
diff --git a/R/add_latent.R b/R/add_latent.R
index bbf18fb8..a0ffdcf8 100644
--- a/R/add_latent.R
+++ b/R/add_latent.R
@@ -20,13 +20,13 @@ NULL
 #'
 #' Available are:
 #'
-#' [*] \code{"spde"} - stochastic partial differential equation (SPDE) for [`INLA-engine`] and [`INLABRU-engine`].
+#' * \code{"spde"} - stochastic partial differential equation (SPDE) for [`engine_inla`] and [`engine_inlabru`].
 #' SPDE effects aim at capturing the variation of the response variable in space, once all of the covariates are accounted for.
 #' Examining the spatial distribution of the spatial error can reveal which covariates might be missing. For example,
 #' if elevation is positively correlated with the response variable, but is not included in the model,
 #' we could see a higher posterior mean in areas with higher elevation. Note that calculations of
 #' SPDE's can be computationally costly.
-#' * \code{"car"} - conditional autocorrelative errors (CAR) for [`INLA-engine`]. Not yet implemented in full.
+#' * \code{"car"} - conditional autocorrelative errors (CAR) for [`engine_inla`]. Not yet implemented in full.
 #' * \code{"kde"} - additional covariate of the kernel density of input point observations.
 #' * \code{"poly"} - spatial trend correction by adding coordinates as polynomial transformation. Available for all Engines.
 #' * \code{"nnd"} - nearest neighbour distance. This function calculates the Euclidean distance from each grid cell
@@ -35,7 +35,7 @@ NULL
 #'
 #' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
 #' @param method A [`character`] describing what kind of spatial effect is to be added to the model. See details.
-#' @param priors A [`Prior-List`] object supplied to the latent effect. Relevant only for [`engine_inla`] and \code{NULL} equates the use of default priors.
+#' @param priors A \code{"Prior-List"} object supplied to the latent effect. Relevant only for [`engine_inla`]; \code{NULL} equates to using default priors.
 #' @param separate_spde A [`logical`] parameter indicating whether, in the case of SPDE effects, separate effects
 #' for each likelihood are being fitted. Default (\code{FALSE}) uses a copy of the first added likelihood.
 #' @param ... Other parameters passed down
@@ -45,6 +45,7 @@ NULL
 #' * Mendes, P., Velazco, S. J. E., de Andrade, A. F. A., & Júnior, P. D. M. (2020). Dealing with overprediction in species distribution models: How adding distance constraints can improve model accuracy. Ecological Modelling, 431, 109180.
 #'
 #' @keywords latent
+#' @aliases add_latent_spatial
 #' @examples
 #' \dontrun{
 #' distribution(background) |> add_latent_spatial(method = "poly")
@@ -63,7 +64,7 @@ methods::setGeneric(
 
 #' @name add_latent_spatial
 #' @rdname add_latent_spatial
-#' @usage \S4method{add_latent_spatial}{BiodiversityDistribution}(x)
+#' @usage \S4method{add_latent_spatial}{BiodiversityDistribution,character,ANY,logical}(x,method,priors,separate_spde,...)
 methods::setMethod(
   "add_latent_spatial",
   methods::signature(x = "BiodiversityDistribution"),
@@ -105,6 +106,7 @@ methods::setMethod(
 #' rm_latent(model) -> model
 #' }
 #' @keywords latent, internal
+#' @aliases rm_latent
 #' @name rm_latent
 NULL
diff --git a/R/add_log.R b/R/add_log.R
index e484fe3e..486eb753 100644
--- a/R/add_log.R
+++ b/R/add_log.R
@@ -31,7 +31,7 @@ methods::setGeneric(
 
 #' @name add_log
 #' @rdname add_log
-#' @usage \S4method{add_log}{BiodiversityDistribution, character}(x, filename)
+#' @usage \S4method{add_log}{BiodiversityDistribution,character}(x,filename)
 methods::setMethod(
   "add_log",
   methods::signature(x = "BiodiversityDistribution", filename = "character"),
diff --git a/R/add_offset.R b/R/add_offset.R
index a0d6e435..5ce7bba0 100644
--- a/R/add_offset.R
+++ b/R/add_offset.R
@@ -30,6 +30,7 @@
 #' * Merow, C., Allen, J.M., Aiello-Lammens, M., Silander, J.A., 2016. Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Glob. Ecol. Biogeogr. 25, 1022–1036. https://doi.org/10.1111/geb.12453
 #' @returns Adds an offset to a [`distribution`] object.
 #' @family offset
+#' @aliases add_offset
 #' @keywords prior, offset
 #' @examples
 #' \dontrun{
@@ -51,7 +52,7 @@ methods::setGeneric(
 
 #' @name add_offset
 #' @rdname add_offset
-#' @usage \S4method{add_offset}{BiodiversityDistribution, SpatRaster}(x, layer)
+#' @usage \S4method{add_offset}{BiodiversityDistribution,SpatRaster,logical}(x,layer,add)
 methods::setMethod(
   "add_offset",
   methods::signature(x = "BiodiversityDistribution", layer = "SpatRaster"),
@@ -110,6 +111,7 @@ methods::setMethod(
 #' }
 #' @family offset
 #' @keywords prior, offset, internal
+#' @aliases rm_offset
 #' @name rm_offset
 NULL
 
@@ -124,7 +126,7 @@ methods::setGeneric(
 
 #' @name rm_offset
 #' @rdname rm_offset
-#' @usage \S4method{rm_offset}{BiodiversityDistribution, character}(x, layer)
+#' @usage \S4method{rm_offset}{BiodiversityDistribution,character}(x,layer)
 methods::setMethod(
   "rm_offset",
   methods::signature(x = "BiodiversityDistribution", layer = "character"),
@@ -179,6 +181,7 @@ methods::setMethod(
 #' * Merow, C., Allen, J.M., Aiello-Lammens, M., Silander, J.A., 2016. Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Glob. Ecol. Biogeogr. 25, 1022–1036. https://doi.org/10.1111/geb.12453
 #' @family offset
 #' @keywords prior, offset
+#' @aliases add_offset_bias
 #' @returns Adds a bias offset to a [`distribution`] object.
 #' @examples
 #' \dontrun{
@@ -200,7 +203,7 @@ methods::setGeneric(
 
 #' @name add_offset_bias
 #' @rdname add_offset_bias
-#' @usage \S4method{add_offset_bias}{BiodiversityDistribution, SpatRaster}(x, layer)
+#' @usage \S4method{add_offset_bias}{BiodiversityDistribution,SpatRaster,logical,ANY}(x,layer,add,points)
 methods::setMethod(
   "add_offset_bias",
   methods::signature(x = "BiodiversityDistribution", layer = "SpatRaster"),
@@ -273,8 +276,10 @@ methods::setMethod(
 #' This function has additional options compared to the more generic
 #' [`add_offset()`], allowing customized options specifically for expert-based
 #' ranges as offsets or spatialized polygon information on species occurrences.
-#' If even more control is needed, the user is informed of the \pkg{bossMaps}
-#' package Merow et al. (2017).
+#' If even more control is needed, the user is informed of the \code{"bossMaps"}
+#' package by Merow et al. (2017). Some functionalities of that package are emulated
+#' through the \code{"distance_function"} set to \code{"log"}. This tries to fit
+#' a 5-parameter logistic function to estimate the distance from the range (Merow et al. 2017).
 #'
 #' @details
 #' The output created by this function creates a [`SpatRaster`] to be added to
@@ -282,12 +287,12 @@ methods::setMethod(
 #' specific as they are added directly to the overall estimate of \code{`y^hat`}.
 #'
 #' Note that all offsets created by this function are by default log-transformed before export.
-#' Background values (e.g. beyond [`distance_max`]) are set to a very small
+#' Background values (e.g. beyond \code{"distance_max"}) are set to a very small
 #' constant (\code{1e-10}).
 #'
 #' @inheritParams add_offset
 #' @param distance_max A [`numeric`] threshold on the maximum distance beyond the range that should be considered
-#' to have a high likelihood of containing species occurrences (Default: \code{Inf} [m]). Can be set to \code{NULL} or \code{0}
+#' to have a high likelihood of containing species occurrences (Default: \code{Inf} \code{"m"}). Can be set to \code{NULL} or \code{0}
 #' to indicate that no distance should be calculated.
 #' @param family A [`character`] denoting the type of model to which this offset is to be added. By default
 #' it assumes a \code{'poisson'} distributed model and as a result the output created by this function will be log-transformed.
@@ -305,11 +310,12 @@ methods::setMethod(
 #' @param field_occurrence A [`numeric`] or [`character`] location of biodiversity point records.
 #' @param fraction An optional [`SpatRaster`] object that is multiplied with digitized raster layer.
 #' Can for example be used to remove or reduce the expected value (Default: \code{NULL}).
-#' @seealso [`bossMaps`]
+#' @seealso \code{"bossMaps"}
 #' @references
 #' * Merow, C., Wilson, A.M., Jetz, W., 2017. Integrating occurrence data and expert maps for improved species range predictions. Glob. Ecol. Biogeogr. 26, 243–258. https://doi.org/10.1111/geb.12539
 #' * Merow, C., Allen, J.M., Aiello-Lammens, M., Silander, J.A., 2016. Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Glob. Ecol. Biogeogr. 25, 1022–1036. https://doi.org/10.1111/geb.12453
 #' @returns Adds a range offset to a [`distribution`] object.
+#' @aliases add_offset_range
 #' @examples
 #' \dontrun{
 #' # Adds the offset to a distribution object
@@ -335,7 +341,7 @@ methods::setGeneric(
 
 #' Function for when raster is directly supplied (precomputed)
 #' @name add_offset_range
 #' @rdname add_offset_range
-#' @usage \S4method{add_offset_range}{BiodiversityDistribution, SpatRaster}(x, layer)
+#' @usage \S4method{add_offset_range}{BiodiversityDistribution,SpatRaster,ANY,logical}(x,layer,fraction,add)
 methods::setMethod(
   "add_offset_range",
   methods::signature(x = "BiodiversityDistribution", layer = "SpatRaster"),
@@ -391,7 +397,7 @@ methods::setMethod(
 
 #' @name add_offset_range
 #' @rdname add_offset_range
-#' @usage \S4method{add_offset_range}{BiodiversityDistribution, sf}(x, layer)
+#' @usage \S4method{add_offset_range}{BiodiversityDistribution,sf,numeric,character,numeric,logical,character,character,ANY,ANY,logical}(x,layer,distance_max,family,presence_prop,distance_clip,distance_function,field_occurrence,fraction,point,add)
 methods::setMethod(
   "add_offset_range",
   methods::signature(x = "BiodiversityDistribution", layer = "sf"),
@@ -673,7 +679,6 @@ methods::setMethod(
   #   suppressMessages( attach(holdEnv) )
   #   on.exit(detach("holdEnv"))
   xx <- pp[i,] |> as.list()
-  xx$x <- x
 
   # Fit Model
   if(family == "binomial"){
     suppressMessages(
       )
     )
   } else {
+    # Add estimate here
+    xx$x <- x
     suppressMessages(
       suppressWarnings(
         logisticParam <- try({
@@ -710,7 +717,6 @@ methods::setMethod(
   #   suppressMessages( attach(holdEnv) )
   #   on.exit(detach("holdEnv"))
   xx <- pp[which.min(result$aic),] |> as.list()
-  xx$x <- x
 
   # Fit Model
   if(family == "binomial"){
     suppressMessages(
       )
     )
   } else {
+    xx$x <- x
     suppressMessages(
       suppressWarnings(
         logisticParam <- try({
       )
     )
   }
 
   # Get the coefficients of the best model
+  if(inherits(logisticParam, "try-error")) stop("Offset calculation failed...")
   co <- logisticParam$coefficients
   names(co) <- c("upper", "lower", "rate", "shift", "skew")
   return(co)
 }
 
+#### Elevational offset ####
 #' Specify elevational preferences as offset
 #'
 #' @description
@@ -749,7 +758,7 @@ methods::setMethod(
 #' Specifically this function calculates a continuous decay and decreasing probability of a species to occur
 #' from elevation limits. It requires a [`SpatRaster`] with elevation information.
 #' A generalized logistic transform (aka Richard's curve) is used to calculate decay from the suitable elevational
-#' areas, with the [`rate`] parameter allowing to vary the steepness of decline.
+#' areas, with the \code{"rate"} parameter allowing one to vary the steepness of decline.
 #'
 #' Note that all offsets created by this function are by default log-transformed before export. In addition
 #' this function also mean-centers the output as recommended by Ellis-Soto et al.
@@ -765,6 +774,7 @@ methods::setMethod(
 #' * Ellis‐Soto, D., Merow, C., Amatulli, G., Parra, J.L., Jetz, W., 2021. Continental‐scale 1 km hummingbird diversity derived from fusing point records with lateral and elevational expert information. Ecography (Cop.). 44, 640–652. https://doi.org/10.1111/ecog.05119
 #' * Merow, C., Allen, J.M., Aiello-Lammens, M., Silander, J.A., 2016. Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Glob. Ecol. Biogeogr. 25, 1022–1036. https://doi.org/10.1111/geb.12453
 #' @returns Adds an elevational offset to a [`distribution`] object.
+#' @aliases add_offset_elevation
 #' @examples
 #' \dontrun{
 #' # Adds the offset to a distribution object
@@ -787,7 +797,7 @@ methods::setGeneric(
 
 #' @name add_offset_elevation
 #' @rdname add_offset_elevation
-#' @usage \S4method{add_offset_elevation}{BiodiversityDistribution, SpatRaster, numeric}(x, elev, pref)
+#' @usage \S4method{add_offset_elevation}{BiodiversityDistribution,SpatRaster,numeric,numeric,logical}(x,elev,pref,rate,add)
 methods::setMethod(
   "add_offset_elevation",
   methods::signature(x = "BiodiversityDistribution", elev = "SpatRaster", pref = "numeric"),
diff --git a/R/add_predictors.R b/R/add_predictors.R
index cd554223..7ac3084f 100644
--- a/R/add_predictors.R
+++ b/R/add_predictors.R
@@ -15,7 +15,7 @@ NULL
 #' the original provided predictors alone, but instead create new ones, for instance by transforming
 #' their values through a quadratic or hinge transformation. Note that this effectively
 #' increases the number of predictors in the object, generally requiring stronger regularization by
-#' the used [`engine`].
+#' the used [`Engine`].
 #' Both transformations and derivates can also be combined.
 #' Available options for transformation are:
 #' * \code{'none'} - Leaves the provided predictors in the original scale.
@@ -49,13 +49,13 @@ NULL
 #' @param ... Other parameters passed down
 #' @note
 #' **Important:**
-#' Not every [`engine`] supported by the \pkg{ibis.iSDM} R-package allows missing data points
+#' Not every [`Engine`] supported by the \pkg{ibis.iSDM} R-package allows missing data points
 #' among extracted covariates. Thus any observation with missing data is generally removed prior
 #' to model fitting. Thus ensure that covariates have appropriate no-data settings (for instance setting \code{NA}
 #' values to \code{0} or another out of range constant).
 #'
 #' Not every engine does actually need covariates. For instance it is perfectly legitimate
-#' to fit a model with only occurrence data and a spatial latent effect ([add_latent]).
+#' to fit a model with only occurrence data and a spatial latent effect ([add_latent_spatial]).
 #' This corresponds to a spatial kernel density estimate.
 #'
 #' Certain names such as \code{"offset"} are forbidden as predictor variable names. The function
@@ -82,7 +82,7 @@ methods::setGeneric(
 
 #' @name add_predictors
 #' @rdname add_predictors
-#' @usage \S4method{add_predictors}{BiodiversityDistribution,SpatRasterCollection}(x, env)
+#' @usage \S4method{add_predictors}{BiodiversityDistribution,SpatRasterCollection,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...)
 methods::setMethod(
   "add_predictors",
   methods::signature(x = "BiodiversityDistribution", env = "SpatRasterCollection"),
@@ -98,7 +98,7 @@ methods::setMethod(
 
 #' @name add_predictors
 #' @rdname add_predictors
-#' @usage \S4method{add_predictors}{BiodiversityDistribution, SpatRaster}(x, env)
+#' @usage \S4method{add_predictors}{BiodiversityDistribution,SpatRaster,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...)
 methods::setMethod(
   "add_predictors",
   methods::signature(x = "BiodiversityDistribution", env = "SpatRaster"),
@@ -236,7 +236,7 @@ methods::setMethod(
 
 #' @name add_predictors
 #' @rdname add_predictors
-#' @usage \S4method{add_predictors}{BiodiversityDistribution, stars}(x, env)
+#' @usage \S4method{add_predictors}{BiodiversityDistribution,stars,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...)
 methods::setMethod(
   "add_predictors",
   methods::signature(x = "BiodiversityDistribution", env = "stars"),
@@ -263,6 +263,13 @@ methods::setMethod(
 #' @param lower [`numeric`] value for a lower elevational preference of a species.
 #' @param upper [`numeric`] value for an upper elevational preference of a species.
 #' @param transform [`character`] Any optional transformation to be applied. Usually not needed (Default: \code{"none"}).
+#' @aliases add_predictor_elevationpref
+#' @examples
+#' \dontrun{
+#' distribution(background) |>
+#'   add_predictor_elevationpref(elevation, lower = 200, upper = 1000)
+#' }
+#'
 #' @name add_predictor_elevationpref
 NULL
 
@@ -277,7 +284,7 @@ methods::setGeneric(
 
 #' @name add_predictor_elevationpref
 #' @rdname add_predictor_elevationpref
-#' @usage \S4method{add_predictor_elevationpref}{BiodiversityDistribution, ANY, numeric, numeric, character}(x, layer, lower, upper, transform)
+#' @usage \S4method{add_predictor_elevationpref}{BiodiversityDistribution,ANY,numeric,numeric,character}(x,layer,lower,upper,transform)
 methods::setMethod(
   "add_predictor_elevationpref",
   methods::signature(x = "BiodiversityDistribution", layer = "ANY", lower = "numeric", upper = "numeric"),
@@ -356,7 +363,7 @@ methods::setMethod(
 #' This function allows adding a species range, which is usually drawn by experts in a separate process,
 #' as a spatially explicit prior. Both [`sf`] and [`SpatRaster`]-objects are supported as input.
 #'
-#' Users are advised to look at the [`bossMaps`] R-package presented as part of Merow et al. (2017),
+#' Users are advised to look at the \code{"bossMaps"} R-package presented as part of Merow et al. (2017),
 #' which allows flexible calculation of non-linear distance transforms from the boundary of the range.
 #' Outputs of this package could be added directly to this function.
 #' **Note that this function adds the range as a predictor and not as offset. For this purpose a separate function [`add_offset_range()`] exists.**
@@ -373,6 +380,13 @@ methods::setMethod(
 #' @param fraction An optional [`SpatRaster`] object that is multiplied with digitized raster layer.
 #' Can for example be used to remove or reduce the expected value (Default: \code{NULL}).
 #' @param priors A [`PriorList-class`] object. Default is set to NULL which uses default prior assumptions.
+#' @aliases add_predictor_range
+#' @examples
+#' \dontrun{
+#' distribution(background) |>
+#'   add_predictor_range(range, method = "distance", distance_max = 2)
+#' }
+#'
 #' @references
 #' * Merow, C., Wilson, A. M., & Jetz, W. (2017). Integrating occurrence data and expert maps for improved species range predictions. Global Ecology and Biogeography, 26(2), 243–258. https://doi.org/10.1111/geb.12539
 #' @name add_predictor_range
@@ -390,7 +404,7 @@ methods::setGeneric(
 
 #' Function for when distance raster is directly supplied (precomputed)
 #' @name add_predictor_range
 #' @rdname add_predictor_range
-#' @usage \S4method{add_predictor_range}{BiodiversityDistribution, SpatRaster}(x, layer)
+#' @usage \S4method{add_predictor_range}{BiodiversityDistribution,SpatRaster,character,ANY,ANY}(x,layer,method,fraction,priors)
 methods::setMethod(
   "add_predictor_range",
   methods::signature(x = "BiodiversityDistribution", layer = "SpatRaster"),
@@ -440,7 +454,7 @@ methods::setMethod(
 
 #' @name add_predictor_range
 #' @rdname add_predictor_range
-#' @usage \S4method{add_predictor_range}{BiodiversityDistribution, sf}(x, layer)
+#' @usage \S4method{add_predictor_range}{BiodiversityDistribution,sf,character,numeric,ANY,ANY}(x,layer,method,distance_max,fraction,priors)
 methods::setMethod(
   "add_predictor_range",
   methods::signature(x = "BiodiversityDistribution", layer = "sf"),
@@ -544,6 +558,7 @@ methods::setMethod(
 #' See Examples.
 #' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
 #' @param names [`vector`] A vector of character names describing the environmental stack.
+#' @aliases rm_predictors
 #' @examples
 #' \dontrun{
 #' distribution(background) |>
@@ -564,12 +579,12 @@ methods::setGeneric(
 
 #' @name rm_predictors
 #' @rdname rm_predictors
-#' @usage \S4method{rm_predictors}{BiodiversityDistribution,vector}(x, names)
+#' @usage \S4method{rm_predictors}{BiodiversityDistribution,ANY}(x,names)
 methods::setMethod(
   "rm_predictors",
   methods::signature(x = "BiodiversityDistribution", names = "character"),
   # rm_predictors ----
-  function(x, names ) {
+  function(x, names) {
     assertthat::assert_that(inherits(x, "BiodiversityDistribution"),
                             is.character(names) || assertthat::is.scalar(names) || is.vector(names) )
@@ -593,6 +608,7 @@ methods::setMethod(
 #'
 #' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
 #' @param names [`vector`] A vector of character names describing the environmental stack.
+#' @aliases sel_predictors
 #' @examples
 #' \dontrun{
 #' distribution(background) |>
@@ -613,12 +629,12 @@ methods::setGeneric(
 
 #' @name sel_predictors
 #' @rdname sel_predictors
-#' @usage \S4method{sel_predictors}{BiodiversityDistribution,vector}(x, names)
+#' @usage \S4method{sel_predictors}{BiodiversityDistribution,ANY}(x,names)
 methods::setMethod(
   "sel_predictors",
   methods::signature(x = "BiodiversityDistribution", names = "character"),
   # sel_predictors ----
-  function(x, names ) {
+  function(x, names) {
     assertthat::assert_that(inherits(x, "BiodiversityDistribution"),
                             is.character(names) || assertthat::is.scalar(names) || is.vector(names) )
@@ -641,7 +657,7 @@ methods::setMethod(
 # Add predictor actions for scenario objects ----
 #' @name add_predictors
 #' @rdname add_predictors
-#' @usage \S4method{add_predictors}{BiodiversityScenario, SpatRaster}(x, env)
+#' @usage \S4method{add_predictors}{BiodiversityScenario,SpatRaster,ANY,character,character,numeric,ANY,logical}(x,env,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...)
 methods::setMethod(
   "add_predictors",
   methods::signature(x = "BiodiversityScenario", env = "SpatRaster"),
@@ -658,7 +674,7 @@ methods::setMethod(
 
 #' @name add_predictors
 #' @rdname add_predictors
-#' @usage \S4method{add_predictors}{BiodiversityScenario, stars}(x, env)
+#' @usage \S4method{add_predictors}{BiodiversityScenario,stars,ANY,character,character,numeric,ANY,logical}(x,env,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...)
 methods::setMethod(
   "add_predictors",
   methods::signature(x = "BiodiversityScenario", env = "stars"),
diff --git a/R/add_predictors_globiom.R b/R/add_predictors_globiom.R
index 86551c94..44cc2e45 100644
--- a/R/add_predictors_globiom.R
+++ b/R/add_predictors_globiom.R
@@ -32,6 +32,7 @@ NULL
 #' @param priors A [`PriorList-class`] object. Default is set to \code{NULL} which uses default prior assumptions.
 #' @param ... Other parameters passed down
 #' @seealso [add_predictors]
+#' @aliases add_predictors_globiom
 #' @examples
 #' \dontrun{
 #' obj <- distribution(background) |>
@@ -54,7 +55,7 @@ methods::setGeneric(
 
 #' @name add_predictors_globiom
 #' @rdname add_predictors_globiom
-#' @usage \S4method{add_predictors_globiom}{BiodiversityDistribution, character}(x, fname)
+#' @usage \S4method{add_predictors_globiom}{BiodiversityDistribution,character,ANY,character,character,numeric,ANY,logical,logical,ANY}(x,fname,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,priors,...)
 methods::setMethod(
   "add_predictors_globiom",
   methods::signature(x = "BiodiversityDistribution", fname = "character"),
@@ -168,7 +169,7 @@ methods::setMethod(
 
 #' @name add_predictors_globiom
 #' @rdname add_predictors_globiom
-#' @usage \S4method{add_predictors_globiom}{BiodiversityScenario, character}(x, fname)
+#' @usage \S4method{add_predictors_globiom}{BiodiversityScenario,character,ANY,character,character,numeric,ANY,logical}(x,fname,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...)
 methods::setMethod(
   "add_predictors_globiom",
   methods::signature(x = "BiodiversityScenario", fname = "character"),
@@ -284,8 +285,11 @@ methods::setMethod(
 #' Options include \code{"reference"} for the first entry, \code{"projection"} for all entries but the first,
 #' and \code{"all"} for all entries (Default: \code{"reference"}).
 #' @param template An optional [`SpatRaster`] object towards which projections should be transformed.
+#' @param shares_to_area A [`logical`] on whether shares should be corrected to areas (if identified).
+#' @param use_gdalutils (Deprecated) A [`logical`] on whether to use the \code{"gdalUtils"} workaround.
 #' @param verbose [`logical`] on whether to be chatty.
 #' @return A [`SpatRaster`] stack with the formatted GLOBIOM predictors.
+#' @aliases formatGLOBIOM
 #'
 #' @examples \dontrun{
 #' # Expects a filename pointing to a netCDF file.
@@ -293,7 +297,8 @@ methods::setMethod(
 #' }
 #' @keywords internal, utils
 formatGLOBIOM <- function(fname, oftype = "raster", ignore = NULL,
-                          period = "all", template = NULL,
+                          period = "all", template = NULL, shares_to_area = FALSE,
+                          use_gdalutils = FALSE,
                           verbose = getOption("ibis.setupmessages")){
   assertthat::assert_that(
     file.exists(fname),
@@ -302,6 +307,8 @@ formatGLOBIOM <- function(fname, oftype = "raster", ignore = NULL,
     is.null(ignore) || is.character(ignore),
     is.character(period),
     is.character(fname),
+    is.logical(shares_to_area),
+    is.logical(use_gdalutils),
     is.logical(verbose)
   )
   period <- match.arg(period, c("reference", "projection", "all"), several.ok = FALSE)
@@ -351,7 +358,16 @@ formatGLOBIOM <- function(fname, oftype = "raster", ignore = NULL,
   )
 
   # Sometimes variables don't seem to have a time dimension
-  if(!"time" %in% names(stars::st_dimensions(ff))) next()
+  if(!"time" %in% names(stars::st_dimensions(ff))) {
+    if(shares_to_area && length(grep("area",names(ff)))>0){
+      # Check that the unit is an area unit
+      if(fatt$var[[v]]$units %in% c("km2","ha","m2")){
+        sc_area <- ff
+      }
+    } else {
+      next()
+    }
+  }
 
   # Crop to background extent if set
   # if(!is.null(template)){
@@ -407,21 +423,22 @@ formatGLOBIOM <- function(fname, oftype = "raster", ignore = NULL,
   # FIXME:
   # MJ 14/11/2022 - The code below is buggy, resulting in odd curvilinear extrapolations for Europe
   # Hacky approach now is to convert to raster, crop, project and then convert back.
-  ff <- hack_project_stars(ff, template)
-  # Make background
-  # bg <- stars::st_as_stars(template)
-  #
-  # # Get resolution
-  # res <- sapply(stars::st_dimensions(bg), "[[", "delta")
-  # res[1:2] = abs(res[1:2]) # Assumes the first too entries are the coordinates
-  # assertthat::assert_that(!anyNA(res))
-  #
-  # # And warp by projecting and resampling
-  # ff <- ff |> st_transform(crs = sf::st_crs(template)) |>
-  #   stars::st_warp(crs = sf::st_crs(bg),
-  #                  cellsize = res,
-  #                  method = "near") |>
-  #   stars:::st_transform.stars(crs = sf::st_crs(template))
+  # Only use if gdalUtils is installed
+  if(("gdalUtils" %in% installed.packages()[,1]) && use_gdalutils){
+    ff <- hack_project_stars(ff, template, use_gdalutils)
+  } else {
+    # Make background
+    bg <- stars::st_as_stars(template)
+
+    # Get resolution
+    res <- stars::st_res(bg)
+    assertthat::assert_that(!anyNA(res))
+
+    # And warp by projecting and resampling
+    ff <- ff |> stars::st_warp(bg, crs = sf::st_crs(bg),
+                               cellsize = res, method = "near") |>
+      sf::st_transform(crs = sf::st_crs(template))
+  }
   # Overwrite full dimensions
   full_dis <- stars::st_dimensions(ff)
 }
@@ -491,6 +508,15 @@ formatGLOBIOM <- function(fname, oftype = "raster", ignore = NULL,
   }
 }
 
+  # Correct shares to area if set
+  if(shares_to_area && inherits(sc_area,"stars")){
+    # Transform and warp the shares
+    sc_area <- stars::st_warp(sc_area, stars::st_as_stars(template), crs = sf::st_crs(sc), method = "near")
+    # Select those layers whose names indicate a share
+    shares <- grep(pattern = "share|fraction|proportion", names(sc), value = TRUE)
+    sc[shares] <- sc[shares] * sc_area
+  }
+
   # Now format outputs depending on type, either returning the raster or the stars object
   if(oftype == "raster"){
     # Output type raster, use function from utils_scenario
diff --git a/R/add_priors.R b/R/add_priors.R
index 5c52f3f4..461afe98 100644
--- a/R/add_priors.R
+++ b/R/add_priors.R
@@ -17,7 +17,10 @@ NULL
 #' @aliases add_priors
 #' @examples
 #' \dontrun{
-#' x <- distribution(background)
+#' pp <- GLMNETPrior("forest")
+#' x <- distribution(background) |>
+#'   add_priors(pp)
+#'
 #' }
 #' @name add_priors
 NULL
@@ -33,7 +36,7 @@ methods::setGeneric(
 
 #' @name add_priors
 #' @rdname add_priors
-#' @usage \S4method{add_priors}{BiodiversityDistribution, ANY}(x, priors)
+#' @usage \S4method{add_priors}{BiodiversityDistribution,ANY}(x,priors,...)
 methods::setMethod(
   "add_priors",
   methods::signature(x = "BiodiversityDistribution"),
@@ -67,7 +70,7 @@ methods::setGeneric(
 #' @inherit add_priors
 #' @inheritParams add_priors
 #' @keywords deprecated
-#' @usage \S4method{set_priors}{BiodiversityDistribution}(x)
+#' @usage \S4method{set_priors}{BiodiversityDistribution,ANY}(x,priors,...)
 methods::setMethod(
   "set_priors",
   methods::signature(x = "BiodiversityDistribution"),
@@ -76,7 +79,7 @@ methods::setMethod(
     is.null(priors) || inherits(priors, "PriorList") || inherits(priors, 'INLAPrior') || inherits(priors, 'GDBPrior') )
     message('Deprecated. Use add_priors ')
-    add_priors(x, priors)
+    add_priors(x, priors, ...)
   }
 )
 
@@ -92,7 +95,12 @@ methods::setMethod(
 #' @family prior
 #' @examples
 #' \dontrun{
-#' TBD
+#' # Add prior
+#' pp <- GLMNETPrior("forest")
+#' x <- distribution(background) |>
+#'   add_priors(pp)
+#' # Remove again
+#' x <- x |> rm_priors("forest")
 #' }
 #' @name rm_priors
 NULL
@@ -108,7 +116,7 @@ methods::setGeneric(
 
 #' @name rm_priors
 #' @rdname rm_priors
-#' @usage \S4method{rm_priors}{BiodiversityDistribution}(x)
+#' @usage \S4method{rm_priors}{BiodiversityDistribution,ANY}(x,names,...)
methods::setMethod( "rm_priors", methods::signature(x = "BiodiversityDistribution"), @@ -162,7 +170,7 @@ methods::setGeneric( #' @name get_priors #' @rdname get_priors -#' @usage \S4method{get_priors}{ANY, character}(mod, target_engine) +#' @usage \S4method{get_priors}{ANY,character}(mod,target_engine,...) methods::setMethod( "get_priors", methods::signature(mod = "ANY", target_engine = "character"), diff --git a/R/bdproto-biodiversitydistribution.R b/R/bdproto-biodiversitydistribution.R index 5d9316b3..4c5210c1 100644 --- a/R/bdproto-biodiversitydistribution.R +++ b/R/bdproto-biodiversitydistribution.R @@ -81,8 +81,13 @@ BiodiversityDistribution <- bdproto( return(o) }, # Set limits - set_limits = function(self, x){ - assertthat::assert_that(is.Raster(x)) + set_limits = function(self, x, mcp_buffer = 0, limits_clip = FALSE){ + assertthat::assert_that(is.Raster(x) || inherits(x, "sf"), + msg = "Provide a SpatRaster or sf object!") + # Construct limits object assuming zones + x <- list(layer = x, limits_method = "zones", + "mcp_buffer" = mcp_buffer, "limits_clip" = limits_clip) + bdproto(NULL, self, limits = x ) }, # Get provided limits @@ -137,7 +142,8 @@ BiodiversityDistribution <- bdproto( }, # Set new priors set_priors = function(self, x ){ - assertthat::assert_that(inherits(x, 'PriorList'), msg = 'An object created through `priors` has to be provided.') + assertthat::assert_that(inherits(x, 'PriorList'), + msg = 'An object created through `priors` has to be provided.') # Check if a priorlist is set. If yes, then combine the new one with existing priors if(is.Waiver(self$priors)){ bdproto(NULL, self, priors = x ) diff --git a/R/bdproto-biodiversityscenario.R b/R/bdproto-biodiversityscenario.R index 9e6ccf5d..0bcb258f 100644 --- a/R/bdproto-biodiversityscenario.R +++ b/R/bdproto-biodiversityscenario.R @@ -434,7 +434,7 @@ BiodiversityScenario <- bdproto( if(plot){ if( 'threshold' %in% attributes(self$get_data())$names ){ - if(has_name(out,"band")) out <- dplyr::rename(out, "time" = "band") + if(utils::hasName(out,"band")) out <- dplyr::rename(out, "time" = "band") ggplot2::ggplot(out, ggplot2::aes(x = time, y = as.numeric(area_km2))) + ggplot2::theme_classic(base_size = 18) + diff --git a/R/bdproto-distributionmodel.R b/R/bdproto-distributionmodel.R index 246224d3..4f30359c 100644 --- a/R/bdproto-distributionmodel.R +++ b/R/bdproto-distributionmodel.R @@ -1,439 +1,447 @@ -#' @include utils.R bdproto.R identifier.R -NULL - -#' @export -if (!methods::isClass("DistributionModel")) methods::setOldClass("DistributionModel") -NULL - -#' Prototype for the trained Model object -#' -#' All trained Models should inherit the options here -#' -#' @name DistributionModel-class -#' @aliases DistributionModel -#' @family bdproto -#' @keywords bdproto -NULL - -#' @export -DistributionModel <- bdproto( - "DistributionModel", - id = character(), # An id for any trained model - model = list(), - fits = list(), # List of fits with data - # Print message with summary of model - print = function(self) { - # TODO: Have a lot more information in here and to be prettified - - # Check whether prediction exists and number of layers - has_prediction <- "prediction" %in% self$show_rasters() - # Check whether threshold has been calculated - has_threshold <- grep('threshold',self$show_rasters(),value = TRUE)[1] - - # FIXME: Have engine-specific code moved to engine - if(inherits(self, 'INLA-Model') || inherits(self, 'INLABRU-Model') ){ - if( length( self$fits ) != 0 ){ - # Get strongest effects - ms <- 
subset(tidy_inla_summary(self$get_data('fit_best')), - select = c('variable', 'mean')) - ms <- ms[order(ms$mean,decreasing = TRUE),] # Sort - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest summary effects:\033[22m', - '\n \033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), - '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } - } else if( inherits(self, 'GDB-Model') ) { - - # Get Variable importance - vi <- mboost::varimp( - self$get_data('fit_best') - ) - vi <- sort( vi[which(vi>0)],decreasing = TRUE ) - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest effects:\033[22m', - '\n ', name_atomic(names(vi)), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } else if( inherits(self, 'BART-Model') ) { - # Calculate variable importance from the posterior trees - vi <- varimp.bart(self$get_data('fit_best')) - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest effects:\033[22m', - '\n ', name_atomic(vi$names), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } else if( inherits(self, 'STAN-Model') ) { - # Calculate variable importance from the posterior - vi <- rstan::summary(self$get_data('fit_best'))$summary |> as.data.frame() |> - tibble::rownames_to_column(var = "parameter") |> as.data.frame() - # Get beta coefficients only - vi <- vi[grep("beta", vi$parameter,ignore.case = TRUE),] - - # Get variable names from model object - # FIXME: This might not work for all possible modelling objects. 
For instance - model <- self$model - assertthat::assert_that(nrow(vi) == length(model$predictors_names), - length(vi$parameter) == length(model$predictors_names)) - vi$parameter <- model$predictors_names - - vi <- vi[order(abs(vi$mean),decreasing = TRUE),] - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest summary effects:\033[22m', - '\n \033[34mPositive:\033[39m ', name_atomic(vi$parameter[vi$mean>0]), - '\n \033[31mNegative:\033[39m ', name_atomic(vi$parameter[vi$mean<0]) - )) - } else if( inherits(self, 'XGBOOST-Model') ) { - vi <- xgboost::xgb.importance(model = self$get_data('fit_best'),) - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest effects:\033[22m', - '\n ', name_atomic(vi$Feature), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } else if( inherits(self, 'BREG-Model') ) { - obj <- self$get_data('fit_best') - # Summarize the beta coefficients from the posterior - ms <- posterior::summarise_draws(obj$beta) |> - subset(select = c('variable', 'mean')) - # Reorder - ms <- ms[order(ms$mean,decreasing = TRUE),] # Sort - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[2mStrongest summary effects:\033[22m', - '\n \033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), - '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } else if(inherits(self, 'GLMNET-Model')) { - obj <- self$get_data('fit_best') - - # Summarise coefficients within 1 standard deviation - ms <- tidy_glmnet_summary(obj) - - message(paste0( - 'Trained ',class(self)[1],' (',self$show(),')', - '\n \033[1mStrongest summary effects:\033[22m', - '\n \033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), - '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - - } else { - message(paste0( - 'Trained distribution model (',self$show(),')', - text_red('\n No fitted model found!'), - ifelse(has_prediction, - paste0("\n Prediction fitted: ",text_green("yes")), - ""), - ifelse(!is.na(has_threshold), - paste0("\n Threshold created: ",text_green("yes")), - "") - )) - } - }, - # Show the name of the Model - show = function(self) { - self$model$runname - }, - # Plot the prediction - plot = function(self, what = 'mean'){ - if( length( self$fits ) != 0 && !is.null( self$fits$prediction ) ){ - pred <- self$get_data('prediction') - assertthat::assert_that(is.Raster(pred)) - # Check if median is requested but not present, change to q50 - if(what == "median" && !(what %in% names(pred))) { what <- "q50" } - - # Match argument - what <- match.arg(what, names(pred), several.ok = FALSE) - assertthat::assert_that( what %in% names(pred),msg = paste0('Prediction type not found. 
Available: ', paste0(names(pred),collapse = '|'))) - terra::plot(pred[[what]], - main = paste0(self$model$runname, ' prediction (',what,')'), - box = FALSE, - axes = TRUE, - colNA = NA, col = ibis_colours[['sdm_colour']] - ) - } else { - message( - paste0('No model predictions found.') - ) - } - }, - # Plot threshold - plot_threshold = function(self, what = 1){ - assertthat::assert_that(is.numeric(what) || is.character(what)) - # Determines whether a threshold exists and plots it - rl <- self$show_rasters() - if(length(grep('threshold',rl))>0){ - - # Get stack of computed thresholds - ras <- self$get_data( grep('threshold', rl, value = TRUE)[[what]] ) - suppressWarnings( - ras <- terra::droplevels(ras) - ) - # Get colour palette - format <- attr(ras[[1]], 'format') # Format attribute - if(format == "normalize"){ - col <- colorRampPalette(c("grey","#EB072F","#FFE900","#5A94DD","black"))(100) - } else if(format == "percentile") { - col <- colorRampPalette(c("grey","#EB072F","#FFE900","#5A94DD","black"))(length(unique(ras))) - } else { - # Binary - col <- c("grey", "black") - } - terra::plot(ras, - box = FALSE, - axes = TRUE, - colNA = NA, col = col - ) - } else { - message("No computed threshold was found!") - invisible() - } - }, - # Show model run time if settings exist - show_duration = function(self){ - if(!is.Waiver(self$settings)) self$settings$duration() - }, - # Get effects or importance tables from model - summary = function(self, obj = 'fit_best'){ - # Distinguishing between model types - if(inherits(self, 'GDB-Model')){ - clean_mboost_summary( self$get_data(obj) ) - } else if(inherits(self, 'INLA-Model') || inherits(self, 'INLABRU-Model')){ - tidy_inla_summary(self$get_data(obj)) - } else if(inherits(self, 'BART-Model')){ - # Number of times each variable is used by a tree split - # Tends to become less informative with higher numbers of splits - varimp.bart(self$get_data(obj)) |> tibble::remove_rownames() - } else if(inherits(self, 'STAN-Model')){ - vi <- rstan::summary(self$get_data(obj))$summary |> as.data.frame() |> - tibble::rownames_to_column(var = "parameter") |> as.data.frame() - # Get beta coefficients only - vi <- vi[grep("beta", vi$parameter,ignore.case = TRUE),] - # FIXME: This might not work for all possible modelling objects. For instance - model <- self$model - assertthat::assert_that(nrow(vi) == length(model$predictors_names), - length(vi$parameter) == length(model$predictors_names)) - vi$parameter <- model$predictors_names - names(vi) <- make.names(names(vi)) - return( tibble::as_tibble( vi ) ) - } else if(inherits(self, 'BREG-Model')){ - posterior::summarise_draws(self$get_data(obj)$beta) - } else if(inherits(self, "XGBOOST-Model")){ - xgboost::xgb.importance(model = self$get_data(obj)) - } else if(inherits(self, 'GLMNET-Model')){ - tidy_glmnet_summary(self$get_data(obj)) - } - }, - # Dummy partial response calculation. To be overwritten per engine - partial = function(self){ - new_waiver() - }, - # Dummy spartial response calculation. 
To be overwritten per engine - spartial = function(self){ - new_waiver() - }, - # Generic plotting function for effect plots - effects = function(self, x = 'fit_best', what = 'fixed', ...){ - assertthat::assert_that(is.character(what)) - if(inherits(self, 'GDB-Model')){ - # How many effects - n <- length( stats::coef( self$get_data(x) )) - # Use the base plotting - par.ori <- graphics::par(no.readonly = TRUE) - graphics::par(mfrow = c(ceiling(n/3),3)) - - mboost:::plot.mboost(x = self$get_data(x), - type = 'b',cex.axis=1.5, cex.lab=1.5) - - graphics::par(par.ori)#dev.off() - } else if(inherits(self, 'INLA-Model')) { - plot_inla_marginals(self$get_data(x),what = what) - } else if(inherits(self, 'GLMNET-Model')) { - if(what == "fixed"){ - glmnet:::plot.glmnet(self$get_data(x)$glmnet.fit, xvar = "lambda") # Deviance explained - } else{ plot(self$get_data(x)) } - } else if(inherits(self, 'STAN-Model')) { - # Get true beta parameters - ra <- grep("beta", names(self$get_data(x)),value = TRUE) # Get range - rstan::stan_plot(self$get_data(x), pars = ra) - } else if(inherits(self, 'INLABRU-Model')) { - # Use inlabru effect plot - ggplot2::ggplot() + - inlabru::gg(self$get_data(x)$summary.fixed, bar = TRUE) - } else if(inherits(self, 'BART-Model')){ - message('Calculating partial dependence plots') - self$partial(self$get_data(x), x.vars = what, ...) - } else if(inherits(self, 'BREG-Model')){ - obj <- self$get_data(x) - if(what == "fixed") what <- "coefficients" - what <- match.arg(what, choices = c("coefficients", "scaled.coefficients","residuals", - "size", "fit", "help", "inclusion"), several.ok = FALSE) - if( length( grep("poisson", obj$call) ) > 0 ){ - BoomSpikeSlab::plot.poisson.spike(obj, y = what) - } else if( length( grep("binomial", obj$call) ) > 0 ){ - BoomSpikeSlab::plot.logit.spike(obj, y = what) - } else { - BoomSpikeSlab::plot.lm.spike(obj, y = what) - } - } else if(inherits(self, "XGBOOST-Model")){ - # Check whether linear model was fitted, otherwise plot tree - if( self$settings$get("only_linear") ){ - vi <- self$summary(x) - xgboost::xgb.ggplot.importance(vi) - } else { - obj <- self$get_data(x) - xgboost::xgb.plot.multi.trees(obj) - } - } else { - self$partial(self$get_data(x), x.vars = NULL) - } - }, - # Get equation - get_equation = function(self){ - self$get_data("fit_best_equation") - }, - # Get specific fit from this Model - get_data = function(self, x = "prediction") { - if (!x %in% names(self$fits)) - return(new_waiver()) - return(self$fits[[x]]) - }, - # Set fit for this Model - set_data = function(self, x, value) { - # Get biodiversity dataset collection - ff <- self$fits - # Set the object - ff[[x]] <- value - bdproto(NULL, self, fits = ff ) - }, - # Get the threshold value if calculated - get_thresholdvalue = function(self){ - # Determines whether a threshold exists and plots it - rl <- self$show_rasters() - if(length(grep('threshold',rl))==0) return( new_waiver() ) - - # Get the thresholded layer and return the respective attribute - obj <- self$get_data( grep('threshold',rl,value = TRUE) ) - assertthat::assert_that(assertthat::has_attr(obj, "threshold")) - return( - attr(obj, "threshold") - ) - }, - # List all rasters in object - show_rasters = function(self){ - rn <- names(self$fits) - rn <- rn[ which( sapply(rn, function(x) is.Raster(self$get_data(x)) ) ) ] - return(rn) - }, - # Get projection - get_projection = function(self){ - sf::st_crs(self$model$background) - }, - # Get resolution - get_resolution = function(self){ - if(!is.Waiver(self$get_data())){ - 
terra::res( self$get_data() ) - } else { - # Try to get it from the modelling object - self$model$predictors_object$get_resolution() - } - }, - # Remove calculated thresholds - rm_threshold = function(self){ - rl <- self$show_rasters() - if(length(grep('threshold',rl))>0){ - for(val in grep('threshold',rl,value = TRUE)){ - self$fits[[val]] <- NULL - } - } - invisible() - }, - # Calculate a suitability index - calc_suitabilityindex = function(self, method = "normalize"){ - assertthat::assert_that( - is.character(method), - is.Raster(self$get_data()) - ) - method <- match.arg(method, c("normalize", "reltotal"), several.ok = FALSE) - - # Get the raster of the mean prediction - ras <- self$get_data()[["mean"]] - if(method == "normalize"){ - out <- predictor_transform(ras, option = "norm") - } else { - out <- ras / terra::global(ras,"sum", na.rm = TRUE)[,1] - } - return(out) - }, - # Save object - save = function(self, fname, type = 'gtif', dt = 'FLT4S'){ - assertthat::assert_that( - is.character(fname), - type %in% c('gtif','gtiff','tif','nc','ncdf'), - 'fits' %in% self$ls(), - dt %in% c('LOG1S','INT1S','INT1U','INT2S','INT2U','INT4S','INT4U','FLT4S','FLT8S') - ) - type <- tolower(type) - - # Get raster file in fitted object - cl <- sapply(self$fits, class) - ras <- self$fits[[grep('raster', cl,ignore.case = T)]] - - # Check that no-data value is not present in ras - assertthat::assert_that(any(!terra::global(ras, "min", na.rm = TRUE)[,1] <= -9999), - msg = 'No data value -9999 is potentially in prediction!') - - if(file.exists(fname)) warning('Overwritting existing file...') - if(type %in% c('gtif','gtiff','tif')){ - # Save as geotiff - writeGeoTiff(ras, fname = fname, dt = dt) - } else if(type %in% c('nc','ncdf')) { - # Save as netcdf - # TODO: Potentially change the unit descriptions - writeNetCDF(ras, fname = fname, varName = 'iSDM prediction', varUnit = "",varLong = "") - } - invisible() - } -) +#' @include utils.R bdproto.R identifier.R +NULL + +#' @export +if (!methods::isClass("DistributionModel")) methods::setOldClass("DistributionModel") +NULL + +#' Prototype for the trained Model object +#' +#' All trained Models should inherit the options here +#' +#' @name DistributionModel-class +#' @aliases DistributionModel +#' @family bdproto +#' @keywords bdproto +NULL + +#' @export +DistributionModel <- bdproto( + "DistributionModel", + id = character(), # An id for any trained model + model = list(), + fits = list(), # List of fits with data + # Print message with summary of model + print = function(self) { + # TODO: Have a lot more information in here and to be prettified + + # Check whether prediction exists and number of layers + has_prediction <- "prediction" %in% self$show_rasters() + # Check whether threshold has been calculated + has_threshold <- grep('threshold',self$show_rasters(),value = TRUE)[1] + + # FIXME: Have engine-specific code moved to engine + if(inherits(self, 'INLA-Model') || inherits(self, 'INLABRU-Model') ){ + if( length( self$fits ) != 0 ){ + # Get strongest effects + ms <- subset(tidy_inla_summary(self$get_data('fit_best')), + select = c('variable', 'mean')) + ms <- ms[order(ms$mean,decreasing = TRUE),] # Sort + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest summary effects:\033[22m', + '\n \033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), + '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + 
ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } + } else if( inherits(self, 'GDB-Model') ) { + + # Get Variable importance + vi <- mboost::varimp( + self$get_data('fit_best') + ) + vi <- sort( vi[which(vi>0)],decreasing = TRUE ) + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest effects:\033[22m', + '\n ', name_atomic(names(vi)), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } else if( inherits(self, 'BART-Model') ) { + # Calculate variable importance from the posterior trees + vi <- varimp.bart(self$get_data('fit_best')) + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest effects:\033[22m', + '\n ', name_atomic(vi$names), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } else if( inherits(self, 'STAN-Model') ) { + # Calculate variable importance from the posterior + vi <- rstan::summary(self$get_data('fit_best'))$summary |> as.data.frame() |> + tibble::rownames_to_column(var = "parameter") |> as.data.frame() + # Get beta coefficients only + vi <- vi[grep("beta", vi$parameter,ignore.case = TRUE),] + + # Get variable names from model object + # FIXME: This might not work for all possible modelling objects. For instance + model <- self$model + assertthat::assert_that(nrow(vi) == length(model$predictors_names), + length(vi$parameter) == length(model$predictors_names)) + vi$parameter <- model$predictors_names + + vi <- vi[order(abs(vi$mean),decreasing = TRUE),] + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest summary effects:\033[22m', + '\n \033[34mPositive:\033[39m ', name_atomic(vi$parameter[vi$mean>0]), + '\n \033[31mNegative:\033[39m ', name_atomic(vi$parameter[vi$mean<0]) + )) + } else if( inherits(self, 'XGBOOST-Model') ) { + vi <- xgboost::xgb.importance(model = self$get_data('fit_best'),) + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest effects:\033[22m', + '\n ', name_atomic(vi$Feature), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } else if( inherits(self, 'BREG-Model') ) { + obj <- self$get_data('fit_best') + # Summarize the beta coefficients from the posterior + ms <- posterior::summarise_draws(obj$beta) |> + subset(select = c('variable', 'mean')) + # Reorder + ms <- ms[order(ms$mean,decreasing = TRUE),] # Sort + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[2mStrongest summary effects:\033[22m', + '\n \033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), + '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } else if(inherits(self, 'GLMNET-Model')) { + obj <- self$get_data('fit_best') + + # Summarise coefficients within 1 standard deviation + ms <- tidy_glmnet_summary(obj) + + message(paste0( + 'Trained ',class(self)[1],' (',self$show(),')', + '\n \033[1mStrongest summary effects:\033[22m', + '\n 
\033[34mPositive:\033[39m ', name_atomic(ms$variable[ms$mean>0]), + '\n \033[31mNegative:\033[39m ', name_atomic(ms$variable[ms$mean<0]), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + + } else { + message(paste0( + 'Trained distribution model (',self$show(),')', + text_red('\n No fitted model found!'), + ifelse(has_prediction, + paste0("\n Prediction fitted: ",text_green("yes")), + ""), + ifelse(!is.na(has_threshold), + paste0("\n Threshold created: ",text_green("yes")), + "") + )) + } + }, + # Show the name of the Model + show = function(self) { + self$model$runname + }, + # Plot the prediction + plot = function(self, what = 'mean'){ + if( length( self$fits ) != 0 && !is.null( self$fits$prediction ) ){ + pred <- self$get_data('prediction') + assertthat::assert_that(is.Raster(pred)) + # Check if median is requested but not present, change to q50 + if(what == "median" && !(what %in% names(pred))) { what <- "q50" } + + # Match argument + what <- match.arg(what, names(pred), several.ok = FALSE) + assertthat::assert_that( what %in% names(pred),msg = paste0('Prediction type not found. Available: ', paste0(names(pred),collapse = '|'))) + terra::plot(pred[[what]], + main = paste0(self$model$runname, ' prediction (',what,')'), + box = FALSE, + axes = TRUE, + colNA = NA, col = ibis_colours[['sdm_colour']] + ) + } else { + message( + paste0('No model predictions found.') + ) + } + }, + # Plot threshold + plot_threshold = function(self, what = 1){ + assertthat::assert_that(is.numeric(what) || is.character(what)) + # Determines whether a threshold exists and plots it + rl <- self$show_rasters() + if(length(grep('threshold',rl))>0){ + + # Get stack of computed thresholds + ras <- self$get_data( grep('threshold', rl, value = TRUE)[[what]] ) + suppressWarnings( + ras <- terra::droplevels(ras) + ) + # Get colour palette + format <- attr(ras[[1]], 'format') # Format attribute + if(format == "normalize"){ + col <- colorRampPalette(c("grey","#EB072F","#FFE900","#5A94DD","black"))(100) + } else if(format == "percentile") { + col <- colorRampPalette(c("grey","#EB072F","#FFE900","#5A94DD","black"))(length(unique(ras)[,1])) + } else { + # Binary + col <- c("grey", "black") + } + terra::plot(ras, + box = FALSE, + axes = TRUE, + colNA = NA, col = col + ) + } else { + message("No computed threshold was found!") + invisible() + } + }, + # Show model run time if settings exist + show_duration = function(self){ + if(!is.Waiver(self$settings)) self$settings$duration() + }, + # Get effects or importance tables from model + summary = function(self, obj = 'fit_best'){ + # Distinguishing between model types + if(inherits(self, 'GDB-Model')){ + clean_mboost_summary( self$get_data(obj) ) + } else if(inherits(self, 'INLA-Model') || inherits(self, 'INLABRU-Model')){ + tidy_inla_summary(self$get_data(obj)) + } else if(inherits(self, 'BART-Model')){ + # Number of times each variable is used by a tree split + # Tends to become less informative with higher numbers of splits + varimp.bart(self$get_data(obj)) |> tibble::remove_rownames() + } else if(inherits(self, 'STAN-Model')){ + vi <- rstan::summary(self$get_data(obj))$summary |> as.data.frame() |> + tibble::rownames_to_column(var = "parameter") |> as.data.frame() + # Get beta coefficients only + vi <- vi[grep("beta", vi$parameter,ignore.case = TRUE),] + # FIXME: This might not work for all possible modelling objects. 
For instance + model <- self$model + assertthat::assert_that(nrow(vi) == length(model$predictors_names), + length(vi$parameter) == length(model$predictors_names)) + vi$parameter <- model$predictors_names + names(vi) <- make.names(names(vi)) + return( tibble::as_tibble( vi ) ) + } else if(inherits(self, 'BREG-Model')){ + posterior::summarise_draws(self$get_data(obj)$beta) + } else if(inherits(self, "XGBOOST-Model")){ + xgboost::xgb.importance(model = self$get_data(obj)) + } else if(inherits(self, 'GLMNET-Model')){ + tidy_glmnet_summary(self$get_data(obj)) + } + }, + # Model convergence check + has_converged = function(self){ + new_waiver() + }, + # Dummy residual function + get_residuals = function(self){ + new_waiver() + }, + # Dummy partial response calculation. To be overwritten per engine + partial = function(self){ + new_waiver() + }, + # Dummy spartial response calculation. To be overwritten per engine + spartial = function(self){ + new_waiver() + }, + # Generic plotting function for effect plots + effects = function(self, x = 'fit_best', what = 'fixed', ...){ + assertthat::assert_that(is.character(what)) + if(inherits(self, 'GDB-Model')){ + # How many effects + n <- length( stats::coef( self$get_data(x) )) + # Use the base plotting + par.ori <- graphics::par(no.readonly = TRUE) + graphics::par(mfrow = c(ceiling(n/3),3)) + + mboost:::plot.mboost(x = self$get_data(x), + type = 'b',cex.axis=1.5, cex.lab=1.5) + + graphics::par(par.ori)#dev.off() + } else if(inherits(self, 'INLA-Model')) { + plot_inla_marginals(self$get_data(x),what = what) + } else if(inherits(self, 'GLMNET-Model')) { + if(what == "fixed"){ + glmnet:::plot.glmnet(self$get_data(x)$glmnet.fit, xvar = "lambda") # Deviance explained + } else{ plot(self$get_data(x)) } + } else if(inherits(self, 'STAN-Model')) { + # Get true beta parameters + ra <- grep("beta", names(self$get_data(x)),value = TRUE) # Get range + rstan::stan_plot(self$get_data(x), pars = ra) + } else if(inherits(self, 'INLABRU-Model')) { + # Use inlabru effect plot + ggplot2::ggplot() + + inlabru::gg(self$get_data(x)$summary.fixed, bar = TRUE) + } else if(inherits(self, 'BART-Model')){ + message('Calculating partial dependence plots') + self$partial(self$get_data(x), x.vars = what, ...) 
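+      # (No default coefficient plot exists for BART, so partial dependence
+      # estimates are generated above instead; see also the note in effects.R.)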
+ } else if(inherits(self, 'BREG-Model')){ + obj <- self$get_data(x) + if(what == "fixed") what <- "coefficients" + what <- match.arg(what, choices = c("coefficients", "scaled.coefficients","residuals", + "size", "fit", "help", "inclusion"), several.ok = FALSE) + if( length( grep("poisson", obj$call) ) > 0 ){ + BoomSpikeSlab::plot.poisson.spike(obj, y = what) + } else if( length( grep("binomial", obj$call) ) > 0 ){ + BoomSpikeSlab::plot.logit.spike(obj, y = what) + } else { + BoomSpikeSlab::plot.lm.spike(obj, y = what) + } + } else if(inherits(self, "XGBOOST-Model")){ + # Check whether linear model was fitted, otherwise plot tree + if( self$settings$get("only_linear") ){ + vi <- self$summary(x) + xgboost::xgb.ggplot.importance(vi) + } else { + obj <- self$get_data(x) + xgboost::xgb.plot.multi.trees(obj) + } + } else { + self$partial(self$get_data(x), x.vars = NULL) + } + }, + # Get equation + get_equation = function(self){ + self$get_data("fit_best_equation") + }, + # Get specific fit from this Model + get_data = function(self, x = "prediction") { + if (!x %in% names(self$fits)) + return(new_waiver()) + return(self$fits[[x]]) + }, + # Set fit for this Model + set_data = function(self, x, value) { + # Get biodiversity dataset collection + ff <- self$fits + # Set the object + ff[[x]] <- value + bdproto(NULL, self, fits = ff ) + }, + # Get the threshold value if calculated + get_thresholdvalue = function(self){ + # Determines whether a threshold exists and plots it + rl <- self$show_rasters() + if(length(grep('threshold',rl))==0) return( new_waiver() ) + + # Get the thresholded layer and return the respective attribute + obj <- self$get_data( grep('threshold',rl,value = TRUE) ) + assertthat::assert_that(assertthat::has_attr(obj, "threshold")) + return( + attr(obj, "threshold") + ) + }, + # List all rasters in object + show_rasters = function(self){ + rn <- names(self$fits) + rn <- rn[ which( sapply(rn, function(x) is.Raster(self$get_data(x)) ) ) ] + return(rn) + }, + # Get projection + get_projection = function(self){ + sf::st_crs(self$model$background) + }, + # Get resolution + get_resolution = function(self){ + if(!is.Waiver(self$get_data())){ + terra::res( self$get_data() ) + } else { + # Try to get it from the modelling object + self$model$predictors_object$get_resolution() + } + }, + # Remove calculated thresholds + rm_threshold = function(self){ + rl <- self$show_rasters() + if(length(grep('threshold',rl))>0){ + for(val in grep('threshold',rl,value = TRUE)){ + self$fits[[val]] <- NULL + } + } + invisible() + }, + # Calculate a suitability index + calc_suitabilityindex = function(self, method = "normalize"){ + assertthat::assert_that( + is.character(method), + is.Raster(self$get_data()) + ) + method <- match.arg(method, c("normalize", "reltotal"), several.ok = FALSE) + + # Get the raster of the mean prediction + ras <- self$get_data()[["mean"]] + if(method == "normalize"){ + out <- predictor_transform(ras, option = "norm") + } else { + out <- ras / terra::global(ras,"sum", na.rm = TRUE)[,1] + } + return(out) + }, + # Save object + save = function(self, fname, type = 'gtif', dt = 'FLT4S'){ + assertthat::assert_that( + is.character(fname), + type %in% c('gtif','gtiff','tif','nc','ncdf'), + 'fits' %in% self$ls(), + dt %in% c('LOG1S','INT1S','INT1U','INT2S','INT2U','INT4S','INT4U','FLT4S','FLT8S') + ) + type <- tolower(type) + + # Get raster file in fitted object + cl <- sapply(self$fits, class) + ras <- self$fits[[grep('raster', cl,ignore.case = T)]] + + # Check that no-data value is not 
present in ras
+    assertthat::assert_that(any(!terra::global(ras, "min", na.rm = TRUE)[,1] <= -9999),
+                            msg = 'No data value -9999 is potentially in prediction!')
+
+    if(file.exists(fname)) warning('Overwriting existing file...')
+    if(type %in% c('gtif','gtiff','tif')){
+      # Save as geotiff
+      writeGeoTiff(ras, fname = fname, dt = dt)
+    } else if(type %in% c('nc','ncdf')) {
+      # Save as netcdf
+      # TODO: Potentially change the unit descriptions
+      writeNetCDF(ras, fname = fname, varName = 'iSDM prediction', varUnit = "",varLong = "")
+    }
+    invisible()
+  }
+)
diff --git a/R/bdproto-predictors.R b/R/bdproto-predictors.R
index 4041fa23..d753fa2a 100644
--- a/R/bdproto-predictors.R
+++ b/R/bdproto-predictors.R
@@ -120,7 +120,7 @@ PredictorDataset <- bdproto(
       all(x %in% names(self$get_data()))
     )
     # Match indices
-    ind <- match(x, self$get_names())
+    ind <- base::match(x, self$get_names())
     if(is.Raster(self$get_data() )){
       # Overwrite predictor dataset
       if(base::length(ind) == base::length(self$get_names())){
diff --git a/R/bdproto-priorlist.R b/R/bdproto-priorlist.R
index 83b58039..eb24e9d5 100644
--- a/R/bdproto-priorlist.R
+++ b/R/bdproto-priorlist.R
@@ -91,7 +91,7 @@ PriorList <- bdproto(
     if(!all(variable %in% vn)){
       id <- NULL
     } else {
-      id <- names(vn)[match(variable, vn, nomatch = 0)]
+      id <- names(vn)[base::match(variable, vn, nomatch = 0)]
     }
   }
   return(id)
diff --git a/R/bdproto.R b/R/bdproto.R
index 248b0bec..bc135a78 100644
--- a/R/bdproto.R
+++ b/R/bdproto.R
@@ -40,6 +40,7 @@ NULL
 #' Abacus$subtract(10)
 #'
 #' @return A bdproto object.
+#' @aliases bdproto
 #' @keywords bdproto
 #' @family bdproto
 #' @noRd
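
# A hedged usage sketch for the new check() generic added below (assuming `x`
# is a set-up distribution object and `mod` a model fitted via train(), as in
# the package docs; a sketch, not a definitive example):
#
#   mod <- train(x)
#   check(mod)                       # print any warnings found
#   check(mod, stoponwarning = TRUE) # stop on the first warning instead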
diff --git a/R/check.R b/R/check.R
new file mode 100644
index 00000000..b6c835f6
--- /dev/null
+++ b/R/check.R
@@ -0,0 +1,140 @@
+#' @include utils.R
+NULL
+
+#' Check objects in the package for common errors or issues
+#'
+#' @description
+#' There is not always enough data or sufficient information to robustly
+#' infer the suitable habitat or niche of a species. As many SDM algorithms are
+#' essentially regression models, similar assumptions about model convergence,
+#' homogeneity of residuals and inference usually apply (although they are
+#' often ignored).
+#' This function simply checks the respective input object for common issues or
+#' mistakes.
+#'
+#' @details
+#' Different checks are implemented depending on the supplied object:
+#'
+#' * [`BiodiversityDistribution`]
+#'   - Checks if there are fewer than 200 observations
+#'   - TODO: Add rm_insufficient_covs link
+#'
+#' * [`DistributionModel`]
+#'   - Check model convergence
+#'   - Check if a fitted model is found
+#'   - Check if coefficients exist
+#'   - Check if there are unusual outliers in the prediction (using 10 times the median absolute deviation)
+#'   - Check if the threshold is larger than the prediction layer
+#'
+#' * [`BiodiversityScenario`]
+#'   -
+#'
+#' @note
+#' This function will likely be expanded with additional checks in the future.
+#' If you have ideas, please let us know via an issue.
+#'
+#' @param obj A [`BiodiversityDistribution`], [`DistributionModel`] or [`BiodiversityScenario`] object.
+#' @param stoponwarning [`logical`] Should the check stop with an error if a warning is raised? (Default: \code{FALSE}).
+#' @name check
+#' @returns Message outputs
+#' @keywords misc
+#' @aliases check
+#' @examples
+#' \dontrun{
+#' # Where mod is an estimated DistributionModel
+#' check(mod)
+#' }
+#' @export
+NULL
+
+#' @name check
+#' @rdname check
+#' @exportMethod check
+#' @export
+methods::setGeneric(
+  "check",
+  signature = methods::signature("obj", "stoponwarning"),
+  function(obj, stoponwarning = FALSE) standardGeneric("check"))
+
+#' @name check
+#' @rdname check
+#' @usage \S4method{check}{ANY,logical}(obj,stoponwarning)
+methods::setMethod(
+  "check",
+  methods::signature(obj = "ANY"),
+  function(obj, stoponwarning = FALSE) {
+    assertthat::assert_that(
+      is.logical(stoponwarning)
+    )
+
+    # Messages
+    ms <- list(Warnings = character())
+
+    # Type specific checks
+    if(inherits(obj,"BiodiversityDistribution")){
+
+      # Are there enough observations?
+      if( sum( obj$biodiversity$get_observations() ) < 200 ){
+        ms$Warnings <- append(ms$Warnings, "Not enough observations found.")
+      }
+
+    }
+    if(inherits(obj, "DistributionModel")){
+
+      fit <- obj$get_data("fit_best")
+      # Did the model exist?
+      if(is.Waiver(fit)){
+        ms$Warnings <- append(ms$Warnings, "No fitted model found!")
+      }
+
+      # Sufficient iterations for converged?
+      if(!obj$has_converged()){
+        ms$Warnings <- append(ms$Warnings, "Model likely has not converged!")
+      }
+
+      # No coefficients?
+      if(nrow(obj$get_coefficients())==0){
+        ms$Warnings <- append(ms$Warnings, "No coefficients in the model!")
+      }
+
+      # Check if threshold exists, if so if it is equal to maximum
+      if(!is.Waiver(obj$get_thresholdvalue())){
+        # Get prediction
+        pred <- obj$get_data()
+        if(obj$get_thresholdvalue() >= terra::global(pred,"max",na.rm=TRUE)[,1]){
+          ms$Warnings <- append(ms$Warnings, "Threshold larger than prediction!")
+        }
+      }
+
+      # Check for positive outliers in Prediction
+      if(!is.Waiver( obj$get_data() )){
+        pred <- obj$get_data()
+        # Calculate outliers using the mad
+        pmed <- terra::global(pred, median, na.rm = TRUE)[,1]
+        abs_dev <- abs(pred[]-pmed)[,1]
+        # Calculate mad
+        pmad <- 1.4826 * median(abs_dev,na.rm = TRUE)
+        test <- terra::values(pred)[,1] > (pmed + (10 * pmad)) # 10x mad
+        # If any outlier values are found, raise a warning
+        if(sum(test, na.rm = TRUE) >= 1){
+          ms$Warnings <- append(ms$Warnings, "Outliers found in prediction. Likely overfit!")
+        }
+      }
+    }
+    if(inherits(obj, "BiodiversityScenario")){
+
+    }
+
+    # Check for warnings
+    if(stoponwarning){
+      # Check if there are any warnings
+      if(length(ms$Warnings)>0){
+        stop("Warning raised during checks!")
+      }
+    }
+    # Compile
+    for(entry in ms$Warnings){
+      myLog("[Checks]",col = "yellow", entry)
+    }
+  }
+)
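
# A minimal sketch of the MAD-based outlier rule implemented above, on a toy
# vector standing in for prediction values (base R only):
#
#   x <- c(rnorm(100), 50)                  # one clear outlier
#   pmed <- median(x, na.rm = TRUE)
#   pmad <- 1.4826 * median(abs(x - pmed), na.rm = TRUE)
#   any(x > pmed + 10 * pmad, na.rm = TRUE) # TRUE -> warning raised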
diff --git a/R/distribution.R b/R/distribution.R
index 1cfafd00..ea95cf12 100644
--- a/R/distribution.R
+++ b/R/distribution.R
@@ -6,42 +6,52 @@ NULL
 #' @description
 #' This function creates an object that contains all the data, parameters and settings
 #' for building an (integrated) species distribution model.
-#' Key functions to add data are [`add_biodiversity_*`], [`add_predictors()`],
-#' [`add_latent_*`], [`engine_*`], [`add_priors`] and [`add_offset`]. It creates a
-#' prototype [`BiodiversityDistribution`] object with its own functions.
+#' Key functions to add data are [`add_biodiversity_poipo`] and the like, [`add_predictors`],
+#' [`add_latent_spatial`], [`engine_glmnet`] or similar, [`add_priors`] and [`add_offset`].
+#' It creates a prototype [`BiodiversityDistribution`] object with its own functions.
 #' After setting input data and parameters, model predictions can then be created
 #' via the [train] function.
 #'
 #' Additionally, it is possible to specify a \code{"limit"} to any predictions conducted on
 #' the background. This can be for instance a layer buffered by a certain dispersal distance (Cooper and Soberon, 2018)
-#' or a categorical layer representing biomes or soil conditions. See also the
-#' frequently asked question (FAQ) section on the homepage for more information.
+#' or a categorical layer representing biomes or soil conditions.
+#' Another option is to create a constraint by constructing a minimum convex polygon (MCP) from
+#' the supplied biodiversity data. This option can be enabled by setting
+#' \code{"limits_method"} to \code{"mcp"}. It is also possible to apply a small buffer
+#' to the constructed MCP via \code{"mcp_buffer"}.
+#' See the frequently asked question (FAQ) section on the homepage for more information.
 #'
 #' See **Details** for a description of the internal functions available
 #' to modify or summarize data within the created object.
 #'
-#' **Note that any model requires at minimum a single added [biodiversity] dataset
-#' as well as a specified [engine].**
+#' **Note that any model requires at minimum a single added biodiversity dataset
+#' as well as a specified engine.**
 #'
 #' @param background Specification of the modelling background. Must be a
-#' [`SpatRaster`], [`sf`] or [`extent`] object.
+#' [`SpatRaster`] or [`sf`] object.
 #' @param limits A [`SpatRaster`] or [`sf`] object that limits the prediction surface when
 #' intersected with input data (Default: \code{NULL}).
+#' @param limits_method A [`character`] of the method used for hard limiting a projection.
+#' Available options are \code{"none"} (Default), \code{"zones"} or \code{"mcp"}.
+#' @param mcp_buffer A [`numeric`] distance to buffer the MCP (Default: \code{0}). Only used if
+#' \code{limits_method = "mcp"}.
+#' @param limits_clip [`logical`] Should the limits clip all predictors before fitting
+#' a model (\code{TRUE}) or just the prediction (\code{FALSE}, default).
 #'
 #' @details
 #' This function creates a [`BiodiversityDistribution-class`] object that in itself contains
 #' other functions and stores parameters and (pre-)processed data.
-#' A full list of functions available can be queried via \code{names(object)}.
+#' A full list of available functions can be queried via \code{"names(object)"}.
 #' Some of the functions are not intended to be manipulated directly,
-#' but rather through convenience functions (e.g. [`object$set_predictors()`]).
+#' but rather through convenience functions (e.g. \code{"object$set_predictors()"}).
 #' Similarly other objects are stored in the [`BiodiversityDistribution-class`] object that
-#' have their own functions as well and can be queried (e.g. [`names(object)`]). For a list of
+#' have their own functions as well and can be queried (e.g. \code{"names(object)"}). For a list of
 #' functions see the reference documentation. By default,
-#' if some datasets are not set, then a [`Waiver`] object is returned instead.
+#' if some datasets are not set, then a \code{"Waiver"} object is returned instead.
 #'
 #' The following objects can be stored:
 #' * \code{object$biodiversity} A [`BiodiversityDatasetCollection`] object with the added biodiversity data.
-#' * \code{object$engine} An [`engine`] object (e.g. [engine_inlabru]) with function depended on the added engine.
+#' * \code{object$engine} An \code{"engine"} object (e.g. [`engine_inlabru()`]) with functions depending on the added engine.
 #' * \code{object$predictors} A [`PredictorDataset`] object with all set predictors.
 #' * \code{object$priors} A [`PriorList`] object with all specified priors.
 #' * \code{object$log} A [`Log`] object that captures the output messages.
@@ -51,8 +61,8 @@ NULL
 #' * \code{object$get_biodiversity_equations()} Lists the equations used for each biodiversity dataset with given id. Defaults to all predictors.
 #' * \code{object$get_biodiversity_types()} Lists the type of each specified biodiversity dataset with given id.
 #' * \code{object$get_extent()} Outputs the [terra::ext] of the modelling region.
-#' * \code{object$show_background_info()} Returns a [`list`] with the [terra::ext] and the [sp::proj4string].
-#' * \code{object$get_extent_dimensions()} Outputs the [terra::ext] dimension by calling the [`extent_dimensions()`] function.
+#' * \code{object$show_background_info()} Returns a [`list`] with the [terra::ext] and the [terra::crs].
+#' * \code{object$get_extent_dimensions()} Outputs the [terra::ext] dimension by calling the \code{"extent_dimensions()"} function.
 #' * \code{object$get_predictor_names()} Returns a [character] vector with the names of all added predictors.
 #' * \code{object$get_prior_variables()} Returns a description of [`priors`] added.
 #'
@@ -60,7 +70,7 @@ NULL
 #' @returns [`BiodiversityDistribution-class`] object containing data for building a biodiversity distribution modelling problem.
 #'
-#' @seealso [`bdproto`] on the general definition of [`proto`] objects and in particular [`bdproto-biodiversitydistribution`].
+#' @seealso \code{"bdproto"} on the general definition of [`proto`] objects and in particular [`BiodiversityDistribution`].
 #'
 #' @references
 #' * Fletcher, R.J., Hefley, T.J., Robertson, E.P., Zuckerberg, B., McCleery, R.A., Dorazio, R.M., (2019) A practical guide for combining data to model species distributions. Ecology 100, e02710. https://doi.org/10.1002/ecy.2710
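
# A hedged sketch of querying the created object via the convenience functions
# documented above (`background` being a hypothetical sf polygon of the
# modelling region):
#
#   x <- distribution(background)
#   names(x)        # list the available functions
#   x$get_extent()  # extent of the modelling region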
@@ -83,18 +93,21 @@ NULL
 #' @export
 methods::setGeneric("distribution",
                     signature = methods::signature("background"),
-                    function(background, limits = NULL) standardGeneric("distribution"))
+                    function(background, limits = NULL, limits_method = "none", mcp_buffer = 0, limits_clip = FALSE) standardGeneric("distribution"))

 #' @name distribution
-#' @usage \S4method{distribution}{SpatRaster, ANY}(background, limits)
+#' @usage \S4method{distribution}{SpatRaster,ANY,character,numeric,logical}(background,limits,limits_method,mcp_buffer,limits_clip)
 #' @rdname distribution
 methods::setMethod(
   "distribution",
   methods::signature(background = "SpatRaster"),
-  function(background, limits = NULL) {
+  function(background, limits = NULL, limits_method = "none", mcp_buffer = 0, limits_clip = FALSE) {
     assertthat::assert_that(!missing(background) || !exists('background'),
                             inherits(limits,'SpatRaster') || inherits(limits, 'sf') || inherits(limits, 'Spatial') || is.null(limits),
-                            msg = 'No background file supplied!')
+                            is.character(limits_method),
+                            is.numeric(mcp_buffer),
+                            is.logical(limits_clip),
+                            msg = 'No background file supplied or limits misspecified!')
     # Check that arguments are valid
     assertthat::assert_that( inherits(background,'SpatRaster') )
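
# A hedged sketch of the two new limit modes introduced here (`background` and
# `zones`, a categorical SpatRaster or sf layer, are hypothetical inputs):
#
#   x1 <- distribution(background, limits = zones)  # zone-based limits
#   x2 <- distribution(background, limits_method = "mcp", mcp_buffer = 5000)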
@@ -104,30 +117,39 @@ methods::setMethod(
     )
     # Rerun the distribution call with the object
-    distribution(newbg, limits)
+    distribution(newbg, limits, limits_method, mcp_buffer, limits_clip)
   })

 #' @name distribution
-#' @usage \S4method{distribution}{sf, ANY}(background, limits)
+#' @usage \S4method{distribution}{sf,ANY,character,numeric,logical}(background,limits,limits_method,mcp_buffer,limits_clip)
 #' @rdname distribution
 methods::setMethod(
   "distribution",
   methods::signature(background = "sf"),
-  function(background, limits = NULL) {
+  function(background, limits = NULL, limits_method = "none", mcp_buffer = 0, limits_clip = FALSE) {
     # Check that arguments are valid
     assertthat::assert_that(!missing(background) || !exists('background'),
                             inherits(limits,'SpatRaster') || inherits(limits, 'sf') || inherits(limits, 'Spatial') || is.null(limits),
+                            is.character(limits_method),
+                            is.numeric(mcp_buffer),
+                            is.logical(limits_clip),
                             msg = 'No background file supplied or limits misspecified!')
     assertthat::assert_that(
       inherits(background,'sf'),
       unique(st_geometry_type(background)) %in% c('MULTIPOLYGON','POLYGON')
     )

+    # Small checks on alternative limit functionalities
+    limits_method <- match.arg(limits_method, c("none","zones", "mcp"), several.ok = FALSE)
+    assertthat::assert_that(mcp_buffer>=0, msg = "Buffered mcp distance has to be positive!")
+
     # Messenger
     if(getOption('ibis.setupmessages')) myLog('[Setup]','green','Creating distribution object...')

     # Convert limits if provided
     if(!is.null(limits)){
+      # Set methods in case a layer was supplied
+      limits_method <- "zones"
       # Convert to polygon if raster
       if(inherits(limits,'SpatRaster')){
         assertthat::assert_that(terra::is.factor(limits),
@@ -147,10 +169,18 @@ methods::setMethod(
       # Get first column and rename
       limits <- limits[,1]; names(limits) <- c('limit','geometry')
-    }
+      limits <- list(layer = limits, "limits_method" = "zones",
+                     "mcp_buffer" = mcp_buffer, "limits_clip" = limits_clip)
+    } else if(limits_method == "mcp"){
+      # Specify the option to calculate a mcp based on the added data.
+      # This is done directly in train.
+      limits <- list("layer" = NULL, "limits_method" = "mcp",
+                     "mcp_buffer" = mcp_buffer, "limits_clip" = limits_clip)

-    # Convert to waiver if NULL
-    if(is.null(limits)) limits <- new_waiver()
+    } else {
+      # Convert to waiver if NULL
+      if(is.null(limits)) limits <- new_waiver()
+    }

     # Create BiodiversityDistribution object
     bdproto(NULL, BiodiversityDistribution,
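
# Equivalent ways of calling the effect plots documented below (`mod` being a
# hypothetical trained model; a sketch, not a definitive example):
#
#   effects(mod)   # wrapper documented in effects.R
#   mod$effects()  # method on the fitted object itself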
diff --git a/R/effects.R b/R/effects.R
index cbe67bd7..9792961f 100644
--- a/R/effects.R
+++ b/R/effects.R
@@ -4,14 +4,14 @@ NULL
 #' Plot effects of trained model
 #'
 #' @description This function is a handy wrapper that calls the default plotting
-#' functions for the model of a specific [engine]. Equivalent to
+#' functions for the model of a specific engine. Equivalent to
 #' calling \code{effects} of a fitted [distribution] function.
 #' @note
 #' For some models, where default coefficient plots are not available,
 #' this function will attempt to generate [partial] dependency plots instead.
 #' @param object Any fitted [distribution] object.
 #' @param ... Not used.
-#'
+#' @aliases effects
 #' @examples
 #' \dontrun{
 #' # Where mod is an estimated distribution model
diff --git a/R/engine_bart.R b/R/engine_bart.R
index dfdfdf55..d8a432f4 100644
--- a/R/engine_bart.R
+++ b/R/engine_bart.R
@@ -9,9 +9,9 @@ NULL
 #' algorithms (Dorie et al. 2019). Default prior preference is for trees to be small (few terminal nodes)
 #' and shrinkage towards \code{0}.
 #'
-#' This package requires the [dbarts] R-package to be installed.
-#' Many of the functionalities of this [engine] have been inspired by the [embarcadero] R-package. Users
-#' are therefore advised to cite if they make heavy use of BART.
+#' This package requires the \code{"dbarts"} R-package to be installed.
+#' Many of the functionalities of this engine have been inspired by the \code{"embarcadero"} R-package.
+#' Users are therefore advised to cite it if they make heavy use of BART.
 #' @details
 #' Prior distributions can furthermore be set for:
 #' * probability that a tree stops at a node of a given depth (Not yet implemented)
@@ -27,13 +27,14 @@ NULL
 #' * Carlson, CJ. embarcadero: Species distribution modelling with Bayesian additive regression trees in r. Methods Ecol Evol. 2020; 11: 850– 858. https://doi.org/10.1111/2041-210X.13389
 #' * Dorie, V., Hill, J., Shalit, U., Scott, M., & Cervone, D. (2019). Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition. Statistical Science, 34(1), 43-68.
 #' * Vincent Dorie (2020). dbarts: Discrete Bayesian Additive Regression Trees Sampler. R package version 0.9-19. https://CRAN.R-project.org/package=dbarts
-#' @returns An [engine].
+#' @returns An [Engine].
 #' @examples
 #' \dontrun{
 #' # Add BART as an engine
 #' x <- distribution(background) |> engine_bart(iter = 100)
 #' }
 #' @family engine
+#' @aliases engine_bart
 #' @name engine_bart
 NULL
 #' @rdname engine_bart
@@ -528,7 +529,7 @@ engine_bart <- function(x,
       assertthat::assert_that(x.var %in% attr(model$fit$data@x,'term.labels') || is.null(x.var),
                               msg = 'Variable not in predicted model' )
       bart_partial_effect(model, x.vars = x.var,
-                          transform = self$settings$data$binary, values = values, ...
) + transform = self$settings$data$binary, values = values ) }, # Spatial partial dependence plot option from embercardo spartial = function(self, predictors, x.var = NULL, equal = FALSE, smooth = 1, transform = TRUE, type = NULL){ @@ -545,6 +546,23 @@ engine_bart <- function(x, # Also return spatial return(p) }, + # Model convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + # Get residuals + rd <- dbarts:::residuals.bart(obj) + if(length(rd)==0) rd <- new_waiver() + return(rd) + }, + # Coefficient function get_coefficients = function(self){ # Returns a vector of the coefficients with direction/importance cofs <- self$summary() diff --git a/R/engine_breg.R b/R/engine_breg.R index 520c7e9b..eef326a2 100644 --- a/R/engine_breg.R +++ b/R/engine_breg.R @@ -22,7 +22,8 @@ NULL #' * Nguyen, K., Le, T., Nguyen, V., Nguyen, T., & Phung, D. (2016, November). Multiple kernel learning with data augmentation. In Asian Conference on Machine Learning (pp. 49-64). PMLR. #' * Steven L. Scott (2021). BoomSpikeSlab: MCMC for Spike and Slab Regression. R package version 1.2.4. https://CRAN.R-project.org/package=BoomSpikeSlab #' @family engine -#' @returns An [engine]. +#' @returns An [Engine]. +#' @aliases engine_breg #' @examples #' \dontrun{ #' # Add BREG as an engine @@ -712,6 +713,22 @@ engine_breg <- function(x, if(length(int)>0) cofs <- cofs[-int,] return(cofs) }, + # Model convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + # Get residuals + rd <- obj$deviance.residuals + assertthat::assert_that(length(rd)>0) + return(rd) + }, # Engine-specific projection function project = function(self, newdata, type = NULL, layer = "mean"){ assertthat::assert_that("model" %in% names(self), diff --git a/R/engine_gdb.R b/R/engine_gdb.R index ce075ee8..e296ba90 100644 --- a/R/engine_gdb.R +++ b/R/engine_gdb.R @@ -4,20 +4,20 @@ NULL #' #' @description #' Gradient descent boosting is an efficient way to optimize any loss function -#' of a generalized linear or additive model (such as the GAMs available through the [mgcv] R-package). +#' of a generalized linear or additive model (such as the GAMs available through the \code{"mgcv"} R-package). #' It furthermore automatically regularizes the fit, thus the resulting model only contains the #' covariates whose baselearners have some influence on the response. -#' Depending on the type of the [add_biodiversity] data, either poisson process models or -#' logistic regressions are estimated. If the \code{only_linear} term in [train] is set to \code{FALSE}, +#' Depending on the type of the \code{add_biodiversity} data, either poisson process models or +#' logistic regressions are estimated. If the \code{"only_linear"} term in [train] is set to \code{FALSE}, #' splines are added to the estimation, thus providing a non-linear additive inference. #' #' @details: -#' This package requires the [mboost] R-package to be installed. -#' It is in philosophy somewhat related to the [engine_xgboost] and [XGBoost] R-package, +#' This package requires the \code{"mboost"} R-package to be installed. 
+#' It is in philosophy somewhat related to the [engine_xgboost] and \code{"XGBoost"} R-package,
 #' however providing some additional desirable features that make estimation quicker and
 #' particularly useful for spatial projections, such as the ability to specifically add
-#' spatial baselearners via [add_latent] or the specification of monotonically constrained priors
-#' via [GDBPrior].
+#' spatial baselearners via [add_latent_spatial] or the specification of
+#' monotonically constrained priors via [GDBPrior].
 #' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
 #' @param iter An [`integer`] giving the number of boosting iterations (Default: \code{2e3L}).
 #' @param learning_rate A bounded [`numeric`] value between \code{0} and \code{1} defining the shrinkage parameter.
@@ -30,7 +30,8 @@ NULL
 #' * Hofner, B., Müller, J., Hothorn, T., (2011). Monotonicity-constrained species distribution models. Ecology 92, 1895–901.
 #' * Mayr, A., Hofner, B. and Schmid, M. (2012). The importance of knowing when to stop - a sequential stopping rule for component-wise gradient boosting. Methods of Information in Medicine, 51, 178–186.
 #' @family engine
-#' @returns An[engine].
+#' @returns An [Engine].
+#' @aliases engine_gdb
 #' @examples
 #' \dontrun{
 #' # Add GDB as an engine
@@ -549,46 +550,55 @@ engine_gdb <- function(x,
     partial = function(self, x.var, constant = NULL, variable_length = 100, values = NULL, plot = FALSE, type = NULL){
       # Assert that variable(s) are in fitted model
       assertthat::assert_that( is.character(x.var),inherits(self$get_data('fit_best'), 'mboost'),
-                               is.numeric(variable_length) )
+                               is.numeric(variable_length),
+                               all(is.character(x.var)))
       # Unlike the effects function, build specific predictor for target variable(s) only
       variables <- mboost::extract(self$get_data('fit_best'),'variable.names')
       assertthat::assert_that( all( x.var %in% variables),
                                msg = 'x.var variable not found in model!'
) if(is.null(type)) type <- self$get_data('params')$type + model <- self$model # Special treatment for factors - if(any(model$predictors_types$type=="factor")){ - if(x.var %in% model$predictors_types$predictors[model$predictors_types$type=="factor"]){ - variable_range <- levels(self$model$predictors[,x.var]) + variable_range <- list() + dummy <- as.data.frame(matrix(nrow = variable_length)) + # Loop through the provided variables + for(v in x.var){ + if(any(model$predictors_types$type=="factor")){ + if(any(x.var %in% model$predictors_types$predictors[model$predictors_types$type=="factor"])){ + for(v in x.var){ + variable_range[[v]] <- levels(model$predictors[,v]) + } + } else { + variable_range[[v]] <- range(model$predictors[[v]],na.rm = TRUE) + } } else { - variable_range <- range(self$model$predictors[[x.var]],na.rm = TRUE) + variable_range[[v]] <- range(model$predictors[[v]],na.rm = TRUE) } - } else { - variable_range <- range(self$model$predictors[[x.var]],na.rm = TRUE) - } - # Create dummy data.frame - if(is.character(variable_range)){ - # For factors, just add them - dummy <- as.data.frame(matrix(nrow = length(variable_range))) - dummy[,x.var] <- factor(variable_range) - } else { - dummy <- as.data.frame(matrix(nrow = variable_length)) - # If custom input values are specified - if(!is.null(values)){ - variable_length <- length(values) - assertthat::assert_that(length(values) >=1) - dummy[, x.var] <- values + # Create dummy data.frame + if(is.character(variable_range[[v]])){ + # For factors, just add them + dummy[, v] <- factor(variable_range[[v]]) } else { - dummy[,x.var] <- seq(variable_range[1],variable_range[2], length.out = variable_length) + # If custom input values are specified + if(!is.null(values)){ + variable_length <- length(values) + assertthat::assert_that(length(values) >=1) + dummy[, v] <- values + } else { + dummy[, v] <- seq(variable_range[[v]][1],variable_range[[v]][2], + length.out = variable_length) + } } } + # For the others if(is.null(constant)){ - if(any(self$model$predictors_types$type=='factor')){ + if(any(model$predictors_types$type=='factor')){ # Numeric names - nn <- self$model$predictors_types$predictors[which(self$model$predictors_types$type=='numeric')] - constant <- apply(self$model$predictors[,nn], 2, function(x) mean(x, na.rm=T)) + nn <- model$predictors_types$predictors[which(model$predictors_types$type=='numeric')] + constant <- apply(model$predictors[,nn], 2, function(x) mean(x, na.rm=T)) dummy <- cbind(dummy,t(constant)) # For each factor duplicate the entire matrix and add factor levels # nf <- self$model$predictors_types$predictors[which(self$model$predictors_types$type=='factor')] @@ -599,24 +609,44 @@ engine_gdb <- function(x, # } } else { # Calculate mean - constant <- apply(self$model$predictors, 2, function(x) mean(x, na.rm=T)) - dummy <- cbind(dummy,t(constant)) + constant <- apply(model$predictors, 2, function(x) mean(x, na.rm=T)) + dummy <- cbind(dummy, t(constant)) } } else { dummy[,variables] <- constant } # Now predict with model - pp <- mboost::predict.mboost(object = self$get_data('fit_best'), newdata = dummy, - which = x.var, - type = type, aggregate = 'sum') - # Combine with - out <- data.frame(partial_effect = dummy[[x.var]], - mean = pp[,grep(x.var, colnames(pp))] ) + suppressWarnings( + pp <- mboost::predict.mboost(object = self$get_data('fit_best'), newdata = dummy, + which = x.var, + type = type, aggregate = 'sum') + ) + # Check duplicates. 
If bbs is present and non-linear, use bbs estimate + out <- data.frame() + for(v in x.var){ + if(!self$settings$data$only_linear){ + # Combine with + out <- rbind(out, data.frame(variable = v, + partial_effect = dummy[[v]], + mean = pp[,grep("bbs", colnames(pp))] ) + ) + } else { + # Combine with + out <- rbind(out, data.frame(variable = v, + partial_effect = dummy[[v]], + mean = pp[,grep(v, colnames(pp))] ) + ) + } + } # If plot, make plot, otherwise if(plot){ + par.ori <- par(no.readonly = TRUE) + par(mfrow = c(1,2)) mboost::plot.mboost(self$get_data('fit_best'), which = x.var, newdata = dummy) + if(utils::hasName(par.ori, "pin")) par.ori$pin <- NULL + par(par.ori) } return(out) }, @@ -629,7 +659,8 @@ engine_gdb <- function(x, model <- self$model # Also check that what is present in coefficients of model variables <- as.character( mboost::extract(mod,'variable.names') ) - assertthat::assert_that(x.var %in% variables ) + assertthat::assert_that(x.var %in% variables, + msg = "Variable not found in model!" ) # Make template of target variable(s) temp <- emptyraster( self$model$predictors_object$get_data()[[1]] ) # Background @@ -670,6 +701,25 @@ engine_gdb <- function(x, } return(temp) }, + # Model convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + # Get risks + evl <- fit$risk() + if(fit$mstop() == length(evl)) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + # Get residuals + rd <- obj$resid() + assertthat::assert_that(length(rd)>0) + return(rd) + }, # Get coefficients get_coefficients = function(self){ # Returns a vector of the coefficients with direction/importance diff --git a/R/engine_glmnet.R b/R/engine_glmnet.R index ed213b8c..0d84d651 100644 --- a/R/engine_glmnet.R +++ b/R/engine_glmnet.R @@ -38,7 +38,8 @@ NULL #' * Renner, I.W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S.J., Popovic, G. and Warton, D.I., 2015. Point process models for presence‐only analysis. Methods in Ecology and Evolution, 6(4), pp.366-379. #' * Fithian, W. & Hastie, T. (2013) Finite-sample equivalence in statistical models for presence-only data. The Annals of Applied Statistics 7, 1917–1939 #' @family engine -#' @returns An [engine]. +#' @returns An [Engine]. 
+#' @aliases engine_glmnet #' @examples #' \dontrun{ #' # Add BREG as an engine @@ -637,7 +638,30 @@ engine_glmnet <- function(x, if(plot) terra::plot(prediction, col = ibis_colours$viridis_orig) return(prediction) }, - # Get coefficients from breg + # Convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + # Get lambdas + lmd <- fit$lambda + if(determine_lambda(fit) == min(lmd)) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + # Calculate residuals + model <- self$model$predictors + # Get fm + fitted_values <- predict(obj, model, s = 'lambda.1se') + fitted_min <- predict(obj, model, s = 'lambda.min') + rd <- fitted_min[,1] - fitted_values[,1] + assertthat::assert_that(length(rd)>0) + return(rd) + }, + # Get coefficients from glmnet get_coefficients = function(self){ # Returns a vector of the coefficients with direction/importance obj <- self$get_data("fit_best") diff --git a/R/engine_inla.R b/R/engine_inla.R index 502b2dc3..9692b811 100644 --- a/R/engine_inla.R +++ b/R/engine_inla.R @@ -1,1036 +1,1048 @@ -#' @include bdproto-engine.R utils-inla.R bdproto-distributionmodel.R -NULL - -#' Use INLA as engine -#' -#' @description -#' Allows a full Bayesian analysis of linear and additive models using Integrated Nested Laplace approximation. -#' Engine has been largely superceded by the [engine_bru] package and users are advised to us this one, -#' unless specific options are required. -#' -#' @details -#' All \code{INLA} engines require the specification of a mesh that needs to be provided to the -#' \code{"optional_mesh"} parameter. Otherwise the mesh will be created based on best guesses of the -#' data spread. A good mesh needs to have triangles as regular as possible in size and shape: equilateral. -#' -#' [*] \code{"max.edge"}: The largest allowed triangle edge length, must be in the same scale units as the coordinates -#' Lower bounds affect the density of triangles -#' [*] \code{"offset"}: The automatic extension distance of the mesh -#' If positive: same scale units. If negative, interpreted as a factor relative to the approximate data diameter -#' i.e., a value of -0.10 will add a 10% of the data diameter as outer extension. -#' [*] \code{"cutoff"}: The minimum allowed distance between points, -#' it means that points at a closer distance than the supplied value are replaced by a single vertex. -#' it is critical when there are some points very close to each other, either for point locations or in the -#' domain boundary. -#' [*] \code{"proj_stepsize"}: The stepsize for spatial predictions, which affects the spatial grain of any outputs -#' created. -#' -#' Priors can be set via [INLAPrior]. -#' @note -#' **How INLA Meshes are generated, substantially influences prediction outcomes. See Dambly et al. (2023).** -#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object. -#' @param optional_mesh A directly supplied [`INLA`] mesh (Default: \code{NULL}) -#' @param optional_projstk A directly supplied projection stack. Useful if projection stack is identical for multiple species (Default: \code{NULL}) -#' @param max.edge The largest allowed triangle edge length, must be in the same scale units as the coordinates. -#' Default is an educated guess (Default: \code{NULL}). -#' @param offset interpreted as a numeric factor relative to the approximate data diameter. 
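+# A standalone sketch of the two glmnet helpers added above (has_converged
+# and get_residuals). Data are simulated; determine_lambda() is an internal
+# ibis.iSDM helper, so the plain "lambda.1se" rule stands in for it here.
+if (FALSE) {
+  set.seed(42)
+  X <- matrix(rnorm(200 * 5), 200, 5)
+  y <- rpois(200, exp(X %*% c(1, -0.5, 0, 0, 0.2)))
+  cvf <- glmnet::cv.glmnet(X, y, family = "poisson")
+  # Convergence heuristic: a selected lambda sitting at the very end of the
+  # regularization path suggests the path was exhausted rather than optimised.
+  has_converged <- cvf$lambda.1se != min(cvf$lambda)
+  # Residual proxy: difference between predictions at the minimal and the
+  # more regularized 1-SE lambda.
+  rd <- predict(cvf, X, s = "lambda.min")[, 1] -
+    predict(cvf, X, s = "lambda.1se")[, 1]
+}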
+#' @include bdproto-engine.R utils-inla.R bdproto-distributionmodel.R
+NULL
+
+#' Use INLA as engine
+#'
+#' @description
+#' Allows a full Bayesian analysis of linear and additive models using Integrated Nested Laplace Approximation.
+#' This engine has been largely superseded by [`engine_inlabru`] and users are advised to use that engine,
+#' unless specific options are required.
+#'
+#' @details
+#' All \code{INLA} engines require the specification of a mesh that needs to be provided to the
+#' \code{"optional_mesh"} parameter. Otherwise the mesh will be created based on best guesses of the
+#' data spread. A good mesh needs to have triangles as regular as possible in size and shape: equilateral.
+#'
+#' [*] \code{"max.edge"}: The largest allowed triangle edge length, which must be in the same scale units
+#' as the coordinates. Lower bounds affect the density of triangles.
+#' [*] \code{"offset"}: The automatic extension distance of the mesh.
+#' If positive, in the same scale units; if negative, interpreted as a factor relative to the approximate data diameter,
+#' i.e., a value of -0.10 will add 10% of the data diameter as outer extension.
+#' [*] \code{"cutoff"}: The minimum allowed distance between points;
+#' points at a closer distance than the supplied value are replaced by a single vertex. This is
+#' critical when some points lie very close to each other, either as point locations or in the
+#' domain boundary.
+#' [*] \code{"proj_stepsize"}: The stepsize for spatial predictions, which affects the spatial grain of any outputs
+#' created.
+#'
+#' Priors can be set via [INLAPrior].
+#' @note
+#' **How INLA meshes are generated substantially influences prediction outcomes. See Dambly et al. (2023).**
+#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
+#' @param optional_mesh A directly supplied \code{"INLA"} mesh (Default: \code{NULL})
+#' @param optional_projstk A directly supplied projection stack. Useful if the projection stack is identical for multiple species (Default: \code{NULL})
+#' @param max.edge The largest allowed triangle edge length, must be in the same scale units as the coordinates.
+#' Default is an educated guess (Default: \code{NULL}).
+#' @param offset Interpreted as a numeric factor relative to the approximate data diameter.
+#' Default is an educated guess (Default: \code{NULL}).
+#' @param cutoff The minimum allowed distance between points on the mesh.
+#' Default is an educated guess (Default: \code{NULL}).
+#' @param proj_stepsize The stepsize in coordinate units between cells of the projection grid (Default: \code{NULL}).
+#' @param timeout Specify a timeout for INLA models in seconds. Afterwards the mesh computation is aborted.
+#' @param strategy Which approximation to use for the joint posterior. Options are \code{"auto"} ("default"), \code{"adaptative"},
+#' \code{"gaussian"}, \code{"simplified.laplace"} & \code{"laplace"}.
+#' @param int.strategy Integration strategy. Options are \code{"auto"}, \code{"grid"}, \code{"eb"} ("default") & \code{"ccd"}.
+#' See also https://groups.google.com/g/r-inla-discussion-group/c/hDboQsJ1Mls
+#' @param barrier Should a barrier model be added to the model?
+#' @param type The mode used for creating posterior predictions.
+#' Either summarizing the linear \code{"predictor"} or the \code{"response"} (Default: \code{"response"}).
+#' @param area Accepts a [`character`] denoting the type of area calculation to be done on the mesh (Default: \code{'gpc2'}).
+#' @param nonconvex.bdry Create a non-convex boundary hull instead (Default: \code{FALSE}) **Not yet implemented**
+#' @param nonconvex.convex Non-convex minimal extension radius for convex curvature **Not yet implemented**
+#' @param nonconvex.concave Non-convex minimal extension radius for concave curvature **Not yet implemented**
+#' @param nonconvex.res Computation resolution for non-convex hulls **Not yet implemented**
+#' @param ... Other options.
+#' @references
+#' * Havard Rue, Sara Martino, and Nicholas Chopin (2009), Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations (with discussion), Journal of the Royal Statistical Society B, 71, 319-392.
+#' * Finn Lindgren, Havard Rue, and Johan Lindstrom (2011). An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach (with discussion), Journal of the Royal Statistical Society B, 73(4), 423-498.
+#' * Simpson, Daniel, Janine B. Illian, S. H. Sørbye, and Håvard Rue. 2016. “Going Off Grid: Computationally Efficient Inference for Log-Gaussian Cox Processes.” Biometrika 1 (103): 49–70.
+#' * Dambly, L. I., Isaac, N. J., Jones, K. E., Boughey, K. L., & O'Hara, R. B. (2023). Integrated species distribution models fitted in INLA are sensitive to mesh parameterisation. Ecography, e06391.
+#' @family engine
+#' @returns An engine.
+#' @aliases engine_inla
+#' @examples
+#' \dontrun{
+#' # Add INLA as an engine (with a custom mesh)
+#' x <- distribution(background) |> engine_inla(optional_mesh = my_mesh)
+#' }
+#' @name engine_inla
+NULL
+#' @rdname engine_inla
+#' @export
+engine_inla <- function(x,
+                        optional_mesh = NULL,
+                        optional_projstk = NULL,
+                        max.edge = NULL,
+                        offset = NULL,
+                        cutoff = NULL,
+                        proj_stepsize = NULL,
+                        timeout = NULL,
+                        strategy = "auto",
+                        int.strategy = "eb",
+                        barrier = FALSE,
+                        type = "response",
+                        area = "gpc2",
+                        # Not yet implemented.
+                        nonconvex.bdry = FALSE,
+                        nonconvex.convex = -0.15,
+                        nonconvex.concave = -0.05,
+                        nonconvex.res = 40,
+                        ...)
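+# The automatic defaults derived in create_mesh() below follow the rule of
+# thumb from @details. As a rough illustration (comments only, not run), with
+# `locs` denoting the matrix of observation coordinates:
+#   span     <- diff(range(locs[, 1]))
+#   max.edge <- c(span / 15, span / 15 * 2)  # ~1/(3*5) of the spatial range
+#   offset   <- rep(span * 0.01, 2)          # ~1% of the boundary extent
+#   cutoff   <- max(min(abs(diff(locs[, 2]))), 1e-12)  # merge near-duplicates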
{ + + # Check whether INLA package is available + check_package('INLA') + if(!isNamespaceLoaded("INLA")) { attachNamespace("INLA");requireNamespace('INLA') } + + myLog('[Deprecation]','yellow','Consider using engine_inlabru as engine with better prediction support.') + + # assert that arguments are valid + assertthat::assert_that(inherits(x, "BiodiversityDistribution"), + inherits(x$background,'sf'), + inherits(optional_mesh,'inla.mesh') || is.null(optional_mesh), + is.list(optional_projstk) || is.null(optional_projstk), + is.vector(max.edge) || is.null(max.edge), + (is.vector(offset) || is.numeric(offset)) || is.null(offset), + is.numeric(cutoff) || is.null(cutoff), + is.null(timeout) || is.numeric(timeout), + is.character(type), + is.character(area), + is.character(strategy), + is.character(int.strategy), + is.null(proj_stepsize) || is.numeric(proj_stepsize) + ) + type <- match.arg(type, c("predictor", "response"), several.ok = FALSE) + area <- match.arg(area, c("gpc", "gpc2", "km"), several.ok = FALSE) + # Check strategy settings + strategy <- match.arg(strategy, c("auto", "adaptative", "gaussian", "simplified.laplace", "laplace"), several.ok = FALSE) + int.strategy <- match.arg(int.strategy, c("auto", "grid", "eb", "ccd"), several.ok = FALSE) + + # Set the projection mesh + if(inherits(optional_mesh,'inla.mesh')) { + # Load a provided on + mesh <- optional_mesh + # Convert the study region + region.poly <- methods::as(sf::st_geometry(x$background), "Spatial") + + # Security check for projection and if not set, use the one from background + if(is.null(mesh$crs)) mesh$crs <- sp::CRS( sp::proj4string(region.poly) ) + + # Calculate area + ar <- suppressWarnings( + mesh_area(mesh = mesh, region.poly = region.poly, variant = area) + ) + } else { + mesh <- new_waiver() + ar <- new_waiver() + } + + # If time out is specified + if(!is.null(timeout)) INLA::inla.setOption(fmesher.timeout = timeout) + + # Get barrier from the region polygon + # TODO: Add this in addition to spatial field below, possibly specify an option to calculate this + if(barrier && !is.Waiver(mesh)){ + mesh_bar <- mesh_barrier(mesh, region.poly) + } else { mesh_bar <- new_waiver() } + + # --- # + # Create other parameters object + params <- list( + max.edge = max.edge, + offset = offset, + cutoff = cutoff, + proj_stepsize = proj_stepsize, + type = type, + area = area, + strategy = strategy, + int.strategy = int.strategy, + ... 
+ ) + + # Print a message in case there is already an engine object + if(!is.Waiver(x$engine)) myLog('[Setup]','yellow','Replacing currently selected engine.') + + # Set engine in distribution object + x$set_engine( + bdproto( + "INLA-Engine", + Engine, + name = "", + data = list( + 'mesh' = mesh, + 'mesh.area' = ar, + 'mesh.bar' = mesh_bar, + 'stk_pred' = optional_projstk, + 'params' = params + ), + # Function to create a mesh + create_mesh = function(self, model){ + assertthat::assert_that(is.list(model), + "background" %in% names(model)) + # Check if mesh is already present, if so use it + if(!is.Waiver(self$get_data("mesh"))) return() + # Create a new mesh based on the available data + + # Get parameters + params <- self$get_data("params") + + # Convert the study region + region.poly <- methods::as(sf::st_geometry(model$background), "Spatial") + + # Convert to boundary object for later + suppressWarnings( + bdry <- INLA::inla.sp2segment( + sp = region.poly, + join = TRUE, + crs = INLA::inla.CRS(projargs = sp::proj4string(region.poly)) + ) + ) + bdry$loc <- INLA::inla.mesh.map(bdry$loc) + + # Try and infer mesh parameters if not set + + # Get all coordinates of observations + locs <- collect_occurrencepoints(model, include_absences = FALSE, + tosf = FALSE) + locs <- locs[,c("x","y")]# Get only the coordinates + + assertthat::assert_that( + nrow(locs)>0, ncol(locs)==2 + ) + + if(is.null(params$max.edge)){ + # A good guess here is usally a max.edge of between 1/3 to 1/5 of the spatial range. + max.edge <- c(diff(range(locs[,1]))/(3*5) , diff(range(locs[,1]))/(3*5) * 2) + params$max.edge <- max.edge + } + if(is.null(params$offset)){ + # Check whether the coordinate system is longlat + if( sf::st_is_longlat(bdry$crs) ){ + # Specify offset as 1/100 of the boundary distance + offset <- c( diff(range(bdry$loc[,1]))*0.01, + diff(range(bdry$loc[,1]))*0.01) + } else { + offset <- c( diff(range(bdry$loc[,1]))*0.01, + diff(range(bdry$loc[,1]))*0.01) + } + params$offset <- offset + } + if(is.null(params$cutoff)){ + # Specify as minimum distance between y coordinates + # Thus capturing most points on this level + # otherwise set to default + val <- min(abs(diff(locs[,2]))) + cutoff <- ifelse(val == 0, 1e-12, val) + params$cutoff <- cutoff + } + + suppressWarnings( + mesh <- INLA::inla.mesh.2d( + # Point localities + loc = locs, + # Boundary object + boundary = bdry, + # Mesh Parameters + max.edge = params$max.edge, + offset = params$offset, + cutoff = params$cutoff, + # Define the CRS + crs = bdry$crs + ) + ) + # Calculate area + # ar <- suppressMessages( + # suppressWarnings( + # mesh_area(mesh = mesh, region.poly = region.poly, variant = params$area) + # ) + # ) + # 06/01/2023: This should work and is identical to inlabru::ipoints + ar <- suppressWarnings( + Matrix::diag( INLA::inla.mesh.fem(mesh = mesh)[[1]] ) + ) + assertthat::assert_that(length(ar) == mesh$n) + + # Now set the output + self$set_data("mesh", mesh) + self$set_data("mesh.area", ar) + + invisible() + }, + # Generic plotting function for the mesh + plot = function(self, assess = FALSE){ + if(is.Waiver(self$get_data('mesh'))) stop("No mesh found!") + + if(assess){ + # For an INLA mesh assessment + out <- INLA::inla.mesh.assessment( + mesh = self$get_data('mesh'), + spatial.range = 3, + alpha = 2, + dims = c(300, 300) + ) + # Convert to raster stack + out <- c( + sp::SpatialPixelsDataFrame( sp::coordinates(out), data = as.data.frame(out), + proj4string = self$get_data('mesh')$crs ) + ) + + 
terra::plot(out[[c('sd','sd.dev','edge.len')]], + col = c("#00204D","#00336F","#39486B","#575C6D","#707173","#8A8779","#A69D75","#C4B56C","#E4CF5B","#FFEA46") + ) + } else { + INLA:::plot.inla.mesh( self$get_data('mesh') ) + } + }, + # Spatial latent function + # https://groups.google.com/g/r-inla-discussion-group/c/eqMhlbwChkQ/m/m0b0PuzL-PsJ + # Default SPDE prior + # It computes the approximate diameter of the mesh, multiplies by 0.2 to get a value for the prior median range, and then transforms it to log-kappa scale by the formula + # log(sqrt(8*nu)/range) where nu is alpha-dim/2. + calc_latent_spatial = function(self,type = 'spde', alpha = 2, + priors = NULL, + polynames = NULL, + varname = "spatial.field1", + ...){ + # Catch prior objects + if(is.null(priors) || is.Waiver(priors)) priors <- NULL + + # For calculating iCAR process + if(type == 'car'){ + # convert mesh to sf object + ns <- mesh_as_sf(self$data$mesh) + # Create adjacency matrix with queen's case + nc.nb <- spdep::poly2nb(ns, queen = TRUE) + #Convert the adjacency matrix into a file in the INLA format + adjmat <- spdep::nb2mat(nc.nb,style = "B") + adjmat <- methods::as(adjmat, "dgTMatrix") + # adjmat <- INLA::inla.graph2matrix(nc.nb) + # Save the adjaceny matrix as output + self$data$latentspatial <- adjmat + self$data$s.index <- as.numeric(attr(nc.nb,varname)) + } else if(type=='spde'){ + # Check that everything is correctly specified + if(!is.null(priors)) if('spde' %notin% priors$varnames() ) priors <- NULL + + # Use default spde + if(is.null(priors) || is.Waiver(priors)){ + # Define PC Matern SPDE model and save + self$data$latentspatial <- INLA::inla.spde2.matern( + mesh = self$data$mesh, + alpha = alpha + ) + } else { + # Get priors + pr <- if(is.null(priors)) c(0.01, 0.05) else priors$get('spde','prior.range') + ps <- if(is.null(priors)) c(10, 0.05) else priors$get('spde','prior.sigma') + + # Define PC Matern SPDE model and save + self$data$latentspatial <- INLA::inla.spde2.pcmatern( + mesh = self$data$mesh, + alpha = alpha, + # P(Range < 1°) = 0.001 and P(sigma > 0.5) = 0.05 + prior.range = pr, prior.sigma = ps + ) + } + # Make index for spatial field + self$data$s.index <- INLA::inla.spde.make.index(name = varname, + n.spde = self$data$latentspatial$n.spde, + n.group = 1, + n.repl = 1) + # Security checks + assertthat::assert_that( + inherits(self$data$latentspatial,'inla.spde'), + length(self$data$s.index[[1]]) == self$data$mesh$n + ) + } else if(type == 'poly'){ + # Save column names of polynomial transformed coordinates + assertthat::assert_that(!is.null(polynames)) + self$data$latentspatial <- polynames + } + invisible() + }, + # Get latent spatial equation bit + # Set vars to 2 or larger to get copied spde's + get_equation_latent_spatial = function(self, method, vars = 1, separate_spde = FALSE){ + assertthat::assert_that(is.numeric(vars)) + if(method == 'spde'){ + assertthat::assert_that(inherits(self$data$latentspatial, 'inla.spde'), + msg = 'Latent spatial has not been calculated.') + # SPDE string + if(separate_spde){ + ss <- paste0("f(spatial.field",vars,", model = ",method,")") + } else { + if(vars >1){ + ss <- paste0("f(spatial.field",vars,", copy = \'spatial.field1\', model = ",method,", fixed = TRUE)") + } else { + ss <- paste0("f(spatial.field",vars,", model = ",method,")") + } + } + return(ss) + + } else if(method == 'car'){ + assertthat::assert_that(inherits(self$data$latentspatial,'dgTMatrix'), + msg = 'Neighborhood matrix has not been calculated.') + return( + # BESAG model or BYM model to 
specify + # BYM found to be largely similar to SPDE https://onlinelibrary.wiley.com/doi/pdf/10.1002/ece3.3081 + paste0('f(','spatial.field',', model = "bym", graph = ','adjmat',')') + ) + } + }, + # Configure stack + make_stack = function(self, model, id, intercept = TRUE, joint = FALSE) { + assertthat::assert_that( + is.list(model), + is.character(id) + ) + # Get Environment records + env <- model$predictors + + # Include intercept in here + # TODO: Note that this sets intercepts by type and not by dataset id + if(intercept) { + env$Intercept <- 1 # Overall Intercept + env[[paste0('Intercept', + ifelse(joint,paste0('_', + make.names(tolower(model$name)),'_', + model$type),''))]] <- 1 # Setting Intercept to common type, thus sharing with similar types + } + # Set up projection matrix for the data + suppressWarnings( + mat_proj <- INLA::inla.spde.make.A( + mesh = self$get_data('mesh'), + loc = as.matrix(env[,c('x','y')]) + ) + ) + # Create INLA stack + # The three main inla.stack() arguments are a vector list with the data (data), + # a list of projector matrices (each related to one block effect, + # A) and the list of effects (effects). + + # Response for inla stack + ll_resp <- list() + # Add the expected estimate and observed note + # FIXME: Currently only two likelihoods are supported (binomial/poisson) with the NA order being the determining factor + if(model$family == 'poisson') { + if(joint) ll_resp[[ 'observed' ]] <- cbind(model$observations[['observed']], NA ) + if(!joint) ll_resp[[ 'observed' ]] <- cbind(model$observations[['observed']] ) + ll_resp[[ 'e' ]] <- model$expect + } + if(model$family == 'binomial') { + if(joint) ll_resp[[ 'observed' ]] <- cbind(NA, model$observations[['observed']] ) + if(!joint) ll_resp[[ 'observed' ]] <- cbind( model$observations[['observed']] ) + ll_resp[[ 'Ntrials' ]] <- model$expect + } + + # Effects matrix + ll_effects <- list() + # Note, order adding this is important and matches the A matrix below + # ll_effects[['Intercept']] <- rep(1, nrow(model$observations)) + # ll_effects[['Intercept']][[paste0('Intercept',ifelse(joint,paste0('_',make.names(tolower(model$name)),'_',model$type),''))]] <- seq(1, self$get_data('mesh')$n) # Old code + ll_effects[['predictors']] <- env + ll_effects[['spatial.field1']] <- seq(1, self$get_data('mesh')$n) + + # Add offset if specified + if(!is.null(model$offset)){ + ll_effects[['predictors']] <- cbind( ll_effects[['predictors']], + subset(model[['offset']],select = "spatial_offset") + ) + } + + # Check whether equation has spatial field and otherwise add + # MJ 13/06: Spatial.field now set directly to effects + # if( 'spde' %in% all.vars(model$equation) ){ + # # Get Index Objects + # iset <- self$get_data('s.index') + # ll_effects[['spatial.field']] <- c(ll_effects[['spatial.field']], iset) + # } else if ( 'adjmat' %in% all.vars(model$equation) ){ + # iset <- self$get_data('s.index') + # ll_effects[['spatial.field']] <- c(ll_effects[['spatial.field']], data.frame(spatial.index = iset) ) + # } + # Define A + A <- list(1, mat_proj) + + # Define stack + stk <- INLA::inla.stack( + data = ll_resp, + A = A, + effects = ll_effects, + tag = paste0('stk_',as.character(model$type),'_',id) + ) + # Set the stack + self$set_data(paste0('stk_',as.character(model$type),'_',id), stk) + invisible() + }, + # Main INLA training function ---- + # Setup computation function + setup = function(self, model, settings,...){ + assertthat::assert_that( + 'background' %in% names(model), + 'biodiversity' %in% names(model), + all( 
model$biodiversity[[1]]$predictors_names %in% model$predictors_names ),
+          all( sapply(model$biodiversity, function(x) is.formula(x$equation)) ),
+          length(model$biodiversity)>=1,
+          msg = 'Some internal checks failed while setting up the model.'
+        )
+        # Messenger
+        if(getOption('ibis.setupmessages')) myLog('[Estimation]','green','Engine setup.')
+
+        # Set number of threads via set.Options
+        INLA::inla.setOption(num.threads = getOption('ibis.nthread'),
+                             blas.num.threads = getOption('ibis.nthread'))
+
+        # --- Prepare general inputs ---
+        # Check whether spatial latent effects were added
+        if( 'spde' %in% all.vars(model$biodiversity[[1]]$equation) ){
+          # Get spatial index
+          spde <- self$get_data('s.index')
+        } else { spde <- NULL }
+
+        # Check for existence of specified offset and use the full one in this case
+        if(!is.Waiver(model$offset)) offset <- subset(model[['offset']],select = "spatial_offset") else offset <- NULL
+
+        # Projection stepsize
+        params <- self$get_data('params')
+        if(is.null( params$proj_stepsize )){
+          # Set to stepsize equivalent of the resolution of the grid
+          val <- max(diff(model[['predictors']]$x)) # TODO: Check that it works when a dummy variable is used
+          params$proj_stepsize <- val
+          self$set_data('params', params )
+          rm(val)
+        }
+
+        # Number of types to determine if a joint model is necessary
+        nty <- length( unique( as.character(sapply(model$biodiversity, function(z) z$type)) ) )
+
+        # Clean up previous data and integration stacks
+        chk <- grep('stk_int|stk_poipo|stk_poipa|stk_polpo|stk_polpa|stk_pred|stk_full', self$list_data())
+        if(length(chk)>0) self$data[chk] <- NULL
+
+        # Re-format the full predictors if there are any factor variables
+        # FIXME: Potentially outsource?
+        if(any(model$predictors_types$type=="factor")){
+          vf <- model$predictors_types$predictors[model$predictors_types$type=="factor"]
+          for(k in vf){
+            o <- explode_factor(model$predictors[[k]],name = k)
+            model$predictors <- cbind(model$predictors, o)
+            model$predictors_names <- c(model$predictors_names, colnames(o))
+            model$predictors_types <- rbind(model$predictors_types,
+                                            data.frame(predictors = colnames(o), type = "numeric") )
+            # Finally remove the original column from the predictor object
+            model$predictors[[k]] <- NULL
+            model$predictors_names <- model$predictors_names[-which( model$predictors_names == k )]
+            model$predictors_types <- subset(model$predictors_types, subset = predictors != k)
+            # FIXME: Hacky solution so as to not overwrite the predictor object
+            ras_back <- model$predictors_object$data
+            # Explode the columns in the raster object
+            model$predictors_object$data <- c(
+              model$predictors_object$data,
+              explode_factorized_raster(model$predictors_object$data[[k]])
+            )
+            model$predictors_object$data <- terra::subset(model$predictors_object$data, -k)
+          }
+        } else { ras_back <- new_waiver() }
+
+        # Now create an INLA stack for each dataset
+        for(id in 1:length(model$biodiversity) ){
+
+          # If there are any factor variables, split them per type and explode them
+          if(any(model$biodiversity[[id]]$predictors_types$type=="factor")){
+            vf <- model$biodiversity[[id]]$predictors_types$predictors[model$biodiversity[[id]]$predictors_types$type=="factor"]
+            fv <- model$biodiversity[[id]]$predictors[vf]
+            for(k in 1:ncol(fv)){
+              o <- explode_factor(fv[,k],name = colnames(fv)[k])
+              # Add
+              model$biodiversity[[id]]$predictors <- cbind(model$biodiversity[[id]]$predictors, o)
+              model$biodiversity[[id]]$predictors_names <- c(model$biodiversity[[id]]$predictors_names, colnames(o))
+              model$biodiversity[[id]]$predictors_types <- rbind(model$biodiversity[[id]]$predictors_types,
+                                                                 data.frame(predictors = colnames(o), type = "numeric") )
+              # Finally remove the original column from the predictor object
+              model$biodiversity[[id]]$predictors[[colnames(fv)[k]]] <- NULL
+              model$biodiversity[[id]]$predictors_names <- model$biodiversity[[id]]$predictors_names[-which( model$biodiversity[[id]]$predictors_names == colnames(fv)[k] )]
+              model$biodiversity[[id]]$predictors_types <- subset(model$biodiversity[[id]]$predictors_types, subset = predictors != colnames(fv)[k])
+
+            }
+          }
+          # Create the INLA observation stack
+          # Save stacks by id instead of type
+          self$make_stack(model = model$biodiversity[[id]],
+                          id = names(model$biodiversity)[id],
+                          intercept = TRUE,
+                          joint = ifelse(nty > 1, TRUE, FALSE)
+          )
+
+          # For Poisson likelihoods an additional integration stack is required
+          if(model$biodiversity[[id]]$family == 'poisson'){
+            # Only create one if it does not already exist
+            chk <- grep('stk_int', self$list_data())
+            if(length(chk)==0){
+              # Make integration stack for given poisson model
+              stk_int <- inla_make_integration_stack(
+                mesh = self$get_data('mesh'),
+                mesh.area = self$get_data('mesh.area'),
+                model = model,
+                id = names(model$biodiversity)[id],
+                joint = ifelse(nty > 1, TRUE, FALSE)
+              )
+              # Save integration stack
+              self$set_data(paste0('stk_int_',names(model$biodiversity)[id]),stk_int)
+            }
+          }
+        }
+
+        # ------------------ #
+        # Get all stacks defined so far and join them
+        stk_inference <- lapply(
+          self$list_data()[grep(paste(names(model$biodiversity),collapse = '|'), self$list_data())],
+          function(x) self$get_data(x)
+        )
+        stk_inference <- do.call(INLA::inla.stack, stk_inference)
+
+        # Clamp?
+        if( settings$get("clamp") ) model$predictors <- clamp_predictions(model, model$predictors)
+
+        # Make projection stack if not directly supplied
+        if(is.null(self$data$stk_pred)){
+
+          stk_pred <- inla_make_projection_stack(
+            stk_resp = stk_inference,
+            model = model,
+            mesh = self$get_data('mesh'),
+            mesh.area = self$get_data('mesh.area'),
+            res = self$get_data('params')$proj_stepsize,
+            type = model$biodiversity[[id]]$type,
+            background = model$background,
+            spde = spde,
+            settings = settings,
+            joint = ifelse(nty > 1, TRUE, FALSE)
+          )
+          self$set_data('stk_pred', stk_pred)
+        } else {
+          # FIXME: Add some basic assertthat tests for when a prediction stack is directly supplied
+          stk_pred <- self$get_data('stk_pred')
+        }
+
+        # Now join all stacks and save in full
+        # Note: If the integration stack is included, E must be set to the relative area (in mesh.area).
+        self$set_data('stk_full',
+                      INLA::inla.stack(stk_inference, stk_pred$stk_proj)
+        )
+        # Write the backed-up predictor object back to avoid issues with other engines. Hacky!
+        if(!is.Waiver(ras_back)) model$predictors_object$data <- ras_back
+        return(model)
+      },
+      train = function(self, model, settings) {
+        # Check that all inputs are there
+        assertthat::assert_that(
+          inherits(settings,'Settings'),
+          is.list(model),length(model)>1,
+          # Check that model id and setting id are identical
+          settings$modelid == model$id,
+          any( (c('stk_full','stk_pred') %in% names(self$data)) ),
+          inherits(self$get_data('stk_full'),'inla.data.stack')
+        )
+        # Messenger
+        if(getOption('ibis.setupmessages')) myLog('[Estimation]','green','Starting fitting...')
+
+        # Get all datasets with id. This includes the data stacks and integration stacks
+        stk_inference <- lapply(
+          self$list_data()[grep(paste(names(model$biodiversity),collapse = '|'), self$list_data())],
+          function(x) self$get_data(x))
+        stk_inference <- do.call(INLA::inla.stack, stk_inference)
+
+        # Get full stack and projection grid
+        stk_full <- self$get_data('stk_full')
+        predcoords <- self$get_data('stk_pred')$predcoords
+
+        # Get parameters
+        params <- self$get_data("params")
+
+        # Get families and links
+        fam <- unique( as.character( sapply(model$biodiversity, function(x) x$family) ) )
+        lin <- sapply(model$biodiversity, function(x) x$link)
+        # Define control family
+        cf <- list()
+        for(i in 1:length(fam)) cf[[i]] <- list(link = ifelse(fam[i] == 'poisson','log','cloglog' ))
+        if(length(fam)==1 && fam == 'binomial') cf[[1]]$link <- 'logit'
+
+        # Shared link: set li to 1 when a single family is used, otherwise NULL
+        if(length(fam)==1) {li <- 1} else { li <- NULL} # FIXME: Check whether links have to be set individually per observation
+
+        if('spde' %in% all.vars(model$biodiversity[[1]]$equation) ){
+          spde <- self$get_data('latentspatial')
+          stack_data_resp <- INLA::inla.stack.data(stk_inference, spde = self$get_data('latentspatial'))
+          stack_data_full <- INLA::inla.stack.data(stk_full, spde = self$get_data('latentspatial'))
+        } else {
+          adjmat <- spde <- self$get_data('latentspatial')
+          stack_data_resp <- INLA::inla.stack.data(stk_inference)
+          stack_data_full <- INLA::inla.stack.data(stk_full)
+        }
+        # ----------- #
+        # Provided or default formula
+        master_form <- stats::as.formula(
+          paste0("observed ~ ",
+                 # If multiple datasets, remove intercept
+                 ifelse(length(model$biodiversity)>1,"0 + ", ""),
+                 paste0(sapply(model$biodiversity, function(x){
+                   attr(stats::terms.formula(x$equation),"term.labels")
+                 }) |> c() |> unique(),collapse = " + ")
+          )
+        )
+
+        # Perform variable selection
+        if( settings$get(what='optim_hyperparam')){
+          if(getOption('ibis.setupmessages')) myLog('[Estimation]','green','Performing backstep variable selection (hacky)...')
+
+          k <- NULL
+          # Specify offsets and spde to be retained
+          # FIXME: Also set priors and offsets here?
+          if(is.Waiver(spde)) k <- NULL else k <- self$get_equation_latent_spatial('spde')
+
+          # Use backward variable elimination
+          vs <- inla.backstep(master_form = master_form,
+                              stack_data_resp = stack_data_resp,
+                              stk_inference = stk_inference,
+                              fam = fam,
+                              cf = cf,li = li,
+                              response = 'observed',
+                              keep = k
+          )
+          master_form <- to_formula(vs$form)
+        }
+
+        # ------------------------------------------ #
+        # Train the model on the response
+        fit_resp <- INLA::inla(formula = master_form, # The specified formula
+                               data = stack_data_resp, # The data stack
+                               quantiles = c(0.05, 0.5, 0.95),
+                               E = INLA::inla.stack.data(stk_inference)$e, # Expectation (Eta) for Poisson model
+                               Ntrials = INLA::inla.stack.data(stk_inference)$Ntrials,
+                               family = fam, # Family the data comes from
+                               control.family = cf, # Control options
+                               control.predictor = list(A = INLA::inla.stack.A(stk_inference),
+                                                        link = li, # Link to NULL for multiple likelihoods!
+                                                        compute = TRUE), # Compute for marginals of the predictors.
+ control.compute = list(cpo = FALSE, waic = TRUE, config = TRUE), #model diagnostics and config = TRUE gives you the GMRF + # control.fixed = list(mean = 0),# prec = list( initial = log(0.000001), fixed = TRUE)), # Added to see whether this changes GMRFlib convergence issues + verbose = settings$get(what='verbose'), # To see the log of the model runs + control.inla = INLA::control.inla(strategy = params$strategy, + int.strategy = params$int.strategy), + num.threads = getOption('ibis.nthread') + ) + + # Predict spatially + if(!settings$get(what='inference_only')){ + # Messenger + if(getOption('ibis.setupmessages')) myLog('[Estimation]','green','Starting prediction...') + + # Predict on full + fit_pred <- try({INLA::inla(formula = master_form, # The specified formula + data = stack_data_full, # The data stack + quantiles = c(0.05, 0.5, 0.95), + E = INLA::inla.stack.data(stk_full)$e, + Ntrials = INLA::inla.stack.data(stk_full)$Ntrials, + family= fam, # Family the data comes from + control.family = cf, # Control options + control.predictor = list(A = INLA::inla.stack.A(stk_full), + link = li, # Link to NULL for multiple likelihoods! + compute = TRUE), # Compute for marginals of the predictors. + control.compute = list(cpo = FALSE, waic = TRUE, config = TRUE, openmp.strategy = 'huge' ), + # control.mode = list(theta = thetas, restart = FALSE), # To speed up use previous thetas + verbose = settings$get(what='verbose'), # To see the log of the model runs + # control.results = list(return.marginals.random = FALSE, + # return.marginals.predictor = FALSE), # Don't predict marginals to save speed + # control.fixed = INLA::control.fixed(mean = 0),#, prec = list( initial = log(0.000001), fixed = TRUE)), # Added to see whether this changes GMRFlib convergence issues + control.inla = INLA::control.inla(strategy = params$strategy, + int.strategy = params$int.strategy), + num.threads = getOption('ibis.nthread') + ) + },silent = FALSE) + if(inherits(fit_pred,'try-error')) { print(fit_pred); stop('Model did not converge. Try to simplify structure and check priors!') } + # Create a spatial prediction + index.pred <- INLA::inla.stack.index(stk_full, 'stk_pred')$data + # Which type of prediction (linear predictor or response scale) + # The difference between both is that response applies the (inverse of the) link function, + # so it doesn't include the observation distribution part (measurement noise) of posterior predictions. + if(params$type == "predictor"){ + post <- fit_pred$summary.linear.predictor[index.pred, ] + } else { + post <- fit_pred$summary.fitted.values[index.pred, ] + } + assertthat::assert_that(nrow(post)>0, + nrow(post) == nrow(predcoords) ) # Check with cells in projection + # Back-transform for predictor + if(params$type == "predictor"){ + if(length(fam)==1){ + if(fam == 'poisson') post[,c('mean','0.05quant','0.5quant','0.95quant','mode')] <- exp( post[,c('mean','0.05quant','0.5quant','0.95quant','mode')] ) + if(fam == 'binomial') post[,c('mean','0.05quant','0.5quant','0.95quant','mode')] <- logistic(post[,c('mean','0.05quant','0.5quant','0.95quant','mode')]) + } else { + # Joint likelihood of Poisson log and binomial cloglog following Simpson et al. 
+ post[,c('mean','0.05quant','0.5quant','0.95quant','mode')] <- exp( post[,c('mean','0.05quant','0.5quant','0.95quant','mode')] ) + } + } + post <- subset(post, select = c('mean','sd','0.05quant','0.5quant','0.95quant','mode') ) + post$cv <- post$sd / post$mean + # Rename + names(post) <- c("mean", "sd", "q05", "q50", "q95", "mode","cv") + + # Fill prediction + # suppressWarnings( + # prediction <- terra::rast( + # sp::SpatialPixelsDataFrame( + # points = predcoords, + # data = post, + # proj4string = sp::CRS( self$get_data('mesh')$crs@projargs ) # x$engine$data$mesh$crs@projargs + # ) + # ) + # ) + suppressWarnings( + prediction <- terra::rast(cbind( as.data.frame(predcoords), post), type = "xyz", + crs = terra::crs(self$get_data('mesh')$crs@projargs) ) + ) + + prediction <- terra::mask(prediction, model$background) # Mask with background + # Align with background + # temp <- terra::rast( + # sp::SpatialPixelsDataFrame( + # points = model$predictors[,c('x','y')], + # data = model$predictors[,c('x','y')], + # proj4string = sp::CRS( self$get_data('mesh')$crs@projargs ) # x$engine$data$mesh$crs@projargs + # ) + # ) + temp <- emptyraster( model$predictors_object$get_data() ) + prediction <- terra::resample(prediction, temp, method = 'bilinear') + + } else { + # No prediction to be conducted + fit_pred <- NULL + prediction <- NULL + } + + # Compute end of computation time + settings$set('end.time', Sys.time()) + + # Definition of INLA Model object ---- + out <- bdproto( + "INLA-Model", + DistributionModel, + id = model$id, + model = model, + settings = settings, + fits = list( + "fit_best" = fit_resp, + "fit_pred" = fit_pred, + "fit_best_equation" = master_form, + "mesh" = self$get_data('mesh'), + "spde" = self$get_data('latentspatial'), + "prediction" = prediction + ), + # Projection function + project = function(self, newdata, mode = 'coef', backtransf = NULL, layer = "mean"){ + assertthat::assert_that('fit_best' %in% names(self$fits), + is.data.frame(newdata) || is.matrix(newdata), + mode %in% c('coef','sim','full'), + assertthat::has_name(newdata,c('x','y')) + ) + stop("Projection using engine INLA is deprecated. Use engine_inlabru !") + + # Try and guess backtransformation + if(is.null(backtransf)){ + fam <- self$get_data('fit_best')$.args$family + backtransf <- ifelse(fam == 'poisson', exp, logistic) + } + + if(mode == 'coef'){ + # We use the coefficient prediction + out <- coef_prediction(mesh = self$get_data('mesh'), + mod = self, + type = 'mean', + backtransf = backtransf + ) + } else if(mode == 'sim'){ + # Simulate from posterior. Not yet coded + stop('Simulation from posterior not yet implemented. Use inlabru instead!') + } else { + stop('Full prediction not yet added.') + } + # Return result + return(out) + }, + # Partial response + # FIXME: Create external function + partial = function(self, x, x.var, constant = NULL, variable_length = 100, values = NULL, plot = FALSE, type = "response"){ + # Goal is to create a sequence of value and constant and append to existing stack + # Alternative is to create a model-matrix through INLA::inla.make.lincomb() and + # model.matrix(~ vars, data = newDummydata) fed to make.lincomb + # provide via lincomb = M to an INLA call. + # Both should be identical + stop("Partial function not implemented. 
Consider using inlabru instead!") + # Check that provided model exists and variable exist in model + mod <- self$get_data('fit_best') + assertthat::assert_that(inherits(mod,'inla'), + 'model' %in% names(self), + inherits(x,'BiodiversityDistribution'), + length(x.var) == 1, is.character(x.var), + is.null(constant) || is.numeric(constant) + ) + varn <- mod$names.fixed + variable <- match.arg(x.var, varn, several.ok = FALSE) + assertthat::assert_that(variable %in% varn, length(variable)==1,!is.null(variable)) + + # ------------------ # + # Get all datasets with id in model. This includes the data stacks and integration stacks + stk_inference <- lapply( + x$engine$list_data()[grep(paste(names(model$biodiversity),collapse = '|'), x$engine$list_data())], + function(z) x$engine$get_data(z)) + stk_inference <- do.call(INLA::inla.stack, stk_inference) + # FIXME: Test that this works with SPDE present + stack_data_resp <- INLA::inla.stack.data(stk_inference) + # ------------------ # + + # If constant is null, calculate average across other values + if(is.null(constant)){ + constant <- lapply(stack_data_resp, function(x) mean(x,na.rm = T))[varn[varn %notin% variable]] + } + # For target variable calculate range + variable_range <- range(stack_data_resp[[variable]],na.rm = TRUE) + + # Create dummy data.frame + dummy <- data.frame(observed = rep(NA, variable_length)) + + seq(variable_range[1],variable_range[2],length.out = variable_length) + + # # add sequence of data and na to data.frame. predict those + # control.predictor = list(A = INLA::inla.stack.A(stk_full), + # link = li, # Link to NULL for multiple likelihoods! + # compute = TRUE), # Compute for marginals of the predictors. + + print('Refitting model for partial effect') + ufit <- INLA::inla(formula = stats::as.formula(mod$.args$formula), # The specified formula + data = stk_inference, # The data stack + quantiles = c(0.05, 0.5, 0.95), + E = INLA::inla.stack.data(stk_inference)$e, # Expectation (Eta) for Poisson model + Ntrials = INLA::inla.stack.data(stk_inference)$Ntrials, + family = mod$.args$family, # Family the data comes from + control.family = mod$.args$control.family, # Control options + control.predictor = mod$.args$control.predictor, # Compute for marginals of the predictors. 
+                        control.compute = mod$.args$control.compute,
+                        control.fixed = mod$.args$control.fixed,
+                        verbose = FALSE, # To see the log of the model runs
+                        control.inla = mod$.args$control.inla,
+                        num.threads = getOption('ibis.nthread')
+        )
+
+        # Plot and return result
+      },
+      # Model convergence check
+      has_converged = function(self){
+        fit <- self$get_data("fit_best")
+        if(is.Waiver(fit)) return(FALSE)
+        return(TRUE)
+      },
+      # Get residuals
+      get_residuals = function(self){
+        new_waiver()
+      },
+      # Get coefficients
+      get_coefficients = function(self){
+        # Returns a vector of the coefficients with direction/importance
+        cofs <- self$summary()
+        cofs <- subset(cofs, select = c("variable", "mean", "sd"))
+        names(cofs) <- c("Feature", "Beta", "Sigma")
+        # Remove intercept(s)
+        int <- grep("Intercept",cofs$Feature,ignore.case = TRUE)
+        if(length(int)>0) cofs <- cofs[-int,]
+        return(cofs)
+      },
+      # Function to plot the SPDE if present
+      plot_spatial = function(self, dim = c(300,300), kappa_cor = FALSE, what = "spatial.field1", ...){
+        assertthat::assert_that(is.vector(dim),
+                                is.character(what))
+
+        if( length( self$fits$fit_best$size.spde2.blc ) == 1)
+        {
+          # Get spatial projections from model
+          # FIXME: Potentially make the plotting of this more flexible
+          gproj <- INLA::inla.mesh.projector(self$get_data('mesh'), dims = dim)
+          g.mean <- INLA::inla.mesh.project(gproj,
+                                            self$get_data('fit_best')$summary.random[[what]]$mean)
+          g.sd <- INLA::inla.mesh.project(gproj, self$get_data('fit_best')$summary.random[[what]]$sd)
+
+          # Convert to rasters
+          g.mean <- t(g.mean)
+          g.mean <- g.mean[rev(1:length(g.mean[,1])),]
+          r.m <- terra::rast(g.mean,
+                             xmin = range(gproj$x)[1], xmax = range(gproj$x)[2],
+                             ymin = range(gproj$y)[1], ymax = range(gproj$y)[2],
+                             crs = terra::crs(self$get_data('mesh')$crs)
+          )
+          g.sd <- t(g.sd)
+          g.sd <- g.sd[rev(1:length(g.sd[,1])),]
+          r.sd <- terra::rast(g.sd,
+                              xmin = range(gproj$x)[1], xmax = range(gproj$x)[2],
+                              ymin = range(gproj$y)[1], ymax = range(gproj$y)[2],
+                              crs = terra::crs( self$get_data('mesh')$crs)
+          )
+
+          spatial_field <- c(r.m, r.sd); names(spatial_field) <- c('SPDE_mean','SPDE_sd')
+          # Mask with prediction if it exists
+          if(!is.null(self$get_data('prediction'))){
+            spatial_field <- terra::resample(spatial_field, self$get_data('prediction')[[1]])
+            spatial_field <- terra::mask(spatial_field, self$get_data('prediction')[[1]])
+          }
+
+          # -- #
+          if(kappa_cor){
+            # Also build correlation fun
+            # Get SPDE results
+            spde_results <- INLA::inla.spde2.result(
+              inla = self$get_data('fit_best'),
+              name = what,
+              spde = self$get_data('spde'),
+              do.transfer = TRUE)
+
+            # Large kappa (inverse range) equals a quick parameter change in space.
+            # Small kappa parameters give much longer, slower gradients.
+            Kappa <- INLA::inla.emarginal(function(x) x, spde_results$marginals.kappa[[1]])
+            sigmau <- INLA::inla.emarginal(function(x) sqrt(x), spde_results$marginals.variance.nominal[[1]])
+            r <- INLA::inla.emarginal(function(x) x, spde_results$marginals.range.nominal[[1]])
+
+            # Get Mesh and distance between points
+            mesh <- self$get_data('mesh')
+            D <- as.matrix( stats::dist(mesh$loc[, 1:2]) )
+
+            # Distance vector
+            dis.cor <- data.frame(distance = seq(0, max(D), length = 100))
+            # Maximum distance shown: half of the prediction x-extent
+            dis.max <- abs((terra::xmin(self$get_data('prediction')) - terra::xmax(self$get_data('prediction')) ) / 2)
+
+            # Modified Bessel function to get correlation strength
+            dis.cor$cor <- as.numeric((Kappa * dis.cor$distance) * base::besselK(Kappa * dis.cor$distance, 1))
+            dis.cor$cor[1] <- 1
+            # ---
+            # Build plot
+            graphics::layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
+            plot(dis.cor$cor ~ dis.cor$distance, type = 'l', lwd = 3,
+                 xlab = 'Distance (proj. unit)', ylab = 'Correlation', main = paste0('Kappa: ', round(Kappa,2) ) )
+            graphics::abline(v = dis.max,lty = 'dotted')
+            plot(spatial_field[[1]],col = ibis_colours[['viridis_cividis']], main = 'mean spatial effect')
+            plot(spatial_field[[2]], main = 'sd spatial effect')
+          } else {
+            # Just plot the SPDE
+            graphics::par(mfrow=c(1,2))
+            plot(spatial_field[[1]],col = ibis_colours[['viridis_cividis']], main = 'mean spatial effect')
+            plot(spatial_field[[2]], main = 'sd spatial effect')
+            # And return
+            return(spatial_field)
+          }
+
+
+        } else {
+          message(text_red('No spatial covariance in model specified.'))
+        }
+      }
+    )
+    return(out)
+  }
+  ))
+}
diff --git a/R/engine_inlabru.R b/R/engine_inlabru.R
index ac8d3620..63a921dc 100644
--- a/R/engine_inlabru.R
+++ b/R/engine_inlabru.R
@@ -6,10 +6,10 @@ NULL
#' @description Model components are specified with general inputs and mapping methods to the
#' latent variables, and the predictors are specified via general R expressions,
#' with separate expressions for each observation likelihood model in multi-likelihood models.
-#' The inlabru engine - similar as the [`engine_inla`] function acts a wrapper for [INLA::inla],
-#' albeit [inlabru] has a number of convenience functions implemented that make in particular predictions
+#' The inlabru engine - similar to the [`engine_inla`] function - acts as a wrapper for INLA,
+#' albeit \code{"inlabru"} has a number of convenience functions implemented that make, in particular, predictions
#' with new data much more straightforward (e.g. via posterior simulation instead of fitting).
-#' Since more recent versions [inlabru] also supports the addition of multiple likelihoods, therefore
+#' Since more recent versions, \code{"inlabru"} also supports the addition of multiple likelihoods, therefore
#' allowing full integrated inference.
#' @details
#' All \code{INLA} engines require the specification of a mesh that needs to be provided to the
@@ -32,7 +32,7 @@ NULL
#' @note
#' **How INLA meshes are generated substantially influences prediction outcomes. See Dambly et al. (2023).**
#' @param x [distribution()] (i.e. [`BiodiversityDistribution-class`]) object.
-#' @param optional_mesh A directly supplied [`INLA`] mesh (Default: \code{NULL})
+#' @param optional_mesh A directly supplied \code{"INLA"} mesh (Default: \code{NULL})
#' @param max.edge The largest allowed triangle edge length, must be in the same scale units as the coordinates.
#' Default is an educated guess (Default: \code{NULL}).
#' @param offset interpreted as a numeric factor relative to the approximate data diameter.
@@ -53,7 +53,8 @@ NULL
#' * Dambly, L. I., Isaac, N. J., Jones, K. E., Boughey, K. L., & O'Hara, R. B. (2023). Integrated species distribution models fitted in INLA are sensitive to mesh parameterisation. Ecography, e06391.
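To make the mesh-parameter heuristics used by both INLA engines concrete, the following is a minimal standalone sketch (max.edge from roughly 1/15 of the spatial range, offset as 1/100 of the extent, cutoff from the smallest y-spacing). The helper name `infer_mesh_params` is hypothetical and not part of the package:

# Sketch of the default mesh-parameter guesses, assuming a plain matrix of
# point coordinates 'locs' with columns x and y. Illustrative only; the
# engines additionally handle CRS conversion and boundary objects.
infer_mesh_params <- function(locs) {
  stopifnot(is.matrix(locs), ncol(locs) == 2)
  xr <- diff(range(locs[, 1]))
  # Inner max.edge ~ 1/15 of the x-range, outer extension twice that
  max.edge <- c(xr / (3 * 5), xr / (3 * 5) * 2)
  # Offset: 1/100 of the extent on both sides
  offset <- rep(xr * 0.01, 2)
  # Cutoff: smallest non-zero spacing between successive y-coordinates
  val <- min(abs(diff(locs[, 2])))
  cutoff <- ifelse(val == 0, 1e-12, val)
  list(max.edge = max.edge, offset = offset, cutoff = cutoff)
}

# Example with simulated occurrence coordinates
set.seed(42)
locs <- cbind(x = runif(100, 0, 10), y = runif(100, 0, 5))
str(infer_mesh_params(locs))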
#' @source [https://inlabru-org.github.io/inlabru/articles/](https://inlabru-org.github.io/inlabru/articles/) #' @family engine -#' @returns An [engine]. +#' @returns An [Engine]. +#' @aliases engine_inlabru #' @examples #' \dontrun{ #' # Add inlabru as an engine @@ -183,6 +184,7 @@ engine_inlabru <- function(x, # Get all coordinates of observations locs <- collect_occurrencepoints(model, include_absences = FALSE) + locs <- locs[,c("x","y")] # Take only the coordinates # Try and infer mesh parameters if not set if(is.null(params$max.edge)){ @@ -1136,7 +1138,9 @@ engine_inlabru <- function(x, proj4string = self$get_data('mesh')$crs ) df_partial <- subset(df_partial, stats::complete.cases(df_partial@data)) # Remove missing data - df_partial <- methods::as(df_partial, 'SpatialPixelsDataFrame') + suppressWarnings( + df_partial <- methods::as(df_partial, 'SpatialPixelsDataFrame') + ) # Add all others as constant if(is.null(constant)){ @@ -1186,6 +1190,23 @@ engine_inlabru <- function(x, ) } }, + # Model convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + # Get residuals + rd <- obj$residuals$deviance.residuals + assertthat::assert_that(length(rd)>0) + return(rd) + }, + # Get coefficients get_coefficients = function(self){ # Returns a vector of the coefficients with direction/importance cofs <- self$summary() diff --git a/R/engine_stan.R b/R/engine_stan.R index 2eb95495..59642ee0 100644 --- a/R/engine_stan.R +++ b/R/engine_stan.R @@ -8,7 +8,7 @@ NULL #' such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. #' Stan code has to be written separately and this function acts as compiler to #' build the stan-model. -#' **Requires the [cmdstanr] package to be installed!** +#' **Requires the \code{"cmdstanr"} package to be installed!** #' @details #' By default the posterior is obtained through sampling, however stan also supports #' approximate inference forms through penalized maximum likelihood estimation (see Carpenter et al. 2017). @@ -20,14 +20,14 @@ NULL #' adaptation is run (and hence these warmup samples should not be used for inference). #' The number of warmup iterations should be smaller than \code{iter} and the default is \code{iter/2}. #' @param cores If set to NULL take values from specified ibis option \code{getOption('ibis.nthread')}. -#' @param init Initial values for parameters (Default: \code{'random'}). Can also be specified as [list] (see: [`rstan::stan`]) +#' @param init Initial values for parameters (Default: \code{'random'}). Can also be specified as [list] (see: \code{"rstan::stan"}) #' @param algorithm Mode used to sample from the posterior. Available options are \code{"sampling"}, \code{"optimize"}, #' or \code{"variational"}. -#' See [`cmdstanr`] package for more details. (Default: \code{"sampling"}). -#' @param control See [`rstan::stan`] for more details on specifying the controls. +#' See \code{"cmdstanr"} package for more details. (Default: \code{"sampling"}). +#' @param control See \code{"rstan::stan"} for more details on specifying the controls. #' @param type The mode used for creating posterior predictions. Either summarizing the linear \code{"predictor"} or \code{"response"} (Default: \code{"response"}). #' @param ... 
Other variables
-#' @seealso [rstan], [cmdstanr]
+#' @seealso rstan, cmdstanr
#' @note
#' The function \code{obj$stancode()} can be used to print out the stancode of the model.
#' @references
@@ -35,7 +35,8 @@ NULL
#' * Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of statistical software, 76(1), 1-32.
#' * Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051.
#' @family engine
-#' @returns An [engine].
+#' @aliases engine_stan
+#' @returns An [Engine].
#' @examples
#' \dontrun{
#' # Add Stan as an engine
@@ -739,8 +740,8 @@ engine_stan <- function(x,
 if(plot){
 o <- pred_part
 pm <- ggplot2::ggplot(data = o, ggplot2::aes(x = partial_effect, y = mean,
-                                                          ymin = mean-sd,
-                                                          ymax = mean+sd) ) +
+                                                          ymin = mean - sd, # 'mean' and 'sd' are columns of pred_part, not stats::sd
+                                                          ymax = mean + sd) ) +
 ggplot2::theme_classic() +
 ggplot2::geom_ribbon(fill = "grey90") +
 ggplot2::geom_line() +
@@ -816,6 +817,19 @@ engine_stan <- function(x,
 }
 return(prediction)
 },
+      # Model convergence check
+      has_converged = function(self){
+        fit <- self$get_data("fit_best")
+        if(is.Waiver(fit)) return(FALSE)
+        return(TRUE)
+      },
+      # Residual function
+      get_residuals = function(self){
+        # Get best object
+        message("Not yet implemented.. :-( ")
+        new_waiver()
+      },
+
 # Get coefficients
 get_coefficients = function(self){
 # Returns a vector of the coefficients with direction/importance
 cofs <- self$summary()
diff --git a/R/engine_xgboost.R b/R/engine_xgboost.R
index 6d45260d..1c77ccdb 100644
--- a/R/engine_xgboost.R
+++ b/R/engine_xgboost.R
@@ -22,7 +22,7 @@ NULL
#' Lower values generally being better but also computationally more costly. (Default: \code{1e-3})
#' @param iter [`numeric`] value giving the maximum number of boosting iterations for cross-validation (Default: \code{8e3L}).
#' @param gamma [`numeric`] A regularization parameter in the model. Lower values for better estimates (Default: \code{3}).
-#' Also see [reg_lambda] parameter for the L2 regularization on the weights
+#' Also see \code{"reg_lambda"} parameter for the L2 regularization on the weights
#' @param reg_lambda [`numeric`] L2 regularization term on weights (Default: \code{0}).
#' @param reg_alpha [`numeric`] L1 regularization term on weights (Default: \code{0}).
#' @param max_depth [`numeric`] The Maximum depth of a tree (Default: \code{3}).
@@ -38,7 +38,8 @@ NULL
#' @references
#' * Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754
#' @family engine
-#' @returns An [engine].
+#' @aliases engine_xgboost
+#' @returns An [Engine].
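For orientation, here is a hedged sketch of how the documented engine parameters map onto a plain xgboost parameter list; the Poisson objective and the toy data are assumptions for illustration only:

# Minimal sketch, assuming a Poisson count objective. In xgboost's parameter
# list, reg_lambda and reg_alpha are spelled 'lambda' and 'alpha'.
library(xgboost)
params <- list(
  booster = "gbtree",
  eta = 1e-3,        # learning rate; lower is usually better but costlier
  gamma = 3,         # minimum loss reduction required for a further split
  lambda = 0,        # L2 regularization on weights (reg_lambda)
  alpha = 0,         # L1 regularization on weights (reg_alpha)
  max_depth = 3,
  objective = "count:poisson"
)
# Toy data: 100 observations of 5 predictors
set.seed(1)
X <- matrix(rnorm(500), ncol = 5)
y <- rpois(100, lambda = exp(X[, 1]))
dtrain <- xgboost::xgb.DMatrix(X, label = y)
fit <- xgboost::xgb.train(params = params, data = dtrain,
                          nrounds = 100, verbose = 0)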
#' @examples #' \dontrun{ #' # Add xgboost as an engine @@ -644,7 +645,8 @@ engine_xgboost <- function(x, if(!is.null(constant)) message("Constant is ignored for xgboost!") check_package("pdp") mod <- self$get_data('fit_best') - df <- self$model$biodiversity[[length( self$model$biodiversity )]]$predictors + model <- self$model + df <- model$biodiversity[[length( model$biodiversity )]]$predictors df <- subset(df, select = mod$feature_names) # Match x.var to argument @@ -654,6 +656,14 @@ engine_xgboost <- function(x, x.var <- match.arg(x.var, mod$feature_names, several.ok = FALSE) } + # Calculate range of predictors + if(any(model$predictors_types$type=="factor")){ + rr <- sapply(df[model$predictors_types$predictors[model$predictors_types$type=="numeric"]], + function(x) range(x, na.rm = TRUE)) |> as.data.frame() + } else { + rr <- sapply(df, function(x) range(x, na.rm = TRUE)) |> as.data.frame() + } + # if values are set, make sure that they cover the data.frame if(!is.null(values)){ assertthat::assert_that(length(x.var) == 1) @@ -667,7 +677,11 @@ engine_xgboost <- function(x, df2 <- df2 |> as.data.frame() df2 <- df2[, mod$feature_names] } else { - df2 <- df + df2 <- list() + for(i in x.var) { + df2[[i]] <- base::as.data.frame(seq(rr[1,i],rr[2,i], length.out = variable_length)) + } + df2 <- do.call(cbind, df2); names(df2) <- x.var } # Check that variables are in @@ -800,6 +814,31 @@ engine_xgboost <- function(x, prediction[] <- pred_xgb return(prediction) }, + # Model convergence check + has_converged = function(self){ + fit <- self$get_data("fit_best") + if(is.Waiver(fit)) return(FALSE) + # Get evaluation log + evl <- fit$evaluation_log + if(fit$best_iteration >= (nrow(evl)-(nrow(evl)*.01))) return(FALSE) + return(TRUE) + }, + # Residual function + get_residuals = function(self){ + # Get best object + obj <- self$get_data("fit_best") + if(is.Waiver(obj)) return(obj) + message("Not yet implemented!") + return(new_waiver()) + # Get residuals + model <- self$model + pred <- model$biodiversity[[length(model$biodiversity)]] + predf <- pred$predictors |> subset(select = obj$feature_names) + newdata <- xgboost::xgb.DMatrix(as.matrix(predf)) + + fn <- predict(obj, newdata,type = "class") + return(fn) + }, # Get coefficients get_coefficients = function(self){ # Returns a vector of the coefficients with direction/importance diff --git a/R/ensemble.R b/R/ensemble.R index d1847ec8..98c7d2a2 100644 --- a/R/ensemble.R +++ b/R/ensemble.R @@ -36,7 +36,7 @@ NULL #' #' @note #' If a list is supplied, then it is assumed that each entry in the list is a fitted [`DistributionModel`] object. -#' Take care not to create an ensemble of models constructed with different link functions, e.g. [logistic] vs [log]. In this case +#' Take care not to create an ensemble of models constructed with different link functions, e.g. logistic vs [log]. In this case #' the \code{"normalize"} parameter has to be set. #' @param ... Provided [`DistributionModel`] objects. #' @param method Approach on how the ensemble is to be created. See details for available options (Default: \code{'mean'}). 
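The warning about mixed link functions is easiest to see in code: predictions on a log scale and a logistic scale are not directly comparable, which is what the \code{"normalize"} option addresses by rescaling each layer to [0, 1] before averaging. A minimal sketch with simulated rasters (layer contents are assumptions, not package output):

# Why 'normalize' matters for ensembles of models with different links
library(terra)
r1 <- rast(nrows = 10, ncols = 10, vals = rexp(100))   # e.g. log-link intensity
r2 <- rast(nrows = 10, ncols = 10, vals = runif(100))  # e.g. logistic probability
norm01 <- function(r) {
  mm <- terra::minmax(r, compute = TRUE)
  (r - mm[1]) / (mm[2] - mm[1])
}
# Average only after both layers share a common 0-1 scale
ens_mean <- mean(norm01(r1), norm01(r2))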
@@ -75,7 +75,7 @@ methods::setGeneric("ensemble", #' @name ensemble #' @rdname ensemble -#' @usage \S4method{ensemble}{ANY, character, numeric, numeric, character, logical, character}(..., method, weights, min.value, layer, normalize, uncertainty) +#' @usage \S4method{ensemble}{ANY,character,numeric,numeric,character,logical,character}(...,method,weights,min.value,layer,normalize,uncertainty) methods::setMethod( "ensemble", methods::signature("ANY"), @@ -215,9 +215,9 @@ methods::setMethod( } else rasp <- NULL # Add uncertainty ras_uncertainty <- switch (uncertainty, - "sd" = terra::app(ras, sd, na.rm = TRUE), - "cv" = terra::app(ras, sd, na.rm = TRUE) / terra::mean(ras, na.rm = TRUE), - "range" = terra::max(ras, na.rm = TRUE) - terra::min(ras, na.rm = TRUE), + "sd" = terra::app(ras, stats::sd, na.rm = TRUE), + "cv" = terra::app(ras, stats::sd, na.rm = TRUE) / terra::mean(ras, na.rm = TRUE), + "range" = max(ras, na.rm = TRUE) - min(ras, na.rm = TRUE), "pca" = terra::mean(rasp, na.rm = TRUE) ) names(ras_uncertainty) <- paste0(uncertainty, "_", lyr) @@ -296,11 +296,14 @@ methods::setMethod( # Add attributes on the method of ensemble attr(new, "method") <- method if(uncertainty != "none"){ + if(uncertainty == "pca") { + stop("Currently, uncertainty = 'pca' is not implemented for SpatRaster input.") + } # Add uncertainty ras_uncertainty <- switch (uncertainty, "sd" = terra::app(ras, fun = "sd", na.rm = TRUE), "cv" = terra::app(ras, fun = "sd", na.rm = TRUE) / terra::mean(ras, na.rm = TRUE), - "range" = terra::max(ras, na.rm = TRUE) - terra::min(ras, na.rm = TRUE) + "range" = max(ras, na.rm = TRUE) - min(ras, na.rm = TRUE) ) names(ras_uncertainty) <- paste0(uncertainty, "_", lyr) # Add attributes on the method of ensembling @@ -385,10 +388,13 @@ methods::setMethod( # --- # if(uncertainty != 'none'){ + if(uncertainty == "pca") { + stop("Currently, uncertainty = 'pca' is not implemented for stars input.") + } # Add uncertainty out_uncertainty <- switch (uncertainty, - "sd" = apply(lmat[,4:ncol(lmat)], 1, function(x) sd(x, na.rm = TRUE)), - "cv" = apply(lmat[,4:ncol(lmat)], 1, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)), + "sd" = apply(lmat[,4:ncol(lmat)], 1, function(x) stats::sd(x, na.rm = TRUE)), + "cv" = apply(lmat[,4:ncol(lmat)], 1, function(x) stats::sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)), "range" = apply(lmat[,4:ncol(lmat)], 1, function(x) (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))) ) if(any(is.infinite(out_uncertainty))) out_uncertainty[is.infinite(out_uncertainty)] <- NA @@ -439,7 +445,7 @@ methods::setMethod( #' #' @note #' If a list is supplied, then it is assumed that each entry in the list is a fitted [`DistributionModel`] object. -#' Take care not to create an ensemble of models constructed with different link functions, e.g. [logistic] vs [log]. +#' Take care not to create an ensemble of models constructed with different link functions, e.g. logistic vs [log]. #' By default the response functions of each model are normalized. #' @param ... Provided [`DistributionModel`] objects from which partial responses can be called. In the future provided data.frames might be supported as well. #' @param x.var A [`character`] of the variable from which an ensemble is to be created. 
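The aggregation idea behind the ensemble_partial method that follows can be sketched in a few lines: per-model partial responses are stacked in long format and then summarised per shared x-value (data and model names here are simulated assumptions):

# Sketch of averaging partial responses across models
x <- seq(0, 1, length.out = 50)
out <- rbind(
  data.frame(model = "m1", partial_effect = x, mean = plogis(2 * x)),
  data.frame(model = "m2", partial_effect = x, mean = plogis(3 * x - 0.5))
)
new <- aggregate(out$mean,
                 by = list(partial_effect = out$partial_effect),
                 FUN = function(z) c(mean = mean(z), sd = stats::sd(z)))
new <- do.call(data.frame, new)  # flatten the matrix column from aggregate()
colnames(new) <- c("partial_effect", "mean", "sd")
head(new)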
@@ -466,7 +472,7 @@ methods::setGeneric("ensemble_partial",
 #' @name ensemble_partial
 #' @rdname ensemble_partial
-#' @usage \S4method{ensemble_partial}{ANY, character, character, character, logical}(..., x.var, method, layer, normalize)
+#' @usage \S4method{ensemble_partial}{ANY,character,character,character,logical}(...,x.var,method,layer,normalize)
 methods::setMethod(
 "ensemble_partial",
 methods::signature("ANY"),
@@ -540,13 +546,13 @@ methods::setMethod(
 if(method == 'mean'){
 new <- aggregate(out[,layer], by = list(partial_effect = out$partial_effect),
 FUN = function(x = out[[layer]]) {
-                         return(cbind( mean = mean(x), sd = sd(x)))
+                         return(cbind( mean = mean(x), sd = stats::sd(x)))
 }) |> as.matrix() |> as.data.frame()
 colnames(new) <- c("partial_effect", "mean", "sd")
 } else if(method == 'median'){
 new <- aggregate(out[,layer], by = list(partial_effect = out$partial_effect),
 FUN = function(x = out[[layer]]) {
-                         return(cbind( median = stats::median(x), mad = mad(x)))
+                         return(cbind( median = stats::median(x), mad = stats::mad(x)))
 }) |> as.matrix() |> as.data.frame()
 colnames(new) <- c("partial_effect", "median", "mad")
 }
diff --git a/R/ibis.iSDM-package.R b/R/ibis.iSDM-package.R
index 8c7ffdda..0f1f0c40 100644
--- a/R/ibis.iSDM-package.R
+++ b/R/ibis.iSDM-package.R
@@ -1,30 +1,31 @@
-#' @keywords internal
-"_PACKAGE"
-
-#' @importFrom foreach %do% %dopar%
-#' @importFrom methods as
-#' @importFrom stats effects
-
-## usethis namespace: start
-## usethis namespace: end
-NULL
-
-globalVariables(c("background", "band", "bi_class", "bias",
-                  "change", "cid",
-                  "data",
-                  "form2",
-                  "id", "included",
-                  "km",
-                  "limit", "lower",
-                  "median", "model",
-                  "name",
-                  "observed", "oversampled",
-                  "partial_effect", "polpo", "predictor", "predictors",
-                  "q05", "q50", "q95",
-                  "ras", "region.poly",
-                  "s", "state", "suitability",
-                  "tothin", "type", "time",
-                  "upper",
-                  "var1", "var2", "value", "variable",
-                  "x", "y", "z",
-                  "."))
+#' @keywords internal
+"_PACKAGE"
+
+#' @importFrom foreach %do% %dopar%
+#' @importFrom methods as
+#' @importFrom stats effects
+
+## usethis namespace: start
+## usethis namespace: end
+NULL
+
+globalVariables(c("background", "band", "bi_class", "bias",
+                  "change", "cid",
+                  "data",
+                  "form2",
+                  "geometry",
+                  "id", "included",
+                  "km",
+                  "limit", "lower",
+                  "median", "model",
+                  "name",
+                  "observed", "oversampled",
+                  "partial_effect", "polpo", "predictor", "predictors",
+                  "q05", "q50", "q95",
+                  "ras", "region.poly",
+                  "s", "state", "suitability",
+                  "tothin", "type", "time",
+                  "upper",
+                  "var1", "var2", "value", "variable",
+                  "x", "y", "z",
+                  "."))
diff --git a/R/identifier.R b/R/identifier.R
index 23f02940..7ef4c540 100644
--- a/R/identifier.R
+++ b/R/identifier.R
@@ -11,7 +11,7 @@ NULL
 #'
 #' @details Identifiers are made using the [uuid::UUIDgenerate()].
 #'
-#' @return [`Id`] object.
+#' @return \code{"Id"} object.
 #'
 #' @seealso [uuid::UUIDgenerate()].
 #'
@@ -28,7 +28,7 @@ NULL
 #' # check if it is an Id object
 #' is.Id(i)
 #'
-#' @aliases Id
+#' @aliases Id new_id
 #' @name new_id
 #' @keywords misc
 #' @export
diff --git a/R/limiting.R b/R/limiting.R
new file mode 100644
index 00000000..d95030a6
--- /dev/null
+++ b/R/limiting.R
@@ -0,0 +1,104 @@
+#' @include utils.R
+NULL
+
+#' Identify local limiting factor
+#'
+#' @description
+#' Calculates a [`SpatRaster`] of locally limiting factors from a given projected model.
+#' To calculate this, the [`spartial`] effect of each individual covariate
+#' in the model is first calculated.
+#'
+#' The effect is estimated as that variable most responsible for decreasing suitability
+#' at that cell. The decrease in suitability is calculated,
+#' for each predictor in turn, relative to the suitability
+#' that would be achieved if that predictor took a value equal to its mean.
+#' The predictor associated with the largest decrease in suitability is
+#' the most limiting factor.
+#'
+#' @param mod A fitted \code{'DistributionModel'} object from which limiting factors are to
+#' be identified.
+#' @param plot Should the result be plotted? (Default: \code{TRUE}).
+#' @concept Partly inspired by the rmaxent package.
+#' @return A `terra` object of the most important variable for a given grid cell.
+#' @examples
+#' \dontrun{
+#' o <- limiting(fit)
+#' plot(o)
+#' }
+#' @aliases limiting
+#' @references
+#' * Elith, J., Kearney, M. and Phillips, S. (2010), [The art of modelling range-shifting species](http://doi.org/10.1111/j.2041-210X.2010.00036.x). _Methods in Ecology and Evolution_, 1: 330-342. doi: 10.1111/j.2041-210X.2010.00036.x
+#' @keywords partial
+#'
+#' @name limiting
+#' @exportMethod limiting
+#' @export
+methods::setGeneric("limiting",
+                    signature = methods::signature("mod"),
+                    function(mod, plot = TRUE) standardGeneric("limiting"))
+
+#' @name limiting
+#' @rdname limiting
+#' @usage \S4method{limiting}{ANY,logical}(mod,plot)
+methods::setMethod(
+  "limiting",
+  methods::signature("ANY"),
+  function(mod, plot = TRUE){
+    assertthat::assert_that(!missing(mod) && inherits(mod,'DistributionModel'),
+                            is.logical(plot))
+
+    # Assert that prediction is there
+    assertthat::assert_that(is.Raster(mod$get_data()))
+
+    # Get model object and variables respectively
+    model <- mod$model
+    pred <- mod$get_data()[[1]]
+    # If a threshold is set, mask
+    if(!is.Waiver(mod$get_thresholdvalue())){
+      tr <- mod$get_thresholdvalue()
+      pred[pred < tr] <- NA
+    }
+    vars <- model$predictors_names
+
+    # Output SpatRaster
+    ras <- terra::rast()
+    pb <- progress::progress_bar$new(total = length(vars),
+                                     format = "Processing :what")
+    for(v in vars){
+      pb$tick(tokens = list(what = v))
+
+      # Build spartial model
+      o <- try({spartial(mod, x.var = v, plot = FALSE)},silent = TRUE)
+      # If an error occurs, it is assumed the variable is not in the model (regularized out)
+      if(inherits(o, "try-error")) next
+      o <- o[[1]] # Get only the first layer (mean)
+      names(o) <- v
+      suppressWarnings( ras <- c(ras,o) )
+      rm(o)
+    }
+    assertthat::assert_that(terra::nlyr(ras)>1,
+                            msg = "Only one or no spartial coefficient could be computed!")
+
+    # Determine per cell the layer with the maximum value
+    out <- terra::which.max(ras - pred)
+    names(out) <- "Limiting.variable"
+    out <- terra::mask(out, model$background)
+
+    # Define factor levels
+    cls <- data.frame(id = unique(out)[,1])
+    cls$cover <- names(ras)[cls$id]
+    levels(out) <- cls
+    assertthat::assert_that(is.factor(out))
+
+    # Plot the result
+    if(plot){
+      cols <- ibis_colours$distinct_random[1:nrow(cls)]
+      terra::plot(out, col = cols,
+                  grid = FALSE,
+                  smooth = FALSE,
+                  all_levels = FALSE,
+                  axes = FALSE)
+    }
+    return(out)
+  }
+)
diff --git a/R/misc.R b/R/misc.R
index c06f1360..053f528e 100644
--- a/R/misc.R
+++ b/R/misc.R
@@ -7,6 +7,7 @@ NULL
 #' @name ibis_colours
 #' @examples
 #' ibis_colours[['viridis_plasma']]
+#' @aliases ibis_colours
 #' @keywords internal
 #' @noRd
 ibis_colours <- list(
@@ -45,10 +46,9 @@ ibis_colours <- list(
 #' * \code{'ibis.use_future'} : [`logical`] on whether the \pkg{future} package should be used for parallel computing.
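The limiting-factor logic added in R/limiting.R above reduces to a per-cell which.max over a stack of single-variable predictions. A toy, self-contained illustration with simulated rasters (the layer names "forest" and "bio01" are assumptions, not package output):

# Toy sketch of the limiting() idea: per cell, pick the layer whose
# single-variable ('spartial') prediction differs most from the full prediction.
library(terra)
set.seed(7)
pred <- rast(nrows = 20, ncols = 20, vals = runif(400))  # full model prediction
sp1 <- pred; terra::values(sp1) <- runif(400); names(sp1) <- "forest"
sp2 <- pred; terra::values(sp2) <- runif(400); names(sp2) <- "bio01"
ras <- c(sp1, sp2)
lim <- terra::which.max(ras - pred)  # layer index per cell, as in limiting()
cls <- data.frame(id = 1:terra::nlyr(ras), cover = names(ras))
levels(lim) <- cls                   # attach variable names as factor levels
plot(lim)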
#' @return The output of \code{getOptions} for all ibis related variables. #' @keywords misc +#' @aliases ibis_options #' @examples -#' \dontrun{ #' ibis_options() -#' } #' @export ibis_options <- function(){ what <- grep('ibis',names(options()),value = TRUE) @@ -70,6 +70,7 @@ ibis_options <- function(){ #' @param deps A [`vector`] with the names of the packages to be installed (Default: \code{"ibis.dependencies"} in [`ibis_options`]). #' @param update A [`logical`] flag of whether all (installed) packages should also be checked for updates (Default: \code{TRUE}). #' @returns Nothing. Packages will be installed. +#' @aliases ibis_dependencies #' @examples #' \dontrun{ #' # Install and update all dependencies @@ -126,6 +127,7 @@ ibis_dependencies <- function(deps = getOption("ibis.dependencies"), update = TR #' (Default: \code{"multisession"}). #' @seealso [future] #' @return None +#' @aliases ibis_future #' @examples #' \dontrun{ #' # Starts future job diff --git a/R/partial.R b/R/partial.R index f6979f3b..640c97ab 100644 --- a/R/partial.R +++ b/R/partial.R @@ -3,7 +3,8 @@ NULL #' Obtain partial effects of trained model #' -#' Create a partial response or effect plot of a trained model +#' @description +#' Create a partial response or effect plot of a trained model. #' #' @param mod A trained `DistributionModel` object with \code{fit_best} model within. #' @param x.var A [character] indicating the variable for which a partial effect is to be calculated. @@ -18,6 +19,7 @@ NULL #' @details By default the mean is calculated across all parameters that are not \code{x.var}. #' Instead a *constant* can be set (for instance \code{0}) to be applied to the output. #' @return A [data.frame] with the created partial response. +#' @aliases partial #' @examples #' \dontrun{ #' # Do a partial calculation of a trained model @@ -33,7 +35,7 @@ methods::setGeneric( #' @name partial #' @rdname partial -#' @usage \S4method{partial}{ANY,character}(mod, x.var) +#' @usage \S4method{partial}{ANY,character,ANY,numeric,ANY,logical,character}(mod,x.var,constant,variable_length,values,plot,type,...) methods::setMethod( "partial", methods::signature(mod = "ANY", x.var = "character"), @@ -76,6 +78,7 @@ partial.DistributionModel <- function(mod, ...) mod$partial(...) #' @details By default the [mean] is calculated across all parameters that are not \code{x.var}. #' Instead a *constant* can be set (for instance \code{0}) to be applied to the output. #' @returns A [SpatRaster] containing the mapped partial response of the variable. +#' @aliases spartial #' @examples #' \dontrun{ #' # Create and visualize the spartial effect @@ -91,7 +94,7 @@ methods::setGeneric( #' @name spartial #' @rdname spartial -#' @usage \S4method{spartial}{ANY,character}(mod, x.var) +#' @usage \S4method{spartial}{ANY,character,ANY,logical}(mod,x.var,constant,plot,...) methods::setMethod( "spartial", methods::signature(mod = "ANY", x.var = "character"), @@ -116,3 +119,208 @@ methods::setMethod( #' @keywords partial #' @export spartial.DistributionModel <- function(mod, ...) mod$spartial(...) + +#--------------------------# +#' Visualize the density of the data over the environmental data +#' +#' @description +#' Based on a fitted model, plot the density of observations over the estimated variable and environmental space. +#' Opposed to the [partial] and [spartial] functions, which are rather low-level interfaces, this function provides more +#' detail in the light of the data. 
It is also able to contrast different variables against each other and show the used data.
+#'
+#' @details
+#' This function calculates the observed density of presence and absence points over the whole surface of a specific
+#' variable. It can be used to visually inspect the fit of the model to the data.
+#'
+#' @note
+#' By default all variables that are not \code{x.var} are held constant at the mean.
+#' @param mod A trained `DistributionModel` object. Requires a fitted model and inferred prediction.
+#' @param x.var A [character] indicating the variable to be investigated. Can be a [`vector`] of length \code{1} or \code{2}.
+#' @param df [`logical`] if the plotting data should be returned instead (Default: \code{FALSE}).
+#' @param ... Other engine-specific parameters.
+#'
+#' @seealso [partial]
+#' @concept Visual style emulated from the ENMTools package.
+#' @references
+#' * Warren, D.L., Matzke, N.J., Cardillo, M., Baumgartner, J.B., Beaumont, L.J., Turelli, M., Glor, R.E., Huron, N.A., Simões, M., Iglesias, T.L., Piquet, J.C., and Dinnage, R. 2021. ENMTools 1.0: an R package for comparative ecological biogeography. Ecography, 44(4), pp.504-511.
+#' @returns A [`ggplot2`] object showing the marginal response in light of the data.
+#' @aliases partial_density
+#' @examples
+#' \dontrun{
+#' # Do a partial calculation of a trained model
+#' partial_density(fit, x.var = "Forest.cover")
+#' # Or with two variables
+#' partial_density(fit, x.var = c("Forest.cover", "bio01"))
+#' }
+#' @keywords partial
+#' @export
+#' @name partial_density
+methods::setGeneric(
+  "partial_density",
+  signature = methods::signature("mod","x.var"),
+  function(mod, x.var, df = FALSE,...) standardGeneric("partial_density"))
+
+#' @name partial_density
+#' @rdname partial_density
+#' @usage \S4method{partial_density}{ANY,character,logical}(mod,x.var,df,...)
+methods::setMethod(
+  "partial_density",
+  methods::signature(mod = "ANY", x.var = "character"),
+  function(mod, x.var, df = FALSE,...)
{ + assertthat::assert_that(!missing(x.var),msg = 'Specify a variable name in the model!') + assertthat::assert_that(inherits(mod, "DistributionModel"), + all(is.character(x.var)), + length(x.var)>=1, length(x.var) <=2, + is.logical(df) + ) + # Check that mod has all the information necessary + assertthat::assert_that(!is.Waiver(mod$get_data()), + !is.Waiver(mod$get_data("fit_best")) + ) + + # Ensure that x.var variables are among the coefficients + assertthat::assert_that(all(x.var %in% mod$get_coefficients()[[1]]), + msg = "Provided variables not found among the coefficients!") + + # --- # + model <- mod$model # Get model + pred <- mod$get_data() # Get prediction + # Get target variables from the model object + vars <- model$predictors_object$get_data()[[x.var]] + assertthat::assert_that(is.Raster(pred), is.Raster(vars)) + + # Collect observations from model object + obs <- collect_occurrencepoints(model = model, + include_absences = TRUE, + addName = TRUE, + tosf = FALSE) + + # Get variable bounds + vars_lims <- terra::minmax(vars) + if(any(is.na(vars_lims))){ + # Quick check in case range can not be correctly extracted + vars <- terra::setMinMax(vars) + vars_lims <- terra::minmax(vars) + } + + # --- # + normalise <- function(z) (z - min(z))/(max(z)-min(z)) + # Extract variables and make partial prediction + out <- data.frame() + for(v in x.var){ + # Create a partial prediction with the model + pp <- partial.DistributionModel(mod, v, + values = seq(vars_lims["min",v], vars_lims["max",v], length = 100), + variable_length = 100, plot = FALSE, type = "response") + # Normalize and save + pp[['mean']] <- normalise(pp[['mean']]) + + # For each data type + for(ty in unique(obs$type)){ + sub <- subset(obs, type == ty) + # First extract for each observation the predicted covariate + ss <- guess_sf(sub) + suppressWarnings( + ss <- ss |> sf::st_set_crs(value = sf::st_crs(model$background)) + ) + if(is.factor(ss$observed)) ss$observed <- as.numeric(as.character(ss$observed)) + + # Extract for all presence and absence points + pres <- get_rastervalue(ss |> dplyr::filter(observed >= 1), vars, ngb_fill = TRUE)[,v] + pres.dens <- density(pres, + from = vars_lims["min",v], + to = vars_lims["max",v], + n = 100, + na.rm = TRUE)$y + # Save in object + pp$presence.density <- pres.dens/max(pres.dens) + + # Absence + if(any(ss$observed==0)){ + abs <- get_rastervalue(ss |> dplyr::filter(observed == 0), vars, ngb_fill = TRUE)[,v] + abs.dens <- density(abs, + from = vars_lims["min",v], + to = vars_lims["max",v], + n = 100, + na.rm = TRUE)$y + pp$absence.density <- abs.dens/max(abs.dens) + } else { + # FIXME: Maybe do a randomized background extraction? 
+ pp$absence.density <- NA + } + if(!utils::hasName(pp, "variable")) pp$variable <- v + # (Re)set type + pp$type <- ty + # Attach + out <- rbind(out, pp) + } + } + assertthat::assert_that(nrow(out)>0, + any(out$presence.density>0), + all( unique(out$variable) %in% x.var ) + ) + + # --- # + # Format the data for the plot into long format + # Could be done easier with tidyr, but avoiding another dependency here + plot.df <- data.frame() + for(v in out$variable){ + o <- data.frame(variable = v, + layer = c(out$partial_effect[which(out$variable==v)], + out$partial_effect[which(out$variable==v)], + out$partial_effect[which(out$variable==v)]), + value = c(out$mean[which(out$variable==v)], + out$presence.density[which(out$variable==v)], + out$absence.density[which(out$variable==v)]), + source = c(rep("Suitability", 100), + rep("Presence", 100), + rep("Background/Absence", 100))) + # Also add the actual variable density from the layers + o$rug <- terra::spatSample(vars[[v]], nrow(o), na.rm= TRUE)[,1] + plot.df <- rbind(plot.df, o) + } + + if(df){ return(plot.df)} + + # Otherwise create the plot + if(length(x.var) == 1){ + + density.plot <- ggplot2::ggplot(data = plot.df, + ggplot2::aes(x = layer, y = value)) + + ggplot2::theme_bw() + + # Add the lines + ggplot2::geom_line(ggplot2::aes(colour = source, linetype = source)) + + ggplot2::scale_color_manual(values = c("green4", "red", "blue")) + + ggplot2::scale_linetype_manual(values = c( "dashed", "twodash", "solid")) + + # Add a rug + ggplot2::geom_rug(data = dplyr::sample_n(plot.df,100), ggplot2::aes(x = rug),sides = "b", + alpha = .7,length = ggplot2::unit(0.05, "npc")) + + ggplot2::labs(x = x.var, y = "Value") + + # ggplot2::facet_wrap(~variable,scales = "free_x") + + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) + + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) + + ggplot2::theme(legend.title = ggplot2::element_blank()) + + } else { + stop("NOT yet implemented! Need to adapt partial calculation to allow multiple values.") + # Build new data.frame container + new <- data.frame( + var1 = seq(vars_lims[1,1],vars_lims[2,1], length.out = 100), # plot.df$rug[plot.df$variable==x.var[1]] + var2 = seq(vars_lims[1,2],vars_lims[2,2], length.out = 100) + ) + + # Bivariate plot instead + density.plot <- ggplot2::ggplot(data = new, ggplot2::aes(x = var1, y = var2)) + + ggplot2::theme_bw() + + # Add the raster + ggplot2::geom_raster(ggplot2::aes(fill = pred)) + + ggplot2::scale_fill_viridis_c(option = "B", guide = guide_colourbar(title = "Suitability")) + + ggplot2::labs(x = x.var[1], y = x.var[2], title = "Predicted suitability in environment space") + + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) + + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) + + ggplot2::theme(legend.title = ggplot2::element_blank()) + } + return(density.plot) + } +) + diff --git a/R/plot.R b/R/plot.R index 27b489e5..391d6769 100644 --- a/R/plot.R +++ b/R/plot.R @@ -70,7 +70,7 @@ plot.BiodiversityScenario <- function(x,...) x$plot(...) #' but also pixel-based estimates of uncertainty from the posterior such as the standard deviation (SD) #' or the coefficient of variation of a given prediction. #' -#' This function makes use of the [`biscale`] R-package to create bivariate plots of the fitted distribution object, +#' This function makes use of the \code{"biscale"} R-package to create bivariate plots of the fitted distribution object, #' allowing to visualize two variables at once. 
It is mostly thought of as a convenience function to #' create such bivariate plots for quick visualization. #' @@ -93,6 +93,7 @@ plot.BiodiversityScenario <- function(x,...) x$plot(...) #' Although a work around without the package could be developed, it was not deemed necessary at this point. #' See also this [gist](https://gist.github.com/scbrown86/2779137a9378df7b60afd23e0c45c188). #' @return Saved bivariate plot in \code{'fname'} if specified, otherwise plot. +#' @aliases bivplot #' @keywords misc #' @export #' @name bivplot @@ -103,7 +104,7 @@ methods::setGeneric( #' @name bivplot #' @rdname bivplot -#' @usage \S4method{bivplot}{ANY}(mod) +#' @usage \S4method{bivplot}{ANY,character,character,logical,ANY,ANY,character}(mod,xvar,yvar,plot,fname,title,col,...) methods::setMethod( "bivplot", methods::signature(mod = "ANY"), diff --git a/R/prior_bart.R b/R/prior_bart.R index ffd82cbe..acba7ecc 100644 --- a/R/prior_bart.R +++ b/R/prior_bart.R @@ -41,7 +41,7 @@ methods::setGeneric( #' @name BARTPrior #' @rdname BARTPrior -#' @usage \S4method{BARTPrior}{character, numeric}(variable, hyper) +#' @usage \S4method{BARTPrior}{character,numeric}(variable,hyper,...) methods::setMethod( "BARTPrior", methods::signature(variable = "character"), @@ -77,6 +77,7 @@ methods::setMethod( #' @rdname BARTPriors #' @exportMethod BARTPriors #' @inheritParams BARTPrior +#' @aliases BARTPriors #' @family prior #' @keywords priors #' @export @@ -87,7 +88,7 @@ methods::setGeneric( #' @name BARTPriors #' @rdname BARTPriors -#' @usage \S4method{BARTPriors}{character, numeric}(variable, hyper) +#' @usage \S4method{BARTPriors}{character,numeric}(variable,hyper,...) methods::setMethod( "BARTPriors", methods::signature(variable = "character"), diff --git a/R/prior_breg.R b/R/prior_breg.R index 3f3e51bd..93fd22d6 100644 --- a/R/prior_breg.R +++ b/R/prior_breg.R @@ -57,7 +57,7 @@ methods::setGeneric( #' @name BREGPrior #' @rdname BREGPrior -#' @usage \S4method{BREGPrior}{character, numeric, numeric}(variable, hyper, ip) +#' @usage \S4method{BREGPrior}{character,ANY,ANY}(variable,hyper,ip) methods::setMethod( "BREGPrior", methods::signature(variable = "character"), @@ -98,6 +98,7 @@ methods::setMethod( #' @rdname BREGPriors #' @exportMethod BREGPriors #' @inheritParams BREGPrior +#' @aliases BREGPriors #' @family prior #' @keywords priors #' @export @@ -108,7 +109,7 @@ methods::setGeneric( #' @name BREGPriors #' @rdname BREGPriors -#' @usage \S4method{BREGPriors}{character, numeric, numeric}(variable, hyper, ip) +#' @usage \S4method{BREGPriors}{character,ANY,ANY}(variable,hyper,ip) methods::setMethod( "BREGPriors", methods::signature(variable = "character"), diff --git a/R/prior_gdb.R b/R/prior_gdb.R index 72b7bda4..805684c6 100644 --- a/R/prior_gdb.R +++ b/R/prior_gdb.R @@ -38,7 +38,7 @@ methods::setGeneric( #' @name GDBPrior #' @rdname GDBPrior -#' @usage \S4method{GDBPrior}{character, character}(variable, hyper) +#' @usage \S4method{GDBPrior}{character,character}(variable,hyper,...) methods::setMethod( "GDBPrior", methods::signature(variable = "character"), @@ -76,6 +76,7 @@ methods::setMethod( #' @rdname GDBPriors #' @exportMethod GDBPriors #' @inheritParams GDBPrior +#' @aliases GDBPriors #' @keywords priors #' @family prior #' @export @@ -86,7 +87,7 @@ methods::setGeneric( #' @name GDBPriors #' @rdname GDBPriors -#' @usage \S4method{GDBPriors}{character, character}(variable, hyper) +#' @usage \S4method{GDBPriors}{character,character}(variable,hyper,...) 
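+#' @examples
+#' \dontrun{
+#' # A hedged sketch: the same hyper keyword applied to several variables at
+#' # once; "increasing" mirrors the XGBPrior example added elsewhere in this
+#' # changeset and is only an assumed, illustrative value here.
+#' pps <- GDBPriors(variable = c("forest", "cropland"), hyper = "increasing")
+#' }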
methods::setMethod( "GDBPriors", methods::signature(variable = "character"), diff --git a/R/prior_glmnet.R b/R/prior_glmnet.R index ce8dd59c..cdd39cb3 100644 --- a/R/prior_glmnet.R +++ b/R/prior_glmnet.R @@ -53,7 +53,7 @@ methods::setGeneric( #' @name GLMNETPrior #' @rdname GLMNETPrior -#' @usage \S4method{GLMNETPrior}{character, numeric, numeric}(variable, hyper, lims) +#' @usage \S4method{GLMNETPrior}{character,numeric,ANY}(variable,hyper,lims,...) methods::setMethod( "GLMNETPrior", methods::signature(variable = "character"), @@ -99,6 +99,7 @@ methods::setMethod( #' @rdname GLMNETPriors #' @exportMethod GLMNETPriors #' @inheritParams GLMNETPrior +#' @aliases GLMNETPriors #' @family prior #' @keywords priors #' @export @@ -109,7 +110,7 @@ methods::setGeneric( #' @name GLMNETPriors #' @rdname GLMNETPriors -#' @usage \S4method{GLMNETPriors}{character, numeric, numeric}(variable, hyper, lims) +#' @usage \S4method{GLMNETPriors}{character,numeric,ANY}(variable,hyper,lims,...) methods::setMethod( "GLMNETPriors", methods::signature(variable = "character"), diff --git a/R/prior_inla.R b/R/prior_inla.R index 634271ed..d9edf178 100644 --- a/R/prior_inla.R +++ b/R/prior_inla.R @@ -57,7 +57,7 @@ methods::setGeneric( #' @name INLAPrior #' @rdname INLAPrior -#' @usage \S4method{INLAPrior}{character, character}(variable, type) +#' @usage \S4method{INLAPrior}{character,character,ANY}(variable,type,hyper,...) methods::setMethod( "INLAPrior", methods::signature(variable = "character", type = "character"), @@ -114,6 +114,7 @@ methods::setMethod( #' @param ... Variables passed on to prior object. #' @rdname INLAPriors #' @exportMethod INLAPriors +#' @aliases INLAPriors #' @keywords priors #' @family prior #' @export @@ -124,7 +125,7 @@ methods::setGeneric( #' @name INLAPriors #' @rdname INLAPriors -#' @usage \S4method{INLAPriors}{vector, character}(variables, type) +#' @usage \S4method{INLAPriors}{vector,character,ANY}(variables,type,hyper,...) methods::setMethod( "INLAPriors", methods::signature(variables = "vector", type = "character"), diff --git a/R/prior_stan.R b/R/prior_stan.R index 0c2513fb..92e8b811 100644 --- a/R/prior_stan.R +++ b/R/prior_stan.R @@ -17,7 +17,11 @@ NULL #' * Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928. #' * Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of statistical software, 76(1), 1-32. #' @seealso [`Prior-class`]. -#' s +#' @examples +#' \dontrun{ +#' pp <- STANPrior("forest", "normal", c(0,1)) +#' } +#' #' @family prior #' @keywords priors #' @aliases STANPrior @@ -35,7 +39,7 @@ methods::setGeneric( #' @name STANPrior #' @rdname STANPrior -#' @usage \S4method{STANPrior}{character, character}(variable, type) +#' @usage \S4method{STANPrior}{character,character,ANY}(variable,type,hyper,...) methods::setMethod( "STANPrior", methods::signature(variable = "character", type = "character"), @@ -82,6 +86,7 @@ methods::setMethod( #' @param ... Variables passed on to prior object #' @rdname STANPriors #' @family prior +#' @aliases STANPriors #' @keywords priors #' @exportMethod STANPriors #' @export @@ -92,7 +97,7 @@ methods::setGeneric( #' @name STANPriors #' @rdname STANPriors -#' @usage \S4method{STANPriors}{vector, character}(variables, type) +#' @usage \S4method{STANPriors}{vector,character,ANY}(variables,type,hyper,...) 
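+#' @examples
+#' \dontrun{
+#' # A hedged sketch mirroring the STANPrior example above: one shared prior
+#' # type and hyper-parameter vector applied to several variables at once.
+#' pps <- STANPriors(variables = c("forest", "cropland"), type = "normal",
+#'                   hyper = c(0, 1))
+#' }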
methods::setMethod(
  "STANPriors",
  methods::signature(variables = "vector", type = "character"),
diff --git a/R/prior_xgb.R b/R/prior_xgb.R
index d98cf643..c4cbbb0e 100644
--- a/R/prior_xgb.R
+++ b/R/prior_xgb.R
@@ -17,6 +17,11 @@ NULL
 #' @references
 #' * Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., & Cho, H. (2015). Xgboost: extreme gradient boosting. R package version 0.4-2, 1(4), 1-4.
 #' @seealso [`Prior-class`] and [`GDBPrior`].
+#' @examples
+#' \dontrun{
+#' pp <- XGBPrior("forest", "increasing")
+#' }
+#'
 #' @family prior
 #' @keywords priors
 #' @aliases XGBPrior
@@ -34,7 +39,7 @@ methods::setGeneric(
 #' @name XGBPrior
 #' @rdname XGBPrior
-#' @usage \S4method{XGBPrior}{character, character}(variable, hyper)
+#' @usage \S4method{XGBPrior}{character,character}(variable,hyper,...)
 methods::setMethod(
   "XGBPrior",
   methods::signature(variable = "character", hyper = "character"),
@@ -71,6 +76,7 @@ methods::setMethod(
 #' @rdname XGBPriors
 #' @exportMethod XGBPriors
 #' @inheritParams XGBPrior
+#' @aliases XGBPriors
 #' @family prior
 #' @keywords priors
 #' @export
@@ -81,7 +87,7 @@ methods::setGeneric(
 #' @name XGBPriors
 #' @rdname XGBPriors
-#' @usage \S4method{XGBPriors}{character, character}(variable, hyper)
+#' @usage \S4method{XGBPriors}{character,character}(variable,hyper,...)
 methods::setMethod(
   "XGBPriors",
   methods::signature(variable = "character"),
diff --git a/R/priors.R b/R/priors.R
index 5ffa7643..e48be4fa 100644
--- a/R/priors.R
+++ b/R/priors.R
@@ -37,7 +37,7 @@ methods::setGeneric(
 #' @name priors
 #' @rdname priors
-#' @usage \S4method{priors}{Prior}(x)
+#' @usage \S4method{priors}{Prior}(x,...)
 methods::setMethod(
   "priors",
   methods::signature(x = "ANY"),
@@ -142,7 +142,7 @@ methods::setGeneric(
 #' @name priors
 #' @rdname priors
-#' @usage \S4method{priors}{Prior}(x)
+#' @usage \S4method{priors}{Prior}(x,...)
 methods::setMethod(
   "priors",
   methods::signature(x = "ANY"),
diff --git a/R/project.R b/R/project.R
index 5a142e26..91900313 100644
--- a/R/project.R
+++ b/R/project.R
@@ -13,16 +13,16 @@ NULL
 #'
 #' @details
 #' In the background the function \code{x$project()} for the respective model object is called, where
-#' \code{x} is fitted model object. For specifics on the constrains, see the relevant [`constrain`] functions,
+#' \code{x} is the fitted model object. For specifics on the constraints, see the relevant \code{constraint} functions,
 #' respectively:
-#' * [`add_constrain()`] for generic wrapper to add any of the available constrains.
-#' * [`add_constrain_dispersal()`] for specifying dispersal constrain on the temporal projections at each step.
-#' * [`add_constrain_MigClim()`] Using the \pkg{MigClim} R-package to simulate dispersal in projections.
-#' * [`add_constrain_connectivity()`] Apply a connectivity constrain at the projection, for instance by adding
+#' * [`add_constraint()`] A generic wrapper to add any of the available constraints.
+#' * [`add_constraint_dispersal()`] For specifying a dispersal constraint on the temporal projections at each step.
+#' * [`add_constraint_MigClim()`] Using the \pkg{MigClim} R-package to simulate dispersal in projections.
+#' * [`add_constraint_connectivity()`] Apply a connectivity constraint to the projection, for instance by adding
 #' a barrier that prevents migration.
-#' * [`add_constrain_adaptability()`] Apply an adaptability constrain to the projection, for instance
+#' * [`add_constraint_adaptability()`] Apply an adaptability constraint to the projection, for instance
 #' constraining the speed a species is able to adapt to new conditions.
-#' * [`add_constrain_boundary()`] To artificially limit the distribution change. Similar as specifying projection limits, but
+#' * [`add_constraint_boundary()`] To artificially limit the distribution change. Similar to specifying projection limits, but
 #' can be used to specifically constrain a projection within a certain area (e.g. a species range or an island).
 #'
 #' Many constrains also requires thresholds to be calculated. Adding [`threshold()`] to a
@@ -38,7 +38,7 @@ NULL
 #'
 #' @seealso [`scenario()`]
 #' @param x A [`BiodiversityScenario`] object with set predictors.
-#' Note that some constrains such as [`MigClim`] can still simulate future change without projections.
+#' Note that some constraints such as \code{MigClim} can still simulate future change without projections.
 #' @param date_interpolation A [`character`] on whether dates should be interpolated. Options
 #' include \code{"none"} (Default), \code{"annual"}, \code{"monthly"}, \code{"daily"}.
 #' @param stabilize A [`logical`] value indicating whether the suitability projection should be stabilized (Default: \code{FALSE}).
@@ -62,7 +62,7 @@ NULL
 #' }
 #'
 #' @keywords scenarios
-#'
+#' @import terra
 #' @name project
 #' @exportMethod project
 #' @aliases project, project-method
@@ -77,7 +77,7 @@ project.BiodiversityScenario <- function(x,...) project(x,...)
 #' @name project
 #' @rdname project
-#' @usage \S4method{project}{BiodiversityScenario, character, logical, character, character}(mod, date_interpolation, stabilize, stabilize_method, layer)
+#' @usage \S4method{project}{BiodiversityScenario,character,logical,character,character}(x,date_interpolation,stabilize,stabilize_method,layer)
 methods::setMethod(
   "project",
   methods::signature(x = "BiodiversityScenario"),
@@ -131,12 +131,12 @@ methods::setMethod(
         suppressMessages(
           suppressWarnings(
             zones <- st_intersection(sf::st_as_sf(tr, coords = c('x','y'), crs = sf::st_crs(fit$model$background)),
-                                     mod$get_limits()
+                                     mod$get_limits()$layer
             )
           )
         )
         # Limit zones
-        zones <- subset(mod$get_limits(), limit %in% unique(zones$limit) )
+        zones <- subset(mod$get_limits()$layer, limit %in% unique(zones$limit) )
         # Now clip all provided new predictors and background to this
         new_preds$crop_data(zones)
       }
diff --git a/R/pseudoabsence.R b/R/pseudoabsence.R
index 522a978c..b42ee116 100644
--- a/R/pseudoabsence.R
+++ b/R/pseudoabsence.R
@@ -87,7 +87,7 @@ methods::setGeneric("pseudoabs_settings",
 #' @name pseudoabs_settings
 #' @rdname pseudoabs_settings
-#' @usage \S4method{pseudoabs_settings}{ANY, numeric, numeric, character, numeric, logical, logical ANY}(background, nrpoints, min_ratio, method, buffer_distance, inside, layer, bias)
+#' @usage \S4method{pseudoabs_settings}{ANY,numeric,numeric,character,numeric,logical,logical,ANY}(background,nrpoints,min_ratio,method,buffer_distance,inside,layer,bias,...)
 methods::setMethod(
   "pseudoabs_settings",
   methods::signature(background = "ANY"),
@@ -161,6 +161,7 @@ methods::setMethod(
 #' Wotherspoon, S., Krkosek, M., Stuart-Smith, J.F. and Pecl, G.T., 2014. Statistical solutions
 #' for error and bias in global citizen science datasets. Biological Conservation, 173, pp.144-154.
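+#' @examples
+#' \dontrun{
+#' # A hedged sketch (object names are illustrative): given presence-only
+#' # records `points` (an sf object) and a SpatRaster `template`, append
+#' # pseudo-absences following the settings stored in the
+#' # 'ibis.pseudoabsence' option.
+#' points_pa <- add_pseudoabsence(points, field_occurrence = "observed",
+#'                                template = template)
+#' }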
#' @keywords train +#' @aliases add_pseudoabsence #' @returns A [`data.frame`] containing the newly created pseudo absence points. #' @export add_pseudoabsence <- function(df, field_occurrence = "observed", template = NULL, settings = getOption("ibis.pseudoabsence")){ @@ -214,6 +215,7 @@ add_pseudoabsence <- function(df, field_occurrence = "observed", template = NULL method <- settings$get('method') buffer_distance <- settings$get('buffer_distance') + inside <- settings$get('inside') # If the nr of points is 0, set it equal to the number of min_ratio or presented presence points nrpoints <- max(nrpoints, round( nrow(df) * min_ratio )) @@ -256,12 +258,18 @@ add_pseudoabsence <- function(df, field_occurrence = "observed", template = NULL assertthat::assert_that(is.numeric(buffer_distance),msg = "Buffer distance parameter not numeric!") # Get units of projection and print for un <- sf:::crs_parameters(sf::st_crs(df))$ud_unit - if(getOption('ibis.setupmessages')) myLog('[Export]','yellow', paste0('Calculating pseudo-absence outside a ', buffer_distance ,units::deparse_unit(un),' buffer')) + if(is.null(un)) un <- " map units" else un <- units::deparse_unit(un) + if(getOption('ibis.setupmessages')) myLog('[Export]','yellow', paste0('Calculating pseudo-absence outside a ', buffer_distance , un,' buffer')) # Calculate buffer - buf <- sf::st_buffer(x = df, dist = buffer_distance) + suppressMessages( + suppressWarnings( buf <- sf::st_buffer(x = df, dist = buffer_distance) ) + ) buf <- terra::rasterize(buf, emptyraster(template), field = 1) - bg2 <- terra::mask(template, buf, inverse = FALSE, updatevalue = 1) - assertthat::assert_that(terra::global(bg2, "max", na.rm = TRUE) >0,msg = "Considered buffer distance too big!") + if(!inside){ + bg2 <- terra::mask(template, buf, inverse = TRUE) + } else {bg2 <- buf} + assertthat::assert_that(terra::global(bg2, "max", na.rm = TRUE)[,1]> 0, + msg = "Considered buffer distance too big!") # Now sample from all cells not occupied if(!is.null(bias)){ # Get probability values for cells where no sampling has been conducted diff --git a/R/scenario.R b/R/scenario.R index 062d7aab..5c5ea0ff 100644 --- a/R/scenario.R +++ b/R/scenario.R @@ -26,7 +26,7 @@ methods::setGeneric("scenario", function(fit, limits = NULL, copy_model = FALSE) standardGeneric("scenario")) #' @name scenario -#' @usage \S4method{scenario}{ANY, ANY, logical}(fit, limits, copy_model) +#' @usage \S4method{scenario}{ANY,ANY,logical}(fit,limits,copy_model) #' @rdname scenario methods::setMethod( "scenario", diff --git a/R/similarity.R b/R/similarity.R index 72bdbad9..ce68d2db 100644 --- a/R/similarity.R +++ b/R/similarity.R @@ -18,7 +18,7 @@ NULL #' @param full should similarity values be returned for all variables (Default: \code{FALSE})? #' @param plot Should the result be plotted? Otherwise return the output list (Default: \code{TRUE}). #' @param ... other options (Non specified). -#' @return +#' @returns #' This function returns a list containing: #' * `similarity`: A `SpatRaster` object with multiple layers giving the environmental #' similarities for each variable in `x` (only included when \code{"full=TRUE"}); @@ -41,7 +41,8 @@ NULL #' * Mesgaran, M.B., Cousens, R.D. and Webber, B.L. (2014) "Here be dragons: a tool #' for quantifying novelty due to covariate range and correlation change when projecting #' species distribution models" https://doi.org/10.1111/ddi.12209 _Diversity and Distributions_, 20: 1147-1159. -#' @seealso [`dismo`] R-package. +#' @seealso dismo R-package. 
+#' @aliases similarity #' @name similarity #' @export #' @examples @@ -65,7 +66,7 @@ methods::setGeneric( #' Similarity of used predictors from a trained distribution model #' @name similarity #' @rdname similarity -#' @usage \S4method{similarity}{BiodiversityDistribution, character, character, character, logical, logical}(obj, ref_type, method, predictor_names, full, plot) +#' @usage \S4method{similarity}{BiodiversityDistribution,character,character,character,logical,logical}(obj,ref_type,method,predictor_names,full,plot,...) methods::setMethod( "similarity", methods::signature(obj = "BiodiversityDistribution"), @@ -165,7 +166,7 @@ methods::setMethod( #' Similarity of used predictors by providing a SpatRaster directly #' @name similarity #' @rdname similarity -#' @usage \S4method{similarity}{SpatRaster}(obj) +#' @usage \S4method{similarity}{SpatRaster,sf,character,logical,logical}(obj,ref,method,full,plot,...) methods::setMethod( "similarity", methods::signature(obj = "SpatRaster"), @@ -209,7 +210,7 @@ methods::setMethod( # Relabel most important out$mod <- terra::as.factor(out$mod) levels(out$mod) <- data.frame(ID = levels(out$mod)[[1]][,1], - variable = names(covs)) + variable = names(obj)) } else { stop('Not yet implemented!') diff --git a/R/threshold.R b/R/threshold.R index 6adc0202..7dbcca76 100644 --- a/R/threshold.R +++ b/R/threshold.R @@ -1,387 +1,381 @@ -#' @include utils.R -NULL - -#' Threshold a continuous prediction to a categorical layer -#' -#' @description -#' It is common in many applications of species distribution modelling that estimated -#' continuous suitability surfaces are converted into discrete representations of where -#' suitable habitat might or might not exist. This so called *threshold'ing* -#' can be done in various ways which are further described in the details. -#' -#' In case a [`SpatRaster`] is provided as input in this function -#' for \code{obj}, it is furthermore necessary to provide a [`sf`] object for validation as -#' there is no [`DistributionModel`] to read this information from. -#' **Note:** This of course also allows to estimate the threshold based on withheld data, for instance -#' those created from an a-priori cross-validation procedure. -#' -#' For [`BiodiversityScenario`] objects, adding this function to the processing pipeline -#' stores a threshold attribute in the created [scenario] object. -#' -#' @param obj A trained [`DistributionModel`] or alternatively a [`SpatRaster`] object. -#' @param method A specifc method for thresholding. See details for available options. -#' @param value A [`numeric`] value for thresholding if method is fixed (Default: \code{NULL}). -#' @param point A [`sf`] object containing observational data used for model training. -#' @param format [`character`] indication of whether \code{"binary"}, \code{"normalize"} or \code{"percentile"} -#' formatted thresholds are to be created (Default: \code{"binary"}). Also see Muscatello et al. (2021). -#' @param ... other parameters not yet set. -#' @param return_threshold Should threshold value be returned instead (Default: \code{FALSE}) -#' @details -#' The following options are currently implemented: -#' * \code{'fixed'} = applies a single pre-determined threshold. Requires \code{value} to be set. -#' * \code{'mtp'} = minimum training presence is used to find and set the lowest predicted suitability for any occurrence point. -#' * \code{'percentile'} = For a percentile threshold. A \code{value} as parameter has to be set here. 
-#' * \code{'min.cv'} = Threshold the raster so to minimize the coefficient of variation (cv) of the posterior. Uses the lowest tercile of the cv in space. Only feasible with Bayesian engines. -#' * \code{'TSS'} = Determines the optimal TSS (True Skill Statistic). Requires the [`modEvA`] package to be installed. -#' * \code{'kappa'} = Determines the optimal kappa value (Kappa). Requires the [`modEvA`] package to be installed. -#' * \code{'F1score'} = Determines the optimal F1score (also known as Sorensen similarity). Requires the [`modEvA`] package to be installed. -#' * \code{'F1score'} = Determines the optimal sensitivity of presence records. Requires the [`modEvA`] package to be installed. -#' * \code{'Sensitivity'} = Determines the optimal sensitivity of presence records. Requires the [`modEvA`] package to be installed. -#' * \code{'Specificity'} = Determines the optimal sensitivity of presence records. Requires the [`modEvA`] package to be installed. -#' @name threshold -#' @references -#' * Lawson, C.R., Hodgson, J.A., Wilson, R.J., Richards, S.A., 2014. Prevalence, thresholds and the performance of presence-absence models. Methods Ecol. Evol. 5, 54–64. https://doi.org/10.1111/2041-210X.12123 -#' * Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789. https://doi.org/10.1111/jbi.12058 -#' * Muscatello, A., Elith, J., Kujala, H., 2021. How decisions about fitting species distribution models affect conservation outcomes. Conserv. Biol. 35, 1309–1320. https://doi.org/10.1111/cobi.13669 -#' @seealso [`modEvA`] -#' @returns A [SpatRaster] if a [SpatRaster] object as input. -#' Otherwise the threshold is added to the respective [`DistributionModel`] or [`BiodiversityScenario`] object. -#' @examples -#' \dontrun{ -#' # Where mod is an estimated DistributionModel -#' tr <- threshold(mod) -#' tr$plot_threshold() -#' } -#' @export -NULL - -#' @name threshold -#' @rdname threshold -#' @exportMethod threshold -#' @export -methods::setGeneric( - "threshold", - signature = methods::signature("obj", "method", "value"), - function(obj, method = 'mtp', value = NULL, point = NULL, format = "binary", return_threshold = FALSE, ...) standardGeneric("threshold")) - -#' Generic threshold with supplied DistributionModel object -#' @name threshold -#' @rdname threshold -#' @usage \S4method{threshold}{ANY}(obj) -methods::setMethod( - "threshold", - methods::signature(obj = "ANY"), - function(obj, method = 'mtp', value = NULL, format = "binary", return_threshold = FALSE, ...) { - assertthat::assert_that(any( class(obj) %in% getOption('ibis.engines') ), - is.character(method), - is.null(value) || is.numeric(value), - is.character(format) - ) - # Check other and add legacy handling - dots <- list(...) - if("truncate" %in% names(dots)) format <- ifelse(dots[[truncate]],"normalize", "binary") - format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE) - - # Get prediction raster - ras <- obj$get_data('prediction') - # Get model object - model <- obj$model - - # Check that the object actually contains a prediction - assertthat::assert_that( - is.Raster(ras), - !is.Waiver(ras), - msg = 'No fitted prediction in object!' 
- ) - # Matching for correct method - method <- match.arg(method, c('fixed','mtp','percentile','min.cv', - 'TSS','kappa','F1score','Sensitivity','Specificity'), several.ok = FALSE) - - # If method is min.cv, check that posterior is accessible - if(method == "min.cv") assertthat::assert_that("cv" %in% names(ras), msg = "Method min.cv requires a posterior prediction and coefficient of variation!") - - # Get all point data in distribution model - point <- do.call(sf:::rbind.sf, - lapply(obj$model$biodiversity, function(y){ - o <- guess_sf(y$observations) - o$name <- y$name; o$type <- y$type - subset(o, select = c('observed', "name", "type", "geometry")) - } ) - ) |> tibble::remove_rownames() - suppressWarnings( - point <- sf::st_set_crs(point, value = sf::st_crs(obj$get_data('prediction'))) - ) - - # If TSS or kappa is chosen, check whether there is poipa data among the sources - if((!any(point$observed==0) & method %in% c('TSS','kappa','F1score','Sensitivity','Specificity')) || length(unique(point$name)) > 1){ - if(getOption('ibis.setupmessages')) myLog('[Threshold]','red','Threshold method needs absence-data. Generating some now...') - bg <- terra::rasterize(obj$model$background, emptyraster(obj$get_data('prediction'))) - abs <- add_pseudoabsence(df = point, - field_occurrence = 'observed', - template = bg, - # Assuming that settings are comparable among objects - settings = model$biodiversity[[1]]$pseudoabsence_settings - ) - - abs <- subset(abs, select = c('x','y'));abs$observed <- 0 - abs <- guess_sf(abs) - abs$name <- 'Background point'; abs$type <- "generated" - suppressWarnings( - abs <- sf::st_set_crs(abs, value = sf::st_crs(obj$get_data('prediction'))) - ) - point <- subset(point, select = c("observed", "name", "type","geometry")) - abs <- subset(abs, select = c("observed", "name", "type","geometry")) - point <- rbind(point, abs);rm(abs) - } - - # Convert to sf - if(!inherits(point,"sf")){ point <- guess_sf(point) } - - # Now self call threshold - out <- threshold(ras, method = method, value = value, point = point, format = format,...) - assertthat::assert_that(is.Raster(out)) - # Add result to new obj and clean up old thresholds before - tr_lyr <- grep('threshold', obj$show_rasters(),value = TRUE) - new_obj <- obj - if(length(tr_lyr)>0) for(v in tr_lyr) new_obj$rm_threshold() - new_obj <- new_obj$set_data(paste0("threshold_", method), out) - # Return altered object - return(new_obj) - } -) - -#' @noRd -#' @keywords internal -.stackthreshold <- function(obj, method = 'fixed', value = NULL, - point = NULL, format = "binary", return_threshold = FALSE, ...) { - assertthat::assert_that(is.Raster(obj), - is.character(method), - inherits(point,'sf'), - is.null(value) || is.numeric(value), - is.character(format) - ) - # Match format - format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE) - - # Apply threshold on each entry - if(return_threshold){ - # Return the threshold directly - out <- vector() - for(i in names(obj)) out <- c(out, threshold(obj[[i]], method = method, - value = value, point = point, format = format, return_threshold = return_threshold, ...) 
) - names(out) <- names(obj) - } else { - # Return the raster instead - out <- terra::rast() - if(method == "min.cv"){ - # If the coefficient of variation is to be minmized, mask first all values with the threshold only - assertthat::assert_that(terra::nlyr(obj)>2, "sd" %in% names(obj)) - # Get global coefficient of variation - errortr <- quantile(obj[["cv"]], .3) - assertthat::assert_that(is.numeric(errortr)) - # Create mask - mm <- obj[["cv"]] - mm[mm > errortr] <- NA - obj <- terra::mask(obj, mm); rm(mm) - # Set the value to errortr - value <- errortr - } - # Now loop - for(i in names(obj)) out <- c(out, threshold(obj[[i]], method = method, - value = value, point = point, format = format, - return_threshold = return_threshold, ...) ) - } - return(out) -} - -#' @name threshold -#' @rdname threshold -#' @inheritParams threshold -#' @usage \S4method{threshold}{SpatRasterDataset}(obj) -methods::setMethod("threshold", methods::signature(obj = "SpatRasterDataset"), .stackthreshold) - -#' @name threshold -#' @rdname threshold -#' @usage \S4method{threshold}{SpatRaster}(obj) -methods::setMethod( - "threshold", - methods::signature(obj = "SpatRaster"), - function(obj, method = 'fixed', value = NULL, point = NULL, format = "binary", return_threshold = FALSE, plot = FALSE) { - assertthat::assert_that(is.Raster(obj), - inherits(obj,'SpatRaster'), - is.character(method), - is.null(value) || is.numeric(value), - is.character(format) - ) - # Match format - format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE) - - # If poi is set, try to convert sf - if(!is.null(point)) try({point <- sf::st_as_sf(point)}, silent = TRUE) - assertthat::assert_that(is.null(point) || inherits(point,'sf')) - - # Match to correct spelling mistakes - method <- match.arg(method, c('fixed','mtp','percentile','min.cv', - 'TSS','kappa','F1score','Sensitivity','Specificity'), several.ok = FALSE) - - # Check that raster has at least a mean prediction in name - if(!is.null(point)) { - # If observed is a factor, convert to numeric - if(is.factor(point$observed)){ - point$observed <- as.numeric(as.character( point$observed )) - } - assertthat::assert_that(unique(sf::st_geometry_type(point)) %in% c('POINT','MULTIPOINT')) - assertthat::assert_that(utils::hasName(point, 'observed')) - poi_pres <- subset(point, observed > 0) # Remove any eventual absence data for a poi_pres evaluation - } else poi_pres <- NULL - - # Loop through each raster - if(return_threshold) out <- terra::rast() else out <- c() - for(val in names(obj)){ - # Get the raster layer - raster_thresh <- subset(obj, val) - - # Specify by type: - if(method == "fixed"){ - # Fixed threshold. 
Confirm to be set - assertthat::assert_that(is.numeric(value), msg = 'Fixed value is missing!') - tr <- value - } else if(method == "mtp"){ - assertthat::assert_that(!is.null(poi_pres),msg = "Threshold method requires supplied point data!") - # minimum training presence - pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] - # Minimum threshold - tr <- min( stats::na.omit(pointVals) ) - - } else if(method == "percentile"){ - assertthat::assert_that(!is.null(poi_pres), msg = "Threshold method requires supplied point data!") - pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] - pointVals <- subset(pointVals, stats::complete.cases(pointVals)) # Remove any NA or NAN data here - # percentile training threshold - if(is.null(value)) value <- 0.1 # If value is not set, use 10% - if(length(pointVals) < 10) { - perc <- floor(length(pointVals) * (1 - value)) - } else { - perc <- ceiling(length(pointVals) * (1 - value)) - } - tr <- rev(sort(pointVals))[perc] # Percentile threshold - - } else if(method == "min.cv"){ - assertthat::assert_that(!is.null(poi_pres),msg = "Threshold method requires supplied point data!") - assertthat::assert_that(!is.null(value),msg = "Global minimum cv needs to be supplied as value!") - pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] # Extract point only estimates - - # Get standard deviation and calculate percentile - tr <- min( stats::na.omit(pointVals) ) - names(tr) <- "tr" - names(value) <- "min.cv" - # Combine as a vector - tr <- c(tr, value) - - } else { - # Optimized threshold statistics using the modEvA package - # FIXME: Could think of porting these functions but too much effort for now. Rather have users install the package here - check_package("modEvA") - # Assure that point data is correctly specified - assertthat::assert_that(inherits(point, 'sf'), utils::hasName(point, 'observed')) - point$observed <- ifelse(point$observed>1, 1, point$observed) # Ensure that observed is <=1 - assertthat::assert_that(all( unique(point$observed) %in% c(0,1) )) - - # Re-extract point vals but with the full dataset - pointVals <- get_rastervalue(coords = point, env = raster_thresh)[[val]] - assertthat::assert_that(length(pointVals)>2) - # Calculate the optimal thresholds - suppressWarnings( - opt <- modEvA::optiThresh(obs = point$observed, pred = pointVals, - measures = c("TSS","kappa","F1score","Misclass","Omission","Commission", - "Sensitivity","Specificity"), - optimize = "each", plot = plot) - ) - if(method %in% opt$optimals.each$measure){ - tr <- opt$optimals.each$threshold[which(opt$optimals.each$measure==method)] - } else { - # Returning a collection of them as vector - tr <- opt$optimals.each$threshold; names(tr) <- opt$optimals.each$measure - } - } - # Security check - assertthat::assert_that(is.numeric(tr) || is.vector(tr)) - - # -- Threshold -- # - if(return_threshold){ - o <- tr - names(o) <- method - out <- c(out, o) - } else { - # Finally threshold the raster - # Process depending on format - if(format == "binary"){ - # Default is to create a binary presence-absence. 
Otherwise truncated hinge - raster_thresh[raster_thresh < tr[1]] <- 0 - raster_thresh[raster_thresh >= tr[1]] <- 1 - raster_thresh <- terra::as.factor(raster_thresh) - } else if(format == "normalize"){ - raster_thresh[raster_thresh < tr[1]] <- NA - # If truncate, ensure that resulting values are normalized - raster_thresh <- predictor_transform(raster_thresh, option = "norm") - raster_thresh[is.na(raster_thresh)] <- 0 - raster_thresh <- terra::mask(raster_thresh, obj[val]>=0) - base::attr(raster_thresh, 'truncate') <- TRUE # Legacy truncate attribute - - } else if(format == "percentile") { - raster_thresh[raster_thresh < tr[1]] <- NA - raster_thresh <- predictor_transform(raster_thresh, option = "percentile") - raster_thresh <- terra::mask(raster_thresh, obj[val]>=0) - base::attr(raster_thresh, 'truncate') <- TRUE - } - - names(raster_thresh) <- paste0('threshold_',val,'_',method) - # Assign attributes - base::attr(raster_thresh, 'method') <- method - base::attr(raster_thresh, 'format') <- format - base::attr(raster_thresh, 'threshold') <- tr - - # Append - suppressWarnings( out <- c(out, raster_thresh) ) - } - } - - # Return output - if(is.list(out)) out <- do.call(c, out) - return( out ) - } -) - -#### For scenarios #### - -#' Thresholds in scenario estimation -#' -#' @name threshold -#' @inheritParams threshold -#' @param tr A [`numeric`] value specifiying the specific threshold for scenarios. -#' @rdname threshold -#' @usage \S4method{threshold}{BiodiversityScenario}(obj) -methods::setMethod( - "threshold", - methods::signature(obj = "BiodiversityScenario"), - function(obj, tr = new_waiver(), ...) { - - # Assert that predicted raster is present - assertthat::assert_that( is.Raster(obj$get_model()$get_data('prediction')) ) - # Unless set, check - if(is.Waiver(tr)){ - # Check that a threshold layer is available and get the methods and data from it - assertthat::assert_that( length( grep('threshold', obj$get_model()$show_rasters()) ) >0 , - msg = 'Call \' threshold \' for prediction first!') - # Get threshold layer - tr_lyr <- grep('threshold', obj$get_model()$show_rasters(),value = TRUE) - if(length(tr_lyr)>1) warning("There appear to be multiple thresholds. Using the first one.") - ras_tr <- obj$get_model()$get_data( tr_lyr[1] ) - tr <- attr(ras_tr[[1]], 'threshold') - names(tr) <- attr(ras_tr[[1]], 'method') - } else { - assertthat::assert_that(is.numeric(tr)) - } - bdproto(NULL, obj, threshold = tr) - } -) +#' @include utils.R +NULL + +#' Threshold a continuous prediction to a categorical layer +#' +#' @description +#' It is common in many applications of species distribution modelling that estimated +#' continuous suitability surfaces are converted into discrete representations of where +#' suitable habitat might or might not exist. This so called *threshold'ing* +#' can be done in various ways which are further described in the details. +#' +#' In case a [`SpatRaster`] is provided as input in this function +#' for \code{obj}, it is furthermore necessary to provide a [`sf`] object for validation as +#' there is no [`DistributionModel`] to read this information from. +#' **Note:** This of course also allows to estimate the threshold based on withheld data, for instance +#' those created from an a-priori cross-validation procedure. +#' +#' For [`BiodiversityScenario`] objects, adding this function to the processing pipeline +#' stores a threshold attribute in the created [scenario] object. +#' +#' @param obj A trained [`DistributionModel`] or alternatively a [`SpatRaster`] object. 
+#' @param method A specific method for thresholding. See details for available options.
+#' @param value A [`numeric`] value for thresholding if method is fixed (Default: \code{NULL}).
+#' @param point A [`sf`] object containing observational data used for model training.
+#' @param format [`character`] indication of whether \code{"binary"}, \code{"normalize"} or \code{"percentile"}
+#' formatted thresholds are to be created (Default: \code{"binary"}). Also see Muscatello et al. (2021).
+#' @param return_threshold Should the threshold value be returned instead (Default: \code{FALSE})?
+#' @param ... other parameters not yet set.
+#' @details
+#' The following options are currently implemented:
+#' * \code{'fixed'} = applies a single pre-determined threshold. Requires \code{value} to be set.
+#' * \code{'mtp'} = minimum training presence is used to find and set the lowest predicted suitability for any occurrence point.
+#' * \code{'percentile'} = For a percentile threshold. A \code{value} as parameter has to be set here.
+#' * \code{'min.cv'} = Threshold the raster so as to minimize the coefficient of variation (cv) of the posterior. Uses the lowest tercile of the cv in space. Only feasible with Bayesian engines.
+#' * \code{'TSS'} = Determines the optimal TSS (True Skill Statistic). Requires the \code{"modEvA"} package to be installed.
+#' * \code{'kappa'} = Determines the optimal kappa value (Kappa). Requires the \code{"modEvA"} package to be installed.
+#' * \code{'F1score'} = Determines the optimal F1score (also known as Sorensen similarity). Requires the \code{"modEvA"} package to be installed.
+#' * \code{'Sensitivity'} = Determines the optimal sensitivity of presence records. Requires the \code{"modEvA"} package to be installed.
+#' * \code{'Specificity'} = Determines the optimal specificity of absence records. Requires the \code{"modEvA"} package to be installed.
+#' @name threshold
+#' @references
+#' * Lawson, C.R., Hodgson, J.A., Wilson, R.J., Richards, S.A., 2014. Prevalence, thresholds and the performance of presence-absence models. Methods Ecol. Evol. 5, 54–64. https://doi.org/10.1111/2041-210X.12123
+#' * Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789. https://doi.org/10.1111/jbi.12058
+#' * Muscatello, A., Elith, J., Kujala, H., 2021. How decisions about fitting species distribution models affect conservation outcomes. Conserv. Biol. 35, 1309–1320. https://doi.org/10.1111/cobi.13669
+#' @seealso \code{"modEvA"}
+#' @returns A [SpatRaster] if a [SpatRaster] object was provided as input.
+#' Otherwise the threshold is added to the respective [`DistributionModel`] or [`BiodiversityScenario`] object.
+#' @aliases threshold
+#' @examples
+#' \dontrun{
+#' # Where mod is an estimated DistributionModel
+#' tr <- threshold(mod)
+#' tr$plot_threshold()
+#' }
+#' @export
+NULL
+
+#' @name threshold
+#' @rdname threshold
+#' @exportMethod threshold
+#' @export
+methods::setGeneric(
+  "threshold",
+  signature = methods::signature("obj", "method", "value"),
+  function(obj, method = 'mtp', value = NULL, point = NULL, format = "binary", return_threshold = FALSE, ...)
standardGeneric("threshold"))
+
+#' Generic threshold with supplied DistributionModel object
+#' @name threshold
+#' @rdname threshold
+#' @usage \S4method{threshold}{ANY,character,numeric,ANY,character,logical}(obj,method,value,point,format,return_threshold,...)
+methods::setMethod(
+  "threshold",
+  methods::signature(obj = "ANY"),
+  function(obj, method = 'mtp', value = NULL, point = NULL, format = "binary", return_threshold = FALSE, ...) {
+    assertthat::assert_that(any( class(obj) %in% getOption('ibis.engines') ),
+                            is.character(method),
+                            is.null(value) || is.numeric(value),
+                            is.character(format)
+    )
+    # Check other and add legacy handling
+    dots <- list(...)
+    if("truncate" %in% names(dots)) format <- ifelse(dots[["truncate"]], "normalize", "binary")
+    format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE)
+
+    # Get prediction raster
+    ras <- obj$get_data('prediction')
+    # Get model object
+    model <- obj$model
+
+    # Check that the object actually contains a prediction
+    assertthat::assert_that(
+      is.Raster(ras),
+      !is.Waiver(ras),
+      msg = 'No fitted prediction in object!'
+    )
+    # Matching for correct method
+    method <- match.arg(method, c('fixed','mtp','percentile','min.cv',
+                                  'TSS','kappa','F1score','Sensitivity','Specificity'), several.ok = FALSE)
+
+    # If method is min.cv, check that posterior is accessible
+    if(method == "min.cv") assertthat::assert_that("cv" %in% names(ras), msg = "Method min.cv requires a posterior prediction and coefficient of variation!")
+
+    # Get all point data in distribution model
+    if(is.null(point)){
+      point <- collect_occurrencepoints(model = obj$model,
+                                        include_absences = FALSE,
+                                        point_column = "observed",
+                                        addName = TRUE, tosf = TRUE
+      )
+    } else {
+      assertthat::assert_that(sf::st_crs(point) == sf::st_crs(obj$get_data('prediction')))
+    }
+
+    # If TSS or kappa is chosen, check whether there is poipa data among the sources
+    if((!any(point$observed==0) & method %in% c('TSS','kappa','F1score','Sensitivity','Specificity')) || length(unique(point$name)) > 1){
+      if(getOption('ibis.setupmessages')) myLog('[Threshold]','red','Threshold method needs absence-data. Generating some now...')
+      bg <- terra::rasterize(obj$model$background, emptyraster(obj$get_data('prediction')))
+      abs <- add_pseudoabsence(df = point,
+                               field_occurrence = 'observed',
+                               template = bg,
+                               # Assuming that settings are comparable among objects
+                               settings = model$biodiversity[[1]]$pseudoabsence_settings
+      )
+
+      abs <- subset(abs, select = c('x','y')); abs$observed <- 0
+      abs <- guess_sf(abs)
+      abs$name <- 'Background point'; abs$type <- "generated"
+      suppressWarnings(
+        abs <- sf::st_set_crs(abs, value = sf::st_crs(obj$get_data('prediction')))
+      )
+      point <- subset(point, select = c("observed", "name", "type","geometry"))
+      abs <- subset(abs, select = c("observed", "name", "type","geometry"))
+      point <- rbind(point, abs); rm(abs)
+    }
+
+    # Convert to sf
+    if(!inherits(point,"sf")){ point <- guess_sf(point) }
+
+    # Now self call threshold
+    out <- threshold(ras, method = method, value = value, point = point, format = format,...)
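+    # The self-call above dispatches to the SpatRaster method defined further
+    # below, which derives the threshold value(s) and formats the output layer(s).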
+    assertthat::assert_that(is.Raster(out))
+    # Add result to new obj and clean up old thresholds before
+    tr_lyr <- grep('threshold', obj$show_rasters(), value = TRUE)
+    new_obj <- obj
+    if(length(tr_lyr)>0) for(v in tr_lyr) new_obj$rm_threshold()
+    new_obj <- new_obj$set_data(paste0("threshold_", method), out)
+    # Return altered object
+    return(new_obj)
+  }
+)
+
+#' @noRd
+#' @keywords internal
+.stackthreshold <- function(obj, method = 'fixed', value = NULL,
+                            point = NULL, format = "binary", return_threshold = FALSE, ...) {
+  assertthat::assert_that(is.Raster(obj),
+                          is.character(method),
+                          inherits(point,'sf'),
+                          is.null(value) || is.numeric(value),
+                          is.character(format)
+  )
+  # Match format
+  format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE)
+
+  # Apply threshold on each entry
+  if(return_threshold){
+    # Return the threshold directly
+    out <- vector()
+    for(i in names(obj)) out <- c(out, threshold(obj[[i]], method = method,
+                                                 value = value, point = point, format = format, return_threshold = return_threshold, ...) )
+    names(out) <- names(obj)
+  } else {
+    # Return the raster instead
+    out <- terra::rast()
+    if(method == "min.cv"){
+      # If the coefficient of variation is to be minimized, mask first all values with the threshold only
+      assertthat::assert_that(terra::nlyr(obj)>2, "cv" %in% names(obj))
+      # Get global coefficient of variation
+      errortr <- quantile(obj[["cv"]], .3)
+      assertthat::assert_that(is.numeric(errortr))
+      # Create mask
+      mm <- obj[["cv"]]
+      mm[mm > errortr] <- NA
+      obj <- terra::mask(obj, mm); rm(mm)
+      # Set the value to errortr
+      value <- errortr
+    }
+    # Now loop
+    for(i in names(obj)) out <- c(out, threshold(obj[[i]], method = method,
+                                                 value = value, point = point, format = format,
+                                                 return_threshold = return_threshold, ...)
) + } + return(out) +} + +#' @name threshold +#' @rdname threshold +#' @usage \S4method{threshold}{SpatRaster,character,ANY,ANY,character,logical}(obj,method,value,point,format,return_threshold) +methods::setMethod( + "threshold", + methods::signature(obj = "SpatRaster"), + function(obj, method = 'fixed', value = NULL, point = NULL, format = "binary", return_threshold = FALSE) { + assertthat::assert_that(is.Raster(obj), + inherits(obj,'SpatRaster'), + is.character(method), + is.null(value) || is.numeric(value), + is.character(format) + ) + # Match format + format <- match.arg(format, c("binary", "normalize", "percentile"), several.ok = FALSE) + + # If poi is set, try to convert sf + if(!is.null(point)) try({point <- sf::st_as_sf(point)}, silent = TRUE) + assertthat::assert_that(is.null(point) || inherits(point,'sf')) + + # Match to correct spelling mistakes + method <- match.arg(method, c('fixed','mtp','percentile','min.cv', + 'TSS','kappa','F1score','Sensitivity','Specificity'), several.ok = FALSE) + + # Check that raster has at least a mean prediction in name + if(!is.null(point)) { + # If observed is a factor, convert to numeric + if(is.factor(point$observed)){ + point$observed <- as.numeric(as.character( point$observed )) + } + assertthat::assert_that(unique(sf::st_geometry_type(point)) %in% c('POINT','MULTIPOINT')) + assertthat::assert_that(utils::hasName(point, 'observed')) + poi_pres <- subset(point, observed > 0) # Remove any eventual absence data for a poi_pres evaluation + } else poi_pres <- NULL + + # Loop through each raster + if(return_threshold) out <- terra::rast() else out <- c() + for(val in names(obj)){ + # Get the raster layer + raster_thresh <- subset(obj, val) + + # Specify by type: + if(method == "fixed"){ + # Fixed threshold. Confirm to be set + assertthat::assert_that(is.numeric(value), msg = 'Fixed value is missing!') + tr <- value + } else if(method == "mtp"){ + assertthat::assert_that(!is.null(poi_pres),msg = "Threshold method requires supplied point data!") + # minimum training presence + pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] + # Minimum threshold + tr <- min( stats::na.omit(pointVals) ) + + } else if(method == "percentile"){ + assertthat::assert_that(!is.null(poi_pres), msg = "Threshold method requires supplied point data!") + pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] + pointVals <- subset(pointVals, stats::complete.cases(pointVals)) # Remove any NA or NAN data here + # percentile training threshold + if(is.null(value)) value <- 0.1 # If value is not set, use 10% + if(length(pointVals) < 10) { + perc <- floor(length(pointVals) * (1 - value)) + } else { + perc <- ceiling(length(pointVals) * (1 - value)) + } + tr <- rev(sort(pointVals))[perc] # Percentile threshold + + } else if(method == "min.cv"){ + assertthat::assert_that(!is.null(poi_pres),msg = "Threshold method requires supplied point data!") + assertthat::assert_that(!is.null(value),msg = "Global minimum cv needs to be supplied as value!") + pointVals <- get_rastervalue(coords = poi_pres, env = raster_thresh)[[val]] # Extract point only estimates + + # Get standard deviation and calculate percentile + tr <- min( stats::na.omit(pointVals) ) + names(tr) <- "tr" + names(value) <- "min.cv" + # Combine as a vector + tr <- c(tr, value) + + } else { + # Optimized threshold statistics using the modEvA package + # FIXME: Could think of porting these functions but too much effort for now. 
Rather have users install the package here + check_package("modEvA") + # Assure that point data is correctly specified + assertthat::assert_that(inherits(point, 'sf'), utils::hasName(point, 'observed')) + point$observed <- ifelse(point$observed>1, 1, point$observed) # Ensure that observed is <=1 + assertthat::assert_that(all( unique(point$observed) %in% c(0,1) )) + + # Re-extract point vals but with the full dataset + pointVals <- get_rastervalue(coords = point, env = raster_thresh)[[val]] + assertthat::assert_that(length(pointVals)>2) + # Calculate the optimal thresholds + suppressWarnings( + opt <- modEvA::optiThresh(obs = point$observed, pred = pointVals, + measures = c("TSS","kappa","F1score","Misclass","Omission","Commission", + "Sensitivity","Specificity"), + optimize = "each", plot = FALSE) + ) + if(method %in% opt$optimals.each$measure){ + tr <- opt$optimals.each$threshold[which(opt$optimals.each$measure==method)] + } else { + # Returning a collection of them as vector + tr <- opt$optimals.each$threshold; names(tr) <- opt$optimals.each$measure + } + } + # Security check + assertthat::assert_that(is.numeric(tr) || is.vector(tr)) + + # -- Threshold -- # + if(return_threshold){ + o <- tr + names(o) <- method + out <- c(out, o) + } else { + # Finally threshold the raster + # Process depending on format + if(format == "binary"){ + # Default is to create a binary presence-absence. Otherwise truncated hinge + raster_thresh[raster_thresh < tr[1]] <- 0 + raster_thresh[raster_thresh >= tr[1]] <- 1 + raster_thresh <- terra::as.factor(raster_thresh) + } else if(format == "normalize"){ + raster_thresh[raster_thresh < tr[1]] <- NA + # If truncate, ensure that resulting values are normalized + raster_thresh <- predictor_transform(raster_thresh, option = "norm") + raster_thresh[is.na(raster_thresh)] <- 0 + raster_thresh <- terra::mask(raster_thresh, obj[val]>=0) + base::attr(raster_thresh, 'truncate') <- TRUE # Legacy truncate attribute + + } else if(format == "percentile") { + raster_thresh[raster_thresh < tr[1]] <- NA + raster_thresh <- predictor_transform(raster_thresh, option = "percentile") + raster_thresh <- terra::mask(raster_thresh, obj[val]>=0) + base::attr(raster_thresh, 'truncate') <- TRUE + } + + names(raster_thresh) <- paste0('threshold_',val,'_',method) + # Assign attributes + base::attr(raster_thresh, 'method') <- method + base::attr(raster_thresh, 'format') <- format + base::attr(raster_thresh, 'threshold') <- tr + + # Append + suppressWarnings( out <- c(out, raster_thresh) ) + } + } + + # Return output + if(is.list(out)) out <- do.call(c, out) + return( out ) + } +) + +#### For scenarios #### + +#' Thresholds in scenario estimation +#' +#' @name threshold +#' @param obj A [BiodiversityScenario] object to which an existing threshold is to be added. +#' @param tr A [`numeric`] value specifying the specific threshold for scenarios (Default: Grab from object). 
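+#' @examples
+#' \dontrun{
+#' # A hedged sketch (object names are illustrative): reuse the threshold
+#' # stored with the fitted model when building a scenario.
+#' sc <- scenario(fit)
+#' sc <- threshold(sc)
+#' }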
+#' @rdname threshold +#' @usage \S4method{threshold}{BiodiversityScenario,ANY}(obj,tr) +methods::setMethod( + "threshold", + methods::signature(obj = "BiodiversityScenario"), + function(obj, tr = new_waiver()) { + + # Assert that predicted raster is present + assertthat::assert_that( is.Raster(obj$get_model()$get_data('prediction')) ) + # Unless set, check + if(is.Waiver(tr)){ + # Check that a threshold layer is available and get the methods and data from it + assertthat::assert_that( length( grep('threshold', obj$get_model()$show_rasters()) ) >0 , + msg = 'Call \' threshold \' for prediction first!') + # Get threshold layer + tr_lyr <- grep('threshold', obj$get_model()$show_rasters(),value = TRUE) + if(length(tr_lyr)>1) warning("There appear to be multiple thresholds. Using the first one.") + ras_tr <- obj$get_model()$get_data( tr_lyr[1] ) + tr <- attr(ras_tr[[1]], 'threshold') + names(tr) <- attr(ras_tr[[1]], 'method') + } else { + assertthat::assert_that(is.numeric(tr)) + } + bdproto(NULL, obj, threshold = tr) + } +) diff --git a/R/train.R b/R/train.R index 96e80861..c5d0c65b 100644 --- a/R/train.R +++ b/R/train.R @@ -50,10 +50,10 @@ NULL #' * \code{"none"} No prior variable removal is performed (Default). #' * \code{"pearson"}, \code{"spearman"} or \code{"kendall"} Makes use of pairwise comparisons to identify and #' remove highly collinear predictors (Pearson's \code{r >= 0.7}). -#' * \code{"abess"} A-priori adaptive best subset selection of covariates via the [`abess`] package (see References). +#' * \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{"abess"} package (see References). #' Note that this effectively fits a separate generalized linear model to #' reduce the number of covariates. -#' * \code{"boruta"} Uses the [`Boruta`] package to identify non-informative features. +#' * \code{"boruta"} Uses the \code{"Boruta"} package to identify non-informative features. #' #' @param optim_hyperparam Parameter to tune the model by iterating over input parameters or selection #' of predictors included in each iteration. Can be set to \code{TRUE} if extra precision is @@ -77,7 +77,7 @@ NULL #' added to the predictor stack and thus are predictors for subsequent models (Default). #' * \code{"offset"} The predicted output of the first (or previously fitted) models are #' added as spatial offsets to subsequent models. Offsets are back-transformed depending -#' on the model family. This option might not be supported for every [`engine`]. +#' on the model family. This option might not be supported for every [`Engine`]. #' * \code{"interaction"} Instead of fitting several separate models, the observations from each dataset #' are combined and incorporated in the prediction as a factor interaction with the "weaker" data source being #' partialed out during prediction. Here the first dataset added determines the reference level @@ -128,17 +128,17 @@ methods::setGeneric( signature = methods::signature("x"), function(x, runname, filter_predictors = "none", optim_hyperparam = FALSE, inference_only = FALSE, only_linear = TRUE, method_integration = "predictor", - aggregate_observations = TRUE, clamp = FALSE, verbose = FALSE,...) standardGeneric("train")) + aggregate_observations = TRUE, clamp = FALSE, verbose = getOption('ibis.setupmessages'),...) 
standardGeneric("train")) #' @name train #' @rdname train -#' @usage \S4method{train}{BiodiversityDistribution, character, character, logical, logical, logical, character, logical, logical, logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose) +#' @usage \S4method{train}{BiodiversityDistribution,character,character,logical,logical,logical,character,logical,logical,logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose,...) methods::setMethod( "train", methods::signature(x = "BiodiversityDistribution"), function(x, runname, filter_predictors = "none", optim_hyperparam = FALSE, inference_only = FALSE, only_linear = TRUE, method_integration = "predictor", - aggregate_observations = TRUE, clamp = FALSE, verbose = FALSE,...) { + aggregate_observations = TRUE, clamp = FALSE, verbose = getOption('ibis.setupmessages'),...) { if(missing(runname)) runname <- "Unnamed run" # Make load checks @@ -593,67 +593,89 @@ methods::setMethod( } else { spec_priors <- new_waiver() } model[['priors']] <- spec_priors - # Applying prediction filter based on model input data if specified - # TODO: Potentially outsource to a function in the future + # - Applying prediction filter based on model input data if specified + # Check if MCP should be calculated if(!is.Waiver(x$limits)){ - # Get biodiversity data - coords <- do.call(rbind, lapply(model$biodiversity, function(z) z[['observations']][,c('x','y','observed')] ) ) - coords <- subset(coords, observed > 0) # Remove absences - # Get zones from the limiting area, e.g. those intersecting with input - suppressMessages( - suppressWarnings( - zones <- sf::st_intersection(sf::st_as_sf(coords, coords = c('x','y'), - crs = sf::st_crs(model$background)), - x$limits) - ) - ) - # Limit zones - zones <- subset(x$limits, limit %in% unique(zones$limit) ) - - # Only if some points actually fall in the zones - if(nrow(zones)>0){ - # Now clip all predictors and background to this - model$background <- suppressMessages( - suppressWarnings( sf::st_union( sf::st_intersection(zones, model$background), by_feature = TRUE) |> - sf::st_buffer(dist = 0) |> # 0 distance buffer trick - sf::st_cast("MULTIPOLYGON") + # Build MCP based zones ? + if(x$limits$limits_method=="mcp"){ + # Create a polygon using all available information + # Then overwrite limits + x <- x$set_limits(x = create_mcp(model$biodiversity, x$limits), + mcp_buffer = x$limits$mcp_buffer, + limits_clip = x$limits$limits_clip) + } + + # No clip if limits are to TRUE + if(x$limits$limits_clip){ + # TODO: Potentially outsource to a function in the future + + # Get biodiversity data + coords <- do.call(rbind, lapply(model$biodiversity, function(z) z[['observations']] |> + guess_sf() |> + dplyr::select('observed', 'x', 'y') ) ) + coords <- subset(coords, observed > 0) # Remove absences + # Reproject if necessary + if(sf::st_crs(coords) != sf::st_crs(model$background)){ + coords <- sf::st_transform(coords, sf::st_crs(model$background)) + } + # Get zones from the limiting area, e.g. 
those intersecting with input + suppressMessages( + suppressWarnings( + zones <- sf::st_intersection(sf::st_as_sf(coords, coords = c('x','y'), + crs = sf::st_crs(model$background)), + x$limits$layer) ) ) + # Limit zones + zones <- subset(x$limits$layer, limit %in% unique(zones$limit) ) + + # Only if some points actually fall in the zones + if(nrow(zones)>0){ + # Now clip all predictors and background to this + model$background <- suppressMessages( + suppressWarnings( sf::st_union( sf::st_intersection(zones, model$background), + by_feature = TRUE) |> + sf::st_buffer(dist = 0) |> # 0 distance buffer trick + sf::st_cast("MULTIPOLYGON") + ) + ) - # Extract predictors and offsets again if set - if(!is.Waiver(model$predictors_object)){ - # Using the raster operations is generally faster than point in polygon tests - pred_ov <- model$predictors_object$get_data(df = FALSE) - # Make a rasterized mask of the background - pred_ov <- terra::mask( pred_ov, model$background ) - # Convert Predictors to data.frame, including error catching for raster errors - # FIXME: This could be outsourced - o <- try({ terra::as.data.frame(pred_ov, xy = TRUE, na.rm = FALSE) },silent = TRUE) - if(inherits(o, "try-error")){ - o <- as.data.frame( cbind( terra::crds(pred_ov), - as.matrix( pred_ov )) ) - if(any(is.factor(pred_ov))){ - o[names(pred_ov)[which(is.factor(pred_ov))]] <- factor(o[names(pred_ov)[which(is.factor(pred_ov))]] ) + # Extract predictors and offsets again if set + if(!is.Waiver(model$predictors_object)){ + # Using the raster operations is generally faster than point in polygon tests + pred_ov <- model$predictors_object$get_data(df = FALSE) + # Make a rasterized mask of the background + pred_ov <- terra::mask( pred_ov, model$background ) + # Convert Predictors to data.frame, including error catching for raster errors + # FIXME: This could be outsourced + o <- try({ terra::as.data.frame(pred_ov, xy = TRUE, na.rm = FALSE) },silent = TRUE) + if(inherits(o, "try-error")){ + o <- as.data.frame( cbind( terra::crds(pred_ov), + as.matrix( pred_ov )) ) + if(any(is.factor(pred_ov))){ + o[names(pred_ov)[which(is.factor(pred_ov))]] <- factor(o[names(pred_ov)[which(is.factor(pred_ov))]] ) + } } + model[['predictors']] <- o + model[['predictors_object']]$data <- fill_rasters(o[,c(1,2)*-1], # Remove x and y coordinates for overwriting raster data + model$predictors_object$data) + rm(pred_ov, o) + } else { + model$predictors[which( is.na( + point_in_polygon(poly = model$background, points = model$predictors[,c('x','y')] )[['limit']] + )),model$predictors_names] <- NA # Fill with NA } - model[['predictors']] <- o - model[['predictors_object']]$data <- fill_rasters(o[,c(1,2)*-1], # Remove x and y coordinates for overwriting raster data - model$predictors_object$data) - rm(pred_ov, o) - } else { - model$predictors[which( is.na( - point_in_polygon(poly = model$background, points = model$predictors[,c('x','y')] )[['limit']] - )),model$predictors_names] <- NA # Fill with NA + # The same with offset if specified, Note this operation below is computationally quite costly + # MJ: 18/10/22 Removed below as (re)-extraction further in the pipeline makes this step irrelevant + # if(!is.Waiver(x$offset)){ + # model$offset[which( is.na( + # point_in_polygon(poly = zones, points = model$offset[,c('x','y')] )[['limit']] + # )), "spatial_offset" ] <- NA # Fill with NA + # } } - # The same with offset if specified, Note this operation below is computationally quite costly - # MJ: 18/10/22 Removed below as (re)-extraction further in the 
pipeline makes this step irrelevant - # if(!is.Waiver(x$offset)){ - # model$offset[which( is.na( - # point_in_polygon(poly = zones, points = model$offset[,c('x','y')] )[['limit']] - # )), "spatial_offset" ] <- NA # Fill with NA - # } } } + # Messenger if(getOption('ibis.setupmessages')) myLog('[Estimation]','green','Adding engine-specific parameters.') @@ -1262,7 +1284,7 @@ methods::setMethod( # Clip to limits again to be sure if(!is.Waiver(x$limits)) { if(settings$get('inference_only')==FALSE){ - out <- out$set_data("prediction", terra::mask(out$get_data("prediction"), model$background)) + out <- out$set_data("prediction", terra::mask(out$get_data("prediction"), x$limits$layer)) } out$settings$set("has_limits", TRUE) } else { diff --git a/R/utils-bart.R b/R/utils-bart.R index 261c8843..618ad303 100644 --- a/R/utils-bart.R +++ b/R/utils-bart.R @@ -53,6 +53,7 @@ built_formula_bart <- function(obj){ #' @param model A fitted [dbarts] model. #' @concept Taken from the \pkg{embarcadero} package. #' @return A [`data.frame`] with the variable importance information. +#' @aliases varimp.bart #' @keywords utils, internal #' @noRd varimp.bart <- function(model){ @@ -89,6 +90,7 @@ varimp.bart <- function(model){ #' @param smooth Smoothing factor for the x breaks (works like partials). (Default: \code{1}). #' @param transform Backtransform using pnorm or not. Set to \code{FALSE} if response was not binomial. #' @param values Either a [`numeric`] vector of supplied value ranges or \code{NULL} (Default). +#' @param variable_length A [`numeric`] on the number of partial effects to be derived. #' @param plot Whether a model should be created (Default: \code{TRUE}). #' @concept Function taken and adapted from the [embarcadero] package. #' @references @@ -104,10 +106,12 @@ varimp.bart <- function(model){ #' #' } #' @return A [`SpatRaster`] layer containing the partial effect +#' @aliases bart_partial_effect #' @keywords utils #' @noRd bart_partial_effect <- function (model, x.vars = NULL, equal = FALSE, - smooth = 1, transform = TRUE, values = NULL, plot = TRUE) { + smooth = 1, transform = TRUE, values = NULL, + variable_length = 100,plot = TRUE) { assertthat::assert_that( inherits(model,'bart'), @@ -143,7 +147,7 @@ bart_partial_effect <- function (model, x.vars = NULL, equal = FALSE, } lev <- lapply(c(1:nrow(minmax)), function(i) { seq(minmax$mins[i], minmax$maxs[i], (minmax$maxs[i] - - minmax$mins[i])/(10 * smooth)) + minmax$mins[i])/(variable_length * smooth)) }) for (i in 1:length(lev)) { if (length(lev) == 1) { @@ -157,13 +161,12 @@ bart_partial_effect <- function (model, x.vars = NULL, equal = FALSE, } } } - pd <- dbarts::pdbart(model, xind = x.vars, levs = lev, keepevery = 10, pl = FALSE) + pd <- dbarts::pdbart(model, xind = x.vars, levs = lev, keepevery = variable_length, pl = FALSE) } else { - levq = c(0.05, seq(0.1, 0.9, 0.1/smooth), - 0.95) + levq = c(0.05, seq(0.1, 0.9, length.out = (variable_length-2)/smooth), 0.95) pd <- dbarts::pdbart(model, xind = x.vars, levquants = levq, - keepevery = 10, + keepevery = 10, #levs = list(levq), pl = FALSE) } @@ -195,7 +198,7 @@ bart_partial_effect <- function (model, x.vars = NULL, equal = FALSE, ggplot2::theme_light(base_size = 20) + ggplot2::geom_ribbon(ggplot2::aes(ymin = q05, ymax = q95), fill = "deepskyblue1", alpha = 0.3) + - ggplot2::geom_line(size = 1.25) + + ggplot2::geom_line(linewidth = 1.25) + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5), axis.title.y = ggplot2::element_text(vjust = 1.7)) # If multiple variables, add facets @@ 
-220,6 +223,7 @@ bart_partial_effect <- function (model, x.vars = NULL, equal = FALSE, #' @references #' * Carlson, CJ. embarcadero: Species distribution modelling with Bayesian additive regression trees in r. Methods Ecol Evol. 2020; 11: 850– 858. https://doi.org/10.1111/2041-210X.13389 #' @return A [`SpatRaster`] layer containing the partial effect. +#' @aliases bart_partial_space #' @keywords utils #' @noRd bart_partial_space <- function(model, envs, x.vars = NULL, equal = FALSE, smooth = 1, transform = TRUE){ @@ -283,11 +287,11 @@ bart_partial_space <- function(model, envs, x.vars = NULL, equal = FALSE, smooth dfbin$is <- as.numeric(as.character(dfbin$is)) if (is.Raster(envs) && (terra::nlyr(envs)>1) ) { lyrtmp <- envs[[pd$xlbs[[i]]]] - lyrtr <- terra::reclassify(lyrtmp, as.matrix(dfbin)) + lyrtr <- terra::classify(lyrtmp, as.matrix(dfbin)) } else if (inherits(envs, "list")) { lyrtr <- lapply(envs, function(x) { lyrtmp <- x[[pd$xlbs[[i]]]] - return(terra::reclassify(lyrtmp, as.matrix(dfbin))) + return(terra::classify(lyrtmp, as.matrix(dfbin))) }) } if (exists("pdstack")) { @@ -308,13 +312,13 @@ bart_partial_space <- function(model, envs, x.vars = NULL, equal = FALSE, smooth lyrtmp <- envs[[pd$xlbs[[i]]]] xmat <- data.frame(from = c(min( terra::global(lyrtmp, "min", na.rm = TRUE)[,1], min(df$x)), xmeds), to = c(xmeds, max( terra::global(lyrtmp, "max", na.rm = TRUE)[,1], max(df$x))), becomes = df$med) - lyrtr <- terra::reclassify(lyrtmp, xmat, include.lowest = TRUE) + lyrtr <- terra::classify(lyrtmp, xmat, include.lowest = TRUE) } else if (inherits(x = envs, what = "list")) { lyrtr <- lapply(envs, function(x) { lyrtmp <- x[[pd$xlbs[[i]]]] xmat <- data.frame(from = c(min(terra::global(lyrtmp, "min", na.rm = TRUE)[,1], min(df$x)), xmeds), to = c(xmeds, max(terra::global(lyrtmp, "max", na.rm = TRUE)[,1], max(df$x))), becomes = df$med) - return(terra::reclassify(lyrtmp, xmat, include.lowest = TRUE)) + return(terra::classify(lyrtmp, xmat, include.lowest = TRUE)) }) } # Check if stack exists, otherwise create diff --git a/R/utils-breg.R b/R/utils-breg.R index 9bde8ba9..6b595ede 100644 --- a/R/utils-breg.R +++ b/R/utils-breg.R @@ -148,6 +148,7 @@ setup_prior_boom <- function(form, data, priors, family, exposure = NULL){ #' @param w A [`numeric`] [`vector`] containing the exposure variables for PPMs. Can #' be \code{NULL} if the model is not a PPM. #' @returns A [`data.frame`] with the respective prediction. +#' @aliases predict_boom #' @keywords utils, internal #' @noRd predict_boom <- function(obj, newdata, fam, params, w = NULL) { diff --git a/R/utils-form.R b/R/utils-form.R index d31ba15f..51601bc0 100644 --- a/R/utils-form.R +++ b/R/utils-form.R @@ -30,6 +30,7 @@ logistic <- function(a){ #' @concept bossMaps #' @references #' * Richards, F. J. 1959. A flexible growth function for empirical use. \emph{Journal of Experimental Botany} \strong{10}:290–301 +#' @aliases logisticRichard #' @keywords utils #' @noRd logisticRichard <- function(x, lower = 1, upper = 1, rate = 1, skew = 1, @@ -63,6 +64,10 @@ logit <- function(a){ #' @description back transforms a [numeric] vector using the appropriate link function #' @param x A [`numeric`] vector generated by a model #' @param link [`character`] indicating the link function to use (Default: \code{"log"}). 
+#' @aliases ilink
+#' @examples
+#' ilink(rpois(10,.7), link = "log")
+#'
 #' @noRd
 ilink <- function(x, link = "log"){
   assertthat::assert_that(is.numeric(x),
diff --git a/R/utils-gdb.R b/R/utils-gdb.R
index c128c2f8..c38f8d79 100644
--- a/R/utils-gdb.R
+++ b/R/utils-gdb.R
@@ -125,6 +125,7 @@ built_formula_gdb <- function(model, id, x, settings){
 #' @param nd A new data.frame with all predictors used in fit.
 #' @param template A [`SpatRaster`] object that can be used as spatial template.
 #' @returns A [`RasterLayer`] containing a presence-absence prediction.
+#' @aliases predict_gdbclass
 #' @keywords utils
 #' @noRd
 predict_gdbclass <- function(fit, nd, template){
@@ -262,6 +263,7 @@ rm_insufficient_covs <- function(model, tr = 5){
 #' * Renner, I.W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S.J., Popovic, G. and Warton, D.I., 2015. Point process models for presence‐only analysis. Methods in Ecology and Evolution, 6(4), pp.366-379.
 #' * Fithian, W. & Hastie, T. (2013) Finite-sample equivalence in statistical models for presence-only data. The Annals of Applied Statistics 7, 1917–1939
 #' @return A vector with the weights
+#' @aliases ppm_weights
 #' @keywords utils
 #' @noRd
 ppm_weights <- function(df, pa, bg, use_area = FALSE, weight = 1e-6, type = "DWPR"){
diff --git a/R/utils-glmnet.R b/R/utils-glmnet.R
index d499a9d4..79a37344 100644
--- a/R/utils-glmnet.R
+++ b/R/utils-glmnet.R
@@ -92,7 +92,7 @@ default.regularization <- function(p, m){
     if (!ishinge[i])
       return(0)
     avg <- mean(mm[, i])
-    std <- max(sd(mm[, i]), 1/sqrt(np))
+    std <- max(stats::sd(mm[, i]), 1/sqrt(np))
     std * 0.5/sqrt(np)
   })
   tmindev <- sapply(1:ncol(mm), function(i) {
@@ -100,7 +100,7 @@ default.regularization <- function(p, m){
       0 || sum(mm[, i]) == nrow(mm)), 1, 0)
   })
   pmax(0.001 * (apply(m, 2, max) - apply(m, 2, min)), hmindev,
-       tmindev, apply(as.matrix(mm), 2, sd) * classregularization)
+       tmindev, apply(as.matrix(mm), 2, stats::sd) * classregularization)
 }

 #' Determine best lambda
 #'
@@ -111,6 +111,7 @@ default.regularization <- function(p, m){
 #' By default use the one within 1 SE of minimum lambda, unless it falls on the very first value,
 #' likely indicating an overregularized model. In this case take the minimum value of all lambdas.
 #' @param obj A \code{"glmnet"} object.
+#' @aliases determine_lambda
 #' @keywords internal, utils
 #' @noRd
 determine_lambda <- function(obj){
diff --git a/R/utils-inla.R b/R/utils-inla.R
index 9051c163..6013b93a 100644
--- a/R/utils-inla.R
+++ b/R/utils-inla.R
@@ -490,7 +490,7 @@ coords_in_mesh <- function(mesh, coords) {
 #' TODO: Switch to posterior sampling
 #' https://groups.google.com/g/r-inla-discussion-group/c/y-rQlDVtzmM
 #'
-#' @param mesh x A [`distribution`] object used for fitting an [`INLA`] model.
+#' @param mesh A [`distribution`] object used for fitting an INLA model.
 #' @param mod A trained [`distribution`] model.
 #' @param type The summary statistic to use.
 #' @param backtransf Either NULL or a function.
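For intuition on `ppm_weights()` above: the "DWPR" weighting it refers to (Renner et al. 2015) can be sketched roughly as below. The helper and argument names here are illustrative only and do not mirror the function's actual interface (`df`, `pa`, `bg`, `use_area`, `weight`, `type`).

# Rough sketch of down-weighted Poisson regression (DWPR) weights:
# presence points receive a vanishingly small weight (the function's
# default is weight = 1e-6), while background/quadrature points share
# the total background area among themselves.
dwpr_weights <- function(is_presence, total_area, eps = 1e-6) {
  n_bg <- sum(!is_presence)                   # number of background points
  ifelse(is_presence, eps, total_area / n_bg) # area split across background
}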
@@ -713,10 +713,10 @@ post_prediction <- function(mod, nsamples = 100, myLog('[Summary]','green',paste('Formatted', length(vals), 'posterior samples')) # evaluate_model Function - A <- inlabru:::amatrix_eval(model, data = preds) + A <- inlabru:::ibm_amatrix(model, data = preds) A <- x$engine$data$stk_pred$stk_proj$A - effects <- inlabru:::evaluate_effect_multi_state( + effects <- inlabru::evaluate_effect_multi_state( model$effects[included], state = vals, data = preds, @@ -728,7 +728,7 @@ post_prediction <- function(mod, nsamples = 100, return(effects) } - values <- inlabru::evaluate_predictor( + values <- inlabru:::evaluate_predictor( model, state = state, data = data, @@ -781,7 +781,7 @@ post_prediction <- function(mod, nsamples = 100, } else { smy <- data.frame(apply(data, MARGIN = 1, mean, na.rm = TRUE), - apply(data, MARGIN = 1, sd, na.rm = TRUE), + apply(data, MARGIN = 1, stats::sd, na.rm = TRUE), t(apply(data,MARGIN = 1, quantile, prob = c(0.025, 0.5, 0.975),na.rm = TRUE)), apply(data, MARGIN = 1, min, na.rm = TRUE), apply(data, MARGIN = 1, max, na.rm = TRUE)) diff --git a/R/utils-predictors.R b/R/utils-predictors.R index 681249ff..dccab075 100644 --- a/R/utils-predictors.R +++ b/R/utils-predictors.R @@ -40,6 +40,7 @@ NULL #' new_x <- predictor_transform(x, option = 'scale') #' } #' @keywords utils +#' @aliases predictor_transform #' @export predictor_transform <- function(env, option, windsor_props = c(.05,.95), pca.var = 0.8, method = NULL, ...){ assertthat::assert_that( @@ -272,7 +273,8 @@ predictor_transform <- function(env, option, windsor_props = c(.05,.95), pca.var #' @param method As \code{'option'} for more intuitive method setting. Can be left empty (in this case option has to be set). #' @param ... other options (Non specified). #' @return Returns the derived adjusted [`SpatRaster`] objects of identical resolution. -#' @seealso predictor_derivate +#' @seealso predictor_transform +#' @aliases predictor_derivate #' @examples #' \dontrun{ #' # Create a hinge transformation of one or multiple SpatRaster. @@ -504,6 +506,7 @@ predictor_derivate <- function(env, option, nknots = 4, deriv = NULL, int_variab #' @param fill_method A [`character`] of the method for filling gaps to be used (Default: \code{'ngb'}). #' @param return_na_cells A [`logical`] value of whether the ids of grid cells with NA values is to be returned instead (Default: \code{FALSE}). #' @returns A [`SpatRaster`] object with the same number of layers as the input. +#' @aliases predictor_homogenize_na #' @examples #' \dontrun{ #' # Harmonize predictors @@ -700,7 +703,7 @@ makeBin <- function(v, n, nknots, cutoffs = NULL){ #' This function helps to remove highly correlated variables from a set of predictors. It supports multiple options #' some of which require both environmental predictors and observations, others only predictors. #' -#' Some of the options require different packages to be pre-installed, such as [`ranger`] or [`Boruta`]. +#' Some of the options require different packages to be pre-installed, such as \code{ranger} or \code{Boruta}. #' #' @details #' Available options are: @@ -708,9 +711,9 @@ makeBin <- function(v, n, nknots, cutoffs = NULL){ #' * \code{"none"} No prior variable removal is performed (Default). #' * \code{"pearson"}, \code{"spearman"} or \code{"kendall"} Makes use of pairwise comparisons to identify and #' remove highly collinear predictors (Pearson's \code{r >= 0.7}). 
-#' * \code{"abess"} A-priori adaptive best subset selection of covariates via the [`abess`] package (see References). Note that this +#' * \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{abess} package (see References). Note that this #' effectively fits a separate generalized linear model to reduce the number of covariates. -#' * \code{"boruta"} Uses the [`Boruta`] package to identify non-informative features. +#' * \code{"boruta"} Uses the \code{Boruta} package to identify non-informative features. #' #' @note #' Using this function on predictors effectively means that a separate model is fitted on the data @@ -726,6 +729,7 @@ makeBin <- function(v, n, nknots, cutoffs = NULL){ #' @keywords utils #' @return A [`character`] [`vector`] of variable names to be excluded. #' If the function fails due to some reason return \code{NULL}. +#' @aliases predictor_filter #' @examples #' \dontrun{ #' # Remove highly correlated predictors @@ -794,7 +798,7 @@ predictors_filter_collinearity <- function( env, keep = NULL, cutoff = getOption x <- x[, !(colnames(x) %in% non.numeric.columns)] # Get all variables that are singular or unique in value - singular_var <- which(round( apply(x, 2, var),4) == 0) + singular_var <- which(round( apply(x, 2, stats::var),4) == 0) if(length(singular_var)>0) x <- x[,-singular_var] # Calculate correlation matrix @@ -835,13 +839,14 @@ predictors_filter_collinearity <- function( env, keep = NULL, cutoff = getOption #' @param observed A [`vector`] with observational records to use for determining variable importance. #' @param family A [`character`] indicating the family the observational data originates from. #' @param tune.type [`character`] indicating the type used for subset evaluation. -#' Options are \code{c("gic", "ebic", "bic", "aic", "cv")} as listed in [`abess`]. +#' Options are \code{c("gic", "ebic", "bic", "aic", "cv")} as listed in \code{abess}. #' @param lambda A [`numeric`] single lambda value for regularized best subset selection (Default: \code{0}). #' @param weight Observation weights. When weight = \code{NULL}, we set weight = \code{1} for each observation as default. #' @references #' * abess: A Fast Best Subset Selection Library in Python and R. Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang (2021). arXiv preprint arXiv:2110.09697. #' * A polynomial algorithm for best-subset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. 
Proceedings of the National Academy of Sciences Dec 2020, 117 (52) 33117-33123; doi: 10.1073/pnas.2014241117 #' @keywords utils, internal +#' @aliases predictor_filter_abess #' @returns A [`vector`] of variable names to exclude predictors_filter_abess <- function( env, observed, method, family, tune.type = "cv", lambda = 0, weight = NULL, keep = NULL, ...){ @@ -863,7 +868,9 @@ predictors_filter_abess <- function( env, observed, method, family, tune.type = # Check that abess package is available check_package("abess") - if(!isNamespaceLoaded("abess")) { attachNamespace("abess");requireNamespace('abess') } + if(!("abess" %in% loadedNamespaces()) || ('abess' %notin% utils::sessionInfo()$otherPkgs) ) { + try({requireNamespace('abess');attachNamespace("abess")},silent = TRUE) + } # Build model abess_fit <- abess::abess(x = env, @@ -909,12 +916,12 @@ predictors_filter_abess <- function( env, observed, method, family, tune.type = #' All relevant feature selection using Boruta #' #' @description -#' This function uses the [`Boruta`] package to identify predictor variables with little information content. It iteratively +#' This function uses the \code{Boruta} package to identify predictor variables with little information content. It iteratively #' compares importances of attributes with importances of shadow attributes, created by shuffling original ones. #' Attributes that have significantly worst importance than shadow ones are being consecutively dropped. #' #' @note -#' This package depends on the [`ranger`] package to iteratively fit randomForest models. +#' This package depends on the \code{ranger} package to iteratively fit randomForest models. #' #' @inheritParams predictor_filter #' @param observed A [`vector`] with observational records to use for determining variable importance. @@ -923,6 +930,7 @@ predictors_filter_abess <- function( env, observed, method, family, tune.type = #' @references #' * Miron B. Kursa, Witold R. Rudnicki (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), 1-13. URL https://doi.org/10.18637/jss.v036.i11. #' @keywords utils, internal +#' @aliases predictor_filter_boruta #' @returns A [`vector`] of variable names to exclude. predictors_filter_boruta <- function( env, obs, method, keep = NULL, iter = 100, verbose = getOption('ibis.setupmessages'), ...){ @@ -934,6 +942,9 @@ predictors_filter_boruta <- function( env, obs, method, keep = NULL, is.logical(verbose) ) check_package("Boruta") + if(!("Boruta" %in% loadedNamespaces()) || ('Boruta' %notin% utils::sessionInfo()$otherPkgs) ) { + try({requireNamespace('Boruta');attachNamespace("Boruta")},silent = TRUE) + } # Get all variable names to test vars <- names(env) diff --git a/R/utils-scenario.R b/R/utils-scenario.R index d263f1bc..e9d4f45b 100644 --- a/R/utils-scenario.R +++ b/R/utils-scenario.R @@ -7,6 +7,7 @@ #' @param date_interpolation [`character`] on how missing dates between events should be interpolated. See [`project()`]. #' @return [`logical`] indicating if the two [`SpatRaster-class`] objects have the same. #' @keywords scenario +#' @aliases approximate_gaps #' @noRd approximate_gaps <- function(env, date_interpolation = "annual"){ assertthat::assert_that( @@ -83,6 +84,7 @@ approximate_gaps <- function(env, date_interpolation = "annual"){ #' @param newname A [`character`] with the new name for the variable. #' @param weights An optional variable layer to use for weighting. #' @param fun A function how the respective layers should be combined. 
+#' @aliases st_reduce
 #' @examples
 #' \dontrun{
 #'  st_reduce(obj, vars = c('forestShare', 'forestShare.2'),
@@ -172,6 +174,7 @@ st_reduce <- function(obj, vars, newname, weights = NULL, fun = 'sum'){
 #' of numeric time entries corresponding to the time dimension (Default: \code{NULL}).
 #' @param template An optional [`SpatRaster`] template to which the output should be aligned.
 #' @returns A [`list`] containing [`SpatRaster`] objects.
+#' @aliases stars_to_raster
 #' @keywords scenario, internal
 stars_to_raster <- function(obj, which = NULL, template = NULL){
   assertthat::assert_that(
@@ -244,6 +247,7 @@ stars_to_raster <- function(obj, which = NULL, template = NULL){
 #' stars_to_raster(obj)
 #' }
 #' @seealso `stars_to_raster`
+#' @aliases raster_to_stars
 #' @keywords scenario, internal
 raster_to_stars <- function(obj){
   assertthat::assert_that(
@@ -269,6 +273,8 @@ raster_to_stars <- function(obj){
   new_env <- list()
   for(i in 1:terra::nlyr(obj)){
     oo <- subset(obj, i)
+    # If all time values are identical, drop the time attribute
+    if(length(unique(times))==1) terra::time(oo) <- NULL
     suppressWarnings( o <- stars::st_as_stars(oo) )
     # If CRS is NA
     if(is.na(sf::st_crs(o))) sf::st_crs(o) <- prj
@@ -297,6 +303,7 @@ raster_to_stars <- function(obj){
 #' @param obj A [`stars`] object with a time dimension (\code{"time"}).
 #' @param new A [`SpatRaster`] object with additional covariates to be added.
 #' @returns A [`stars`] object with the names of the [`SpatRaster`] object added.
+#' @aliases st_add_raster
 #' @keywords scenario, internal
 st_add_raster <- function(obj, new){
   assertthat::assert_that(
@@ -339,6 +346,7 @@ st_add_raster <- function(obj, new){
 #' A parameter called \code{"relative"} can be set to calculate relative change instead.
 #' @param scenario A [`stars`] object with a time dimension.
 #' @param relative A [`logical`] check whether to calculate relative changes instead.
+#' @aliases summarise_projection
 #' @keywords internal, scenario
 #' @noRd
 summarise_projection <- function(scenario, fun = "mean", relative = TRUE){
@@ -414,14 +422,60 @@ summarise_projection <- function(scenario, fun = "mean", relative = TRUE){
   return(out)
 }
 
+#' Duplicate a provided stars raster
+#'
+#' @description
+#' This function duplicates a provided [`stars`] object along a given dimension (usually time).
+#' This can be useful for simple multiplication tasks, e.g. multiplying every attribute by another
+#' stars object that needs to have the same dimensions.
+#' @param obj A [`stars`] object that is to be duplicated.
+#' @param dim A dimensions object as returned by \code{st_dimensions(...)}.
+#' @param dimname A [`character`] of the dimension name to be used.
+#' @examples
+#' \dontrun{
+#' o <- st_rep(obj, dim = stars::st_dimensions(obj), dimname = "time")
+#' }
+#'
+#' @returns A [`stars`] object.
+#' @keywords internal, scenario
+#' @noRd
+st_rep <- function(obj, dim, dimname = "time"){
+  assertthat::assert_that(
+    inherits(obj, "stars"),
+    inherits(dim, "dimensions"),
+    dimname %in% names(dim)
+  )
+  # Check that coordinate systems and values are identical
+  assertthat::assert_that(
+    dim[[1]]$to == stars::st_dimensions(obj)[[1]]$to,
+    dim[[1]]$refsys == sf::st_crs(obj),
+    msg = "Coordinate system between provided dimension and obj are not identical!"
+  )
+
+  # Make a sequence of the given dimension
+  it <- seq(dim[[dimname]]$from, dim[[dimname]]$to-1)
+
+  # Make a dummy
+  new <- obj
+  for(i in it) new <- c(new, obj, along = 3)
+
+  stars::st_dimensions(new) <- dim # Redimension
+
+  assertthat::assert_that(inherits(new, "stars"))
+  return(new)
+}
+
 #' Summarize change from before to after
 #'
 #' @description
 #' This is a wrapper function to summarize the output of a scenario projection, but specifically
 #' calculates statistics of change for two time steps, a before and after step.
+#' @note
+#' This function currently requires the \code{geosphere} package to be installed.
 #' @param scenario A [`stars`] object with a time dimension.
 #' @references
 #' * Godsoe, W. (2014). Inferring the similarity of species distributions using Species’ Distribution Models. Ecography, 37(2), 130-136.
+#' @aliases summarise_change
 #' @keywords internal, scenario
 #' @noRd
 summarise_change <- function(scenario){
@@ -430,7 +484,7 @@ summarise_change <- function(scenario){
   )
   # Check that geosphere is installed and loaded
   check_package("geosphere")
-  if(!("geosphere" %in% loadedNamespaces()) || ('geosphare' %notin% utils::sessionInfo()$otherPkgs) ) {
+  if(!("geosphere" %in% loadedNamespaces()) || ('geosphere' %notin% utils::sessionInfo()$otherPkgs) ) {
     try({requireNamespace('geosphere');attachNamespace("geosphere")},silent = TRUE)
   } else {
     if(getOption("ibis.setupmessages")) myLog("[Summary]","red","This summary function requires the geosphere package.")
   }
@@ -504,15 +558,17 @@ summarise_change <- function(scenario){
 #'
 #' @description
 #' The reprojection of WGS84 currently fails due to some unforeseen bug.
-#' This function is meant to reproject back the lasyer
+#' This function is meant to reproject the layer back.
 #' @param obj A ['stars'] object to be clipped and cropped.
 #' @param template A ['SpatRaster'] or ['sf'] object to which the object should be projected.
+#' @param use_gdalutils (Deprecated) A [`logical`] on whether to use the \code{gdalUtils} workaround.
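+#' @examples
+#' \dontrun{
+#' # Hedged sketch, assuming `obj` is a stars layer and `template` a SpatRaster:
+#' # warp obj back onto the template grid via terra instead of gdalUtils.
+#' out <- hack_project_stars(obj, template, use_gdalutils = FALSE)
+#' }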
#' @keywords internal, scenario #' @noRd -hack_project_stars <- function(obj, template){ +hack_project_stars <- function(obj, template, use_gdalutils = TRUE){ assertthat::assert_that( inherits(obj, "stars"), - is.Raster(template) || inherits(template, "sf") + is.Raster(template) || inherits(template, "sf"), + is.logical(use_gdalutils) ) # Get tempdir td <- terra::terraOptions(print = FALSE)[['tempdir']] @@ -528,25 +584,31 @@ hack_project_stars <- function(obj, template){ out <- c() for(v in names(obj)){ sub <- obj[v] - stars::write_stars(sub, file.path(td, "ReprojectedStars.tif")) - - # FIXME: ideally remove proj4 string dependency here - # Re project with terra - # temp <- terra::rast(x = file.path(td, "ReprojectedStars.tif")) - # temp <- terra::project(x = temp, - # y = template, - # align = TRUE, - # gdal = TRUE, - # threads = FALSE) - # terra::writeRaster(temp, file.path(td, "ReprojectedStars_temp.tif"),overwrite = TRUE) - suppressWarnings( - gdalUtils::gdalwarp(srcfile = file.path(td, "ReprojectedStars.tif"), - dstfile = file.path(td, "ReprojectedStars_temp.tif"), - s_srs = "EPSG:4296", - tr = terra::res(template), - te = terra::ext(template) |> st_bbox(), - t_srs = sf::st_crs(template)$proj4string) - ) + + if(use_gdalutils){ + check_package("gdalUtils") + # Write output + stars::write_stars(sub, file.path(td, "ReprojectedStars.tif")) + suppressWarnings( + gdalUtils::gdalwarp(srcfile = file.path(td, "ReprojectedStars.tif"), + dstfile = file.path(td, "ReprojectedStars_temp.tif"), + s_srs = "EPSG:4296", + tr = terra::res(template), + te = terra::ext(template) |> sf::st_bbox(), + t_srs = sf::st_crs(template)$proj4string) + ) + } else { + # Try and use terra + stars::write_stars(sub, file.path(td, "ReprojectedStars.tif")) + # Re project with terra + temp <- terra::rast(x = file.path(td, "ReprojectedStars.tif")) + temp <- terra::project(x = temp, + y = template, + align = TRUE, + gdal = TRUE, + threads = FALSE) + terra::writeRaster(temp, file.path(td, "ReprojectedStars_temp.tif"),overwrite = TRUE) + } oo <- stars::read_stars(file.path(td, "ReprojectedStars_temp.tif"),proxy = F) names(oo) <- v # Rename @@ -583,6 +645,7 @@ hack_project_stars <- function(obj, template){ #' @param layer A [`SpatRaster`] or [`sf`] object for which the centre of the range is to be calculated. #' If the distribution is continuous, then the centre is calculated as the value centre to all non-NA values. #' @param spatial A [`logical`] of whether outputs should be returned as spatial. +#' @aliases calculate_range_centre #' @keywords scenario, internal #' @noRd calculate_range_centre <- function(layer, spatial = TRUE) { diff --git a/R/utils-spatial.R b/R/utils-spatial.R index e0a265c5..e3ef3d31 100644 --- a/R/utils-spatial.R +++ b/R/utils-spatial.R @@ -6,6 +6,7 @@ #' @param x [`SpatRaster-class`] object. #' @param y [`SpatRaster-class`] or [`sf`] object. #' @keywords internal, utils +#' @aliases is_comparable_raster #' @return [`logical`] indicating if the two [`SpatRaster-class`] objects have the same #' resolution, extent, dimensionality, and coordinate system. 
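+#' @examples
+#' # Minimal sketch (not from the package tests): two rasters sharing the
+#' # same default grid, extent and CRS should be comparable.
+#' r1 <- terra::rast(nrows = 10, ncols = 10)
+#' r2 <- terra::rast(nrows = 10, ncols = 10)
+#' is_comparable_raster(r1, r2)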
 #' @noRd
@@ -38,12 +39,13 @@ terra_to_raster <- function(input){
   assertthat::assert_that(
     is.Raster(input)
   )
+  message("Converting a SpatRaster to raster, which should not be necessary!")
   # Check that package is available
   check_package("raster")
   if(!isNamespaceLoaded("raster")) { attachNamespace("raster");requireNamespace("raster") }
 
   out <- terra::as.raster(input)
-  if(raster::nlayers(out)>1) out <- raster::stack(out)
+  if(terra::nlyr(input)>1) out <- raster::stack(out)
   return(out)
 }
 
@@ -135,6 +137,59 @@ point_in_polygon <- function(poly, points, coords = c('x','y')){
   return(ov)
 }
 
+#' Helper function to create a minimum convex polygon for supplied biodiversity data
+#'
+#' @description
+#' This function creates a minimum convex polygon based on supplied spatial data.
+#' @note
+#' This is an internal function that makes use of data prepared within the `train`
+#' call.
+#' @param biod A [`list`] of biodiversity datasets as prepared within the `train` call.
+#' @param limits A [`list`] with the limit settings, including \code{mcp_buffer}.
+#' @returns A [`sf`] object.
+#' @aliases create_mcp
+#' @keywords internal
+#' @noRd
+create_mcp <- function(biod, limits){
+  assertthat::assert_that(
+    is.list(biod), length(biod)>=1,
+    is.list(limits)
+  )
+  check_package("dplyr")
+
+  # First collate all occurrence points in the supplied object
+  obs <- lapply( biod, function(x) {
+    guess_sf( x$observations ) |> dplyr::select("observed")
+  } )
+  obs <- do.call("rbind", obs)
+  # Get only those observations with presence
+  obs <- obs[which(obs[[1]]>0),]
+
+  # Assign unique id
+  obs$id <- 1:nrow(obs)
+  if(!utils::hasName(obs, "geometry")) obs <- rename_geometry(obs, "geometry")
+
+  suppressMessages(
+    suppressWarnings(
+      out <- obs %>%
+        dplyr::group_by( id ) %>%
+        dplyr::summarise( geometry = sf::st_combine( geometry ) ) |>
+        sf::st_union()
+    )
+  )
+  # Convert to convex polygon
+  out <- sf::st_convex_hull(out) |> sf::st_cast("MULTIPOLYGON") |>
+    sf::st_as_sf()
+
+  # Buffer if specified
+  if(limits$mcp_buffer>0) out <- out |> sf::st_buffer(dist = limits$mcp_buffer)
+  # Add a limit field
+  out$limit <- 1:nrow(out)
+  attr(out, "limits_method") <- "mcp"
+
+  assertthat::assert_that(inherits(out, "sf"))
+  return(out)
+}
+
 #' Create mask based on a zonal layer
 #'
 #' @description
@@ -156,6 +211,7 @@ point_in_polygon <- function(poly, points, coords = c('x','y')){
 #' type [`sf`] (Default: \code{"limits"}).
 #' @param template An optional [`SpatRaster`] object on which the zones should be rasterized (Default: \code{NULL}).
 #' @returns A [`sf`] or [`SpatRaster`] object.
+#' @aliases create_zonaloccurrence_mask
 #' @keywords utils, internal
 #' @noRd
 create_zonaloccurrence_mask <- function(df, zones = NULL, buffer_width = NULL, column = "limits", template = NULL){
@@ -279,6 +335,7 @@ extent_expand <- function(e,f=0.1){
 #' @param g A [`sf`] object containing some data.
 #' @param name A [`character`] with the new name for the geometry.
 #' @source https://gis.stackexchange.com/questions/386584/sf-geometry-column-naming-differences-r
+#' @aliases rename_geometry
 #' @keywords internal, utils
 #' @noRd
 rename_geometry <- function(g, name){
@@ -300,6 +357,8 @@ rename_geometry <- function(g, name){
 #'
 #' @param df A [`data.frame`], [`tibble`] or [`sf`] object.
 #' @param geom_name A [`character`] indicating the name of the geometry column (Default: \code{'geometry'}).
+#' @returns A [`sf`] object.
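+#' @examples
+#' \dontrun{
+#' # Illustrative sketch only, assuming a plain data.frame with x/y
+#' # coordinate columns that should be coerced to a sf object:
+#' pts <- data.frame(x = c(0, 1), y = c(0, 1), observed = c(1, 1))
+#' guess_sf(pts)
+#' }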
+#' @aliases guess_sf #' @keywords internal, utils #' @noRd guess_sf <- function(df, geom_name = 'geometry'){ @@ -354,6 +413,7 @@ guess_sf <- function(df, geom_name = 'geometry'){ #' @param background A template [`SpatRaster`] object describing the background. #' @param bandwidth A [`numeric`] of the input bandwidth (Default \code{2}). #' @returns A [`SpatRaster`] with the density of point observations. +#' @aliases st_kde #' @keywords utils, internal #' @noRd st_kde <- function(points, background, bandwidth = 3){ @@ -432,6 +492,7 @@ polygon_to_points <- function(poly, template, field_occurrence ) { #' @param ex Either a [`vector`], a [`SpatExtent`] or alternatively a [`SpatRaster`], [`Spatial*`] or [`sf`] object. #' @param lonlat A [`logical`] indication whether the extent is WGS 84 projection (Default: \code{TRUE}). #' @param output_unit [`character`] determining the units. Allowed is 'm' and 'km' (Default: \code{'km'}). +#' @aliases extent_dimensions #' @keywords utils, internal #' @noRd extent_dimensions <- function(ex, lonlat = terra::is.lonlat(ex), output_unit = 'km') { @@ -505,6 +566,7 @@ extent_dimensions <- function(ex, lonlat = terra::is.lonlat(ex), output_unit = ' #' Nearest Neighbour resampling (near) is recommended for discrete and bilinear #' resampling recommended for continuous data. See also help from [terra::resample] for other options. #' @return New [`SpatRaster`] object aligned to the supplied template layer. +#' @aliases alignRasters #' @examples #' \dontrun{ #' # Align one raster to another @@ -551,8 +613,8 @@ alignRasters <- function(data, template, method = "bilinear", func = mean, cl = #' @param x A \code{SpatRaster*} object corresponding. #' @param ... other arguments that can be passed to \code{\link{terra}} #' @return an empty [`SpatRaster`], i.e. all cells are \code{NA}. -#' @import terra -#' @keywords terra, utils +#' @keywords utils +#' @aliases emptyraster #' @examples #' require(terra) #' r <- rast(matrix(1:100, 5, 20)) @@ -584,6 +646,7 @@ emptyraster <- function(x, ...) { # add name, filename, #' @return A [`data.frame`] with the extracted covariate data from each provided data point. #' @details Nearest neighbour matching is done via the [geodist] R-package (\code{geodist::geodist}). #' @note If multiple values are of equal distance during the nearest neighbour check, then the results is by default averaged. +#' @aliases get_ngbvalue #' @examples #' \dontrun{ #' # Create matchup table @@ -685,12 +748,13 @@ get_ngbvalue <- function(coords, env, longlat = TRUE, field_space = c('x','y'), #' a small buffer is applied to try and obtain the remaining values. #' @details #' It is essentially a wrapper for [`terra::extract`]. -#' @param coords A [`Spatial`], [`data.frame`], [`matrix`] or [`sf`] object. +#' @param coords A [`data.frame`], [`matrix`] or [`sf`] object. #' @param env A [`SpatRaster`] object with the provided predictors. #' @param ngb_fill [`logical`] on whether cells should be interpolated from neighbouring values. #' @param rm.na [`logical`] parameter which - if set - removes all rows with a missing data point (\code{NA}) from the result. #' @return A [`data.frame`] with the extracted covariate data from each provided data point. 
#' @keywords utils +#' @aliases get_rastervalue #' @examples #' \dontrun{ #' # Extract values @@ -872,16 +936,17 @@ clean_rasterfile <- function(x, verbose = FALSE) #' Split raster factor levels to stack #' -#' @description Takes a single raster that is a [`factor`] and creates -#' a new [`SpatRaster`] that contains the individual levels. -#' @param ras A [`SpatRaster`] object that is a [`factor`]. Alternatively a [`SpatRaster`] object -#' can be supplied in which only factor variables are 'exploded'. -#' @param name An optional [`character`] name for the [`SpatRaster`]. -#' @param ... Other parameters (not used). -#' @returns A [`SpatRaster`] object. -#' @keywords utils, internal -#' @noRd -explode_factorized_raster <- function(ras, name = NULL, ...){ +#' @description Takes a single raster that is a [`factor`] and creates +#' a new [`SpatRaster`] that contains the individual levels. +#' +#' @param ras A [`SpatRaster`] object that is a [`factor`]. Alternatively a [`SpatRaster`] object +#' can be supplied in which only factor variables are 'exploded'. +#' @param name An optional [`character`] name for the [`SpatRaster`]. +#' @aliases explode_factorized_raster +#' @returns A [`SpatRaster`] object. +#' @keywords utils, internal +#' @noRd +explode_factorized_raster <- function(ras, name = NULL){ assertthat::assert_that(is.Raster(ras), is.null(name) || is.character(name)) @@ -1005,6 +1070,7 @@ explode_factorized_raster <- function(ras, name = NULL, ...){ #' # using a bias layer #' thin_points <- thin_observations(points, background, method = "bias", env = bias) #' } +#' @aliases thin_observations #' @references #' * Aiello‐Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545. #' * Steen, V. A., Tingley, M. W., Paton, P. W., & Elphick, C. S. (2021). Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data. Methods in Ecology and Evolution, 12(2), 216-226. diff --git a/R/utils-stan.R b/R/utils-stan.R index a469fe94..ff903b0c 100644 --- a/R/utils-stan.R +++ b/R/utils-stan.R @@ -1,462 +1,461 @@ -#' Built formula for STAN model -#' -#' @description -#' This function built a formula for a `engine_stan()` model. -#' @param model A [`list()`] object containing the prepared model data for a given biodiversity dataset. -#' @param x A [`BiodiversityDistribution`] object. -#' @param id The id for the species formula. -#' @param settings A [`Settings`] object. -#' @author Martin Jung -#' @note Function is not meant to be run outside the train() call. -#' @keywords internal -#' @noRd -built_formula_stan <- function(model, id, x, settings){ - assertthat::assert_that( - is.list(model), - length(model) > 0, - assertthat::has_name(model, "predictors"), - inherits(x, "BiodiversityDistribution"), - inherits(settings, 'Settings'), - is.character(id) || is.Id(id), - msg = "Error in model object. This function is not meant to be called outside ouf train()." - ) - # Get object for id - obj <- model$biodiversity[[id]] - # Extract basic stats from the model object - types <- as.character( sapply( model$biodiversity, function(x) x$type ) ) - fams <- as.character( sapply( model$biodiversity, function(z) z$family ) ) - bionames = sapply(model$biodiversity, function(x) x$name) - ids <- names(model$biodiversity) - priors <- model$priors - - # Default equation found (e.g. 
no separate specification of effects) - if(model$biodiversity[[id]]$equation==''){ - - # Go through each variable and build formula for likelihood - form <- to_formula(paste("observed", - " ~ ", "Intercept + ", - ifelse(model$biodiversity[[id]]$family=='poisson', " offset(log(w)) + ", ""), # Use log area as offset - paste(model$biodiversity[[id]]$predictors_names,collapse = " + "), - # Check whether a single dataset is provided, otherwise add other intercepts - ifelse(length(types)==1, - '', - paste('+',paste0('Intercept_', - make.names(tolower(sapply( model$biodiversity, function(x) x$name ))),'_', # Make intercept from name - sapply( model$biodiversity, function(x) x$type ),collapse = ' + ') - ) - ), - # # If multiple datasets, don't use intercept - # ifelse(length(ids)>1,"-1", ""), - collapse = " ") - ) - - # Add offset if specified - if(!is.Waiver(x$offset)){ form <- stats::update.formula(form, paste0('~ . + offset(spatial_offset)') ) } - # if( length( grep('Spatial',x$get_latent() ) ) > 0 ) {} # Possible to be implemented for CAR models - } else { - if(getOption('ibis.setupmessages')) myLog('[Estimation]','yellow','Use custom model equation.') - form <- to_formula(model$biodiversity[[1]]$equation) - # Update formula to weights if forgotten - if(model$biodiversity[[1]]$family=='poisson') form <- stats::update.formula(form, 'observed ~ .') - assertthat::assert_that( - all( all.vars(form) %in% c('observed','w', model[['predictors_names']]) ) - ) - } - return(form) -} - -#' Checks whether cmdstanr is available and otherwise tries to install it -#' -#' @param install A [`logical`] factor to indicate whether [cmdstanr] should be directly installed (Default: \code{TRUE}). -#' @param ask [`logical`] whether the cmdstanr package is to be installed (Default: \code{FALSE}). -#' @keywords stan, utils, internal -stan_check_cmd <- function(install = TRUE, ask = FALSE){ - assertthat::assert_that( - is.logical(install), is.logical(ask) - ) - # Check if available - if(!requireNamespace("cmdstanr", quietly = TRUE)){ - if(install){ - if(ask){ a <- utils::askYesNo("Install cmdstanr?") } else { a <- TRUE} - if(a){ - utils::install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos"))) - cmdstanr::check_cmdstan_toolchain() - cmdstanr::install_cmdstan(cores = 2) - } - } else { - check_package("cmdstanr") - } - } else { - invisible() - } -} - -#' Wrap a list with stan model code -#' -#' @description [engine_stan] builds a list with stan model code. This function -#' concatenates them together. -#' @param sm_code A [list] object with exactly 7 entries. -#' @returns A [character] object. 
-#' @keywords stan, utils -wrap_stanmodel <- function(sm_code){ - assertthat::assert_that(is.list(sm_code), - length(sm_code)==7) - out <- character(0) - - # Functions - out <- paste0("functions {") - for(i in sm_code$functions) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Data - out <- paste(out, "data {") - for(i in sm_code$data) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Transformed data - out <- paste(out, "transformed data {") - for(i in sm_code$transformed_data) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Parameters - out <- paste(out, "parameters {") - for(i in sm_code$parameters) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Transformed parameters - out <- paste(out, "transformed parameters {") - for(i in sm_code$transformed_parameters) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Model - out <- paste(out, "model {") - for(i in sm_code$model) out <- paste0(out, i, "\n") - out <- paste0(out, "\n}\n") - # Generated quantities - out <- paste(out, "generated quantities {") - for(i in sm_code$generated_quantities) out <- paste0(out, i, "\n") - out <- paste0(out, "}") - - assertthat::assert_that(is.character(out), length(out)>0) - return(out) -} - -#' Write a cmdstanr model output to a specific file -#' -#' @description Write a [cmdstanr] model output to a specific destination -#' @param mod A supplied [cmdstanr] model -#' @param dir The model directory where the model chould be written. Should be a character / existing dir. -#' @keywords stan, utils -write_stanmodel <- function( mod, dir = tempdir() ) { - assertthat::assert_that( - dir.exists(dir) - ) - fname <- file.path( dir , paste0("rt_cmdstanr_", digest::digest(mod,"md5")) ) - file_stan <- paste0( fname, ".stan" ) - fileConn <- file( file_stan ) - writeLines( mod , fileConn ) - close(fileConn) - return(file_stan) -} - -#' Fit [cmdstanr] model and convert to [rstan] object -#' -#' @description This function fits a stan model using the light-weight interface provided -#' by [cmdstanr]. The code was adapted from McElreath [rethinking] package. -#' @param model_code A [`character`] pointing to the stan modelling code. -#' @param data A [`list`] with all the parameters required to run the [model_code] in stan. -#' @param algorithm A [`character`] giving the algorithm to use. Either \code{'sampling'} (Default), \code{'optimize'} or \code{'variational'} for penalized likelihood estimation. -#' @param chains A [`numeric`] indicating the number of chains to use for estimation. -#' @param cores Number of threads for sampling. Default set to \code{'getOption("ibis.nthread")'}. See [ibis_options()]. -#' @param threads [`numeric`] giving the number of threads to be run per chain. Has to be specified in accordance with cores. -#' @param iter A [`numeric`] value giving the number of MCMC samples to generate. -#' @param warmup [`numeric`] for the number of warm-up samples for MCMC. Default set to 1/2 of iter. -#' @param control A [`list`] with further control options for stan. -#' @param cpp_options A [`list`] with options for the Cpp compiling. -#' @param force [`logical`] indication whether to force recompile the model (Default: \code{FALSE}). -#' @param path [`character`] indicating a path to be made available to the stan compiler. -#' @param save_warmup A [`logical`] flag whether to save the warmup samples. -#' @param ... Other non-specified parameters. 
-#' @seealso [rethinking] R package -#' @returns A [rstan] object -#' @keywords misc, stan -#' @export -run_stan <- function( model_code, data = list(), - algorithm = "sampling", - chains = 4, cores = getOption("ibis.nthread"), - threads = 1, - iter = 1000, warmup = floor(iter / 2), - control = list(adapt_delta = 0.95), - cpp_options = list(), - force = FALSE, - path = base::getwd(), - save_warmup = TRUE, ... ) { - assertthat::assert_that( - is.numeric(chains), is.numeric(cores), - is.numeric(iter), is.numeric(warmup), - is.numeric(threads), - threads < cores, - is.list(data), - is.list(control), is.list(cpp_options), - is.logical(save_warmup), - is.logical(force) - ) - # Check that cmdstanr is available - check_package("cmdstanr") - cmdstanr::check_cmdstan_toolchain(quiet = TRUE) - - # Match the algorithm to be used - algorithm <- match.arg(algorithm, c("sampling", "optimize", "variational"), several.ok = FALSE) - - if( threads > 1 ) cpp_options[['stan_threads']] <- TRUE - - # Check extension - assertthat::assert_that( - is.character(model_code), - assertthat::has_extension(model_code, "stan") - ) - - # Now compile the model - mod <- cmdstanr::cmdstan_model( model_code, - compile = TRUE, - force_recompile = force, - cpp_options = cpp_options, - include_paths = path - # stanc_options = list("O1") # Can result in substantial speedups! - ) - - # Final parameters for sampling - samp <- iter - warmup - warm <- warmup - - # pull out any control arguments - carg_adapt_delta <- 0.95 - if ( !is.null( control[['adapt_delta']] ) ) - carg_adapt_delta <- as.numeric(control[['adapt_delta']]) - carg_max_treedepth <- 11 - if ( !is.null( control[['max_treedepth']] ) ) - carg_max_treedepth <- as.numeric(control[['max_treedepth']]) - - if(algorithm == "sampling"){ - # Sample - if ( threads > 1 ) { - cmdstanfit <- mod$sample( data = data, - chains = chains, - parallel_chains = cores, - iter_sampling = samp, iter_warmup = warm, - adapt_delta = carg_adapt_delta, - max_treedepth = carg_max_treedepth, - threads_per_chain = threads, - save_warmup = save_warmup, - ... ) - # coerce to stanfit object - stanfit <- rstan::read_stan_csv( cmdstanfit$output_files() ) - - } else { - cmdstanfit <- mod$sample( data = data, - chains = chains, - parallel_chains = cores, - iter_sampling = samp , iter_warmup = warm, - adapt_delta = carg_adapt_delta, - max_treedepth = carg_max_treedepth, - save_warmup = save_warmup, - ... ) - } - # coerce to stanfit object - stanfit <- rstan::read_stan_csv( cmdstanfit$output_files() ) - - } else if(algorithm == "optimize"){ - # Optimize for getting point estimates - stanfit <- mod$optimize(data = data, - #seed = seed, # This could be passed on - threads = threads - ) - } else if(algorithm == "variational") { - # Variational for approximating the posterior - stanfit <- mod$variational(data = data, - # seed = seed, - threads = threads - ) - } - - return(stanfit) -} - -#' Create a posterior prediction from a rstanfit object -#' -#' @description This function does simulates from the posterior -#' of a created stan model, therefore providing a fast and efficient way -#' to project coefficients obtained from Bayesian models to new/novel contexts. -#' -#' @param obj A \code{"stanfit"} object (as used by [`rstan`]). -#' @param form A [`formula`] object created for the [ibis.iSDM::DistributionModel]. -#' @param newdata A [data.frame] with new data to be used for prediction. -#' @param mode A [`character`] of whether the linear `predictor` or the `response` is to be summarized. 
-#' @param family A [`character`] giving the family for simulating linear response values (Default: \code{NULL}) -#' @param offset A [vector] with an optionally specified offset. -#' @param draws [numeric] indicating whether a specific number of draws should be taken. -#' @import posterior -#' @references -#' * [https://medium.com/@alex.pavlakis/making-predictions-from-stan-models-in-r-3e349dfac1ed](https://medium.com/@alex.pavlakis/making-predictions-from-stan-models-in-r-3e349dfac1ed). -#' * The [`brms`] R-package. -#' @export -posterior_predict_stanfit <- function(obj, form, newdata, mode = "predictor", family = NULL, offset = NULL, draws = NULL){ - assertthat::assert_that( - inherits(obj, "stanfit") || inherits(obj, "CmdStanFit"), - is.formula(form), - is.data.frame(newdata), - is.null(family) || is.character(family), - is.null(draws) || is.numeric(draws), - is.null(offset) || (length(offset) == nrow(newdata)) - ) - mode <- match.arg(mode, c("predictor", "response"), several.ok = FALSE) - # Build model matrix - # Note: This removes all NA cells from matrix - A <- stats::model.matrix(object = stats::delete.response(stats::terms(form)), - data = newdata) - assertthat::assert_that(nrow(A)>0, inherits(A, "matrix") || inherits(A, "dgCMatrix")) - # Remove intercept unless set - if(attr(stats::terms(form),"intercept") == 1) { - if(length(grep("Intercept", colnames(A), ignore.case = TRUE))>0){ - A <- A[,-(grep("(Intercept)", colnames(A),fixed = T))] - } - } - - # Draw from the posterior - if(inherits(obj, "stanfit")) { - pp <- posterior::as_draws_df(obj) - } else { - pp <- obj$draws() |> as.data.frame() - } - # Create a subset? - if (!is.null(draws)) { - pp <- pp[sample.int(nrow(pp), draws),] - } - # Get only beta coefficients and Intercept if set - if("Intercept" %in% colnames(pp)) what <- "beta|Intercept" else what <- "beta" - suppressWarnings( pp <- pp[ c(grep(what, colnames(pp), value = TRUE)) ] ) - if(utils::hasName(pp, "b_Intercept")) pp <- pp[ grep("b_Intercept",colnames(pp), invert = T)] - - # Prepare offset if set - if(!is.null(offset)) { - # Get only the rows in the A matrix (minus NA) - offset <- offset[as.numeric(row.names(A))] - } else { offset <- rep(0, nrow(A) ) } - - # Security checks - assertthat::assert_that( - nrow(A)>0, nrow(pp) > 0, - ncol(pp) == ncol(A), - is.numeric(offset) - ) - - # 16/01/2023 - Change towards matrix multiplication by default (below) - # if(mode == "predictor"){ - # # Summarize the coefficients from the posterior - # pp <- posterior::summarise_draws(pp) |> - # subset(select = c("variable", "mean", "q5", "median", "q95", "sd")) |> - # as.data.frame() - # # --- # - # pp$variable <- colnames(A) - # # Calculate b*X + offset if set - # preds <- cbind( - # A %*% pp[,"mean"] + ifelse(is.null(offset),0, offset), - # A %*% pp[,"q5"] + ifelse(is.null(offset),0, offset), - # A %*% pp[,"median"] + ifelse(is.null(offset),0, offset), - # A %*% pp[,"q95"] + ifelse(is.null(offset),0, offset), - # A %*% pp[,"sd"] + ifelse(is.null(offset),0, offset) - # ) - # - # # Add random noise equivalent to the posterior length and sd of the posterior - # # Necessary since we already summarize the moment above - # .rnorm_matrix <- function(mean, sd) { - # stopifnot(length(dim(mean)) == 2) - # error <- matrix(rnorm(length(mean), 0, sd), ncol = ncol(mean), byrow=TRUE) - # mean + error - # } - # preds <- .rnorm_matrix(preds, pp[,"sd"]) # FIXME: This only makes sense for mean. Apply mad to median? 
- # - # # Apply ilink - # if(!is.null(family)){ - # preds <- switch (family, - # "poisson" = ilink(preds, link = "log"), - # "binomial" = ilink(preds, link = "logit"), - # ilink(preds, link = "log") - # ) - # } - # - # } else { - # Simulate linear response approximating poisson_rng in stan - out <- vector("list", nrow(pp)) - # TODO: Parallelize over threads? - pb <- progress::progress_bar$new(total = nrow(pp), - format = "Simulating posterior samples (:spin) [:bar] :percent") - for(i in 1:nrow(pp)){ - pb$tick() - # Build eta as additive beta with the A matrix row - eta <- 0 + base::tcrossprod(as.matrix(pp)[i,] |> base::unname(), A) + offset - out[[i]] <- base::unname(eta) - } - - # Combine link - a <- do.call(rbind, out) - colnames(a) <- rownames(a) <- NULL - - # Backtransformation - if(mode == "response"){ - if(family == "poisson"){ - a <- apply(a, 2, function(lambda) ilink(lambda, link = "log")) - } else if(family == "binomial") { - a <- apply(a, 2, function(mu) ilink(mu, link = "logit")) - } - } - # # Draw random variable for each draw and lambda value - # if(family == "poisson"){ - # a <- suppressWarnings( lapply(out, function(lambda) rpois(nrow(A), ilink(lambda, link = "log")) ) ) - # } else if(family == "binomial") { - # a <- suppressWarnings( lapply(out, function(mu) rbinom(nrow(A), size = 1, prob = ilink(mu, link = "logit")) ) ) - # } else { - # stop("Not yet implemented method for prediction the linear response.") - # } - - # Finally summarize - preds <- cbind( - matrixStats::colMeans2(a, na.rm = TRUE), - matrixStats::colQuantiles(a, probs = c(.05,.5,.95), na.rm = TRUE), - matrixStats::colSds(a, na.rm = TRUE) - ) - - # ---- # - # Create output with cellid - out <- tibble::rowid_to_column(newdata, var = "cellid")["cellid"] |> as.data.frame() - out$cv <- out$q95 <- out$q50 <- out$q05 <- out$sd <- out$mean <- NA - out$mean[as.numeric(row.names(A))] <- preds[,1] - out$sd[as.numeric(row.names(A))] <- preds[,5] - out$q05[as.numeric(row.names(A))] <- preds[,2] - out$q50[as.numeric(row.names(A))] <- preds[,3] - out$q95[as.numeric(row.names(A))] <- preds[,4] - out$cv[as.numeric(row.names(A))] <- preds[,5] / preds[,1] - out$cellid <- NULL - - return(out) -} - -#' Show the stan code from a trained model -#' -#' @description -#' This helper function shows the code from a trained [DistributionModel] -#' using the [`engine_stan`]. -#' This function is emulated after a similar functionality in the [brms] R-package. -#' **It only works with models inferred with stan!** -#' @param obj Any prepared object. -#' @param ... not used. -#' -#' @return None. -#' @keywords engine -#' @seealso [rstan], [cmdstanr], [brms] -#' @name stancode -NULL -methods::setGeneric("stancode", - signature = methods::signature("obj"), - function(obj, ...) standardGeneric("stancode")) - -#' @rdname stancode -#' @method stancode DistributionModel -#' @keywords engine -#' @export -stancode.DistributionModel <- function(obj, ...) obj$stancode() +#' Built formula for STAN model +#' +#' @description +#' This function built a formula for a `engine_stan()` model. +#' @param model A [`list()`] object containing the prepared model data for a given biodiversity dataset. +#' @param x A [`BiodiversityDistribution`] object. +#' @param id The id for the species formula. +#' @param settings A [`Settings`] object. +#' @author Martin Jung +#' @note Function is not meant to be run outside the train() call. 
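+#' @examples
+#' # Illustration only (internal function): for a single Poisson dataset with
+#' # hypothetical predictors bio01 and bio12, the default formula assembled
+#' # below takes roughly the form
+#' # observed ~ Intercept + offset(log(w)) + bio01 + bio12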
+#' @keywords internal
+#' @noRd
+built_formula_stan <- function(model, id, x, settings){
+  assertthat::assert_that(
+    is.list(model),
+    length(model) > 0,
+    assertthat::has_name(model, "predictors"),
+    inherits(x, "BiodiversityDistribution"),
+    inherits(settings, 'Settings'),
+    is.character(id) || is.Id(id),
+    msg = "Error in model object. This function is not meant to be called outside of train()."
+  )
+  # Get object for id
+  obj <- model$biodiversity[[id]]
+  # Extract basic stats from the model object
+  types <- as.character( sapply( model$biodiversity, function(x) x$type ) )
+  fams <- as.character( sapply( model$biodiversity, function(z) z$family ) )
+  bionames <- sapply(model$biodiversity, function(x) x$name)
+  ids <- names(model$biodiversity)
+  priors <- model$priors
+
+  # Default equation found (e.g. no separate specification of effects)
+  if(model$biodiversity[[id]]$equation==''){
+
+    # Go through each variable and build formula for likelihood
+    form <- to_formula(paste("observed",
+                             " ~ ", "Intercept + ",
+                             ifelse(model$biodiversity[[id]]$family=='poisson', " offset(log(w)) + ", ""), # Use log area as offset
+                             paste(model$biodiversity[[id]]$predictors_names,collapse = " + "),
+                             # Check whether a single dataset is provided, otherwise add other intercepts
+                             ifelse(length(types)==1,
+                                    '',
+                                    paste('+',paste0('Intercept_',
+                                                     make.names(tolower(sapply( model$biodiversity, function(x) x$name ))),'_', # Make intercept from name
+                                                     sapply( model$biodiversity, function(x) x$type ),collapse = ' + ')
+                                    )
+                             ),
+                             # # If multiple datasets, don't use intercept
+                             # ifelse(length(ids)>1,"-1", ""),
+                             collapse = " ")
+    )
+
+    # Add offset if specified
+    if(!is.Waiver(x$offset)){ form <- stats::update.formula(form, paste0('~ . + offset(spatial_offset)') ) }
+    # if( length( grep('Spatial',x$get_latent() ) ) > 0 ) {} # Possible to be implemented for CAR models
+  } else {
+    if(getOption('ibis.setupmessages')) myLog('[Estimation]','yellow','Use custom model equation.')
+    form <- to_formula(model$biodiversity[[1]]$equation)
+    # Update formula to weights if forgotten
+    if(model$biodiversity[[1]]$family=='poisson') form <- stats::update.formula(form, 'observed ~ .')
+    assertthat::assert_that(
+      all( all.vars(form) %in% c('observed','w', model[['predictors_names']]) )
+    )
+  }
+  return(form)
+}
+
+#' Checks whether cmdstanr is available and otherwise tries to install it
+#'
+#' @param install A [`logical`] flag indicating whether cmdstanr should be installed directly (Default: \code{TRUE}).
+#' @param ask A [`logical`] flag indicating whether the user should be asked before installing cmdstanr (Default: \code{FALSE}).
+#' @keywords stan, utils, internal
+stan_check_cmd <- function(install = TRUE, ask = FALSE){
+  assertthat::assert_that(
+    is.logical(install), is.logical(ask)
+  )
+  # Check if available
+  if(!requireNamespace("cmdstanr", quietly = TRUE)){
+    if(install){
+      if(ask){ a <- utils::askYesNo("Install cmdstanr?") } else { a <- TRUE }
+      if(a){
+        utils::install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
+        cmdstanr::check_cmdstan_toolchain()
+        cmdstanr::install_cmdstan(cores = 2)
+      }
+    } else {
+      check_package("cmdstanr")
+    }
+  } else {
+    invisible()
+  }
+}
+
+#' Wrap a list with stan model code
+#'
+#' @description engine_stan builds a list with stan model code. This function
+#' concatenates its entries into a single model string.
+#' @param sm_code A [list] object with exactly 7 entries.
+#' @returns A [character] object.
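+#' @details
+#' The example below is a minimal sketch of the expected input: the seven list
+#' entries mirror the standard stan program blocks and may be left empty.
+#' @examples
+#' \dontrun{
+#'  blocks <- list(
+#'    functions = "",
+#'    data = "int<lower=0> N;",
+#'    transformed_data = "",
+#'    parameters = "real a;",
+#'    transformed_parameters = "",
+#'    model = "a ~ std_normal();",
+#'    generated_quantities = ""
+#'  )
+#'  cat(wrap_stanmodel(blocks))
+#' }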
+#' @keywords stan, utils
+wrap_stanmodel <- function(sm_code){
+  assertthat::assert_that(is.list(sm_code),
+                          length(sm_code)==7)
+  out <- character(0)
+
+  # Functions
+  out <- paste0("functions {")
+  for(i in sm_code$functions) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Data
+  out <- paste(out, "data {")
+  for(i in sm_code$data) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Transformed data
+  out <- paste(out, "transformed data {")
+  for(i in sm_code$transformed_data) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Parameters
+  out <- paste(out, "parameters {")
+  for(i in sm_code$parameters) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Transformed parameters
+  out <- paste(out, "transformed parameters {")
+  for(i in sm_code$transformed_parameters) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Model
+  out <- paste(out, "model {")
+  for(i in sm_code$model) out <- paste0(out, i, "\n")
+  out <- paste0(out, "\n}\n")
+  # Generated quantities
+  out <- paste(out, "generated quantities {")
+  for(i in sm_code$generated_quantities) out <- paste0(out, i, "\n")
+  out <- paste0(out, "}")
+
+  assertthat::assert_that(is.character(out), length(out)>0)
+  return(out)
+}
+
+#' Write stan model code to a specific file
+#'
+#' @description Writes the supplied stan model code to a \code{.stan} file at a specific destination.
+#' @param mod A [`character`] with the stan model code to be written.
+#' @param dir The directory where the model should be written. Must be an existing directory (Default: \code{tempdir()}).
+#' @keywords stan, utils
+write_stanmodel <- function( mod, dir = tempdir() ) {
+  assertthat::assert_that(
+    dir.exists(dir)
+  )
+  fname <- file.path( dir , paste0("rt_cmdstanr_", digest::digest(mod,"md5")) )
+  file_stan <- paste0( fname, ".stan" )
+  fileConn <- file( file_stan )
+  writeLines( mod , fileConn )
+  close(fileConn)
+  return(file_stan)
+}
+
+#' Fit cmdstanr model and convert to rstan object
+#'
+#' @description This function fits a stan model using the light-weight interface provided
+#' by cmdstanr. The code was adapted from the rethinking package by Richard McElreath.
+#' @param model_code A [`character`] file path pointing to the stan model code (a \code{.stan} file).
+#' @param data A [`list`] with all the parameters required to run the model_code in stan.
+#' @param algorithm A [`character`] giving the algorithm to use. Either \code{'sampling'} (Default) for MCMC estimation,
+#' \code{'optimize'} for point estimation or \code{'variational'} for a variational approximation of the posterior.
+#' @param chains A [`numeric`] indicating the number of chains to use for estimation.
+#' @param cores Number of cores used for parallel sampling of the chains. Default set to \code{'getOption("ibis.nthread")'}. See [ibis_options()].
+#' @param threads [`numeric`] giving the number of threads to be run per chain. Has to be specified in accordance with cores.
+#' @param iter A [`numeric`] value giving the total number of MCMC iterations per chain (warm-up included).
+#' @param warmup [`numeric`] for the number of warm-up samples for MCMC. Default set to 1/2 of iter.
+#' @param control A [`list`] with further control options for stan.
+#' @param cpp_options A [`list`] with options for the Cpp compiling.
+#' @param force [`logical`] indicating whether to force a recompile of the model (Default: \code{FALSE}).
+#' @param path [`character`] indicating a path to be made available to the stan compiler.
+#' @param save_warmup A [`logical`] flag whether to save the warmup samples.
+#' @param ... Other non-specified parameters.
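+#' @details
+#' The example below is a hypothetical sketch, assuming that cmdstanr and the
+#' CmdStan toolchain are installed; the tiny Poisson model is purely illustrative.
+#' @examples
+#' \dontrun{
+#'  code <- "data { int<lower=0> N; array[N] int y; }
+#'           parameters { real<lower=0> lambda; }
+#'           model { y ~ poisson(lambda); }"
+#'  fit <- run_stan( write_stanmodel(code),
+#'                   data = list(N = 5L, y = c(1L, 0L, 2L, 1L, 3L)),
+#'                   chains = 2, cores = 2, iter = 1000 )
+#' }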
+#' @seealso rethinking R package
+#' @returns A \code{stanfit} object (for \code{'sampling'}) or the respective cmdstanr fit otherwise.
+#' @keywords misc, stan
+#' @export
+run_stan <- function( model_code, data = list(),
+                      algorithm = "sampling",
+                      chains = 4, cores = getOption("ibis.nthread"),
+                      threads = 1,
+                      iter = 1000, warmup = floor(iter / 2),
+                      control = list(adapt_delta = 0.95),
+                      cpp_options = list(),
+                      force = FALSE,
+                      path = base::getwd(),
+                      save_warmup = TRUE, ... ) {
+  assertthat::assert_that(
+    is.numeric(chains), is.numeric(cores),
+    is.numeric(iter), is.numeric(warmup),
+    is.numeric(threads),
+    threads < cores,
+    is.list(data),
+    is.list(control), is.list(cpp_options),
+    is.logical(save_warmup),
+    is.logical(force)
+  )
+  # Check that cmdstanr is available
+  check_package("cmdstanr")
+  cmdstanr::check_cmdstan_toolchain(quiet = TRUE)
+
+  # Match the algorithm to be used
+  algorithm <- match.arg(algorithm, c("sampling", "optimize", "variational"), several.ok = FALSE)
+
+  if( threads > 1 ) cpp_options[['stan_threads']] <- TRUE
+
+  # Check extension
+  assertthat::assert_that(
+    is.character(model_code),
+    assertthat::has_extension(model_code, "stan")
+  )
+
+  # Now compile the model
+  mod <- cmdstanr::cmdstan_model( model_code,
+                                  compile = TRUE,
+                                  force_recompile = force,
+                                  cpp_options = cpp_options,
+                                  include_paths = path
+                                  # stanc_options = list("O1") # Can result in substantial speedups!
+  )
+
+  # Final parameters for sampling
+  samp <- iter - warmup
+  warm <- warmup
+
+  # pull out any control arguments
+  carg_adapt_delta <- 0.95
+  if ( !is.null( control[['adapt_delta']] ) )
+    carg_adapt_delta <- as.numeric(control[['adapt_delta']])
+  carg_max_treedepth <- 11
+  if ( !is.null( control[['max_treedepth']] ) )
+    carg_max_treedepth <- as.numeric(control[['max_treedepth']])
+
+  if(algorithm == "sampling"){
+    # Sample
+    if ( threads > 1 ) {
+      cmdstanfit <- mod$sample( data = data,
+                                chains = chains,
+                                parallel_chains = cores,
+                                iter_sampling = samp, iter_warmup = warm,
+                                adapt_delta = carg_adapt_delta,
+                                max_treedepth = carg_max_treedepth,
+                                threads_per_chain = threads,
+                                save_warmup = save_warmup,
+                                ... )
+    } else {
+      cmdstanfit <- mod$sample( data = data,
+                                chains = chains,
+                                parallel_chains = cores,
+                                iter_sampling = samp , iter_warmup = warm,
+                                adapt_delta = carg_adapt_delta,
+                                max_treedepth = carg_max_treedepth,
+                                save_warmup = save_warmup,
+                                ... )
+    }
+    # coerce to stanfit object
+    stanfit <- rstan::read_stan_csv( cmdstanfit$output_files() )
+
+  } else if(algorithm == "optimize"){
+    # Optimize for getting point estimates
+    stanfit <- mod$optimize(data = data,
+                            #seed = seed, # This could be passed on
+                            threads = threads
+    )
+  } else if(algorithm == "variational") {
+    # Variational for approximating the posterior
+    stanfit <- mod$variational(data = data,
+                               # seed = seed,
+                               threads = threads
+    )
+  }
+
+  return(stanfit)
+}
+
+#' Create a posterior prediction from an rstanfit object
+#'
+#' @description This function simulates from the posterior
+#' of a fitted stan model, providing a fast and efficient way
+#' to project coefficients obtained from Bayesian models to new/novel contexts.
+#'
+#' @param obj A \code{"stanfit"} object (as used by rstan).
+#' @param form A [`formula`] object created for the [ibis.iSDM::DistributionModel].
+#' @param newdata A [data.frame] with new data to be used for prediction.
+#' @param mode A [`character`] of whether the linear `predictor` or the `response` is to be summarized.
+#' @param family A [`character`] giving the family for simulating linear response values (Default: \code{NULL})
+#' @param offset A [vector] with an optionally specified offset.
+#' @param draws [numeric] giving the number of posterior draws to be subsampled (Default: \code{NULL}, i.e. all draws).
+#' @references
+#' * [https://medium.com/@alex.pavlakis/making-predictions-from-stan-models-in-r-3e349dfac1ed](https://medium.com/@alex.pavlakis/making-predictions-from-stan-models-in-r-3e349dfac1ed).
+#' * The brms R-package.
+#' @export
+posterior_predict_stanfit <- function(obj, form, newdata, mode = "predictor", family = NULL, offset = NULL, draws = NULL){
+  assertthat::assert_that(
+    inherits(obj, "stanfit") || inherits(obj, "CmdStanFit"),
+    is.formula(form),
+    is.data.frame(newdata),
+    is.null(family) || is.character(family),
+    is.null(draws) || is.numeric(draws),
+    is.null(offset) || (length(offset) == nrow(newdata))
+  )
+  mode <- match.arg(mode, c("predictor", "response"), several.ok = FALSE)
+  # Build model matrix
+  # Note: This removes all NA cells from matrix
+  A <- stats::model.matrix(object = stats::delete.response(stats::terms(form)),
+                           data = newdata)
+  assertthat::assert_that(nrow(A)>0, inherits(A, "matrix") || inherits(A, "dgCMatrix"))
+  # Drop the '(Intercept)' column if the formula includes an intercept
+  if(attr(stats::terms(form),"intercept") == 1) {
+    if(length(grep("Intercept", colnames(A), ignore.case = TRUE))>0){
+      A <- A[,-(grep("(Intercept)", colnames(A),fixed = TRUE))]
+    }
+  }
+
+  # Draw from the posterior
+  if(inherits(obj, "stanfit")) {
+    pp <- posterior::as_draws_df(obj)
+  } else {
+    pp <- obj$draws() |> as.data.frame()
+  }
+  # Create a subset?
+  if (!is.null(draws)) {
+    pp <- pp[sample.int(nrow(pp), draws),]
+  }
+  # Get only beta coefficients and Intercept if set
+  if("Intercept" %in% colnames(pp)) what <- "beta|Intercept" else what <- "beta"
+  suppressWarnings( pp <- pp[ c(grep(what, colnames(pp), value = TRUE)) ] )
+  if(utils::hasName(pp, "b_Intercept")) pp <- pp[ grep("b_Intercept",colnames(pp), invert = TRUE)]
+
+  # Prepare offset if set
+  if(!is.null(offset)) {
+    # Get only the rows in the A matrix (minus NA)
+    offset <- offset[as.numeric(row.names(A))]
+  } else { offset <- rep(0, nrow(A) ) }
+
+  # Security checks
+  assertthat::assert_that(
+    nrow(A)>0, nrow(pp) > 0,
+    ncol(pp) == ncol(A),
+    is.numeric(offset)
+  )
+
+  # 16/01/2023 - Change towards matrix multiplication by default (below)
+  # if(mode == "predictor"){
+  #   # Summarize the coefficients from the posterior
+  #   pp <- posterior::summarise_draws(pp) |>
+  #     subset(select = c("variable", "mean", "q5", "median", "q95", "sd")) |>
+  #     as.data.frame()
+  #   # --- #
+  #   pp$variable <- colnames(A)
+  #   # Calculate b*X + offset if set
+  #   preds <- cbind(
+  #     A %*% pp[,"mean"] + ifelse(is.null(offset),0, offset),
+  #     A %*% pp[,"q5"] + ifelse(is.null(offset),0, offset),
+  #     A %*% pp[,"median"] + ifelse(is.null(offset),0, offset),
+  #     A %*% pp[,"q95"] + ifelse(is.null(offset),0, offset),
+  #     A %*% pp[,"sd"] + ifelse(is.null(offset),0, offset)
+  #   )
+  #
+  #   # Add random noise equivalent to the posterior length and sd of the posterior
+  #   # Necessary since we already summarize the moment above
+  #   .rnorm_matrix <- function(mean, sd) {
+  #     stopifnot(length(dim(mean)) == 2)
+  #     error <- matrix(rnorm(length(mean), 0, sd), ncol = ncol(mean), byrow=TRUE)
+  #     mean + error
+  #   }
+  #   preds <- .rnorm_matrix(preds, pp[,"sd"]) # FIXME: This only makes sense for mean. Apply mad to median?
+  #
+  # # Apply ilink
+  # if(!is.null(family)){
+  #   preds <- switch (family,
+  #                    "poisson" = ilink(preds, link = "log"),
+  #                    "binomial" = ilink(preds, link = "logit"),
+  #                    ilink(preds, link = "log")
+  #   )
+  # }
+  #
+  # } else {
+  # Simulate linear response approximating poisson_rng in stan
+  out <- vector("list", nrow(pp))
+  # TODO: Parallelize over threads?
+  pb <- progress::progress_bar$new(total = nrow(pp),
+                                   format = "Simulating posterior samples (:spin) [:bar] :percent")
+  for(i in 1:nrow(pp)){
+    pb$tick()
+    # Build eta as additive beta with the A matrix row
+    eta <- 0 + base::tcrossprod(as.matrix(pp)[i,] |> base::unname(), A) + offset
+    out[[i]] <- base::unname(eta)
+  }
+
+  # Combine link
+  a <- do.call(rbind, out)
+  colnames(a) <- rownames(a) <- NULL
+
+  # Backtransformation
+  if(mode == "response"){
+    if(family == "poisson"){
+      a <- apply(a, 2, function(lambda) ilink(lambda, link = "log"))
+    } else if(family == "binomial") {
+      a <- apply(a, 2, function(mu) ilink(mu, link = "logit"))
+    }
+  }
+  # # Draw random variable for each draw and lambda value
+  # if(family == "poisson"){
+  #   a <- suppressWarnings( lapply(out, function(lambda) rpois(nrow(A), ilink(lambda, link = "log")) ) )
+  # } else if(family == "binomial") {
+  #   a <- suppressWarnings( lapply(out, function(mu) rbinom(nrow(A), size = 1, prob = ilink(mu, link = "logit")) ) )
+  # } else {
+  #   stop("Not yet implemented method for predicting the linear response.")
+  # }
+
+  # Finally summarize
+  preds <- cbind(
+    matrixStats::colMeans2(a, na.rm = TRUE),
+    matrixStats::colQuantiles(a, probs = c(.05,.5,.95), na.rm = TRUE),
+    matrixStats::colSds(a, na.rm = TRUE)
+  )
+
+  # ---- #
+  # Create output with cellid
+  out <- tibble::rowid_to_column(newdata, var = "cellid")["cellid"] |> as.data.frame()
+  out$cv <- out$q95 <- out$q50 <- out$q05 <- out$sd <- out$mean <- NA
+  out$mean[as.numeric(row.names(A))] <- preds[,1]
+  out$sd[as.numeric(row.names(A))] <- preds[,5]
+  out$q05[as.numeric(row.names(A))] <- preds[,2]
+  out$q50[as.numeric(row.names(A))] <- preds[,3]
+  out$q95[as.numeric(row.names(A))] <- preds[,4]
+  out$cv[as.numeric(row.names(A))] <- preds[,5] / preds[,1]
+  out$cellid <- NULL
+
+  return(out)
+}
+
+#' Show the stan code from a trained model
+#'
+#' @description
+#' This helper function shows the code from a trained [DistributionModel]
+#' using the [`engine_stan`].
+#' It emulates similar functionality in the brms R-package.
+#' **It only works with models inferred with stan!**
+#' @param obj Any prepared object.
+#' @param ... not used.
+#'
+#' @return None.
+#' @keywords engine
+#' @seealso rstan, cmdstanr, brms
+#' @name stancode
+NULL
+methods::setGeneric("stancode",
+                    signature = methods::signature("obj"),
+                    function(obj, ...) standardGeneric("stancode"))
+
+#' @rdname stancode
+#' @method stancode DistributionModel
+#' @keywords engine
+#' @export
+stancode.DistributionModel <- function(obj, ...) obj$stancode()
diff --git a/R/utils.R b/R/utils.R
index 6d419348..340a2645 100644
--- a/R/utils.R
+++ b/R/utils.R
@@ -16,6 +16,7 @@
 #' @param ... Any additional outputs or words for display
 #' @examples
 #' myLog("[Setup]", "red", "Some error occurred during data preparation.")
+#' @aliases myLog
 #' @keywords internal, utils
 #' @export
 myLog <- function(title = "[Processing]", col = 'green', ...) {
@@ -85,6 +86,7 @@ to_camelcase <- function(x){
 #' @param x A [`vector`] object
 #' @return [`character`] object.
 #' @concept function taken from `prioritizr` package
+#' @aliases name_atomic
 #' @keywords internal, utils
 #' @examples
 #' name_atomic(letters)
@@ -234,6 +236,7 @@ thresholdval <- function(x, knot) {
 #' vars <- c("Climate-temperature2015", "Elevation__sealevel", "Landuse.forest..meanshare")
 #' sanitize_names(vars)
 #'
+#' @aliases sanitize_names
 #' @keywords utils
 #' @export
 sanitize_names <- function(names){
@@ -367,7 +370,7 @@ run_parallel <- function(X, FUN, cores = 1, approach = "parallel", export_packag
 #' of predictor values observed during model training.
 #' This function takes an internal model matrix and restricts the values seen in the predictor matrix
 #' to those observed during training.
-#' @note This function is meant to be used within a certain [`engine`] or within [`project`].
+#' @note This function is meant to be used within a certain \code{"engine"} or within [`project`].
 #' @param model A [`list`] with the input data used for inference. Created during model setup.
 #' @param pred An optional [`data.frame`] of the prediction container.
 #' @returns A [`data.frame`] with the clamped predictors.
@@ -602,7 +605,7 @@ formula_combinations <- function(form, response = NULL, type= 'forward'){
 #' Outlier detection via reverse jackknife
 #'
 #' @description
-#' Implemententation of a Reverse Jackknife procedure as described by Chapman (2005).
+#' Implementation of a Reverse Jackknife procedure as described by Chapman (2005).
 #' Can be used to identify outliers in environmental predictors or predictions.
 #' @param vals A [`numeric`] vector from which outliers are to be identified and removed.
 #' @param procedure [`character`] denoting what to do with outliers.
@@ -610,6 +613,7 @@ formula_combinations <- function(form, response = NULL, type= 'forward'){
 #' @references
 #' * Chapman, A.D. (2005) Principles and Methods of Data Cleaning - Primary Species and Species- Occurrence Data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen.
 #' @source [`bioGeo`] package code served as inspiration
+#' @aliases rm_outlier_revjack
 #' @keywords utils
 #' @noRd
 rm_outlier_revjack <- function(vals, procedure = "missing"){
@@ -737,31 +741,63 @@ aggregate_observations2grid <- function(df, template, field_occurrence = 'observ
 #' **This function is intended to only run within ibis and with the model packages created by it.**
 #' @param model A [`list`] object containing the biodiversity and predictor objects.
 #' @param include_absences A [`logical`] of whether absences should be included (Default: \code{FALSE}).
-#' @returns A [`sf`] object with the newly aggregated points.
+#' @param point_column A [`character`] giving the column with the observed values.
+#' @param addName [`logical`] whether the name of the dataset should be added (Default: \code{FALSE}).
+#' @param tosf [`logical`] of whether the output should be an [`sf`] object (Default: \code{FALSE}).
+#' @returns A [`matrix`] or [`sf`] object with the newly aggregated points.
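+#' @details
+#' A hypothetical usage sketch, where 'model' stands for the internal model list
+#' prepared by train():
+#' @examples
+#' \dontrun{
+#'  # data.frame output with coordinates, observed values and dataset type
+#'  xy <- collect_occurrencepoints(model, include_absences = TRUE)
+#'  # Alternatively as an sf object with dataset names attached
+#'  pts <- collect_occurrencepoints(model, addName = TRUE, tosf = TRUE)
+#' }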
 #' @keywords internal
 #' @noRd
-collect_occurrencepoints <- function(model, include_absences = FALSE){
+collect_occurrencepoints <- function(model, include_absences = FALSE,
+                                     point_column = "observed",
+                                     addName = FALSE,
+                                     tosf = FALSE){
   assertthat::assert_that(
     is.list(model),
     assertthat::has_name(model, "id"),
     assertthat::has_name(model, "biodiversity"),
-    is.logical(include_absences)
+    is.character(point_column),
+    is.logical(addName),
+    is.logical(include_absences),
+    is.logical(tosf)
   )
   # Get the locations
-  locs <- do.call("rbind",
-                  lapply(model$biodiversity, function(x){
+  locs <- lapply(model$biodiversity, function(x){
     z <- x$observations
-    if(!include_absences) z <- subset(z, observed > 0)
-    o <- sf::st_coordinates( guess_sf( z )[,1:2])
-    o <- as.matrix(o)
-    colnames(o) <- c("x", "y")
+    # Keep only presence records unless absences are requested
+    if(!include_absences) z <- z[z[[point_column]] > 0, ]
+    if(tosf){
+      # o <- subset(
+      o <- z |> guess_sf() |>
+        # select = c(point_column, attr(z, "sf_column"))) |>
+        rename_geometry("geometry")
+    } else {
+      o <- sf::st_coordinates( guess_sf( z )[,1:2])
+      colnames(o) <- c("x", "y")
+      o <- as.data.frame(o)
+      o[[point_column]] <- z[,point_column]
+      o[["type"]] <- x$type
+    }
+    if(addName) suppressWarnings( o$name <- x$name )
     return(o)
-  }
+  }
   )
-  )
-  assertthat::assert_that(
-    is.matrix(locs), nrow(locs)>1
-  )
+  # Combine
+  locs <- do.call(rbind, locs)
+  # Remove rownames
+  locs <- locs |> tibble::remove_rownames()
+
+  if(!tosf){
+    assertthat::assert_that(
+      is.matrix(locs) || is.data.frame(locs),
+      utils::hasName(locs, point_column)
+    )
+  } else {
+    assertthat::assert_that(inherits(locs, "sf"))
+    if(is.na(sf::st_crs(locs))){
+      suppressWarnings(
+        locs <- locs |> sf::st_set_crs(value = sf::st_crs(model$background))
+      )
+    }
+  }
   return(locs)
 }
diff --git a/R/validate.R b/R/validate.R
index 42f650af..5867608f 100644
--- a/R/validate.R
+++ b/R/validate.R
@@ -1,549 +1,547 @@
-#' Validation of a fitted distribution object
-#'
-#' @description
-#' This function conducts a model evaluation based on
-#' either on the fitted point data or any supplied independent.
-#' **Currently only supporting point datasets. For validation of integrated models more work is needed.**
-#'
-#' @details
-#' The \code{'validate'} function calculates different validation metrics
-#' depending on the output type.
-#'
-#' The output metrics for each type are defined as follows:
-#' **Continuous:**
-#'
-#' * \code{'n'} = Number of observations.
-#' * \code{'rmse'} = Root Mean Square Error, \deqn{ \sqrt {\frac{1}{N} \sum_{i=1}^{N} (\hat{y_{i}} - y_{i})^2} }
-#' * \code{'mae'} = Mean Absolute Error, \deqn{ \frac{ \sum_{i=1}^{N} y_{i} - x_{i} }{n} }
-#' * \code{'logloss'} = Log loss, TBD
-#' * \code{'normgini'} = Normalized Gini index, TBD
-#' * \code{'cont.boyce'} = Continuous Boyce index, TBD
-#'
-#' **Discrete:**
-#'
-#' * \code{'n'} = Number of observations.
-#' * \code{'auc'} = Area under the curve, TBD -#' * \code{'overall.accuracy'} = Overall Accuracy, TBD -#' * \code{'true.presence.ratio'} = True presence ratio or Jaccard index, TBD -#' * \code{'precision'} = Precision, TBD -#' * \code{'sensitivity'} = Sensitivity, TBD -#' * \code{'specificity'} = Specifivity, TBD -#' * \code{'tss'} = True Skill Statistics, TBD -#' * \code{'f1'} = F1 Score or Positive predictive value, TBD -#' * \code{'logloss'} = Log loss, TBD -#' * \code{'expected.accuracy'} = Expected Accuracy, TBD -#' * \code{'kappa'} = Kappa value, TBD -#' * \code{'brier.score'} = Brier score, TBD -#' -#' @param mod A fitted [`BiodiversityDistribution`] object with set predictors. Alternatively one can also -#' provide directly a [`SpatRaster`], however in this case the `point` layer also needs to be provided. -#' @param method Should the validation be conducted on the continious prediction or a -#' (previously calculated) thresholded layer in binary format? Note that depending -#' on the method different metrics can be computed. See Details. -#' @param layer In case multiple layers exist, which one to use? (Default: \code{'mean'}). -#' @param point A [`sf`] object with type `POINT` or `MULTIPOINT`. -#' @param point_column A [`character`] vector with the name of the column containing the independent observations. -#' (Default: \code{'observed'}). -#' @param ... Other parameters that are passed on. Currently unused. -#' @returns Return a tidy [`tibble`] with validation results. -#' @note If you use the Boyce Index, please cite the original Hirzel et al. (2006) paper. -#' -#' @references -#' * Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789. https://doi.org/10.1111/jbi.12058 -#' * Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological modelling, 199(2), 142-152. -#' @examples -#' \dontrun{ -#' # Assuming that mod is a distribution object and has a thresholded layer -#' mod <- threshold(mod, method = "TSS") -#' validate(mod, method = "discrete") -#' } -#' @name validate -#' @aliases validate -#' @keywords train -#' @exportMethod validate -#' @export -NULL -methods::setGeneric("validate", - signature = methods::signature("mod"), - function(mod, method = 'continuous', layer = "mean", - point = NULL, point_column = 'observed', ...) standardGeneric("validate")) - -#' @name validate -#' @rdname validate -#' @usage \S4method{validate}{ANY, character, sf, character, character}(mod, method, point, layer, point_column) -methods::setMethod( - "validate", - methods::signature(mod = "ANY"), - function(mod, method = 'continuous', layer = "mean", - point = NULL, point_column = 'observed', ...){ - assertthat::assert_that( - inherits(mod, "DistributionModel"), - inherits(point, 'sf') || is.null(point), - is.null(point_column) || is.character(point_column), - is.character(layer), - is.character(method) - ) - assertthat::assert_that( "prediction" %in% mod$show_rasters(),msg = "No prediction of the fitted model found!" 
) - # Check that independent data is provided and if so that the used column is there - if(!is.null(point)){ - assertthat::assert_that(is.character(point_column), - utils::hasName(point, point_column), - anyNA(point[[point_column]])==FALSE - ) - } - # Match method to be sure - method <- match.arg(method, c('continuous', 'discrete'), several.ok = FALSE) - - # Get settings from model object - settings <- mod$settings - - # Get prediction and threshold if available - prediction <- mod$get_data('prediction')[[layer]] - if( any(grep('threshold', mod$show_rasters())) ){ - tr_lyr <- grep('threshold', mod$show_rasters(),value = TRUE) - if(length(tr_lyr)>1) warning("There appear to be multiple thresholds. Using the first one.") - threshold <- mod$get_data(tr_lyr[1]) - # Get mean layer if there are multiple - if( grep(layer, names(threshold),value = TRUE ) != "") threshold <- threshold[[grep(layer, names(threshold),value = TRUE )]] - } else { threshold <- NULL } - - # Check that threshold and method match - if(is.null(threshold) && method == 'discrete'){ - if(getOption('ibis.setupmessages')) myLog('[Validation]','red','No threshold data found. Switching to continuous validation metrics.') - method <- 'continuous' - } - # If mode truncate was used, also switch to continuous data - if(method == "discrete"){ - if((attr(threshold,'format')!="binary")){ - if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Only truncated threshold found. Switching to continuous validation metrics.') - method <- 'continuous' - } - } - - # Check whether limits were applied and if so, set background to 0 everywhere for validation - if(settings$get("has_limits")){ - temp <- mod$model$predictors_object$get_data()[[1]]; temp[!is.na(temp)] <- 0 - if(!is.null(threshold)){ - new <- sum(threshold, temp, na.rm = TRUE); new <- terra::mask(new, temp) - attr(new,'format') <- attr(threshold,'format') - if(attr(threshold,'format')=="binary") new <- terra::droplevels(new) - threshold <- new - rm(new) - } - # Same for prediction layer, where missing data are set to 0 for validation - prediction <- sum(prediction, temp, na.rm = TRUE) - prediction <- terra::mask(prediction, temp) - rm(temp) - } - - # Get/check point data - if(!is.null(point)){ - if(is.factor(point[[point_column]])){ - point[[point_column]] <- as.numeric(as.character(point[[point_column]])) - } - assertthat::assert_that( - unique(sf::st_geometry_type(point)) %in% c('POINT', 'MULTIPOINT'), - # Check that the point data has presence-absence information - utils::hasName(point, point_column), - !is.na(sf::st_crs(point)$proj) - ) - # If sf is different, reproject to prediction - if(sf::st_crs(point)!= sf::st_crs(prediction)){ - point <- sf::st_transform(point, crs = sf::st_crs(prediction) ) - } - if(!utils::hasName(point, "name")) point$name <- "Validation data" # Assign a name for validation. Assuming only one dataset is present - if(!utils::hasName(point, "type")) point$type <- ifelse(length(unique(point[[point_column]]))>1, "poipa", "poipo") # Type depending on input - # Ensure comparable columns - point <- subset(point, select = c(point_column, "name", "type", attr(point, "sf_column") )) - } else { - # TODO: Think about how to do validation with non-point data - if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Validating model with non-independent training data. 
Results can be misleading!') - # Get all point datasets and combine them - point <- do.call(sf:::rbind.sf, - lapply(mod$model$biodiversity, function(y){ - o <- guess_sf(y$observations) - o$name <- y$name; o$type <- y$type - subset(o, select = c(point_column, "name", "type", attr(o, "sf_column"))) - } ) - ) |> tibble::remove_rownames() - if(is.factor(point[[point_column]])){ - point[[point_column]] <- as.numeric(as.character(point[[point_column]])) - } - } - assertthat::assert_that(nrow(point)>0, - utils::hasName(point, point_column)) - # --- # - # Do the extraction - df <- as.data.frame(point) - df$pred <- get_rastervalue(coords = point, env = prediction,rm.na = FALSE)[[layer]] - - if(!is.null(threshold)) df$pred_tr <- get_rastervalue(coords = point, env = threshold)[[grep("threshold", names(threshold),value = TRUE)]] - # Remove any sfc column if present - if(!is.null(attr(df, "sf_column"))) df[[attr(df, "sf_column")]] <- NULL - # Remove any NAs - df <- subset(df, stats::complete.cases(df)) - assertthat::assert_that( nrow(df)> 2, - length( unique(df[[point_column]]) )>1, - msg = "Validation was not possible owing to missing data.") - # --- # - # Messenger - if(getOption('ibis.setupmessages')) myLog('[Validation]','green','Calculating validation statistics') - - # Output container - results <- data.frame() - for(dataset in unique(df$name)){ - # Subset to name - df2 <- subset.data.frame(df, name == dataset) - - # Check that absence points are present, otherwise add some. - # Reason is that some engine such as inlabru don't save their integration points - if( !any(df2[[point_column]]==0) && method == "discrete"){ - if(getOption('ibis.setupmessages')) myLog('[Validation]','yellow','No absence data found for threshold. Generating random points.') - - # Use the pseudo-absence generation - o <- add_pseudoabsence(df = point, - field_occurrence = point_column, - template = threshold, - settings = pseudoabs_settings(background = threshold,nrpoints = nrow(df2)*2)) |> - subset(subset = observed == 0) - - abs <- list(); abs[[point_column]] <- o[[point_column]] - abs[["name"]] <- dataset; abs[["type"]] <- "poipo" - abs[["pred"]] <- get_rastervalue(coords = o, env = prediction)[[layer]] - abs[["pred_tr"]] <- get_rastervalue(coords = o, env = threshold)[[names(threshold)]] - - df2 <- rbind(df2, as.data.frame(abs)) - } - # Validate the threshold - out <- try({.validatethreshold(df2 = df2, point_column = point_column, mod = mod, - name = dataset, method = method, id = as.character(mod$id)) - }) - if(inherits(out, "try-error")) return(NULL) - results <- rbind.data.frame(results, out) - } - # Return result - return(results) - } -) - -#' @name validate -#' @rdname validate -#' @usage \S4method{validate}{SpatRaster, character, sf, character}(mod, method, point, point_column) -methods::setMethod( - "validate", - methods::signature(mod = "SpatRaster"), - function(mod, method = 'continuous', layer = NULL, point = NULL, point_column = 'observed', ...){ - assertthat::assert_that( - is.Raster(mod), - inherits(point, 'sf'), - is.character(method), - is.character(point_column) - ) - method <- match.arg(method, c("discrete", "continuous"),several.ok = FALSE) - - # If mode truncate was used, also switch to continuous data - if(method == "discrete"){ - # Get parameter if there - fm <- attr(mod,'format')!="binary" - if(isTRUE(fm)){ - if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Only truncated threshold found. 
Switching to continuous validation metrics.') - method <- 'continuous' - } else { - if(length( unique(mod)[,1] )>2){ - if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Non-discrete layer found. Switching to continuous validation metrics.') - method <- 'continuous' - } - } - } - assertthat::assert_that(nrow(point)>0, - utils::hasName(point, point_column)) - point <- subset(point, select = point_column) - - # Correct point column in case larger 1 - # FIXME: Only reasonable for discrete validation - if(method == "discrete"){ - if(any(point[[point_column]] > 1)) point[[point_column]] <- ifelse(point[[point_column]]>=1, 1, 0) - } - - # --- # - df <- as.data.frame(point) - df$pred_tr <- get_rastervalue(point, mod)[[names(mod)]] - if(method == "continuous") df$pred <- get_rastervalue(point, mod)[[names(mod)]] - - # Remove any sfc column if present - if(!is.null(attr(df, "sf_column"))) df[[attr(df, "sf_column")]] <- NULL - # Remove any NAs - df <- subset(df, stats::complete.cases(df)) - if(nrow(df) < 2) stop("Validation was not possible owing to missing data.") - # --- # - # Messenger - if(getOption('ibis.setupmessages')) myLog('[Validation]','green','Calculating validation statistics') - - if(!is.null(layer)) dataset <- layer else dataset <- "External" - - # Validate the threshold - out <- .validatethreshold(df, point_column = point_column, mod = NULL, - name = dataset, method = method, id = NA) - return(out) - } -) - -#' @noRd -#' @keywords internal -.validatethreshold <- function(df2, point_column, mod = NULL, name = NULL, method = 'fixed', id = NULL) { - - if(method == 'continuous'){ - # continuous evaluation - assertthat::assert_that(utils::hasName(df2, 'pred'), - utils::hasName(df2, point_column) - ) - #### Calculating Boyce index as in Hirzel et al. 2006 - # fit: A vector or SpatRaster containing the predicted suitability values - # obs: A vector containing the predicted suitability values or xy-coordinates (if fit is a SpatRaster) of the validation points (presence records) - # nclass : number of classes or vector with classes threshold. 
If nclass=0, Boyce index is calculated with a moving window (see next parameters) - # windows.w : width of the moving window (by default 1/10 of the suitability range) - # res : resolution of the moving window (by default 101 focals) - # PEplot : if True, plot the predicted to expected ratio along the suitability class - ecospat.boyce <- - function(fit, - obs, - nclass = 0, - window.w = "default", - res = 100, - PEplot = TRUE){ - boycei <- function(interval, obs, fit) { - fit.bin <- fit - obs.bin <- obs - fit.bin[fit[] >= interval[1] & fit[] <= interval[2]] <- "i" - fit.bin[fit.bin != "i"] <- 0 - obs.bin[obs[] >= interval[1] & obs[] <= interval[2]] <- "i" - obs.bin[obs.bin != "i"] <- 0 - pi <- length(which(obs.bin == "i")) / length(obs) - ei <- length(which(fit.bin == "i")) / length(fit.bin) - fi <- pi / ei - return(fi) - } - - if (window.w == "default") { - window.w <- (max(fit, na.rm = TRUE) - min(fit, na.rm = TRUE)) / 10 - } - - interval <- c(min(fit, na.rm = TRUE), max(fit, na.rm = TRUE)) - mini <- interval[1] - maxi <- interval[2] - - if (nclass == 0) { - vec.mov <- - seq( - from = mini, - to = maxi - window.w, - by = (maxi - mini - window.w) / res - ) - - vec.mov[res + 1] <- - vec.mov[res + 1] + 1 #Trick to avoid error with closed interval in R - - interval <- cbind(vec.mov, vec.mov + window.w) - } else if (length(nclass) > 1) { - vec.mov <- c(mini, nclass) - interval <- cbind(vec.mov, c(vec.mov[-1], maxi)) - } else if (nclass > 0 & length(nclass) < 2) { - vec.mov <- seq(from = mini, - to = maxi, - by = (maxi - mini) / nclass) - } - - f <- apply(interval, 1, boycei, obs, fit) - to.keep <- which(f != "NaN") # index to keep no NaN data - f <- f[to.keep] - - if (length(f) < 2) { - b <- NA #at least two points are necessary to draw a correlation - } else { - r <- c(1:length(f))[f != c(f[-1], FALSE)] #index to remove successive duplicates - b <- stats::cor(f[r], vec.mov[to.keep][r], method = "spearman") # calculation of the spearman correlation (i.e. Boyce index) after removing successive duplicated values - } - - HS <- apply(interval, 1, sum) / 2 # mean habitat suitability in the moving window - HS[length(HS)] <- HS[length(HS)] - 1 #Correction of the 'trick' to deal with closed interval - HS <- HS[to.keep] # exlude the NaN - - if (PEplot == TRUE) { - plot( - HS, - f, - xlab = "Habitat suitability", - ylab = "Predicted/Expected ratio", - col = "grey", - cex = 0.75 - ) - graphics::points(HS[r], f[r], pch = 19, cex = 0.75) - - } - - results <- list(F.ratio = f, - Spearman.cor = round(b, 3), - HS = HS) - return(results) - } - - # Function for Root-mean square error - RMSE <- function(pred, obs, na.rm = TRUE) { - sqrt(mean((pred - obs)^2, na.rm = na.rm)) - } - # Mean absolute error - MAE <- function(pred, obs, na.rm = TRUE) { - mean(abs(pred - obs), na.rm = na.rm) - } - # Function for log loss/cross-entropy loss. 
- Poisson_LogLoss <- function(y_pred, y_true) { - eps <- 1e-15 - y_pred <- pmax(y_pred, eps) - Poisson_LogLoss <- mean(log(gamma(y_true + 1)) + y_pred - log(y_pred) * y_true) - return(Poisson_LogLoss) - } - # Normalized Gini Coefficient - NormalizedGini <- function(y_pred, y_true) { - SumGini <- function(y_pred, y_true) { - y_true_sort <- y_true[order(y_pred, decreasing = TRUE)] - y_random <- 1:length(y_pred) / length(y_pred) - y_Lorentz <- cumsum(y_true_sort) / sum(y_true_sort) - SumGini <- sum(y_Lorentz - y_random) - return(SumGini) - } - NormalizedGini <- SumGini(y_pred, y_true) / SumGini(y_true, y_true) - return(NormalizedGini) - } - # Create output container - out <- data.frame( - modelid = id, - name = name, - method = method, - metric = c('n','rmse', 'mae', - 'logloss','normgini', - 'cont.boyce'), - value = NA - ) - # - # - out$value[out$metric=='n'] <- nrow(df2) # Number of records - out$value[out$metric=='rmse'] <- RMSE(pred = df2$pred, obs = df2[[point_column]]) # RMSE - out$value[out$metric=='mae'] <- MAE(pred = df2$pred, obs = df2[[point_column]]) # Mean absolute error - out$value[out$metric=='normgini'] <- NormalizedGini(y_pred = df2$pred, y_true = df2[[point_column]]) - - if(!is.null(mod)){ - if( any( sapply(mod$model$biodiversity, function(x) x$family) == "binomial" ) ){ - LogLoss <- function(y_pred, y_true) { - LogLoss <- -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred)) - return(LogLoss) - } - out$value[out$metric=='logloss'] <- LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) - } else { - out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) - } - } else { - # Assume Poisson distributed values, calculate log-loss - out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) - } - - - # Boyce index. Wrap in try since is known to crash - try({ - if("modEvA" %in% utils::installed.packages()[,1]){ - check_package("modEvA") - suppressWarnings( - boi <- modEvA::Boyce(obs = df2[[point_column]], pred = df2$pred, plot = FALSE) - ) - } else { - # Run boyce a few times as average sample ? 
- # obs <- df2[df2[[point_column]]>0,] - # abs <- df2[sample(which(df2[[point_column]]==0), size = nrow(obs)), ] - # test <- rbind(obs, abs) - boi <- ecospat.boyce(obs = df2[[point_column]], fit = df2$pred, nclass = 0, PEplot = FALSE) - boi$Boyce <- boi$Spearman.cor - } - }, silent = TRUE) - if(exists('boi')) out$value[out$metric=='cont.boyce'] <- boi$Boyce - - } else { - # discrete evaluation - assertthat::assert_that(utils::hasName(df2, 'pred_tr'), - length(unique(df2[[point_column]])) > 1, - msg = "It appears as either the observed data or the threshold does not allow discrete validation.") - # Ensure that the threshold value is numeric - if(is.factor(df2$pred_tr)) df2$pred_tr <- as.numeric( as.character( df2$pred_tr )) - - # For discrete functions to work correctly, ensure that all values are 0/1 - df2[[point_column]] <- ifelse(df2[[point_column]] > 0, 1, 0 ) - # Build the confusion matrix - ta <- sum((df2["pred_tr"] == 0) & (df2[point_column] == 0)) # True absence - fp <- sum((df2["pred_tr"] == 1) & (df2[point_column] == 0)) # False presence - fa <- sum((df2["pred_tr"] == 0) & (df2[point_column] == 1)) # False absence - tp <- sum((df2["pred_tr"] == 1) & (df2[point_column] == 1)) # True presence - - # Binary brier Score - BS <- function(pred, obs, na.rm = TRUE) { - if(assertthat::see_if(length(unique(pred)) <= 2, - length(unique(obs)) <= 2)){ - mean( (as.numeric(as.character(pred)) - as.numeric(as.character(obs)))^2, na.rm = na.rm) - } else return(NA) - } - - # Output data.frame - out <- data.frame( - modelid = id, - name = name, - method = method, - metric = c('n','auc','overall.accuracy', 'true.presence.ratio', - 'precision','sensitivity', 'specificity', - 'tss', 'f1', 'logloss', - 'expected.accuracy', 'kappa', 'brier.score'), - value = NA - ) - - # Accuracy indices - out$value[out$metric=='n'] <- N <- ta + fp + fa + tp # Total number of records - out$value[out$metric=='overall.accuracy'] <- OA <- (tp + ta) / N # Overall accuracy - out$value[out$metric=='true.presence.ratio'] <- FOM <- tp / (tp + fp + fa) # True presence classifications - out$value[out$metric=='precision'] <- precision <- tp / (tp + fp) # Precision - out$value[out$metric=='sensitivity'] <- Sensitivity <- tp / (tp + fa) # Sensitivity - out$value[out$metric=='specificity'] <- Specificity <- ta / (ta + fp) # Specificity - out$value[out$metric=='tss'] <- TSS <- Sensitivity + Specificity - 1 # True Skill statistic - out$value[out$metric=='f1'] <- 2 * (precision * Sensitivity) / (precision + Sensitivity) # F1 score - Prob_1and1 <- ((tp + fp) / N) * ((tp + fa) / N) # Probability presence - Prob_0and0 <- ((ta + fa) / N) * ((ta + fp) / N) # Probability absence - out$value[out$metric=='expected.accuracy'] <- Expected_accuracy <- Prob_1and1 + Prob_0and0 # Expected accuracy - out$value[out$metric=='kappa'] <- (OA - Expected_accuracy) / (1 - Expected_accuracy) - - if("modEvA" %in% utils::installed.packages()[,1]){ - check_package("modEvA") - # Calculate AUC - suppressWarnings( - out$value[out$metric=='auc'] <- modEvA::AUC(obs = df2[[point_column]], pred = df2[['pred_tr']], simplif = TRUE, plot = FALSE) - ) - } - # Add brier score - out$value[out$metric=='brier.score'] <- BS(obs = df2[[point_column]], pred = df2[['pred_tr']]) - - # Evaluate Log loss / Cross-Entropy Loss for a predicted probability measure - # FIXME: Hacky. 
This likely won't work with specific formulations
-  if(!is.null(mod)){
-    if( any( sapply(mod$model$biodiversity, function(x) x$family) == "binomial" ) ){
-      LogLoss <- function(y_pred, y_true) {
-        LogLoss <- -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
-        return(LogLoss)
-      }
-      out$value[out$metric=='logloss'] <- LogLoss(y_pred = df2$pred, y_true = df2[[point_column]])
-    } else {
-      # Function for log loss/cross-entropy loss.
-      Poisson_LogLoss <- function(y_pred, y_true) {
-        eps <- 1e-15
-        y_pred <- pmax(y_pred, eps)
-        Poisson_LogLoss <- mean(log(gamma(y_true + 1)) + y_pred - log(y_pred) * y_true)
-        return(Poisson_LogLoss)
-      }
-      out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred_tr, y_true = df2[[point_column]])
-    }
-  }
-  } # End of discrete clause
-  return(out)
-}
+#' Validation of a fitted distribution object
+#'
+#' @description
+#' This function conducts a model evaluation based either on the fitted point data
+#' or on any supplied independent dataset.
+#' **Currently only supporting point datasets. For validation of integrated models more work is needed.**
+#'
+#' @details
+#' The \code{'validate'} function calculates different validation metrics
+#' depending on the output type.
+#'
+#' The output metrics for each type are defined as follows:
+#' **Continuous:**
+#'
+#' * \code{'n'} = Number of observations.
+#' * \code{'rmse'} = Root Mean Square Error, \deqn{ \sqrt {\frac{1}{N} \sum_{i=1}^{N} (\hat{y_{i}} - y_{i})^2} }
+#' * \code{'mae'} = Mean Absolute Error, \deqn{ \frac{ \sum_{i=1}^{N} |y_{i} - x_{i}| }{N} }
+#' * \code{'logloss'} = Log loss, TBD
+#' * \code{'normgini'} = Normalized Gini index, TBD
+#' * \code{'cont.boyce'} = Continuous Boyce index, TBD
+#'
+#' **Discrete:**
+#'
+#' * \code{'n'} = Number of observations.
+#' * \code{'auc'} = Area under the curve, TBD
+#' * \code{'overall.accuracy'} = Overall Accuracy, TBD
+#' * \code{'true.presence.ratio'} = True presence ratio or Jaccard index, TBD
+#' * \code{'precision'} = Precision, TBD
+#' * \code{'sensitivity'} = Sensitivity, TBD
+#' * \code{'specificity'} = Specificity, TBD
+#' * \code{'tss'} = True Skill Statistics, TBD
+#' * \code{'f1'} = F1 Score (the harmonic mean of precision and sensitivity), \deqn{ \frac{2TP}{2TP + FP + FN} }
+#' * \code{'logloss'} = Log loss, TBD
+#' * \code{'expected.accuracy'} = Expected Accuracy, \deqn{ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} }
+#' * \code{'kappa'} = Kappa value, \deqn{ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN) } },
+#' * \code{'brier.score'} = Brier score, \deqn{ \frac{ \sum_{i=1}^{N} (y_{i} - x_{i})^{2} }{N} }, where \eqn{y_{i}} is the predicted presence or absence and \eqn{x_{i}} the observed one.
+#' Here TP is a true positive, TN a true negative, FP a false positive and FN a false negative.
+#'
+#' @param mod A fitted [`DistributionModel`] object with set predictors. Alternatively one can also
+#' provide directly a [`SpatRaster`], however in this case the `point` layer also needs to be provided.
+#' @param method Should the validation be conducted on the continuous prediction or a
+#' (previously calculated) thresholded layer in binary format? Note that depending
+#' on the method different metrics can be computed. See Details.
+#' @param layer In case multiple layers exist, which one to use? (Default: \code{'mean'}).
+#' @param point A [`sf`] object with type `POINT` or `MULTIPOINT`.
+#' @param point_column A [`character`] vector with the name of the column containing the independent observations.
+#' (Default: \code{'observed'}).
+#' @param ... Other parameters that are passed on. Currently unused.
+#' @returns Returns a tidy [`tibble`] with validation results.
+#' @note If you use the Boyce Index, please cite the original Hirzel et al. (2006) paper.
+#'
+#' @references
+#' * Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789. https://doi.org/10.1111/jbi.12058
+#' * Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological modelling, 199(2), 142-152.
+#' @examples
+#' \dontrun{
+#'  # Assuming that mod is a distribution object and has a thresholded layer
+#'  mod <- threshold(mod, method = "TSS")
+#'  validate(mod, method = "discrete")
+#' }
+#' @name validate
+#' @aliases validate
+#' @keywords train
+#' @exportMethod validate
+#' @export
+NULL
+methods::setGeneric("validate",
+                    signature = methods::signature("mod"),
+                    function(mod, method = 'continuous', layer = "mean",
+                             point = NULL, point_column = 'observed', ...) standardGeneric("validate"))
+
+#' @name validate
+#' @rdname validate
+#' @usage \S4method{validate}{ANY,character,sf,character,character}(mod,method,point,layer,point_column,...)
+methods::setMethod(
+  "validate",
+  methods::signature(mod = "ANY"),
+  function(mod, method = 'continuous', layer = "mean",
+           point = NULL, point_column = 'observed', ...){
+    assertthat::assert_that(
+      inherits(mod, "DistributionModel"),
+      inherits(point, 'sf') || is.null(point),
+      is.null(point_column) || is.character(point_column),
+      is.character(layer),
+      is.character(method)
+    )
+    assertthat::assert_that( "prediction" %in% mod$show_rasters(), msg = "No prediction of the fitted model found!" )
+    # Check that independent data is provided and if so that the used column is there
+    if(!is.null(point)){
+      assertthat::assert_that(is.character(point_column),
+                              utils::hasName(point, point_column),
+                              anyNA(point[[point_column]])==FALSE
+      )
+    }
+    # Match method to be sure
+    method <- match.arg(method, c('continuous', 'discrete'), several.ok = FALSE)
+
+    # Get settings from model object
+    settings <- mod$settings
+
+    # Get prediction and threshold if available
+    prediction <- mod$get_data('prediction')[[layer]]
+    if( any(grep('threshold', mod$show_rasters())) ){
+      tr_lyr <- grep('threshold', mod$show_rasters(),value = TRUE)
+      if(length(tr_lyr)>1) warning("There appear to be multiple thresholds. Using the first one.")
+      threshold <- mod$get_data(tr_lyr[1])
+      # Get mean layer if there are multiple
+      if( grep(layer, names(threshold),value = TRUE ) != "") threshold <- threshold[[grep(layer, names(threshold),value = TRUE )]]
+    } else { threshold <- NULL }
+
+    # Check that threshold and method match
+    if(is.null(threshold) && method == 'discrete'){
+      if(getOption('ibis.setupmessages')) myLog('[Validation]','red','No threshold data found. Switching to continuous validation metrics.')
+      method <- 'continuous'
+    }
+    # If mode truncate was used, also switch to continuous data
+    if(method == "discrete"){
+      if((attr(threshold,'format')!="binary")){
+        if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Only truncated threshold found.
Switching to continuous validation metrics.')
+        method <- 'continuous'
+      }
+    }
+
+    # Check whether limits were applied and if so, set background to 0 everywhere for validation
+    if(settings$get("has_limits")){
+      temp <- mod$model$predictors_object$get_data()[[1]]; temp[!is.na(temp)] <- 0
+      if(!is.null(threshold)){
+        new <- sum(threshold, temp, na.rm = TRUE); new <- terra::mask(new, temp)
+        attr(new,'format') <- attr(threshold,'format')
+        if(attr(threshold,'format')=="binary") new <- terra::droplevels(new)
+        threshold <- new
+        rm(new)
+      }
+      # Same for prediction layer, where missing data are set to 0 for validation
+      prediction <- sum(prediction, temp, na.rm = TRUE)
+      prediction <- terra::mask(prediction, temp)
+      rm(temp)
+    }
+
+    # Get/check point data
+    if(!is.null(point)){
+      if(is.factor(point[[point_column]])){
+        point[[point_column]] <- as.numeric(as.character(point[[point_column]]))
+      }
+      assertthat::assert_that(
+        unique(sf::st_geometry_type(point)) %in% c('POINT', 'MULTIPOINT'),
+        # Check that the point data has presence-absence information
+        utils::hasName(point, point_column),
+        !is.na(sf::st_crs(point)$proj)
+      )
+      # If sf is different, reproject to prediction
+      if(sf::st_crs(point)!= sf::st_crs(prediction)){
+        point <- sf::st_transform(point, crs = sf::st_crs(prediction) )
+      }
+      if(!utils::hasName(point, "name")) point$name <- "Validation data" # Assign a name for validation. Assuming only one dataset is present
+      if(!utils::hasName(point, "type")) point$type <- ifelse(length(unique(point[[point_column]]))>1, "poipa", "poipo") # Type depending on input
+      # Ensure comparable columns
+      point <- subset(point, select = c(point_column, "name", "type", attr(point, "sf_column") ))
+    } else {
+      # TODO: Think about how to do validation with non-point data
+      if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Validating model with non-independent training data. Results can be misleading!')
+      # Get all point datasets and combine them
+      point <- collect_occurrencepoints(mod$model,
+                                        include_absences = TRUE, # keep absences from presence-absence datasets
+                                        addName = TRUE,
+                                        tosf = TRUE)
+      if(is.factor(point[[point_column]])){
+        point[[point_column]] <- as.numeric(as.character(point[[point_column]]))
+      }
+    }
+    assertthat::assert_that(nrow(point)>0,
+                            utils::hasName(point, point_column))
+    # --- #
+    # Do the extraction
+    df <- as.data.frame(point)
+    df$pred <- get_rastervalue(coords = point, env = prediction,rm.na = FALSE)[[layer]]
+
+    if(!is.null(threshold)) df$pred_tr <- get_rastervalue(coords = point, env = threshold)[[grep("threshold", names(threshold),value = TRUE)]]
+    # Remove any sfc column if present
+    if(!is.null(attr(df, "sf_column"))) df[[attr(df, "sf_column")]] <- NULL
+    # Remove any NAs
+    df <- subset(df, stats::complete.cases(df))
+    assertthat::assert_that( nrow(df)> 2,
+                             length( unique(df[[point_column]]) )>1,
+                             msg = "Validation was not possible owing to missing data.")
+    # --- #
+    # Messenger
+    if(getOption('ibis.setupmessages')) myLog('[Validation]','green','Calculating validation statistics')
+
+    # Output container
+    results <- data.frame()
+    for(dataset in unique(df$name)){
+      # Subset to name
+      df2 <- subset.data.frame(df, name == dataset)
+
+      # Check that absence points are present, otherwise add some.
+      # Reason is that some engines such as inlabru don't save their integration points
+      if( !any(df2[[point_column]]==0) && method == "discrete"){
+        if(getOption('ibis.setupmessages')) myLog('[Validation]','yellow','No absence data found for threshold.
Generating random points.')
+
+        # Use the pseudo-absence generation
+        o <- add_pseudoabsence(df = point,
+                               field_occurrence = point_column,
+                               template = threshold,
+                               settings = pseudoabs_settings(background = threshold,nrpoints = nrow(df2)*2)) |>
+          subset(subset = observed == 0)
+        o$name <- dataset
+        o <- o[,which(names(o) %in% names(df2))]
+        o$pred <- get_rastervalue(coords = o, env = prediction)[[layer]]
+        o$pred_tr <- get_rastervalue(coords = o, env = threshold)[[names(threshold)]]
+        o <- o |> sf::st_drop_geometry()
+        df2 <- rbind(df2, as.data.frame(o))
+      }
+      # Validate the threshold
+      out <- try({.validatethreshold(df2 = df2, point_column = point_column, mod = mod,
+                                     name = dataset, method = method, id = as.character(mod$id))
+      })
+      if(inherits(out, "try-error")) return(NULL)
+      results <- rbind.data.frame(results, out)
+    }
+    # Return result
+    return(results)
+  }
+)
+
+#' @name validate
+#' @rdname validate
+#' @usage \S4method{validate}{SpatRaster,character,sf,character}(mod,method,point,point_column,...)
+methods::setMethod(
+  "validate",
+  methods::signature(mod = "SpatRaster"),
+  function(mod, method = 'continuous', layer = NULL, point = NULL, point_column = 'observed', ...){
+    assertthat::assert_that(
+      is.Raster(mod),
+      inherits(point, 'sf'),
+      is.character(method),
+      is.character(point_column)
+    )
+    method <- match.arg(method, c("discrete", "continuous"),several.ok = FALSE)
+
+    # If mode truncate was used, also switch to continuous data
+    if(method == "discrete"){
+      # Get parameter if there
+      fm <- attr(mod,'format')!="binary"
+      if(isTRUE(fm)){
+        if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Only truncated threshold found. Switching to continuous validation metrics.')
+        method <- 'continuous'
+      } else {
+        if(length( unique(mod)[,1] )>2){
+          if(getOption('ibis.setupmessages')) myLog('[Validation]','red','Non-discrete layer found. Switching to continuous validation metrics.')
+          method <- 'continuous'
+        }
+      }
+    }
+    assertthat::assert_that(nrow(point)>0,
+                            utils::hasName(point, point_column))
+    point <- subset(point, select = point_column)
+
+    # Correct point column in case values are larger than 1
+    # FIXME: Only reasonable for discrete validation
+    if(method == "discrete"){
+      if(any(point[[point_column]] > 1)) point[[point_column]] <- ifelse(point[[point_column]]>=1, 1, 0)
+    }
+
+    # --- #
+    df <- as.data.frame(point)
+    df$pred_tr <- get_rastervalue(point, mod)[[names(mod)]]
+    if(method == "continuous") df$pred <- get_rastervalue(point, mod)[[names(mod)]]
+
+    # Remove any sfc column if present
+    if(!is.null(attr(df, "sf_column"))) df[[attr(df, "sf_column")]] <- NULL
+    # Remove any NAs
+    df <- subset(df, stats::complete.cases(df))
+    if(nrow(df) < 2) stop("Validation was not possible owing to missing data.")
+    # --- #
+    # Messenger
+    if(getOption('ibis.setupmessages')) myLog('[Validation]','green','Calculating validation statistics')
+
+    if(!is.null(layer)) dataset <- layer else dataset <- "External"
+
+    # Validate the threshold
+    out <- .validatethreshold(df, point_column = point_column, mod = NULL,
+                              name = dataset, method = method, id = NA)
+    return(out)
+  }
+)
+
+#' @noRd
+#' @keywords internal
+.validatethreshold <- function(df2, point_column, mod = NULL, name = NULL, method = 'fixed', id = NULL) {
+
+  if(method == 'continuous'){
+    # continuous evaluation
+    assertthat::assert_that(utils::hasName(df2, 'pred'),
+                            utils::hasName(df2, point_column)
+    )
+    #### Calculating Boyce index as in Hirzel et al.
2006 + # fit: A vector or SpatRaster containing the predicted suitability values + # obs: A vector containing the predicted suitability values or xy-coordinates (if fit is a SpatRaster) of the validation points (presence records) + # nclass : number of classes or vector with class thresholds. If nclass=0, the Boyce index is calculated with a moving window (see next parameters) + # window.w : width of the moving window (by default 1/10 of the suitability range) + # res : resolution of the moving window (by default 100 focals) + # PEplot : if TRUE, plot the predicted to expected ratio along the suitability classes + ecospat.boyce <- + function(fit, + obs, + nclass = 0, + window.w = "default", + res = 100, + PEplot = TRUE){ + boycei <- function(interval, obs, fit) { + fit.bin <- fit + obs.bin <- obs + fit.bin[fit[] >= interval[1] & fit[] <= interval[2]] <- "i" + fit.bin[fit.bin != "i"] <- 0 + obs.bin[obs[] >= interval[1] & obs[] <= interval[2]] <- "i" + obs.bin[obs.bin != "i"] <- 0 + pi <- length(which(obs.bin == "i")) / length(obs) + ei <- length(which(fit.bin == "i")) / length(fit.bin) + fi <- pi / ei + return(fi) + } + + if (window.w == "default") { + window.w <- (max(fit, na.rm = TRUE) - min(fit, na.rm = TRUE)) / 10 + } + + interval <- c(min(fit, na.rm = TRUE), max(fit, na.rm = TRUE)) + mini <- interval[1] + maxi <- interval[2] + + if (nclass == 0) { + vec.mov <- + seq( + from = mini, + to = maxi - window.w, + by = (maxi - mini - window.w) / res + ) + + vec.mov[res + 1] <- + vec.mov[res + 1] + 1 # Trick to avoid an error with the closed interval in R + + interval <- cbind(vec.mov, vec.mov + window.w) + } else if (length(nclass) > 1) { + vec.mov <- c(mini, nclass) + interval <- cbind(vec.mov, c(vec.mov[-1], maxi)) + } else if (nclass > 0 & length(nclass) < 2) { + vec.mov <- seq(from = mini, + to = maxi, + by = (maxi - mini) / nclass) + } + + f <- apply(interval, 1, boycei, obs, fit) + to.keep <- which(f != "NaN") # index to keep non-NaN data + f <- f[to.keep] + + if (length(f) < 2) { + b <- NA # at least two points are necessary to compute a correlation + } else { + r <- c(1:length(f))[f != c(f[-1], FALSE)] # index to remove successive duplicates + b <- stats::cor(f[r], vec.mov[to.keep][r], method = "spearman") # calculation of the Spearman correlation (i.e. the Boyce index) after removing successive duplicated values + } + + HS <- apply(interval, 1, sum) / 2 # mean habitat suitability in the moving window + HS[length(HS)] <- HS[length(HS)] - 1 # Correction of the 'trick' to deal with the closed interval + HS <- HS[to.keep] # exclude the NaN + + if (PEplot == TRUE) { + plot( + HS, + f, + xlab = "Habitat suitability", + ylab = "Predicted/Expected ratio", + col = "grey", + cex = 0.75 + ) + graphics::points(HS[r], f[r], pch = 19, cex = 0.75) + + } + + results <- list(F.ratio = f, + Spearman.cor = round(b, 3), + HS = HS) + return(results) + } + + # Function for root-mean-square error + RMSE <- function(pred, obs, na.rm = TRUE) { + sqrt(mean((pred - obs)^2, na.rm = na.rm)) + } + # Mean absolute error + MAE <- function(pred, obs, na.rm = TRUE) { + mean(abs(pred - obs), na.rm = na.rm) + }
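As a quick standalone illustration of the two error measures just defined (toy numbers, not package output):

# Toy observations and predictions
obs  <- c(0, 1, 0, 1, 1)
pred <- c(0.2, 0.7, 0.1, 0.9, 0.4)
sqrt(mean((pred - obs)^2))  # RMSE, here approx. 0.32
mean(abs(pred - obs))       # MAE, here 0.26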
+ # Function for log loss/cross-entropy loss. + Poisson_LogLoss <- function(y_pred, y_true) { + eps <- 1e-15 + y_pred <- pmax(y_pred, eps) + Poisson_LogLoss <- mean(log(gamma(y_true + 1)) + y_pred - log(y_pred) * y_true) + return(Poisson_LogLoss) + } + # Normalized Gini coefficient + NormalizedGini <- function(y_pred, y_true) { + SumGini <- function(y_pred, y_true) { + y_true_sort <- y_true[order(y_pred, decreasing = TRUE)] + y_random <- 1:length(y_pred) / length(y_pred) + y_Lorentz <- cumsum(y_true_sort) / sum(y_true_sort) + SumGini <- sum(y_Lorentz - y_random) + return(SumGini) + } + NormalizedGini <- SumGini(y_pred, y_true) / SumGini(y_true, y_true) + return(NormalizedGini) + }
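For intuition, a minimal standalone check of the two helpers above on toy count data (values rounded; the gamma term is the factorial normalisation of the Poisson likelihood):

# Toy counts and predicted intensities
y_true <- c(0, 2, 1, 4)
y_pred <- c(0.5, 1.5, 1.0, 3.0)
mean(log(gamma(y_true + 1)) + y_pred - log(y_pred) * y_true)  # Poisson log-loss, approx. 1.17
# A prediction that ranks observations perfectly yields a normalized Gini of 1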
+ # Create output container + out <- data.frame( + modelid = id, + name = name, + method = method, + metric = c('n','rmse', 'mae', + 'logloss','normgini', + 'cont.boyce'), + value = NA + ) + # - # + out$value[out$metric=='n'] <- nrow(df2) # Number of records + out$value[out$metric=='rmse'] <- RMSE(pred = df2$pred, obs = df2[[point_column]]) # RMSE + out$value[out$metric=='mae'] <- MAE(pred = df2$pred, obs = df2[[point_column]]) # Mean absolute error + out$value[out$metric=='normgini'] <- NormalizedGini(y_pred = df2$pred, y_true = df2[[point_column]]) + + if(!is.null(mod)){ + if( any( sapply(mod$model$biodiversity, function(x) x$family) == "binomial" ) ){ + LogLoss <- function(y_pred, y_true) { + LogLoss <- -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred)) + return(LogLoss) + } + out$value[out$metric=='logloss'] <- LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) + } else { + out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) + } + } else { + # Assume Poisson distributed values, calculate log-loss + out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) + } + + + # Boyce index. Wrap in try since it is known to crash + try({ + if("modEvA" %in% utils::installed.packages()[,1]){ + check_package("modEvA") + suppressWarnings( + boi <- modEvA::Boyce(obs = df2[[point_column]], pred = df2$pred, plot = FALSE) + ) + } else { + # Run Boyce a few times as an average sample? + # obs <- df2[df2[[point_column]]>0,] + # abs <- df2[sample(which(df2[[point_column]]==0), size = nrow(obs)), ] + # test <- rbind(obs, abs) + boi <- ecospat.boyce(obs = df2[[point_column]], fit = df2$pred, nclass = 0, PEplot = FALSE) + boi$Boyce <- boi$Spearman.cor + } + }, silent = TRUE) + if(exists('boi')) out$value[out$metric=='cont.boyce'] <- boi$Boyce + + } else { + # discrete evaluation + assertthat::assert_that(utils::hasName(df2, 'pred_tr'), + length(unique(df2[[point_column]])) > 1, + msg = "It appears that either the observed data or the threshold does not allow discrete validation.") + # Ensure that the threshold value is numeric + if(is.factor(df2$pred_tr)) df2$pred_tr <- as.numeric( as.character( df2$pred_tr )) + + # For discrete functions to work correctly, ensure that all values are 0/1 + df2[[point_column]] <- ifelse(df2[[point_column]] > 0, 1, 0 ) + # Build the confusion matrix + ta <- sum((df2["pred_tr"] == 0) & (df2[point_column] == 0)) # True absence + fp <- sum((df2["pred_tr"] == 1) & (df2[point_column] == 0)) # False presence + fa <- sum((df2["pred_tr"] == 0) & (df2[point_column] == 1)) # False absence + tp <- sum((df2["pred_tr"] == 1) & (df2[point_column] == 1)) # True presence + + # Binary Brier score + BS <- function(pred, obs, na.rm = TRUE) { + if(assertthat::see_if(length(unique(pred)) <= 2, + length(unique(obs)) <= 2)){ + mean( (as.numeric(as.character(pred)) - as.numeric(as.character(obs)))^2, na.rm = na.rm) + } else return(NA) + } + + # Output data.frame + out <- data.frame( + modelid = id, + name = name, + method = method, + metric = c('n','auc','overall.accuracy', 'true.presence.ratio', + 'precision','sensitivity', 'specificity', + 'tss', 'f1', 'logloss', + 'expected.accuracy', 'kappa', 'brier.score'), + value = NA + ) + + # Accuracy indices + out$value[out$metric=='n'] <- N <- ta + fp + fa + tp # Total number of records + out$value[out$metric=='overall.accuracy'] <- OA <- (tp + ta) / N # Overall accuracy + out$value[out$metric=='true.presence.ratio'] <- FOM <- tp / (tp + fp + fa) # True presence ratio (figure of merit) + out$value[out$metric=='precision'] <- precision <- tp / (tp + fp) # Precision + out$value[out$metric=='sensitivity'] <- Sensitivity <- tp / (tp + fa) # Sensitivity + out$value[out$metric=='specificity'] <- Specificity <- ta / (ta + fp) # Specificity + out$value[out$metric=='tss'] <- TSS <- Sensitivity + Specificity - 1 # True Skill Statistic + out$value[out$metric=='f1'] <- 2 * (precision * Sensitivity) / (precision + Sensitivity) # F1 score + Prob_1and1 <- ((tp + fp) / N) * ((tp + fa) / N) # Probability of presence + Prob_0and0 <- ((ta + fa) / N) * ((ta + fp) / N) # Probability of absence + out$value[out$metric=='expected.accuracy'] <- Expected_accuracy <- Prob_1and1 + Prob_0and0 # Expected accuracy + out$value[out$metric=='kappa'] <- (OA - Expected_accuracy) / (1 - Expected_accuracy) + + if("modEvA" %in% utils::installed.packages()[,1]){ + check_package("modEvA") + # Calculate AUC + suppressWarnings( + out$value[out$metric=='auc'] <- modEvA::AUC(obs = df2[[point_column]], pred = df2[['pred_tr']], simplif = TRUE, plot = FALSE) + ) + } + # Add Brier score + out$value[out$metric=='brier.score'] <- BS(obs = df2[[point_column]], pred = df2[['pred_tr']]) + + # Evaluate log loss / cross-entropy loss for a predicted probability measure + # FIXME: Hacky.
This likely won't work with specific formulations + if(!is.null(mod)){ + if( any( sapply(mod$model$biodiversity, function(x) x$family) == "binomial" ) ){ + LogLoss <- function(y_pred, y_true) { + LogLoss <- -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred)) + return(LogLoss) + } + out$value[out$metric=='logloss'] <- LogLoss(y_pred = df2$pred, y_true = df2[[point_column]]) + } else { + # Function for log loss/cross-entropy loss. + Poisson_LogLoss <- function(y_pred, y_true) { + eps <- 1e-15 + y_pred <- pmax(y_pred, eps) + Poisson_LogLoss <- mean(log(gamma(y_true + 1)) + y_pred - log(y_pred) * y_true) + return(Poisson_LogLoss) + } + out$value[out$metric=='logloss'] <- Poisson_LogLoss(y_pred = df2$pred_tr, y_true = df2[[point_column]]) + } + } + } # End of discrete clause + return(out) +} diff --git a/R/waiver.R b/R/waiver.R index 6a842c28..a9437287 100644 --- a/R/waiver.R +++ b/R/waiver.R @@ -26,6 +26,7 @@ NULL #' # is it a waiver object? #' is.Waiver(w) #' +#' @aliases new_waiver #' @keywords misc #' @export new_waiver <- function() structure(list(), class = "Waiver") diff --git a/R/write_output.R b/R/write_output.R index 55fc3e15..a1aa4ba2 100644 --- a/R/write_output.R +++ b/R/write_output.R @@ -29,15 +29,15 @@ #' @export NULL methods::setGeneric("write_output", - signature = methods::signature("mod"), + signature = methods::signature("mod", "fname"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"), ...) standardGeneric("write_output")) #' @name write_output #' @rdname write_output -#' @usage \S4method{write_output}{ANY, character, character, logical}(mod, fname, dt, verbose) +#' @usage \S4method{write_output}{ANY,character,character,logical}(mod,fname,dt,verbose) methods::setMethod( "write_output", - methods::signature("ANY"), + methods::signature("ANY", fname = "character"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"), ...){ assertthat::assert_that( !missing(mod), @@ -75,10 +75,10 @@ methods::setMethod( #' @name write_output #' @rdname write_output -#' @usage \S4method{write_output}{BiodiversityScenario, character, character, logical}(mod, fname, dt, verbose) +#' @usage \S4method{write_output}{BiodiversityScenario,character,character,logical}(mod,fname,dt,verbose) methods::setMethod( "write_output", - methods::signature(mod = "BiodiversityScenario"), + methods::signature(mod = "BiodiversityScenario",fname = "character"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"), ...) { assertthat::assert_that( !missing(mod), @@ -94,10 +94,10 @@ methods::setMethod( #' @name write_output #' @rdname write_output -#' @usage \S4method{write_output}{SpatRaster, character, character, logical}(mod, fname, dt, verbose) +#' @usage \S4method{write_output}{SpatRaster,character,character,logical}(mod,fname,dt,verbose) methods::setMethod( "write_output", - methods::signature(mod = "SpatRaster"), + methods::signature(mod = "SpatRaster",fname = "character"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"), ...) 
{ assertthat::assert_that( !missing(mod), @@ -121,10 +121,10 @@ methods::setMethod( #' @name write_output #' @rdname write_output -#' @usage \S4method{write_output}{data.frame, character, character, logical}(mod, fname, dt, verbose) +#' @usage \S4method{write_output}{data.frame,character,character,logical}(mod,fname,dt,verbose) methods::setMethod( "write_output", - methods::signature(mod = "data.frame"), + methods::signature(mod = "data.frame",fname = "character"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"),...) { assertthat::assert_that( !missing(mod), @@ -142,10 +142,10 @@ methods::setMethod( #' @name write_output #' @rdname write_output -#' @usage \S4method{write_output}{stars, character, character, logical}(mod, fname, dt, verbose) +#' @usage \S4method{write_output}{stars,character,character,logical}(mod,fname,dt,verbose) methods::setMethod( "write_output", - methods::signature(mod = "stars"), + methods::signature(mod = "stars",fname = "character"), function(mod, fname, dt = "FLT4S", verbose = getOption("ibis.setupmessages"),...) { assertthat::assert_that( !missing(mod), @@ -296,16 +296,16 @@ writeNetCDF <- function(file, fname, #' @export NULL methods::setGeneric("write_summary", - signature = methods::signature("mod"), + signature = methods::signature("mod","fname"), function(mod, fname, partial = FALSE, verbose = getOption("ibis.setupmessages"),...) standardGeneric("write_summary")) #' @name write_summary #' @rdname write_summary -#' @usage \S4method{write_summary}{ANY, character, logical, logical}(mod, fname, partial, verbose) +#' @usage \S4method{write_summary}{ANY,character,logical,logical}(mod,fname,partial,verbose,...) methods::setMethod( "write_summary", - methods::signature(mod = "ANY"), + methods::signature(mod = "ANY", fname = "character"), function(mod, fname, partial = FALSE, verbose = getOption("ibis.setupmessages"), ...) 
{ assertthat::assert_that( !missing(mod), @@ -484,7 +484,7 @@ methods::setGeneric("write_model", #' @name write_model #' @rdname write_model -#' @usage \S4method{write_model}{ANY, character, logical, logical}(mod, fname, slim, verbose) +#' @usage \S4method{write_model}{ANY,character,logical,logical}(mod,fname,slim,verbose) methods::setMethod( "write_model", methods::signature(mod = "ANY"), @@ -556,7 +556,7 @@ methods::setGeneric("load_model", #' @name load_model #' @rdname load_model -#' @usage \S4method{load_model}{character, logical}(fname, verbose) +#' @usage \S4method{load_model}{character,logical}(fname,verbose) methods::setMethod( "load_model", methods::signature(fname = "character"), @@ -581,7 +581,8 @@ methods::setMethod( # Convert predictions back to terra if data.frame model <- mod$model if(is.list( model$predictors_object ) ){ - ras <- terra::rast(model$predictors, type = "xyz", crs = terra::crs(model$background)) + ras <- terra::rast(model$predictors, type = "xyz", + crs = terra::crs(model$background)) assertthat::assert_that(all(names(ras) %in% model$predictors_names)) # Get any previously set attributes ats <- model$predictors_object diff --git a/R/zzz.R b/R/zzz.R index def2f9a4..62caaeec 100644 --- a/R/zzz.R +++ b/R/zzz.R @@ -14,8 +14,6 @@ options("rgdal_show_exportToProj4_warnings" = "none") # Set some default ibis options - options('ibis.nthread' = parallel::detectCores() - 1) - options('ibis.runparallel' = FALSE) options('ibis.setupmessages' = TRUE) # Option to have variable names "cleaned" by default options('ibis.cleannames' = TRUE) @@ -28,7 +26,10 @@ # Names of priors options('ibis.priors' = c('INLAPrior', 'BARTPrior', 'GDBPrior','GLMNETPrior', 'XGBPrior', 'BREGPrior', 'STANPrior')) + # Use the future package for any options. Default is FALSE + options('ibis.nthread' = parallel::detectCores() - 1) + options('ibis.runparallel' = FALSE) options('ibis.futurestrategy' = "multisession") options(doFuture.foreach.export = ".export-and-automatic-with-warning") @@ -36,7 +37,7 @@ options('ibis.dependencies' = c( "pdp", "scales", "biscale", "modEvA", "dplyr", "geodist", "geosphere", "progress", "glmnet", "glmnetUtils", "xgboost","BoomSpikeSlab", "INLA", "inlabru", - "gnlm", + "gnlm", "cubelyr", "dbarts", "mboost", "rstan", "cmdstanr" )) diff --git a/README.Rmd b/README.Rmd index 96d9cc72..9db3415d 100644 --- a/README.Rmd +++ b/README.Rmd @@ -26,7 +26,7 @@ knitr::opts_chunk$set( [![Codecov Status](https://codecov.io/gh/iiasa/ibis.iSDM/branch/master/graph/badge.svg)](https://app.codecov.io/gh/iiasa/ibis.iSDM?branch=master) -The **ibis.iSDM** package provides a series of convenience functions to fit integrated Species Distribution Models (iSDMs). With integrated models we generally refer to SDMs that incorporate information from different biodiversity datasets, external parameters such as priors or offsets with respect to certain variables and regions. See [Fletcher et al. (2019)](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/ecy.2710) and [Isaac et al. (2020)](https://linkinghub.elsevier.com/retrieve/pii/S0169534719302551) for an introduction to iSDMs. +The **ibis.iSDM** package provides a series of convenience functions to fit integrated Species Distribution Models (iSDMs). With integrated models we generally refer to SDMs that incorporate information from different biodiversity datasets, external parameters such as priors or offsets with respect to certain variables and regions. See [Fletcher et al. (2019)](https://doi.org/10.1002/ecy.2710) and [Isaac et al. 
(2020)](https://linkinghub.elsevier.com/retrieve/pii/S0169534719302551) for an introduction to iSDMs. ## Installation diff --git a/README.md b/README.md index d459e834..0208873f 100644 --- a/README.md +++ b/README.md @@ -22,8 +22,7 @@ fit integrated Species Distribution Models (iSDMs). With integrated models we generally refer to SDMs that incorporate information from different biodiversity datasets, external parameters such as priors or offsets with respect to certain variables and regions. See [Fletcher et -al. (2019)](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/ecy.2710) -and [Isaac et +al. (2019)](https://doi.org/10.1002/ecy.2710) and [Isaac et al. (2020)](https://linkinghub.elsevier.com/retrieve/pii/S0169534719302551) for an introduction to iSDMs. diff --git a/_pkgdown.yml b/_pkgdown.yml index cc026ba0..16cd6fba 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -39,13 +39,15 @@ navbar: href: articles/02_integrate_data.html - text: 3. Biodiversity projections href: articles/03_biodiversity_projections.html - - text: 4. Engine comparison - href: articles/04_engine_comparison.html - - text: 5. Other packages - href: articles/05_package_comparison.html + - text: 4. Simulating mechanisms + href: articles/04_mechanistic_estimation.html + - text: 5. Engine comparison + href: articles/05_engine_comparison.html + - text: 6. Other packages + href: articles/06_package_comparison.html faq: text: FAQ - href: articles/06_frequently-asked-questions.html + href: articles/07_frequently-asked-questions.html contributing: text: Contributing href: articles/contributing.html @@ -106,6 +108,8 @@ reference: - effects - partial - spartial + - partial_density + - limiting - threshold - ensemble - ensemble_partial @@ -116,7 +120,6 @@ reference: These functions are used by engines or spatial processing in the package. Most of them are for internal use, but can be of use if input needs to be reformatted. contents: - - explode_factorized_raster - posterior_predict_stanfit - alignRasters - emptyraster @@ -157,6 +160,7 @@ reference: contents: - as.Id - is.Id + - check - has_keyword("misc") - print - myLog diff --git a/man/BARTPrior.Rd b/man/BARTPrior.Rd index 67760c58..b075a546 100644 --- a/man/BARTPrior.Rd +++ b/man/BARTPrior.Rd @@ -6,7 +6,7 @@ \usage{ BARTPrior(variable, hyper = 0.75, ...) -\S4method{BARTPrior}{character, numeric}(variable, hyper) +\S4method{BARTPrior}{character,numeric}(variable,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/BARTPriors.Rd b/man/BARTPriors.Rd index 559a8b06..d6357929 100644 --- a/man/BARTPriors.Rd +++ b/man/BARTPriors.Rd @@ -6,7 +6,7 @@ \usage{ BARTPriors(variable, hyper = 0.75, ...) -\S4method{BARTPriors}{character, numeric}(variable, hyper) +\S4method{BARTPriors}{character,numeric}(variable,hyper,...) 
} \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/BREGPrior.Rd b/man/BREGPrior.Rd index f4292bb6..ff51c019 100644 --- a/man/BREGPrior.Rd +++ b/man/BREGPrior.Rd @@ -6,7 +6,7 @@ \usage{ BREGPrior(variable, hyper = NULL, ip = NULL) -\S4method{BREGPrior}{character, numeric, numeric}(variable, hyper, ip) +\S4method{BREGPrior}{character,ANY,ANY}(variable,hyper,ip) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors.} diff --git a/man/BREGPriors.Rd b/man/BREGPriors.Rd index ace8c6fe..83386aed 100644 --- a/man/BREGPriors.Rd +++ b/man/BREGPriors.Rd @@ -6,7 +6,7 @@ \usage{ BREGPriors(variable, hyper = NULL, ip = NULL) -\S4method{BREGPriors}{character, numeric, numeric}(variable, hyper, ip) +\S4method{BREGPriors}{character,ANY,ANY}(variable,hyper,ip) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors.} diff --git a/man/GDBPrior.Rd b/man/GDBPrior.Rd index d96298a3..d7763550 100644 --- a/man/GDBPrior.Rd +++ b/man/GDBPrior.Rd @@ -6,7 +6,7 @@ \usage{ GDBPrior(variable, hyper = "increasing", ...) -\S4method{GDBPrior}{character, character}(variable, hyper) +\S4method{GDBPrior}{character,character}(variable,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors variables.} diff --git a/man/GDBPriors.Rd b/man/GDBPriors.Rd index 9ebc4cb5..ab2a7862 100644 --- a/man/GDBPriors.Rd +++ b/man/GDBPriors.Rd @@ -6,7 +6,7 @@ \usage{ GDBPriors(variable, hyper = "increasing", ...) -\S4method{GDBPriors}{character, character}(variable, hyper) +\S4method{GDBPriors}{character,character}(variable,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors variables.} diff --git a/man/GLMNETPrior.Rd b/man/GLMNETPrior.Rd index 5b5d5d6a..87778d00 100644 --- a/man/GLMNETPrior.Rd +++ b/man/GLMNETPrior.Rd @@ -6,7 +6,7 @@ \usage{ GLMNETPrior(variable, hyper = 0, lims = c(-Inf, Inf), ...) -\S4method{GLMNETPrior}{character, numeric, numeric}(variable, hyper, lims) +\S4method{GLMNETPrior}{character,numeric,ANY}(variable,hyper,lims,...) } \arguments{ \item{variable}{A \code{\link{character}} variable passed on to the prior object.} diff --git a/man/GLMNETPriors.Rd b/man/GLMNETPriors.Rd index 35d9f355..0ab83973 100644 --- a/man/GLMNETPriors.Rd +++ b/man/GLMNETPriors.Rd @@ -6,7 +6,7 @@ \usage{ GLMNETPriors(variable, hyper = 0, lims = c(-Inf, Inf)) -\S4method{GLMNETPriors}{character, numeric, numeric}(variable, hyper, lims) +\S4method{GLMNETPriors}{character,numeric,ANY}(variable,hyper,lims,...) } \arguments{ \item{variable}{A \code{\link{character}} variable passed on to the prior object.} diff --git a/man/INLAPrior.Rd b/man/INLAPrior.Rd index 0322da3c..7d6746e7 100644 --- a/man/INLAPrior.Rd +++ b/man/INLAPrior.Rd @@ -6,7 +6,7 @@ \usage{ INLAPrior(variable, type = "normal", hyper = c(0, 0.001), ...) -\S4method{INLAPrior}{character, character}(variable, type) +\S4method{INLAPrior}{character,character,ANY}(variable,type,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/INLAPriors.Rd b/man/INLAPriors.Rd index 3cb72029..a41fefe5 100644 --- a/man/INLAPriors.Rd +++ b/man/INLAPriors.Rd @@ -6,7 +6,7 @@ \usage{ INLAPriors(variables, type, hyper = c(0, 0.001), ...) -\S4method{INLAPriors}{vector, character}(variables, type) +\S4method{INLAPriors}{vector,character,ANY}(variables,type,hyper,...) 
} \arguments{ \item{variables}{A \code{\link{vector}} of \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/STANPrior.Rd b/man/STANPrior.Rd index 14eff759..aa5b3d43 100644 --- a/man/STANPrior.Rd +++ b/man/STANPrior.Rd @@ -6,7 +6,7 @@ \usage{ STANPrior(variable, type, hyper = c(0, 2), ...) -\S4method{STANPrior}{character, character}(variable, type) +\S4method{STANPrior}{character,character,ANY}(variable,type,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} @@ -22,6 +22,12 @@ a Gaussian distribution on the respective coefficient.} \description{ Function to create a new prior for \link{engine_stan} models. Priors currently can be set on specific environmental predictors. +} +\examples{ +\dontrun{ + pp <- STANPrior("forest", "normal", c(0,1)) +} + } \references{ \itemize{ @@ -31,7 +37,6 @@ can be set on specific environmental predictors. } \seealso{ \code{\linkS4class{Prior}}. -s Other prior: \code{\link{BARTPriors}()}, diff --git a/man/STANPriors.Rd b/man/STANPriors.Rd index 41c5962a..adab08ac 100644 --- a/man/STANPriors.Rd +++ b/man/STANPriors.Rd @@ -6,7 +6,7 @@ \usage{ STANPriors(variables, type, hyper = c(0, 2), ...) -\S4method{STANPriors}{vector, character}(variables, type) +\S4method{STANPriors}{vector,character,ANY}(variables,type,hyper,...) } \arguments{ \item{variables}{A \code{\link{vector}} of \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/XGBPrior.Rd b/man/XGBPrior.Rd index 6d0a8115..6099f419 100644 --- a/man/XGBPrior.Rd +++ b/man/XGBPrior.Rd @@ -6,7 +6,7 @@ \usage{ XGBPrior(variable, hyper = "increasing", ...) -\S4method{XGBPrior}{character, character}(variable, hyper) +\S4method{XGBPrior}{character,character}(variable,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} @@ -22,6 +22,12 @@ extreme gradient boosting model \code{\link{engine_xgboost}}. Monotonic priors enforce directionality in the effects of certain variables; however, specifying a monotonic constraint does not guarantee that the variable is not regularized out during model fitting. +} +\examples{ +\dontrun{ + pp <- XGBPrior("forest", "increasing") +} + } \references{ \itemize{ diff --git a/man/XGBPriors.Rd b/man/XGBPriors.Rd index 51dfa601..fc3e23f9 100644 --- a/man/XGBPriors.Rd +++ b/man/XGBPriors.Rd @@ -6,7 +6,7 @@ \usage{ XGBPriors(variable, hyper = "increasing", ...) -\S4method{XGBPriors}{character, character}(variable, hyper) +\S4method{XGBPriors}{character,character}(variable,hyper,...) } \arguments{ \item{variable}{A \code{\link{character}} matched against existing predictors or latent effects.} diff --git a/man/add_biodiversity_poipa.Rd b/man/add_biodiversity_poipa.Rd index 1cfdf878..1d503a64 100644 --- a/man/add_biodiversity_poipa.Rd +++ b/man/add_biodiversity_poipa.Rd @@ -18,12 +18,12 @@ add_biodiversity_poipa( ... ) -\S4method{add_biodiversity_poipa}{BiodiversityDistribution,sf}(x, poipa) +\S4method{add_biodiversity_poipa}{BiodiversityDistribution,sf,character,character,ANY,character,character,numeric,logical,logical}(x,poipa,name,field_occurrence,formula,family,link,weight,separate_intercept,docheck) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e.
\code{\linkS4class{BiodiversityDistribution}}) object.} -\item{poipa}{A \code{\link{data.frame}}, \code{\link{sf}} or \code{\link{Spatial}}) object of presence-absence point occurrences.} +\item{poipa}{A \code{\link{data.frame}} or \code{\link{sf}} object of presence-absence point occurrences.} \item{name}{The name of the biodiversity dataset used as internal identifier.} @@ -38,9 +38,10 @@ By default set to \code{"Observed"} and an error will be thrown if a \code{\link \item{weight}{A \code{\link{numeric}} value acting as a multiplier with regards to any weights used in the modelling. Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only -one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{\link{poipa}}.} +one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as +parameter \code{"poipa"}.} -\item{separate_intercept}{A \code{\link{boolean}} value stating whether a separate intercept is to be added in. +\item{separate_intercept}{A \code{\link{logical}} value stating whether a separate intercept is to be added in shared likelihood models for engines \link{engine_inla}, \link{engine_inlabru} and \link{engine_stan}.} \item{docheck}{\code{\link{logical}} on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).} @@ -62,7 +63,7 @@ for more information. } \details{ By default, the logit link function is used in a logistic regression setting -unless the specific \link{engine} does not support generalised linear regressions (e.g. \link{engine_bart}). +unless the specific engine does not support generalised linear regressions (e.g. \link{engine_bart}). } \examples{ \dontrun{ diff --git a/man/add_biodiversity_poipo.Rd b/man/add_biodiversity_poipo.Rd index 1662cc77..4447537c 100644 --- a/man/add_biodiversity_poipo.Rd +++ b/man/add_biodiversity_poipo.Rd @@ -19,12 +19,12 @@ add_biodiversity_poipo( ... ) -\S4method{add_biodiversity_poipo}{BiodiversityDistribution,sf}(x, poipo) +\S4method{add_biodiversity_poipo}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,logical,ANY}(x,poipo,name,field_occurrence,formula,family,link,weight,separate_intercept,docheck,pseudoabsence_settings,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} -\item{poipo}{A \code{\link{data.frame}}, \code{\link{sf}} or \code{\link{Spatial}}) object of presence-only point occurrences.} +\item{poipo}{A \code{\link{data.frame}} or \code{\link{sf}} object of presence-only point occurrences.} \item{name}{The name of the biodiversity dataset used as internal identifier.} @@ -38,10 +38,10 @@ add_biodiversity_poipo( \item{weight}{A \code{\link{numeric}} value acting as a multiplier with regards to any weights used in the modelling. Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only -one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{\link{poipo}}. +one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{"poipo"}. \strong{Note: Weights are reformatted to the inverse for models with area offsets (e.g.
5 is converted to 1/5).}} -\item{separate_intercept}{A \code{\link{boolean}} value stating whether a separate intercept is to be added in +\item{separate_intercept}{A \code{\link{logical}} value stating whether a separate intercept is to be added in shared likelihood models for engines \link{engine_inla}, \link{engine_inlabru} and \link{engine_stan}. Otherwise ignored.} \item{docheck}{\code{\link{logical}} on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).} @@ -83,7 +83,7 @@ x } } \seealso{ -See other functions for adding biodiversity data, e.g. \link{add_biodiversity} +See other functions for adding biodiversity data, i.e. \code{\link{add_biodiversity_poipa}} Other add_biodiversity: \code{\link{add_biodiversity_poipa}()}, diff --git a/man/add_biodiversity_polpa.Rd b/man/add_biodiversity_polpa.Rd index 8d652be2..b39f175c 100644 --- a/man/add_biodiversity_polpa.Rd +++ b/man/add_biodiversity_polpa.Rd @@ -23,12 +23,12 @@ add_biodiversity_polpa( ... ) -\S4method{add_biodiversity_polpa}{BiodiversityDistribution, sf}(x, polpa) +\S4method{add_biodiversity_polpa}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,numeric,ANY,character,logical,logical,ANY}(x,polpa,name,field_occurrence,formula,family,link,weight,simulate,simulate_points,simulate_bias,simulate_strategy,separate_intercept,docheck,pseudoabsence_settings,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} -\item{polpa}{A \code{\link{sf}} or \code{\link{Spatial}}) polygon object of presence-absence occurrences.} +\item{polpa}{A \code{\link{sf}} polygon object of presence-absence occurrences.} \item{name}{The name of the biodiversity dataset used as internal identifier.} @@ -42,7 +42,7 @@ add_biodiversity_polpa( \item{weight}{A \code{\link{numeric}} value acting as a multiplier with regards to any weights used in the modelling. Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only -one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{\link{polpa}}.} +one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{"polpa"}.} \item{simulate}{Simulate poipa points within its boundaries. Results are passed to \code{\link{add_biodiversity_poipa}} (Default: \code{FALSE}).} @@ -51,10 +51,10 @@ one dataset is added. A \code{\link{vector}} is also supported but must be of th \item{simulate_bias}{A \code{\link{SpatRaster}} layer describing a potential preference for simulation (Default: \code{NULL}).} \item{simulate_strategy}{A \code{\link{character}} stating the strategy for sampling. Can be set to either -\code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \link{simulate_weights} +\code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \code{'simulate_weights'} parameter.} -\item{separate_intercept}{A \code{\link{boolean}} value stating whether a separate intercept is to be added in +\item{separate_intercept}{A \code{\link{logical}} value stating whether a separate intercept is to be added in shared likelihood models for engines \link{engine_inla}, \link{engine_inlabru} and \link{engine_stan}.} \item{docheck}{\code{\link{logical}} on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).} diff --git a/man/add_biodiversity_polpo.Rd b/man/add_biodiversity_polpo.Rd index 1671ab1a..e4f97ecb 100644 --- a/man/add_biodiversity_polpo.Rd +++ b/man/add_biodiversity_polpo.Rd @@ -23,12 +23,12 @@ add_biodiversity_polpo( ... ) -\S4method{add_biodiversity_polpo}{BiodiversityDistribution,sf}(x, polpo) +\S4method{add_biodiversity_polpo}{BiodiversityDistribution,sf,ANY,character,ANY,character,ANY,numeric,logical,numeric,ANY,character,logical,logical,ANY}(x,polpo,name,field_occurrence,formula,family,link,weight,simulate,simulate_points,simulate_bias,simulate_strategy,separate_intercept,docheck,pseudoabsence_settings,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} -\item{polpo}{A \code{\link{sf}} or \code{\link{Spatial}}) polygon object of presence-only occurrences.} +\item{polpo}{A \code{\link{sf}} polygon object of presence-only occurrences.} \item{name}{The name of the biodiversity dataset used as internal identifier.} @@ -42,7 +42,7 @@ add_biodiversity_polpo( \item{weight}{A \code{\link{numeric}} value acting as a multiplier with regards to any weights used in the modelling. Larger weights indicate higher weighting relative to any other datasets. By default set to \code{1} if only -one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{\link{polpo}}.} +one dataset is added. A \code{\link{vector}} is also supported but must be of the same length as \code{"polpo"}.} \item{simulate}{Simulate poipo points within its boundaries. Results are passed to \code{\link{add_biodiversity_poipo}} (Default: \code{FALSE}).} @@ -51,10 +51,10 @@ one dataset is added. \item{simulate_bias}{A \code{\link{SpatRaster}} layer describing a potential preference for simulation (Default: \code{NULL}).} \item{simulate_strategy}{A \code{\link{character}} stating the strategy for sampling. Can be set to either -\code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \link{simulate_weights} +\code{'random'} or \code{'regular'}, the latter requiring a raster supplied in the \code{'simulate_weights'} parameter.} -\item{separate_intercept}{A \code{\link{boolean}} value stating whether a separate intercept is to be added in +\item{separate_intercept}{A \code{\link{logical}} value stating whether a separate intercept is to be added in shared likelihood models for engines \link{engine_inla}, \link{engine_inlabru} and \link{engine_stan}.} \item{docheck}{\code{\link{logical}} on whether additional checks should be performed (e.g. intersection tests) (Default: \code{TRUE}).}
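A hypothetical end-to-end sketch for the constraint interface documented below (assumes a trained model object `fit`; the method and value options follow the list given in the details):

# Build a scenario and constrain dispersal with a negative exponential kernel (value in m)
scenario(fit) |>
  add_constraint(method = "sdd_nexpkernel", value = 2000) |>
  project()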
diff --git a/man/add_constraint.Rd b/man/add_constraint.Rd index 3c12c3be..787f4347 100644 --- a/man/add_constraint.Rd +++ b/man/add_constraint.Rd @@ -4,7 +4,7 @@ \alias{add_constraint} \title{Add a constraint to an existing \code{scenario}} \usage{ -\S4method{add_constraint}{BiodiversityScenario, character}(mod, method) +\S4method{add_constraint}{BiodiversityScenario,character}(mod,method) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -14,7 +14,7 @@ information.} \item{...}{passed on parameters. See also the specific methods for adding constraints.} -\item{value}{For many dispersal \code{\link{constrain}} this is set as \code{\link{numeric}} value specifying a +\item{value}{For many dispersal constraints this is set as a \code{\link{numeric}} value specifying a fixed constraint or constant in units \code{"m"} (Default: \code{NULL}). For kissmig the value needs to give the number of iteration steps (or within year migration steps). For adaptability constraints this parameter specifies the extent (in units of standard deviation) to which extrapolations @@ -47,7 +47,7 @@ Currently supported are the following options: \item \code{sdd_fixed} - Applies a fixed uniform dispersal distance per modelling timestep. \item \code{sdd_nexpkernel} - Applies a dispersal distance using a negative exponential kernel from its origin. \item \code{kissmig} - Applies the kissmig stochastic dispersal model. Requires the \code{`kissmig`} package. Applied at each modelling time step. -\item \code{migclim} - Applies the dispersal algorithm MigClim to the modelled objects. Requires \code{\link{MigClim}} package. +\item \code{migclim} - Applies the dispersal algorithm MigClim to the modelled objects. Requires the \code{"MigClim"} package. } A comprehensive overview of the benefits of including dispersal constraints in species distribution models @@ -82,7 +82,7 @@ constrain a projection within a certain area (e.g. a species range or an island) # Assumes that a trained 'model' object exists mod <- scenario(model) |> add_predictors(env = predictors, transform = 'scale', derivates = "none") |> - add_constrain_dispersal(method = "kissmig", value = 2, pext = 0.1) |> + add_constraint_dispersal(method = "kissmig", value = 2, pext = 0.1) |> project() } } diff --git a/man/add_constraint_MigClim.Rd b/man/add_constraint_MigClim.Rd index 88ab02ce..9cdfb1ec 100644 --- a/man/add_constraint_MigClim.Rd +++ b/man/add_constraint_MigClim.Rd @@ -4,7 +4,7 @@ \alias{add_constraint_MigClim} \title{Add constraints to the modelled distribution projection using the MigClim approach} \usage{ -\S4method{add_constraint_MigClim}{BiodiversityScenario, character, numeric, numeric, character, numeric, numeric, numeric, numeric, numeric, character}(mod, rcThresholdMode, dispSteps, dispKernel, barrierType, lddFreq, lddRange, iniMatAge, propaguleProdProb, replicateNb, dtmp) +\S4method{add_constraint_MigClim}{BiodiversityScenario,character,numeric,numeric,character,numeric,numeric,numeric,numeric,numeric,character}(mod,rcThresholdMode,dispSteps,dispKernel,barrierType,lddFreq,lddRange,iniMatAge,propaguleProdProb,replicateNb,dtmp) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -19,7 +19,7 @@ Can be set either to a uniform numeric \link{vector}, e.g. \code{c(1,1,1,1)} or \strong{Depending on the resolution of the raster, this parameter needs to be adapted} \item{barrierType}{A \link{character} indicating whether any set barrier should be set as \code{'strong'} or \code{'weak'} barriers. -Strong barriers prevent any dispersal across the barrier and weak barriers only do so if the whole \link{dispKernel} length +Strong barriers prevent any dispersal across the barrier and weak barriers only do so if the whole \code{"dispKernel"} length is covered by the barrier (Default: \code{'strong'}).} \item{lddFreq}{\code{\link{numeric}} parameter indicating the frequency of long-distance dispersal (LDD) events. Default is \code{0}, so no long-distance dispersal.} @@ -47,7 +47,7 @@ constrain future projections. For a detailed description of MigClim, please see the UserGuide. \strong{The default parameters chosen here are suggestions.} } \details{ -The barrier parameter is defined through \link{add_barrier}. +The barrier parameter is defined through \code{"add_barrier"}. } \examples{ \dontrun{ @@ -69,7 +69,7 @@ into species distribution models. R package version 1.6. } } \seealso{ -\code{\link[=MigClim.userGuide]{MigClim.userGuide()}} +\code{"MigClim::MigClim.userGuide()"} } \concept{constrain} \keyword{scenario} diff --git a/man/add_constraint_adaptability.Rd b/man/add_constraint_adaptability.Rd index 6e3e86b8..c72e3e6c 100644 --- a/man/add_constraint_adaptability.Rd +++ b/man/add_constraint_adaptability.Rd @@ -4,7 +4,7 @@ \alias{add_constraint_adaptability} \title{Adds an adaptability constraint to a scenario object} \usage{ -\S4method{add_constraint_adaptability}{BiodiversityScenario, character, character, numeric, numeric}(mod, method, names, value, increment) +\S4method{add_constraint_adaptability}{BiodiversityScenario,character,character,numeric,numeric}(mod,method,names,value,increment) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -29,6 +29,13 @@ Currently only \code{nichelimit} is implemented, which adds a simple constrain o can be defined through the \code{"value"} parameter. For example, by setting it to \code{1} (Default), any projections are constrained to be within the range of at most 1 standard deviation from the range of covariates used for model training. +} +\examples{ +\dontrun{ +scenario(fit) |> + add_constraint_adaptability(value = 1) +} + } \seealso{ Other constraint: diff --git a/man/add_constraint_boundary.Rd b/man/add_constraint_boundary.Rd index c74a401d..ead8dbab 100644 --- a/man/add_constraint_boundary.Rd +++ b/man/add_constraint_boundary.Rd @@ -4,9 +4,9 @@ \alias{add_constraint_boundary} \title{Adds a boundary constraint to a scenario object} \usage{ -\S4method{add_constraint_boundary}{BiodiversityScenario, sf, character}(mod, layer, method) +\S4method{add_constraint_boundary}{BiodiversityScenario,sf,character}(mod,layer,method) -\S4method{add_constraint_boundary}{BiodiversityScenario, ANY, character}(mod, layer, method) +\S4method{add_constraint_boundary}{BiodiversityScenario,ANY,character}(mod,layer,method) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -28,6 +28,13 @@ as was done for model training. The difference to a boundary constraint is that as a hard cut on any projection, while the zones would allow any projection (and other constraints) to be applied within the zone. \strong{Note: Setting a boundary constraint for future projections effectively excludes potentially suitable areas!} +} +\examples{ +\dontrun{ +# Add scenario constraint +scenario(fit) |> add_constraint_boundary(range) +} + } \seealso{ Other constraint: diff --git a/man/add_constraint_connectivity.Rd b/man/add_constraint_connectivity.Rd index 26b59f2e..26fafd12 100644 --- a/man/add_constraint_connectivity.Rd +++ b/man/add_constraint_connectivity.Rd @@ -4,7 +4,7 @@ \alias{add_constraint_connectivity} \title{Adds a connectivity constraint to a scenario object.} \usage{ -\S4method{add_constraint_connectivity}{BiodiversityScenario, character, numeric, ANY}(mod, method, value, resistance) +\S4method{add_constraint_connectivity}{BiodiversityScenario,character,numeric,ANY}(mod,method,value,resistance) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -12,13 +12,13 @@ \item{method}{A \code{\link{character}} indicating the type of constraints to be added to the scenario. See details for more information.} -\item{value}{For many dispersal \code{\link{constrain}} this is set as \code{\link{numeric}} value specifying a +\item{value}{For many dispersal constraints this is set as a \code{\link{numeric}} value specifying a fixed constraint or constant in units \code{"m"} (Default: \code{NULL}). For kissmig the value needs to give the number of iteration steps (or within year migration steps).
For adaptability constraints this parameter specifies the extent (in units of standard deviation) to which extrapolations should be performed.} -\item{resistance}{A \code{\link{RasterLayer}} object describing a resistance surface or barrier for use in +\item{resistance}{A \code{\link{SpatRaster}} object describing a resistance surface or barrier for use in connectivity constraints (Default: \code{NULL}).} \item{...}{passed on parameters. See also the specific methods for adding constraints.} diff --git a/man/add_constraint_dispersal.Rd b/man/add_constraint_dispersal.Rd index 0a403295..7e2bd62e 100644 --- a/man/add_constraint_dispersal.Rd +++ b/man/add_constraint_dispersal.Rd @@ -4,7 +4,7 @@ \alias{add_constraint_dispersal} \title{Adds a dispersal constraint to a scenario object.} \usage{ -\S4method{add_constraint_dispersal}{BiodiversityScenario, character, numeric}(mod, method, value) +\S4method{add_constraint_dispersal}{BiodiversityScenario,character,numeric}(mod,method,value) } \arguments{ \item{mod}{A \code{\link{BiodiversityScenario}} object with specified predictors.} @@ -12,7 +12,7 @@ \item{method}{A \code{\link{character}} indicating the type of constraints to be added to the scenario. See details for more information.} -\item{value}{For many dispersal \code{\link{constrain}} this is set as \code{\link{numeric}} value specifying a +\item{value}{For many dispersal constraints this is set as a \code{\link{numeric}} value specifying a fixed constraint or constant in units \code{"m"} (Default: \code{NULL}). For kissmig the value needs to give the number of iteration steps (or within year migration steps). For adaptability constraints this parameter specifies the extent (in units of standard deviation) to which extrapolations diff --git a/man/add_control_bias.Rd b/man/add_control_bias.Rd index ffbc8cb3..e4bcbe72 100644 --- a/man/add_control_bias.Rd +++ b/man/add_control_bias.Rd @@ -6,7 +6,7 @@ \usage{ add_control_bias(x, layer, method = "partial", bias_value = NULL, add = TRUE) -\S4method{add_control_bias}{BiodiversityDistribution, SpatRaster}(x, layer) +\S4method{add_control_bias}{BiodiversityDistribution,SpatRaster,character,ANY,logical}(x,layer,method,bias_value,add) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -40,7 +40,7 @@ across the background. offsets to factor out a specified bias variable. } \note{ -**Covariate transformations applied to other predictors need to ** +\strong{Covariate transformations applied to other predictors need to be applied to bias too.} Another option to consider biases, particularly in Poisson point-process models, is to remove them through an offset. Functionality to do so is available through the \code{\link[=add_offset_bias]{add_offset_bias()}} method. Setting the method to \code{"offset"} will automatically point to this option. diff --git a/man/add_latent_spatial.Rd b/man/add_latent_spatial.Rd index 1f3fd8c5..e8afbae5 100644 --- a/man/add_latent_spatial.Rd +++ b/man/add_latent_spatial.Rd @@ -12,14 +12,14 @@ add_latent_spatial( ... ) -\S4method{add_latent_spatial}{BiodiversityDistribution}(x) +\S4method{add_latent_spatial}{BiodiversityDistribution,character,ANY,logical}(x,method,priors,separate_spde,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} \item{method}{A \code{\link{character}} describing what kind of spatial effect is to be added to the model. See details.} -\item{priors}{A \code{\link{Prior-List}} object supplied to the latent effect. Relevant only for \code{\link{engine_inla}} and \code{NULL} equates the use of default priors.} +\item{priors}{A \code{"Prior-List"} object supplied to the latent effect. Relevant only for \code{\link{engine_inla}} and \code{NULL} equates to the use of default priors.} \item{separate_spde}{A \code{\link{logical}} parameter indicating whether, in the case of SPDE effects, separate effects for each likelihood are being fitted. Default (\code{FALSE}) uses a copy of the first added likelihood.} @@ -46,14 +46,14 @@ unsupported method for an engine is chosen this is modified to the next similar Available are: -\link{*} \code{"spde"} - stochastic partial differential equation (SPDE) for \code{\link{INLA-engine}} and \code{\link{INLABRU-engine}}. +\link{*} \code{"spde"} - stochastic partial differential equation (SPDE) for \code{\link{engine_inla}} and \code{\link{engine_inlabru}}. SPDE effects aim at capturing the variation of the response variable in space, once all of the covariates are accounted for. Examining the spatial distribution of the spatial error can reveal which covariates might be missing. For example, if elevation is positively correlated with the response variable, but is not included in the model, we could see a higher posterior mean in areas with higher elevation. Note that calculations of SPDEs can be computationally costly. \itemize{ -\item \code{"car"} - conditional autoregressive errors (CAR) for \code{\link{INLA-engine}}. Not yet implemented in full. +\item \code{"car"} - conditional autoregressive errors (CAR) for \code{\link{engine_inla}}. Not yet implemented in full. \item \code{"kde"} - additional covariate of the kernel density of input point observations. \item \code{"poly"} - spatial trend correction by adding coordinates as polynomial transformation. Available for all Engines. \item \code{"nnd"} - nearest neighbour distance. This function calculates the Euclidean distance from each grid cell diff --git a/man/add_log.Rd b/man/add_log.Rd index 2bb59a04..85bf3cdc 100644 --- a/man/add_log.Rd +++ b/man/add_log.Rd @@ -6,7 +6,7 @@ \usage{ add_log(x, filename) -\S4method{add_log}{BiodiversityDistribution, character}(x, filename) +\S4method{add_log}{BiodiversityDistribution,character}(x,filename) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/add_offset.Rd b/man/add_offset.Rd index e195547c..19920e51 100644 --- a/man/add_offset.Rd +++ b/man/add_offset.Rd @@ -6,7 +6,7 @@ \usage{ add_offset(x, layer, add = TRUE) -\S4method{add_offset}{BiodiversityDistribution, SpatRaster}(x, layer) +\S4method{add_offset}{BiodiversityDistribution,SpatRaster,logical}(x,layer,add) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/add_offset_bias.Rd b/man/add_offset_bias.Rd index 7386b7bd..bbec62ea 100644 --- a/man/add_offset_bias.Rd +++ b/man/add_offset_bias.Rd @@ -6,7 +6,7 @@ \usage{ add_offset_bias(x, layer, add = TRUE, points = NULL) -\S4method{add_offset_bias}{BiodiversityDistribution, SpatRaster}(x, layer) +\S4method{add_offset_bias}{BiodiversityDistribution,SpatRaster,logical,ANY}(x,layer,add,points) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/add_offset_elevation.Rd b/man/add_offset_elevation.Rd index 36e6af61..5ceb9fbb 100644 --- a/man/add_offset_elevation.Rd +++ b/man/add_offset_elevation.Rd @@ -6,7 +6,7 @@ \usage{ add_offset_elevation(x, elev, pref, rate = 0.0089, add = TRUE) -\S4method{add_offset_elevation}{BiodiversityDistribution, SpatRaster, numeric}(x, elev, pref) +\S4method{add_offset_elevation}{BiodiversityDistribution,SpatRaster,numeric,numeric,logical}(x,elev,pref,rate,add) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -33,7 +33,7 @@ The code here was adapted from the Supporting materials script. Specifically this function calculates a continuous decay and decreasing probability of a species to occur from elevation limits. It requires a \code{\link{SpatRaster}} with elevation information. A generalized logistic transform (aka Richards curve) is used to calculate decay from the suitable elevational -areas, with the \code{\link{rate}} parameter allowing to vary the steepness of decline. +areas, with the \code{"rate"} parameter allowing one to vary the steepness of the decline. Note that all offsets created by this function are by default log-transformed before export. In addition this function also mean-centers the output as recommended by Ellis-Soto et al.
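A hypothetical sketch of the elevation offset described above (the object names `background` and `elevation` are placeholders, and a lower and upper elevational preference is assumed for `pref`):

# Penalize predictions outside a preferred elevational band of 400-1200 m
x <- distribution(background) |>
  add_offset_elevation(elev = elevation, pref = c(400, 1200))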
diff --git a/man/add_offset_range.Rd b/man/add_offset_range.Rd index 4be6de28..c7d12806 100644 --- a/man/add_offset_range.Rd +++ b/man/add_offset_range.Rd @@ -18,9 +18,9 @@ add_offset_range( add = TRUE ) -\S4method{add_offset_range}{BiodiversityDistribution, SpatRaster}(x, layer) +\S4method{add_offset_range}{BiodiversityDistribution,SpatRaster,ANY,logical}(x,layer,fraction,add) -\S4method{add_offset_range}{BiodiversityDistribution, sf}(x, layer) +\S4method{add_offset_range}{BiodiversityDistribution,sf,numeric,character,numeric,logical,character,character,ANY,ANY,logical}(x,layer,distance_max,family,presence_prop,distance_clip,distance_function,field_occurrence,fraction,point,add) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -28,7 +28,7 @@ add_offset_range( \item{layer}{A \code{\link{sf}} or \code{\link{SpatRaster}} object with the range for the target feature.} \item{distance_max}{A \code{\link{numeric}} threshold on the maximum distance beyond the range that should be considered -to have a high likelihood of containing species occurrences (Default: \code{Inf} \link{m}). Can be set to \code{NULL} or \code{0} +to have a high likelihood of containing species occurrences (Default: \code{Inf} \code{"m"}). Can be set to \code{NULL} or \code{0} to indicate that no distance should be calculated.} \item{family}{A \code{\link{character}} denoting the type of model to which this offset is to be added. By default @@ -64,8 +64,10 @@ Adds a range offset to a \code{\link{distribution}} object. This function has additional options compared to the more generic \code{\link[=add_offset]{add_offset()}}, allowing customized options specifically for expert-based ranges as offsets or spatialized polygon information on species occurrences. -If even more control is needed, the user is informed of the \pkg{bossMaps} -package Merow et al. (2017). +If even more control is needed, the user is informed of the \code{"bossMaps"} +package Merow et al. (2017). Some functionalities of that package emulated +through the \code{"distance_function"} set to \code{"log"}. This tries to fit +a 5-parameter logistic function to estimate the distance from the range (Merow et al. 2017). } \details{ The output created by this function creates a \code{\link{SpatRaster}} to be added to @@ -73,7 +75,7 @@ a provided distribution object. Offsets in regression models are likelihood specific as they are added directly to the overall estimate of \code{`y^hat`}. Note that all offsets created by this function are by default log-transformed before export. -Background values (e.g. beyond \code{\link{distance_max}}) are set to a very small +Background values (e.g. beyond \code{"distance_max"}) are set to a very small constant (\code{1e-10}). } \examples{ @@ -89,7 +91,7 @@ constant (\code{1e-10}). 
} } \seealso{ -\code{\link{bossMaps}} +\code{"bossMaps"} Other offset: \code{\link{add_offset_bias}()}, diff --git a/man/add_predictor_elevationpref.Rd b/man/add_predictor_elevationpref.Rd index d1300459..30f1eed9 100644 --- a/man/add_predictor_elevationpref.Rd +++ b/man/add_predictor_elevationpref.Rd @@ -6,7 +6,7 @@ \usage{ add_predictor_elevationpref(x, layer, lower, upper, transform = "none") -\S4method{add_predictor_elevationpref}{BiodiversityDistribution, ANY, numeric, numeric, character}(x, layer, lower, upper, transform) +\S4method{add_predictor_elevationpref}{BiodiversityDistribution,ANY,numeric,numeric,character}(x,layer,lower,upper,transform) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -22,3 +22,10 @@ add_predictor_elevationpref(x, layer, lower, upper, transform = "none") \description{ Create lower and upper limits for an elevational range and add them as separate predictors } +\examples{ +\dontrun{ +distribution(background) |> + add_predictor_elevationpref(elevation, lower = 200, upper = 1000) +} + +} diff --git a/man/add_predictor_range.Rd b/man/add_predictor_range.Rd index 5d3a0c75..148eab42 100644 --- a/man/add_predictor_range.Rd +++ b/man/add_predictor_range.Rd @@ -13,9 +13,9 @@ add_predictor_range( priors = NULL ) -\S4method{add_predictor_range}{BiodiversityDistribution, SpatRaster}(x, layer) +\S4method{add_predictor_range}{BiodiversityDistribution,SpatRaster,character,ANY,ANY}(x,layer,method,fraction,priors) -\S4method{add_predictor_range}{BiodiversityDistribution, sf}(x, layer) +\S4method{add_predictor_range}{BiodiversityDistribution,sf,character,numeric,ANY,ANY}(x,layer,method,distance_max,fraction,priors) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -35,7 +35,7 @@ Can be used to for example to remove or reduce the expected value (Default: \cod This function allows to add a species range which is usually drawn by experts in a separate process as spatial explicit prior. Both \code{\link{sf}} and \code{\link{SpatRaster}}-objects are supported as input. -Users are advised to look at the \code{\link{bossMaps}} R-package presented as part of Merow et al. (2017), +Users are advised to look at the \code{"bossMaps"} R-package presented as part of Merow et al. (2017), which allows flexible calculation of non-linear distance transforms from the boundary of the range. Outputs of this package could be added directly to this function. \strong{Note that this function adds the range as predictor and not as offset. For this purpose a separate function \code{\link[=add_offset_range]{add_offset_range()}} exists.} @@ -44,6 +44,13 @@ Additional options allow to include the range either as \code{"binary"} or as \c predictor. The difference being that the range is either directly included as presence-only predictor or alternatively with a linear distance transform from the range boundary. The parameter \code{"distance_max"} can be specified to constrain this distance transform. +} +\examples{ +\dontrun{ +distribution(background) |> + add_predictor_range(range, method = "distance", distance_max = 2) +} + } \references{ \itemize{ diff --git a/man/add_predictors.Rd b/man/add_predictors.Rd index 2d1ecff2..36155c0f 100644 --- a/man/add_predictors.Rd +++ b/man/add_predictors.Rd @@ -19,15 +19,15 @@ add_predictors( ... 
) -\S4method{add_predictors}{BiodiversityDistribution,SpatRasterCollection}(x, env) +\S4method{add_predictors}{BiodiversityDistribution,SpatRasterCollection,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...) -\S4method{add_predictors}{BiodiversityDistribution, SpatRaster}(x, env) +\S4method{add_predictors}{BiodiversityDistribution,SpatRaster,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...) -\S4method{add_predictors}{BiodiversityDistribution, stars}(x, env) +\S4method{add_predictors}{BiodiversityDistribution,stars,ANY,character,character,numeric,ANY,logical,logical,logical,ANY}(x,env,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,explode_factors,priors,...) -\S4method{add_predictors}{BiodiversityScenario, SpatRaster}(x, env) +\S4method{add_predictors}{BiodiversityScenario,SpatRaster,ANY,character,character,numeric,ANY,logical}(x,env,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...) -\S4method{add_predictors}{BiodiversityScenario, stars}(x, env) +\S4method{add_predictors}{BiodiversityScenario,stars,ANY,character,character,numeric,ANY,logical}(x,env,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -66,7 +66,7 @@ them through a principal component analysis (\link{prcomp}). In contrast, deriva the original provided predictors alone, but instead create new ones, for instance by transforming their values through a quadratic or hinge transformation. Note that this effectively increases the number of predictors in the object, generally requiring stronger regularization by -the used \code{\link{engine}}. +the used \code{\link{Engine}}. Both transformations and derivates can also be combined. Available options for transformation are: \itemize{ @@ -92,13 +92,13 @@ Available options for creating derivates are: } \note{ \strong{Important:} -Not every \code{\link{engine}} supported by the \pkg{ibis.iSDM} R-package allows missing data points +Not every \code{\link{Engine}} supported by the \pkg{ibis.iSDM} R-package allows missing data points among extracted covariates. Thus any observation with missing data is generally removed prior to model fitting. Thus ensure that covariates have appropriate no-data settings (for instance setting \code{NA} values to \code{0} or another out of range constant). Not every engine does actually need covariates. For instance it is perfectly legit -to fit a model with only occurrence data and a spatial latent effect (\link{add_latent}). +to fit a model with only occurrence data and a spatial latent effect (\link{add_latent_spatial}). This corresponds to a spatial kernel density estimate. Certain names such as \code{"offset"} are forbidden as predictor variable names. The function diff --git a/man/add_predictors_globiom.Rd b/man/add_predictors_globiom.Rd index 34f0976e..ad68eb91 100644 --- a/man/add_predictors_globiom.Rd +++ b/man/add_predictors_globiom.Rd @@ -18,9 +18,9 @@ add_predictors_globiom( ...
) -\S4method{add_predictors_globiom}{BiodiversityDistribution, character}(x, fname) +\S4method{add_predictors_globiom}{BiodiversityDistribution,character,ANY,character,character,numeric,ANY,logical,logical,ANY}(x,fname,names,transform,derivates,derivate_knots,int_variables,bgmask,harmonize_na,priors,...) -\S4method{add_predictors_globiom}{BiodiversityScenario, character}(x, fname) +\S4method{add_predictors_globiom}{BiodiversityScenario,character,ANY,character,character,numeric,ANY,logical}(x,fname,names,transform,derivates,derivate_knots,int_variables,harmonize_na,...) } \arguments{ \item{x}{A \code{\linkS4class{BiodiversityDistribution}} or \code{\linkS4class{BiodiversityScenario}} object.} diff --git a/man/add_priors.Rd b/man/add_priors.Rd index cc1f6a2d..2a740192 100644 --- a/man/add_priors.Rd +++ b/man/add_priors.Rd @@ -6,7 +6,7 @@ \usage{ add_priors(x, priors = NULL, ...) -\S4method{add_priors}{BiodiversityDistribution, ANY}(x, priors) +\S4method{add_priors}{BiodiversityDistribution,ANY}(x,priors,...) } \arguments{ \item{x}{\link{distribution} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -26,7 +26,10 @@ via \link{add_predictors} } \examples{ \dontrun{ - x <- distribution(background) + pp <- GLMNETPrior("forest") + x <- distribution(background) |> + add_priors(pp) + } } \seealso{ diff --git a/man/bivplot.Rd b/man/bivplot.Rd index 6187cf08..e3eac335 100644 --- a/man/bivplot.Rd +++ b/man/bivplot.Rd @@ -15,7 +15,7 @@ bivplot( ... ) -\S4method{bivplot}{ANY}(mod) +\S4method{bivplot}{ANY,character,character,logical,ANY,ANY,character}(mod,xvar,yvar,plot,fname,title,col,...) } \arguments{ \item{mod}{A trained \code{\link{DistributionModel}} or alternatively a \code{\link{SpatRaster}} object with \code{prediction} model within.} @@ -47,7 +47,7 @@ In particular Bayesian engines can produce not only mean estimates of fitted responses, but also pixel-based estimates of uncertainty from the posterior such as the standard deviation (SD) or the coefficient of variation of a given prediction. -This function makes use of the \code{\link{biscale}} R-package to create bivariate plots of the fitted distribution object, +This function makes use of the \code{"biscale"} R-package to create bivariate plots of the fitted distribution object, allowing to visualize two variables at once. It is mostly thought of as a convenience function to create such bivariate plots for quick visualization. diff --git a/man/check.Rd b/man/check.Rd new file mode 100644 index 00000000..4a543a89 --- /dev/null +++ b/man/check.Rd @@ -0,0 +1,64 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/check.R +\name{check} +\alias{check} +\title{Check objects in the package for common errors or issues} +\usage{ +check(obj, stoponwarning = FALSE) + +\S4method{check}{ANY,logical}(obj,stoponwarning) +} +\arguments{ +\item{obj}{A \code{\link{BiodiversityDistribution}}, \code{\link{DistributionModel}} or \code{\link{BiodiversityScenario}} object.} + +\item{stoponwarning}{\code{\link{logical}} Should check return a stop if a warning is raised? (Default: \code{FALSE}).} +} +\value{ +Message outputs +} +\description{ +Not always is there enough data or sufficient information to robustly +infer the suitable habitat or niche of a species. As many SDM algorithms are +essentially regression models, similar assumptions about model convergence, +homogeneity of residuals and inference usually apply (although often +ignored). +This function simply checks the respective input object for common issues or +mistakes.
+} +\details{ +Different checks are implemented depending on the supplied object: +\itemize{ +\item \code{\link{BiodiversityDistribution}} +} +\itemize{ +\item Checks if there are fewer than 200 observations +\item TODO: Add rm_insufficient_covs link +} +\itemize{ +\item \code{\link{DistributionModel}} +} +\itemize{ +\item Check model convergence +\item Check if model is found +\item Check if coefficients exist +\item Check if there are unusual outliers in the prediction (using 10 times the median absolute deviation) +\item Check if the threshold is larger than the layer +} +\itemize{ +\item \code{\link{BiodiversityScenario}} +} +\itemize{ +\item None yet implemented. +} +} +\note{ +This function will likely be expanded with additional checks in the future. +If you have ideas, please share them via an issue. +} +\examples{ +\dontrun{ + # Where mod is an estimated DistributionModel + check(mod) +} +} +\keyword{misc} diff --git a/man/clamp_predictions.Rd b/man/clamp_predictions.Rd index d8d9caa5..555a24ed 100644 --- a/man/clamp_predictions.Rd +++ b/man/clamp_predictions.Rd @@ -21,7 +21,7 @@ This function takes an internal model matrix and restricts the values seen in th to those observed during training. } \note{ -This function is meant to be used within a certain \code{\link{engine}} or within \code{\link{project}}. +This function is meant to be used within a certain \code{"engine"} or within \code{\link{project}}. } \references{ Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E., & Blair, M. E. (2017). Opening the black box: An open-source release of Maxent. Ecography. https://doi.org/10.1111/ecog.03049 diff --git a/man/distribution.Rd b/man/distribution.Rd index feb04de8..68e63f8e 100644 --- a/man/distribution.Rd +++ b/man/distribution.Rd @@ -4,18 +4,33 @@ \alias{distribution} \title{Create distribution modelling procedure} \usage{ -distribution(background, limits = NULL) +distribution( + background, + limits = NULL, + limits_method = "none", + mcp_buffer = 0, + limits_clip = FALSE +) -\S4method{distribution}{SpatRaster, ANY}(background, limits) +\S4method{distribution}{SpatRaster,ANY,character,numeric,logical}(background,limits,limits_method,mcp_buffer,limits_clip) -\S4method{distribution}{sf, ANY}(background, limits) +\S4method{distribution}{sf,ANY,character,numeric,logical}(background,limits,limits_method,mcp_buffer,limits_clip) } \arguments{ \item{background}{Specification of the modelling background. Must be a -\code{\link{SpatRaster}}, \code{\link{sf}} or \code{\link{extent}} object.} +\code{\link{SpatRaster}} or \code{\link{sf}} object.} \item{limits}{A \code{\link{SpatRaster}} or \code{\link{sf}} object that limits the prediction surface when intersected with input data (Default: \code{NULL}).} + +\item{limits_method}{A \code{\link{character}} of the method used for hard limiting a projection. +Available options are \code{"none"} (Default), \code{"zones"} or \code{"mcp"}.} + +\item{mcp_buffer}{A \code{\link{numeric}} distance to buffer the MCP (Default: \code{0}). Only used if +\code{"mcp"} is used.} + +\item{limits_clip}{\code{\link{logical}} Should the limits clip all predictors before fitting +a model (\code{TRUE}) or just the prediction (\code{FALSE}, default).} } \value{ \code{\linkS4class{BiodiversityDistribution}} object containing data for building a biodiversity distribution modelling problem.
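The new limit arguments in the hunk above are easiest to grasp with a concrete call. A minimal sketch under stated assumptions: background and virtual_points are hypothetical placeholder objects (not shipped with the package), and the interpretation of mcp_buffer as a distance in map units is likewise an assumption here.

# Minimal sketch of the new MCP limit options. 'background' and
# 'virtual_points' are hypothetical placeholders.
x <- distribution(background,
                  limits_method = "mcp",  # build a minimum convex polygon from the data
                  mcp_buffer = 10000,     # buffer the MCP (assumed to be in map units)
                  limits_clip = FALSE) |> # clip only the prediction, not the predictors
  add_biodiversity_poipo(virtual_points)  # the MCP is derived from this dataset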
@@ -23,38 +38,42 @@ intersected with input data (Default: \code{NULL}).} \description{ This function creates an object that contains all the data, parameters and settings for building an (integrated) species distribution model. -Key functions to add data are \code{\link{add_biodiversity_*}}, \code{\link[=add_predictors]{add_predictors()}}, -\code{\link{add_latent_*}}, \code{\link{engine_*}}, \code{\link{add_priors}} and \code{\link{add_offset}}. It creates a -prototype \code{\link{BiodiversityDistribution}} object with its own functions. +Key functions to add data are \code{\link{add_biodiversity_poipo}} and the like, \code{\link{add_predictors}}, +\code{\link{add_latent_spatial}}, \code{\link{engine_glmnet}} or similar, \code{\link{add_priors}} and \code{\link{add_offset}}. +It creates a prototype \code{\link{BiodiversityDistribution}} object with its own functions. After setting input data and parameters, model predictions can then be created via the \link{train} function. Additionally, it is possible to specify a \code{"limit"} to any predictions conducted on the background. This can be for instance a buffered layer by a certain dispersal distance (Cooper and Soberon, 2018) -or a categorical layer representing biomes or soil conditions. See also the -frequently asked question (FAQ) section on the homepage for more information. +or a categorical layer representing biomes or soil conditions. +Another option is to create a constraint by constructing a minimum convex polygon (MCP) using +the supplied biodiversity data. This option can be enabled by setting +\code{"limits_method"} to \code{"mcp"}. It is also possible to provide a small buffer +to the constructed MCP that way. +See the frequently asked question (FAQ) section on the homepage for more information. See \strong{Details} for a description of the internal functions available to modify or summarize data within the created object. -\strong{Note that any model requires at minimum a single added \link{biodiversity} dataset -as well as a specified \link{engine}.} +\strong{Note that any model requires at minimum a single added biodiversity dataset +as well as a specified engine.} } \details{ This function creates a \code{\linkS4class{BiodiversityDistribution}} object that in itself contains other functions and stores parameters and (pre-)processed data. -A full list of functions available can be queried via \code{names(object)}. +A full list of functions available can be queried via \code{"names(object)"}. Some of the functions are not intended to be manipulated directly, -but rather through convenience functions (e.g. \code{\link[=object$set_predictors]{object$set_predictors()}}). +but rather through convenience functions (e.g. \code{"object$set_predictors()"}). Similarly other objects are stored in the \code{\linkS4class{BiodiversityDistribution}} object that -have their own functions as well and can be queried (e.g. \code{\link{names(object)}}). For a list of +have their own functions as well and can be queried (e.g. \code{"names(object)"}). For a list of functions see the reference documentation. By default, -if some datasets are not set, then a \code{\link{Waiver}} object is returned instead. +if some datasets are not set, then a \code{"Waiver"} object is returned instead. The following objects can be stored: \itemize{ \item \code{object$biodiversity} A \code{\link{BiodiversityDatasetCollection}} object with the added biodiversity data. -\item \code{object$engine} An \code{\link{engine}} object (e.g.
\link{engine_inlabru}) with function depended on the added engine. +\item \code{object$engine} An \code{"engine"} object (e.g. \code{\link[=engine_inlabru]{engine_inlabru()}}) with functions depending on the added engine. \item \code{object$predictors} A \code{\link{PredictorDataset}} object with all set predictions. \item \code{object$priors} A \code{\link{PriorList}} object with all specified priors. \item \code{object$log} A \code{\link{Log}} object that captures. @@ -66,8 +85,8 @@ Useful high-level functions to address those objects are for instance: \item \code{object$get_biodiversity_equations()} Lists the equations used for each biodiversity dataset with given id. Defaults to all predictors. \item \code{object$get_biodiversity_types()} Lists the type of each specified biodiversity dataset with given id. \item \code{object$get_extent()} Outputs the \link[terra:ext]{terra::ext} of the modelling region. -\item \code{object$show_background_info()} Returns a \code{\link{list}} with the \link[terra:ext]{terra::ext} and the \link[sp:is.projected]{sp::proj4string}. -\item \code{object$get_extent_dimensions()} Outputs the \link[terra:ext]{terra::ext} dimension by calling the \code{\link[=extent_dimensions]{extent_dimensions()}} function. +\item \code{object$show_background_info()} Returns a \code{\link{list}} with the \link[terra:ext]{terra::ext} and the \link[terra:crs]{terra::crs}. +\item \code{object$get_extent_dimensions()} Outputs the \link[terra:ext]{terra::ext} dimension by calling the \code{"extent_dimensions()"} function. \item \code{object$get_predictor_names()} Returns a \link{character} vector with the names of all added predictors. \item \code{object$get_prior_variables()} Returns a description of \code{\link{priors}} added. } @@ -93,5 +112,5 @@ There are other functions as well but those are better accessed through their re } } \seealso{ -\code{\link{bdproto}} on the general definition of \code{\link{proto}} objects and in particular \code{\link{bdproto-biodiversitydistribution}}. +\code{"bdproto"} on the general definition of \code{\link{proto}} objects and in particular \code{\link{BiodiversityDistribution}}. } diff --git a/man/effects.Rd b/man/effects.Rd index 58d21e89..e662dca7 100644 --- a/man/effects.Rd +++ b/man/effects.Rd @@ -17,7 +17,7 @@ None. } \description{ This function is a handy wrapper that calls the default plotting -functions for the model of a specific \link{engine}. Equivalent to +functions for the model of a specific engine. Equivalent to calling \code{effects} of a fitted \link{distribution} function. } \note{ diff --git a/man/emptyraster.Rd b/man/emptyraster.Rd index c3da98e5..87528fff 100644 --- a/man/emptyraster.Rd +++ b/man/emptyraster.Rd @@ -23,5 +23,4 @@ require(terra) r <- rast(matrix(1:100, 5, 20)) emptyraster(r) } -\keyword{terra,} \keyword{utils} diff --git a/man/engine_bart.Rd b/man/engine_bart.Rd index 04f5679c..b947b1ef 100644 --- a/man/engine_bart.Rd +++ b/man/engine_bart.Rd @@ -20,7 +20,7 @@ engine_bart(x, iter = 1000, nburn = 250, chains = 4, type = "response", ...) \item{...}{Other options.} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ The Bayesian regression approach to a sum of complementary trees is to shrink @@ -29,9 +29,9 @@ non-linear highly flexible estimation and have been shown to compare favourable algorithms (Dorie et al. 2019). Default prior preference is for trees to be small (few terminal nodes) and shrinkage towards \code{0}. -This package requires the \link{dbarts} R-package to be installed.
-Many of the functionalities of this \link{engine} have been inspired by the \link{embarcadero} R-package. Users -are therefore advised to cite if they make heavy use of BART. +This package requires the \code{"dbarts"} R-package to be installed. +Many of the functionalities of this engine have been inspired by the \code{"embarcadero"} R-package. +Users are therefore advised to cite it if they make heavy use of BART. } \details{ Prior distributions can furthermore be set for: diff --git a/man/engine_breg.Rd b/man/engine_breg.Rd index 24d7750d..766a1bd5 100644 --- a/man/engine_breg.Rd +++ b/man/engine_breg.Rd @@ -24,7 +24,7 @@ engine_breg( \item{...}{Other none specified parameters passed on to the model.} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ Efficient MCMC algorithm for linear regression models that makes use of diff --git a/man/engine_gdb.Rd b/man/engine_gdb.Rd index 3b29f08b..f18245db 100644 --- a/man/engine_gdb.Rd +++ b/man/engine_gdb.Rd @@ -28,25 +28,25 @@ Available options are \code{'inbag'}, \code{'oobag'} and \code{'none'}. (Default \item{...}{Other variables or control parameters} } \value{ -An\link{engine}. +An engine. } \description{ Gradient descent boosting is an efficient way to optimize any loss function -of a generalized linear or additive model (such as the GAMs available through the \link{mgcv} R-package). +of a generalized linear or additive model (such as the GAMs available through the \code{"mgcv"} R-package). It furthermore automatically regularizes the fit, thus the resulting model only contains the covariates whose baselearners have some influence on the response. -Depending on the type of the \link{add_biodiversity} data, either poisson process models or -logistic regressions are estimated. If the \code{only_linear} term in \link{train} is set to \code{FALSE}, +Depending on the type of the \code{add_biodiversity} data, either Poisson process models or +logistic regressions are estimated. If the \code{"only_linear"} term in \link{train} is set to \code{FALSE}, splines are added to the estimation, thus providing a non-linear additive inference. } \details{ : -This package requires the \link{mboost} R-package to be installed. -It is in philosophy somewhat related to the \link{engine_xgboost} and \link{XGBoost} R-package, +This package requires the \code{"mboost"} R-package to be installed. +It is in philosophy somewhat related to the \link{engine_xgboost} and \code{"XGBoost"} R-package, however providing some additional desirable features that make estimation quicker and particularly useful for spatial projections. Such as for instance the ability to specifically add -spatial baselearners via \link{add_latent} or the specification of monotonically constrained priors -via \link{GDBPrior}. +spatial baselearners via \link{add_latent_spatial} or the specification of +monotonically constrained priors via \link{GDBPrior}. } \examples{ \dontrun{ diff --git a/man/engine_glmnet.Rd b/man/engine_glmnet.Rd index 1c502696..4bb97da8 100644 --- a/man/engine_glmnet.Rd +++ b/man/engine_glmnet.Rd @@ -29,7 +29,7 @@ determined deterministically (Default: \code{NULL}).} \item{...}{Other parameters passed on to glmnet.} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ This engine allows the estimation of linear coefficients using either ridge, lasso or elastic net regression techniques.
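Since the glmnet engine above is best understood inside a full model pipeline, a minimal sketch follows; background, points and predictors are hypothetical placeholders, and the alpha argument (0 = ridge, 1 = lasso, values in between = elastic net) is assumed to follow the glmnet convention.

# Minimal sketch of a regularized fit with the glmnet engine.
# 'background', 'points' and 'predictors' are hypothetical placeholders.
fit <- distribution(background) |>
  add_biodiversity_poipo(points) |>  # presence-only point records
  add_predictors(predictors) |>      # environmental covariates
  engine_glmnet(alpha = 0) |>        # assumed glmnet convention: 0 = ridge, 1 = lasso
  train(runname = "glmnet_sketch", only_linear = TRUE)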
diff --git a/man/engine_inla.Rd b/man/engine_inla.Rd index 13e16a1b..ab00097b 100644 --- a/man/engine_inla.Rd +++ b/man/engine_inla.Rd @@ -28,7 +28,7 @@ engine_inla( \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} -\item{optional_mesh}{A directly supplied \code{\link{INLA}} mesh (Default: \code{NULL})} +\item{optional_mesh}{A directly supplied \code{"INLA"} mesh (Default: \code{NULL})} \item{optional_projstk}{A directly supplied projection stack. Useful if projection stack is identical for multiple species (Default: \code{NULL})} @@ -69,11 +69,11 @@ Either summarizing the linear \code{"predictor"} or \code{"response"} (Default: \item{...}{Other options.} } \value{ -An \link{engine}. +An engine. } \description{ Allows a full Bayesian analysis of linear and additive models using Integrated Nested Laplace approximation. -Engine has been largely superceded by the \link{engine_bru} package and users are advised to us this one, +Engine has been largely superseded by the \code{\link{engine_inlabru}} engine and users are advised to use this one, unless specific options are required. } \details{ diff --git a/man/engine_inlabru.Rd b/man/engine_inlabru.Rd index dba697e6..525643ee 100644 --- a/man/engine_inlabru.Rd +++ b/man/engine_inlabru.Rd @@ -25,7 +25,7 @@ engine_inlabru( \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} -\item{optional_mesh}{A directly supplied \code{\link{INLA}} mesh (Default: \code{NULL})} +\item{optional_mesh}{A directly supplied \code{"INLA"} mesh (Default: \code{NULL})} \item{max.edge}{The largest allowed triangle edge length, must be in the same scale units as the coordinates. Default is an educated guess (Default: \code{NULL}).} @@ -52,16 +52,16 @@ Default is an educated guess (Default: \code{NULL}).} \item{...}{Other variables} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. -The inlabru engine - similar as the \code{\link{engine_inla}} function acts a wrapper for \link[INLA:inla]{INLA::inla}, -albeit \link{inlabru} has a number of convenience functions implemented that make in particular predictions +The inlabru engine - similar to the \code{\link{engine_inla}} function - acts as a wrapper for INLA, +albeit \code{"inlabru"} has a number of convenience functions implemented that make in particular predictions with new data much more straight forward (e.g. via posterior simulation instead of fitting). -Since more recent versions \link{inlabru} also supports the addition of multiple likelihoods, therefore +As of more recent versions, \code{"inlabru"} also supports the addition of multiple likelihoods, therefore allowing full integrated inference. } \details{ diff --git a/man/engine_stan.Rd b/man/engine_stan.Rd index d2bf6052..0a771d9b 100644 --- a/man/engine_stan.Rd +++ b/man/engine_stan.Rd @@ -29,22 +29,22 @@ If step-size adaptation is on (Default: \code{TRUE}), this also controls the number of iterations for which adaptation is run (and hence these warmup samples should not be used for inference). The number of warmup iterations should be smaller than \code{iter} and the default is \code{iter/2}.} -\item{init}{Initial values for parameters (Default: \code{'random'}).
Can also be specified as \link{list} (see: \code{\link[rstan:stan]{rstan::stan}})} +\item{init}{Initial values for parameters (Default: \code{'random'}). Can also be specified as \link{list} (see: \code{"rstan::stan"})} \item{cores}{If set to NULL take values from specified ibis option \code{getOption('ibis.nthread')}.} \item{algorithm}{Mode used to sample from the posterior. Available options are \code{"sampling"}, \code{"optimize"}, or \code{"variational"}. -See \code{\link{cmdstanr}} package for more details. (Default: \code{"sampling"}).} +See \code{"cmdstanr"} package for more details. (Default: \code{"sampling"}).} -\item{control}{See \code{\link[rstan:stan]{rstan::stan}} for more details on specifying the controls.} +\item{control}{See \code{"rstan::stan"} for more details on specifying the controls.} \item{type}{The mode used for creating posterior predictions. Either summarizing the linear \code{"predictor"} or \code{"response"} (Default: \code{"response"}).} \item{...}{Other variables} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ Stan is a probabilistic programming language that can be used to @@ -53,7 +53,7 @@ Stan provides full Bayesian inference for continuous-variable models through Mar such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Stan code has to be written separately and this function acts as compiler to build the stan-model. -\strong{Requires the \link{cmdstanr} package to be installed!} +\strong{Requires the \code{"cmdstanr"} package to be installed!} } \details{ By default the posterior is obtained through sampling, however stan also supports @@ -76,7 +76,7 @@ x <- distribution(background) |> engine_stan(iter = 1000) } } \seealso{ -\link{rstan}, \link{cmdstanr} +rstan, cmdstanr Other engine: \code{\link{engine_bart}()}, diff --git a/man/engine_xgboost.Rd b/man/engine_xgboost.Rd index 58f28b9b..68c715e5 100644 --- a/man/engine_xgboost.Rd +++ b/man/engine_xgboost.Rd @@ -31,7 +31,7 @@ engine_xgboost( Lower values generally being better but also computationally more costly. (Default: \code{1e-3})} \item{gamma}{\code{\link{numeric}} A regularization parameter in the model. Lower values for better estimates (Default: \code{3}). -Also see \link{reg_lambda} parameter for the L2 regularization on the weights} +Also see \code{"reg_lambda"} parameter for the L2 regularization on the weights} \item{reg_lambda}{\code{\link{numeric}} L2 regularization term on weights (Default: \code{0}).} @@ -51,7 +51,7 @@ tresting dataset (Default: \code{0.75}).} \item{...}{Other none specified parameters.} } \value{ -An \link{engine}. +An \link{Engine}. } \description{ Allows to estimate eXtreme gradient descent boosting for tree-based or linear boosting regressions. diff --git a/man/ensemble.Rd b/man/ensemble.Rd index 1e32d1ae..55118f74 100644 --- a/man/ensemble.Rd +++ b/man/ensemble.Rd @@ -4,7 +4,7 @@ \alias{ensemble} \title{Function to create an ensemble of multiple fitted models} \usage{ -\S4method{ensemble}{ANY, character, numeric, numeric, character, logical, character}(..., method, weights, min.value, layer, normalize, uncertainty) +\S4method{ensemble}{ANY,character,numeric,numeric,character,logical,character}(...,method,weights,min.value,layer,normalize,uncertainty) } \arguments{ \item{...}{Provided \code{\link{DistributionModel}} objects.} @@ -63,7 +63,7 @@ other summary statistics from the posterior (e.g. \code{'sd'}) can be specified.
} \note{ If a list is supplied, then it is assumed that each entry in the list is a fitted \code{\link{DistributionModel}} object. -Take care not to create an ensemble of models constructed with different link functions, e.g. \link{logistic} vs \link{log}. In this case +Take care not to create an ensemble of models constructed with different link functions, e.g. logistic vs log. In this case the \code{"normalize"} parameter has to be set. } \examples{ diff --git a/man/ensemble_partial.Rd b/man/ensemble_partial.Rd index fb8ad87d..81994837 100644 --- a/man/ensemble_partial.Rd +++ b/man/ensemble_partial.Rd @@ -4,7 +4,7 @@ \alias{ensemble_partial} \title{Function to create an ensemble of partial effects from multiple models} \usage{ -\S4method{ensemble_partial}{ANY, character, character, character, logical}(..., x.var, method, layer, normalize) +\S4method{ensemble_partial}{ANY,character,character,character,logical}(...,x.var,method,layer,normalize) } \arguments{ \item{...}{Provided \code{\link{DistributionModel}} objects from which partial responses can be called. In the future provided data.frames might be supported as well.} @@ -41,7 +41,7 @@ Possible options for creating an ensemble includes: } \note{ If a list is supplied, then it is assumed that each entry in the list is a fitted \code{\link{DistributionModel}} object. -Take care not to create an ensemble of models constructed with different link functions, e.g. \link{logistic} vs \link{log}. +Take care not to create an ensemble of models constructed with different link functions, e.g. logistic vs log. By default the response functions of each model are normalized. } \examples{ diff --git a/man/explode_factorized_raster.Rd b/man/explode_factorized_raster.Rd deleted file mode 100644 index f78a3a38..00000000 --- a/man/explode_factorized_raster.Rd +++ /dev/null @@ -1,19 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/utils-spatial.R -\name{explode_factorized_raster} -\alias{explode_factorized_raster} -\title{Split raster factor levels to stack} -\usage{ -explode_factorized_raster(ras, name = NULL, ...) -} -\description{ -@description Takes a single raster that is a \code{\link{factor}} and creates -a new \code{\link{SpatRaster}} that contains the individual levels. -@param ras A \code{\link{SpatRaster}} object that is a \code{\link{factor}}. Alternatively a \code{\link{SpatRaster}} object -can be supplied in which only factor variables are 'exploded'. -@param name An optional \code{\link{character}} name for the \code{\link{SpatRaster}}. -@param ... Other parameters (not used). -@returns A \code{\link{SpatRaster}} object.
-@keywords utils, internal -@noRd -} diff --git a/man/formatGLOBIOM.Rd b/man/formatGLOBIOM.Rd index 52e960ee..89a4fe21 100644 --- a/man/formatGLOBIOM.Rd +++ b/man/formatGLOBIOM.Rd @@ -10,6 +10,8 @@ formatGLOBIOM( ignore = NULL, period = "all", template = NULL, + shares_to_area = FALSE, + use_gdalutils = FALSE, verbose = getOption("ibis.setupmessages") ) } @@ -26,6 +28,10 @@ and \code{"all"} for all entries (Default: \code{"reference"}).} \item{template}{An optional \code{\link{SpatRaster}} object towards which projects should be transformed.} +\item{shares_to_area}{A \code{\link{logical}} on whether shares should be corrected to areas (if identified).} + +\item{use_gdalutils}{(Deprecated) \code{\link{logical}} on whether to use the gdalutils workaround.} + \item{verbose}{\code{\link{logical}} on whether to be chatty.} } \value{ diff --git a/man/get_priors.Rd b/man/get_priors.Rd index 8c36fdce..afa74da0 100644 --- a/man/get_priors.Rd +++ b/man/get_priors.Rd @@ -6,7 +6,7 @@ \usage{ get_priors(mod, target_engine, ...) -\S4method{get_priors}{ANY, character}(mod, target_engine) +\S4method{get_priors}{ANY,character}(mod,target_engine,...) } \arguments{ \item{mod}{A fitted \code{\linkS4class{DistributionModel}} object. If instead a \code{\linkS4class{BiodiversityDistribution}} object diff --git a/man/get_rastervalue.Rd b/man/get_rastervalue.Rd index ff117147..7d7c0c2c 100644 --- a/man/get_rastervalue.Rd +++ b/man/get_rastervalue.Rd @@ -7,7 +7,7 @@ get_rastervalue(coords, env, ngb_fill = TRUE, rm.na = FALSE) } \arguments{ -\item{coords}{A \code{\link{Spatial}}, \code{\link{data.frame}}, \code{\link{matrix}} or \code{\link{sf}} object.} +\item{coords}{A \code{\link{data.frame}}, \code{\link{matrix}} or \code{\link{sf}} object.} \item{env}{A \code{\link{SpatRaster}} object with the provided predictors.} diff --git a/man/ibis_options.Rd b/man/ibis_options.Rd index b70dbc59..b5b78aac 100644 --- a/man/ibis_options.Rd +++ b/man/ibis_options.Rd @@ -21,8 +21,6 @@ Currently supported are: } } \examples{ -\dontrun{ ibis_options() } -} \keyword{misc} diff --git a/man/limiting.Rd b/man/limiting.Rd new file mode 100644 index 00000000..88ac52b7 --- /dev/null +++ b/man/limiting.Rd @@ -0,0 +1,44 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/limiting.R +\name{limiting} +\alias{limiting} +\title{Identify local limiting factor} +\usage{ +limiting(mod, plot = TRUE) + +\S4method{limiting}{ANY,logical}(mod,plot) +} +\arguments{ +\item{mod}{A fitted \code{'DistributionModel'} object from which limiting factors are to +be identified.} + +\item{plot}{Should the result be plotted? (Default: \code{TRUE}).} +} +\value{ +A \code{terra} object of the most important variable for a given grid cell. +} +\description{ +Calculates a \code{\link{SpatRaster}} of locally limiting factors from a given projected model. +To calculate this, first the \code{\link{spartial}} effect of each individual covariate +in the model is calculated. + +The effect is estimated as that variable most responsible for decreasing suitability +at that cell. The decrease in suitability is calculated, +for each predictor in turn, relative to the suitability +that would be achieved if that predictor took the value equal to the mean. +The predictor associated with the largest decrease in suitability is +the most limiting factor. +} +\examples{ +\dontrun{ +o <- limiting(fit) +plot(o) +} +} +\references{ +\itemize{ +\item Elith, J., Kearney, M. and Phillips, S.
(2010), \href{http://doi.org/10.1111/j.2041-210X.2010.00036.x}{The art of modelling range-shifting species}. \emph{Methods in Ecology and Evolution}, 1: 330-342. doi: 10.1111/j.2041-210X.2010.00036.x +} +} +\concept{Partly inspired by the rmaxent package.} +\keyword{partial} diff --git a/man/load_model.Rd b/man/load_model.Rd index 8c7f4dde..17d9097d 100644 --- a/man/load_model.Rd +++ b/man/load_model.Rd @@ -4,7 +4,7 @@ \alias{load_model} \title{Load a pre-computed model} \usage{ -\S4method{load_model}{character, logical}(fname, verbose) +\S4method{load_model}{character,logical}(fname,verbose) } \arguments{ \item{fname}{A \code{\link{character}} depicting an output filename.} diff --git a/man/new_id.Rd b/man/new_id.Rd index fb65711c..b086e591 100644 --- a/man/new_id.Rd +++ b/man/new_id.Rd @@ -2,13 +2,13 @@ % Please edit documentation in R/identifier.R \name{new_id} \alias{new_id} -\alias{Id} +\alias{Id,} \title{Identifier} \usage{ new_id() } \value{ -\code{\link{Id}} object. +\code{"Id"} object. } \description{ Generate a new unique identifier. diff --git a/man/partial.Rd b/man/partial.Rd index 9fa8c61f..fbfdbb3e 100644 --- a/man/partial.Rd +++ b/man/partial.Rd @@ -16,7 +16,7 @@ partial( ... ) -\S4method{partial}{ANY,character}(mod, x.var) +\S4method{partial}{ANY,character,ANY,numeric,ANY,logical,character}(mod,x.var,constant,variable_length,values,plot,type,...) \method{partial}{DistributionModel}(mod, ...) } @@ -42,7 +42,7 @@ than \code{NULL}, the parameter \code{"variable_length"} is ignored (Default: \c A \link{data.frame} with the created partial response. } \description{ -Create a partial response or effect plot of a trained model +Create a partial response or effect plot of a trained model. } \details{ By default the mean is calculated across all parameters that are not \code{x.var}. diff --git a/man/partial_density.Rd b/man/partial_density.Rd new file mode 100644 index 00000000..510ff5ff --- /dev/null +++ b/man/partial_density.Rd @@ -0,0 +1,52 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/partial.R +\name{partial_density} +\alias{partial_density} +\title{Visualize the density of the data over the environmental data} +\usage{ +partial_density(mod, x.var, df = FALSE, ...) + +\S4method{partial_density}{ANY,character,logical}(mod,x.var,df,...) } +\arguments{ +\item{mod}{A trained \code{DistributionModel} object. Requires a fitted model and inferred prediction.} + +\item{x.var}{A \link{character} indicating the variable to be investigated. Can be a \code{\link{vector}} of length \code{1} or \code{2}.} + +\item{df}{\code{\link{logical}} if plotting data should be returned instead (Default: \code{FALSE}).} + +\item{...}{Other engine specific parameters.} } +\value{ +A \code{\link{ggplot2}} object showing the marginal response in light of the data. +} +\description{ +Based on a fitted model, plot the density of observations over the estimated variable and environmental space. +As opposed to the \link{partial} and \link{spartial} functions, which are rather low-level interfaces, this function provides more +detail in the light of the data. It is also able to contrast different variables against each other and show the used data. +} +\details{ +This function calculates the observed density of presence and absence points over the whole surface of a specific +variable. It can be used to visually inspect the fit of the model to data. +} +\note{ +By default all variables that are not \code{x.var} are held constant at the mean.
+} +\examples{ +\dontrun{ + # Do a partial calculation of a trained model + partial_density(fit, x.var = "Forest.cover") + # Or with two variables + partial_density(fit, x.var = c("Forest.cover", "bio01")) +} +} +\references{ +\itemize{ +\item Warren, D.L., Matzke, N.J., Cardillo, M., Baumgartner, J.B., Beaumont, L.J., Turelli, M., Glor, R.E., Huron, N.A., Simões, M., Iglesias, T.L. Piquet, J.C., and Dinnage, R. 2021. ENMTools 1.0: an R package for comparative ecological biogeography. Ecography, 44(4), pp.504-511. +} +} +\seealso{ +\link{partial} +} +\concept{Visual style emulated from ENMTools package.} +\keyword{partial} diff --git a/man/posterior_predict_stanfit.Rd b/man/posterior_predict_stanfit.Rd index 08372901..90910910 100644 --- a/man/posterior_predict_stanfit.Rd +++ b/man/posterior_predict_stanfit.Rd @@ -15,7 +15,7 @@ posterior_predict_stanfit( ) } \arguments{ -\item{obj}{A \code{"stanfit"} object (as used by \code{\link{rstan}}).} +\item{obj}{A \code{"stanfit"} object (as used by rstan).} \item{form}{A \code{\link{formula}} object created for the \link{DistributionModel}.} @@ -37,6 +37,6 @@ to project coefficients obtained from Bayesian models to new/novel contexts. \references{ \itemize{ \item \url{https://medium.com/@alex.pavlakis/making-predictions-from-stan-models-in-r-3e349dfac1ed}. -\item The \code{\link{brms}} R-package. +\item The brms R-package. } } diff --git a/man/predictor_derivate.Rd b/man/predictor_derivate.Rd index dc1054c6..3c0305bd 100644 --- a/man/predictor_derivate.Rd +++ b/man/predictor_derivate.Rd @@ -62,6 +62,6 @@ predictor_derivate(covs, option = "hinge", knots = 4) } } \seealso{ -predictor_derivate +predictor_transform } \keyword{utils} diff --git a/man/predictor_filter.Rd b/man/predictor_filter.Rd index e2361310..9b30bc91 100644 --- a/man/predictor_filter.Rd +++ b/man/predictor_filter.Rd @@ -25,7 +25,7 @@ If the function fails due to some reason return \code{NULL}. This function helps to remove highly correlated variables from a set of predictors. It supports multiple options some of which require both environmental predictors and observations, others only predictors. -Some of the options require different packages to be pre-installed, such as \code{\link{ranger}} or \code{\link{Boruta}}. +Some of the options require different packages to be pre-installed, such as \code{ranger} or \code{Boruta}. } \details{ Available options are: @@ -33,9 +33,9 @@ Available options are: \item \code{"none"} No prior variable removal is performed (Default). \item \code{"pearson"}, \code{"spearman"} or \code{"kendall"} Makes use of pairwise comparisons to identify and remove highly collinear predictors (Pearson's \code{r >= 0.7}). -\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{\link{abess}} package (see References). Note that this +\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{abess} package (see References). Note that this effectively fits a separate generalized linear model to reduce the number of covariates. -\item \code{"boruta"} Uses the \code{\link{Boruta}} package to identify non-informative features. +\item \code{"boruta"} Uses the \code{Boruta} package to identify non-informative features. 
} } \note{ diff --git a/man/predictors_filter_abess.Rd b/man/predictors_filter_abess.Rd index e19d15d6..cf34e7d6 100644 --- a/man/predictors_filter_abess.Rd +++ b/man/predictors_filter_abess.Rd @@ -2,6 +2,7 @@ % Please edit documentation in R/utils-predictors.R \name{predictors_filter_abess} \alias{predictors_filter_abess} +\alias{predictor_filter_abess} \title{Apply the adaptive best subset selection framework on a set of predictors} \usage{ predictors_filter_abess( @@ -27,7 +28,7 @@ predictors_filter_abess( \item{family}{A \code{\link{character}} indicating the family the observational data originates from.} \item{tune.type}{\code{\link{character}} indicating the type used for subset evaluation. -Options are \code{c("gic", "ebic", "bic", "aic", "cv")} as listed in \code{\link{abess}}.} +Options are \code{c("gic", "ebic", "bic", "aic", "cv")} as listed in \code{abess}.} \item{lambda}{A \code{\link{numeric}} single lambda value for regularized best subset selection (Default: \code{0}).} diff --git a/man/predictors_filter_boruta.Rd b/man/predictors_filter_boruta.Rd index 5154b241..730f93bd 100644 --- a/man/predictors_filter_boruta.Rd +++ b/man/predictors_filter_boruta.Rd @@ -2,6 +2,7 @@ % Please edit documentation in R/utils-predictors.R \name{predictors_filter_boruta} \alias{predictors_filter_boruta} +\alias{predictor_filter_boruta} \title{All relevant feature selection using Boruta} \usage{ predictors_filter_boruta( @@ -35,12 +36,12 @@ information is known.} A \code{\link{vector}} of variable names to exclude. } \description{ -This function uses the \code{\link{Boruta}} package to identify predictor variables with little information content. It iteratively +This function uses the \code{Boruta} package to identify predictor variables with little information content. It iteratively compares importances of attributes with importances of shadow attributes, created by shuffling original ones. Attributes that have significantly worse importance than shadow ones are being consecutively dropped. } \note{ -This package depends on the \code{\link{ranger}} package to iteratively fit randomForest models. +This package depends on the \code{ranger} package to iteratively fit randomForest models. } \references{ \itemize{ diff --git a/man/priors.Rd b/man/priors.Rd index 567ff3b6..acd267c3 100644 --- a/man/priors.Rd +++ b/man/priors.Rd @@ -6,11 +6,11 @@ \usage{ priors(x, ...) -\S4method{priors}{Prior}(x) +\S4method{priors}{Prior}(x,...) priors(x, ...) -\S4method{priors}{Prior}(x) +\S4method{priors}{Prior}(x,...) } \arguments{ \item{x}{A \code{\linkS4class{Prior}} object added to the list.} diff --git a/man/project.Rd b/man/project.Rd index 822a906d..82f83bdc 100644 --- a/man/project.Rd +++ b/man/project.Rd @@ -9,11 +9,11 @@ \usage{ \method{project}{BiodiversityScenario}(x, ...) -\S4method{project}{BiodiversityScenario, character, logical, character, character}(mod, date_interpolation, stabilize, stabilize_method, layer) +\S4method{project}{BiodiversityScenario,character,logical,character,character}(x,date_interpolation,stabilize,stabilize_method,layer) } \arguments{ \item{x}{A \code{\link{BiodiversityScenario}} object with set predictors.
-Note that some constrains such as \code{\link{MigClim}} can still simulate future change without projections.} +Note that some constraints such as \code{MigClim} can still simulate future change without projections.} \item{...}{passed on parameters.} @@ -39,17 +39,17 @@ Any constrains specified in the scenario object are applied during the projectio } \details{ In the background the function \code{x$project()} for the respective model object is called, where -\code{x} is fitted model object. For specifics on the constrains, see the relevant \code{\link{constrain}} functions, +\code{x} is the fitted model object. For specifics on the constraints, see the relevant \code{add_constraint} functions, respectively: \itemize{ -\item \code{\link[=add_constrain]{add_constrain()}} for generic wrapper to add any of the available constrains. -\item \code{\link[=add_constrain_dispersal]{add_constrain_dispersal()}} for specifying dispersal constrain on the temporal projections at each step. -\item \code{\link[=add_constrain_MigClim]{add_constrain_MigClim()}} Using the \pkg{MigClim} R-package to simulate dispersal in projections. -\item \code{\link[=add_constrain_connectivity]{add_constrain_connectivity()}} Apply a connectivity constrain at the projection, for instance by adding +\item \code{\link[=add_constraint]{add_constraint()}} for generic wrapper to add any of the available constraints. +\item \code{\link[=add_constraint_dispersal]{add_constraint_dispersal()}} for specifying a dispersal constraint on the temporal projections at each step. +\item \code{\link[=add_constraint_MigClim]{add_constraint_MigClim()}} Using the \pkg{MigClim} R-package to simulate dispersal in projections. +\item \code{\link[=add_constraint_connectivity]{add_constraint_connectivity()}} Apply a connectivity constraint at the projection, for instance by adding a barrier that prevents migration. -\item \code{\link[=add_constrain_adaptability]{add_constrain_adaptability()}} Apply an adaptability constrain to the projection, for instance +\item \code{\link[=add_constraint_adaptability]{add_constraint_adaptability()}} Apply an adaptability constraint to the projection, for instance constraining the speed a species is able to adapt to new conditions. -\item \code{\link[=add_constrain_boundary]{add_constrain_boundary()}} To artificially limit the distribution change. Similar as specifying projection limits, but +\item \code{\link[=add_constraint_boundary]{add_constraint_boundary()}} To artificially limit the distribution change. Similar to specifying projection limits, but can be used to specifically constrain a projection within a certain area (e.g. a species range or an island). } diff --git a/man/pseudoabs_settings.Rd b/man/pseudoabs_settings.Rd index 6c68cd17..9e08b3b2 100644 --- a/man/pseudoabs_settings.Rd +++ b/man/pseudoabs_settings.Rd @@ -4,7 +4,7 @@ \alias{pseudoabs_settings} \title{Settings for specifying pseudo-absence points within the model background} \usage{ -\S4method{pseudoabs_settings}{ANY, numeric, numeric, character, numeric, logical, logical ANY}(background, nrpoints, min_ratio, method, buffer_distance, inside, layer, bias) +\S4method{pseudoabs_settings}{ANY,numeric,numeric,character,numeric,logical,logical,ANY}(background,nrpoints,min_ratio,method,buffer_distance,inside,layer,bias,...) } \arguments{ \item{background}{A \code{\link{SpatRaster}} or \code{\link{sf}} object over which background points can be sampled.
Default is diff --git a/man/rm_offset.Rd b/man/rm_offset.Rd index 1c3b8264..0429422e 100644 --- a/man/rm_offset.Rd +++ b/man/rm_offset.Rd @@ -6,7 +6,7 @@ \usage{ rm_offset(x, layer = NULL) -\S4method{rm_offset}{BiodiversityDistribution, character}(x, layer) +\S4method{rm_offset}{BiodiversityDistribution,character}(x,layer) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/rm_predictors.Rd b/man/rm_predictors.Rd index 34297263..6bff9374 100644 --- a/man/rm_predictors.Rd +++ b/man/rm_predictors.Rd @@ -6,7 +6,7 @@ \usage{ rm_predictors(x, names) -\S4method{rm_predictors}{BiodiversityDistribution,vector}(x, names) +\S4method{rm_predictors}{BiodiversityDistribution,ANY}(x,names) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/rm_priors.Rd b/man/rm_priors.Rd index 7bbc16ad..8a6ff137 100644 --- a/man/rm_priors.Rd +++ b/man/rm_priors.Rd @@ -6,7 +6,7 @@ \usage{ rm_priors(x, names = NULL, ...) -\S4method{rm_priors}{BiodiversityDistribution}(x) +\S4method{rm_priors}{BiodiversityDistribution,ANY}(x,names,...) } \arguments{ \item{x}{\link{distribution} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -21,7 +21,12 @@ In order to remove a set prior, the name of the prior has to be specified. } \examples{ \dontrun{ - TBD + # Add prior + pp <- GLMNETPrior("forest") + x <- distribution(background) |> + add_priors(pp) + # Remove again + x <- x |> rm_priors("forest") } } \seealso{ diff --git a/man/run_stan.Rd b/man/run_stan.Rd index a84d99e6..fe617c43 100644 --- a/man/run_stan.Rd +++ b/man/run_stan.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/utils-stan.R \name{run_stan} \alias{run_stan} -\title{Fit \link{cmdstanr} model and convert to \link{rstan} object} +\title{Fit cmdstanr model and convert to rstan object} \usage{ run_stan( model_code, @@ -24,7 +24,7 @@ run_stan( \item{model_code}{A \code{\link{character}} pointing to the stan modelling code.} -\item{data}{A \code{\link{list}} with all the parameters required to run the \link{model_code} in stan.} +\item{data}{A \code{\link{list}} with all the parameters required to run the model_code in stan.} \item{algorithm}{A \code{\link{character}} giving the algorithm to use. Either \code{'sampling'} (Default), \code{'optimize'} or \code{'variational'} for penalized likelihood estimation.} @@ -51,14 +51,14 @@ run_stan( \item{...}{Other non-specified parameters.} } \value{ -A \link{rstan} object +A rstan object } \description{ This function fits a stan model using the light-weight interface provided -by \link{cmdstanr}. The code was adapted from McElreath \link{rethinking} package. +by cmdstanr. The code was adapted from McElreath's rethinking package.
} \seealso{ -\link{rethinking} R package +rethinking R package } \keyword{misc,} \keyword{stan} diff --git a/man/scenario.Rd b/man/scenario.Rd index 4630ce49..9b998be2 100644 --- a/man/scenario.Rd +++ b/man/scenario.Rd @@ -6,7 +6,7 @@ \usage{ scenario(fit, limits = NULL, copy_model = FALSE) -\S4method{scenario}{ANY, ANY, logical}(fit, limits, copy_model) +\S4method{scenario}{ANY,ANY,logical}(fit,limits,copy_model) } \arguments{ \item{fit}{A \code{\link{BiodiversityDistribution}} object containing a trained model.} diff --git a/man/sel_predictors.Rd b/man/sel_predictors.Rd index f5aa49d1..67108368 100644 --- a/man/sel_predictors.Rd +++ b/man/sel_predictors.Rd @@ -6,7 +6,7 @@ \usage{ sel_predictors(x, names) -\S4method{sel_predictors}{BiodiversityDistribution,vector}(x, names) +\S4method{sel_predictors}{BiodiversityDistribution,ANY}(x,names) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} diff --git a/man/set_priors.Rd b/man/set_priors.Rd index dccb6fb9..ef9f7e59 100644 --- a/man/set_priors.Rd +++ b/man/set_priors.Rd @@ -6,7 +6,7 @@ \usage{ set_priors(x, priors = NULL, ...) -\S4method{set_priors}{BiodiversityDistribution}(x) +\S4method{set_priors}{BiodiversityDistribution,ANY}(x,priors,...) } \arguments{ \item{x}{\link{distribution} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object.} @@ -26,7 +26,10 @@ via \link{add_predictors} } \examples{ \dontrun{ - x <- distribution(background) + pp <- GLMNETPrior("forest") + x <- distribution(background) |> + add_priors(pp) + } } \seealso{ diff --git a/man/similarity.Rd b/man/similarity.Rd index de3ebbc7..6975c282 100644 --- a/man/similarity.Rd +++ b/man/similarity.Rd @@ -15,9 +15,9 @@ similarity( ... ) -\S4method{similarity}{BiodiversityDistribution, character, character, character, logical, logical}(obj, ref_type, method, predictor_names, full, plot) +\S4method{similarity}{BiodiversityDistribution,character,character,character,logical,logical}(obj,ref_type,method,predictor_names,full,plot,...) -\S4method{similarity}{SpatRaster}(obj) +\S4method{similarity}{SpatRaster,sf,character,logical,logical}(obj,ref,method,full,plot,...) } \arguments{ \item{obj}{A \code{\link{BiodiversityDistribution}}, \code{\link{DistributionModel}} or alternatively a \code{\link{SpatRaster}} object.} @@ -80,7 +80,7 @@ species distribution models" https://doi.org/10.1111/ddi.12209 \emph{Diversity a } } \seealso{ -\code{\link{dismo}} R-package. +dismo R-package. } \keyword{environment} \keyword{mahalanobis,} diff --git a/man/spartial.Rd b/man/spartial.Rd index 24ba2ef8..2e239ea5 100644 --- a/man/spartial.Rd +++ b/man/spartial.Rd @@ -7,7 +7,7 @@ \usage{ spartial(mod, x.var, constant = NULL, plot = FALSE, ...) -\S4method{spartial}{ANY,character}(mod, x.var) +\S4method{spartial}{ANY,character,ANY,logical}(mod,x.var,constant,plot,...) \method{spartial}{DistributionModel}(mod, ...) 
} diff --git a/man/stan_check_cmd.Rd b/man/stan_check_cmd.Rd index 82eed7af..ecea89e6 100644 --- a/man/stan_check_cmd.Rd +++ b/man/stan_check_cmd.Rd @@ -7,7 +7,7 @@ stan_check_cmd(install = TRUE, ask = FALSE) } \arguments{ -\item{install}{A \code{\link{logical}} factor to indicate whether \link{cmdstanr} should be directly installed (Default: \code{TRUE}).} +\item{install}{A \code{\link{logical}} factor to indicate whether cmdstanr should be directly installed (Default: \code{TRUE}).} \item{ask}{\code{\link{logical}} whether the cmdstanr package is to be installed (Default: \code{FALSE}).} } diff --git a/man/stancode.Rd b/man/stancode.Rd index dba9ee1e..b69a51d1 100644 --- a/man/stancode.Rd +++ b/man/stancode.Rd @@ -18,10 +18,10 @@ None. } \description{ This helper function shows the code from a trained \link{DistributionModel} using the \code{\link{engine_stan}}. -This function is emulated after a similar functionality in the \link{brms} R-package. +This function is emulated after a similar functionality in the brms R-package. \strong{It only works with models inferred with stan!} } \seealso{ -\link{rstan}, \link{cmdstanr}, \link{brms} +rstan, cmdstanr, brms } \keyword{engine} diff --git a/man/threshold.Rd b/man/threshold.Rd index 859de028..f7e46e18 100644 --- a/man/threshold.Rd +++ b/man/threshold.Rd @@ -14,16 +14,14 @@ threshold( ... ) -\S4method{threshold}{ANY}(obj) +\S4method{threshold}{ANY,character,numeric,ANY,character,logical}(obj,method,value,point,format,return_threshold,...) -\S4method{threshold}{SpatRasterDataset}(obj) +\S4method{threshold}{SpatRaster,character,ANY,ANY,character,logical}(obj,method,value,point,format,return_threshold) -\S4method{threshold}{SpatRaster}(obj) - -\S4method{threshold}{BiodiversityScenario}(obj) +\S4method{threshold}{BiodiversityScenario,ANY}(obj,tr) } \arguments{ -\item{obj}{A trained \code{\link{DistributionModel}} or alternatively a \code{\link{SpatRaster}} object.} +\item{obj}{A \link{BiodiversityScenario} object to which an existing threshold is to be added.} \item{method}{A specific method for thresholding. See details for available options.} @@ -38,7 +36,7 @@ formatted thresholds are to be created (Default: \code{"binary"}). Also see Musc \item{...}{other parameters not yet set.} -\item{tr}{A \code{\link{numeric}} value specifiying the specific threshold for scenarios.} +\item{tr}{A \code{\link{numeric}} value specifying the specific threshold for scenarios (Default: taken from the object).} } \value{ A \link{SpatRaster} if a \link{SpatRaster} object as input. @@ -66,12 +64,12 @@ The following options are currently implemented: \item \code{'mtp'} = minimum training presence is used to find and set the lowest predicted suitability for any occurrence point. \item \code{'percentile'} = For a percentile threshold. A \code{value} as parameter has to be set here. \item \code{'min.cv'} = Threshold the raster so to minimize the coefficient of variation (cv) of the posterior. Uses the lowest tercile of the cv in space. Only feasible with Bayesian engines. -\item \code{'TSS'} = Determines the optimal TSS (True Skill Statistic). Requires the \code{\link{modEvA}} package to be installed. -\item \code{'kappa'} = Determines the optimal kappa value (Kappa). Requires the \code{\link{modEvA}} package to be installed. -\item \code{'F1score'} = Determines the optimal F1score (also known as Sorensen similarity). Requires the \code{\link{modEvA}} package to be installed. -\item \code{'F1score'} = Determines the optimal sensitivity of presence records.
Requires the \code{\link{modEvA}} package to be installed. -\item \code{'Sensitivity'} = Determines the optimal sensitivity of presence records. Requires the \code{\link{modEvA}} package to be installed. -\item \code{'Specificity'} = Determines the optimal sensitivity of presence records. Requires the \code{\link{modEvA}} package to be installed. +\item \code{'TSS'} = Determines the optimal TSS (True Skill Statistic). Requires the \code{"modEvA"} package to be installed. +\item \code{'kappa'} = Determines the optimal kappa value (Kappa). Requires the \code{"modEvA"} package to be installed. +\item \code{'F1score'} = Determines the optimal F1score (also known as Sorensen similarity). Requires the \code{"modEvA"} package to be installed. +\item \code{'Sensitivity'} = Determines the optimal sensitivity of presence records. Requires the \code{"modEvA"} package to be installed. +\item \code{'Specificity'} = Determines the optimal specificity of absence records. Requires the \code{"modEvA"} package to be installed. } } \examples{ @@ -89,5 +87,5 @@ The following options are currently implemented: } } \seealso{ -\code{\link{modEvA}} +\code{"modEvA"} } diff --git a/man/train.Rd b/man/train.Rd index a2125886..90bb047c 100644 --- a/man/train.Rd +++ b/man/train.Rd @@ -16,11 +16,11 @@ train( method_integration = "predictor", aggregate_observations = TRUE, clamp = FALSE, - verbose = FALSE, + verbose = getOption("ibis.setupmessages"), ... ) -\S4method{train}{BiodiversityDistribution, character, character, logical, logical, logical, character, logical, logical, logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose) +\S4method{train}{BiodiversityDistribution,character,character,logical,logical,logical,character,logical,logical,logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose,...) } \arguments{ \item{x}{\code{\link[=distribution]{distribution()}} (i.e. \code{\linkS4class{BiodiversityDistribution}}) object).} @@ -34,10 +34,10 @@ Available options are: \item \code{"none"} No prior variable removal is performed (Default). \item \code{"pearson"}, \code{"spearman"} or \code{"kendall"} Makes use of pairwise comparisons to identify and remove highly collinear predictors (Pearson's \code{r >= 0.7}). -\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{\link{abess}} package (see References). +\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{"abess"} package (see References). Note that this effectively fits a separate generalized linear model to reduce the number of covariates. -\item \code{"boruta"} Uses the \code{\link{Boruta}} package to identify non-informative features. +\item \code{"boruta"} Uses the \code{"Boruta"} package to identify non-informative features. }} \item{optim_hyperparam}{Parameter to tune the model by iterating over input parameters or selection @@ -65,7 +65,7 @@ Available options are: added to the predictor stack and thus are predictors for subsequent models (Default). \item \code{"offset"} The predicted output of the first (or previously fitted) models are added as spatial offsets to subsequent models. Offsets are back-transformed depending -on the model family.
diff --git a/man/train.Rd b/man/train.Rd
index a2125886..90bb047c 100644
--- a/man/train.Rd
+++ b/man/train.Rd
@@ -16,11 +16,11 @@ train(
   method_integration = "predictor",
   aggregate_observations = TRUE,
   clamp = FALSE,
-  verbose = FALSE,
+  verbose = getOption("ibis.setupmessages"),
   ...
 )
-\S4method{train}{BiodiversityDistribution, character, character, logical, logical, logical, character, logical, logical, logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose)
+\S4method{train}{BiodiversityDistribution,character,character,logical,logical,logical,character,logical,logical,logical}(x,runname,filter_predictors,optim_hyperparam,inference_only,only_linear,method_integration,aggregate_observations,clamp,verbose,...)
 }
 \arguments{
 \item{x}{\code{\link[=distribution]{distribution()}} (i.e. a \code{\linkS4class{BiodiversityDistribution}} object).}
@@ -34,10 +34,10 @@ Available options are:
 \item \code{"none"} No prior variable removal is performed (Default).
 \item \code{"pearson"}, \code{"spearman"} or \code{"kendall"} Makes use of pairwise comparisons to identify and remove highly collinear predictors (Pearson's \code{r >= 0.7}).
-\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{\link{abess}} package (see References).
 Note that this effectively fits a separate generalized linear model to reduce the number of covariates.
+\item \code{"abess"} A-priori adaptive best subset selection of covariates via the \code{"abess"} package (see References).
 Note that this effectively fits a separate generalized linear model to reduce the number of covariates.
-\item \code{"boruta"} Uses the \code{\link{Boruta}} package to identify non-informative features.
+\item \code{"boruta"} Uses the \code{"Boruta"} package to identify non-informative features.
 }}
 \item{optim_hyperparam}{Parameter to tune the model by iterating over input parameters or selection
@@ -65,7 +65,7 @@ Available options are:
 added to the predictor stack and thus are predictors for subsequent models (Default).
 \item \code{"offset"} The predicted output of the first (or previously fitted) models are added as spatial offsets to subsequent models. Offsets are back-transformed depending
-on the model family. This option might not be supported for every \code{\link{engine}}.
+on the model family. This option might not be supported for every \code{\link{Engine}}.
 \item \code{"interaction"} Instead of fitting several separate models, the observations from each dataset are combined and incorporated in the prediction as a factor interaction with the "weaker" data source being partialed out during prediction. Here the first dataset added determines the reference level
diff --git a/man/validate.Rd b/man/validate.Rd
index 6b9a539e..204c1910 100644
--- a/man/validate.Rd
+++ b/man/validate.Rd
@@ -4,9 +4,9 @@
 \alias{validate}
 \title{Validation of a fitted distribution object}
 \usage{
-\S4method{validate}{ANY, character, sf, character, character}(mod, method, point, layer, point_column)
+\S4method{validate}{ANY,character,sf,character,character}(mod,method,point,layer,point_column,...)
-\S4method{validate}{SpatRaster, character, sf, character}(mod, method, point, point_column)
+\S4method{validate}{SpatRaster,character,sf,character}(mod,method,point,point_column,...)
 }
 \arguments{
 \item{mod}{A fitted \code{\link{BiodiversityDistribution}} object with set predictors. Alternatively one can also
@@ -58,11 +58,12 @@ The output metrics for each type are defined as follows:
 \item \code{'sensitivity'} = Sensitivity, TBD
 \item \code{'specificity'} = Specificity, TBD
 \item \code{'tss'} = True Skill Statistic, TBD
-\item \code{'f1'} = F1 Score or Positive predictive value, TBD
+\item \code{'f1'} = F1 Score (harmonic mean of precision and sensitivity), \deqn{ \frac{2TP}{2TP + FP + FN} }
 \item \code{'logloss'} = Log loss, TBD
-\item \code{'expected.accuracy'} = Expected Accuracy, TBD
-\item \code{'kappa'} = Kappa value, TBD
-\item \code{'brier.score'} = Brier score, TBD
+\item \code{'expected.accuracy'} = Expected Accuracy, \deqn{ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} }
+\item \code{'kappa'} = Kappa value, \deqn{ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} }
+\item \code{'brier.score'} = Brier score, \deqn{ \frac{ \sum_{i=1}^{N} (y_{i} - x_{i})^{2} }{N} }, where \eqn{y_{i}} is the predicted and \eqn{x_{i}} the observed value.
+Here TP is a true positive, TN a true negative, FP a false positive and FN a false negative.
 }
 }
 \note{
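The formulas above can be checked with a small worked example; the confusion-matrix counts and predicted probabilities below are purely illustrative, not package output.

```r
# Illustrative confusion-matrix counts
TP <- 40; FP <- 10; FN <- 5; TN <- 45
N <- TP + FP + FN + TN

f1 <- (2 * TP) / (2 * TP + FP + FN)                # 0.842
expected_accuracy <- (TP + FP) / N * (TP + FN) / N +
  (TN + FN) / N * (TN + FP) / N                    # 0.5
kappa <- (2 * (TP * TN - FN * FP)) /
  ((TP + FP) * (FP + TN) + (TP + FN) * (FN + TN))  # 0.7

# Brier score from predicted probabilities y and observed 0/1 values x
y <- c(0.9, 0.2, 0.7, 0.4)
x <- c(1, 0, 1, 1)
brier <- sum((y - x)^2) / length(x)                # 0.125
```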
diff --git a/man/wrap_stanmodel.Rd b/man/wrap_stanmodel.Rd
index 097851c8..91f4640e 100644
--- a/man/wrap_stanmodel.Rd
+++ b/man/wrap_stanmodel.Rd
@@ -13,7 +13,7 @@ wrap_stanmodel(sm_code)
 }
 A \link{character} object.
 }
 \description{
-\link{engine_stan} builds a list with stan model code. This function
+engine_stan builds a list with stan model code. This function
 concatenates them together.
 }
 \keyword{stan}
diff --git a/man/write_model.Rd b/man/write_model.Rd
index c981b10b..2bc0bf92 100644
--- a/man/write_model.Rd
+++ b/man/write_model.Rd
@@ -4,7 +4,7 @@
 \alias{write_model}
 \title{Save a model for later use}
 \usage{
-\S4method{write_model}{ANY, character, logical, logical}(mod, fname, slim, verbose)
+\S4method{write_model}{ANY,character,logical,logical}(mod,fname,slim,verbose)
 }
 \arguments{
 \item{mod}{Provided \code{\link{DistributionModel}} object.}
diff --git a/man/write_output.Rd b/man/write_output.Rd
index 5086a6c3..f777ce3a 100644
--- a/man/write_output.Rd
+++ b/man/write_output.Rd
@@ -4,15 +4,15 @@
 \alias{write_output}
 \title{Generic function to write spatial outputs}
 \usage{
-\S4method{write_output}{ANY, character, character, logical}(mod, fname, dt, verbose)
+\S4method{write_output}{ANY,character,character,logical}(mod,fname,dt,verbose)
-\S4method{write_output}{BiodiversityScenario, character, character, logical}(mod, fname, dt, verbose)
+\S4method{write_output}{BiodiversityScenario,character,character,logical}(mod,fname,dt,verbose)
-\S4method{write_output}{SpatRaster, character, character, logical}(mod, fname, dt, verbose)
+\S4method{write_output}{SpatRaster,character,character,logical}(mod,fname,dt,verbose)
-\S4method{write_output}{data.frame, character, character, logical}(mod, fname, dt, verbose)
+\S4method{write_output}{data.frame,character,character,logical}(mod,fname,dt,verbose)
-\S4method{write_output}{stars, character, character, logical}(mod, fname, dt, verbose)
+\S4method{write_output}{stars,character,character,logical}(mod,fname,dt,verbose)
 }
 \arguments{
 \item{mod}{Provided \code{\link{DistributionModel}}, \code{\link{BiodiversityScenario}}, \code{\link{terra}} or \code{\link{stars}} object.}
diff --git a/man/write_stanmodel.Rd b/man/write_stanmodel.Rd
index 2f671e7a..4290b986 100644
--- a/man/write_stanmodel.Rd
+++ b/man/write_stanmodel.Rd
@@ -7,12 +7,12 @@
 write_stanmodel(mod, dir = tempdir())
 }
 \arguments{
-\item{mod}{A supplied \link{cmdstanr} model}
+\item{mod}{A supplied cmdstanr model}
 \item{dir}{The model directory where the model should be written. Should be a character path to an existing directory.}
 }
 \description{
-Write a \link{cmdstanr} model output to a specific destination
+Write a cmdstanr model output to a specific destination
 }
 \keyword{stan}
 \keyword{utils}
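A hedged sketch of the writer generics documented above. File names are placeholders, `mod` is assumed to be a trained `DistributionModel`, and the `.rds`/`.tif` extensions are illustrative choices rather than confirmed requirements.

```r
# Persist a fitted model for later reuse
write_model(mod, fname = "my_model.rds")

# Write the spatial prediction; write_output() dispatches on
# DistributionModel, BiodiversityScenario, SpatRaster, data.frame and stars
write_output(mod, fname = "prediction.tif")

# Write a non-spatial summary of the fitted model
write_summary(mod, fname = "summary.rds")
```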
diff --git a/man/write_summary.Rd b/man/write_summary.Rd
index b7625261..353d7960 100644
--- a/man/write_summary.Rd
+++ b/man/write_summary.Rd
@@ -4,7 +4,7 @@
 \alias{write_summary}
 \title{Generic function to write summary outputs from created models.}
 \usage{
-\S4method{write_summary}{ANY, character, logical, logical}(mod, fname, partial, verbose)
+\S4method{write_summary}{ANY,character,logical,logical}(mod,fname,partial,verbose,...)
 }
 \arguments{
 \item{mod}{Provided \code{\link{DistributionModel}} or \code{\link{BiodiversityScenario}} object.}
diff --git a/tests/testthat/test_BiodiversityDistribution.R b/tests/testthat/test_BiodiversityDistribution.R
index acc5ad82..20cd05f5 100644
--- a/tests/testthat/test_BiodiversityDistribution.R
+++ b/tests/testthat/test_BiodiversityDistribution.R
@@ -113,7 +113,13 @@ test_that('Setting up a distribution model',{
   # Also add a zonal layer for limits
   zones <- terra::as.factor( predictors$koeppen_50km )
   x <- distribution(background, limits = zones)
-  expect_s3_class(x$get_limits(), "sf")
+  expect_type(x$get_limits(), "list")
+  expect_s3_class(x$get_limits()$layer, "sf")
+
+  # Alternatively with MCP
+  x <- distribution(background, limits_method = "mcp", mcp_buffer = 1000)
+  expect_type(x$get_limits(), "list")
+  expect_equal(x$get_limits()$limits_method, "mcp")
   y <- x |> engine_bart()
   expect_equal( y$get_engine(), "")
diff --git a/tests/testthat/test_Scenarios.R b/tests/testthat/test_Scenarios.R
index e624c488..55b607d4 100644
--- a/tests/testthat/test_Scenarios.R
+++ b/tests/testthat/test_Scenarios.R
@@ -3,6 +3,7 @@ test_that('Scenarios and constraints', {
   skip_if_not_installed('glmnet')
   skip_if_not_installed('geosphere')
+  skip_if_not_installed('cubelyr')
   skip_on_travis()
   skip_on_cran()
@@ -10,6 +11,7 @@
   suppressWarnings( requireNamespace('igraph', quietly = TRUE) )
   suppressWarnings( requireNamespace('stars', quietly = TRUE) )
   suppressWarnings( requireNamespace('geosphere', quietly = TRUE) )
+  suppressWarnings( requireNamespace('cubelyr', quietly = TRUE) )
   options("ibis.setupmessages" = FALSE) # Be less chatty
   options("ibis.seed" = 1234)
@@ -44,6 +46,7 @@
   test <- raster_to_stars(pred_current)
   expect_length(test, 9)
   expect_equal(names(test), names(pred_current))
+  expect_length(stars::st_get_dimension_values(test,"time"), 1)
   # Basic validity checks
   expect_length(pred_future, 9)
diff --git a/tests/testthat/test_functions.R b/tests/testthat/test_functions.R
index 28d24609..54d6f0db 100644
--- a/tests/testthat/test_functions.R
+++ b/tests/testthat/test_functions.R
@@ -105,10 +105,21 @@ test_that('Custom functions - Test gridded transformations and ensembles', {
   ex <- ensemble(r1, r2, r3, layer = "lyr.1")
   expect_equal(terra::nlyr(ex), 2)
   expect_lte( terra::global(ex, "max", na.rm = TRUE)[1,1], max( terra::global( c(r1, r2, r3), "max", na.rm = TRUE) ))
+  expect_named(object = ex, expected = c("ensemble_lyr.1", "cv_lyr.1"))
   ex <- ensemble(r1, r2, r3, layer = "lyr.1", normalize = TRUE)
   expect_lte( terra::global(ex, "max")[1,1], 1)
+  ex_sd <- ensemble(r1, r2, r3, layer = "lyr.1", uncertainty = "sd")
+  ex_range <- ensemble(r1, r2, r3, layer = "lyr.1", uncertainty = "range")
+
+  expect_named(object = ex_sd, expected = c("ensemble_lyr.1", "sd_lyr.1"))
+  expect_named(object = ex_range, expected = c("ensemble_lyr.1", "range_lyr.1"))
+
+  expect_error(ensemble(r1, r2, r3, layer = "lyr.1", uncertainty = "pca"),
+               regexp = "Currently, uncertainty = 'pca' is not implemented for SpatRaster input.")
+
+
 })
 # ---- #
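The expectations above outline the new `uncertainty` options of `ensemble()`. Sketched here with three single-layer `SpatRaster` objects named as in the tests (`r1`, `r2`, `r3` are assumed to exist):

```r
# Coefficient of variation is the default uncertainty layer
ex <- ensemble(r1, r2, r3, layer = "lyr.1")
names(ex)  # "ensemble_lyr.1" "cv_lyr.1"

# Standard deviation or range across the ensemble members instead
ex_sd    <- ensemble(r1, r2, r3, layer = "lyr.1", uncertainty = "sd")
ex_range <- ensemble(r1, r2, r3, layer = "lyr.1", uncertainty = "range")

# uncertainty = "pca" raises an error for SpatRaster input (see test above)
```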
diff --git a/tests/testthat/test_trainINLA.R b/tests/testthat/test_trainINLA.R
index 582f653e..50932989 100644
--- a/tests/testthat/test_trainINLA.R
+++ b/tests/testthat/test_trainINLA.R
@@ -35,6 +35,8 @@ test_that('Train a distribution model with INLA', {
   skip_on_travis()
   skip_on_cran()
+  options(ibis.setupmessages = FALSE)
+
   # Load data
   # Background Raster
   background <- terra::rast(system.file('extdata/europegrid_50km.tif', package='ibis.iSDM',mustWork = TRUE))
@@ -57,8 +59,10 @@ test_that('Train a distribution model with INLA', {
     proj_stepsize = 1
   )
   # Train the model
-  suppressWarnings(
-    mod <- train(x, "test", inference_only = TRUE, only_linear = TRUE, varsel = "none", verbose = FALSE)
+  suppressMessages(
+    suppressWarnings(
+      mod <- train(x, "test", inference_only = TRUE, only_linear = TRUE, varsel = "none", verbose = FALSE)
+    )
   )
   # Expect summary
diff --git a/tests/testthat/test_trainOtherEngines.R b/tests/testthat/test_trainOtherEngines.R
index 3f0211cc..e9cd0545 100644
--- a/tests/testthat/test_trainOtherEngines.R
+++ b/tests/testthat/test_trainOtherEngines.R
@@ -2,10 +2,12 @@
 test_that('Train a distribution model with XGboost', {
   skip_if_not_installed('xgboost')
+  skip_if_not_installed('pdp')
   skip_on_travis()
   skip_on_cran()
   suppressWarnings( requireNamespace('xgboost', quietly = TRUE) )
+  suppressWarnings( requireNamespace('pdp', quietly = TRUE) )
   # Load data
   # Background Raster
@@ -47,6 +49,15 @@
   ex <- ensemble(mod, mod)
   expect_s4_class(ex, "SpatRaster")
+
+  ex_sd <- ensemble(mod, mod, uncertainty = "sd")
+  ex_range <- ensemble(mod, mod, uncertainty = "range")
+  ex_pca <- ensemble(mod, mod, uncertainty = "pca")
+
+  expect_named(object = ex_sd, expected = c("ensemble_mean", "sd_mean"))
+  expect_named(object = ex_range, expected = c("ensemble_mean", "range_mean"))
+  expect_named(object = ex_pca, expected = c("ensemble_mean", "pca_mean"))
+
 })
 # ---- #
@@ -99,6 +110,9 @@ test_that('Train a distribution model with Breg', {
   ex <- ensemble(mod, mod)
   expect_s4_class(ex, "SpatRaster")
+
+  # Does limiting raster work?
+  suppressMessages( expect_s4_class(limiting(mod, plot = FALSE), "SpatRaster") )
+
 })
 # ---- #
@@ -151,6 +165,9 @@ test_that('Train a distribution model with GDB', {
   ex <- ensemble(mod, mod)
   expect_s4_class(ex, "SpatRaster")
+
+  # Does limiting raster work?
+  suppressMessages( expect_s4_class(limiting(mod, plot = FALSE), "SpatRaster") )
+
 })
 # ---- #
@@ -158,10 +175,12 @@
 test_that('Train a distribution model with glmnet', {
   skip_if_not_installed('glmnet')
+  skip_if_not_installed('pdp')
   skip_on_travis()
   skip_on_cran()
   suppressWarnings( requireNamespace('glmnet', quietly = TRUE) )
+  suppressWarnings( requireNamespace('pdp', quietly = TRUE) )
   # Load data
   # Background Raster
@@ -203,6 +222,12 @@
   ex <- ensemble(mod, mod)
   expect_s4_class(ex, "SpatRaster")
+
+  # Added here to the tests as it is quick to run
+  expect_no_error( partial_density(mod = mod, x.var = "elevation_mean_50km", df = FALSE))
+  expect_s3_class( partial_density(mod = mod, x.var = "elevation_mean_50km", df = TRUE), "data.frame")
+
+  # Does limiting raster work?
+  suppressMessages( expect_s4_class(limiting(mod, plot = FALSE), "SpatRaster") )
 })
 # ---- #
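The two newly exported helpers exercised in these tests can be sketched as follows. `mod` is assumed to be a trained `DistributionModel`, and `"elevation_mean_50km"` is simply the covariate name used in the tests:

```r
# Map of the locally most limiting factor; returns a SpatRaster
lim <- limiting(mod, plot = FALSE)

# partial_density() for a single covariate; per the test above,
# df = TRUE returns a data.frame instead of producing a plot
pd <- partial_density(mod, x.var = "elevation_mean_50km", df = TRUE)
head(pd)
```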
diff --git a/vignettes/articles/04_mechanistic_estimation.Rmd b/vignettes/articles/04_mechanistic_estimation.Rmd
new file mode 100644
index 00000000..0ea5b551
--- /dev/null
+++ b/vignettes/articles/04_mechanistic_estimation.Rmd
@@ -0,0 +1,29 @@
+---
+title: "Adding mechanistic distribution modelling through simulations"
+author: "Martin Jung"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Adding mechanistic distribution modelling through simulations}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+editor_options:
+  chunk_output_type: console
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+# Define variables for vignette figures and code execution
+h <- 5.5
+w <- 5.5
+is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
+            "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
+knitr::opts_chunk$set(fig.align = "center", eval = !is_check)
+```
+
+# Mechanistic species distribution
+
+Arriving in an upcoming release.
diff --git a/vignettes/articles/04_engine_comparison.Rmd b/vignettes/articles/05_engine_comparison.Rmd
similarity index 100%
rename from vignettes/articles/04_engine_comparison.Rmd
rename to vignettes/articles/05_engine_comparison.Rmd
diff --git a/vignettes/articles/05_package_comparison.Rmd b/vignettes/articles/06_package_comparison.Rmd
similarity index 100%
rename from vignettes/articles/05_package_comparison.Rmd
rename to vignettes/articles/06_package_comparison.Rmd
diff --git a/vignettes/articles/06_frequently-asked-questions.Rmd b/vignettes/articles/07_frequently-asked-questions.Rmd
similarity index 96%
rename from vignettes/articles/06_frequently-asked-questions.Rmd
rename to vignettes/articles/07_frequently-asked-questions.Rmd
index 5525ad7d..4e1b8000 100644
--- a/vignettes/articles/06_frequently-asked-questions.Rmd
+++ b/vignettes/articles/07_frequently-asked-questions.Rmd
@@ -13,7 +13,7 @@ documentclass: article
 This document contains a series of frequently asked questions when using the `ibis.iSDM` package and is a work in progress.
-## Data preparation
+## Data and model preparation
 What are the necessary data preparation steps for `ibis.iSDM`?
@@ -58,6 +58,15 @@ Also see the `add_pseudoabsence()` and `pseudoabs_settings()` help pages for mor
+
+
+The package seems to have many suggested dependencies. Is there a way to install them all?
+
+
+Yes. The ibis.iSDM package makes use of functionality from a range of existing packages; where these functions are required for a specific purpose, the packages in question need to be installed.
+
+A convenient way to install all of them at once is the function `ibis_dependencies()`, which installs every package listed in `getOption("ibis.dependencies")`.
+
+
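+For example (a minimal sketch; note that this installs a potentially large set of packages):
+
+```r
+# Inspect which packages are considered dependencies
+getOption("ibis.dependencies")
+
+# Install them all in one go
+ibis_dependencies()
+```
+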
 ## Model setup
@@ -85,6 +94,16 @@
 mod <- distribution(background, limits = zone) |>
   engine_gdb() |>
   train()
 plot(mod)
+
+# Alternatively one can also create such limits based on a minimum convex polygon
+# of the provided data. Here we create a non-buffered MCP across all points used
+# in species_data to constrain the prediction.
+mod <- distribution(background, limits_method = "mcp", mcp_buffer = 0) |>
+  add_biodiversity_poipo(species_data) |>
+  engine_gdb() |>
+  train()
+plot(mod)
+
 ```

 There is also a dedicated function ([`create_zonaloccurrence_mask()`]) to help set up such zones, either by taking existing categorical raster datasets or by constructing buffers around existing presence records (which could for example reflect assumed maximum dispersal distances).