Either Approximate Bayesian Computation (ABC) or Markov Chain Monte Carlo (MCMC) approximation is used to estimate the relevant model parameters. Model accuracy is gauged with a custom quantity and allocation disagreement function, which also assesses the accuracy of the spatial configuration. We test the number of predictions, the number of predicted locations, and the cumulative distance to the nearest infection. The calibration uses these metrics to decide whether a run is kept: a run is kept if it is under a threshold, either because it improves the results or because it is randomly kept despite being worse. We recommend running the calibration for at least 10,000 iterations, and more iterations will provide a better result. If the model converges and doesn't improve for a while, it will exit calibration before reaching the total number of iterations specified.
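The keep-if-under-a-threshold step described above can be illustrated with a toy rejection sampler in base R. This is only a sketch of the general ABC idea, not the calibration code itself: simulate_count(), observed, threshold, and the flat prior are all hypothetical stand-ins chosen for the illustration.

```r
# Toy ABC rejection sampler (illustrative only; not the calibrate() internals).
set.seed(42)

observed <- 25                        # hypothetical observed number of infected cells

# Hypothetical stand-in for one model run with a given reproductive rate.
simulate_count <- function(rate) {
  rpois(1, lambda = 10 * rate)
}

threshold <- 3                        # largest accepted |simulated - observed|
generation_size <- 100                # accepted parameter sets to collect
accepted <- numeric(0)

while (length(accepted) < generation_size) {
  candidate <- runif(1, min = 0, max = 5)            # draw from a flat prior
  disagreement <- abs(simulate_count(candidate) - observed)
  if (disagreement <= threshold) {
    accepted <- c(accepted, candidate)               # keep runs under the threshold
  }
}

posterior_mean <- mean(accepted)
posterior_sd <- sd(accepted)
```

In a multi-generation scheme the threshold would typically be tightened from one generation to the next, which is why more generations narrow the accepted parameter sets at the cost of run time.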

calibrate(
  infected_years_file,
  number_of_observations = 1,
  prior_number_of_observations = 0,
  prior_means = c(0, 0, 0, 0, 0, 0),
  prior_cov_matrix = matrix(0, 6, 6),
  params_to_estimate = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  number_of_generations = 7,
  generation_size = 1000,
  pest_host_table,
  competency_table,
  infected_file_list,
  host_file_list,
  total_populations_file,
  temp = FALSE,
  temperature_coefficient_file = "",
  precip = FALSE,
  precipitation_coefficient_file = "",
  model_type = "SI",
  latency_period = 0,
  time_step = "month",
  season_month_start = 1,
  season_month_end = 12,
  start_date = "2008-01-01",
  end_date = "2008-12-31",
  use_survival_rates = FALSE,
  survival_rate_month = 3,
  survival_rate_day = 15,
  survival_rates_file = "",
  use_lethal_temperature = FALSE,
  temperature_file = "",
  lethal_temperature = -12.87,
  lethal_temperature_month = 1,
  mortality_frequency = "year",
  mortality_frequency_n = 1,
  management = FALSE,
  treatment_dates = c(""),
  treatments_file = "",
  treatment_method = "ratio",
  natural_kernel_type = "cauchy",
  anthropogenic_kernel_type = "cauchy",
  natural_dir = "NONE",
  natural_kappa = 0,
  anthropogenic_dir = "NONE",
  anthropogenic_kappa = 0,
  pesticide_duration = c(0),
  pesticide_efficacy = 1,
  mask = NULL,
  output_frequency = "year",
  output_frequency_n = 1,
  movements_file = "",
  use_movements = FALSE,
  start_exposed = FALSE,
  generate_stochasticity = TRUE,
  establishment_stochasticity = TRUE,
  movement_stochasticity = TRUE,
  dispersal_stochasticity = TRUE,
  establishment_probability = 0.5,
  dispersal_percentage = 0.99,
  quarantine_areas_file = "",
  use_quarantine = FALSE,
  use_spreadrates = FALSE,
  use_overpopulation_movements = FALSE,
  overpopulation_percentage = 0,
  leaving_percentage = 0,
  leaving_scale_coefficient = 1,
  calibration_method = "ABC",
  number_of_iterations = 1e+05,
  exposed_file_list = "",
  verbose = TRUE,
  write_outputs = "None",
  output_folder_path = "",
  network_filename = "",
  network_movement = "walk",
  success_metric = "mcc",
  use_initial_condition_uncertainty = FALSE,
  use_host_uncertainty = FALSE,
  weather_type = "deterministic",
  temperature_coefficient_sd_file = "",
  precipitation_coefficient_sd_file = "",
  dispersers_to_soils_percentage = 0,
  quarantine_directions = "",
  multiple_random_seeds = FALSE,
  file_random_seeds = NULL,
  use_soils = FALSE,
  soil_starting_pest_file = "",
  start_with_soil_populations = FALSE,
  county_level_infection_data = FALSE
)

Arguments

infected_years_file

Raster file with years of initial infection/infestation as individual locations of a pest or pathogen. This is a multiband raster file (e.g. .tif) with each band representing a unique time step (e.g. band 1 = year 1 ... band 6 = year 6, or band 1 = week 1 ... band 6 = week 6). This needs to align with both the time step selection and the start and end dates. Units for infections depend on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.). This file doesn't include the start year, which is passed in as the initial_infected_file (e.g. if we had observation data from 2017, 2018, and 2019, the 2017 raster file would be the initial_infected_file and a dual-band raster file would have band 1 = 2018 and band 2 = 2019 observations).

number_of_observations

The number of observations used for this calibration. Useful when building on a previous calibration: it is used to weight the parameters when they are updated as new data becomes available. For example, if we have 2,000 observations in 2019 and had 1,000 observations in 2018 and 1,000 in 2017, we would use 2,000 here and 2,000 for prior_number_of_observations.

prior_number_of_observations

The total number of observations from previous calibrations, used to weight the posterior distributions (if this is a new calibration, this value takes the form of a prior weight between 0 and 1). It is used to weight the parameters when they are updated as new data becomes available. For example, if we have 2,000 observations in 2019 and had 1,000 observations in 2018 and 1,000 in 2017, we would use 2,000 here (1,000 + 1,000) and 2,000 for number_of_observations.
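The arithmetic behind the example above can be sketched as follows. The proportional weighting shown here is an assumption made for illustration, not a statement of how the calibration combines prior and new estimates internally; prior_mean and new_mean are hypothetical values for a single parameter.

```r
# Observation counts from the example: 2,000 new (2019), 1,000 + 1,000 prior (2017, 2018).
number_of_observations <- 2000
prior_number_of_observations <- 1000 + 1000

total <- number_of_observations + prior_number_of_observations
weight_new <- number_of_observations / total
weight_prior <- prior_number_of_observations / total

# Hypothetical estimates of one parameter from the prior and the new calibration.
prior_mean <- 1.8
new_mean <- 2.2

# Assumed proportional weighting of the two estimates.
updated_mean <- weight_prior * prior_mean + weight_new * new_mean
```

With equal counts both weights are 0.5, so the updated value sits midway between the two estimates.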

prior_means

A vector of the means of the parameters you are estimating, in the order (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural kappa, anthropogenic kappa). This is used when updating a parameter set from a previous calibration using the iterative framework.

prior_cov_matrix

A covariance matrix from the previous year's posterior parameter estimation, ordered as (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural kappa, anthropogenic kappa). This is used when updating a parameter set from a previous calibration using the iterative framework.

params_to_estimate

A vector of booleans specifying which parameters to estimate, ordered as (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural kappa, anthropogenic kappa).
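Because prior_means, prior_cov_matrix, and params_to_estimate all share the same positional ordering, it can help to label them before passing them in. The sketch below uses the default values from the usage above; parameter_names is a helper introduced here for illustration and is not an argument of the function.

```r
# Positional order shared by prior_means, prior_cov_matrix, and params_to_estimate.
parameter_names <- c(
  "reproductive_rate", "natural_dispersal_distance",
  "percent_natural_dispersal", "anthropogenic_dispersal_distance",
  "natural_kappa", "anthropogenic_kappa"
)

params_to_estimate <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
prior_means <- c(0, 0, 0, 0, 0, 0)
prior_cov_matrix <- matrix(0, 6, 6)

# Basic shape checks before handing the values to the calibration.
stopifnot(
  length(params_to_estimate) == length(parameter_names),
  length(prior_means) == length(parameter_names),
  all(dim(prior_cov_matrix) == length(parameter_names))
)

# Label the matrix for easier inspection and list the parameters being estimated.
dimnames(prior_cov_matrix) <- list(parameter_names, parameter_names)
estimated <- parameter_names[params_to_estimate]
```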

number_of_generations

The number of generations used to decrease the uncertainty in the parameter estimation (too many and it will take a long time, too few and your parameter sets will be too wide). The name follows the ABC implementation naming convention. It should be set to at least 7 for robust calibrations. There is a trade-off between computational time and model accuracy as this number gets larger. Usually 7 to 9 is the ideal range.

generation_size

How many accepted parameter sets should occur in each generation. For example, if the generation size is 1,000, the simulation runs until 1,000 model runs fall below the threshold value. We recommend at least 1,000, but the greater this number, the more accurate the selected model parameters will be.

pest_host_table

The file path to a csv that has these columns in this order: host, susceptibility_mean, susceptibility_sd, mortality_rate, mortality_rate_mean, and mortality_time_lag, with each row being a species. Host species must be in the same order in the host_file_list, infected_file_list, pest_host_table rows, and competency_table columns. The host column is a character string with the species name and is only used for metadata and labeling output files. Susceptibility and mortality_rate values must be between 0 and 1.
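A minimal sketch of building such a csv in base R is shown below. The column names and their order are taken from the description above; the two host names and every numeric value are hypothetical, and the file is written to a temporary directory purely for illustration.

```r
# Hypothetical two-host table; rows must follow the same host order as
# host_file_list and infected_file_list.
pest_host <- data.frame(
  host = c("oak", "tanoak"),
  susceptibility_mean = c(1, 0.6),
  susceptibility_sd = c(0, 0.1),
  mortality_rate = c(0, 0.2),
  mortality_rate_mean = c(0, 0.2),
  mortality_time_lag = c(0, 2),
  stringsAsFactors = FALSE
)

# Susceptibility and mortality_rate values must lie between 0 and 1.
rate_columns <- c("susceptibility_mean", "mortality_rate")
stopifnot(all(pest_host[rate_columns] >= 0 & pest_host[rate_columns] <= 1))

# Write the table; this path would then be passed as pest_host_table.
pest_host_table <- file.path(tempdir(), "pest_host_table.csv")
write.csv(pest_host, pest_host_table, row.names = FALSE)
```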

competency_table

A csv with the hosts as the first n columns (n being the number of hosts) and the last column being the competency value. Each row is a set of Boolean values for host presence and the competency value (between 0 and 1) for that combination of hosts in a cell.
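For two hosts, the layout described above could be generated as in the sketch below. The host names, the competency values, the name of the last column, and the use of 0/1 to encode presence are all assumptions made for the illustration.

```r
# One row per presence/absence combination of the (hypothetical) hosts.
combinations <- expand.grid(oak = c(0, 1), tanoak = c(0, 1))

# Hypothetical competency value (0 to 1) for each combination, in row order:
# neither host, oak only, tanoak only, both hosts.
competency <- cbind(combinations, competency = c(0, 0.3, 0.8, 0.5))

stopifnot(all(competency$competency >= 0 & competency$competency <= 1))

# Write the table; this path would then be passed as competency_table.
competency_table <- file.path(tempdir(), "competency_table.csv")
write.csv(competency, competency_table, row.names = FALSE)
```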

infected_file_list

Paths to raster files with initial infections and standard deviation for each host. Each file can be passed in one of 2 formats: a single file with the number of hosts, or a single file with 2 layers (number of hosts and standard deviation). Units for infections depend on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.).

host_file_list

Paths to raster files with the number of hosts and the standard deviation of those estimates. Each file can be passed in one of 2 formats: a single file with the number of hosts, or a single file with 2 layers (number of hosts and standard deviation). The units can take many forms; the two most common that we use are percent area (0 to 100) or # of hosts in the cell. The choice usually depends on the data available and the estimation methods.

total_populations_file

Path to raster file with the total populations of all hosts and non-hosts. This depends on how your host data is set up. If the host file is percent area, this should be a raster with values of 100 anywhere there is host. If the host file is # of hosts in a cell, this should be a raster whose value is the max of the host raster anywhere the # of hosts is greater than 0.
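The rule for count-based host data can be sketched with a plain matrix standing in for the raster. The host values below are hypothetical, and a real workflow would apply the same logic to raster cells rather than to a matrix.

```r
# Hypothetical host counts per cell (a matrix standing in for the host raster).
host <- matrix(c(0, 12, 40, 0, 25, 7), nrow = 2)

# Total populations: the maximum host count wherever hosts are present,
# left at 0 elsewhere.
total_populations <- host
total_populations[host > 0] <- max(host)
```

For percent-area host data, the same pattern would assign 100 instead of max(host).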

temp

Boolean that allows the use of temperature coefficients to modify spread (TRUE or FALSE).

temperature_coefficient_file

Path to raster file with temperature coefficient data for the time step and time period specified (e.g. if time_step = "week", start_date = "2017-01-01", and end_date = "2019-12-31", this file would have 52 * 3 = 156 bands, with the data being weekly temperature coefficients). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.

precip

Boolean that allows the use of precipitation coefficients to modify spread (TRUE or FALSE).

precipitation_coefficient_file

Raster file with precipitation coefficient data for the time step and time period specified (e.g. if time_step = "week", start_date = "2017-01-01", and end_date = "2019-12-31", this file would have 52 * 3 = 156 bands, with the data being weekly precipitation coefficients). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
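The band count in the example above can be reproduced with a few lines of base R, which is a convenient sanity check before running a calibration. The steps_per_year lookup is a helper introduced for this sketch; it follows the 52-weeks-per-year convention used in the example.

```r
# Dates from the example above.
start_date <- as.Date("2017-01-01")
end_date <- as.Date("2019-12-31")

# Number of calendar years covered, inclusive of both ends.
n_years <- as.integer(format(end_date, "%Y")) -
  as.integer(format(start_date, "%Y")) + 1

# Assumed number of steps per year for each time step (illustrative helper).
steps_per_year <- c(week = 52, month = 12)

# Expected number of bands in a weekly coefficient raster.
expected_bands <- steps_per_year[["week"]] * n_years
```

The result can then be compared with the number of layers in the coefficient file, for example with terra::nlyr() if the terra package is used to read the raster.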

model_type

What type of model most represents your system. Options are "SEI" (Susceptible - Exposed - Infected/Infested) or "SI" (Susceptible - Infected/Infested). Default value is "SI".

latency_period

How many time steps it takes for exposed populations to become infected/infested. This is an integer value and must be greater than 0 if model_type is "SEI".

time_step

How often spread should occur. Options: 'day', 'week', 'month'.

season_month_start

When does spread first start occurring in the year for your pest or pathogen (integer value between 1 and 12)

season_month_end

When does spread end during the year for your pest or pathogen (integer value between 1 and 12)

start_date

Date to start the simulation, with format 'YYYY-MM-DD' (e.g. "2008-01-01").

end_date

Date to end the simulation, with format 'YYYY-MM-DD' (e.g. "2008-12-31").
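Date strings are easy to mistype, so a quick check before calibrating can save a failed run. The sketch below validates the default start_date and end_date from the usage above; parse_sim_date() is a hypothetical helper written for this example, not part of the package.

```r
# Hypothetical helper: parse a date string and fail loudly if it is malformed.
parse_sim_date <- function(x) {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("not a valid date: ", x)
  }
  parsed
}

start_date <- parse_sim_date("2008-01-01")
end_date <- parse_sim_date("2008-12-31")
stopifnot(end_date > start_date)

# Number of monthly steps between the two dates (time_step = "month").
n_months <- length(seq(start_date, end_date, by = "month"))
```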

use_survival_rates

Boolean to indicate if the model will use survival rates to limit the survival or emergence of overwintering generations.

survival_rate_month

The month in which overwintering generations emerge. We suggest using the month before emergence for this parameter, as that is when the survival rates raster will be applied.

survival_rate_day

The day on which the survival rates should be applied.

survival_rates_file

Raster file with survival rates from 0 to 1 representing the percentage of emergence for a cell.

use_lethal_temperature

A boolean to answer the question: does your pest or pathogen have a temperature at which it cannot survive? (TRUE or FALSE)

temperature_file

Path to raster file with temperature data for minimum temperature

lethal_temperature

The temperature in degrees C at which lethal temperature related mortality occurs for your pest or pathogen (-50 to 60)

lethal_temperature_month

The month in which lethal temperature related mortality occurs for your pest or pathogen (integer value between 1 and 12).

mortality_frequency

Sets how frequently mortality calculations occur: one of 'year', 'month', 'week', 'day', 'time step', or 'every_n_steps'.

mortality_frequency_n

Sets number of units from mortality_frequency in which to run the mortality calculation if mortality_frequency is 'every_n_steps'. Must be an integer >= 1.

management

Boolean to allow use of management (TRUE or FALSE)

treatment_dates

List of dates on which to apply treatment, with format 'YYYY-MM-DD' (needs to be the same length as treatments_file and pesticide_duration).

treatments_file

Path to raster files with treatment data by dates. Needs to be a list of files the same length as treatment_dates and pesticide_duration.

treatment_method

What method to use when applying treatment, one of "ratio" or "all infected". "ratio" removes a portion of all infected and susceptibles; "all infected" removes all infected and a portion of susceptibles.

natural_kernel_type

What type of dispersal kernel should be used for natural dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic').

anthropogenic_kernel_type

What type of dispersal kernel should be used for anthropogenic dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic', 'network').

natural_dir

Sets the predominant direction of natural dispersal, usually due to wind. Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE').

natural_kappa

Sets the strength of the natural direction in the von Mises distribution (numeric value between 0.01 and 12).

anthropogenic_dir

Sets the predominant direction of anthropogenic dispersal, usually due to human movement, typically over long distances (e.g. nursery trade, movement of firewood, etc.). Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE').

anthropogenic_kappa

Sets the strength of the anthropogenic direction in the von Mises distribution (numeric value between 0.01 and 12).

pesticide_duration

How long the pesticide (herbicide, vaccine, etc.) lasts before the host is susceptible again. If the value is 0, the treatment is a culling (i.e. host removal), not a pesticide treatment. Needs to be the same length as treatment_dates and treatments_file.
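Since treatment_dates, treatments_file, and pesticide_duration must all have the same length, a short consistency check can catch mismatches early. The dates, file names, and durations below are hypothetical values used only to show the check.

```r
# Hypothetical treatment inputs: one entry per treatment event.
treatment_dates <- c("2008-03-01", "2008-06-15")
treatments_file <- c("treatments_march.tif", "treatments_june.tif")
pesticide_duration <- c(0, 120)

# All three inputs must describe the same number of treatment events.
input_lengths <- c(
  length(treatment_dates),
  length(treatments_file),
  length(pesticide_duration)
)
lengths_match <- length(unique(input_lengths)) == 1
stopifnot(lengths_match)

# A duration of 0 means culling (host removal) rather than a pesticide.
treatment_kind <- ifelse(pesticide_duration == 0, "culling", "pesticide")
```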

pesticide_efficacy

How effective is the pesticide at preventing the disease or killing the pest (if this is 0.70 then when applied it successfully treats 70 percent of the plants or animals).

mask

Raster file used to provide a mask that removes 0's that are not true negatives from comparisons (e.g. mask out lakes and oceans from statistics if modeling terrestrial species). A numerical value marks the area you want to calculate statistics on, and an NA value marks the area to remove from the statistics.
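The effect of the mask on a comparison can be sketched with small matrices standing in for rasters. All values below are hypothetical, and simple cell-by-cell agreement is used here only as an example statistic.

```r
# Hypothetical presence/absence grids (matrices standing in for rasters).
observed <- matrix(c(0, 1, 0, 0, 1, 0), nrow = 2)
simulated <- matrix(c(0, 1, 1, 0, 0, 0), nrow = 2)

# Mask: a number marks cells to include, NA marks cells to drop (e.g. water).
mask <- matrix(c(1, 1, NA, 1, 1, NA), nrow = 2)
keep <- !is.na(mask)

# Share of cells where simulation and observation agree, with and without the mask.
agreement_masked <- mean(observed[keep] == simulated[keep])
agreement_unmasked <- mean(observed == simulated)
```

Here the masked comparison uses 4 of the 6 cells, so zeros in excluded cells no longer count as true negatives.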

output_frequency

Sets when outputs occur: one of 'year', 'month', 'week', 'day', 'time step', or 'every_n_steps'.

output_frequency_n

Sets number of units from output_frequency in which to export model results if output_frequency is 'every_n_steps'. Must be an integer >= 1.

movements_file

This is a csv file with columns lon_from, lat_from, lon_to, lat_to, number of animals, and date.

use_movements

This is a boolean to turn on use of the movement module.

start_exposed

Whether your initial conditions start as exposed or infected (only used if model_type is "SEI"). Default is FALSE. If this is TRUE, you need to provide both infected_files (this can be a raster of all 0's) and exposed_files.

generate_stochasticity

Boolean to indicate whether to use stochasticity in reproductive functions. Default is TRUE.

establishment_stochasticity

Boolean to indicate whether to use stochasticity in establishment functions. Default is TRUE.

movement_stochasticity

Boolean to indicate whether to use stochasticity in movement functions. Default is TRUE.

dispersal_stochasticity

Boolean to indicate whether to use stochasticity in the dispersal kernel. Default is TRUE.

establishment_probability

Threshold to determine establishment if establishment_stochasticity is FALSE (range 0 to 1, default = 0.5)

dispersal_percentage

Percentage of dispersal used to calculate the bounding box for deterministic dispersal

quarantine_areas_file

Path to raster file with quarantine boundaries used in calculating likelihood of quarantine escape if use_quarantine is TRUE

use_quarantine

Boolean to indicate whether or not there is a quarantine area. If TRUE, you must pass in a raster file indicating the quarantine areas (default = FALSE).

use_spreadrates

Boolean to indicate whether or not to calculate spread rates

use_overpopulation_movements

Boolean to indicate whether to use the overpopulation pest movement module (driven by the natural kernel with its scale parameter modified by a coefficient)

overpopulation_percentage

Percentage of occupied hosts when the cell is considered to be overpopulated

leaving_percentage

Percentage of pests leaving an overpopulated cell

leaving_scale_coefficient

Coefficient to multiply scale parameter of the natural kernel (if applicable)

calibration_method

Which method of calibration to use: either 'ABC' (Approximate Bayesian Computation) or 'MCMC' (Markov Chain Monte Carlo Approximation).

number_of_iterations

How many iterations to run to allow the calibration to converge (we recommend a minimum of 100,000, but preferably 1 million).

exposed_file_list

Paths to raster files with initial exposed populations and standard deviation for each host. Each file can be passed in one of 2 formats: a single file with the number of hosts, or a single file with 2 layers (number of hosts and standard deviation). Units for infections depend on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.).

verbose

Boolean; if TRUE, prints the current status of the calibration (e.g. the current generation, current particle, and the acceptance rate). Default is TRUE.

write_outputs

Either "summary_outputs" or "None". If not "None", output_folder_path must be provided.

output_folder_path

The full path to the output folder, written with either / or \ (e.g., "C:/user_name/desktop/pops_sod_2020_2023/outputs/").

network_filename

The entire file path for the network file. Used if anthropogenic_kernel_type = 'network'.

network_movement

What movement type do you want to use in the network kernel either "walk", "jump", or "teleport". "walk" allows dispersing units to leave the network at any cell along the edge. "jump" automatically moves to the nearest node when moving through the network. "teleport" moves from node to node most likely used for airport and seaport networks.

success_metric

Choose the success metric that is most relevant to your system or data for comparing simulations vs. observations. Must be one of "quantity", "allocation", "configuration", "quantity and allocation", "quantity and configuration", "allocation and configuration", "quantity, allocation, and configuration", "accuracy", "precision", "recall", "specificity", "accuracy and precision", "accuracy and specificity", "accuracy and recall", "precision and recall", "precision and specificity", "recall and specificity", "accuracy, precision, and recall", "accuracy, precision, and specificity", "accuracy, recall, and specificity", "precision, recall, and specificity", "accuracy, precision, recall, and specificity", "rmse", "distance", "mcc", "mcc and quantity", "mcc and distance", "rmse and distance", "mcc and configuration", "mcc and RMSE", "mcc, quantity, and configuration". Default is "mcc".
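For readers unfamiliar with the default metric, the sketch below computes the standard Matthews correlation coefficient (MCC) from presence/absence vectors. It follows the textbook formula and is not taken from the package source; mcc(), observed, and simulated are names and values introduced for this illustration.

```r
# Matthews correlation coefficient for two logical presence/absence vectors.
mcc <- function(observed, simulated) {
  # Confusion-matrix counts, converted to numeric to avoid integer overflow.
  tp <- as.numeric(sum(observed & simulated))
  tn <- as.numeric(sum(!observed & !simulated))
  fp <- as.numeric(sum(!observed & simulated))
  fn <- as.numeric(sum(observed & !simulated))

  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denominator == 0) {
    return(0)  # common convention when a row or column of the matrix is empty
  }
  (tp * tn - fp * fn) / denominator
}

# Hypothetical observed and simulated infection status for six cells.
observed <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
simulated <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
score <- mcc(observed, simulated)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it less sensitive to the large number of true negatives typical of spread data than plain accuracy.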

use_initial_condition_uncertainty

Boolean to indicate whether or not to propagate and partition uncertainty from initial conditions. If TRUE, the infected_files need to have 2 layers, one with the mean value and one with the standard deviation. If an SEI model is used, the exposed_file also needs to have 2 layers, one with the mean value and one with the standard deviation.

use_host_uncertainty

Boolean to indicate whether or not to propagate and partition uncertainty from host data. If TRUE, the host_file needs to have 2 layers, one with the mean value and one with the standard deviation.

weather_type

String indicating how the weather data is passed in: either as a mean and standard deviation to represent uncertainty ("probabilistic") or as a time series ("deterministic").

temperature_coefficient_sd_file

Raster file with temperature coefficient standard deviation data for the time step and time period specified (e.g. if time_step = "week", this file would have 52 bands, with the data being weekly temperature coefficient standard deviations). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.

precipitation_coefficient_sd_file

Raster file with precipitation coefficient standard deviation data for the time step and time period specified (e.g. if time_step = "week", this file would have 52 bands, with the data being weekly precipitation coefficient standard deviations). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.

dispersers_to_soils_percentage

Range from 0 to 1 representing the percentage of dispersers that fall to the soil and survive.

quarantine_directions

String with comma-separated directions to include in the quarantine direction analysis, e.g., 'N,E'. By default all directions (N, S, E, W) are considered.

multiple_random_seeds

Boolean to indicate if the model should use multiple random seeds (allows for performing uncertainty partitioning) or a single random seed (backwards compatibility option). Default is FALSE.

file_random_seeds

Path to the .csv file containing the random_seeds table. Use this if you are trying to recreate an exact analysis; otherwise we suggest leaving the default. The default is NULL, in which case the seed numbers are drawn rather than read from a file.

use_soils

Boolean to indicate if pests establish in the soil and spread out from there. Typically used for soil-borne pathogens.

soil_starting_pest_file

Path to the raster file with the starting amount of pest or pathogen.

start_with_soil_populations

Boolean to indicate whether to use a starting soil pest or pathogen population. If TRUE, soil_starting_pest_file is required.

county_level_infection_data

Boolean to indicate if infection data is at the county level. If TRUE then the infected_file should be a polygon raster with county level infection/infestation counts.

Value

A data frame of the variables saved and their success metrics for each run.