R/validate.R
validate.Rd
This function uses quantity, allocation, and configuration disagreement to validate the model across the landscape, using the parameters from the calibrate function. Ideally the model is calibrated with 2 or more years of data and validated against the last year; if you have 6 or more years of data, the model can be validated against the final 2 years.
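As a rough illustration of two of the measures named above, the sketch below computes quantity and allocation disagreement for a pair of binary presence/absence maps, using the common two-category definitions (quantity = |false positives - false negatives|, allocation = 2 * min(false positives, false negatives)). The helper `disagreement()` and the toy matrices are hypothetical; the package's own calculation may differ, and configuration disagreement is not covered here.

```r
# Hypothetical helper: quantity and allocation disagreement, in cells,
# for two binary maps of identical shape.
disagreement <- function(observed, simulated) {
  stopifnot(identical(dim(observed), dim(simulated)))
  obs <- observed > 0
  sim <- simulated > 0
  false_positives <- sum(sim & !obs)  # simulated infected, observed clean
  false_negatives <- sum(!sim & obs)  # observed infected, simulated clean
  c(
    quantity = abs(false_positives - false_negatives),
    allocation = 2 * min(false_positives, false_negatives)
  )
}

# Toy 3 x 3 maps (made-up values, 1 = infected cell)
observed <- matrix(c(1, 1, 0,
                     0, 1, 0,
                     0, 0, 0), nrow = 3, byrow = TRUE)
simulated <- matrix(c(1, 0, 0,
                      0, 1, 1,
                      0, 0, 1), nrow = 3, byrow = TRUE)

result <- disagreement(observed, simulated)
print(result)
```

Here the simulation has two false positives and one false negative, so one cell of disagreement comes from predicting too many infected cells and two cells come from placing infections in the wrong location.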
validate(
infected_years_file,
number_of_iterations = 10,
number_of_cores = NA,
parameter_means,
parameter_cov_matrix,
pest_host_table,
competency_table,
infected_file_list,
host_file_list,
total_populations_file,
temp = FALSE,
temperature_coefficient_file = "",
precip = FALSE,
precipitation_coefficient_file = "",
model_type = "SI",
latency_period = 0,
time_step = "month",
season_month_start = 1,
season_month_end = 12,
start_date = "2008-01-01",
end_date = "2008-12-31",
use_survival_rates = FALSE,
survival_rate_month = 3,
survival_rate_day = 15,
survival_rates_file = "",
use_lethal_temperature = FALSE,
temperature_file = "",
lethal_temperature = -12.87,
lethal_temperature_month = 1,
mortality_frequency = "year",
mortality_frequency_n = 1,
management = FALSE,
treatment_dates = c(""),
treatments_file = "",
treatment_method = "ratio",
natural_kernel_type = "cauchy",
anthropogenic_kernel_type = "cauchy",
natural_dir = "NONE",
anthropogenic_dir = "NONE",
pesticide_duration = 0,
pesticide_efficacy = 1,
mask = NULL,
output_frequency = "year",
output_frequency_n = 1,
movements_file = "",
use_movements = FALSE,
start_exposed = FALSE,
generate_stochasticity = TRUE,
establishment_stochasticity = TRUE,
movement_stochasticity = TRUE,
dispersal_stochasticity = TRUE,
establishment_probability = 0.5,
dispersal_percentage = 0.99,
quarantine_areas_file = "",
use_quarantine = FALSE,
use_spreadrates = FALSE,
use_overpopulation_movements = FALSE,
overpopulation_percentage = 0,
leaving_percentage = 0,
leaving_scale_coefficient = 1,
exposed_file_list = "",
write_outputs = "None",
output_folder_path = "",
point_file = "",
network_filename = "",
network_movement = "walk",
use_distance = FALSE,
use_configuration = FALSE,
use_initial_condition_uncertainty = FALSE,
use_host_uncertainty = FALSE,
weather_type = "deterministic",
temperature_coefficient_sd_file = "",
precipitation_coefficient_sd_file = "",
dispersers_to_soils_percentage = 0,
quarantine_directions = "",
multiple_random_seeds = FALSE,
file_random_seeds = NULL,
use_soils = FALSE,
soil_starting_pest_file = "",
start_with_soil_populations = FALSE,
county_level_infection_data = FALSE
)
years of initial infection/infestation as individual locations of a pest or pathogen in raster format
Number of iterations to run so that the calibration can converge (at least 10).
Number of cores to use (default = NA). If not set, uses the number of CPU cores - 1. Must be an integer >= 1.
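The default described for number_of_cores can be sketched as follows. `resolve_cores()` is a hypothetical helper written for illustration, not the package's internal code; it relies only on `parallel::detectCores()` from the parallel package that ships with R, and the fallback of 2 detected cores is an assumption for machines where detection fails.

```r
# Hypothetical helper mirroring the documented default:
# NA means "number of CPU cores - 1", never less than 1.
resolve_cores <- function(number_of_cores = NA) {
  if (is.na(number_of_cores)) {
    detected <- parallel::detectCores()
    if (is.na(detected)) detected <- 2L  # assumed fallback when detection fails
    return(max(1L, detected - 1L))
  }
  stopifnot(number_of_cores >= 1, number_of_cores == round(number_of_cores))
  as.integer(number_of_cores)
}

cores <- resolve_cores()      # machine dependent
fixed <- resolve_cores(4)     # explicit request
```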
The parameter means from the ABC calibration function (posterior means).
The parameter covariance matrix from the ABC calibration function (posterior covariance matrix).
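To make the role of posterior means and a posterior covariance matrix concrete, here is a minimal base R sketch that draws parameter sets from a multivariate normal distribution. That validate() samples parameters this way is an assumption, and the parameter names and values (`rate`, `scale`) are invented for the example.

```r
set.seed(42)

# Hypothetical posterior summary for two made-up parameters
parameter_means <- c(rate = 1.5, scale = 20)
parameter_cov_matrix <- matrix(c(0.04, 0.10,
                                 0.10, 4.00), nrow = 2)

# Draw n parameter sets from N(means, cov_matrix) via a Cholesky factor
draw_parameters <- function(n, means, cov_matrix) {
  upper <- chol(cov_matrix)  # cov_matrix = t(upper) %*% upper
  z <- matrix(rnorm(n * length(means)), nrow = n)
  draws <- sweep(z %*% upper, 2, means, "+")
  colnames(draws) <- names(means)
  draws
}

draws <- draw_parameters(10, parameter_means, parameter_cov_matrix)
```

Each row of `draws` is one candidate parameter set, so ten rows would correspond to ten iterations under this assumption.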
The file path to a csv that has these columns in this order: host, susceptibility_mean, susceptibility_sd, mortality_rate, mortality_rate_mean, and mortality_time_lag, with each row being a species. Host species must be in the same order in the host_file_list, infected_file_list, pest_host_table rows, and competency_table columns. The host column is a character string of the species name, and is only used for metadata and labeling output files. Susceptibility and mortality_rate values must be between 0 and 1.
A csv with the hosts as the first n columns (n being the number of hosts) and the last column being the competency value. Each row is a set of Booleans for host presence and the competency value (between 0 and 1) for that combination of hosts in a cell.
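The two tables described above could be assembled in base R along these lines. The species names, all numeric values, the `competency` column name, and the 0/1 encoding of host presence are assumptions made for the example; only the column order of pest_host_table comes from the description.

```r
hosts <- c("oak", "tanoak")  # hypothetical species, same order everywhere

# One row per host, columns in the documented order (values are made up)
pest_host_table <- data.frame(
  host = hosts,
  susceptibility_mean = c(0.8, 1.0),
  susceptibility_sd = c(0.1, 0.0),
  mortality_rate = c(0.05, 0.10),
  mortality_rate_mean = c(0.05, 0.10),
  mortality_time_lag = c(2, 1)
)

# One row per presence/absence combination of the hosts,
# followed by a competency value between 0 and 1
competency_table <- expand.grid(oak = c(0, 1), tanoak = c(0, 1))
competency_table$competency <- c(0, 0.4, 0.9, 1.0)

# Write both tables to temporary csv files
pest_host_path <- tempfile(fileext = ".csv")
competency_path <- tempfile(fileext = ".csv")
write.csv(pest_host_table, pest_host_path, row.names = FALSE)
write.csv(competency_table, competency_path, row.names = FALSE)
```

With n hosts, `expand.grid()` yields 2^n rows, which gives one competency value for every possible mix of hosts in a cell.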
Paths to raster files with initial infections and standard deviation for each host. Each file can be in one of 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). Units for infections depend on data availability and on the units used to create your host file (e.g. percent area, # of hosts per cell, etc.).
Paths to raster files with number of hosts and the standard deviation on those estimates. Each file can be in one of 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). The units can take many forms; the two most common that we use are percent area (0 to 100) and # of hosts in the cell. The choice usually depends on the data available and the estimation methods.
Path to raster file with number of total populations of all hosts and non-hosts. This depends on how your host data is set up. If host is percent area, then this should be a raster with values that are 100 anywhere with host. If the host file is # of hosts in a cell, then this should be a raster with values that are the max of the host raster anywhere the # of hosts is greater than 0.
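The rule for building the total populations layer can be sketched on a plain matrix standing in for a raster. `total_populations()` is a hypothetical helper, and setting cells without host to 0 is an assumption, since the description only covers cells that contain host.

```r
# Hypothetical helper: derive a total-populations layer from a host layer.
# units = "percent": 100 wherever host is present.
# units = "count":   the maximum host count wherever host is present.
total_populations <- function(host, units = c("percent", "count")) {
  units <- match.arg(units)
  fill <- if (units == "percent") 100 else max(host, na.rm = TRUE)
  ifelse(host > 0, fill, 0)  # 0 for host-free cells is an assumption
}

host_counts <- matrix(c(0, 5, 12, 0), nrow = 2)    # made-up host counts
host_percent <- matrix(c(0, 30, 100, 0), nrow = 2) # made-up percent area

totals_from_counts <- total_populations(host_counts, "count")
totals_from_percent <- total_populations(host_percent, "percent")
```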
boolean that allows the use of temperature coefficients to modify spread (TRUE or FALSE)
Path to raster file with temperature coefficient data for the timestep and time period specified (e.g. if timestep = week and start_date = 2017_01_01 and end_date = 2019_12_31 this file would have 52 * 3 bands = 156 bands with data being weekly temperature coefficients). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
boolean that allows the use of precipitation coefficients to modify spread (TRUE or FALSE)
Raster file with precipitation coefficient data for the timestep and time period specified (e.g. if timestep = week and start_date = 2017_01_01 and end_date = 2019_12_31 this file would have 52 * 3 bands = 156 bands with data being weekly precipitation coefficients). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
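The band arithmetic in the example (52 weekly bands per year over 3 years) can be checked with a small helper. `expected_bands()` is hypothetical, assumes the simulation covers whole calendar years, and the figure of 12 bands per year for a monthly time step is an assumption that follows the same pattern; daily time steps are left out.

```r
# Hypothetical helper: expected number of weather coefficient bands
# for a simulation spanning whole calendar years.
expected_bands <- function(start_date, end_date,
                           time_step = c("week", "month")) {
  time_step <- match.arg(time_step)
  years <- as.integer(format(as.Date(end_date), "%Y")) -
    as.integer(format(as.Date(start_date), "%Y")) + 1L
  per_year <- c(week = 52L, month = 12L)[[time_step]]
  years * per_year
}

weekly_bands <- expected_bands("2017-01-01", "2019-12-31", "week")
monthly_bands <- expected_bands("2017-01-01", "2019-12-31", "month")
```

Comparing this number with the band count of the coefficient raster before a run is a cheap way to catch a file that covers the wrong period.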
What type of model most represents your system. Options are "SEI" (Susceptible - Exposed - Infected/Infested) or "SI" (Susceptible - Infected/Infested). Default value is "SI".
How many time steps it takes for exposed populations to become infected/infested. This is an integer value and must be greater than 0 if model type is SEI.
How often spread should occur. Options: ('day', 'week', 'month').
When does spread first start occurring in the year for your pest or pathogen (integer value between 1 and 12)
When does spread end during the year for your pest or pathogen (integer value between 1 and 12)
Date to start the simulation, in year-month-day order (e.g. the default "2008-01-01")
Date to end the simulation, in year-month-day order (e.g. the default "2008-12-31")
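Dates on this page appear with underscores (2017_01_01) as well as with hyphens (the defaults "2008-01-01" and "2008-12-31"). The sketch below is a hypothetical pre-flight check that accepts either separator and confirms the end date falls after the start date; it is not the package's own validation.

```r
# Hypothetical helper: parse start and end dates written with either
# "_" or "-" as separator, and check their order.
check_dates <- function(start_date, end_date) {
  to_date <- function(x) {
    as.Date(gsub("_", "-", x, fixed = TRUE), format = "%Y-%m-%d")
  }
  start <- to_date(start_date)
  end <- to_date(end_date)
  if (is.na(start) || is.na(end)) {
    stop("dates must be in year-month-day order, e.g. '2008-01-01'")
  }
  if (end <= start) stop("end_date must be after start_date")
  list(start = start, end = end)
}

dates <- check_dates("2008-01-01", "2008-12-31")
```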
Boolean to indicate if the model will use survival rates to limit the survival or emergence of overwintering generations.
The month in which overwintering generations emerge. We suggest using the month before emergence for this parameter, as that is when the survival rates raster will be applied.
What day should the survival rates be applied
Raster file with survival rates from 0 to 1 representing the percentage of emergence for a cell.
A boolean to answer the question: does your pest or pathogen have a temperature at which it cannot survive? (TRUE or FALSE)
Path to raster file with temperature data for minimum temperature
The temperature in degrees C at which lethal temperature related mortality occurs for your pest or pathogen (-50 to 60)
The month in which lethal temperature related mortality occurs for your pest or pathogen integer value between 1 and 12
Sets how frequently mortality calculations occur, one of ('year', 'month', 'week', 'day', 'time step', or 'every_n_steps')
Sets number of units from mortality_frequency in which to run the mortality calculation if mortality_frequency is 'every_n_steps'. Must be an integer >= 1.
Boolean to allow use of management (TRUE or FALSE)
List of dates on which to apply treatment, with format ('YYYY_MM_DD') (needs to be the same length as treatments_file and pesticide_duration)
Path to raster files with treatment data by dates. Needs to be a list of files the same length as treatment_dates and pesticide_duration.
What method to use when applying treatment, one of ("ratio" or "all infected"). "ratio" removes a portion of all infected and susceptibles; "all infected" removes all infected and a portion of susceptibles.
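The difference between the two treatment methods can be shown with simple arithmetic on per-cell counts. `apply_treatment()` is a hypothetical illustration of the description, where `treatment` is the treated proportion of each cell; it ignores pesticide_efficacy and any rounding to whole hosts that a real implementation may apply.

```r
# Hypothetical sketch of the two treatment methods on per-cell counts.
# treatment = proportion of each cell that is treated (0 to 1).
apply_treatment <- function(infected, susceptible, treatment,
                            method = c("ratio", "all infected")) {
  method <- match.arg(method)
  if (method == "ratio") {
    # remove the treated portion of infected hosts
    infected <- infected * (1 - treatment)
  } else {
    # remove every infected host in any treated cell
    infected[treatment > 0] <- 0
  }
  # both methods remove the treated portion of susceptible hosts
  susceptible <- susceptible * (1 - treatment)
  list(infected = infected, susceptible = susceptible)
}

infected <- c(10, 10)
susceptible <- c(20, 20)
treatment <- c(0.5, 0)  # first cell half treated, second untreated

ratio_result <- apply_treatment(infected, susceptible, treatment, "ratio")
all_result <- apply_treatment(infected, susceptible, treatment, "all infected")
```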
What type of dispersal kernel should be used for natural dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic')
What type of dispersal kernel should be used for anthropogenic dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic', 'network')
Sets the predominant direction of natural dispersal, usually due to wind. Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE')
Sets the predominant direction of anthropogenic dispersal, usually due to human movement, typically over long distances (e.g. nursery trade, movement of firewood, etc.). Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE')
How long the pesticide (herbicide, vaccine, etc.) lasts before the host is susceptible again. If the value is 0, treatment is a culling (i.e. host removal), not a pesticide treatment. (needs to be the same length as treatment_dates and treatments_file)
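Because treatment_dates, treatments_file, and pesticide_duration must all have the same length, a quick check before running can save a failed simulation. Both helpers below are hypothetical, and the dates and file names are made up.

```r
# Hypothetical check: the three treatment inputs must be equally long.
treatment_lengths_match <- function(treatment_dates, treatments_file,
                                    pesticide_duration) {
  lengths_found <- c(length(treatment_dates),
                     length(treatments_file),
                     length(pesticide_duration))
  length(unique(lengths_found)) == 1
}

# A duration of 0 means culling (host removal); anything else is a pesticide
treatment_kind <- function(pesticide_duration) {
  ifelse(pesticide_duration == 0, "culling", "pesticide")
}

treatment_dates <- c("2008_03_01", "2008_06_01")    # made-up dates
treatments_file <- c("treat_1.tif", "treat_2.tif")  # made-up file names
pesticide_duration <- c(0, 30)

ok <- treatment_lengths_match(treatment_dates, treatments_file,
                              pesticide_duration)
kinds <- treatment_kind(pesticide_duration)
```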
How effective is the pesticide at preventing the disease or killing the pest (if this is 0.70 then when applied it successfully treats 70 percent of the plants or animals).
Raster file used to provide a mask to remove 0's that are not true negatives from comparisons (e.g. mask out lakes and oceans from statistics if modeling terrestrial species).
Sets when outputs occur, one of ('year', 'month', 'week', 'day', 'time step', or 'every_n_steps')
Sets number of units from output_frequency in which to export model results if output_frequency is 'every_n_steps'. Must be an integer >= 1.
This is a csv file with columns lon_from, lat_from, lon_to, lat_to, number of animals, and date.
This is a boolean to turn on use of the movement module.
Whether your initial conditions start as exposed or infected (only used if model_type is "SEI"). Default is FALSE. If this is TRUE, you need to provide both infected_file_list (this can be a raster of all 0's) and exposed_file_list.
Boolean to indicate whether to use stochasticity in reproductive functions default is TRUE
Boolean to indicate whether to use stochasticity in establishment functions default is TRUE
Boolean to indicate whether to use stochasticity in movement functions default is TRUE
Boolean to indicate whether to use stochasticity in the dispersal kernel default is TRUE
Threshold to determine establishment if establishment_stochasticity is FALSE (range 0 to 1, default = 0.5)
Percentage of dispersal used to calculate the bounding box for deterministic dispersal
Path to raster file with quarantine boundaries used in calculating likelihood of quarantine escape if use_quarantine is TRUE
Boolean to indicate whether or not there is a quarantine area if TRUE must pass in a raster file indicating the quarantine areas (default = FALSE)
Boolean to indicate whether or not to calculate spread rates
Boolean to indicate whether to use the overpopulation pest movement module (driven by the natural kernel with its scale parameter modified by a coefficient)
Percentage of occupied hosts when the cell is considered to be overpopulated
Percentage of pests leaving an overpopulated cell
Coefficient to multiply scale parameter of the natural kernel (if applicable)
Paths to raster files with initial exposed populations and standard deviation for each host. Each file can be in one of 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). Units for infections depend on data availability and on the units used to create your host file (e.g. percent area, # of hosts per cell, etc.).
One of "summary_outputs", "all_simulations", or "None". If not "None", output_folder_path must be provided.
The full path to the output folder, written with either / or \ (e.g., "C:/user_name/desktop/pops_sod_2020_2023/outputs/")
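The dependency between write_outputs and output_folder_path can be expressed as a small pre-flight check. `check_outputs()` is a hypothetical helper; requiring the folder to exist already is an assumption, since the description only says a path must be provided.

```r
# Hypothetical check: an output folder is needed unless write_outputs is "None".
check_outputs <- function(write_outputs = "None", output_folder_path = "") {
  allowed <- c("summary_outputs", "all_simulations", "None")
  if (!write_outputs %in% allowed) {
    stop("write_outputs must be one of: ", paste(allowed, collapse = ", "))
  }
  # Assumption: the folder should already exist on disk
  if (write_outputs != "None" && !dir.exists(output_folder_path)) {
    stop("output_folder_path must point to an existing folder")
  }
  invisible(TRUE)
}

check_outputs("None")                        # nothing written, no path needed
check_outputs("summary_outputs", tempdir())  # temporary folder as a stand-in
```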
File for point comparison; if not provided, the point comparison calculations are skipped.
The entire file path for the network file. Used if anthropogenic_kernel_type = 'network'.
What movement type do you want to use in the network kernel either "walk", "jump", or "teleport". "walk" allows dispersing units to leave the network at any cell along the edge. "jump" automatically moves to the nearest node when moving through the network. "teleport" moves from node to node most likely used for airport and seaport networks.
Boolean if you want to compare distance between simulations and observations. Default is FALSE.
Boolean if you want to use configuration disagreement for comparing model runs. Default is FALSE.
Boolean to indicate whether or not to propagate and partition uncertainty from initial conditions. If TRUE, each file in infected_file_list needs to have 2 layers: one with the mean value and one with the standard deviation. If an SEI model is used, each file in exposed_file_list needs the same 2 layers.
Boolean to indicate whether or not to propagate and partition uncertainty from host data. If TRUE, each file in host_file_list needs to have 2 layers: one with the mean value and one with the standard deviation.
string indicating how the weather data is passed in either as a mean and standard deviation to represent uncertainty ("probabilistic") or as a time series ("deterministic")
Raster file with temperature coefficient standard deviation data for the timestep and time period specified (e.g. if timestep = week this file would have 52 bands with data being weekly temperature coefficient standard deviations). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
Raster file with precipitation coefficient standard deviation data for the timestep and time period specified (e.g. if timestep = week this file would have 52 bands with data being weekly precipitation coefficient standard deviations). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
Range from 0 to 1 representing the percentage of dispersers that fall to the soil and survive.
String with comma separated directions to include in the quarantine direction analysis, e.g., 'N,E'. By default all directions (N, S, E, W) are considered
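A comma separated direction string such as 'N,E' can be split and checked with base R. `parse_directions()` is a hypothetical helper; treating the default empty string as "all four directions" follows the description, while trimming stray spaces is an added convenience.

```r
# Hypothetical helper: turn "N,E" into c("N", "E");
# an empty string falls back to all four directions.
parse_directions <- function(quarantine_directions = "") {
  all_directions <- c("N", "S", "E", "W")
  if (!nzchar(quarantine_directions)) return(all_directions)
  chosen <- trimws(strsplit(quarantine_directions, ",", fixed = TRUE)[[1]])
  unknown <- setdiff(chosen, all_directions)
  if (length(unknown) > 0) {
    stop("unknown direction(s): ", paste(unknown, collapse = ", "))
  }
  chosen
}

some_directions <- parse_directions("N,E")
default_directions <- parse_directions()
```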
Boolean to indicate if the model should use multiple random seeds (allows for performing uncertainty partitioning) or a single random seed (backwards compatibility option). Default is FALSE.
A file path to the .csv file containing the random_seeds table. Use this if you are trying to recreate an exact analysis; otherwise we suggest leaving the default. Default is NULL, which draws the seed numbers for each.
Boolean to indicate if pests establish in the soil and spread out from there. Typically used for soil borne pathogens.
path to the raster file with the starting amount of pest or pathogen.
Boolean to indicate whether to use a starting soil pest or pathogen population if TRUE then soil_starting_pest_file is required.
Boolean to indicate if infection data is at the county level. If TRUE then the infected_file should be a polygon raster with county level infection/infestation counts.
a data frame of statistical measures of model performance.