Title: | Fit Dynamic Structural Equation Models |
---|---|
Description: | Applies dynamic structural equation models to time-series data with generic and simplified specification for simultaneous and lagged effects. Methods are described in Thorson et al. (2024) "Dynamic structural equation models synthesize ecosystem dynamics constrained by ecological mechanisms." |
Authors: | James Thorson [aut, cre] |
Maintainer: | James Thorson <[email protected]> |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2024-11-16 02:51:22 UTC |
Source: | https://github.com/james-thorson-noaa/dsem |
Convert dsem to phylopath output
as_fitted_DAG( fit, lag = 0, what = c("Estimate", "Std_Error", "p_value"), direction = 1 )
as_fitted_DAG( fit, lag = 0, what = c("Estimate", "Std_Error", "p_value"), direction = 1 )
fit |
Output from |
lag |
which lag to output |
what |
whether to output estimates |
direction |
whether to include one-sided arrows |
Convert output to format supplied by est_DAG
Convert output from package dsem to sem
as_sem(object, lag = 0)
as_sem(object, lag = 0)
object |
Output from |
lag |
what lag to extract and visualize |
Convert output to format supplied by sem
Data used to demonstrate and test ecosystem synthesis
data(bering_sea)
data(bering_sea)
classify_variables
is copied from sem:::classifyVariables
classify_variables(model)
classify_variables(model)
model |
SEM model |
Copied from package 'sem' under licence GPL (>= 2) with permission from John Fox
Tagged-list defining exogenous and endogenous variables
Fits a dynamic structural equation model
dsem( sem, tsdata, family = rep("fixed", ncol(tsdata)), estimate_delta0 = FALSE, control = dsem_control(), covs = colnames(tsdata) )
dsem( sem, tsdata, family = rep("fixed", ncol(tsdata)), estimate_delta0 = FALSE, control = dsem_control(), covs = colnames(tsdata) )
sem |
Specification for time-series structural equation model structure
including lagged or simultaneous effects. See Details section in
|
tsdata |
time-series data, as outputted using |
family |
Character-vector listing the distribution used for each column of |
estimate_delta0 |
Boolean indicating whether to estimate deviations from equilibrium in initial year as fixed effects, or alternatively to assume that dynamics start at some stochastic draw away from the stationary distribution |
control |
Output from |
covs |
optional: a character vector of one or more elements, with each element giving a string of variable names, separated by commas. Variances and covariances among all variables in each such string are added to the model. Warning: covs="x1, x2" and covs=c("x1", "x2") are not equivalent: covs="x1, x2" specifies the variance of x1, the variance of x2, and their covariance, while covs=c("x1", "x2") specifies the variance of x1 and the variance of x2 but not their covariance. These same covariances can be added manually via argument 'sem', but using argument 'covs' might save time for models with many variables. |
A DSEM involves (at a minimum):
a matrix where column
for variable c is
a time-series;
a user-supplied specification for the path coefficients, which
define the precision (inverse covariance) for a matrix of state-variables
and see
make_dsem_ram
for more details on the math involved.
The model also estimates the time-series mean for each variable.
The mean and precision matrix therefore define a Gaussian Markov random field for
:
Users can the specify
a distribution for measurement errors (or assume that variables are measured without error) using
argument family
. This defines the link-function and distribution
for each time-series
:
dsem
then estimates all specified coefficients, time-series means , and distribution
measurement errors
via maximizing a log-marginal likelihood, while
also estimating state-variables
.
summary.dsem
then assembles estimates and standard errors in an easy-to-read format.
Standard errors for fixed effects (path coefficients, exogenoux variance parameters, and measurement error parameters)
are estimated from the matrix of second derivatives of the log-marginal likelihod,
and standard errors for random effects (i.e., missing or state-space variables) are estimated
from a generalization of this method (see sdreport
for details).
An object (list) of class 'dsem'. Elements include:
TMB object from MakeADFun
RAM parsed by make_dsem_ram
SEM structure parsed by make_dsem_ram
as intermediate description of model linkages
The list of inputs passed to MakeADFun
The output from nlminb
The output from sdreport
Objects useful for package function, i.e., all arguments passed during the call
**Introducing the package, its features, and comparison with other software (to cite when using dsem):**
Thorson, J. T., Andrews, A., Essington, T., Large, S. (In review). Dynamic structural equation models synthesize ecosystem dynamics constrained by ecological mechanisms.
# Define model sem = " # Link, lag, param_name cprofits -> consumption, 0, a1 cprofits -> consumption, 1, a2 pwage -> consumption, 0, a3 gwage -> consumption, 0, a3 cprofits -> invest, 0, b1 cprofits -> invest, 1, b2 capital -> invest, 0, b3 gnp -> pwage, 0, c2 gnp -> pwage, 1, c3 time -> pwage, 0, c1 " # Load data data(KleinI, package="AER") TS = ts(data.frame(KleinI, "time"=time(KleinI) - 1931)) tsdata = TS[,c("time","gnp","pwage","cprofits",'consumption', "gwage","invest","capital")] # Fit model fit = dsem( sem=sem, tsdata = tsdata, estimate_delta0 = TRUE, control = dsem_control(quiet=TRUE) ) summary( fit ) plot( fit ) plot( fit, edge_label="value" )
# Define model sem = " # Link, lag, param_name cprofits -> consumption, 0, a1 cprofits -> consumption, 1, a2 pwage -> consumption, 0, a3 gwage -> consumption, 0, a3 cprofits -> invest, 0, b1 cprofits -> invest, 1, b2 capital -> invest, 0, b3 gnp -> pwage, 0, c2 gnp -> pwage, 1, c3 time -> pwage, 0, c1 " # Load data data(KleinI, package="AER") TS = ts(data.frame(KleinI, "time"=time(KleinI) - 1931)) tsdata = TS[,c("time","gnp","pwage","cprofits",'consumption', "gwage","invest","capital")] # Fit model fit = dsem( sem=sem, tsdata = tsdata, estimate_delta0 = TRUE, control = dsem_control(quiet=TRUE) ) summary( fit ) plot( fit ) plot( fit, edge_label="value" )
Define a list of control parameters. Note that
the format of this input is likely to change more rapidly than that of
dsem
dsem_control( nlminb_loops = 1, newton_loops = 1, trace = 0, eval.max = 1000, iter.max = 1000, getsd = TRUE, quiet = FALSE, run_model = TRUE, gmrf_parameterization = c("separable", "projection"), constant_variance = c("conditional", "marginal", "diagonal"), use_REML = TRUE, profile = NULL, parameters = NULL, map = NULL, getJointPrecision = FALSE, extra_convergence_checks = TRUE )
dsem_control( nlminb_loops = 1, newton_loops = 1, trace = 0, eval.max = 1000, iter.max = 1000, getsd = TRUE, quiet = FALSE, run_model = TRUE, gmrf_parameterization = c("separable", "projection"), constant_variance = c("conditional", "marginal", "diagonal"), use_REML = TRUE, profile = NULL, parameters = NULL, map = NULL, getJointPrecision = FALSE, extra_convergence_checks = TRUE )
nlminb_loops |
Integer number of times to call |
newton_loops |
Integer number of Newton steps to do after running
|
trace |
Parameter values are printed every 'trace' iteration
for the outer optimizer. Passed to
'control' in |
eval.max |
Maximum number of evaluations of the objective function
allowed. Passed to 'control' in |
iter.max |
Maximum number of iterations allowed. Passed to 'control' in
|
getsd |
Boolean indicating whether to call |
quiet |
Boolean indicating whether to run model printing messages to terminal or not; |
run_model |
Boolean indicating whether to estimate parameters (the default), or instead to return the model inputs and compiled TMB object without running; |
gmrf_parameterization |
Parameterization to use for the Gaussian Markov random field, where the default 'separable' constructs a precision matrix that must be full rank, and the alternative 'projection' constructs a full-rank and IID precision for variables over time, and then projects this using the inverse-cholesky of the precision, where this projection can be rank-deficient. |
constant_variance |
Whether to specify a constant conditional variance
|
use_REML |
Boolean indicating whether to treat non-variance fixed effects as random, either to motigate bias in estimated variance parameters or improve efficiency for parameter estimation given correlated fixed and random effects |
profile |
Parameters to profile out of the likelihood (this subset will be appended to |
parameters |
list of fixed and random effects, e.g., as constructed by |
map |
list of fixed and mirrored parameters, constructed by |
getJointPrecision |
whether to get the joint precision matrix. Passed
to |
extra_convergence_checks |
Boolean indicating whether to run extra checks on model convergence. |
An S3 object of class "dsem_control" that specifies detailed model settings, allowing user specification while also specifying default values
Data used to demonstrate and test cross-lagged (vector autoregressive) models
data(isle_royale)
data(isle_royale)
Data extracted from file "Data_wolves_moose_Isle_Royale_June2019.csv" available at https://isleroyalewolf.org/data/data/home.html and obtained 2023-06-23. Reproduced with permission from John Vucetich, and generated by the Wolves and Moose of Isle Royale project.
Vucetich, JA and Peterson RO. 2012. The population biology of Isle Royale wolves and moose: an overview. https://www.isleroyalewolf.org
list_parameters
lists all fixed and random effects
list_parameters(Obj, verbose = TRUE)
list_parameters(Obj, verbose = TRUE)
Obj |
Compiled TMB object |
verbose |
Boolean, whether to print messages to terminal |
Tagged-list of fixed and random effects, returned invisibly and printed to screen
Extract the (marginal) log-likelihood of a dsem model
## S3 method for class 'dsem' logLik(object, ...)
## S3 method for class 'dsem' logLik(object, ...)
object |
Output from |
... |
Not used |
object of class logLik
with attributes
val |
log-likelihood |
df |
number of parameters |
Returns an object of class logLik. This has attributes
"df" (degrees of freedom) giving the number of (estimated) fixed effects
in the model, abd "val" (value) giving the marginal log-likelihood.
This class then allows AIC
to work as expected.
Make the text string for a dynamic factor analysis expressed using arrow-and-lag notation for DSEM.
make_dfa(variables, n_factors, factor_names = paste0("F", seq_len(n_factors)))
make_dfa(variables, n_factors, factor_names = paste0("F", seq_len(n_factors)))
variables |
Character string of variables (i.e., column names of |
n_factors |
Number of factors. |
factor_names |
Optional character-vector of factor names,
which must match NA columns in |
A text string to be passed to dsem
make_dsem_ram
converts SEM arrow notation to ram
describing SEM parameters
make_dsem_ram( sem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )
make_dsem_ram( sem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )
sem |
Specification for time-series structural equation model structure
including lagged or simultaneous effects. See Details section in
|
times |
A character vector listing the set of times in order |
variables |
A character vector listing the set of variables |
covs |
A character vector listing variables for which to estimate a standard deviation |
quiet |
Boolean indicating whether to print messages to terminal |
remove_na |
Boolean indicating whether to remove NA values from RAM (default) or not.
|
RAM specification using arrow-and-lag notation
Each line of the RAM specification for make_dsem_ram
consists of four (unquoted) entries,
separated by commas:
This is a simple formula, of the form
A -> B
or, equivalently, B <- A
for a regression
coefficient (i.e., a single-headed or directional arrow);
A <-> A
for a variance or A <-> B
for a covariance
(i.e., a double-headed or bidirectional arrow). Here, A
and
B
are variable names in the model. If a name does not correspond
to an observed variable, then it is assumed to be a latent variable.
Spaces can appear freely in an arrow specification, and
there can be any number of hyphens in the arrows, including zero: Thus,
e.g., A->B
, A --> B
, and A>B
are all legitimate
and equivalent.
An integer specifying whether the linkage
is simultaneous (lag=0
) or lagged (e.g., X -> Y, 1, XtoY
indicates that X in time T affects Y in time T+1), where
only one-headed arrows can be lagged. Using positive values to indicate lags
then matches the notational convention used in package dynlm.
The name of the regression coefficient, variance,
or covariance specified by the arrow. Assigning the same name to two or
more arrows results in an equality constraint. Specifying the parameter name
as NA
produces a fixed parameter.
start value for a free parameter or value of a fixed parameter.
If given as NA
(or simply omitted), the model is provide a default
starting value.
Lines may end in a comment following #. The function extends code copied from package 'sem' under licence GPL (>= 2) with permission from John Fox.
Simultaneous autoregressive process for simultaneous and lagged effects
This text then specifies linkages in a multivariate time-series model for variables
with dimensions
for
times and
variables.
make_dsem_ram
then parses this text to build a path matrix with
dimensions
, where element
represents the impact of
on
, where
and
. This path matrix defines a simultaneous equation
where is a matrix of exogenous errors with covariance
,
where
is the Cholesky of exogenous covariance. This
simultaneous autoregressive (SAR) process then results in
having covariance:
Usefully, computing the inverse-covariance (precision) matrix does not require inverting
:
Example: univariate first-order autoregressive model
This simultaneous autoregressive (SAR) process across variables and times
allows the user to specify both simutanous effects (effects among variables within
year ) and lagged effects (effects among variables among years
).
As one example, consider a univariate and first-order autoregressive process where
.
with independent errors. This is specified by passing
sem = "X -> X, 1, rho \n X <-> X, 0, sigma"
to make_dsem_ram
.
This is then parsed to a RAM:
heads | to | from | paarameter | start |
1 | 2 | 1 | 1 | <NA> |
1 | 3 | 2 | 1 | <NA> |
1 | 4 | 3 | 1 | <NA> |
2 | 1 | 1 | 2 | <NA> |
2 | 2 | 2 | 2 | <NA> |
2 | 3 | 3 | 2 | <NA> |
2 | 4 | 4 | 2 | <NA> |
Rows of this RAM where heads=1
are then interpreted to construct the path matrix , where column "from"
in the RAM indicates column number in the matrix, column "to" in the RAM indicates row number in the matrix:
While rows where heads=2
are interpreted to construct the Cholesky of exogenous covariance
and column "parameter" in the RAM associates each nonzero element of those
two matrices with an element of a vector of estimated parameters:
with two estimated parameters . This then results in covariance:
Which converges on the stationary covariance for an AR1 process for times :
except having a lower pointwise variance for the initial times, which arises as a "boundary effect".
Similarly, the arrow-and-lag notation can be used to specify a SAR representing a conventional structural equation model (SEM), cross-lagged (a.k.a. vector autoregressive) models (VAR), dynamic factor analysis (DFA), or many other time-series models.
A reticular action module (RAM) describing dependencies
# Univariate AR1 sem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( sem=sem, variables="X", times=1:4 ) # Univariate AR2 sem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( sem=sem, variables="X", times=1:4 ) # Bivariate VAR sem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( sem=sem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) sem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0.01 # Fix at negligible value Y <-> Y, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) sem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) sem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 )
# Univariate AR1 sem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( sem=sem, variables="X", times=1:4 ) # Univariate AR2 sem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( sem=sem, variables="X", times=1:4 ) # Bivariate VAR sem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( sem=sem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) sem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0.01 # Fix at negligible value Y <-> Y, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) sem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) sem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0.01 # Fix at negligible value " make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 )
parse_path
is copied from sem::parse.path
parse_path(path)
parse_path(path)
path |
text to parse |
Copied from package 'sem' under licence GPL (>= 2) with permission from John Fox
Tagged-list defining variables and direction for a specified path coefficient
Plot from a fitted dsem
model
## S3 method for class 'dsem' plot(x, y, edge_label = c("name", "value"), digits = 2, ...)
## S3 method for class 'dsem' plot(x, y, edge_label = c("name", "value"), digits = 2, ...)
x |
Output from |
y |
Not used |
edge_label |
Whether to plot parameter names or estimated values |
digits |
integer indicating the number of decimal places to be used |
... |
arguments passed to |
This function coerces output from a graph and then plots the graph.
Invisibly returns the output from graph_from_data_frame
which was passed to plot.igraph
for plotting.
Predict variables given new (counterfactual) values of data, or for future or past times
## S3 method for class 'dsem' predict(object, newdata = NULL, type = c("link", "response"), ...)
## S3 method for class 'dsem' predict(object, newdata = NULL, type = c("link", "response"), ...)
object |
Output from |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted data are used to create predictions. If desiring predictions after the fitted data, the user must append rows with NAs for those future times. Similarly, if desiring predictions given counterfactual values for time-series data, then those individual observations can be edited while keeping other observations at their original fitted values. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a Poisson-distributed variable the default predictions are of log-intensity and type = "response" gives the predicted intensity. |
... |
Not used |
A matrix of predicted values with dimensions and order corresponding to
argument newdata
is provided, or tsdata
if not.
Predictions are provided on either link or response scale, and
are generated by re-optimizing random effects condition on MLE
for fixed effects, given those new data.
Prints output from fitted dsem model
## S3 method for class 'dsem' print(x, ...)
## S3 method for class 'dsem' print(x, ...)
x |
Output from |
... |
Not used |
No return value, called to provide clean terminal output when calling fitted object in terminal.
Calculate deviance or response residuals for dsem
## S3 method for class 'dsem' residuals(object, type = c("deviance", "response"), ...)
## S3 method for class 'dsem' residuals(object, type = c("deviance", "response"), ...)
object |
Output from |
type |
which type of residuals to compute (only option is |
... |
Not used |
A matrix of residuals, with same order and dimensions as argument tsdata
that was passed to dsem
.
Data used to demonstrate and test trophic cascades options
data(sea_otter)
data(sea_otter)
Simulate from a fitted dsem
model
## S3 method for class 'dsem' simulate( object, nsim = 1, seed = NULL, variance = c("none", "random", "both"), resimulate_gmrf = FALSE, ... )
## S3 method for class 'dsem' simulate( object, nsim = 1, seed = NULL, variance = c("none", "random", "both"), resimulate_gmrf = FALSE, ... )
object |
Output from |
nsim |
number of simulated data sets |
seed |
random seed |
variance |
whether to ignore uncertainty in fixed and random effects, include estimation uncertainty in random effects, or include estimation uncertainty in both fixed and random effects |
resimulate_gmrf |
whether to resimulate the GMRF based on estimated or
simulated random effects (determined by argument |
... |
Not used |
This function conducts a parametric bootstrap, i.e., simulates new data conditional upon estimated values for fixed and random effects. The user can optionally simulate new random effects conditional upon their estimated covariance, or simulate new fixed and random effects conditional upon their imprecision.
Note that simulate
will have no effect on states x_tj
for which there
is a measurement and when those measurements are fitted using family="fixed"
, unless
resimulate_gmrf=TRUE
. In this latter case, the GMRF is resimulated given
estimated path coefficients
Simulated data, either from obj$simulate
where obj
is the compiled
TMB object, first simulating a new GMRF and then calling obj$simulate
.
summarize parameters from a fitted dynamic structural equation model
## S3 method for class 'dsem' summary(object, ...)
## S3 method for class 'dsem' summary(object, ...)
object |
Output from |
... |
Not used |
A DSEM is specified using "arrow and lag" notation, which specifies the set of
path coefficients and exogenous variance parameters to be estimated. Function dsem
then estimates the maximum likelihood value for those coefficients and parameters
by maximizing the log-marginal likelihood. Standard errors for parameters are calculated
from the matrix of second derivatives of this log-marginal likelihood (the "Hessian matrix").
However, many users will want to associate individual parameters and standard errors
with the path coefficients that were specified using the "arrow and lag" notation.
This task is complicated in
models where some path coefficients or variance parameters are specified to share a single value a priori,
or were assigned a name of NA and hence assumed to have a fixed value a priori (such that
these coefficients or parameters have an assigned value but no standard error).
The summary
function therefore compiles the MLE for coefficients (including duplicating
values for any path coefficients that assigned the same value) and standard error
estimates, and outputs those in a table that associates them with the user-supplied path and parameter names.
It also outputs the z-score and a p-value arising from a two-sided Wald test (i.e.
comparing the estimate divided by standard error against a standard normal distribution).
Returns a data.frame summarizing estimated path coefficients, containing columns:
The parsed path coefficient
The lag, where e.g. 1 means the predictor in time t effects the response in time t+1
Parameter name
Start value if supplied, and NA otherwise
Parameter number
Variable in path treated as predictor
Variable in path treated as response
Whether the path is one-headed or two-headed
Maximum likelihood estimate
Estimated standard error from the Hessian matrix
Estimate divided by Std_Error
P-value associated with z_value using a two-sided Wald test
TMBAIC
calculates AIC for a given model fit
TMBAIC(opt, k = 2, n = Inf)
TMBAIC(opt, k = 2, n = Inf)
opt |
the output from |
k |
the penalty on additional fixed effects (default=2, for AIC) |
n |
the sample size, for use in AICc calculation (default=Inf, for which AICc=AIC) |
AIC, where a parsimonious model has a AIC relative to other candidate models
extract the covariance of fixed effects, or both fixed and random effects.
## S3 method for class 'dsem' vcov(object, which = c("fixed", "random", "both"), ...)
## S3 method for class 'dsem' vcov(object, which = c("fixed", "random", "both"), ...)
object |
output from |
which |
whether to extract the covariance among fixed effects, random effects, or both |
... |
ignored, for method compatibility |
A square matrix containing the estimated covariances among the parameter estimates in the model.
The dimensions dependend upon the argument which
, to determine whether fixed, random effects,
or both are outputted.